meridian.data.load.DataFrameDataLoader

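Reads data from a Pandas DataFrame.

Inherits from: InputDataLoader

This class reads input data from a Pandas DataFrame. The coord_to_columns attribute stores a mapping from the target InputData coordinates and array names to the DataFrame column names, for cases where the two differ. The fields are: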

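  • geo, time, kpi, revenue_per_kpi, population (single column each)
  • controls (multiple columns)
  • (1) media, media_spend (multiple columns)
  • (2) reach, frequency, rf_spend (multiple columns)
  • non_media_treatments (multiple columns, optional)
  • organic_media (multiple columns, optional)
  • organic_reach, organic_frequency (multiple columns, optional)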

The DataFrame must include (1) or (2), but does not need to include both. In addition, each media channel must appear in either (1) or (2), but not in both.
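
The rule above can be checked before the loader is constructed. The following is a minimal sketch, not part of Meridian: check_channel_groups is a hypothetical helper that takes the media and reach mappings (column name to channel name), raises if neither group is present, and raises if a channel appears in both groups.

```python
def check_channel_groups(media_to_channel=None, reach_to_channel=None):
    """Hypothetical helper (not a Meridian API): validates groups (1) and (2)."""
    media_channels = set((media_to_channel or {}).values())
    rf_channels = set((reach_to_channel or {}).values())
    # At least one of the two groups must be present.
    if not media_channels and not rf_channels:
        raise ValueError("Provide media (1) or reach/frequency (2) columns.")
    # A channel may belong to only one of the two groups.
    overlap = media_channels & rf_channels
    if overlap:
        raise ValueError(f"Channels in both (1) and (2): {sorted(overlap)}")
    return media_channels | rf_channels


channels = check_channel_groups(
    media_to_channel={"impressions_tv": "tv", "impressions_fb": "fb"},
    reach_to_channel={"reach_yt": "yt"},
)
```

Checking on channel names rather than column names is a design choice here: two differently named columns can still map to the same channel.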

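Note the following:

  • Values in the time column must use the yyyy-mm-dd date format.
  • In a national model, geo and population are optional. If population is provided, it is reset to the default value of 1.0.
  • If media data is provided, media_to_channel and media_spend_to_channel are required. If reach and frequency data is provided, reach_to_channel, frequency_to_channel, and rf_spend_to_channel are required.
  • If organic_reach and organic_frequency data is provided, organic_reach_to_channel and organic_frequency_to_channel are required.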
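
The date-format note can be verified ahead of loading. The sketch below uses only the Python standard library; is_valid_time_value is a hypothetical helper name, and its exact round-trip check (which rejects unpadded values such as '2024-1-8') is an assumption that may be stricter than the loader itself.

```python
from datetime import datetime


def is_valid_time_value(value):
    """Returns True if `value` is a zero-padded yyyy-mm-dd date string."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except (TypeError, ValueError):
        return False
    # strptime also accepts unpadded input such as '2024-1-8';
    # require an exact round trip to enforce zero padding.
    return parsed.strftime("%Y-%m-%d") == value


time_values = ["2024-01-01", "2024-01-08", "01/15/2024"]
invalid = [v for v in time_values if not is_valid_time_value(v)]
```

Any value left in invalid would need to be reformatted before the DataFrame is passed to the loader.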

Example:

from meridian.data.load import CoordToColumns, DataFrameDataLoader

# df = [...]
coord_to_columns = CoordToColumns(
  geo='dmas',
  time='dates',
  kpi='conversions',
  revenue_per_kpi='revenue_per_conversions',
  controls=['control_income'],
  population='populations',
  media=['impressions_tv', 'impressions_fb', 'impressions_search'],
  media_spend=['spend_tv', 'spend_fb', 'spend_search'],
  reach=['reach_yt'],
  frequency=['frequency_yt'],
  rf_spend=['rf_spend_yt'],
  non_media_treatments=['price', 'discount'],
  organic_media=['organic_impressions_blog'],
  organic_reach=['organic_reach_newsletter'],
  organic_frequency=['organic_frequency_newsletter'],
)
media_to_channel = {
    'impressions_tv': 'tv',
    'impressions_fb': 'fb',
    'impressions_search': 'search',
}
media_spend_to_channel = {
    'spend_tv': 'tv', 'spend_fb': 'fb', 'spend_search': 'search'
}
reach_to_channel = {'reach_yt': 'yt'}
frequency_to_channel = {'frequency_yt': 'yt'}
rf_spend_to_channel = {'rf_spend_yt': 'yt'}
organic_reach_to_channel = {'organic_reach_newsletter': 'newsletter'}
organic_frequency_to_channel = {'organic_frequency_newsletter': 'newsletter'}

data_loader = DataFrameDataLoader(
    df=df,
    coord_to_columns=coord_to_columns,
    kpi_type='non-revenue',
    media_to_channel=media_to_channel,
    media_spend_to_channel=media_spend_to_channel,
    reach_to_channel=reach_to_channel,
    frequency_to_channel=frequency_to_channel,
    rf_spend_to_channel=rf_spend_to_channel,
    organic_reach_to_channel=organic_reach_to_channel,
    organic_frequency_to_channel=organic_frequency_to_channel,
)
data = data_loader.load()
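
In the example above, every column listed under media, media_spend, reach, and the other multi-column fields has a matching key in the corresponding *_to_channel dictionary. The sketch below checks that pairing; unmapped_columns is a hypothetical helper, and it assumes that every listed column needs a mapping entry.

```python
def unmapped_columns(columns, column_to_channel):
    """Hypothetical helper: returns the columns that have no mapping entry."""
    return [c for c in columns if c not in column_to_channel]


media_columns = ["impressions_tv", "impressions_fb", "impressions_search"]
media_to_channel = {
    "impressions_tv": "tv",
    "impressions_fb": "fb",
    "impressions_search": "search",
}

# All three columns are mapped, so nothing is reported.
missing = unmapped_columns(media_columns, media_to_channel)

# A column that was added to the list but not to the mapping is reported.
missing_after_change = unmapped_columns(
    media_columns + ["impressions_yt"], media_to_channel
)
```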

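df: The pd.DataFrame object to read from. One of the following conditions must hold:

  • The DataFrame contains no NAs.
  • For any number of initial periods, there is only media data, with NAs in all of the non-media data columns (kpi, revenue_per_kpi, media_spend, controls, and population).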
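
One reading of the second condition is that, within each non-media column, NAs may only form a leading run, and that run has the same length in every non-media column. The sketch below encodes that reading for a single geo in plain Python, with None standing in for NA and values ordered by time. Both function names are hypothetical, and the interpretation itself is an assumption rather than the loader's actual check.

```python
def leading_na_count(values):
    """Counts leading None entries; returns -1 if a None appears later on."""
    count = 0
    for v in values:
        if v is None:
            count += 1
        else:
            break
    if any(v is None for v in values[count:]):
        return -1
    return count


def na_only_in_initial_periods(non_media_columns):
    """`non_media_columns` maps column name -> time-ordered values (None = NA)."""
    counts = {leading_na_count(vals) for vals in non_media_columns.values()}
    # Valid only if no column has a late NA and all leading runs are equal.
    return -1 not in counts and len(counts) == 1


ok = na_only_in_initial_periods(
    {"kpi": [None, None, 10.0, 12.0], "controls": [None, None, 0.3, 0.4]}
)
bad = na_only_in_initial_periods(
    {"kpi": [None, 10.0, None, 12.0], "controls": [None, None, 0.3, 0.4]}
)
```

With a real DataFrame, the same idea could be expressed per column through df[col].isna() after sorting by the time column.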
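
coord_to_columns: A CoordToColumns object whose fields are the desired coordinates of the InputData and whose values are the current names of the columns (or lists of columns) in the DataFrame. Example:
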
coord_to_columns = CoordToColumns(
    geo='dmas',
    time='dates',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversions',
    media=['impressions_tv', 'impressions_yt', 'impressions_search'],
    media_spend=['spend_tv', 'spend_yt', 'spend_search'],
    controls=['control_income'],
    population='populations',
)

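kpi_type: A string denoting whether the KPI is of the 'revenue' or 'non-revenue' type. When kpi_type is 'non-revenue' and revenue_per_kpi exists, ROI calibration is used and the analysis is run on revenue. When revenue_per_kpi does not exist for the same kpi_type, custom ROI calibration is used and the analysis is run on the KPI.

media_to_channel: A dictionary whose keys are the actual column names for the media data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the media_spend data. Example: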

media_to_channel = {'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'}

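media_spend_to_channel: A dictionary whose keys are the actual column names for the media_spend data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the media data. Example: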

media_spend_to_channel = {
    'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'
}
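
Because media_to_channel and media_spend_to_channel are expected to name the same channels, a quick comparison can catch a mismatch early. The sketch below reuses the two example dictionaries from this section; channel_mismatches is a hypothetical helper, not a Meridian function.

```python
def channel_mismatches(first, second):
    """Returns channel names that appear in only one of the two mappings."""
    return sorted(set(first.values()) ^ set(second.values()))


media_to_channel = {"media_tv": "tv", "media_yt": "yt", "media_fb": "fb"}
media_spend_to_channel = {
    "spend_tv": "tv", "spend_yt": "yt", "spend_fb": "fb"
}

# Both mappings name tv, yt, and fb, so no mismatch is reported.
mismatches = channel_mismatches(media_to_channel, media_spend_to_channel)
```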

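reach_to_channel: A dictionary whose keys are the actual column names for the reach data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the rf_spend data. Example: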

reach_to_channel = {'reach_tv': 'tv', 'reach_yt': 'yt', 'reach_fb': 'fb'}

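frequency_to_channel: A dictionary whose keys are the actual column names for the frequency data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the rf_spend data. Example: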

frequency_to_channel = {
    'frequency_tv': 'tv', 'frequency_yt': 'yt', 'frequency_fb': 'fb'
}

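rf_spend_to_channel: A dictionary whose keys are the actual column names for the rf_spend data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the reach and frequency data. Example: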

rf_spend_to_channel = {
    'rf_spend_tv': 'tv', 'rf_spend_yt': 'yt', 'rf_spend_fb': 'fb'
}
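
Since the reach, frequency, and rf_spend mappings all refer to the same channels, it can help to view them per channel. The sketch below inverts the three dictionaries into one entry per channel and flags channels that lack any of the three columns; columns_by_channel is a hypothetical helper written for illustration.

```python
def columns_by_channel(**mappings):
    """Hypothetical helper: turns {column: channel} into {channel: {kind: column}}."""
    grouped = {}
    for kind, mapping in mappings.items():
        for column, channel in mapping.items():
            grouped.setdefault(channel, {})[kind] = column
    return grouped


grouped = columns_by_channel(
    reach={"reach_yt": "yt"},
    frequency={"frequency_yt": "yt"},
    rf_spend={"rf_spend_yt": "yt"},
)

# A channel is complete only when all three kinds of column are present.
incomplete = [channel for channel, cols in grouped.items() if len(cols) < 3]
```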

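organic_reach_to_channel: A dictionary whose keys are the actual column names for the organic_reach data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the organic_frequency data. Example: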

organic_reach_to_channel = {
    'organic_reach_newsletter': 'newsletter',
}

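organic_frequency_to_channel: A dictionary whose keys are the actual column names for the organic_frequency data in the DataFrame and whose values are the desired channel names. The channel names are the same as those for the organic_reach data. Example: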

organic_frequency_to_channel = {
    'organic_frequency_newsletter': 'newsletter',
}
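
When columns follow a consistent prefix convention, as in the examples above, the mapping dictionaries can be generated rather than typed by hand. This is only a sketch: mapping_from_prefix is a hypothetical helper, and the prefix convention is an assumption about how the columns happen to be named, not something the loader requires.

```python
def mapping_from_prefix(columns, prefix):
    """Builds {column: channel} by stripping `prefix` from each column name."""
    mapping = {}
    for column in columns:
        if not column.startswith(prefix):
            raise ValueError(f"{column!r} does not start with {prefix!r}")
        mapping[column] = column[len(prefix):]
    return mapping


organic_reach_to_channel = mapping_from_prefix(
    ["organic_reach_newsletter"], "organic_reach_"
)
organic_frequency_to_channel = mapping_from_prefix(
    ["organic_frequency_newsletter"], "organic_frequency_"
)
```

Generating both dictionaries from the same convention also keeps their channel names aligned.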

Methods

load

Reads data from the DataFrame and returns an InputData object.

__eq__

Returns self==value.

Class Variables

frequency_to_channel None
media_spend_to_channel None
media_to_channel None
organic_frequency_to_channel None
organic_reach_to_channel None
reach_to_channel None
rf_spend_to_channel None