Reads data from a Pandas DataFrame.
Inherits from: InputDataLoader
meridian.data.load.DataFrameDataLoader(
df: pd.DataFrame,
coord_to_columns: CoordToColumns,
kpi_type: str,
media_to_channel: (Mapping[str, str] | None) = None,
media_spend_to_channel: (Mapping[str, str] | None) = None,
reach_to_channel: (Mapping[str, str] | None) = None,
frequency_to_channel: (Mapping[str, str] | None) = None,
rf_spend_to_channel: (Mapping[str, str] | None) = None,
organic_reach_to_channel: (Mapping[str, str] | None) = None,
organic_frequency_to_channel: (Mapping[str, str] | None) = None
)
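This class reads input data from a Pandas DataFrame. The coord_to_columns attribute stores a mapping from the target InputData coordinates and array names to the DataFrame column names, for cases where they differ. The fields are:
- geo, time, kpi, revenue_per_kpi, population (single column)
- controls (multiple columns)
- (1) media, media_spend (multiple columns)
- (2) reach, frequency, rf_spend (multiple columns)
- non_media_treatments (multiple columns, optional)
- organic_media (multiple columns, optional)
- organic_reach, organic_frequency (multiple columns, optional)
The DataFrame must include (1) or (2), but it does not need to include both. In addition, each media channel must appear in either (1) or (2), but not in both.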
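The either-or rule for channels can be checked before constructing the loader. The following is a minimal sketch and not part of Meridian: check_channel_groups is a hypothetical helper, and it assumes that channel names are the values of media_to_channel and reach_to_channel.

```python
from typing import Mapping, Optional, Set, Tuple


def check_channel_groups(
    media_to_channel: Optional[Mapping[str, str]],
    reach_to_channel: Optional[Mapping[str, str]],
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical pre-flight check for the (1)/(2) channel rule."""
    # Channel names are taken from the mapping values (an assumption).
    media_channels = set((media_to_channel or {}).values())
    rf_channels = set((reach_to_channel or {}).values())

    # At least one of the two groups has to be present.
    if not media_channels and not rf_channels:
        raise ValueError(
            "Provide media/media_spend columns, "
            "reach/frequency/rf_spend columns, or both."
        )

    # A channel may belong to group (1) or group (2), never both.
    overlap = media_channels & rf_channels
    if overlap:
        raise ValueError(f"Channels in both groups: {sorted(overlap)}")

    return media_channels, rf_channels


media_channels, rf_channels = check_channel_groups(
    {"impressions_tv": "tv", "impressions_fb": "fb"},
    {"reach_yt": "yt"},
)
```

The loader may perform its own validation; this sketch only makes the documented rule explicit so that a problem can be caught earlier.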
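Note the following:
- Time column values must be formatted in yyyy-mm-dd date format.
- In a national model, geo and population are optional. If population is provided, it is reset to a default value of 1.0.
- If media data is provided, then media_to_channel and media_spend_to_channel are required. If reach and frequency data is provided, then reach_to_channel, frequency_to_channel, and rf_spend_to_channel are required.
- If organic_reach and organic_frequency data is provided, then organic_reach_to_channel and organic_frequency_to_channel are required.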
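Because time values must be yyyy-mm-dd strings, it can help to validate or normalize the time column before loading. The following is a minimal sketch using only the Python standard library. is_iso_date and normalize_date are hypothetical helper names, not Meridian functions, and the source format passed to normalize_date is an assumption about your data.

```python
from datetime import date, datetime


def is_iso_date(value: str) -> bool:
    """Return True only for a valid calendar date written as yyyy-mm-dd."""
    try:
        # Round-trip so that only the zero-padded yyyy-mm-dd form passes.
        return date.fromisoformat(value).isoformat() == value
    except ValueError:
        return False


def normalize_date(value: str, source_format: str) -> str:
    """Convert a date string in source_format to yyyy-mm-dd."""
    return datetime.strptime(value, source_format).date().isoformat()


# Example: a US-style date is rejected, then normalized.
raw = "01/05/2024"
ok_before = is_iso_date(raw)
fixed = normalize_date(raw, "%m/%d/%Y")
ok_after = is_iso_date(fixed)
```

With a DataFrame, an expression such as df['dates'].map(is_iso_date).all() would apply the same check to every value in the time column.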
Example:
from meridian.data.load import CoordToColumns, DataFrameDataLoader

# df = [...]
coord_to_columns = CoordToColumns(
geo='dmas',
time='dates',
kpi='conversions',
revenue_per_kpi='revenue_per_conversions',
controls=['control_income'],
population='populations',
media=['impressions_tv', 'impressions_fb', 'impressions_search'],
media_spend=['spend_tv', 'spend_fb', 'spend_search'],
reach=['reach_yt'],
frequency=['frequency_yt'],
rf_spend=['rf_spend_yt'],
non_media_treatments=['price', 'discount'],
organic_media=['organic_impressions_blog'],
organic_reach=['organic_reach_newsletter'],
organic_frequency=['organic_frequency_newsletter'],
)
media_to_channel = {
'impressions_tv': 'tv',
'impressions_fb': 'fb',
'impressions_search': 'search',
}
media_spend_to_channel = {
'spend_tv': 'tv', 'spend_fb': 'fb', 'spend_search': 'search'
}
reach_to_channel = {'reach_yt': 'yt'}
frequency_to_channel = {'frequency_yt': 'yt'}
rf_spend_to_channel = {'rf_spend_yt': 'yt'}
organic_reach_to_channel = {'organic_reach_newsletter': 'newsletter'}
organic_frequency_to_channel = {'organic_frequency_newsletter': 'newsletter'}
data_loader = DataFrameDataLoader(
df=df,
coord_to_columns=coord_to_columns,
kpi_type='non_revenue',
media_to_channel=media_to_channel,
media_spend_to_channel=media_spend_to_channel,
reach_to_channel=reach_to_channel,
frequency_to_channel=frequency_to_channel,
rf_spend_to_channel=rf_spend_to_channel,
organic_reach_to_channel=organic_reach_to_channel,
organic_frequency_to_channel=organic_frequency_to_channel,
)
data = data_loader.load()
Methods
load
load() -> meridian.data.input_data.InputData
Reads data from the DataFrame and returns an InputData object.
__eq__
__eq__(
other
)
Returns self==value.
Class Variables | |
---|---|
frequency_to_channel | None |
media_spend_to_channel | None |
media_to_channel | None |
organic_frequency_to_channel | None |
organic_reach_to_channel | None |
reach_to_channel | None |
rf_spend_to_channel | None |