Load geo-level data with reach and frequency
Simulated data is provided in the following sections as an example for each data type and format.
CSV
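To load the simulated CSV data using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, you must assign their media exposure and media spend to the media and media_spend categories, respectively. For media channels that do have reach and frequency data, you must instead map their reach, frequency, and media spend to the reach, frequency, and rf_spend categories, respectively. For the definition of each variable, see Collect and organize your data.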
from meridian.data import load  # provides CoordToColumns and CsvDataLoader

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    # Channels without reach and frequency data.
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
    ],
    # Channels with reach and frequency data.
    reach=['Channel4_reach', 'Channel5_reach'],
    frequency=['Channel4_frequency', 'Channel5_frequency'],
    rf_spend=['Channel4_spend', 'Channel5_spend'],
)
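Before handing a column mapping to the loader, it can help to confirm that every mapped column actually appears in the CSV header. The following is a minimal sketch using only the standard library; the plain dict column_mapping and the helper missing_columns are hypothetical illustrations and are not part of Meridian.

```python
import csv
import io

# Hypothetical plain-dict mirror of part of the column mapping (not a Meridian object).
column_mapping = {
    "time": ["time"],
    "geo": ["geo"],
    "kpi": ["conversions"],
    "reach": ["Channel4_reach", "Channel5_reach"],
    "frequency": ["Channel4_frequency", "Channel5_frequency"],
    "rf_spend": ["Channel4_spend", "Channel5_spend"],
}


def missing_columns(csv_file, mapping):
    """Return the mapped column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return sorted(
        column
        for columns in mapping.values()
        for column in columns
        if column not in present
    )


# In-memory stand-in for a real file; Channel5_spend is left out on purpose.
sample = io.StringIO(
    "time,geo,conversions,Channel4_reach,Channel5_reach,"
    "Channel4_frequency,Channel5_frequency,Channel4_spend\n"
)
missing = missing_columns(sample, column_mapping)
print(missing)  # ['Channel5_spend']
```

With a real file, the same helper could be called on a handle opened with open(path, newline='').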
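Map the media exposure, reach, frequency, and media spend to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.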
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
}

correct_reach_to_channel = {
    'Channel4_reach': 'Channel4',
    'Channel5_reach': 'Channel5',
}
correct_frequency_to_channel = {
    'Channel4_frequency': 'Channel4',
    'Channel5_frequency': 'Channel5',
}
correct_rf_spend_to_channel = {
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
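When column names follow a consistent pattern such as Channel4_reach, the mapping dictionaries can be derived rather than typed by hand. The sketch below assumes that naming pattern; the helper channel_map is hypothetical and not a Meridian function. It also checks that the reach, frequency, and spend mappings all name the same set of channels.

```python
def channel_map(columns, suffix):
    """Map each column to its channel name by stripping a trailing suffix."""
    mapping = {}
    for column in columns:
        if not column.endswith(suffix):
            raise ValueError(f"{column!r} does not end with {suffix!r}")
        mapping[column] = column[: -len(suffix)]
    return mapping


reach_to_channel = channel_map(["Channel4_reach", "Channel5_reach"], "_reach")
frequency_to_channel = channel_map(
    ["Channel4_frequency", "Channel5_frequency"], "_frequency"
)
rf_spend_to_channel = channel_map(["Channel4_spend", "Channel5_spend"], "_spend")

# The three mappings should refer to the same set of channel names.
channel_sets = [
    set(mapping.values())
    for mapping in (reach_to_channel, frequency_to_channel, rf_spend_to_channel)
]
consistent = all(channels == channel_sets[0] for channels in channel_sets)

print(reach_to_channel)  # {'Channel4_reach': 'Channel4', 'Channel5_reach': 'Channel5'}
print(consistent)        # True
```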
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
    reach_to_channel=correct_reach_to_channel,
    frequency_to_channel=correct_frequency_to_channel,
    rf_spend_to_channel=correct_rf_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode rather than text mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    dataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
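Pickle data is a byte stream, so the file has to be opened in binary mode for both writing and reading. The following sketch shows a round trip through a temporary file and what happens when the same file is opened in text mode; the payload is a small stand-in dictionary, not a real Xarray Dataset.

```python
import os
import pickle
import tempfile

# Stand-in object for illustration only; a real workflow would pickle a dataset.
payload = {"geo": ["A", "B"], "kpi": [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example.pkl")

    # Write and read back in binary mode.
    with open(pkl_path, "wb") as fh:
        pickle.dump(payload, fh)
    with open(pkl_path, "rb") as fh:
        restored = pickle.load(fh)

    # Text mode hands pickle str instead of bytes (or fails while decoding),
    # so loading raises an error instead of returning the object.
    try:
        with open(pkl_path, "r") as fh:
            pickle.load(fh)
        text_mode_failed = False
    except (TypeError, ValueError):
        text_mode_failed = True

print(restored == payload)  # True
print(text_mode_failed)     # True
```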
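Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.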
loader = load.XrDatasetDataLoader(
    dataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'reach': 'reach',
        'frequency': 'frequency',
        'rf_spend': 'rf_spend',
    },
)

data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
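To see which required coordinate names would still be missing after a given name_mapping is applied, the renaming can be simulated on plain strings. This is only a sketch: the set REQUIRED_COORDS is copied from the list of required coordinate names above, while dataset_coords and the helper apply_name_mapping are hypothetical and do not touch a real dataset or the Meridian loader.

```python
# Required coordinate names, as listed in the text above.
REQUIRED_COORDS = {"geo", "time", "control_variable", "media_channel", "rf_channel"}


def apply_name_mapping(names, name_mapping):
    """Return the set of names after renaming; unmapped names pass through unchanged."""
    return {name_mapping.get(name, name) for name in names}


# Hypothetical coordinate names found in an input dataset.
dataset_coords = ["geo", "time", "control", "channel", "rf_channel"]
name_mapping = {"channel": "media_channel", "control": "control_variable"}

missing_before = sorted(REQUIRED_COORDS - set(dataset_coords))
missing_after = sorted(
    REQUIRED_COORDS - apply_name_mapping(dataset_coords, name_mapping)
)

print(missing_before)  # ['control_variable', 'media_channel']
print(missing_after)   # []
```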
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
reach_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
frequency_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
rf_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
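A quick shape check can catch transposed or truncated arrays before they reach the builder. The sketch below assumes the axis layout suggested by the sample arrays (geo first, then time, then the control or channel dimension); that layout and the expected_shapes table are assumptions for illustration, not a statement of the builder's validation rules.

```python
import numpy as np

n_geos, n_times, n_controls, n_rf_channels = 3, 3, 2, 2

# Placeholder arrays with the same shapes as the sample data above.
arrays = {
    "kpi": np.arange(1, 10).reshape(n_geos, n_times),
    "controls": np.zeros((n_geos, n_times, n_controls)),
    "population": np.array([1, 2, 3]),
    "reach": np.zeros((n_geos, n_times, n_rf_channels)),
    "frequency": np.zeros((n_geos, n_times, n_rf_channels)),
    "rf_spend": np.zeros((n_geos, n_times, n_rf_channels)),
}

# Assumed layout: (geo, time) for per-period values, plus a trailing
# dimension for controls or reach-and-frequency channels.
expected_shapes = {
    "kpi": (n_geos, n_times),
    "controls": (n_geos, n_times, n_controls),
    "population": (n_geos,),
    "reach": (n_geos, n_times, n_rf_channels),
    "frequency": (n_geos, n_times, n_rf_channels),
    "rf_spend": (n_geos, n_times, n_rf_channels),
}

# Collect (actual, expected) for every array whose shape differs.
mismatches = {
    name: (array.shape, expected_shapes[name])
    for name, array in arrays.items()
    if array.shape != expected_shapes[name]
}
print(mismatches)  # {}
```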
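Use an NDArrayInputDataBuilder to set the time and geo coordinates and to provide the channel or dimension names required by Meridian input data. For the definition of each variable, see Collect and organize your data.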
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_reach(
        r_nd=reach_nd,
        f_nd=frequency_nd,
        rfs_nd=rf_spend_nd,
        rf_channels=["channel0", "channel1"]
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
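The time coordinates in the example are ISO-style date strings (YYYY-MM-DD). For longer series, such strings can be generated instead of typed out. The sketch below assumes a weekly cadence purely for illustration; the helper weekly_time_coords is hypothetical. It also shows that strings in this format sort chronologically, which makes it easy to put an unordered list such as the one in the example into date order.

```python
from datetime import date, timedelta


def weekly_time_coords(start, n_periods):
    """Return n_periods date strings in YYYY-MM-DD format, one week apart."""
    return [(start + timedelta(weeks=i)).isoformat() for i in range(n_periods)]


time_coords = weekly_time_coords(date(2024, 1, 1), 3)
print(time_coords)  # ['2024-01-01', '2024-01-08', '2024-01-15']

# ISO date strings sort in chronological order with a plain string sort.
unordered = ['2024-01-02', '2024-01-03', '2024-01-01']
print(sorted(unordered))  # ['2024-01-01', '2024-01-02', '2024-01-03']
```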
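Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.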
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
    engine='openpyxl',
)
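Use a DataFrameInputDataBuilder to map the column names to the variable types required by Meridian input data. For the definition of each variable, see Collect and organize your data.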
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
    .with_reach(
        df,
        reach_cols=['Channel4_reach', 'Channel5_reach'],
        frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
        rf_channels=['Channel4', 'Channel5'],
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
Next, you can create your model.
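The reach, frequency, spend, and channel lists in the example are parallel: the first entry of each list belongs to Channel4, the second to Channel5. A small check can flag lists that have drifted out of order. The sketch below assumes that the lists are matched by position and that every column is prefixed with its channel name, as in the simulated data; the helper check_rf_alignment is hypothetical and not part of Meridian.

```python
def check_rf_alignment(reach_cols, frequency_cols, rf_spend_cols, rf_channels):
    """Return a list of problems found in the four parallel lists."""
    lengths = {
        len(reach_cols), len(frequency_cols), len(rf_spend_cols), len(rf_channels)
    }
    if len(lengths) != 1:
        return ["lists have different lengths"]

    problems = []
    rows = zip(reach_cols, frequency_cols, rf_spend_cols, rf_channels)
    for reach, frequency, spend, channel in rows:
        for column in (reach, frequency, spend):
            # Assumed naming convention: '<channel>_<metric>'.
            if not column.startswith(channel + "_"):
                problems.append(f"{column} is not prefixed with {channel}_")
    return problems


aligned = check_rf_alignment(
    ['Channel4_reach', 'Channel5_reach'],
    ['Channel4_frequency', 'Channel5_frequency'],
    ['Channel4_spend', 'Channel5_spend'],
    ['Channel4', 'Channel5'],
)
swapped = check_rf_alignment(
    ['Channel4_reach', 'Channel5_reach'],
    ['Channel5_frequency', 'Channel4_frequency'],  # order swapped on purpose
    ['Channel4_spend', 'Channel5_spend'],
    ['Channel4', 'Channel5'],
)
print(aligned)       # []
print(len(swapped))  # 2
```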
Last updated 2025-08-04 UTC.