Load geo-level data with organic media and non-media treatments
Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
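1. Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, you must assign their media exposure to organic_media. For non-media treatments, you must assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.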
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
        'Channel4_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
        'Channel4_spend',
    ],
    organic_media=['Organic_channel0_impression'],
    non_media_treatments=['Promo'],
)
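Before loading, it can help to confirm that every column named in the mapping actually appears in the CSV header. The following is a minimal sketch using only the standard library. The helper missing_columns is hypothetical and not part of Meridian, and the in-memory header stands in for the real file.

```python
import csv
import io

# Column names taken from the mapping above, flattened into one set.
expected = {
    'time', 'geo', 'population', 'conversions', 'revenue_per_conversion',
    'GQV', 'Competitor_Sales', 'Organic_channel0_impression', 'Promo',
    *(f'Channel{i}_impression' for i in range(5)),
    *(f'Channel{i}_spend' for i in range(5)),
}


def missing_columns(csv_file, expected_columns):
    """Returns the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    return sorted(set(expected_columns) - set(header))


# Stand-in for the real file: a header row that lacks the 'Promo' column.
sample = io.StringIO(','.join(sorted(expected - {'Promo'})) + '\n')
print(missing_columns(sample, expected))  # ['Promo']
```

With a real file, the same helper could be called on an open file handle instead of the StringIO object.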
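2. Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.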
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
    'Channel4_impression': 'Channel4',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
    'Channel4_spend': 'Channel4',
}
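Because the two dictionaries follow a regular pattern, they can also be derived from a list of channel names. This is only a sketch: it assumes the "<channel>_impression" and "<channel>_spend" column naming used in the simulated data, and real column names may not follow it.

```python
channels = [f'Channel{i}' for i in range(5)]

# Assumes the '<channel>_impression' / '<channel>_spend' naming convention.
media_to_channel = {f'{c}_impression': c for c in channels}
media_spend_to_channel = {f'{c}_spend': c for c in channels}

# Both mappings should point at exactly the same set of channel names.
same_channels = (
    sorted(media_to_channel.values()) == sorted(media_spend_to_channel.values())
)
print(same_channels)  # True
print(media_to_channel['Channel0_impression'])  # Channel0
```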
3. Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
- PATH is the path to the data file location.
- FILENAME is the name of your data file.
Xarray Dataset
To load the simulated Xarray Dataset using XrDatasetDataLoader:
1. Load the data using pickle:
import pickle
# Pickle files are binary, so the file must be opened in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    XrDataset = pickle.load(fh)
Where:
- PATH is the path to the data file location.
- FILENAME is the name of your data file.
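Pickle data is a byte stream, so the file has to be opened in binary mode for both writing and reading. The sketch below round-trips a small dictionary through a temporary file to show the pattern. The dictionary is only a stand-in for an Xarray Dataset, and the file name is made up for the example.

```python
import os
import pickle
import tempfile

# Stand-in payload; a real workflow would pickle an xarray Dataset instead.
payload = {'geo': ['A', 'B'], 'kpi': [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')
    with open(path, 'wb') as fh:  # binary write
        pickle.dump(payload, fh)
    with open(path, 'rb') as fh:  # binary read
        restored = pickle.load(fh)

print(restored == payload)  # True
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.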
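2. Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.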
loader = load.XrDatasetDataLoader(
    XrDataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'organic_channel': 'organic_media_channel',
        'non_media_treatment': 'non_media_channel',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'non_media_treatment_value': 'non_media_treatments',
    },
)
data = loader.load()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
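To see whether a name_mapping covers everything, you can apply it to the dataset's names and compare the result with the required names listed above. The sketch below does this with plain Python sets. The helper unresolved_names is hypothetical, and the list of dataset names is an assumption inferred from the mapping keys rather than read from the simulated file.

```python
REQUIRED_COORDS = {
    'geo', 'time', 'control_variable', 'media_channel',
    'organic_media_channel', 'non_media_channel',
}
REQUIRED_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population',
    'media', 'media_spend', 'organic_media', 'non_media_treatments',
}


def unresolved_names(dataset_names, name_mapping):
    """Returns required names still missing after the mapping is applied."""
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted((REQUIRED_COORDS | REQUIRED_VARS) - renamed)


name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'organic_channel': 'organic_media_channel',
    'non_media_treatment': 'non_media_channel',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
    'non_media_treatment_value': 'non_media_treatments',
}
# Assumed names in the input dataset (coordinates and data variables).
dataset_names = [
    'geo', 'time', 'channel', 'control', 'organic_channel',
    'non_media_treatment', 'conversions', 'revenue_per_conversion',
    'control_value', 'population', 'media', 'spend', 'organic_media',
    'non_media_treatment_value',
]
print(unresolved_names(dataset_names, name_mapping))  # []
```

An empty list means every required name is accounted for. Any name that is printed still needs an entry in name_mapping.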
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
1. Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
media_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
media_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
organic_media_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
non_media_treatments_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
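The arrays in this step share their leading dimensions, which is easy to get wrong when assembling real data. The sketch below checks shapes under an assumed layout that the example's dimensions suggest: (geo, time) for the KPI, (geo, time, channel) for three-dimensional arrays, and (geo,) for population. It uses small generated arrays rather than the values above.

```python
import numpy as np

n_geos, n_times, n_channels = 3, 3, 2

# Generated placeholder data with the assumed layouts.
kpi_nd = np.arange(n_geos * n_times).reshape(n_geos, n_times)
media_nd = np.arange(n_geos * n_times * n_channels).reshape(
    n_geos, n_times, n_channels
)
population_nd = np.ones(n_geos)

# Leading dimensions must agree across arrays.
shapes_ok = (
    kpi_nd.shape == (n_geos, n_times)
    and media_nd.shape[:2] == kpi_nd.shape
    and population_nd.shape == (n_geos,)
)
print(kpi_nd.shape, media_nd.shape, population_nd.shape)  # (3, 3) (3, 3, 2) (3,)
print(shapes_ok)  # True
```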
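2. Use an NDArrayInputDataBuilder to set the time and geo coordinates and to supply the channel or dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.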
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_media(
        m_nd=media_nd,
        ms_nd=media_spend_nd,
        media_channels=["channel0", "channel1"]
    )
    .with_organic_media(
        organic_media_nd,
        organic_media_channels=["organic_channel0", "organic_channel1"]
    ).with_non_media_treatments(
        non_media_treatments_nd,
        non_media_channel_names=["non_media_channel0", "non_media_channel1"]
    )
)
data = builder.build()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
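The coordinates assigned to the builder have to line up with the array dimensions. The sketch below is a hypothetical pre-check (check_coords is not a Meridian function). It assumes dates written in year-month-day order, as in the example, and a KPI array laid out as (geo, time).

```python
from datetime import datetime


def check_coords(time_coords, geos, kpi_shape):
    """Raises ValueError if the coordinates do not fit the KPI array shape."""
    for value in time_coords:
        # Raises ValueError if the string is not a valid year-month-day date.
        datetime.strptime(value, '%Y-%m-%d')
    if len(set(time_coords)) != len(time_coords):
        raise ValueError('time coordinates contain duplicates')
    if len(set(geos)) != len(geos):
        raise ValueError('geo names contain duplicates')
    if (len(geos), len(time_coords)) != tuple(kpi_shape):
        raise ValueError(
            f'expected KPI shape {(len(geos), len(time_coords))}, '
            f'got {tuple(kpi_shape)}'
        )
    return True


print(check_coords(
    ['2024-01-02', '2024-01-03', '2024-01-01'], ['B', 'A', 'C'], (3, 3)
))  # True
```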
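Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
1. Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.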
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
    engine='openpyxl',
)
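2. Use a DataFrameInputDataBuilder to map column names to the variable types that Meridian input data requires. For the definition of each variable, see Collect and organize your data.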
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
)
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
builder = builder.with_media(
    df,
    media_cols=[f"{channel}_impression" for channel in channels],
    media_spend_cols=[f"{channel}_spend" for channel in channels],
    media_channels=channels,
)
builder = (
    builder
    .with_organic_media(
        df,
        organic_media_cols=["Organic_channel0_impression"],
        organic_media_channels=["Organic_channel0"],
    )
    .with_non_media_treatments(
        df,
        non_media_treatment_cols=['Promo']
    )
)
data = builder.build()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
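The with_media call above receives three lists built from the same channels list, so they have equal length and matching order. The sketch below prints the resulting pairing and adds a hypothetical guard, lists_aligned, for cases where the lists are written by hand. It assumes the lists are matched by position, which the example suggests but this page does not state.

```python
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
media_cols = [f"{channel}_impression" for channel in channels]
media_spend_cols = [f"{channel}_spend" for channel in channels]


def lists_aligned(*lists):
    """Returns True if all lists have the same length and no duplicates."""
    same_length = len({len(items) for items in lists}) == 1
    no_duplicates = all(len(set(items)) == len(items) for items in lists)
    return same_length and no_duplicates


# Show which exposure and spend columns are paired with each channel name.
for channel, media_col, spend_col in zip(channels, media_cols, media_spend_cols):
    print(f"{channel}: exposure={media_col}, spend={spend_col}")

print(lists_aligned(channels, media_cols, media_spend_cols))  # True
```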
Next, you can create your model.
Last updated (UTC): 2025-08-16.