Load geo-level data with organic media and non-media treatments
Simulated data is provided as an example for each data type and format in the
following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, you must assign their media exposure to organic_media. For non-media treatments, you must assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
        'Channel4_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
        'Channel4_spend',
    ],
    organic_media=['Organic_channel0_impression'],
    non_media_treatments=['Promo'],
)
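Before building the mapping, it can help to confirm that every column name it refers to actually exists in the CSV header. The following is a minimal sketch using only the standard library; the helper name missing_columns, the in-memory sample header, and the shortened column list are assumptions made for illustration and are not part of Meridian.

```python
import csv
import io

# Hypothetical sample standing in for the header row of your CSV file.
sample_csv = io.StringIO(
    "time,geo,GQV,Competitor_Sales,population,conversions,"
    "revenue_per_conversion,Channel0_impression,Channel0_spend,Promo\n"
)

# Column names the mapping refers to (a shortened subset of the example).
expected_columns = [
    "time", "geo", "GQV", "Competitor_Sales", "population", "conversions",
    "revenue_per_conversion", "Channel0_impression", "Channel0_spend",
    "Organic_channel0_impression", "Promo",
]


def missing_columns(csv_file, expected):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    present = set(header)
    return [name for name in expected if name not in present]


missing = missing_columns(sample_csv, expected_columns)
print(missing)  # ['Organic_channel0_impression']
```

With a real file, pass an open file handle instead of the in-memory sample. An empty list means every mapped column is present.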
Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
    'Channel4_impression': 'Channel4',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
    'Channel4_spend': 'Channel4',
}
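When the column names follow a regular pattern, as they do in the simulated data, the two dictionaries can be generated rather than typed out. This is a sketch that assumes the naming scheme of the example (a channel name followed by _impression or _spend); adapt the suffixes if your columns are named differently.

```python
# Channel names as they should appear in the output.
channels = [f"Channel{i}" for i in range(5)]

# Assumed naming scheme: '<channel>_impression' and '<channel>_spend'.
media_to_channel = {f"{ch}_impression": ch for ch in channels}
media_spend_to_channel = {f"{ch}_spend": ch for ch in channels}

# Both mappings should point at the same set of channels.
same_channels = set(media_to_channel.values()) == set(media_spend_to_channel.values())
print(same_channels)  # True
```

Generating both mappings from one list keeps the media and spend columns tied to identical channel names.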
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
Xarray Dataset
To load the simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    XrDataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
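Pickle files are binary, so they need to be written and read in binary mode. The following sketch shows a round trip with a plain dictionary standing in for the Xarray Dataset; the payload and the temporary file name are placeholders for illustration only.

```python
import os
import pickle
import tempfile

# Stand-in object; the real file holds an Xarray Dataset.
payload = {"geo": ["A", "B"], "kpi": [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example.pkl")

    # Write in binary mode ('wb').
    with open(pkl_path, "wb") as fh:
        pickle.dump(payload, fh)

    # Read in binary mode ('rb'); pickle.load expects bytes, not text.
    with open(pkl_path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == payload)  # True
```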
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.
loader = load.XrDatasetDataLoader(
    XrDataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'organic_channel': 'organic_media_channel',
        'non_media_treatment': 'non_media_channel',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'non_media_treatment_value': 'non_media_treatments',
    },
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
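To see whether a name mapping covers everything, you can apply it to the names in your dataset and compare the result with the required names listed in this section. The sketch below works on plain sets of strings rather than on a real dataset; the set dataset_names is a hypothetical list of names assumed for illustration.

```python
# Required names, as listed in this section.
REQUIRED_COORDS = {
    "geo", "time", "control_variable", "media_channel",
    "organic_media_channel", "non_media_channel",
}
REQUIRED_VARS = {
    "kpi", "revenue_per_kpi", "controls", "population",
    "media", "media_spend", "organic_media", "non_media_treatments",
}

name_mapping = {
    "channel": "media_channel",
    "control": "control_variable",
    "organic_channel": "organic_media_channel",
    "non_media_treatment": "non_media_channel",
    "conversions": "kpi",
    "revenue_per_conversion": "revenue_per_kpi",
    "control_value": "controls",
    "spend": "media_spend",
    "non_media_treatment_value": "non_media_treatments",
}

# Hypothetical coordinate and variable names found in an input dataset.
dataset_names = {
    "geo", "time", "channel", "control", "organic_channel",
    "non_media_treatment", "conversions", "revenue_per_conversion",
    "control_value", "population", "media", "spend", "organic_media",
    "non_media_treatment_value",
}

# Rename where a mapping exists; keep names that already match.
renamed = {name_mapping.get(name, name) for name in dataset_names}
missing = (REQUIRED_COORDS | REQUIRED_VARS) - renamed
print(sorted(missing))  # []
```

A non-empty result names the required coordinates or variables that the mapping still does not produce.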
Numpy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np
kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
media_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
media_spend_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
organic_media_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
non_media_treatments_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
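The arrays above are consistent with 3 geos, 3 time periods, and 2 channels per group. The following sketch checks that kind of consistency before the arrays are handed to a builder. It assumes an axis order of (geo, time) for two-dimensional arrays and (geo, time, channel) for three-dimensional ones, which is inferred from the example shapes rather than stated in this section; check_shapes is a hypothetical helper.

```python
import numpy as np

n_geos, n_times, n_channels = 3, 3, 2

# Small arrays with the same shapes as the example data.
kpi_nd = np.arange(n_geos * n_times).reshape(n_geos, n_times)
population_nd = np.ones(n_geos)
media_nd = np.arange(n_geos * n_times * n_channels).reshape(
    n_geos, n_times, n_channels
)


def check_shapes(kpi, population, per_channel):
    """Check that the arrays agree on geos and times; return the channel count."""
    geos, times = kpi.shape
    assert population.shape == (geos,), "population needs one value per geo"
    assert per_channel.shape[:2] == (geos, times), "geo/time axes must match kpi"
    return per_channel.shape[2]


print(check_shapes(kpi_nd, population_nd, media_nd))  # 2
```

The returned channel count should equal the number of channel names you later pass for that array.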
Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to provide the channel or dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"],
    )
    .with_media(
        m_nd=media_nd,
        ms_nd=media_spend_nd,
        media_channels=["channel0", "channel1"],
    )
    .with_organic_media(
        organic_media_nd,
        organic_media_channels=["organic_channel0", "organic_channel1"],
    )
    .with_non_media_treatments(
        non_media_treatments_nd,
        non_media_channel_names=["non_media_channel0", "non_media_channel1"],
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
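The example passes time coordinates as YYYY-MM-DD strings in no particular order. A quick check that the strings parse as dates and contain no duplicates can catch typos early. This is a sketch using the standard library only; how the builder itself treats ordering is not described in this section, so the sorted list is shown purely for inspection.

```python
from datetime import date

time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']

# date.fromisoformat raises ValueError for strings that are not YYYY-MM-DD.
parsed = [date.fromisoformat(t) for t in time_coords]

# Each time period should appear only once.
has_duplicates = len(set(parsed)) != len(parsed)

sorted_coords = [d.isoformat() for d in sorted(parsed)]
print(has_duplicates)  # False
print(sorted_coords)   # ['2024-01-01', '2024-01-02', '2024-01-03']
```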
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map column names to the variable types that Meridian input data requires. For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
)
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
builder = builder.with_media(
    df,
    media_cols=[f"{channel}_impression" for channel in channels],
    media_spend_cols=[f"{channel}_spend" for channel in channels],
    media_channels=channels,
)
builder = (
    builder
    .with_organic_media(
        df,
        organic_media_cols=["Organic_channel0_impression"],
        organic_media_channels=["Organic_channel0"],
    )
    .with_non_media_treatments(
        df,
        non_media_treatment_cols=['Promo'],
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
Next, you can create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-10 UTC.