Load geo-level data with organic media and non-media treatments
Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data (https://github.com/google/meridian/tree/main/meridian/data/simulated_data/csv/geo_all_channels.csv) using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, you must assign their media exposure to organic_media. For non-media treatments, you must assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
        'Channel4_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
        'Channel4_spend',
    ],
    organic_media=['Organic_channel0_impression'],
    non_media_treatments=['Promo'],
)
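Before calling the loader, it can help to confirm that every column named in coord_to_columns actually exists in your file. The following is a minimal stdlib sketch, not part of Meridian: the helper missing_columns and the sample header are hypothetical, and the expected column names are copied from the mapping above.

```python
import csv
import io

# Column names referenced by the CoordToColumns mapping above.
EXPECTED_COLUMNS = (
    ['time', 'geo', 'GQV', 'Competitor_Sales', 'population',
     'conversions', 'revenue_per_conversion']
    + [f'Channel{i}_impression' for i in range(5)]
    + [f'Channel{i}_spend' for i in range(5)]
    + ['Organic_channel0_impression', 'Promo']
)


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Hypothetical helper: return the expected columns absent from the CSV header."""
    header = next(csv.reader(csv_file))
    return [col for col in expected if col not in header]


# A hypothetical header that lacks the 'Promo' column.
sample = io.StringIO(','.join(EXPECTED_COLUMNS[:-1]) + '\n')
print(missing_columns(sample))  # ['Promo']
```

With a real file, pass the open handle instead, for example open(f'/{PATH}/{FILENAME}.csv', newline='').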
Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
    'Channel4_impression': 'Channel4',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
    'Channel4_spend': 'Channel4',
}
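Because the column names follow a regular pattern, the two mappings can also be generated from a list of channel names rather than typed out. This is a plain-Python sketch of an alternative way to build the same dictionaries; the variable names media_to_channel and media_spend_to_channel are illustrative.

```python
channels = [f'Channel{i}' for i in range(5)]

# Each impression column and each spend column maps to its channel name.
media_to_channel = {f'{ch}_impression': ch for ch in channels}
media_spend_to_channel = {f'{ch}_spend': ch for ch in channels}

# Both mappings should cover exactly the same set of channels.
print(set(media_to_channel.values()) == set(media_spend_to_channel.values()))  # True
print(media_to_channel['Channel0_impression'])  # Channel0
```

Generating both dictionaries from one list keeps impression and spend columns paired to the same channel name, which is what the loader relies on.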
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
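The CSV is read as a table with one row per geo and time period, keyed by the geo and time columns named in coord_to_columns. As a sanity check under that assumption (this is not a Meridian API), the sketch below verifies that each (geo, time) pair occurs once and that every geo covers the same time periods. The helper name and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def check_geo_time_panel(csv_file, geo_col='geo', time_col='time'):
    """Hypothetical check: (geo, time) pairs are unique and all geos share the same times.

    Returns (number of geos, number of time periods).
    """
    times_by_geo = defaultdict(set)
    for row in csv.DictReader(csv_file):
        geo, period = row[geo_col], row[time_col]
        if period in times_by_geo[geo]:
            raise ValueError(f'duplicate row for geo={geo!r}, time={period!r}')
        times_by_geo[geo].add(period)
    distinct_time_sets = {frozenset(t) for t in times_by_geo.values()}
    if len(distinct_time_sets) > 1:
        raise ValueError('geos do not cover the same time periods')
    return len(times_by_geo), len(next(iter(distinct_time_sets), ()))


sample = io.StringIO(
    'geo,time,conversions\n'
    'A,2024-01-01,10\n'
    'A,2024-01-08,12\n'
    'B,2024-01-01,7\n'
    'B,2024-01-08,9\n'
)
print(check_geo_time_panel(sample))  # (2, 2)
```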
Xarray Dataset
To load the simulated Xarray Dataset (https://github.com/google/meridian/tree/main/meridian/data/simulated_data/pkl/geo_all_channels.pkl) using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    XrDataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
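Pickle files are binary: they are written with mode 'wb' and must be read with mode 'rb'; opening them in text mode ('r') fails. The minimal stdlib round trip below illustrates this, with a plain dict standing in for the Xarray Dataset and a temporary path standing in for /{PATH}/{FILENAME}.pkl. Only unpickle files you trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Stand-in object; in practice this would be the Xarray Dataset.
dataset = {'kpi': [[1, 2], [3, 4]], 'geo': ['A', 'B']}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')
    with open(path, 'wb') as fh:  # write in binary mode
        pickle.dump(dataset, fh)
    with open(path, 'rb') as fh:  # read back in binary mode
        restored = pickle.load(fh)

print(restored == dataset)  # True
```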
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.
from meridian.data import load

loader = load.XrDatasetDataLoader(
    XrDataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'organic_channel': 'organic_media_channel',
        'non_media_treatment': 'non_media_channel',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'non_media_treatment_value': 'non_media_treatments',
    },
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
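Conceptually, name_mapping renames names found in the input dataset to the names Meridian requires; names that already match need no entry. The plain-Python sketch below illustrates that renaming (it is not the loader's internal code): it applies the mapping from the example to hypothetical lists of input names and reports which required names are still missing.

```python
REQUIRED_COORDS = {'geo', 'time', 'control_variable', 'media_channel',
                   'organic_media_channel', 'non_media_channel'}
REQUIRED_DATA_VARS = {'kpi', 'revenue_per_kpi', 'controls', 'population',
                      'media', 'media_spend', 'organic_media',
                      'non_media_treatments'}

name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'organic_channel': 'organic_media_channel',
    'non_media_treatment': 'non_media_channel',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
    'non_media_treatment_value': 'non_media_treatments',
}


def missing_after_mapping(input_names, mapping, required):
    """Rename input names through the mapping, then list required names still absent."""
    renamed = {mapping.get(name, name) for name in input_names}
    return sorted(required - renamed)


# Hypothetical names found in the input dataset.
input_coords = ['geo', 'time', 'control', 'channel', 'organic_channel',
                'non_media_treatment']
input_data_vars = ['conversions', 'revenue_per_conversion', 'control_value',
                   'population', 'media', 'spend', 'organic_media',
                   'non_media_treatment_value']

print(missing_after_mapping(input_coords, name_mapping, REQUIRED_COORDS))  # []
print(missing_after_mapping(input_data_vars, name_mapping, REQUIRED_DATA_VARS))  # []
```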
Numpy ndarray
To load Numpy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate Numpy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
media_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
media_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
organic_media_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
non_media_treatments_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
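Each array's axes follow a fixed order. Assuming the usual Meridian layout (geo first, then time, then channel or variable), the arrays above describe 3 geos, 3 time periods, and 2 channels or variables per group; the last axis length of 2 matches the two names passed per group in the builder step. The sketch below checks these shapes with stand-in arrays of the same shape; the dimension names in the comments are an assumption for illustration.

```python
import numpy as np

n_geos, n_times, n_vars = 3, 3, 2

# Stand-ins with the same shapes as the example arrays.
kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
population_nd = np.array([1, 2, 3])
three_d = np.arange(1, 19).reshape(n_geos, n_times, n_vars)

arrays = {
    'kpi': kpi_nd,                    # assumed (geo, time)
    'revenue_per_kpi': kpi_nd,        # assumed (geo, time)
    'population': population_nd,      # assumed (geo,)
    'controls': three_d,              # assumed (geo, time, control)
    'media': three_d,                 # assumed (geo, media_time, channel)
    'media_spend': three_d,           # assumed (geo, time, channel)
    'organic_media': three_d,         # assumed (geo, media_time, organic channel)
    'non_media_treatments': three_d,  # assumed (geo, time, non-media channel)
}
expected = {name: (n_geos, n_times, n_vars) for name in arrays}
expected.update(kpi=(n_geos, n_times), revenue_per_kpi=(n_geos, n_times),
                population=(n_geos,))

mismatches = {name: arr.shape for name, arr in arrays.items()
              if arr.shape != expected[name]}
print(mismatches)  # {}
```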
Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to assign the channel or dimension names required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"],
    )
    .with_media(
        m_nd=media_nd,
        ms_nd=media_spend_nd,
        media_channels=["channel0", "channel1"],
    )
    .with_organic_media(
        organic_media_nd,
        organic_media_channels=["organic_channel0", "organic_channel1"],
    )
    .with_non_media_treatments(
        non_media_treatments_nd,
        non_media_channel_names=["non_media_channel0", "non_media_channel1"],
    )
)
data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
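The example passes time_coords out of chronological order, and each label applies to the array column at the same position. If you reorder the time labels yourself (for example, to inspect the data), the array values must be permuted the same way so every label stays attached to its column. The stdlib sketch below does that for the kpi values from the example and also computes the gap between consecutive periods, a quick way to confirm the periods are evenly spaced. The variable names are illustrative.

```python
from datetime import date

time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
kpi_rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # one row per geo, one column per time label

# Indices that put the time labels in chronological order.
order = sorted(range(len(time_coords)), key=lambda i: date.fromisoformat(time_coords[i]))
sorted_coords = [time_coords[i] for i in order]
# Permute each geo's values with the same order so labels and data stay aligned.
sorted_kpi = [[row[i] for i in order] for row in kpi_rows]

# Distinct day gaps between consecutive periods; a single value means even spacing.
dates = [date.fromisoformat(t) for t in sorted_coords]
gaps = {(later - earlier).days for earlier, later in zip(dates, dates[1:])}

print(sorted_coords)  # ['2024-01-01', '2024-01-02', '2024-01-03']
print(sorted_kpi)     # [[3, 1, 2], [6, 4, 5], [9, 7, 8]]
print(gaps)           # {1}
```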
Pandas DataFrame or other data formats
To load the simulated data in another format (such as excel) using DataFrameInputDataBuilder:
Read the data (such as an excel spreadsheet) into one or more pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
    engine='openpyxl',
)
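Each builder method in the next step takes a DataFrame argument, so the variables can come from one frame or from several. The sketch below illustrates the second case with a small hypothetical DataFrame (not the simulated file) laid out like the spreadsheet, one row per geo and time period. It splits the frame into a KPI frame and a media frame that each keep the geo and time columns. That the builder needs these key columns in every frame is an assumption here.

```python
import pandas as pd

# Hypothetical toy data in a long layout: one row per (geo, time).
df = pd.DataFrame({
    'geo': ['A', 'A', 'B', 'B'],
    'time': ['2024-01-01', '2024-01-08'] * 2,
    'population': [100, 100, 250, 250],
    'conversions': [10, 12, 7, 9],
    'Channel0_impression': [500, 520, 300, 310],
    'Channel0_spend': [50.0, 52.0, 30.0, 31.0],
})

# Split into two DataFrames that each keep the geo and time key columns.
kpi_df = df[['geo', 'time', 'population', 'conversions']]
media_df = df[['geo', 'time', 'Channel0_impression', 'Channel0_spend']]

# Joining on the keys recovers the original table, so no information is lost.
rejoined = kpi_df.merge(media_df, on=['geo', 'time'])
print(rejoined.to_dict('list') == df.to_dict('list'))  # True
print(df.duplicated(subset=['geo', 'time']).any())  # False
```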
Use a DataFrameInputDataBuilder to map column names to the variable types required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
)
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
builder = builder.with_media(
    df,
    media_cols=[f"{channel}_impression" for channel in channels],
    media_spend_cols=[f"{channel}_spend" for channel in channels],
    media_channels=channels,
)
builder = (
    builder
    .with_organic_media(
        df,
        organic_media_cols=["Organic_channel0_impression"],
        organic_media_channels=["Organic_channel0"],
    )
    .with_non_media_treatments(
        df,
        non_media_treatment_cols=['Promo'],
    )
)
data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
Next, you can create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025/08/16 (UTC).