Load geo-level data with reach and frequency
Simulated data is provided as an example for each data type and format in the following sections:
CSV
To load the simulated CSV data using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, map their media exposure and media spend to media and media_spend, respectively. For media channels that do have reach and frequency data, map their reach, frequency, and media spend to reach, frequency, and rf_spend, respectively. For the definition of each variable, see Collect and organize your data.
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    # Channels without reach and frequency data.
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
    ],
    # Channels with reach and frequency data.
    reach=['Channel4_reach', 'Channel5_reach'],
    frequency=['Channel4_frequency', 'Channel5_frequency'],
    rf_spend=['Channel4_spend', 'Channel5_spend'],
)
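Before building the loader, it can help to confirm that every column named in the mapping actually exists in the CSV header. The following is a minimal sketch that uses only the Python standard library. The helper missing_columns and the inline sample header are hypothetical and are not part of Meridian.

```python
import csv
import io


def missing_columns(csv_file, expected_columns):
    """Returns the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    present = set(header)
    return [name for name in expected_columns if name not in present]


# Hypothetical header for illustration; a real file would be opened with open().
sample_csv = io.StringIO(
    "geo,time,population,conversions,revenue_per_conversion,"
    "Channel4_reach,Channel4_frequency,Channel4_spend\n"
)

expected = [
    "geo",
    "time",
    "population",
    "conversions",
    "revenue_per_conversion",
    "Channel4_reach",
    "Channel4_frequency",
    "Channel4_spend",
    "Channel5_reach",  # Deliberately absent from the sample header.
]

print(missing_columns(sample_csv, expected))  # ['Channel5_reach']
```

An empty list means every mapped column was found in the header.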
Map the media exposure, reach, frequency, and media spend to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
}

correct_reach_to_channel = {
    'Channel4_reach': 'Channel4',
    'Channel5_reach': 'Channel5',
}
correct_frequency_to_channel = {
    'Channel4_frequency': 'Channel4',
    'Channel5_frequency': 'Channel5',
}
correct_rf_spend_to_channel = {
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
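Because the reach, frequency, and spend columns of one channel should all resolve to the same channel name, a quick consistency check can catch typos in these dictionaries. This is a sketch only; the helpers channels_by_role and inconsistent_roles are hypothetical names introduced here, and the wrong mapping is deliberate.

```python
def channels_by_role(**role_to_mapping):
    """Returns, for each role, the sorted channel names its mapping points to."""
    return {
        role: sorted(set(mapping.values()))
        for role, mapping in role_to_mapping.items()
    }


def inconsistent_roles(role_channels):
    """Returns the roles whose channel list differs from the first role's list."""
    roles = list(role_channels)
    reference = role_channels[roles[0]]
    return [role for role in roles[1:] if role_channels[role] != reference]


reach_to_channel = {"Channel4_reach": "Channel4", "Channel5_reach": "Channel5"}
frequency_to_channel = {
    "Channel4_frequency": "Channel4",
    "Channel5_frequency": "Channel5",
}
# Deliberate mistake: Channel5_spend points to a channel that has no reach data.
rf_spend_to_channel = {
    "Channel4_spend": "Channel4",
    "Channel5_spend": "Channel6",
}

rf_roles = channels_by_role(
    reach=reach_to_channel,
    frequency=frequency_to_channel,
    rf_spend=rf_spend_to_channel,
)
print(inconsistent_roles(rf_roles))  # ['rf_spend']
```

The same check can be applied to the media and media_spend dictionaries.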
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
    reach_to_channel=correct_reach_to_channel,
    frequency_to_channel=correct_frequency_to_channel,
    rf_spend_to_channel=correct_rf_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
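As a small illustration of how the placeholders combine, the sketch below fills in PATH and FILENAME with hypothetical values. Note that the f-string already supplies the leading slash and the .csv extension, so neither should be repeated in the values.

```python
# Hypothetical values for illustration only.
PATH = "content/data"      # Directory, without leading or trailing slashes.
FILENAME = "geo_media_rf"  # File name, without the .csv extension.

csv_path = f"/{PATH}/{FILENAME}.csv"
print(csv_path)  # /content/data/geo_media_rf.csv
```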
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    dataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
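Pickle data is a byte stream, so the file must be opened in binary mode for both writing and reading. The round trip below is a minimal sketch that uses a plain dictionary as a stand-in for the dataset and a temporary directory instead of a real data path.

```python
import os
import pickle
import tempfile

# Stand-in object for illustration; the real file would hold an Xarray Dataset.
original = {"geo": ["A", "B"], "kpi": [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example.pkl")

    # Write in binary mode ('wb').
    with open(pkl_path, "wb") as fh:
        pickle.dump(original, fh)

    # Read back in binary mode ('rb'); text mode ('r') would raise an error.
    with open(pkl_path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == original)  # True
```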
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.
loader = load.XrDatasetDataLoader(
    dataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'reach': 'reach',
        'frequency': 'frequency',
        'rf_spend': 'rf_spend',
    },
)

data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
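To see whether a given name_mapping covers all required names, the mapping can be applied to the dataset's names on paper before the loader is called. The sketch below does this with plain Python sets. The helper names_still_missing and the list of input names are hypothetical, and one mapping entry is left out on purpose.

```python
REQUIRED_NAMES = {
    "geo", "time", "control_variable", "media_channel", "rf_channel",
    "kpi", "revenue_per_kpi", "controls", "population",
    "media", "media_spend", "reach", "frequency", "rf_spend",
}


def names_still_missing(dataset_names, name_mapping):
    """Applies name_mapping and returns the required names that are not covered."""
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted(REQUIRED_NAMES - renamed)


# Hypothetical coordinate and variable names found in an input dataset.
dataset_names = [
    "geo", "time", "control", "channel", "rf_channel",
    "conversions", "revenue_per_conversion", "control_value", "population",
    "media", "spend", "reach", "frequency", "rf_spend",
]
name_mapping = {
    "channel": "media_channel",
    "control": "control_variable",
    "conversions": "kpi",
    "revenue_per_conversion": "revenue_per_kpi",
    "control_value": "controls",
    # 'spend' is deliberately left unmapped to show what the check reports.
}

print(names_still_missing(dataset_names, name_mapping))  # ['media_spend']
```

Names that already match a required name, such as reach or rf_spend, pass through unchanged.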
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
reach_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
frequency_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
rf_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
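The example arrays cover three geos, three time periods, two controls, and two reach and frequency channels. The sketch below checks array shapes against that layout, assuming the axis order is geo, then time, then the control or channel axis. That axis order is an assumption inferred from the example data, and shape_mismatches is a hypothetical helper rather than a Meridian function.

```python
import numpy as np


def shape_mismatches(arrays, n_geos, n_times, n_controls, n_rf_channels):
    """Returns the names of arrays whose shape differs from the assumed layout."""
    expected = {
        "kpi": (n_geos, n_times),
        "revenue_per_kpi": (n_geos, n_times),
        "population": (n_geos,),
        "controls": (n_geos, n_times, n_controls),
        "reach": (n_geos, n_times, n_rf_channels),
        "frequency": (n_geos, n_times, n_rf_channels),
        "rf_spend": (n_geos, n_times, n_rf_channels),
    }
    return [
        name for name, array in arrays.items()
        if np.shape(array) != expected[name]
    ]


example_arrays = {
    "kpi": np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
    "population": np.array([1, 2, 3]),
    # Deliberately wrong: the time axis is missing, so the shape is (3, 2).
    "reach": np.array([[1, 5], [2, 6], [3, 4]]),
}

print(shape_mismatches(example_arrays, 3, 3, 2, 2))  # ['reach']
```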
Use an NDArrayInputDataBuilder to set the time and geo coordinates, and to assign channel or dimension names as required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_reach(
        r_nd=reach_nd,
        f_nd=frequency_nd,
        rfs_nd=rf_spend_nd,
        rf_channels=["channel0", "channel1"]
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
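The time coordinates in the example are date strings in the YYYY-MM-DD form. Assuming that this is the format to follow, the sketch below flags strings that do not parse as such dates. The helper invalid_dates is hypothetical and uses only the standard library.

```python
from datetime import datetime


def invalid_dates(time_coords, date_format="%Y-%m-%d"):
    """Returns the entries of time_coords that do not parse with date_format."""
    bad = []
    for value in time_coords:
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            bad.append(value)
    return bad


time_coords = ["2024-01-02", "2024-01-03", "2024-01-01"]
print(invalid_dates(time_coords))  # []
print(invalid_dates(["2024-01-02", "01/03/2024"]))  # ['01/03/2024']
```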
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map the column names to the variable types required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
    .with_reach(
        df,
        reach_cols=['Channel4_reach', 'Channel5_reach'],
        frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
        rf_channels=['Channel4', 'Channel5'],
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
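Geo-level data in a DataFrame is typically laid out with one row per geo and time period. Assuming that layout and columns named geo and time (both assumptions for this illustration), the sketch below uses Pandas to find rows that repeat a geo and time pair before the DataFrame is handed to the builder. The sample values are made up.

```python
import pandas as pd

# Hypothetical long-format data: one row per geo and time period.
df = pd.DataFrame({
    "geo": ["A", "A", "B", "B", "B"],
    "time": [
        "2024-01-01", "2024-01-08",
        "2024-01-01", "2024-01-08", "2024-01-08",
    ],
    "conversions": [10, 12, 7, 9, 9],
})

# keep=False marks every row of a duplicated (geo, time) pair, not just the repeats.
duplicate_rows = df[df.duplicated(subset=["geo", "time"], keep=False)]
print(len(duplicate_rows))  # 2
```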
Next, you can create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated: 2025-08-04 (UTC)