Load national-level data
Simulated data is provided as an example for each data type and format in the following sections:
CSV
To load the simulated CSV data using CsvDataLoader, do the following:
Map the column names to the variable types. The required variable types are time, controls, kpi, revenue_per_kpi, media, and media_spend. For the definition of each variable, see Collect and organize your data.
# The `load` module provides CoordToColumns and the data loaders.
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
        'Channel4_impression',
        'Channel5_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
        'Channel4_spend',
        'Channel5_spend',
    ],
)
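Before building the mapping, it can help to confirm that every column it names actually appears in the CSV header. The following is a minimal sketch using only the Python standard library; the helper `missing_columns` and the shortened sample header are hypothetical and not part of Meridian.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return [name for name in required if name not in present]


# Hypothetical header, shortened for illustration.
sample_csv = "time,GQV,conversions,Channel0_impression,Channel0_spend\n"
required = ["time", "GQV", "Discount", "conversions", "Channel0_impression"]

print(missing_columns(sample_csv, required))  # ['Discount']
```

With a real file, the header could be read from the file itself instead of an inline string.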
Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
    'Channel4_impression': 'Channel4',
    'Channel5_impression': 'Channel5',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
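When the channel columns follow a consistent naming pattern, as in this example, the two dictionaries could also be generated rather than typed out. This is a minimal sketch that assumes the `<channel>_impression` and `<channel>_spend` suffixes shown above; the variable names are illustrative only.

```python
channels = [f"Channel{i}" for i in range(6)]

media_to_channel = {f"{ch}_impression": ch for ch in channels}
media_spend_to_channel = {f"{ch}_spend": ch for ch in channels}

# Both mappings should point at the same set of channel names.
assert set(media_to_channel.values()) == set(media_spend_to_channel.values())
print(media_to_channel["Channel0_impression"])  # Channel0
```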
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
Xarray Dataset
To load the simulated Xarray Dataset using XrDatasetDataLoader, do the following:
Load the data using pickle:
import pickle

# Pickle files hold bytes, so open the file in binary mode ('rb').
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    XrDataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
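Pickle data is a byte stream, so the file is written and read in binary mode. The round trip below is a small standard-library sketch of that; the dictionary is only a stand-in for an Xarray Dataset, and the temporary file name is arbitrary.

```python
import os
import pickle
import tempfile

# Stand-in object; the real file would hold an xarray.Dataset.
payload = {"kpi": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "example.pkl")
    with open(pkl_path, "wb") as fh:  # write bytes
        pickle.dump(payload, fh)
    with open(pkl_path, "rb") as fh:  # read bytes
        restored = pickle.load(fh)

print(restored == payload)  # True
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.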
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. A mapping is needed only when the names in the input dataset differ from the required names. The required coordinate names are time, control_variable, and media_channel. The required data variable names are kpi, revenue_per_kpi, controls, media, and media_spend.
from meridian.data import load

loader = load.XrDatasetDataLoader(
    XrDataset,
    kpi_type='non_revenue',
    # Keys are the names in the input dataset; values are the required names.
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
    },
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
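To see whether a given name_mapping covers every required name, the mapping can be applied to the dataset's names and compared against the required set. The sketch below uses plain Python sets; `names_after_mapping` is a hypothetical helper, and `dataset_names` is an assumed list of names for a dataset like the one mapped above (with `time` and `media` already matching).

```python
REQUIRED_NAMES = {
    "time", "control_variable", "media_channel",  # coordinates
    "kpi", "revenue_per_kpi", "controls", "media", "media_spend",  # data variables
}


def names_after_mapping(dataset_names, name_mapping):
    """Apply name_mapping to the dataset's names, leaving unmapped names as-is."""
    return {name_mapping.get(name, name) for name in dataset_names}


# Assumed names, for illustration only.
dataset_names = [
    "time", "channel", "control", "conversions",
    "revenue_per_conversion", "control_value", "media", "spend",
]
name_mapping = {
    "channel": "media_channel",
    "control": "control_variable",
    "conversions": "kpi",
    "revenue_per_conversion": "revenue_per_kpi",
    "control_value": "controls",
    "spend": "media_spend",
}

still_missing = REQUIRED_NAMES - names_after_mapping(dataset_names, name_mapping)
print(sorted(still_missing))  # []
```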
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np
kpi_nd = np.array([[1, 2, 3]])
controls_nd = np.array([[[1, 2], [3, 4], [5, 6]]])
revenue_per_kpi_nd = np.array([[1, 2, 3]])
media_nd = np.array([[[1, 2], [3, 4], [5, 6]]])
media_spend_nd = np.array([[[1, 2], [3, 4], [5, 6]]])
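A quick shape check can catch axis mix-ups before the arrays reach the builder. This sketch assumes the axes are ordered (geo, time) for kpi and (geo, time, variable) for the three-dimensional arrays, with a single geo for national data; that reading is inferred from the example arrays rather than stated on this page.

```python
import numpy as np

kpi_nd = np.array([[1, 2, 3]])
controls_nd = np.array([[[1, 2], [3, 4], [5, 6]]])
media_nd = np.array([[[1, 2], [3, 4], [5, 6]]])
media_spend_nd = np.array([[[1, 2], [3, 4], [5, 6]]])

# Assumed layout: first axis is geo, second axis is time.
n_geos, n_times = kpi_nd.shape
assert controls_nd.shape[:2] == (n_geos, n_times)
assert media_nd.shape == media_spend_nd.shape
print(kpi_nd.shape, controls_nd.shape, media_nd.shape)  # (1, 3) (1, 3, 2) (1, 3, 2)
```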
Use an NDArrayInputDataBuilder to set the time coordinates and to assign channel or dimension names as required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_media(
        m_nd=media_nd,
        ms_nd=media_spend_nd,
        media_channels=["channel0", "channel1"]
    )
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
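For longer series, the date strings could be generated instead of typed by hand. This is a minimal standard-library sketch that assumes weekly spacing and the YYYY-MM-DD string format used in the example above; `make_time_coords` is a hypothetical helper, not a Meridian function.

```python
from datetime import date, timedelta


def make_time_coords(start, n_periods, step_days=7):
    """Return n_periods ISO-formatted date strings spaced step_days apart."""
    return [
        (start + timedelta(days=step_days * i)).isoformat()
        for i in range(n_periods)
    ]


coords = make_time_coords(date(2024, 1, 1), 3)
print(coords)  # ['2024-01-01', '2024-01-08', '2024-01-15']
```

The resulting list has one entry per time period, so its length should match the time axis of the arrays.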
Pandas DataFrames or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder, do the following:
Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/national_media.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map the column names to the variable types required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
)
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4", "Channel5"]
builder = builder.with_media(
    df,
    media_cols=[f"{channel}_impression" for channel in channels],
    media_spend_cols=[f"{channel}_spend" for channel in channels],
    media_channels=channels,
)

data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
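The channel list above is written by hand. If the columns follow the same `<channel>_impression` and `<channel>_spend` pattern, the list could instead be derived from the column names. The sketch below is one way to do that in plain Python; `infer_channels` is a hypothetical helper, and the column list is a shortened stand-in for `list(df.columns)`.

```python
def infer_channels(columns, media_suffix="_impression", spend_suffix="_spend"):
    """Return channels that have both a media column and a spend column."""
    media = {c[: -len(media_suffix)] for c in columns if c.endswith(media_suffix)}
    spend = {c[: -len(spend_suffix)] for c in columns if c.endswith(spend_suffix)}
    return sorted(media & spend)


# Hypothetical column list; with a real DataFrame this would be list(df.columns).
columns = [
    "time", "GQV", "conversions",
    "Channel0_impression", "Channel0_spend",
    "Channel1_impression", "Channel1_spend",
    "Channel2_impression",  # no matching spend column, so it is skipped
]
print(infer_channels(columns))  # ['Channel0', 'Channel1']
```

Requiring both columns keeps the media and spend lists aligned, since each channel name is used to build one column of each kind.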
Next, you can create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated: 2025-08-04 (UTC)