Load geo-level data with reach and frequency
Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, assign their media exposure and media spend to the media and media_spend categories, respectively. For media channels that do have reach and frequency data, map their reach, frequency, and media spend to the reach, frequency, and rf_spend categories, respectively. For the definition of each variable, see Collect and organize your data.
from meridian.data import load

# Map each Meridian variable type to the matching CSV column name(s).
coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    # Channels without reach and frequency data.
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
    ],
    # Channels with reach and frequency data.
    reach=['Channel4_reach', 'Channel5_reach'],
    frequency=['Channel4_frequency', 'Channel5_frequency'],
    rf_spend=['Channel4_spend', 'Channel5_spend'],
)
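Before handing a mapping like this to the loader, it can help to confirm that every mapped column actually exists in the CSV header. The following is a minimal sketch using only the standard library; the helper `missing_columns`, the plain dictionary `column_mapping`, and the in-memory sample CSV are hypothetical stand-ins for illustration and are not part of Meridian.

```python
import csv
import io

# Hypothetical plain-dict mirror of part of the column mapping shown above.
column_mapping = {
    'time': ['time'],
    'geo': ['geo'],
    'population': ['population'],
    'kpi': ['conversions'],
    'reach': ['Channel4_reach', 'Channel5_reach'],
    'frequency': ['Channel4_frequency', 'Channel5_frequency'],
    'rf_spend': ['Channel4_spend', 'Channel5_spend'],
}


def missing_columns(header, mapping):
    """Returns the mapped column names that are absent from the header, sorted."""
    present = set(header)
    wanted = {name for names in mapping.values() for name in names}
    return sorted(wanted - present)


# A tiny in-memory CSV standing in for the real file.
# Channel5_spend is left out on purpose to show what the check reports.
sample_csv = io.StringIO(
    'time,geo,population,conversions,'
    'Channel4_reach,Channel5_reach,'
    'Channel4_frequency,Channel5_frequency,Channel4_spend\n'
    '2024-01-01,A,1000,12,50,60,1.5,2.0,100\n'
)
header = next(csv.reader(sample_csv))
print(missing_columns(header, column_mapping))
```

An empty list means every mapped column was found; any names printed are columns to fix in either the file or the mapping.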
Map the media exposure, reach, frequency, and media spend to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
}
correct_reach_to_channel = {
    'Channel4_reach': 'Channel4',
    'Channel5_reach': 'Channel5',
}
correct_frequency_to_channel = {
    'Channel4_frequency': 'Channel4',
    'Channel5_frequency': 'Channel5',
}
correct_rf_spend_to_channel = {
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
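Because an exposure column and its spend column are meant to resolve to the same channel name, a quick consistency check on the dictionaries can catch typos early. The sketch below is illustrative only; `channel_mismatches` is a hypothetical helper, not a Meridian function.

```python
def channel_mismatches(exposure_to_channel, spend_to_channel):
    """Returns channel names that appear in only one of the two mappings, sorted."""
    exposure_channels = set(exposure_to_channel.values())
    spend_channels = set(spend_to_channel.values())
    # Symmetric difference: channels present on one side but not the other.
    return sorted(exposure_channels ^ spend_channels)


media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
}
media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
}
# Deliberately incomplete spend mapping to show what a mismatch looks like.
incomplete_spend_to_channel = {
    'Channel0_spend': 'Channel0',
}

print(channel_mismatches(media_to_channel, media_spend_to_channel))
print(channel_mismatches(media_to_channel, incomplete_spend_to_channel))
```

The same check applies to the reach, frequency, and rf_spend mappings, compared pairwise.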
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
    reach_to_channel=correct_reach_to_channel,
    frequency_to_channel=correct_frequency_to_channel,
    rf_spend_to_channel=correct_rf_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    dataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
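Pickle is a binary format, so the file has to be opened in binary mode for both writing and reading. The following round trip is a small illustration with the standard library; the dictionary `payload` is a stand-in object, not a real Xarray Dataset, and the file name is hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in object for illustration; the real file holds an Xarray Dataset.
payload = {'geo': ['A', 'B'], 'kpi': [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')

    # Write and read in binary mode ('wb' / 'rb').
    with open(path, 'wb') as fh:
        pickle.dump(payload, fh)
    with open(path, 'rb') as fh:
        restored = pickle.load(fh)

    # Opening the same file in text mode ('r') fails when unpickling.
    try:
        with open(path, 'r') as fh:
            pickle.load(fh)
        text_mode_error = None
    except (TypeError, UnicodeDecodeError) as err:
        text_mode_error = type(err).__name__

print(restored == payload)
print(text_mode_error)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.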
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.
loader = load.XrDatasetDataLoader(
    dataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'reach': 'reach',
        'frequency': 'frequency',
        'rf_spend': 'rf_spend',
    },
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
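To see which entries a name mapping needs, one option is to apply the mapping to the names found in the dataset and compare the result with the required names. The sketch below does this for coordinate names with plain Python sets; `unresolved_names` and the sample `dataset_coords` list are hypothetical and only illustrate the idea, they are not how the loader is implemented.

```python
# Required coordinate names, as listed in the text above.
REQUIRED_COORDS = {'geo', 'time', 'control_variable', 'media_channel', 'rf_channel'}


def unresolved_names(dataset_names, name_mapping, required):
    """Returns required names still missing after renaming, sorted."""
    # Names without a mapping entry are kept unchanged.
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted(required - renamed)


# Hypothetical coordinate names found in an input dataset.
dataset_coords = ['geo', 'time', 'control', 'channel', 'rf_channel']
mapping = {'channel': 'media_channel', 'control': 'control_variable'}

print(unresolved_names(dataset_coords, mapping, REQUIRED_COORDS))
print(unresolved_names(dataset_coords, {}, REQUIRED_COORDS))
```

With the mapping applied, nothing is left unresolved; without it, the two renamed coordinates are reported as missing.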
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
reach_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
frequency_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
rf_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
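The example arrays appear to follow a (geo, time) layout for two-dimensional data and a (geo, time, variable) layout for three-dimensional data, with one population value per geo. That axis order is an assumption read off the example (three geos, three time periods, two controls), not something this page states. Under that assumption, a shape check could look like the sketch below. It uses plain nested lists and a hypothetical `shape_of` helper to stay dependency-free; with NumPy arrays the same comparison would read the `.shape` attribute instead.

```python
def shape_of(nested):
    """Returns the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


# Dimensions taken from the example: 3 geos, 3 time periods, 2 controls.
n_geos, n_times, n_controls = 3, 3, 2

kpi = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
controls = [
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
]
population = [1, 2, 3]

# Assumed axis order: (geo, time) and (geo, time, variable).
expected = {
    'kpi': (n_geos, n_times),
    'controls': (n_geos, n_times, n_controls),
    'population': (n_geos,),
}
actual = {
    'kpi': shape_of(kpi),
    'controls': shape_of(controls),
    'population': shape_of(population),
}
mismatched = sorted(name for name in expected if actual[name] != expected[name])
print(mismatched)
```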
Use an NDArrayInputDataBuilder to set the time and geo coordinates, and to assign channel or dimension names as required for Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_reach(
        r_nd=reach_nd,
        f_nd=frequency_nd,
        rfs_nd=rf_spend_nd,
        rf_channels=["channel0", "channel1"]
    )
)
data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map the column names to the variable types required for Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
    .with_reach(
        df,
        reach_cols=['Channel4_reach', 'Channel5_reach'],
        frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
        rf_channels=['Channel4', 'Channel5'],
    )
)
data = builder.build()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
Next, create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-04 UTC.