Load geo-level data with reach and frequency
Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, you must assign their media exposure and media spend to the categories media and media_spend, respectively. For media channels that do have reach and frequency data, you must map their reach, frequency, and media spend to the categories reach, frequency, and rf_spend, respectively. For the definition of each variable, see Collect and organize your data.
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
    ],
    reach=['Channel4_reach', 'Channel5_reach'],
    frequency=['Channel4_frequency', 'Channel5_frequency'],
    rf_spend=['Channel4_spend', 'Channel5_spend'],
)
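Before calling the loader, it can help to confirm that every column named in coord_to_columns actually exists in the CSV header. The following is a minimal, stdlib-only sketch: the helpers flatten_columns and missing_columns and the sample header are hypothetical, not part of the Meridian API, and the mapping is assumed to be written as a plain dict that mirrors the CoordToColumns arguments.

```python
import csv
import io


def flatten_columns(mapping):
    """Collect every column name from a {field: str | list[str]} mapping."""
    columns = []
    for value in mapping.values():
        if isinstance(value, str):
            columns.append(value)
        else:
            columns.extend(value)
    return columns


def missing_columns(header, mapping):
    """Return the mapped column names that are absent from the CSV header."""
    present = set(header)
    return [col for col in flatten_columns(mapping) if col not in present]


# Hypothetical CSV header; in practice, read the first row of your own file.
sample_csv = io.StringIO(
    "time,geo,GQV,Discount,Competitor_Sales,population,conversions,"
    "revenue_per_conversion,Channel4_reach,Channel4_frequency,Channel4_spend\n"
)
header = next(csv.reader(sample_csv))

# Plain-dict mirror of a subset of the CoordToColumns arguments above.
mapping = {
    'time': 'time',
    'geo': 'geo',
    'controls': ['GQV', 'Discount', 'Competitor_Sales'],
    'population': 'population',
    'kpi': 'conversions',
    'revenue_per_kpi': 'revenue_per_conversion',
    'reach': ['Channel4_reach', 'Channel5_reach'],
    'frequency': ['Channel4_frequency', 'Channel5_frequency'],
    'rf_spend': ['Channel4_spend', 'Channel5_spend'],
}

print(missing_columns(header, mapping))
# ['Channel5_reach', 'Channel5_frequency', 'Channel5_spend']
```

An empty list means every mapped column is present in the file.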
Map the media exposure, reach, frequency, and media spend to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
}

correct_reach_to_channel = {
    'Channel4_reach': 'Channel4',
    'Channel5_reach': 'Channel5',
}
correct_frequency_to_channel = {
    'Channel4_frequency': 'Channel4',
    'Channel5_frequency': 'Channel5',
}
correct_rf_spend_to_channel = {
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
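The channel dictionaries above follow a simple `<channel>_<suffix>` naming pattern, so they can also be generated instead of typed by hand, with a check that reach, frequency, and rf_spend all cover the same channels. This sketch assumes your columns follow that naming convention; channel_map is a hypothetical helper, not a Meridian function.

```python
def channel_map(columns, suffix):
    """Map each column to a channel name by stripping a known suffix."""
    mapping = {}
    for col in columns:
        if not col.endswith(suffix):
            raise ValueError(f'{col!r} does not end with {suffix!r}')
        mapping[col] = col[: -len(suffix)]
    return mapping


reach_to_channel = channel_map(['Channel4_reach', 'Channel5_reach'], '_reach')
frequency_to_channel = channel_map(
    ['Channel4_frequency', 'Channel5_frequency'], '_frequency'
)
rf_spend_to_channel = channel_map(['Channel4_spend', 'Channel5_spend'], '_spend')

# Every reach channel should also have matching frequency and rf_spend columns.
reach_channels = set(reach_to_channel.values())
assert reach_channels == set(frequency_to_channel.values())
assert reach_channels == set(rf_spend_to_channel.values())

print(reach_to_channel)  # {'Channel4_reach': 'Channel4', 'Channel5_reach': 'Channel5'}
```

Raising on an unexpected suffix makes a mistyped column name fail early rather than silently producing a wrong channel name.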
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
    reach_to_channel=correct_reach_to_channel,
    frequency_to_channel=correct_frequency_to_channel,
    rf_spend_to_channel=correct_rf_spend_to_channel,
)
data = loader.load()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
- PATH is the path to the data file location.
- FILENAME is the name of your data file.
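CsvDataLoader reads a long-format table in which each row holds one geo and one time period. Assuming your data is meant to form a complete geo × time panel, a quick pre-check is that no (geo, time) pair is duplicated and every geo has a row for every time period. This stdlib-only sketch uses hypothetical sample rows and a hypothetical helper, check_geo_time_grid; the column names geo and time match the coord_to_columns mapping above.

```python
import csv
import io
from collections import defaultdict


def check_geo_time_grid(rows, geo_col='geo', time_col='time'):
    """Return (duplicate (geo, time) pairs, {geo: missing time periods})."""
    seen = set()
    duplicates = []
    times_by_geo = defaultdict(set)
    for row in rows:
        key = (row[geo_col], row[time_col])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
        times_by_geo[row[geo_col]].add(row[time_col])
    all_times = set().union(*times_by_geo.values())
    missing = {
        geo: sorted(all_times - times)
        for geo, times in times_by_geo.items()
        if all_times - times
    }
    return duplicates, missing


# Hypothetical rows; Geo1 has no row for the second week.
sample = io.StringIO(
    "geo,time,conversions\n"
    "Geo0,2021-01-25,10\n"
    "Geo0,2021-02-01,12\n"
    "Geo1,2021-01-25,7\n"
)
duplicates, missing = check_geo_time_grid(csv.DictReader(sample))
print(duplicates)  # []
print(missing)  # {'Geo1': ['2021-02-01']}
```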
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so the file must be opened in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    dataset = pickle.load(fh)
Where:
- PATH is the path to the data file location.
- FILENAME is the name of your data file.
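Pickle files are binary, so they must be written with mode 'wb' and read with mode 'rb'; opening them in text mode ('r') makes pickle.load fail with an error. The round trip below illustrates this, using a small dict as a stand-in for the Xarray Dataset (an assumption made so the sketch runs without xarray installed); the temporary file path is illustrative only.

```python
import os
import pickle
import tempfile

# Stand-in for the Xarray Dataset; any picklable object behaves the same way.
original = {'geo': ['Geo0', 'Geo1'], 'time': ['2021-01-25', '2021-02-01']}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')
    # pickle.dump writes bytes, so open the file in binary write mode.
    with open(path, 'wb') as fh:
        pickle.dump(original, fh)
    # Read it back in binary mode; text mode would fail.
    with open(path, 'rb') as fh:
        restored = pickle.load(fh)

print(restored == original)  # True
```

Only unpickle files from sources you trust: the Python documentation warns that loading a pickle can execute arbitrary code.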
Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.
from meridian.data import load

loader = load.XrDatasetDataLoader(
    dataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'reach': 'reach',
        'frequency': 'frequency',
        'rf_spend': 'rf_spend',
    },
)
data = loader.load()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
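To see which names still need an entry in name_mapping, you can rename the dataset's coordinate and variable names and compare them with the required names listed above. This is a stdlib-only sketch: the input name lists are hypothetical examples (with a real dataset you would take them from `dataset.coords` and `dataset.data_vars`), and apply_mapping is not a Meridian function.

```python
REQUIRED_COORDS = {'geo', 'time', 'control_variable', 'media_channel', 'rf_channel'}
REQUIRED_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population', 'media',
    'media_spend', 'reach', 'frequency', 'rf_spend',
}


def apply_mapping(names, name_mapping):
    """Rename names using name_mapping; unmapped names are kept as-is."""
    return {name_mapping.get(name, name) for name in names}


name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
}

# Hypothetical names as they might appear in an input dataset.
coord_names = ['geo', 'time', 'channel', 'control', 'rf_channel']
var_names = [
    'conversions', 'revenue_per_conversion', 'control_value', 'population',
    'media', 'spend', 'reach', 'frequency', 'rf_spend',
]

missing_coords = REQUIRED_COORDS - apply_mapping(coord_names, name_mapping)
missing_vars = REQUIRED_VARS - apply_mapping(var_names, name_mapping)
print(missing_coords, missing_vars)  # set() set()

# Dropping the 'channel' entry exposes the unmapped coordinate.
without_channel = {k: v for k, v in name_mapping.items() if k != 'channel'}
print(REQUIRED_COORDS - apply_mapping(coord_names, without_channel))  # {'media_channel'}
```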
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
reach_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
frequency_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
rf_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
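Each array's shape has to line up with the geo, time, and channel dimensions that the builder is given. The sketch below checks the example arrays against the dimension order that, as we understand it, Meridian input data uses: (geo, time) for kpi and revenue_per_kpi, (geo,) for population, (geo, time, control) for controls, and (geo, media_time, rf_channel) for reach, frequency, and rf_spend. Verify this order against the NDArrayInputDataBuilder docstrings. To run without NumPy, it works on the nested lists before they are passed to np.array (with NumPy you would compare `array.shape` directly); shape_of is a hypothetical helper.

```python
def shape_of(nested):
    """Return the shape of a rectangular nested list, like ndarray.shape."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


n_geos, n_times, n_controls, n_rf_channels = 3, 3, 2, 2

kpi = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
population = [1, 2, 3]
controls = [
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
]
reach = controls  # same values as reach_nd in the example above

# Here the media time dimension has the same length as the time dimension.
expected = {
    'kpi': (n_geos, n_times),
    'population': (n_geos,),
    'controls': (n_geos, n_times, n_controls),
    'reach': (n_geos, n_times, n_rf_channels),
}
actual = {
    'kpi': shape_of(kpi),
    'population': shape_of(population),
    'controls': shape_of(controls),
    'reach': shape_of(reach),
}
for name, shape in expected.items():
    assert actual[name] == shape, (name, actual[name], shape)
print(actual)
```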
Use an NDArrayInputDataBuilder to set the time and geo coordinates and to specify the channel or dimension names required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import nd_array_input_data_builder as data_builder

builder = data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=['control0', 'control1'],
    )
    .with_reach(
        r_nd=reach_nd,
        f_nd=frequency_nd,
        rfs_nd=rf_spend_nd,
        rf_channels=['channel0', 'channel1'],
    )
)
data = builder.build()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
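The builder takes time coordinates as date strings. A lightweight pre-check, sketched here with `datetime.date.fromisoformat`, is that every string parses as a date, that there are no duplicates, and that the number of coordinates matches the time dimension of the arrays (3 in this example). check_time_coords is a hypothetical helper; whether the builder also requires a particular ordering or spacing of dates is not covered here, so consult the builder's documentation.

```python
from datetime import date


def check_time_coords(coords, expected_length):
    """Parse ISO date strings, check uniqueness and count, return them sorted."""
    parsed = [date.fromisoformat(c) for c in coords]  # ValueError if malformed
    if len(set(parsed)) != len(parsed):
        raise ValueError('duplicate time coordinates')
    if len(parsed) != expected_length:
        raise ValueError(
            f'expected {expected_length} time coordinates, got {len(parsed)}'
        )
    return sorted(parsed)


time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
print(check_time_coords(time_coords, expected_length=3))
# [datetime.date(2024, 1, 1), datetime.date(2024, 1, 2), datetime.date(2024, 1, 3)]
```

The sorted result also makes it easy to see that the coordinates in this example are not listed in chronological order.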
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more pandas DataFrames.
import pandas as pd

# Reading .xlsx files with engine='openpyxl' requires the openpyxl package.
df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map column names to the variable types required in Meridian input data.
For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column='conversions',
    default_revenue_per_kpi_column='revenue_per_conversion',
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=['GQV', 'Discount', 'Competitor_Sales'])
    .with_reach(
        df,
        reach_cols=['Channel4_reach', 'Channel5_reach'],
        frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
        rf_channels=['Channel4', 'Channel5'],
    )
)
data = builder.build()
Where:
- kpi_type is either 'revenue' or 'non_revenue'.
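In the simulated file, the reach, frequency, and spend columns follow a `<channel>_<metric>` naming pattern, so the column lists passed to with_reach can be derived from the channel names rather than typed out. This sketch assumes your own columns follow the same pattern; rf_columns is a hypothetical helper whose returned keys match the keyword arguments of the with_reach call above.

```python
def rf_columns(channels):
    """Build the with_reach column lists from channel names (assumed naming)."""
    return {
        'reach_cols': [f'{ch}_reach' for ch in channels],
        'frequency_cols': [f'{ch}_frequency' for ch in channels],
        'rf_spend_cols': [f'{ch}_spend' for ch in channels],
        'rf_channels': list(channels),
    }


kwargs = rf_columns(['Channel4', 'Channel5'])
print(kwargs['reach_cols'])  # ['Channel4_reach', 'Channel5_reach']
# With a real DataFrame df, these could be passed as builder.with_reach(df, **kwargs).
```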
Next, you can create your model.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-04 (UTC).