Treten Sie der neuen
Discord-Community bei. Sie können sich dort in Echtzeit austauschen, Unterstützung von anderen Nutzern erhalten und direkt mit dem Meridian-Team interagieren.
Daten auf geografischer Ebene mit organischen Media- und nicht mediabezogenen Testvariablen laden
Mit Sammlungen den Überblick behalten
Sie können Inhalte basierend auf Ihren Einstellungen speichern und kategorisieren.
In den folgenden Abschnitten werden für jeden Datentyp und jedes Format simulierte Daten als Beispiel zur Verfügung gestellt.
CSV
So laden Sie die simulierten CSV-Daten mit CsvDataLoader
:
Ordnen Sie die Spaltennamen den Variablentypen zu. Die erforderlichen Variablentypen sind time
, geo
, controls
, population
, kpi
, revenue_per_kpi
, media
und media_spend
. Für Media-Channels ohne direkte Kosten müssen Sie die Media-Präsenz organic_media
zuweisen. Für nicht mediabezogene Testvariablen müssen Sie die entsprechenden Spaltennamen non_media_treatments
zuweisen.
Die Definition der einzelnen Variablen finden Sie unter Daten erheben und organisieren.
coord_to_columns = load.CoordToColumns(
time='time',
geo='geo',
controls=['GQV', 'Competitor_Sales'],
population='population',
kpi='conversions',
revenue_per_kpi='revenue_per_conversion',
media=[
'Channel0_impression',
'Channel1_impression',
'Channel2_impression',
'Channel3_impression',
'Channel4_impression',
],
media_spend=[
'Channel0_spend',
'Channel1_spend',
'Channel2_spend',
'Channel3_spend',
'Channel4_spend',
],
organic_media=['Organic_channel0_impression'],
non_media_treatments=['Promo'],
)
Ordnen Sie die Media-Variablen und ‑Ausgaben den angegebenen Channel-Namen zu, die in der zweiseitigen Ausgabe angezeigt werden sollen. Im folgenden Beispiel sind Channel0_impression
und Channel0_spend
mit demselben Channel (Channel0
) verbunden.
correct_media_to_channel = {
'Channel0_impression': 'Channel0',
'Channel1_impression': 'Channel1',
'Channel2_impression': 'Channel2',
'Channel3_impression': 'Channel3',
'Channel4_impression': 'Channel4',
}
correct_media_spend_to_channel = {
'Channel0_spend': 'Channel0',
'Channel1_spend': 'Channel1',
'Channel2_spend': 'Channel2',
'Channel3_spend': 'Channel3',
'Channel4_spend': 'Channel4',
}
Laden Sie die Daten mit CsvDataLoader
:
loader = load.CsvDataLoader(
csv_path=f'/{PATH}/{FILENAME}.csv',
kpi_type='non_revenue',
coord_to_columns=coord_to_columns,
media_to_channel=correct_media_to_channel,
media_spend_to_channel=correct_media_spend_to_channel,
)
data = loader.load()
Dabei gilt:
kpi_type
ist entweder 'revenue'
oder 'non_revenue'
.
PATH
ist der Pfad zum Speicherort der Datendatei.
FILENAME
ist der Name Ihrer Datendatei.
Xarray-Dataset
So laden Sie das simulierte Xarray-Dataset mit XrDatasetDataLoader
:
Laden Sie die Daten mit pickle
:
import pickle
with open(f'/{PATH}/{FILENAME}.pkl', 'r') as fh:
XrDataset=pickle.load(fh)
Dabei gilt:
PATH
ist der Pfad zum Speicherort der Datendatei.
FILENAME
ist der Name Ihrer Datendatei.
Übergeben Sie das Dataset an XrDatasetDataLoader
. Verwenden Sie das Argument name_mapping
, um die Koordinaten und Arrays zuzuordnen. Geben Sie eine Zuordnung an, wenn sich die Namen im Dataset für die Eingabe von den erforderlichen Namen unterscheiden. Die Namen der erforderlichen Koordinaten sind geo
, time
, control_variable
, media_channel
, organic_media_channel
und non_media_channel
. Die Namen der erforderlichen Datenvariablen sind kpi
, revenue_per_kpi
, controls
, population
, media
, media_spend
, organic_media
und non_media_treatments
.
loader = load.XrDatasetDataLoader(
XrDataset,
kpi_type='non_revenue',
name_mapping={'channel': 'media_channel',
'control': 'control_variable',
'organic_channel': 'organic_media_channel',
'non_media_treatment': 'non_media_channel',
'conversions': 'kpi',
'revenue_per_conversion': 'revenue_per_kpi',
'control_value': 'controls',
'spend': 'media_spend',
'non_media_treatment_value': 'non_media_treatments'},
)
data = loader.load()
Dabei gilt:
kpi_type
ist entweder 'revenue'
oder 'non_revenue'
.
N-dimensionales Array von NumPy
Wenn Sie n-dimensionale Arrays von NumPy direkt laden möchten, verwenden Sie NDArrayInputDataBuilder
:
Erstellen Sie die Daten in separaten n-dimensionalen Arrays von NumPy.
import numpy as np
kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
media_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
media_spend_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
organic_media_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
non_media_treatments_nd = np.array([
[[1, 5], [2, 6], [3, 4]],
[[7, 8], [9, 10], [11, 12]],
[[13, 14], [15, 16], [17, 18]],
])
Verwenden Sie einen NDArrayInputDataBuilder
, um Zeit und geografische Einheiten festzulegen und Channel- oder Dimensionsnamen anzugeben, wie in Meridian-Eingabedaten erforderlich.
Die Definition der einzelnen Variablen finden Sie unter Daten erheben und organisieren.
from meridian.data import nd_array_input_data_builder as data_builder
builder = (
data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
builder
.with_kpi(kpi_nd)
.with_revenue_per_kpi(revenue_per_kpi_nd)
.with_population(population_nd)
.with_controls(
controls_nd,
control_names=["control0", "control1"])
.with_media(
m_nd=media_nd,
ms_nd=media_spend_nd,
media_channels=["channel0", "channel1"]
)
.with_organic_media(
organic_media_nd,
organic_media_channels=["organic_channel0", "organic_channel1"]
).with_non_media_treatments(
non_media_treatments_nd,
non_media_channel_names=["non_media_channel0", "non_media_channel1"]
)
)
data = builder.build()
Dabei gilt:
kpi_type
ist entweder 'revenue'
oder 'non_revenue'
.
So laden Sie ein anderes Format für simulierte Daten (z. B. excel
) mit DataFrameInputDataBuilder
:
Lesen Sie die Daten (z. B. eine excel
-Tabelle) in ein oder mehrere Pandas-DataFrame
s ein.
import pandas as pd
df = pd.read_excel(
'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
engine='openpyxl',
)
Verwenden Sie einen DataFrameInputDataBuilder
, um Spaltennamen den Variablentypen zuzuordnen, die in Meridian-Eingabedaten erforderlich sind.
Die Definition der einzelnen Variablen finden Sie unter Daten erheben und organisieren.
from meridian.data import data_frame_input_data_builder as data_builder
builder = data_builder.DataFrameInputDataBuilder(
kpi_type='non_revenue',
default_kpi_column="conversions",
default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
builder
.with_kpi(df)
.with_revenue_per_kpi(df)
.with_population(df)
.with_controls(df, control_cols=["GQV", "Competitor_Sales"])
)
channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
builder = builder.with_media(
df,
media_cols=[f"{channel}_impression" for channel in channels],
media_spend_cols=[f"{channel}_spend" for channel in channels],
media_channels=channels,
)
builder = (
builder
.with_organic_media(
df,
organic_media_cols = ["Organic_channel0_impression"],
organic_media_channels = ["Organic_channel0"],
)
.with_non_media_treatments(
df,
non_media_treatment_cols=['Promo']
)
)
data = builder.build()
Dabei gilt:
kpi_type
ist entweder 'revenue'
oder 'non_revenue'
.
Als Nächstes können Sie Ihr Modell erstellen.
Sofern nicht anders angegeben, sind die Inhalte dieser Seite unter der Creative Commons Attribution 4.0 License und Codebeispiele unter der Apache 2.0 License lizenziert. Weitere Informationen finden Sie in den Websiterichtlinien von Google Developers. Java ist eine eingetragene Marke von Oracle und/oder seinen Partnern.
Zuletzt aktualisiert: 2025-08-16 (UTC).
[null,null,["Zuletzt aktualisiert: 2025-08-16 (UTC)."],[[["\u003cp\u003eSimulated data examples are provided for CSV, Xarray Dataset, and other formats like Excel, showcasing how to load data with different loaders.\u003c/p\u003e\n"],["\u003cp\u003eWhen loading CSV data, you need to map column names to variable types like \u003ccode\u003etime\u003c/code\u003e, \u003ccode\u003egeo\u003c/code\u003e, \u003ccode\u003econtrols\u003c/code\u003e, \u003ccode\u003epopulation\u003c/code\u003e, \u003ccode\u003ekpi\u003c/code\u003e, \u003ccode\u003erevenue_per_kpi\u003c/code\u003e, \u003ccode\u003emedia\u003c/code\u003e, and \u003ccode\u003emedia_spend\u003c/code\u003e using \u003ccode\u003eCsvDataLoader\u003c/code\u003e.\u003c/p\u003e\n"],["\u003cp\u003eLoading Xarray Dataset data requires using \u003ccode\u003eXrDatasetDataLoader\u003c/code\u003e and mapping coordinates and arrays through the \u003ccode\u003ename_mapping\u003c/code\u003e argument, accommodating variations in input dataset names.\u003c/p\u003e\n"],["\u003cp\u003eFor other data formats, like Excel, \u003ccode\u003eDataFrameDataLoader\u003c/code\u003e is used after mapping column names to the required variable types, and reading the data into a DataFrame.\u003c/p\u003e\n"],["\u003cp\u003eEach data loader, such as \u003ccode\u003eCsvDataLoader\u003c/code\u003e, \u003ccode\u003eXrDatasetDataLoader\u003c/code\u003e, and \u003ccode\u003eDataFrameDataLoader\u003c/code\u003e, requires setting the \u003ccode\u003ekpi_type\u003c/code\u003e as either \u003ccode\u003e'revenue'\u003c/code\u003e or \u003ccode\u003e'non_revenue'\u003c/code\u003e.\u003c/p\u003e\n"]]],["Three data formats—CSV, Xarray Dataset, and others (like Excel)—are detailed. For CSV and other formats, users must map column names to variable types like `time`, `geo`, `controls`, `population`, `kpi`, etc., and map media variables to channel names. They load data using `CsvDataLoader` or `DataFrameDataLoader`, specifying `kpi_type`, paths, and mappings. For Xarray, users load the data using pickle and pass the data to `XrDatasetDataLoader`, providing mappings for coordinates and arrays, then load it. Each loader has a load function to load the data.\n"],null,["# Load geo-level data with organic media and non-media treatments\n\nSimulated data is provided as an example for each data type and format in the\nfollowing sections.\n\nCSV\n---\n\nTo load the [simulated\nCSV](https://github.com/google/meridian/tree/main/meridian/data/simulated_data/csv/geo_all_channels.csv)\ndata using `CsvDataLoader`:\n\n1. Map the column names to the variable types. The required variable types are\n `time`, `geo`, `controls`, `population`, `kpi`, `revenue_per_kpi`, `media`,\n and `media_spend`. For media channels that have no direct cost, you must\n assign their media exposure to `organic_media`. For non-media treatments,\n you must assign the corresponding columns names to `non_media_treatments`.\n For the definition of each variable, see [Collect and organize your\n data](/meridian/docs/user-guide/collect-data).\n\n coord_to_columns = load.CoordToColumns(\n time='time',\n geo='geo',\n controls=['GQV', 'Competitor_Sales'],\n population='population',\n kpi='conversions',\n revenue_per_kpi='revenue_per_conversion',\n media=[\n 'Channel0_impression',\n 'Channel1_impression',\n 'Channel2_impression',\n 'Channel3_impression',\n 'Channel4_impression',\n ],\n media_spend=[\n 'Channel0_spend',\n 'Channel1_spend',\n 'Channel2_spend',\n 'Channel3_spend',\n 'Channel4_spend',\n ],\n organic_media=['Organic_channel0_impression'],\n non_media_treatments=['Promo'],\n )\n\n2. Map the media variables and the media spends to the designated channel names\n that you want to display in the two-page output. In the following example,\n `Channel0_impression` and `Channel0_spend` are connected to the same\n channel, `Channel0`.\n\n correct_media_to_channel = {\n 'Channel0_impression': 'Channel0',\n 'Channel1_impression': 'Channel1',\n 'Channel2_impression': 'Channel2',\n 'Channel3_impression': 'Channel3',\n 'Channel4_impression': 'Channel4',\n }\n correct_media_spend_to_channel = {\n 'Channel0_spend': 'Channel0',\n 'Channel1_spend': 'Channel1',\n 'Channel2_spend': 'Channel2',\n 'Channel3_spend': 'Channel3',\n 'Channel4_spend': 'Channel4',\n }\n\n3. Load the data using `CsvDataLoader`:\n\n loader = load.CsvDataLoader(\n csv_path=f'/{PATH}/{FILENAME}.csv',\n kpi_type='non_revenue',\n coord_to_columns=coord_to_columns,\n media_to_channel=correct_media_to_channel,\n media_spend_to_channel=correct_media_spend_to_channel,\n )\n data = loader.load()\n\n Where:\n - `kpi_type` is either `'revenue'` or `'non_revenue'`.\n - `PATH` is the path to the data file location.\n - `FILENAME` is the name of your data file.\n\nXarray Dataset\n--------------\n\nTo load the [simulated Xarray\nDataset](https://github.com/google/meridian/tree/main/meridian/data/simulated_data/pkl/geo_all_channels.pkl)\nusing `XrDatasetDataLoader`:\n\n1. Load the data using `pickle`:\n\n import pickle\n with open(f'/{PATH}/{FILENAME}.pkl', 'r') as fh:\n XrDataset=pickle.load(fh)\n\n Where:\n - `PATH` is the path to the data file location.\n - `FILENAME` is the name of your data file.\n2. Pass the dataset to `XrDatasetDataLoader`. Use the `name_mapping` argument\n to map the coordinates and arrays. Provide mapping if the names in the input\n dataset are different from the required names. The required coordinate names\n are `geo`, `time`, `control_variable`, `media_channel`,\n `organic_media_channel`, and `non_media_channel`. The required data\n variables names are `kpi`, `revenue_per_kpi`, `controls`, `population`,\n `media`, `media_spend`, `organic_media`, and `non_media_treatments`.\n\n loader = load.XrDatasetDataLoader(\n XrDataset,\n kpi_type='non_revenue',\n name_mapping={'channel': 'media_channel',\n 'control': 'control_variable',\n 'organic_channel': 'organic_media_channel',\n 'non_media_treatment': 'non_media_channel',\n 'conversions': 'kpi',\n 'revenue_per_conversion': 'revenue_per_kpi',\n 'control_value': 'controls',\n 'spend': 'media_spend',\n 'non_media_treatment_value': 'non_media_treatments'},\n )\n\n data = loader.load()\n\n Where:\n - `kpi_type` is either `'revenue'` or `'non_revenue'`.\n\nNumpy ndarray\n-------------\n\nTo load numpy ndarrays directly, use `NDArrayInputDataBuilder`:\n\n1. Create the data into separate numpy ndarrays.\n\n import numpy as np\n\n kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n controls_nd = np.array([\n [[1, 5], [2, 6], [3, 4]],\n [[7, 8], [9, 10], [11, 12]],\n [[13, 14], [15, 16], [17, 18]],\n ])\n population_nd = np.array([1, 2, 3])\n revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])\n media_nd = np.array([\n [[1, 5], [2, 6], [3, 4]],\n [[7, 8], [9, 10], [11, 12]],\n [[13, 14], [15, 16], [17, 18]],\n ])\n media_spend_nd = np.array([\n [[1, 5], [2, 6], [3, 4]],\n [[7, 8], [9, 10], [11, 12]],\n [[13, 14], [15, 16], [17, 18]],\n ])\n organic_media_nd = np.array([\n [[1, 5], [2, 6], [3, 4]],\n [[7, 8], [9, 10], [11, 12]],\n [[13, 14], [15, 16], [17, 18]],\n ])\n non_media_treatments_nd = np.array([\n [[1, 5], [2, 6], [3, 4]],\n [[7, 8], [9, 10], [11, 12]],\n [[13, 14], [15, 16], [17, 18]],\n ])\n\n2. Use a\n [`NDArrayInputDataBuilder`](https://github.com/google/meridian/blob/4624447e0aace5c24d42b58dd1cfd8fe0dc00971/meridian/data/nd_array_input_data_builder.py#L25)\n to set time and geos, as well as give channel or dimension\n names as required in a Meridian input data.\n For the definition of each variable, see\n [Collect and organize your data](/meridian/docs/user-guide/collect-data).\n\n from meridian.data import nd_array_input_data_builder as data_builder\n\n builder = (\n data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')\n )\n builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']\n builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']\n builder.geos = ['B', 'A', 'C']\n builder = (\n builder\n .with_kpi(kpi_nd)\n .with_revenue_per_kpi(revenue_per_kpi_nd)\n .with_population(population_nd)\n .with_controls(\n controls_nd,\n control_names=[\"control0\", \"control1\"])\n .with_media(\n m_nd=media_nd,\n ms_nd=media_spend_nd,\n media_channels=[\"channel0\", \"channel1\"]\n )\n .with_organic_media(\n organic_media_nd,\n organic_media_channels=[\"organic_channel0\", \"organic_channel1\"]\n ).with_non_media_treatments(\n non_media_treatments_nd,\n non_media_channel_names=[\"non_media_channel0\", \"non_media_channel1\"]\n )\n )\n\n data = builder.build()\n\n Where:\n - `kpi_type` is either `'revenue'` or `'non_revenue'`.\n\nPandas DataFrame or other data formats\n--------------------------------------\n\nTo load the [simulated other data\nformat](https://github.com/google/meridian/tree/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx)\n(such as `excel`) using `DataFrameInputDataBuilder`:\n\n1. Read the data (such as an `excel` spreadsheet) into one or more Pandas `DataFrame`(s).\n\n import pandas as pd\n\n df = pd.read_excel(\n 'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',\n engine='openpyxl',\n )\n\n2. Use a\n [`DataFrameInputDataBuilder`](https://github.com/google/meridian/blob/4624447e0aace5c24d42b58dd1cfd8fe0dc00971/meridian/data/data_frame_input_data_builder.py#L25)\n to map column names to the variable types required in a Meridian input data.\n For the definition of each variable, see\n [Collect and organize your data](/meridian/docs/user-guide/collect-data).\n\n from meridian.data import data_frame_input_data_builder as data_builder\n\n builder = data_builder.DataFrameInputDataBuilder(\n kpi_type='non_revenue',\n default_kpi_column=\"conversions\",\n default_revenue_per_kpi_column=\"revenue_per_conversion\",\n )\n builder = (\n builder\n .with_kpi(df)\n .with_revenue_per_kpi(df)\n .with_population(df)\n .with_controls(df, control_cols=[\"GQV\", \"Competitor_Sales\"])\n )\n channels = [\"Channel0\", \"Channel1\", \"Channel2\", \"Channel3\", \"Channel4\"]\n builder = builder.with_media(\n df,\n media_cols=[f\"{channel}_impression\" for channel in channels],\n media_spend_cols=[f\"{channel}_spend\" for channel in channels],\n media_channels=channels,\n )\n builder = (\n builder\n .with_organic_media(\n df,\n organic_media_cols = [\"Organic_channel0_impression\"],\n organic_media_channels = [\"Organic_channel0\"],\n )\n .with_non_media_treatments(\n df,\n non_media_treatment_cols=['Promo']\n )\n )\n\n data = builder.build()\n\n Where:\n - `kpi_type` is either `'revenue'` or `'non_revenue'`.\n\nNext, you can [create your model](/meridian/docs/user-guide/modeling-overview)."]]