Load geo-level data with reach and frequency
The following sections provide simulated data as an example for each data type and format.
CSV
To load the simulated CSV data using CsvDataLoader:
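Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, assign their media exposure and media spend to the media and media_spend categories, respectively. For media channels that do have reach and frequency data, map their reach, frequency, and media spend to the reach, frequency, and rf_spend categories, respectively. For the definition of each variable, see Collect and organize your data.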
from meridian.data import load

coord_to_columns = load.CoordToColumns(
    time='time',
    geo='geo',
    controls=['GQV', 'Discount', 'Competitor_Sales'],
    population='population',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversion',
    # Channels without reach and frequency data.
    media=[
        'Channel0_impression',
        'Channel1_impression',
        'Channel2_impression',
        'Channel3_impression',
    ],
    media_spend=[
        'Channel0_spend',
        'Channel1_spend',
        'Channel2_spend',
        'Channel3_spend',
    ],
    # Channels with reach and frequency data.
    reach=['Channel4_reach', 'Channel5_reach'],
    frequency=['Channel4_frequency', 'Channel5_frequency'],
    rf_spend=['Channel4_spend', 'Channel5_spend'],
)
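Before handing a mapping like this to the loader, it can help to confirm that every mapped column name actually appears in the CSV header. The following is a minimal sketch using only the Python standard library. The helper names (flatten, missing_columns), the shortened mapping dict, and the one-row CSV text are hypothetical stand-ins for illustration and are not part of Meridian.

```python
import csv
import io


def flatten(mapping):
    """Collect column names from a dict whose values are a str or a list of str."""
    names = []
    for value in mapping.values():
        names.extend([value] if isinstance(value, str) else value)
    return names


def missing_columns(header, mapped_columns):
    """Return the mapped column names that are absent from the header, sorted."""
    return sorted(set(mapped_columns) - set(header))


# Hypothetical, shortened stand-in for the arguments given to CoordToColumns.
mapping = {
    'time': 'time',
    'geo': 'geo',
    'kpi': 'conversions',
    'reach': ['Channel4_reach'],
    'frequency': ['Channel4_frequency'],
    'rf_spend': ['Channel4_spend'],
}

# Hypothetical CSV content; the frequency column is left out on purpose.
csv_text = (
    'time,geo,conversions,Channel4_reach,Channel4_spend\n'
    '2024-01-01,A,10,200,20.0\n'
)
header = next(csv.reader(io.StringIO(csv_text)))

print(missing_columns(header, flatten(mapping)))  # ['Channel4_frequency']
```

An empty list means every mapped name was found in the header.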
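Map the media exposure, reach, frequency, and media spend columns to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.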
correct_media_to_channel = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
    'Channel2_impression': 'Channel2',
    'Channel3_impression': 'Channel3',
}
correct_media_spend_to_channel = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Channel1',
    'Channel2_spend': 'Channel2',
    'Channel3_spend': 'Channel3',
}
correct_reach_to_channel = {
    'Channel4_reach': 'Channel4',
    'Channel5_reach': 'Channel5',
}
correct_frequency_to_channel = {
    'Channel4_frequency': 'Channel4',
    'Channel5_frequency': 'Channel5',
}
correct_rf_spend_to_channel = {
    'Channel4_spend': 'Channel4',
    'Channel5_spend': 'Channel5',
}
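Because each channel is tied together only by the channel name it maps to, a typo in one dictionary can silently split a channel in two. The sketch below is one possible consistency check in plain Python. The function names are hypothetical, and the rule that a channel should not appear in both the media group and the reach and frequency group is an assumption drawn from the description above, not a documented Meridian check.

```python
def channels(mapping):
    """Return the set of channel names a column-to-channel dict maps to."""
    return set(mapping.values())


def channel_group_problems(media, media_spend, reach, frequency, rf_spend):
    """Return a list of human-readable inconsistencies between the mappings."""
    problems = []
    if channels(media) != channels(media_spend):
        problems.append('media and media_spend name different channels')
    if not (channels(reach) == channels(frequency) == channels(rf_spend)):
        problems.append('reach, frequency and rf_spend name different channels')
    # Assumption: a channel belongs to exactly one of the two groups.
    overlap = channels(media) & channels(reach)
    if overlap:
        problems.append(f'channels in both groups: {sorted(overlap)}')
    return problems


# Shortened example mappings in the same style as the ones above.
media = {'Channel0_impression': 'Channel0'}
media_spend = {'Channel0_spend': 'Channel0'}
reach = {'Channel4_reach': 'Channel4'}
frequency = {'Channel4_frequency': 'Channel4'}
rf_spend = {'Channel4_spend': 'Chanel4'}  # deliberate typo

print(channel_group_problems(media, media_spend, reach, frequency, rf_spend))
```

With the deliberate typo, the check reports that reach, frequency and rf_spend name different channels.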
Load the data using CsvDataLoader:
loader = load.CsvDataLoader(
    csv_path=f'/{PATH}/{FILENAME}.csv',
    kpi_type='non_revenue',
    coord_to_columns=coord_to_columns,
    media_to_channel=correct_media_to_channel,
    media_spend_to_channel=correct_media_spend_to_channel,
    reach_to_channel=correct_reach_to_channel,
    frequency_to_channel=correct_frequency_to_channel,
    rf_spend_to_channel=correct_rf_spend_to_channel,
)
data = loader.load()
Where:
kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
Load the data using pickle:
import pickle

# Pickle files are binary, so open the file in 'rb' mode.
with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
    dataset = pickle.load(fh)
Where:
PATH is the path to the data file location.
FILENAME is the name of your data file.
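Pickle files are binary, so the file handle given to pickle.load has to be opened in binary mode. The round trip below is a small standard-library sketch of that point. The payload dict is a hypothetical stand-in for a real Xarray Dataset, and the temporary file name is made up for the example.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a dataset; any picklable object behaves the same way.
payload = {'geo': ['A', 'B'], 'kpi': [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.pkl')

    # Write and read back in binary mode.
    with open(path, 'wb') as fh:
        pickle.dump(payload, fh)
    with open(path, 'rb') as fh:
        restored = pickle.load(fh)

    # Reading the same file in text mode fails, typically with a
    # TypeError or a UnicodeDecodeError depending on the platform.
    try:
        with open(path, 'r') as fh:
            pickle.load(fh)
        text_mode_error = None
    except Exception as err:
        text_mode_error = type(err).__name__

print(restored == payload)
print(text_mode_error)
```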
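Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays; a mapping is only needed for names in the input dataset that differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.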
loader = load.XrDatasetDataLoader(
    dataset,
    kpi_type='non_revenue',
    name_mapping={
        'channel': 'media_channel',
        'control': 'control_variable',
        'conversions': 'kpi',
        'revenue_per_conversion': 'revenue_per_kpi',
        'control_value': 'controls',
        'spend': 'media_spend',
        'reach': 'reach',
        'frequency': 'frequency',
        'rf_spend': 'rf_spend',
    },
)
data = loader.load()
Where kpi_type is either 'revenue' or 'non_revenue'.
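To see which required names a given name_mapping still leaves unresolved, you can apply the mapping to the dataset's names and compare the result with the required names listed above. This is a plain-Python sketch: the function names are hypothetical, and the dataset_names set is an assumed example of what an input dataset might contain, not the actual contents of the simulated file.

```python
REQUIRED_COORDS = {'geo', 'time', 'control_variable', 'media_channel', 'rf_channel'}
REQUIRED_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population',
    'media', 'media_spend', 'reach', 'frequency', 'rf_spend',
}


def names_after_mapping(dataset_names, name_mapping):
    """Apply the renaming; names without an entry are kept unchanged."""
    return {name_mapping.get(name, name) for name in dataset_names}


def unresolved(dataset_names, name_mapping):
    """Return required names that are still missing after the renaming, sorted."""
    present = names_after_mapping(dataset_names, name_mapping)
    return sorted((REQUIRED_COORDS | REQUIRED_VARS) - present)


# Assumed names of coordinates and data variables in a hypothetical input dataset.
dataset_names = {
    'geo', 'time', 'control', 'channel', 'rf_channel',
    'conversions', 'revenue_per_conversion', 'control_value',
    'population', 'media', 'spend', 'reach', 'frequency', 'rf_spend',
}
name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
}

print(unresolved(dataset_names, name_mapping))  # []
print(unresolved(dataset_names, {}))
```

With the full mapping nothing is left unresolved; with an empty mapping the second call lists the six required names that the hypothetical dataset spells differently.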
NumPy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
Create the data as separate NumPy ndarrays.
import numpy as np

kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
controls_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
population_nd = np.array([1, 2, 3])
revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
reach_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
frequency_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
rf_spend_nd = np.array([
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
])
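The example arrays cover three geos, three time periods, two controls, and two reach and frequency channels. A quick shape check before building can catch transposed or truncated arrays early. The sketch below assumes the axis order suggested by the example (geo first, then time, then control or channel); that order and the helper name shape_problems are assumptions made for illustration, not a statement of Meridian's validation rules.

```python
import numpy as np


def shape_problems(n_geos, n_times, kpi, population, controls, reach, frequency, rf_spend):
    """Return a list of shape inconsistencies under the assumed axis order."""
    problems = []
    if kpi.shape != (n_geos, n_times):
        problems.append(f'kpi has shape {kpi.shape}')
    if population.shape != (n_geos,):
        problems.append(f'population has shape {population.shape}')
    if controls.ndim != 3 or controls.shape[:2] != (n_geos, n_times):
        problems.append(f'controls has shape {controls.shape}')
    if reach.shape != frequency.shape:
        problems.append('reach and frequency shapes differ')
    if rf_spend.shape[-1] != reach.shape[-1]:
        problems.append('rf_spend and reach disagree on the number of channels')
    return problems


# Arrays with the same shapes as the example data (values are placeholders).
kpi = np.arange(1, 10).reshape(3, 3)
population = np.array([1, 2, 3])
controls = np.zeros((3, 3, 2))
reach = np.ones((3, 3, 2))
frequency = np.ones((3, 3, 2))
rf_spend = np.ones((3, 3, 2))

print(shape_problems(3, 3, kpi, population, controls, reach, frequency, rf_spend))
```

An empty list means the arrays are mutually consistent under these assumptions.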
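Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to provide the channel or dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.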
from meridian.data import nd_array_input_data_builder as data_builder

builder = (
    data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
)
builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
builder.geos = ['B', 'A', 'C']
builder = (
    builder
    .with_kpi(kpi_nd)
    .with_revenue_per_kpi(revenue_per_kpi_nd)
    .with_population(population_nd)
    .with_controls(
        controls_nd,
        control_names=["control0", "control1"])
    .with_reach(
        r_nd=reach_nd,
        f_nd=frequency_nd,
        rfs_nd=rf_spend_nd,
        rf_channels=["channel0", "channel1"]
    )
)
data = builder.build()
Where kpi_type is either 'revenue' or 'non_revenue'.
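The time and geo labels in the example are not in sorted order. Assuming each label describes the array slice at the same position along its axis, the sketch below shows one way to reorder an array together with its labels if you prefer to hand over sorted input. The helper sort_axis is hypothetical, and whether the builder reorders data on its own is not covered here.

```python
import numpy as np


def sort_axis(labels, array, axis):
    """Sort labels and reorder the array along the given axis to match."""
    order = np.argsort(labels)
    return [labels[i] for i in order], np.take(array, order, axis=axis)


geos = ['B', 'A', 'C']
times = ['2024-01-02', '2024-01-03', '2024-01-01']
kpi = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # axis 0: geo, axis 1: time

sorted_geos, kpi_by_geo = sort_axis(geos, kpi, axis=0)
sorted_times, kpi_sorted = sort_axis(times, kpi_by_geo, axis=1)

print(sorted_geos)   # ['A', 'B', 'C']
print(sorted_times)  # ['2024-01-01', '2024-01-02', '2024-01-03']
print(kpi_sorted)
```

After sorting, the value that belonged to geo 'A' on '2024-01-01' (6 in the original array) sits in the top-left corner.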
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
Read the data (such as an Excel spreadsheet) into one or more pandas DataFrames.
import pandas as pd

df = pd.read_excel(
    'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
    engine='openpyxl',
)
Use a DataFrameInputDataBuilder to map the column names to the variable types that Meridian input data requires. For the definition of each variable, see Collect and organize your data.
from meridian.data import data_frame_input_data_builder as data_builder

builder = data_builder.DataFrameInputDataBuilder(
    kpi_type='non_revenue',
    default_kpi_column="conversions",
    default_revenue_per_kpi_column="revenue_per_conversion",
)
builder = (
    builder
    .with_kpi(df)
    .with_revenue_per_kpi(df)
    .with_population(df)
    .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
    .with_reach(
        df,
        reach_cols=['Channel4_reach', 'Channel5_reach'],
        frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
        rf_channels=['Channel4', 'Channel5'],
    )
)
data = builder.build()
Where kpi_type is either 'revenue' or 'non_revenue'.
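Spreadsheet exports often carry duplicated rows or blank cells, so a quick look at the DataFrame before building can save a debugging round. The following pandas sketch flags repeated geo and time pairs and columns with missing values. The helper frame_problems, the column names geo and time, and the three-row frame are assumptions for illustration; they are not part of Meridian and do not describe its own validation.

```python
import pandas as pd


def frame_problems(df, geo_col='geo', time_col='time'):
    """Return a list of simple data-quality findings for a long-format frame."""
    problems = []
    dupes = int(df.duplicated(subset=[geo_col, time_col]).sum())
    if dupes:
        problems.append(f'{dupes} duplicated ({geo_col}, {time_col}) rows')
    na_cols = sorted(df.columns[df.isna().any()])
    if na_cols:
        problems.append(f'missing values in: {na_cols}')
    return problems


# Hypothetical frame with one repeated (geo, time) pair and one blank cell.
frame = pd.DataFrame({
    'geo': ['A', 'A', 'B'],
    'time': ['2024-01-01', '2024-01-01', '2024-01-01'],
    'conversions': [10, 11, None],
})

for problem in frame_problems(frame):
    print(problem)
```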
Next, you can create your model.
Last updated 2025-08-04 UTC.