Amount of data needed
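This section helps you build a sense of how much data you need. The guidance here is rough and directional, because the true answer depends on what your data looks like.
Also see the sections on national models and geo models.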
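Amount of data for national models
An important confidence check for national models is the number of data points per effect that you are trying to measure and understand. For example, if you have 12 media channels, 6 controls, and 8 knots, the total is 26 effects. (For simplicity, ignore things like Adstock and Hill parameters in this example.) With two years of weekly data, you have 104 data points, which is 4 data points per effect. This is a low sample-size scenario, and you don't have enough data. (Additionally, insufficient variation in media spend adversely impacts national models.) For more information about knots, see "How the knots argument works".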
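Because it is difficult to get enough data for a national model, you can do the following:
Lower the scope of the MMM. You can estimate fewer media channels (either by dropping low-spend channels or by combining channels), use fewer knots to estimate time effects, and remove any extraneous controls. However, don't remove important confounders.
Get more data. For example, use three years of weekly data instead of two. Adding more data reduces the variance of the inference but might make the inference less relevant, since older observations may no longer reflect current market conditions.
Alternatively, consider adding geo granularity to your data and using a geo model instead of lowering the scope or adding more data.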
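Consider the previous hypothetical example for the national model. You can combine the 12 media channels into 3 and lower the number of knots to 2. You might also recognize that one of your controls explains the KPI but not the media, which means that it is not a true confounder and you can remove it, leaving 5 controls. If you also use three years of weekly data, you have 156 data points to estimate 10 effects. This is roughly 15 data points per effect, and now you might be able to glean some directional information from the MMM.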
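The arithmetic for the two national scenarios above can be sketched in a few lines of Python. This is only a back-of-the-envelope check, not part of Meridian: the NationalScope class and its field names are hypothetical and introduced here for illustration, and Adstock and Hill parameters are ignored, as in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NationalScope:
    """Hypothetical container describing the scope of a national model."""

    media_channels: int
    controls: int
    knots: int
    weeks: int  # one data point per week for a national model

    @property
    def effects(self) -> int:
        # Adstock and Hill parameters are deliberately ignored here.
        return self.media_channels + self.controls + self.knots

    @property
    def points_per_effect(self) -> float:
        return self.weeks / self.effects


# Original scope: 12 channels, 6 controls, 8 knots, two years of weekly data.
baseline = NationalScope(media_channels=12, controls=6, knots=8, weeks=2 * 52)
# Reduced scope: 3 channels, 5 controls, 2 knots, three years of weekly data.
reduced = NationalScope(media_channels=3, controls=5, knots=2, weeks=3 * 52)

for name, scope in (("baseline", baseline), ("reduced", reduced)):
    print(
        f"{name}: {scope.weeks} points / {scope.effects} effects "
        f"= {scope.points_per_effect:.1f} per effect"
    )
```

The ratio is a rough heuristic only. It says nothing about how much variation the media spend contains, which also matters for national models.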
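Amount of data for geo models
The number of data points per effect that you are trying to measure and understand is still an important confidence check. However, because of the geo hierarchy, this metric is harder to interpret. For example, if you have 12 media channels, 6 controls, 100 knots, and 105 geos, that is roughly $(12 \times 105) + (6 \times 105) + 100 = 1{,}990$ effects to estimate. (Media and controls have geo-level effects, so you multiply each by the number of geos, 105.) With three years of weekly data, you have $105 \times (52 \times 3) = 16{,}380$ data points. This is roughly 8 data points per effect. For simplicity, ignore things like Adstock and Hill parameters in this example.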
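The same count can be reproduced for the geo example. The helper below is a hypothetical illustration, not a Meridian function. It follows the formula in the text: media and control effects are counted once per geo, knots are counted once, and Adstock and Hill parameters are ignored.

```python
def geo_points_per_effect(media_channels, controls, knots, geos, weeks):
    """Return (effects, data_points, ratio) for a geo-level model."""
    # Media and control effects are geo-level, so each is counted per geo.
    # Knots are added once, following the formula in the text.
    effects = (media_channels + controls) * geos + knots
    # Data size is the number of geos times the number of time points.
    data_points = geos * weeks
    return effects, data_points, data_points / effects


effects, data_points, ratio = geo_points_per_effect(
    media_channels=12, controls=6, knots=100, geos=105, weeks=3 * 52
)
print(f"{data_points} points / {effects} effects = {ratio:.1f} per effect")
```

This naive ratio treats every geo-level effect as if it were estimated separately, so it does not account for any information shared across geos.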
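An important detail that this example leaves out is that, by definition of the geo hierarchy, the geo-level media effects and geo-level control effects are not independent across geos. In practice, this means that data is shared when estimating the effect of media channel 1 in geo 1 and the effect of media channel 1 in geo 2. The same holds for controls. Because data is shared, you effectively have more than 8 data points per effect. How much data is shared depends on how similar the effects are across geos, which is governed by the eta_m and xi_c parameters.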
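If you are having difficulty getting enough data for a geo-level model, consider combining media channels or dropping low-spend media channels. Alternatively, you can put a more regularizing prior on the hierarchical variance terms eta_m and xi_c (for example, HalfNormal(0.1)), which encourages sharing information across geos.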
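To get a feel for how tight a HalfNormal(0.1) prior is, the sketch below computes its mean and 95th percentile using only the Python standard library. It relies on the standard fact that the absolute value of a Normal(0, scale) variable follows a HalfNormal(scale) distribution. The comparison scale of 1.0 is an arbitrary wider prior chosen for illustration and is not a statement about Meridian's defaults. The half_normal_summary helper is hypothetical.

```python
import math
from statistics import NormalDist


def half_normal_summary(scale, prob=0.95):
    """Return the mean and the `prob` quantile of HalfNormal(scale)."""
    mean = scale * math.sqrt(2 / math.pi)
    # If X ~ Normal(0, scale), then |X| ~ HalfNormal(scale), so
    # P(|X| <= q) = 2 * Phi(q / scale) - 1, which gives
    # q = scale * Phi^{-1}((1 + prob) / 2).
    quantile = scale * NormalDist().inv_cdf((1 + prob) / 2)
    return mean, quantile


for scale in (1.0, 0.1):
    mean, q95 = half_normal_summary(scale)
    print(f"HalfNormal({scale}): mean={mean:.3f}, 95% quantile={q95:.3f}")
```

Under HalfNormal(0.1), about 95% of the prior mass sits below roughly 0.2. A prior this tight keeps the hierarchical variance term small a priori, which pulls the geo-level effects toward one another and is what drives the extra sharing of information across geos.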
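Can I use campaign-level data?
The Meridian model focuses only on the channel level. We generally don't recommend running MMM at the campaign level, because MMM is a macro tool that works well at the channel level. If you run distinct campaigns with hard starts and stops, you risk losing the carryover (memory) that Adstock is meant to capture. If you are interested in more granular insights, we recommend data-driven multi-touch attribution for your digital channels.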
Last updated (UTC): 2025-08-04.