Holdout observations (train and test split)
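The Meridian model specification contains a holdout_id argument (a boolean array of dimensions \(G \times T\)) that can be used to specify a holdout sample. During model training (for example, MCMC posterior sampling), the KPI data of the holdout observations is ignored and does not affect the model likelihood or posterior density. The media data of the holdout observations is still used for model training, because it affects the adstocked media values for subsequent time periods.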
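The following sketch illustrates both halves of that rule with plain Python lists. It is not Meridian's code: geometric_adstock and training_sse are hypothetical helpers, the fixed-decay geometric adstock and the squared-error term (standing in for the log-likelihood) are simplifying assumptions, and the numbers are made up. Held-out KPI cells are skipped by the fit term, while media from every period, held out or not, still enters the adstock recursion.

```python
# Hypothetical illustration, NOT Meridian's API: a G x T holdout mask hides
# KPI cells from the fit, but media in held-out periods still feeds adstock.

def geometric_adstock(media, decay):
    """Geometric adstock of each geo's media series over ALL time periods."""
    result = []
    for series in media:
        carry = 0.0
        adstocked = []
        for x in series:
            carry = x + decay * carry  # today's media plus decayed carryover
            adstocked.append(carry)
        result.append(adstocked)
    return result


def training_sse(kpi, predicted, holdout_id):
    """Squared-error fit term summed over training cells only.

    Cells with holdout_id[g][t] == True are skipped, so their KPI values
    cannot influence the fit.
    """
    sse = 0.0
    for g, row in enumerate(holdout_id):
        for t, is_holdout in enumerate(row):
            if not is_holdout:
                sse += (kpi[g][t] - predicted[g][t]) ** 2
    return sse


media = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # 2 geos x 3 periods
holdout_id = [[False, False, True], [False, True, False]]

adstock = geometric_adstock(media, decay=0.5)
# Geo 1's media in held-out period 1 still carries over into period 2.
print(adstock)  # [[1.0, 0.5, 0.25], [0.0, 2.0, 1.0]]

predicted = [[10.0 * a for a in row] for row in adstock]
kpi = [[10.0, 6.0, 100.0], [1.0, 20.0, 10.0]]
print(training_sse(kpi, predicted, holdout_id))  # 2.0

kpi[0][2] = -999.0  # changing a held-out KPI value...
print(training_sse(kpi, predicted, holdout_id))  # ...leaves the fit term at 2.0
```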
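The primary use of the holdout sample is to calculate out-of-sample goodness-of-fit metrics, such as R-squared. This is useful for comparing different model specifications (such as prior distribution strengths), provided that every model being compared uses the same holdout sample. Although the model with the best out-of-sample fit is not necessarily the best model for causal inference, a better-fitting model is generally preferred. Model misspecification that leads to poor model fit can also bias causal inference.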
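A minimal sketch of an out-of-sample R-squared, assuming the fitted KPI values for each model (for example, posterior mean predictions) are already available as \(G \times T\) nested lists. masked_r_squared is a hypothetical helper, not a Meridian function, and it uses one common R-squared definition, centered on the mean of the selected cells. Both toy models are scored on the same holdout cells, as the comparison above requires.

```python
# Hypothetical helper, NOT part of Meridian: R-squared restricted to the
# cells selected by a boolean mask (e.g., the holdout cells).

def masked_r_squared(actual, predicted, mask):
    """Return 1 - SS_res / SS_tot over the cells where mask is True."""
    ys, ps = [], []
    for a_row, p_row, m_row in zip(actual, predicted, mask):
        for a, p, m in zip(a_row, p_row, m_row):
            if m:
                ys.append(a)
                ps.append(p)
    if not ys:
        raise ValueError("mask selects no cells")
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, ps))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("selected KPI values are constant; R-squared undefined")
    return 1.0 - ss_res / ss_tot


kpi = [[1.0, 2.0, 3.0, 4.0]]  # 1 geo x 4 periods
holdout_id = [[False, True, False, True]]  # the SAME holdout for both models

preds_model_a = [[1.0, 2.5, 3.0, 3.5]]
preds_model_b = [[1.0, 3.0, 3.0, 3.0]]

print(masked_r_squared(kpi, preds_model_a, holdout_id))  # 0.75
print(masked_r_squared(kpi, preds_model_b, holdout_id))  # 0.0
```

Passing the negated mask (the training cells) instead gives the in-sample R-squared, which is useful as a contrast but says less about how each specification generalizes.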
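We recommend using a holdout sample that is fairly balanced across geos and time periods. In other words, use a holdout sample that has approximately the same number of holdout observations for each geo and approximately the same number of holdout observations for each time period. An imbalanced holdout sample can leave too few training observations to estimate the geo effect \(\tau_g\) for certain geos, or the time effect \(\mu_t\) for certain time periods. By default, Meridian does not specify a holdout sample. You must specify the holdout sample yourself and ensure that it is reasonably balanced.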
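One simple way to get a balanced holdout sample is to hold out diagonal stripes of the geo-by-time grid, so that every geo and every time period loses roughly the same share of cells. The sketch below uses plain Python lists; striped_holdout and holdout_counts are hypothetical helpers, and the 1-in-5 rate is an arbitrary illustration. Because holdout_id is a boolean array of dimensions \(G \times T\), you would likely convert the result, for example with numpy.array(mask, dtype=bool), before passing it to the model specification; check the Meridian API reference for the exact expected type.

```python
# Hypothetical helpers, NOT part of Meridian: build a G x T holdout mask that
# is balanced across geos and time periods using diagonal stripes.

def striped_holdout(n_geos, n_times, every=5):
    """Hold out cell (g, t) whenever (g + t) % every == 0 (~1/every of cells)."""
    return [[(g + t) % every == 0 for t in range(n_times)] for g in range(n_geos)]


def holdout_counts(holdout_id):
    """Number of held-out cells per geo and per time period."""
    per_geo = [sum(row) for row in holdout_id]
    per_time = [sum(col) for col in zip(*holdout_id)]
    return per_geo, per_time


mask = striped_holdout(n_geos=10, n_times=20, every=5)
per_geo, per_time = holdout_counts(mask)
print(set(per_geo), set(per_time))  # {4} {2}

# Counter-example: holding out every period of geo 0 leaves no training KPI
# data for that geo, so its geo effect tau_g cannot be estimated from the KPI.
bad = [[g == 0 for _ in range(20)] for g in range(10)]
bad_geo, bad_time = holdout_counts(bad)
print(bad_geo[0], bad_geo[1])  # 20 0
```

With 10 geos and 20 periods, each geo holds out 4 periods and each period holds out 2 geos, so 20% of cells are held out evenly. The counts are only approximately equal when the number of periods or geos is not a multiple of every.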
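Avoid holding out large, contiguous-in-time chunks of data (for example, at the end of the MMM time window) to assess the forecast error of the KPI. Meridian isn't designed to forecast the KPI, especially when the KPI has a strong trend and seasonality. Instead, Meridian estimates the causal media impact and models trend and seasonality with a knot-based approach. The knot-based approach needs data near each knot to estimate that knot effectively. If large, contiguous-in-time chunks of data are held out, there is no data near the knots that fall within the held-out period. In that case, the posterior distribution of those knots is driven by the prior, which can result in poor forecasts.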
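To make the knot argument concrete, here is a small diagnostic that counts how many training KPI cells inform each knot. It is a sketch under loudly stated assumptions: knot_support is a hypothetical helper, the knot positions are picked by hand, and it assumes \(\mu_t\) is a linear interpolation between neighboring knots, so a knot only influences periods strictly between its two neighbors. Meridian's actual knot handling may differ.

```python
# Hypothetical diagnostic, NOT part of Meridian. ASSUMES the time effect mu_t
# is a linear interpolation between neighboring knots, so knot i only
# influences periods strictly between knots i-1 and i+1.

def knot_support(holdout_id, knots):
    """Count training (non-held-out) KPI cells that inform each knot."""
    n_times = len(holdout_id[0])
    support = []
    for i in range(len(knots)):
        lo = knots[i - 1] if i > 0 else float("-inf")
        hi = knots[i + 1] if i + 1 < len(knots) else float("inf")
        count = 0
        for t in range(n_times):
            if lo < t < hi:
                # Training cells in period t across all geos.
                count += sum(1 for row in holdout_id if not row[t])
        support.append(count)
    return support


n_geos, n_times = 2, 20
knots = [0, 10, 19]  # hand-picked knot positions for illustration

# Contiguous block: hold out every geo for periods 11..19 (end of window).
contiguous = [[t >= 11 for t in range(n_times)] for _ in range(n_geos)]
print(knot_support(contiguous, knots))  # [20, 20, 0]

# Balanced stripes: hold out cell (g, t) when (g + t) % 5 == 0.
striped = [[(g + t) % 5 == 0 for t in range(n_times)] for g in range(n_geos)]
print(knot_support(striped, knots))  # [16, 30, 15]
```

Under these assumptions, holding out the end of the window leaves the last knot with no training data at all, which is the case where its posterior falls back to the prior, whereas the striped holdout leaves every knot with comparable support.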
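Additionally, Meridian can be used to estimate the impact of both historical and future media, because it assumes that the model parameters that determine media impact are consistent over time.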
Last updated (UTC): 2025-08-04.