Geo-level and national-level data
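Meridian offers the option to model geo-level or national-level data. Geo-level data is data broken down into mutually exclusive geographic regions, such as states, cities, designated market areas (DMAs), or even multiple countries. These regions are typically all within a larger geographic region, such as a country. National-level data is data provided for a single geographic region, typically an entire country. National-level data is essentially single-geo data.
Geo-level data offers several advantages and is therefore recommended whenever possible. If geo-level data is available for most, but not all, media channels, we recommend imputing the national-level data at the geo level and running a geo-level model. For more information on imputation, see National-level media in a geo-level model. For more information on the national model, see National-level modeling.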
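Geo-level model advantages
Statistical modeling relies on patterns in the data. Repeatable patterns are more common in geo-level data than in national-level data.
Geo-level data also has the following advantages:
Increases the effective sample size by pooling data across geos.
Provides tighter credible intervals, provided the geos are similar in terms of the media impact mechanism, as the model assumes. For more information, see "Geo-level Bayesian Hierarchical Media Mix Modeling".
Improves estimates of time effects (such as trend and seasonality), since there are multiple observations per time period.
Can support the use of more knots to model the \(\mu_t\) parameter. National-level data has fewer degrees of freedom for time effects. For example, one knot per time period completely saturates the national-level model.
Shows greater variability in marketing spend, which is crucial for estimating non-linear effects such as saturation (Hill function parameters).
Reduces omitted variable bias due to missing confounders by reducing the correlation between media spend and confounders. For more information, see section 4.3 of "Geo-level Bayesian Hierarchical Media Mix Modeling".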
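Geo selection
When selecting geos, consider the following guidance:
Drop the smallest geos by total KPI first. Smaller geos contribute less to ROI, yet they can still strongly influence model fit, particularly when there is a single residual variance for all groups (unique_sigma_for_each_geo = False in ModelSpec).
For US advertisers using designated market areas (DMAs) as the geographic unit, a rough guideline is to model the top 50-100 DMAs by population size. This generally includes the vast majority of KPI units while excluding most of the noisier small DMAs that might impact model fit and convergence.
When each geo has its own residual variance (unique_sigma_for_each_geo = True in ModelSpec), noisier geos have less impact on model fit. However, this option adds a lot of flexibility to the model, which can make convergence difficult for some datasets. If MCMC sampling does converge under this option, it can be worth plotting geo population size against the mean residual standard deviation (the sigma parameter); in most cases, you would expect to see a fairly monotone pattern. If you don't see this pattern, it might be better to set unique_sigma_for_each_geo = False and use a smaller subset of geos.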
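The guidance above can be sketched in plain Python. This is a minimal illustration and not part of Meridian: the helper names `select_top_geos` and `spearman_rank_correlation`, the 0.9 coverage threshold, and all sample figures are assumptions made up for the example. The first helper keeps the largest geos by total KPI until a target share of KPI is covered. The second computes a Spearman rank correlation, which is one possible way to check whether posterior mean sigma values vary monotonically with geo population.

```python
def select_top_geos(kpi_by_geo, min_kpi_share=0.9, max_geos=100):
    """Keep the largest geos by total KPI until min_kpi_share is covered."""
    total = sum(kpi_by_geo.values())
    ranked = sorted(kpi_by_geo, key=kpi_by_geo.get, reverse=True)
    kept, covered = [], 0.0
    for geo in ranked:
        # Stop once enough KPI is covered or the geo cap is reached.
        if len(kept) >= max_geos or covered / total >= min_kpi_share:
            break
        kept.append(geo)
        covered += kpi_by_geo[geo]
    return kept, covered / total


def _ranks(values):
    """Rank values from 1 to n, assuming no ties (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rank_correlation(x, y):
    """Spearman's rho for tie-free data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = _ranks(x), _ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Made-up illustrative data, not real KPI figures.
kpi = {"geo_a": 500.0, "geo_b": 300.0, "geo_c": 150.0, "geo_d": 40.0, "geo_e": 10.0}
kept, share = select_top_geos(kpi, min_kpi_share=0.9)

# Hypothetical posterior mean sigma per geo, paired with geo population.
population = [9.0e6, 5.5e6, 2.1e6, 0.8e6, 0.2e6]
sigma_mean = [0.11, 0.15, 0.22, 0.35, 0.60]
rho = spearman_rank_correlation(population, sigma_mean)
```

A rho close to +1 or -1 suggests a monotone relationship; a value near 0 suggests there is none, in which case the single shared residual variance may be the safer choice.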
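If you want to make sure the model represents 100% of your KPI units, you can aggregate smaller geos into larger regions. However, this option comes with several caveats:
Geo-level modeling provides a significant advantage, but the benefit shrinks when there are relatively few geos. It may be better to fit a model at a finer geo granularity and exclude the smallest geos than to aggregate geos to a coarser level.
Different geo aggregation grouping methods can lead to different MMM results.
Media execution variables, such as impressions or cost, can usually be summed across geos. However, some control variables, such as temperature, can be less straightforward to aggregate.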
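As a rough illustration of the last caveat, the sketch below merges small geos into one region. Impressions are summed, while temperature is averaged with population weights. The population weighting is an assumption chosen for this example, not a rule from Meridian, and the function name `aggregate_geos` and all numbers are hypothetical.

```python
def aggregate_geos(rows):
    """Merge per-geo records into a single region record.

    Each row is a dict with 'population', 'impressions' and 'temperature'.
    Impressions are additive, so they are summed. Temperature is not
    additive, so this sketch uses a population-weighted mean (an assumption).
    """
    total_pop = sum(r["population"] for r in rows)
    weighted_temp = sum(r["temperature"] * r["population"] for r in rows)
    return {
        "population": total_pop,
        "impressions": sum(r["impressions"] for r in rows),
        "temperature": weighted_temp / total_pop,
    }


# Made-up records for two small geos.
small_geos = [
    {"population": 100_000, "impressions": 2_000, "temperature": 10.0},
    {"population": 300_000, "impressions": 5_000, "temperature": 20.0},
]
region = aggregate_geos(small_geos)
```

A simple unweighted mean would give 15.0 here instead of 17.5, which shows how the choice of aggregation rule for a control variable can change the model inputs.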
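National-level media in a geo-level model
When most media are available at the geo level, but one or two are only available at the national level, we recommend imputing the national-level media at the geo level and running a geo-level model. One naive imputation method is to approximate the geo-level media variable from its national-level value, using the geo's share of the total population. Although it is preferable to have accurate geo-level data so that imputation isn't necessary, imputation can still yield useful information about the model parameters. For more information, see section 4.4 of "Geo-level Bayesian Hierarchical Media Mix Modeling".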
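The naive population-share imputation described above could look like the following sketch. The function name `impute_geo_media` and the sample values are hypothetical, and the code is a plain-Python illustration rather than a Meridian API.

```python
def impute_geo_media(national_series, population_by_geo):
    """Split a national media series across geos in proportion to population."""
    total_pop = sum(population_by_geo.values())
    return {
        geo: [value * pop / total_pop for value in national_series]
        for geo, pop in population_by_geo.items()
    }


# Made-up national impressions for three time periods.
national_impressions = [1000.0, 2000.0, 500.0]
population = {"geo_a": 600_000, "geo_b": 300_000, "geo_c": 100_000}
geo_impressions = impute_geo_media(national_impressions, population)
```

By construction, the imputed geo values in each time period sum back to the national value, so the total media volume is preserved.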
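Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-08-28 UTC.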