Geo-level and national-level data

Meridian can model either geo-level or national-level data. Geo-level data is broken down into mutually exclusive geographic regions such as states, cities, DMAs, or even multiple countries. These regions typically all fall within a larger geographic region, such as a country. National-level data covers a single geographic region, typically an entire country, so it is essentially single-geo data.

Geo-level data offers several advantages and is recommended whenever it is available. If geo-level data is available for most, but not all, media channels, then we recommend imputing the national-level data at the geo level and running a geo-level model. For more information on imputation, see National-level media in a geo-level model. For more information on the national model, see National-level modeling.

Geo-level model advantages

Statistical modeling relies on patterns in the data. Repeatable patterns are more common in geo-level data than national-level data.

Here are some other advantages of geo-level data:

  • Increases the effective sample size by pooling data across geos.
  • Provides tighter credible intervals, provided that the geos are similar in their media impact mechanism, as the model assumes. For more information, see Geo-level Bayesian Hierarchical Media Mix Modeling.
  • Improves estimates for time-effects (such as trend and seasonality), since there are multiple observations per time period.
  • Can support the use of more knots to model the \(\mu_t\) parameter. National-level data has fewer degrees of freedom for time-effects. For example, one knot per time period completely saturates the national-level model.
  • Shows greater variability in marketing spend, which is crucial for estimating non-linear effects, like saturation (Hill function parameters).
  • Reduces omitted variable bias due to missing confounders, by reducing the correlation between media spend and confounders. See section 4.3 of Geo-level Bayesian Hierarchical Media Mix Modeling for more information.
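The point about knots and degrees of freedom can be illustrated with simple counting. The sketch below is not part of Meridian; the function name and the numbers (104 weekly periods, 50 geos) are hypothetical. It only compares how many observations stand behind each time-effect parameter when one knot is placed at every time period.

```python
def observations_per_time_parameter(n_geos, n_times, n_knots):
    """Number of KPI observations available per time-effect knot.

    Illustrative counting only: assumes one KPI observation per geo per
    time period and one time-effect parameter per knot.
    """
    return (n_geos * n_times) / n_knots


n_times = 104  # hypothetical: two years of weekly data

# National model with one knot per period: one observation per parameter,
# so the time effects alone can absorb all of the variation in the KPI.
national = observations_per_time_parameter(n_geos=1, n_times=n_times, n_knots=n_times)

# Geo-level model with the same knots: every time period is observed once per geo.
geo_level = observations_per_time_parameter(n_geos=50, n_times=n_times, n_knots=n_times)

print(national, geo_level)
```

With a ratio of one, nothing is left to identify media effects, which is what saturation means here. The geo-level ratio grows with the number of geos.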

Geo selection

When you are selecting geos, consider the following guidance:

  • Drop the smallest geos by total KPI first. Smaller geos contribute less to ROI, yet they can still have a high influence on model fit, particularly when there is a single residual variance for all geos (unique_sigma_for_each_geo = False in ModelSpec).

  • For US advertisers using designated market area (DMA) as the geographical unit, a rough guideline is to model the top 50 to 100 DMAs by population size. This generally includes the vast majority of the KPI units, while excluding most of the noisier small DMAs that might hurt model fit and convergence.

  • When each geo has its own residual variance (unique_sigma_for_each_geo = True in ModelSpec), noisier geos have less impact on model fit. However, this option can make convergence difficult for some datasets because it adds so much flexibility to the model. If MCMC sampling does converge under this option, it might be worth plotting the geo population size against the mean residual standard deviation (sigma parameter). In most cases, you would expect to see a fairly monotone pattern. If you don't see this pattern, then it might be better to set unique_sigma_for_each_geo = False and use a smaller subset of geos.
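As a rough illustration of dropping the smallest geos by total KPI, the sketch below ranks geos by KPI and keeps the largest ones until a target share of the total is covered. The helper select_top_geos, the 90% coverage target, and the sample figures are assumptions made for this example. They are not part of Meridian, and the right threshold depends on your data.

```python
def select_top_geos(kpi_by_geo, coverage=0.9):
    """Keep the largest geos by total KPI until `coverage` of the KPI is reached.

    kpi_by_geo: dict mapping geo name -> total KPI over the modeling window.
    Returns the kept geo names, largest first.
    """
    total = sum(kpi_by_geo.values())
    ranked = sorted(kpi_by_geo.items(), key=lambda item: item[1], reverse=True)

    kept, running = [], 0.0
    for geo, kpi in ranked:
        if running >= coverage * total:
            break  # coverage target already met; drop the remaining small geos
        kept.append(geo)
        running += kpi
    return kept


# Hypothetical KPI totals per geo.
kpi_by_geo = {"geo_a": 500.0, "geo_b": 300.0, "geo_c": 150.0, "geo_d": 40.0, "geo_e": 10.0}
kept_geos = select_top_geos(kpi_by_geo, coverage=0.9)
print(kept_geos)
```

The same ranking can be done by population instead of KPI, for example when following the DMA guideline above.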

If you want to make sure the model represents 100% of your KPI units, you can aggregate smaller geos into larger regions. However, this option comes with several caveats:

  • Geo-level modeling provides a significant advantage, although this benefit is reduced when there are relatively few geos. It may be better to fit a model at a finer geo granularity and exclude the smallest geos, rather than aggregating geos to a coarser level.

  • Different geo aggregation grouping methods can lead to different MMM results.

  • Media execution variables, such as impressions or cost, can usually be summed across geos. However, some control variables, such as temperature, can be less straightforward to aggregate.
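The last caveat can be made concrete with a small sketch. Impressions are summed across the geos being merged, while temperature is combined as a population-weighted average. Population weighting is only one possible choice and is an assumption of this example, not a Meridian recommendation. The field names and values are hypothetical.

```python
def aggregate_region(geos):
    """Combine several small geos into one region for a single time period.

    geos: list of dicts with 'population', 'impressions', and 'temperature'.
    Additive variables are summed; temperature is averaged with population
    weights (an assumed choice for this sketch).
    """
    population = sum(g["population"] for g in geos)
    impressions = sum(g["impressions"] for g in geos)
    temperature = sum(g["temperature"] * g["population"] for g in geos) / population
    return {
        "population": population,
        "impressions": impressions,
        "temperature": temperature,
    }


# Hypothetical values for two small geos in one time period.
small_geos = [
    {"population": 100_000, "impressions": 2_000, "temperature": 10.0},
    {"population": 300_000, "impressions": 5_000, "temperature": 20.0},
]
region = aggregate_region(small_geos)
print(region)
```

A different weighting, such as a simple mean or KPI weights, gives a different control series. This is one way that the choice of aggregation method can change MMM results.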

National-level media in a geo-level model

When most media are available at the geo level, but one or two are only available at the national level, we recommend imputing the national-level media at the geo level and running a geo-level model. One naive imputation method is to approximate the geo-level media variable from its national-level value, using the geo's share of the total population. Although it is preferable to have accurate geo-level data so that imputation isn't necessary, imputation can still yield useful information about the model parameters. For more information, see section 4.4 of Geo-level Bayesian Hierarchical Media Mix Modeling.
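The naive population-share imputation described above can be sketched as follows. The function impute_geo_media and the sample data are hypothetical and are not part of the Meridian API. The sketch assumes that the population figures cover every geo in the model and are constant over time.

```python
def impute_geo_media(national_series, population_by_geo):
    """Split a national media series across geos in proportion to population.

    national_series: list of national-level values, one per time period.
    population_by_geo: dict mapping geo name -> population.
    Returns a dict mapping geo name -> imputed geo-level series.
    """
    total_population = sum(population_by_geo.values())
    return {
        geo: [value * population / total_population for value in national_series]
        for geo, population in population_by_geo.items()
    }


# Hypothetical national impressions for two time periods.
national_impressions = [1000.0, 2000.0]
population_by_geo = {"geo_a": 600_000, "geo_b": 300_000, "geo_c": 100_000}

imputed = impute_geo_media(national_impressions, population_by_geo)
print(imputed["geo_a"])
```

By construction, the imputed geo-level values sum back to the national value in every time period. However, every geo receives the same per-capita media exposure, so the imputed channel adds no geo-to-geo variation in per-capita terms.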