Geo-level and national-level data
Meridian can model either geo-level or national-level data. Geo-level data is
broken down into mutually exclusive geographic regions, such as states, cities,
DMAs, or even multiple countries. These regions are typically all within a
larger geographic region, such as a country. National-level data is provided
for a single geographic region, typically an entire country. National-level
data is essentially single-geo data.
Geo-level data offers several advantages and is thus recommended when
possible. If geo-level data is available for most, but not all, media channels,
then we recommend imputing the national-level data at the geo-level, and running
a geo-model. For more information on imputation,
see National-level media in a geo-level model. For more
information on the national model,
see National-level modeling.
Geo-level model advantages
Statistical modeling relies on patterns in the data. Repeatable patterns
are more common in geo-level data than national-level data.
Here are some other advantages of geo-level data:
- Increases the effective sample size, by pooling data across geos in geo-modeling.
- Provides tighter credible intervals, as long as the geos share a similar
media impact mechanism, which the model assumes. For more information, see
Geo-level Bayesian Hierarchical Media Mix Modeling.
- Improves estimates for time-effects (such as trend and
seasonality), since there are multiple observations per time
period.
- Can support the use of more knots to model the \(\mu_t\) parameter.
National-level data has fewer degrees of freedom for time-effects. For
example, one knot per time period completely saturates the national-level
model.
- Shows greater variability in marketing spend, which is
crucial for estimating non-linear effects, like saturation (Hill
function parameters).
- Reduces omitted variable bias due to missing confounders, by reducing
the correlation between media spend and confounders. For more information,
see section 4.3 of Geo-level Bayesian Hierarchical Media Mix Modeling.
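The point about knots can be made concrete with simple counting. The sketch below is only illustrative arithmetic, not part of Meridian: the helper `time_effect_budget` and its field names are hypothetical, and it ignores every model parameter other than the time-effect knots.

```python
def time_effect_budget(n_geos: int, n_times: int, n_knots: int) -> dict:
    """Count observations against time-effect knots (illustrative only).

    Assumes one KPI observation per geo per time period and ignores all
    other model parameters.
    """
    if not 1 <= n_knots <= n_times:
        raise ValueError("n_knots must be between 1 and n_times")
    n_obs = n_geos * n_times
    return {
        "observations": n_obs,
        "observations_per_knot": n_obs / n_knots,
        "observations_left_after_knots": n_obs - n_knots,
    }


# Two years of weekly data with one knot per time period.
national = time_effect_budget(n_geos=1, n_times=104, n_knots=104)
geo = time_effect_budget(n_geos=50, n_times=104, n_knots=104)

print(national["observations_left_after_knots"])  # 0: the knots absorb every observation
print(geo["observations_left_after_knots"])       # 5096
```

With a single geo, one knot per time period leaves no observations to inform anything else, which is the saturation described above. With 50 geos, each time period still has 50 observations.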
Geo selection
When you are selecting geos, consider the following guidance:
- Drop the smallest geos by total KPI first. Smaller geos have less
contribution to ROI, yet they can still have a high influence on model fit,
particularly when there is a single residual variance for all groups
(unique_sigma_for_each_geo = False of ModelSpec).
- For US advertisers using designated market area (DMA) as the geographical
unit, a rough guideline is to model the top 50-100 DMAs by population size.
This generally includes the vast majority of the KPI units, while excluding
most of the noisier small DMAs that might impact model fit and convergence.
- When each geo has its own residual variance (unique_sigma_for_each_geo =
True of ModelSpec), noisier geos have less impact on model fit. However,
this option can make convergence difficult for some datasets because it adds
so much flexibility to the model. If MCMC sampling does converge under this
option, it might be worth plotting the geo population size versus the mean
residual standard deviation (sigma parameter). In most cases, you would
expect to see a fairly monotone pattern. If you don't see this pattern, then
it might be better to set unique_sigma_for_each_geo = False and use a
smaller subset of geos.
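As a rough illustration of the guidance above, the following sketch ranks geos by total KPI and keeps the largest ones, then checks how monotone the relationship between population and residual standard deviation is. Both helpers (`top_geos_by_kpi`, `spearman`), the 90% share threshold, and all of the numbers are hypothetical. The sigma values stand in for posterior means you would extract from a fitted model yourself, and the rank correlation assumes no tied values.

```python
def top_geos_by_kpi(kpi_by_geo: dict, min_share: float) -> list:
    """Keep the largest geos until their cumulative KPI share reaches min_share."""
    total = sum(kpi_by_geo.values())
    kept, running = [], 0.0
    ranked = sorted(kpi_by_geo.items(), key=lambda item: item[1], reverse=True)
    for geo, kpi in ranked:
        if running / total >= min_share:
            break  # the remaining (smallest) geos are dropped
        kept.append(geo)
        running += kpi
    return kept


def _ranks(values: list) -> list:
    """Rank positions (0 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman rank correlation for samples without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(x), _ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


kpi = {"A": 500, "B": 300, "C": 150, "D": 40, "E": 10}
print(top_geos_by_kpi(kpi, min_share=0.9))  # ['A', 'B', 'C']

population = [9.0, 5.0, 2.0, 1.0]   # hypothetical geo populations
sigma_mean = [0.10, 0.15, 0.30, 0.45]  # hypothetical mean residual std. dev.
print(spearman(population, sigma_mean))  # -1.0
```

A rank correlation close to 1 or -1 suggests the fairly monotone pattern mentioned above. A value near 0 would be a hint to reconsider unique_sigma_for_each_geo = True, though a plot remains the more informative check.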
If you want to make sure the model represents 100% of your KPI units, you
can aggregate smaller geos into larger regions. However, this option comes
with several caveats:
- Geo-level modeling provides a significant advantage, although this benefit is
reduced when there are relatively few geos. It may be better to fit a model at a
finer geo granularity and exclude the smallest geos, rather than aggregating
geos to a coarser level.
- Different geo aggregation grouping methods can lead to different MMM results.
- Media execution variables, such as impressions or cost, can usually be
summed across geos. However, some control variables, such as
temperature, can be less straightforward to aggregate.
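To illustrate the last caveat, here is a minimal sketch of aggregating one time period of data into larger regions. Impressions are summed, as described above. For temperature, the sketch uses a population-weighted mean, which is only one possible choice and an assumption of this example rather than a Meridian recommendation. The function `aggregate_geos` and all of the data are hypothetical.

```python
def aggregate_geos(region_of, impressions, temperature, population):
    """Aggregate geo values into regions for a single time period.

    Impressions are summed; temperature uses a population-weighted mean
    (an assumption of this sketch, other weightings are possible).
    """
    summed, weighted_temp, region_pop = {}, {}, {}
    for geo, region in region_of.items():
        summed[region] = summed.get(region, 0.0) + impressions[geo]
        weighted_temp[region] = (
            weighted_temp.get(region, 0.0) + temperature[geo] * population[geo]
        )
        region_pop[region] = region_pop.get(region, 0.0) + population[geo]
    mean_temp = {r: weighted_temp[r] / region_pop[r] for r in region_pop}
    return summed, mean_temp


region_of = {"geo_a": "north", "geo_b": "north", "geo_c": "south"}
impressions = {"geo_a": 1000.0, "geo_b": 500.0, "geo_c": 200.0}
temperature = {"geo_a": 10.0, "geo_b": 20.0, "geo_c": 30.0}
population = {"geo_a": 300.0, "geo_b": 100.0, "geo_c": 50.0}

total_impressions, mean_temperature = aggregate_geos(
    region_of, impressions, temperature, population
)
print(total_impressions)  # {'north': 1500.0, 'south': 200.0}
print(mean_temperature)   # {'north': 12.5, 'south': 30.0}
```

A simple unweighted mean would give 15.0 for the north region instead of 12.5, which shows how the aggregation choice for a control variable can change the model inputs.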
National-level media in a geo-level model
When most media are available at the geo-level, but one or two are only
available at the national level, we recommend imputing the national-level
media at a geo-level and running a geo-model. One naive imputation method is
to approximate the geo-level media variable from its national level value,
using the proportion of the population in the geo relative to the total
population. Although it is preferable to have accurate geo-level data so that
imputation isn't necessary, imputation can still yield useful information
about the model parameters. For more information, see section 4.4 of
Geo-level Bayesian Hierarchical Media Mix Modeling.
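The naive population-based imputation described above can be sketched as follows. The function `impute_geo_media` and the sample data are hypothetical and are not part of the Meridian API. The sketch assumes a fixed population per geo and a national series with one value per time period.

```python
def impute_geo_media(national_media: list, population_by_geo: dict) -> dict:
    """Split each national value across geos in proportion to population."""
    total_population = sum(population_by_geo.values())
    return {
        geo: [value * pop / total_population for value in national_media]
        for geo, pop in population_by_geo.items()
    }


national_impressions = [1000.0, 2000.0]  # one value per time period
population = {"geo_a": 600, "geo_b": 300, "geo_c": 100}

geo_impressions = impute_geo_media(national_impressions, population)
print(geo_impressions["geo_a"])  # [600.0, 1200.0]
print(geo_impressions["geo_c"])  # [100.0, 200.0]
```

By construction, the imputed geo values sum back to the national value in every time period. Every geo also receives the same per-capita media level, so the imputed channel adds no geo-to-geo variation of its own.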
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-12 UTC.