Jump to:
- Run Quality Checks
- Interpret the Diagnostic Statuses
- Convergence
- Negative Baseline
- Bayesian Posterior Predictive P-value (PPP)
- Goodness-of-fit
- Prior Posterior Shift for ROI
- ROI Consistency
Run Quality Checks
After your model has been trained, you must assess its integrity and stability before trusting its results for causal inference. These post-modeling quality checks are designed to diagnose common issues related to model convergence, specification, and plausibility.
Running these checks helps you identify potential problems, understand how your data informed your model, and build confidence that the model's outputs are reliable and make business sense.
Run the following command to generate the results for all necessary diagnostics on this page:
from meridian.analysis.review import reviewer
reviewer.ModelReviewer(mmm).run()
Interpret the Diagnostic Statuses
Each diagnostic check on this page will return one of three statuses. Here is the philosophy behind each one:
PASS: This status is purely informative, and no action is required from the user.
REVIEW: This status appears when a finding depends on business context, and it is not a clear pass or fail. Manually review the result to determine if further action is needed. Proceeding with a REVIEW status is often reasonable, as long as you have assessed the finding and understand its implications.
FAIL: This is a critical flag indicating that the check detected a significant problem. It is strongly recommended that you fix the issue before proceeding, as the model's results may be unreliable for causal inference.
Convergence
Model convergence is a fundamental prerequisite before interpreting estimates from any Bayesian model, such as Meridian. Without convergence, estimates are arbitrary and not a true representation of the posterior distribution.
Meridian uses the Gelman & Rubin (1992) potential scale reduction factor (R-hat) to diagnose convergence. R-hat compares the variance between chains to the variance within each chain. If the chains have converged, these variances will be nearly identical, and the R-hat value will be close to 1.0.
To provide a single, clear signal for the entire model, Meridian reports the
max_r_hat value found across all model parameters. This single value
determines the model's overall convergence status.
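To inspect R-hat values yourself, you can compute them directly from the model's MCMC draws. The sketch below uses ArviZ and assumes the fitted model exposes its draws as an arviz.InferenceData object (accessed here as mmm.inference_data; the attribute may differ between Meridian versions):

```python
import arviz as az

# Assumption: `mmm.inference_data` holds the MCMC draws as an
# arviz.InferenceData object; adjust the attribute to your Meridian version.
rhat = az.rhat(mmm.inference_data)

# Single worst R-hat value across all model parameters.
max_r_hat = max(float(rhat[var].max()) for var in rhat.data_vars)
print(f"max_r_hat = {max_r_hat:.3f}")

# List the parameters driving a convergence problem, if any.
for var in rhat.data_vars:
    worst = float(rhat[var].max())
    if worst >= 1.2:
        print(f"  {var}: R-hat = {worst:.3f}")
```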
| Condition | Status | Recommendation |
|---|---|---|
| max_r_hat < 1.2 | PASS | The model has likely converged, as all parameters have R-hat values below 1.2. |
| max_r_hat >= 1.2 and < 10 | FAIL | The model hasn't fully converged, and the max_r_hat for parameter X is Y. Manually inspect the parameters with high R-hat values to determine if the results are acceptable for your use case, and consider increasing MCMC iterations or investigating model misspecification. |
| max_r_hat >= 10 | FAIL | The model hasn't converged, and the max_r_hat for parameter X is Y. We recommend increasing MCMC iterations or investigating model misspecification (e.g., priors, multicollinearity) before proceeding. |
If your model's max_r_hat is 1.2 or greater, you should investigate the cause
before trusting the model's outputs. Follow these steps to resolve convergence
problems:
Increase MCMC iterations: First, try increasing the number of MCMC iterations, as the model may need more time to explore the posterior distribution and reach a stable state.
Investigate the model: If the convergence issue persists after increasing iterations, investigate potential model misspecification. This includes carefully re-examining your priors and checking for high multicollinearity between your predictors.
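As a concrete example of the first step, you can rerun posterior sampling with a larger budget of iterations and chains. The keyword arguments below follow the pattern used in the Meridian getting-started examples and may differ between library versions:

```python
# Rerun MCMC with more adaptation, burn-in, and kept draws per chain.
# Keyword names follow the Meridian getting-started examples and may
# differ between library versions.
mmm.sample_posterior(
    n_chains=7,
    n_adapt=1000,
    n_burnin=1000,
    n_keep=2000,
)
```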
For more information, see Getting MCMC Convergence.
Negative Baseline
In Meridian, the baseline represents the expected outcome (e.g., sales, conversions) under the counterfactual scenario where all treatment variables are set to their baseline values. Essentially, it helps us understand what would have happened to the outcome if you hadn't engaged in any paid media, organic media, or other non-media treatments during the analysis period.
Estimating the baseline accurately is vital because it provides the foundation for determining the incremental impact of your marketing efforts. An inaccurate baseline can lead to significant misinterpretations of your marketing's true impact.
Since the outcome generally cannot be negative, a baseline that drops into negative values indicates a statistical error. However, it's important to be precise about the severity. Like all statistical models, Meridian will have some error, so an occasional, small dip into negative values might not be a major issue. A consistently negative baseline, however, is a clear problem. It suggests that without any marketing, your sales would have been consistently negative, which is nonsensical in a real-world scenario. This is a strong signal that the model is overestimating treatment effects, likely by incorrectly attributing organic growth or other unmeasured positive effects to your treatment variables.
Because Meridian is a statistical and probabilistic model, we can distinguish between these scenarios by assessing the baseline probabilistically, rather than looking at just a single point estimate. The key metric to assess is the posterior probability that the baseline, aggregated over the entire time window, is negative. A high probability of this kind signals a large statistical error and that the model requires adjustment. For a more detailed explanation, see Assess negative baseline.
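As an illustration, once you have posterior draws of the baseline expected outcome, this probability is simply the share of draws whose total (summed over geos and time) is negative. The sketch below is plain NumPy; how you extract baseline draws from a fitted model depends on your Meridian version, so the input array here is a hypothetical placeholder:

```python
import numpy as np

def negative_baseline_probability(baseline_draws: np.ndarray) -> float:
    """Share of posterior draws whose baseline, summed over geos and time, is negative.

    `baseline_draws` has shape (n_draws, n_geos, n_times) and is a hypothetical
    input; how you extract these draws depends on your Meridian version.
    """
    totals = baseline_draws.reshape(baseline_draws.shape[0], -1).sum(axis=1)
    return float(np.mean(totals < 0.0))

# Example with simulated draws (replace with draws from your fitted model).
rng = np.random.default_rng(0)
fake_draws = rng.normal(loc=10.0, scale=300.0, size=(1_000, 5, 52))
print(f"P(total baseline < 0) = {negative_baseline_probability(fake_draws):.3f}")
```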
Meridian assesses this probability to help you diagnose your model:
| Condition | Status | Recommendation |
|---|---|---|
| Negative baseline probability < 0.2 | PASS | The posterior probability that the baseline is negative is X. We recommend visually inspecting the baseline time series in the Model Fit charts to confirm this. |
| Negative baseline probability is between 0.2 and 0.8 | REVIEW | The posterior probability that the baseline is negative is X. This indicates that the baseline time series occasionally dips into negative values. We recommend visually inspecting the baseline time series in the Model Fit charts, but don't be overly concerned. An occasional, small dip may indicate minor statistical error, which is inherent in any model. |
| Negative baseline probability > 0.8 | FAIL | The posterior probability that the baseline is negative is X. This high probability points to a statistical error and is a clear signal that the model requires adjustment. The model is likely over-crediting your treatments. Consider adjusting the model's settings, data, or priors to correct this issue. |
If the negative baseline probability is high (i.e., > 0.8) for your model, we recommend reviewing your model's specification, control variables, and your model's DAG. For more information, see Mitigate negative or low baseline.
Bayesian Posterior Predictive P-value (PPP)
The Bayesian Posterior Predictive P-value (PPP) is a powerful diagnostic tool that checks the overall fit of your model. It is sometimes also referred to as the Bayesian P-value. It answers the question: "Does the data simulated by my model look like the real data I observed?" If the model has correctly learned the data's underlying patterns, the data it simulates should be statistically indistinguishable from the real data. If the simulated data looks completely different, it's a strong sign that the model is misspecified and is a poor fit for the data. For more information, see section 6.3 of Bayesian Data Analysis.
To perform this check, Meridian uses the total sum of the outcome (across all geos and time) as its test statistic. The check compares the distribution of the expected total outcome ($T(y_{\text{exp}})$) from its posterior samples against the single observed total outcome ($T(y)$). A straightforward way to make this comparison is to calculate the one-sided p-value, or percentile rank, of the observed total sum ($T(y)$) within the distribution of expected total sums:

$$\hat{p} = \frac{1}{S} \sum_{s=1}^{S} \mathbb{1}\left\{ T\!\left(y_{\text{exp}}^{(s)}\right) \le T(y) \right\}$$

where $S$ is the total number of posterior samples and $y_{\text{exp}}^{(s)}$ is the $s$-th posterior draw of the expected outcome. With this calculation, an "extreme" or "poor" model fit occurs if the observed data falls in either the extreme left tail (a p-value close to 0) or the extreme right tail (a p-value close to 1) of the posterior predictive distribution of the expected outcome. For reporting purposes, the p-value is transformed so that values close to 0 represent both extreme tails (to match the interpretation of a frequentist p-value). The conceptual null hypothesis is that the observed data was generated by the model. The null hypothesis is "rejected," so to speak, if the p-value is less than some predetermined threshold.
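The sketch below illustrates this calculation with plain NumPy, given posterior draws of the expected total outcome and the observed total as inputs. The folding step at the end (2 × min(p, 1 − p)) is one common way to map both extreme tails toward 0 and is shown here as an assumption about the reporting transform:

```python
import numpy as np

def posterior_predictive_p_value(expected_totals: np.ndarray,
                                 observed_total: float) -> float:
    """Percentile rank of the observed total within the expected-total draws,
    folded so that both extreme tails map to values near 0."""
    # One-sided p-value: share of posterior draws at or below the observed total.
    one_sided = float(np.mean(expected_totals <= observed_total))
    # Fold the two tails together (assumed reporting convention).
    return 2.0 * min(one_sided, 1.0 - one_sided)

# Example with simulated draws (replace with your model's expected totals).
rng = np.random.default_rng(0)
expected_totals = rng.normal(loc=1_000_000.0, scale=50_000.0, size=2_000)
print(f"Bayesian PPP = {posterior_predictive_p_value(expected_totals, 1_020_000.0):.3f}")
```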
| Condition | Status | Recommendation |
|---|---|---|
| Bayesian PPP >= 0.05 | PASS | The Bayesian posterior predictive p-value is X. The observed total outcome is consistent with the model's posterior predictive distribution. |
| Bayesian PPP < 0.05 | FAIL | The Bayesian posterior predictive p-value is X. The observed total outcome is an extreme outlier compared to the model's expected total outcomes, which suggests a systematic lack of fit. We recommend reviewing input data quality and re-examining the model specification (e.g., priors, transformations) to resolve this issue. |
A FAIL status for the PPP value is a strong indicator of model
misspecification (e.g., missing variables, data issues not caught by EDA,
incorrect priors, or flawed assumptions about adstock decay, saturation, or the
baseline). We recommend you thoroughly review your input data quality for any
anomalies, outliers, or errors across the KPI, media, and control variables.
Furthermore, re-examine the model specification, paying close attention to the
choice of priors, the baseline, and the appropriateness of the Adstock and Hill
transformations. Finally, cross-reference the Bayesian PPP result with other
critical model diagnostics, such as convergence, R-squared, and residual plots,
to gain a holistic view of the model's performance.
Goodness-of-fit
Goodness-of-fit metrics measure how well a model's predictions align with the actual observed data. They serve as an important confidence check but should be interpreted with care, as the primary goal of an MMM is accurate causal inference, not predictive accuracy. Meridian reports three standard metrics:
R-squared: The proportion of variance in the outcome variable that is explained by the model. A value closer to 1 indicates a better fit.
Mean Absolute Percentage Error (MAPE): The average absolute percentage difference between predicted and actual values. A value closer to 0% is better.
Weighted MAPE (wMAPE): A variation of MAPE where errors are weighted by the actual outcome value (e.g., revenue). This is often preferred over MAPE as it gives less importance to geos and time periods with small outcomes, which can otherwise inflate the error metric.
These metrics are reported for every model, serving primarily as a tool for relative comparison against other candidate models.
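For reference, the sketch below shows how the three metrics can be computed from flattened arrays of actual and predicted outcomes. It is a plain NumPy illustration of the formulas, not Meridian's internal implementation:

```python
import numpy as np

def goodness_of_fit(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """Illustrative R-squared, MAPE, and wMAPE for actual vs. predicted outcomes."""
    residuals = actual - predicted
    # R-squared: 1 minus the ratio of residual sum of squares to total sum of squares.
    r_squared = 1.0 - np.sum(residuals**2) / np.sum((actual - actual.mean()) ** 2)
    # MAPE: average absolute percentage error (undefined where actual == 0).
    mape = np.mean(np.abs(residuals / actual))
    # wMAPE: absolute errors weighted by the actual outcome, so small geos and
    # time periods do not dominate the metric.
    wmape = np.sum(np.abs(residuals)) / np.sum(np.abs(actual))
    return {"r_squared": float(r_squared), "mape": float(mape), "wmape": float(wmape)}

# Example with simulated data (replace with your model's fitted values).
rng = np.random.default_rng(0)
actual = rng.uniform(50.0, 500.0, size=200)
predicted = actual * rng.normal(1.0, 0.1, size=200)
print(goodness_of_fit(actual, predicted))
```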
| Condition | Status | Recommendation |
|---|---|---|
| R-squared > 0 | PASS | R-squared = X, MAPE = Y, and wMAPE = Z. These goodness-of-fit metrics are intended for guidance and relative comparison. |
| R-squared <= 0 | REVIEW | R-squared = X, MAPE = Y, and wMAPE = Z. A negative R-squared signals a potential conflict between your priors and the data, and it warrants investigation. If this conflict is intentional (due to an informative prior), no further action is needed. If it's unintentional, we recommend relaxing your priors to be less restrictive. |
These fit metrics will normally have a PASS status, as they are intended for
guidance and relative comparison. However, if the R-squared yields a negative
value, Meridian will report a REVIEW status. A negative R-squared often stems
from overly informative priors that conflict with the patterns in your data.
This conflict is an important diagnostic signal, but it is not inherently 'bad.'
It requires consideration based on your modeling goals, as the conflict might be
intentional. For example, you might use a strong, experiment-based prior to
deliberately counteract a known bias (like a missing confounder) that you
believe exists in the observational data. In this scenario, the negative
R-squared is just highlighting the tension you've introduced.
Therefore, a REVIEW status prompts you to investigate why this conflict is
occurring. If the conflict is unintentional (and not a deliberate choice like
the example above), we recommend you review and relax your priors to be less
restrictive. If the issue persists, investigate the model's structure for other
issues, such as missing key variables or incorrect assumptions about the
relationships between your predictors and the outcome. For more information, see
Negative
R-squared.
Bayesian PPP Compared to R-squared
Bayesian PPP and R-squared are complementary metrics. R-squared is a relative metric, primarily used to compare one model against another (e.g., Model A's R-squared is better than Model B's). In contrast, the Bayesian PPP is an absolute measure of model adequacy.
Additionally, R-squared typically measures the variance explained by the model's point estimate. The Bayesian PPP, by contrast, considers uncertainty in estimates and determines if the observed data is a plausible draw from the model.
A FAIL status from the Bayesian PPP indicates the model is fundamentally
misspecified. Conversely, R-squared can be low even if the model is perfectly
specified because the true process is inherently noisy.
Prior Posterior Shift for ROI
A fundamental concept in Bayesian modeling is "learning from data." This check helps you understand how much the model learns by comparing the prior distribution with the posterior distribution. There are two primary interpretations of this check:
When there is a significant shift: This is generally a good sign. It indicates that the data used to fit the MMM is informative enough to update the model's initial beliefs, resulting in a more precise, data-driven estimate.
When there is little to no shift: This means the information in the data used to fit the MMM is weak relative to the information in the prior. This can occur for two main reasons:
Low information in the data: The data for that channel is too sparse, noisy, or lacks variation. When there is little information in the data, the prior and the posterior will be similar. Channels with low spend are particularly susceptible to this. To remediate this, the best practice is often to merge the channel with another related channel to increase its signal. If there is no reasonable channel to merge it with, it is still better to include the channel in the model and rely on your prior knowledge (assuming it's at least approximately reasonable) than to drop the channel entirely. Using a reasonable prior is better than pretending the channel doesn't exist. Dropping the channel should only be considered as a last resort, for example, if its spend is truly negligible and it cannot be logically combined elsewhere. For more information, see When the posterior is the same as the prior.
Strong information in the prior: The prior is intentionally set to be very strong (low variance). This isn't necessarily a bad thing. If the prior was based on solid external knowledge (e.g., from a previous causal experiment not used to fit this model), then it's acceptable and expected that the prior has more information than the model data, and a lack of shift is fine.
You can visually inspect this shift. Run the following command to plot the ROI posterior distribution against the ROI prior distribution for each media channel:
from meridian.analysis import visualizer

model_diagnostics = visualizer.ModelDiagnostics(mmm)
model_diagnostics.plot_prior_and_posterior_distribution()
Quantitatively, Meridian performs two-sided hypothesis tests for key statistics (mean, median, first and third quartiles) of the ROI parameter for each paid media channel. The test checks if the analytical statistic from the prior (e.g., prior mean) falls outside a confidence interval constructed from the posterior samples. This is done using a non-parametric bootstrap:
An empirical distribution for a statistic (e.g., the mean) is generated from the posterior samples by repeatedly resampling.
A two-sided hypothesis test is conducted where the null hypothesis is that the posterior statistic is equal to the prior statistic.
The test calculates p-values by finding the proportion of the bootstrapped posterior statistics that are greater or less than the prior's value.
If the p-value is below the significance level ($\alpha=0.05$), the null hypothesis is rejected, and a significant shift is reported for that specific statistic.
A channel is flagged as having "no significant prior-posterior shift" in the recommendation table if none of its key statistics show a significant shift.
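A minimal sketch of this test for a single statistic (the mean) of one channel's ROI is shown below. The inputs (posterior draws, prior mean) are hypothetical placeholders, and Meridian's internal implementation may differ in its details:

```python
import numpy as np

def prior_posterior_shift_detected(posterior_draws: np.ndarray,
                                   prior_stat: float,
                                   n_boot: int = 2_000,
                                   alpha: float = 0.05,
                                   seed: int = 0) -> bool:
    """Two-sided bootstrap test of whether a posterior statistic (here, the mean)
    differs significantly from the corresponding prior statistic."""
    rng = np.random.default_rng(seed)
    # Bootstrap the posterior mean by resampling the posterior draws with replacement.
    idx = rng.integers(0, posterior_draws.size, size=(n_boot, posterior_draws.size))
    boot_means = posterior_draws[idx].mean(axis=1)
    # Two-sided p-value: proportion of bootstrapped statistics on either side of
    # the prior's analytical value.
    p_value = 2.0 * min(float(np.mean(boot_means >= prior_stat)),
                        float(np.mean(boot_means <= prior_stat)))
    return p_value < alpha

# Example: ROI posterior draws for one channel vs. a hypothetical prior mean of 1.0.
rng = np.random.default_rng(1)
posterior_roi = rng.lognormal(mean=0.7, sigma=0.2, size=2_000)
print(prior_posterior_shift_detected(posterior_roi, prior_stat=1.0))
```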
| Condition | Status | Recommendation |
|---|---|---|
| For all channels, there is a significant prior-posterior shift. | PASS | The model has successfully learned from the data. This is a positive sign that your data was informative. |
| For any channel, there is no significant prior-posterior shift. | REVIEW | We've detected channels X, Y, and Z where the posterior distribution did not significantly shift from the prior. This suggests the data signal for these channels was not strong enough to update the model's beliefs. Please review these channels to determine if this is expected (due to strong priors) or problematic (due to a weak signal). |
ROI Consistency
ROI is often the most scrutinized output of an MMM. This check helps ensure the model's ROI estimates are plausible. Extreme ROI values can indicate underlying issues, such as problems with baseline estimation or model specification.
Meridian performs this check by evaluating the posterior mean ROI for each paid
media channel against its corresponding custom prior distribution. A channel's
ROI is flagged as an outlier if its posterior mean falls into the extreme tails
of the prior distribution. Specifically, this REVIEW status is triggered if
the estimate falls above the 99th percentile or below the 1st percentile of your
stated prior belief.
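The sketch below illustrates the comparison for one channel, assuming its custom ROI prior is a LogNormal distribution; the prior parameters and the posterior draws here are hypothetical placeholders:

```python
import numpy as np
from scipy import stats

# Hypothetical custom ROI prior for one channel: LogNormal(mu=0.2, sigma=0.9).
roi_prior = stats.lognorm(s=0.9, scale=np.exp(0.2))
prior_p01, prior_p99 = roi_prior.ppf([0.01, 0.99])

# Hypothetical posterior ROI draws for the same channel.
rng = np.random.default_rng(0)
posterior_roi_draws = rng.lognormal(mean=0.8, sigma=0.15, size=2_000)
posterior_mean_roi = float(posterior_roi_draws.mean())

# Flag the channel for review if its posterior mean falls in the prior's extreme tails.
if posterior_mean_roi < prior_p01 or posterior_mean_roi > prior_p99:
    print(f"REVIEW: posterior mean ROI {posterior_mean_roi:.2f} lies outside "
          f"[{prior_p01:.2f}, {prior_p99:.2f}] of the custom prior.")
else:
    print(f"PASS: posterior mean ROI {posterior_mean_roi:.2f} is within "
          f"[{prior_p01:.2f}, {prior_p99:.2f}] of the custom prior.")
```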
| Condition | Status | Recommendation |
|---|---|---|
| For all channels, the posterior mean ROI is within the 1st and 99th percentiles of its prior distribution. | PASS | The posterior distribution of the ROI is within a reasonable range, aligning with the custom priors you provided. |
| For any channel, the posterior mean ROI falls into the extreme tails (i.e., above the 99th or below the 1st percentile) of its prior distribution. | REVIEW | We've detected channels X, Y and Z where the posterior mean falls into the extreme tail of your custom prior. Please review this result to determine if it is reasonable within your business context. |
This check is only performed when custom priors are set and is skipped if default priors are used. The check's purpose is to detect a conflict between the model's data-driven result (the posterior) and your explicit, expert-driven business hypothesis (the custom prior). It acts as an actionable alert when the data strongly contradicts your stated beliefs, prompting a review of either the model or the assumptions. In contrast, default priors aren't business hypotheses; they are general-purpose statistical tools for regularization. Since they are intentionally broad and don't represent your specific business knowledge, comparing the model's result against them wouldn't provide a meaningful or actionable insight.