
Set knots


How the `knots` argument works

Meridian uses a time-varying intercept approach for modeling time effects (see Spline (mathematics), Wikipedia; Ng, Wang, & Dai, 2021). This approach models time effects \(\mu = [\mu_1, \dots, \mu_T]\) for each of the \(T\) time periods (for example, a weekly-level, three-year MMM has \(T = 52 \times 3 = 156\) time periods). The \(T\) time effects are modeled with possibly fewer than \(T\) parameters using the relationship:

\[\mu = W \ast b\]

Where:

\(\mu\) is \(T \times 1\), representing the effect of each time period \(t = 1, \dots, T\)
\(W\) is a \(T \times K\) deterministic weight matrix
\(b\) (called knot_values in Meridian) is \(K \times 1\), where \(K \leq T\)

Bayesian posterior inference is done on \(b\), which is mapped to \(\mu\) through the weight matrix \(W\). The number of knots \(K\) is determined by user input. Each row of \(W\) is determined by the L1 distance from that time period to its two neighboring knots.

To clarify how L1 distance determines the weight matrix, consider time period \(9\), where the two neighboring knots are at \(6\) and \(11\). The L1 distance between time period \(9\) and knot \(11\) is \(2\). The L1 distance between time period \(9\) and knot \(6\) is \(3\). So, the knot at \(6\) gets weight \(0.4 = 1 - \frac{3}{2+3}\) and the knot at \(11\) gets weight \(0.6 = 1 - \frac{2}{2+3}\). The weighted average of these two neighboring knots determines the value of \(\mu_9\).

Notice that when knots < n_times, the model performs dimensionality reduction: the n_times periods are modeled with fewer than n_times parameters. The weight matrix \(W\) determines how the time periods are combined.
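The weighting rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not Meridian's implementation: the function name `l1_weight_matrix` is hypothetical, and giving full weight to the nearest knot for time periods before the first knot or after the last knot is an assumption made here so that every row is defined.

```python
import numpy as np


def l1_weight_matrix(n_times, knot_locations):
    """Build a (T x K) matrix of weights based on L1 distance to neighboring knots.

    Hypothetical helper for illustration; not part of the Meridian API.
    """
    knots = np.sort(np.asarray(knot_locations))
    weights = np.zeros((n_times, len(knots)))
    for t in range(n_times):
        if t <= knots[0]:
            weights[t, 0] = 1.0  # assumption: flat before the first knot
        elif t >= knots[-1]:
            weights[t, -1] = 1.0  # assumption: flat after the last knot
        else:
            right = np.searchsorted(knots, t, side="right")  # first knot > t
            left = right - 1
            d_left = t - knots[left]
            d_right = knots[right] - t
            # The closer knot receives the larger weight.
            weights[t, left] = 1.0 - d_left / (d_left + d_right)
            weights[t, right] = 1.0 - d_right / (d_left + d_right)
    return weights


W = l1_weight_matrix(n_times=12, knot_locations=[0, 6, 11])
b = np.array([1.0, 2.0, 3.0])  # illustrative knot values
mu = W @ b  # time effects, one per time period

# Row 9 puts weight 0.4 on the knot at 6 and 0.6 on the knot at 11.
print(W[9], mu[9])
```

Here 12 time periods are described by only 3 parameters, which is the dimensionality reduction that occurs whenever knots < n_times.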

Choosing the number of knots for time effects in the model

When you think about how to set the knots argument of ModelSpec, it is helpful to think of the two extremes: knots can be anywhere from one to the number of time periods (n_times). When knots = n_times, there is no dimensionality reduction and each time period gets its own parameter. In a geo-level model, having as many knots as time periods is identifiable because you have multiple geos, and therefore multiple observations, per time period. When knots = 1, all time periods are modeled with a single parameter, which is equivalent to saying time has no effect. This absence of effect becomes a common intercept for all time periods.

When 1 < knots < n_times, you are in the middle of these two extremes. You can try a range of values that span the space of eligible values. For information about how to think about the middle of these two extremes, see Bias-variance trade-off.

We recommend that you try the following:

Geo level models should start at the default (knots = n_times). If you notice that overfitting is extreme or media effect estimates are unrealistic, then consider reducing the number of knots. The need to reduce the number of knots is more likely to apply as the number of geos per time point decreases.

Note: n_times is the number of time periods with the number of max_lag time periods subtracted.
National level models should start at the default 1 knot and increase the number of knots from there. Continue to increase until overfitting becomes extreme or media effect estimates become unrealistic.
A similar number of knots can return similar results, such as knots = 10 and knots = 11, so it can be helpful to spread out the values that you want to try.
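One way to spread out the values you try is to space the candidate knot counts geometrically rather than linearly. The following is only a sketch: the value of `n_times` and the number of candidates are illustrative assumptions, and for a national model you would likely cap the upper end well below n_times.

```python
import numpy as np

n_times = 156  # hypothetical: three years of weekly data

# Geometrically spaced candidate knot counts between 1 and n_times.
# Rounding can create duplicates, so np.unique removes them and sorts.
candidate_knots = np.unique(
    np.geomspace(1, n_times, num=8).round().astype(int)
)
print(candidate_knots)
```

Each candidate can then be passed as the knots argument of ModelSpec in a separate model run and the results compared.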

Bias-variance trade-off

It can be helpful to think of setting the number of knots as a bias-variance trade-off. When knots = n_times, each time period gets its own parameter, so the effect of a given time period is estimated using only data from that time period. This keeps bias low, but knots = n_times is high-variance because fewer data points are available at a given time period.

When knots < n_times, each knot is estimated using the data of nearby time periods, with closer time periods getting more weight. Since the two closest knots determine the inference of a particular time period, the effect of a given time period is estimated by that time period's data and by nearby time periods' data. As the number of knots decreases, nearby time points are more and more influential on the inference for a particular time point, with the closer time points getting more weight. This decreases variance because more and more time points are used to estimate the effect of a given time period. However, the data isn't from the given time period, which increases bias.

In summary, more knots reduce bias in time effect estimates, while fewer knots reduce variance in time effect estimates. As an analyst, you can tune where on the bias-variance trade-off you want to be. If time is an important confounder between media and the KPI, then the bias-variance trade-off in estimating time effects translates to a bias-variance trade-off in estimating the causal effects of media.

Additionally, you can choose different bias-variance trade-offs for different time regions. You do this by setting knots to a list, which specifies knot locations. Knot locations can be dense in areas where you prefer low bias in the estimates (such as a holiday season), and sparse in areas where you prefer low variance in the estimates (such as an off-holiday season).
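As a rough sketch of this idea, the following builds a knot list that is sparse for most of the year and dense during a holiday window. The number of time periods, the quarterly spacing, and the holiday weeks are all hypothetical values chosen for illustration.

```python
import numpy as np

n_times = 104  # hypothetical: two years of weekly data, indexed 0..103

# Sparse knots (one per quarter) where low variance is preferred.
sparse_knots = np.arange(0, n_times, 13)

# Dense knots (one per week) in hypothetical holiday windows,
# where low bias is preferred.
holiday_knots = np.concatenate([np.arange(46, 52), np.arange(98, 104)])

# Merge, de-duplicate, and sort into a plain list of time indices.
knot_list = sorted(set(sparse_knots.tolist()) | set(holiday_knots.tolist()))
print(knot_list)

# The list would then be passed to ModelSpec, for example:
# model_spec = spec.ModelSpec(knots=knot_list)
```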

When to consider using fewer knots

When you set the number of knots, it also can be helpful to think about how the time period affects the media execution. Control variables should be confounding variables that impact both media execution and the KPI. For more information about control variables, see Selecting control variables.

Similar logic applies for time. If the time period isn't a factor for media execution, then time isn't a true confounding variable and you can avoid spending too many degrees of freedom on modeling time with many knots. Advertisers need to consider whether the time period plays a role in planning around media execution. For example, a travel brand's media planning likely depends on the time period. A snack brand may have more consistent media planning across time periods. Also, consider whether time is really the important confounding variable, or if time is a proxy for some other variable that can be directly modeled, likely with fewer degrees of freedom. For example, was time really the confounding variable that drove media execution? Or was it how many COVID cases there were nationwide? Advertisers know their own media planning strategy and have insight into these topics.

When you must use `knots < n_times`

There are situations where you must set knots < n_times. One example is a national-level model, where you don't have multiple observations per time period and there are not enough degrees of freedom for each time period to get its own parameter. In this situation, some level of dimensionality reduction is necessary.

Another example is when you must include a national-level media or national-level control variable. By definition, national-level variables change over time but not over geo. Such a variable is perfectly collinear with time and is thus redundant with a model that has a parameter for each time period. If you set knots close to n_times, you technically might have an identifiable model. However, it still might be weakly identifiable and lead to problems. Given the concerns around estimating time effects in a national model, it is even more important to have high quality controls in a national model than in a geo model. For more information about high quality controls, see Selecting control variables.
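The collinearity between a national-level variable and a full set of time parameters can be checked numerically. The sketch below uses a toy design matrix with one indicator column per time period plus one national-level variable. The sizes and random values are illustrative assumptions, and the matrix is a simplification of the actual model structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_geos, n_times = 5, 8  # hypothetical sizes

# One indicator column per time period, stacked geo by geo.
# This mimics one parameter per time period (knots = n_times).
time_dummies = np.tile(np.eye(n_times), (n_geos, 1))

# A national-level variable: varies over time, identical across geos.
national_values = rng.normal(size=n_times)
national_var = np.tile(national_values, n_geos)

design = np.column_stack([time_dummies, national_var])
rank = np.linalg.matrix_rank(design)

# The design has n_times + 1 columns but only rank n_times:
# the national variable adds no new information.
print(design.shape[1], rank)
```

Because the national-level column is an exact linear combination of the time indicator columns, its coefficient can't be separated from the time effects unless knots < n_times.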

Automatic Knot Selection in Meridian

Choosing knots can be challenging, as it requires balancing the bias-variance trade-off for reliable causal inference, incorporating business context to accurately represent how time influences media and KPIs, and accepting trial and error as part of the iterative model tuning process. To address these challenges, Meridian provides an Automatic Knot Selection (AKS) feature.

This feature is inspired by the methodology in Spline Regression with Automatic Knot Selection but includes several custom enhancements. At its core, the algorithm uses the linear spline basis to model the relationship between time (knots) and the scaled outcome variable. It employs a backward elimination approach, starting with a full set of potential knots and sequentially removing those that don't improve the model fit. This fit is evaluated using a regularization method, which penalizes the model for unnecessary complexity (i.e., number of knots). A key modification in Meridian's version is a geo-aware penalty, which encourages more knots as the number of geos in the data increases. This is because with more geos, there is more data available for each time period, allowing the model to better estimate time effects. The final optimal set of knots is determined using the Akaike information criterion (AIC) to ensure the model accurately captures trend and seasonality patterns.
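To make the backward elimination idea concrete, here is a heavily simplified sketch. It is not Meridian's AKS implementation: it omits the regularization step and the geo-aware penalty described above, fits by ordinary least squares, and scores every candidate model directly with AIC. All function names and the simulated data are hypothetical.

```python
import numpy as np


def spline_basis(n_times, knots):
    """Intercept, linear trend, and one hinge term max(t - k, 0) per knot."""
    t = np.arange(n_times, dtype=float)
    cols = [np.ones(n_times), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)


def aic(y, knots):
    """AIC of a least-squares linear spline fit, up to an additive constant."""
    X = spline_basis(len(y), knots)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n = len(y)
    return n * np.log(rss / n) + 2 * X.shape[1]


def backward_select(y, candidate_knots):
    """Repeatedly drop the knot whose removal lowers AIC the most."""
    knots = list(candidate_knots)
    best = aic(y, knots)
    while knots:
        scores = [aic(y, knots[:i] + knots[i + 1:]) for i in range(len(knots))]
        i_best = int(np.argmin(scores))
        if scores[i_best] >= best:
            break  # no single removal improves AIC
        best = scores[i_best]
        del knots[i_best]
    return knots


# Simulated outcome: a linear trend whose slope changes at t = 30, plus noise.
rng = np.random.default_rng(0)
t = np.arange(60)
y = 0.1 * t + 0.5 * np.maximum(t - 30, 0) + rng.normal(scale=0.5, size=60)

candidates = list(range(5, 60, 5))
selected = backward_select(y, candidates)
print(selected)
```

The selected knots are always a subset of the candidates, and the final AIC is never worse than that of the full candidate set.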

Use Automatic Knot Selection

To enable AKS in your Meridian model, set the enable_aks parameter to True in the ModelSpec.

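```
from meridian.model import model, spec
model_spec = spec.ModelSpec(enable_aks=True)
# 'data' is the input data defined in the InputData step
mmm = model.Meridian(input_data=data, model_spec=model_spec)
```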

Once AKS is enabled in ModelSpec, you can retrieve the list of knots that Meridian selected.

```
knot_info = mmm.knot_info
selected_knots = knot_info.knot_locations
```

Practical Guidance

Here are some tips for working with AKS:

Start with the default: For most use cases, letting AKS select the knots provides a good balance between model fit and complexity (i.e., the number of knots in the model).
Inspect the results: After enabling AKS in ModelSpec, it's always a good idea to review the selected knots to ensure they make sense in the context of your data.
Manually adjust knots: You can manually add or remove knots to align with your business context. For example, you might add knots during a holiday season to reduce bias in your estimates, or remove some knots during an off-season to decrease variance. For more details, see Bias-variance trade-off.

See the following code snippet as an example of how to add a new knot at time point 52, create a new list, and use it to define a new ModelSpec.
```
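import numpy as np
# Add a knot at time point 52; np.unique keeps the result sorted and free of duplicates
modified_knots = np.unique(np.append(selected_knots, 52))
model_spec = spec.ModelSpec(knots=modified_knots)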
```
Tune the AKS penalty parameter: The AKS algorithm may occasionally select a large number of knots. This can be problematic for national models or models with a small number of geos, where limited data is available for each time period. If you observe an unusually high number of knots in such cases, you can call the automatic_knot_selection() function directly to tune the algorithm and encourage fewer knots.

You can modify the base_penalty argument in automatic_knot_selection to define the search range for the regularization penalty. By default, it is set to np.geomspace(0.1, 100, 100), meaning the default base penalty search range is from 0.1 to 100. You can supply your own list of penalty parameters. Increasing the penalty parameter applies stronger regularization, which in turn reduces the number of selected knots. For example, you can set the search range from 10 to 200 using np.geomspace(10, 200, 100).

You can also set the range of the number of knots considered using the min_internal_knots and max_internal_knots arguments.
```
import numpy as np
from meridian.model import spec
from meridian.model import knots

# Assume 'data' is the input data defined in the InputData step
aks = knots.AKS(data)
# Modify `base_penalty`, `min_internal_knots`, `max_internal_knots`
# arguments to encourage fewer knots
base_penalty = np.geomspace(10, 200, 100)
knot_locations = aks.automatic_knot_selection(base_penalty=base_penalty,
                                              min_internal_knots=2,
                                              max_internal_knots=10).knots
# You must use the 'knots' argument and NOT set 'enable_aks=True'.
model_spec = spec.ModelSpec(knots=knot_locations)
```
Experiment with different settings: Don't be afraid to experiment with different knot selection strategies to find what works best for your specific dataset and modeling goals.

Other approaches for modeling time effects: binary indicators and periodic functions

You can create and input binary indicators or periodic functions as control variables to model time effects in Meridian. Each offers advantages in certain cases.

Meridian recommends including control variables that are confounding variables, meaning they causally affect both media execution and the KPI. Control variables related to time effects are no different (see Control variables to learn more).

Binary indicators

A binary indicator takes a value of 1 when a condition is met and 0 when it is not. For example, an indicator can take a value of 1 for all time periods in December and a value of 0 otherwise. In Meridian, binary indicators can be used as control variables to model time effects that are consistent over a set of time periods and optionally vary by geo. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects.
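A December indicator of this kind could be constructed as follows. This is a minimal sketch: the start date, the number of weeks, and the number of geos are hypothetical, and how the resulting array is added to your input data depends on how you build your controls.

```python
from datetime import date, timedelta

import numpy as np

n_geos = 3  # hypothetical number of geos

# Hypothetical weekly time axis: 52 weeks starting Monday, 2023-01-02.
week_starts = [date(2023, 1, 2) + timedelta(weeks=i) for i in range(52)]

# 1 for weeks that start in December, 0 otherwise.
is_december = np.array([1.0 if d.month == 12 else 0.0 for d in week_starts])

# Repeat the same indicator for every geo: shape (n_geos, n_times).
december_control = np.tile(is_december, (n_geos, 1))
print(int(is_december.sum()), december_control.shape)
```

The indicator is nonzero only for the December weeks, so it has no effect on any other time period.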

Consistent effects

A binary indicator can cover multiple time periods, which assumes that the KPI effect (per capita in a geo model) is consistent across all of the time periods it covers. A binary indicator uses multiple time periods to estimate a consistent effect, which improves estimates and uses degrees of freedom efficiently, provided that the assumption of a consistent effect is approximately correct.

The indicator has no effect on any time period outside of those indicated, whereas placing a knot at a specific time period will affect the neighboring time periods up to the next adjacent knot.

Modeling time effects as consistent over a set of time periods may be appealing for a national model, where imposing structure can help stabilize estimates. For a geo model, the flexibility of using many knots is often preferred.

Geographical variation

When a binary indicator is used as a control variable in a geo-level model, it is estimated as having a geo-dependent effect. This is ideal for events where you expect the impact to differ by region, like the Super Bowl having a larger impact in the host city. In contrast, knots estimate time effects that are not geo-dependent. Knots create a flexible spline function to capture time-based patterns that are shared across all geographies. This makes them more parameter-efficient if you don't expect geo-dependent time effects.

Turn off geographical variation for a binary indicator

You may want to use a binary indicator without geo-dependent time effects. You can do this by setting the prior for its hierarchical variance, xi_c, to a point mass at zero, in which case each geo-specific coefficient for the binary indicator is identical. To turn off geo effects for all control variables, set their variance prior to a deterministic value of zero:

```
import tensorflow_probability as tfp
xi_c = tfp.distributions.Deterministic(0)
```

To turn off the geo-effect for just one specific control variable, you can set the scale of its prior to zero. For example, if you have four control variables and want to turn off geo-dependent effects for the first:

```
import tensorflow_probability as tfp
xi_c = tfp.distributions.HalfNormal(scale=[0, 5, 5, 5])
```

Multicollinearity risk

Binary indicators can have a risk of high multicollinearity with media execution variables. Therefore, it is important to be mindful of multicollinearity and only include binary indicators that are indeed confounders.

Periodic functions

Another option is to add a periodic function, like a Fourier series, as a control variable. Periodic functions can be an attractive alternative to knots, especially in national models.

Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI (per capita KPI in the case of a geo model). Periodic functions are a strong parametric assumption about how time affects the KPI. It may be appropriate for a national model, where imposing structure can help get a stable estimate of seasonality. For a geo model, the flexibility of using many knots is often preferred, as it doesn't force a smooth and cyclical pattern on the KPI.
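As an illustration, the following sketch generates sine and cosine terms of a Fourier series that could be supplied as control variables. The function name `fourier_features`, the 52-week period, and the choice of two harmonics are assumptions made for this example, not Meridian defaults.

```python
import numpy as np


def fourier_features(n_times, period=52.0, order=2):
    """Return an (n_times, 2 * order) array of sine and cosine terms.

    Hypothetical helper for illustration; not part of the Meridian API.
    """
    t = np.arange(n_times)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)


# Three years of weekly data with an assumed yearly cycle of 52 weeks.
features = fourier_features(n_times=156, period=52.0, order=2)
print(features.shape)
```

With two harmonics, the yearly seasonal pattern is described by only four coefficients, and the pattern repeats exactly every 52 weeks. That is the strong parametric assumption described above.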

Practical recommendations

Recommendations depend on whether the model is a geo or national model.

Geo models

Binary indicators can be used in geo models to model geo-dependent time effects. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects. Binary indicators with full knots are not recommended.

For time effects that are not geo-dependent, use knots: if you want to model a temporal pattern that is consistent across geos, knots provide flexibility without risking over-parameterization.
For time effects that are geo-dependent, use a binary indicator: if you have a strong hypothesis that an event's impact varies by geo, a binary indicator as a control variable is the right tool.

National models

National models especially benefit from parsimony, which can be achieved with binary indicators, periodic functions, or a few well-placed knots. These can be used together, but be especially mindful of the total number of parameters given the importance of parsimony in a national model. Each of these options improves estimates and uses degrees of freedom efficiently, provided that the assumption is approximately correct. The following summarizes the assumptions for each of these:

Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI.
Binary indicators model time effects as consistent across affected time periods.
Knots model time effects as a piecewise linear trend over time.
