Set knots

How the knots argument works

Meridian uses a time-varying intercept approach for modeling time effects (Spline (mathematics), Wikipedia; Ng, Wang, & Dai, 2021). This approach models time effects \(\mu = [\mu_1, \dots, \mu_T]\) for each of the \(T\) time periods (for example, a weekly-level, three-year MMM has \(52 \times 3 = 156\) time periods). The \(T\) time effects are modeled with possibly fewer than \(T\) parameters using the relationship:

\[\mu = W \ast b\]

Where:

  • \(\mu\) is \(T \times 1\), representing the effect of each time period \(t=1, \dots ,T\).

  • \(W\) is a \(T \times K\) deterministic weight matrix.

  • \(b\) (called knot_values in Meridian) is \(K \times 1\) where \(K \leq T\).

Bayesian posterior inference is done on \(b\), which is then translated into \(\mu\) through the weight matrix \(W\). The number of knots \(K\) is determined by user input. The weight matrix \(W\) is determined by the L1 distance from each time period to its two neighboring knots.

To clarify how L1 distance determines the weight matrix, consider time period \(9\), where the two neighboring knots are at \(6\) and \(11\). The L1 distance between time period \(9\) and the knot at \(11\) is \(2\). The L1 distance between time period \(9\) and the knot at \(6\) is \(3\). So, the knot at \(6\) gets weight \(0.4 = 1 - \frac{3}{2+3}\) and the knot at \(11\) gets weight \(0.6 = 1 - \frac{2}{2+3}\). The weighted average of these two neighboring knots determines the value of \(\mu_9\).
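The weighting rule above can be sketched in a few lines of plain Python. This is only an illustration of the L1-distance rule, not Meridian's internal implementation: the function name `l1_weight_matrix` is hypothetical, the extra knots at 1 and 15 are made up for the example, and giving time periods before the first knot or after the last knot that knot's full weight is an assumption.

```python
def l1_weight_matrix(n_times, knot_locations):
    """Build a T x K weight matrix from L1 distances to the two neighboring knots.

    Hypothetical helper for illustration. knot_locations are time indices
    in range(n_times).
    """
    knots = sorted(knot_locations)
    weights = [[0.0] * len(knots) for _ in range(n_times)]
    for t in range(n_times):
        if t <= knots[0]:
            # Assumption: periods at or before the first knot take its value.
            weights[t][0] = 1.0
        elif t >= knots[-1]:
            # Assumption: periods at or after the last knot take its value.
            weights[t][-1] = 1.0
        else:
            # Index of the first knot located at or after t.
            right = next(k for k, loc in enumerate(knots) if loc >= t)
            left = right - 1
            d_left = t - knots[left]
            d_right = knots[right] - t
            total = d_left + d_right
            # The closer knot gets the larger weight.
            weights[t][left] = 1.0 - d_left / total
            weights[t][right] = 1.0 - d_right / total
    return weights


W = l1_weight_matrix(n_times=16, knot_locations=[1, 6, 11, 15])
row_9 = W[9]  # weights for time period 9, between the knots at 6 and 11
```

In this sketch, `row_9` puts weight 0.4 on the knot at 6 and weight 0.6 on the knot at 11, matching the worked example, and every row of `W` sums to 1.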

When knots < n_times, the model performs dimensionality reduction: the n_times periods are modeled with fewer than n_times parameters. The weight function determines how the time periods are combined.

Choosing the number of knots for time effects in the model

When you think about how to set the knots argument of ModelSpec, it is helpful to think of the two extremes: knots can be anywhere from one to the number of time periods (n_times). When knots = n_times, there is no dimensionality reduction and each time period gets its own parameter. In a geo-level model, having as many knots as time periods is identifiable because you have multiple geos, and therefore multiple observations, per time period. When knots = 1, all time periods are modeled with a single parameter, which is equivalent to saying time has no effect. This absence of effect becomes a common intercept for all time periods.

When 1 < knots < n_times, you are in the middle of these two extremes. You can try a range of values that span the space of eligible values. For information about how to think about the middle of these two extremes, see Bias-variance trade-off.

We recommend that you try the following:

  • Geo-level models should start at the default (knots = n_times). If you notice that overfitting is extreme or media effect estimates are unrealistic, then consider reducing the number of knots. The need to reduce the number of knots is more likely to apply as the number of geos per time point decreases.

  • National-level models should start at the default of 1 knot and increase the number of knots from there. Continue to increase until overfitting becomes extreme or media effect estimates become unrealistic.

  • Similar numbers of knots, such as knots = 10 and knots = 11, tend to return similar results, so it can be helpful to spread out the values that you want to try.
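One way to spread out the values to try is to space the candidate knot counts roughly geometrically between 1 and n_times. The helper below is a hypothetical sketch, and geometric spacing is just one reasonable choice; neither is a Meridian utility or an official recommendation.

```python
def candidate_knot_counts(n_times, n_candidates=5):
    """Return roughly geometrically spaced knot counts between 1 and n_times.

    Hypothetical helper; rounding can merge nearby values, so fewer than
    n_candidates counts may be returned.
    """
    if n_candidates < 2 or n_times < 2:
        return [1]
    ratio = n_times ** (1.0 / (n_candidates - 1))
    counts = {min(n_times, max(1, round(ratio ** i))) for i in range(n_candidates)}
    return sorted(counts)


# Three years of weekly data, as in the example at the top of this page.
candidates = candidate_knot_counts(n_times=156)
```

Each value in `candidates` could then be passed as the knots argument in a separate model fit and the results compared.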

For information that might help you develop algorithms for knot selection, see Knot selection in sparse Gaussian processes with a variational objective function in the Wiley online library.

Bias-variance trade-off

It can be helpful to think of setting the number of knots as a bias-variance trade-off. When knots = n_times, each time period gets its own parameter, so the effect of a given time period is estimated using only data from that time period. However, knots = n_times is high-variance because fewer data points are available at any given time period.

When knots < n_times, each knot is estimated using the data of nearby time periods, with closer time periods getting more weight. Since the two closest knots determine the inference of a particular time period, the effect of a given time period is estimated by that time period's data and by nearby time periods' data. As the number of knots decreases, nearby time points are more and more influential on the inference for a particular time point, with the closer time points getting more weight. This decreases variance because more and more time points are used to estimate the effect of a given time period. However, the data isn't from the given time period, which increases bias.

In summary, more knots reduce bias in time effect estimates, while fewer knots reduce variance in time effect estimates. As an analyst, you can tune where on the bias-variance trade-off you want to be. If time is an important confounder between media and the KPI, then the bias-variance trade-off in estimating time effects translates to a bias-variance trade-off in estimating the causal effects of media.
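The pooling behind this trade-off can be made concrete by counting, for each knot, how many time periods give it nonzero weight under the two-neighboring-knots rule. The function name and the example knot locations below are hypothetical, and the sketch only counts time periods; it does not fit a model.

```python
def periods_informing_each_knot(n_times, knots):
    """Count the time periods that place nonzero weight on each knot.

    A time period contributes to a knot when it lies strictly between the
    knot's left and right neighbors (or the edge of the time range).
    """
    knots = sorted(knots)
    counts = []
    for k in range(len(knots)):
        lower = knots[k - 1] if k > 0 else -1                    # exclusive bound
        upper = knots[k + 1] if k + 1 < len(knots) else n_times  # exclusive bound
        counts.append(sum(1 for t in range(n_times) if lower < t < upper))
    return counts


n_times = 12
full = periods_informing_each_knot(n_times, list(range(n_times)))  # knots = n_times
sparse = periods_informing_each_knot(n_times, [0, 4, 8, 11])       # 4 knots
```

With one knot per time period, every entry of `full` is 1, so each parameter relies on a single period's data (low bias, high variance). With four knots, `sparse` is `[4, 7, 6, 3]`, so each parameter draws on several periods (lower variance, more bias).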

Additionally, you can choose to have different bias-variance trade-offs for different time regions. You do this by setting knots to a list, which specifies knot locations. Knot locations can be dense in areas where you prefer low bias in the estimates (such as a holiday season), and sparse in areas where you prefer low variance in the estimates (such as an off-holiday season).
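A list of knot locations with this shape might be built as follows. This is a sketch under stated assumptions: locations are taken to be zero-based time indices, the two-year weekly span, the quarterly spacing, and the holiday windows are all illustrative, and passing the result to ModelSpec is not shown.

```python
n_times = 104      # two years of weekly data (illustrative)
sparse_step = 13   # roughly one knot per quarter outside the holiday season

# Assumed holiday windows: the last six weeks of each year.
holiday_windows = [range(46, 52), range(98, 104)]

knots = set(range(0, n_times, sparse_step))  # sparse baseline knots
for window in holiday_windows:
    knots.update(window)                     # one knot per holiday week
knots.add(n_times - 1)                       # make sure the last period has a knot

knot_locations = sorted(knots)
```

The resulting `knot_locations` list has 20 entries: 8 quarterly knots plus 12 weekly knots in the holiday windows.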

When to consider using fewer knots

When you set the number of knots, it also can be helpful to think about how the time period affects the media execution. Control variables should be confounding variables that impact both media execution and the KPI. For more information about control variables, see Selecting control variables.

Similar logic applies for time. If the time period isn't a factor for media execution, then time isn't a true confounding variable and you can avoid spending too many degrees of freedom on modeling time with many knots. Advertisers need to consider whether the time period plays a role in planning media execution. For example, a travel brand's media planning likely depends on the time period, whereas a snack brand may have more consistent media planning across time periods. Also, consider whether time is really the important confounding variable, or whether time is a proxy for some other variable that can be modeled directly, likely with fewer degrees of freedom. For example, was time really the confounding variable that drove media execution, or was it the number of COVID cases nationwide? Advertisers know their own media planning strategy and have insight into these topics.

When you must use knots < n_times

There are situations where you must set knots < n_times. One example is a national-level model, where you don't have multiple observations per time period and there are not enough degrees of freedom for each time period to get its own parameter. In that case, some level of dimensionality reduction is necessary.

Another example is when you must include a national-level media or national-level control variable. By definition, national-level variables change over time but not over geo. Such a variable is perfectly collinear with time and is thus redundant with a model that has a parameter for each time period. If you set knots close to n_times, you technically might have an identifiable model. However, it still might be weakly identifiable and lead to problems. Given the concerns around estimating time effects in a national model, it is even more important to have high quality controls in a national model than in a geo model. For more information about high quality controls, see Selecting control variables.
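The collinearity can be checked directly on a toy design. The sketch below uses made-up numbers and plain Python lists; it builds one indicator column per time period (the knots = n_times case) and shows that a national-level variable is an exact linear combination of those columns.

```python
n_geos, n_times = 3, 4
# One value per time period, identical in every geo (illustrative numbers).
national_values = [2.0, 5.0, 3.0, 7.0]

# One row per (geo, time) observation.
rows = [(g, t) for g in range(n_geos) for t in range(n_times)]

# One indicator column per time period.
time_indicators = [[1.0 if t == s else 0.0 for s in range(n_times)] for _, t in rows]

# The national-level variable as it appears in the stacked data.
national_column = [national_values[t] for _, t in rows]

# Weighting the time indicators by national_values reproduces the column
# exactly, so the variable adds nothing that per-period parameters
# cannot already absorb.
reconstructed = [
    sum(d * v for d, v in zip(indicator_row, national_values))
    for indicator_row in time_indicators
]
is_collinear = reconstructed == national_column
```

Here `is_collinear` is `True`, which is the redundancy described above.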

Other approaches for modeling time effects: binary indicators and periodic functions

You can create and input binary indicators or periodic functions as control variables to model time effects in Meridian. Each offers advantages in certain cases.

Binary indicators

A binary indicator takes a value of 1 when a condition is met and 0 when it is not. For example, an indicator can take a value of 1 for all time periods in December and a value of 0 otherwise. In Meridian, binary indicators can be used as control variables to model time effects that are consistent over a set of time periods and optionally vary by geo. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects.
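As a minimal sketch, a December indicator for weekly data can be built with the standard library. The start date and the 52-week span are illustrative assumptions, and attaching the resulting column to the model's input data as a control variable is not shown here.

```python
from datetime import date, timedelta

start = date(2023, 1, 2)  # a Monday; illustrative start date
weeks = [start + timedelta(weeks=i) for i in range(52)]

# 1 for weeks that start in December, 0 otherwise.
december_indicator = [1 if week.month == 12 else 0 for week in weeks]
```

For this date range, the indicator equals 1 for the four weeks starting December 4, 11, 18, and 25, and 0 for all other weeks.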

Consistent effects

A binary indicator can cover multiple time periods, which assumes that the KPI effect (per capita in a geo model) is consistent across all of those time periods. A binary indicator uses multiple time periods to estimate a consistent effect, which improves estimates and uses degrees of freedom efficiently, provided that the assumption of a consistent effect is approximately correct.

The indicator has no effect on any time period outside of those indicated, whereas placing a knot at a specific time period will affect the neighboring time periods up to the next adjacent knot.

Modeling time effects as consistent over a set of time periods may be appealing for a national model, where imposing structure can help stabilize estimates. For a geo model, the flexibility of using many knots is often preferred.

Geographical variation

When a binary indicator is used as a control variable in a geo-level model, it is estimated as having a geo-dependent effect. This is ideal for events where you expect the impact to differ by region, like the Super Bowl having a larger impact in the host city. In contrast, knots estimate time effects that are not geo-dependent. Knots create a flexible spline function to capture time-based patterns that are shared across all geographies. This makes them more parameter-efficient if you don't expect geo-dependent time effects.

Turning off geographical variation for a binary indicator

You might want to use a binary indicator without geo-dependent time effects. You can do this by setting the prior for its hierarchical variance, xi_c, to a point mass at zero. In that case, each geo-specific coefficient for the binary indicator is identical. To turn off geo effects for all control variables, set their variance prior to a deterministic value of zero:

xi_c = tfp.distributions.Deterministic(0)

To turn off the geo-effect for just one specific control variable, you can set the scale of its prior to zero. For example, if you have four control variables and want to turn off geo-dependent effects for the first:

xi_c = tfp.distributions.HalfNormal(scale=[0, 5, 5, 5])
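To see why a zero scale removes geo variation, consider a toy sketch in plain Python. It assumes a simple hierarchy in which each geo's coefficient equals a shared mean plus the scale times a standard normal draw. The function `geo_coefficients` and all numbers are hypothetical, and this is not Meridian code.

```python
import random


def geo_coefficients(shared_mean, scale, n_geos, seed=0):
    """Draw one coefficient per geo around a shared mean (toy hierarchy)."""
    rng = random.Random(seed)
    return [shared_mean + scale * rng.gauss(0.0, 1.0) for _ in range(n_geos)]


varying = geo_coefficients(shared_mean=0.3, scale=5.0, n_geos=4)  # geo-dependent
pooled = geo_coefficients(shared_mean=0.3, scale=0.0, n_geos=4)   # identical
```

With a scale of zero, every entry of `pooled` equals the shared mean, which is the behavior described above for a control variable whose geo effect is turned off.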

Periodic functions

Another option is to add a periodic function, like a Fourier series, as a control variable. Periodic functions can be an attractive alternative to knots, especially in national models.

Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI (per capita KPI in the case of a geo model). Using periodic functions is a strong parametric assumption about how time affects the KPI. This assumption may be appropriate for a national model, where imposing structure can help get a stable estimate of seasonality. For a geo model, the flexibility of using many knots is often preferred, as it doesn't force a smooth and cyclical pattern on the KPI.
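Fourier terms of this kind can be generated as sine and cosine pairs. The sketch below is illustrative: the function name `fourier_features`, the 52-week period, and the choice of two harmonics are assumptions, and each returned column would be supplied to the model as a separate control variable.

```python
import math


def fourier_features(n_times, period=52, n_harmonics=2):
    """Return one row per time period with a sin/cos pair for each harmonic."""
    features = []
    for t in range(n_times):
        row = []
        for h in range(1, n_harmonics + 1):
            angle = 2.0 * math.pi * h * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        features.append(row)
    return features


# Three years of weekly data: 156 rows, 4 columns (2 harmonics x sin/cos).
controls = fourier_features(n_times=156)
```

More harmonics allow a less smooth seasonal shape at the cost of more parameters, which matters most in a national model.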

Practical recommendations

Recommendations depend on whether the model is a geo or national model.

Geo models

Binary indicators can be used in geo models to model geo-dependent time effects. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects.

  • For time effects that are not geo-dependent, use knots: if you want to model a temporal pattern that is consistent across geos, knots provide flexibility without risking over-parameterization.
  • For time effects that are geo-dependent, use a binary indicator: if you have a strong hypothesis that an event's impact varies by geo, a binary indicator as a control variable is the right tool.

National models

National models especially benefit from parsimony, which can be achieved with binary indicators, periodic functions, or a few well-placed knots. These can be used together, but be especially mindful of the total number of parameters given the importance of parsimony in a national model. Each of these options improves estimates and uses degrees of freedom efficiently, provided that the assumption is approximately correct. The following summarizes the assumptions for each of these:

  • Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI.
  • Binary indicators model time effects as consistent across affected time periods.
  • Knots model time effects as a piecewise linear trend over time.