How the knots argument works
Meridian uses a time-varying intercept approach for modeling time effects (see Spline (mathematics), Wikipedia; Ng, Wang, & Dai, 2021). This approach models time effects \(\mu = [\mu_1, \dots, \mu_T]\) for each of the \(T\) time periods (for example, a weekly-level, three-year MMM has \(T = 52 \times 3 = 156\) time periods). The \(T\) time effects are modeled with possibly fewer than \(T\) parameters using the relationship:
\[\mu = W \ast b\]
Where:
- \(\mu\) is \(T \times 1\), representing the effect of each time period \(t = 1, \dots, T\).
- \(W\) is a \(T \times K\) deterministic weight matrix.
- \(b\) (called knot_values in Meridian) is \(K \times 1\), where \(K \leq T\).
Bayesian posterior inference is done on \(b\), which is translated in terms of \(\mu\) according to the weight matrix \(W\). The number of knots \(K\) is determined by user input. The weight matrix \(W\) is determined by the L1 distance of a time period to the two neighboring knots.
To clarify how L1 distance determines the weight matrix, consider time period \(9\), where the two neighboring knots are at \(6\) and \(11\). The L1 distance between time period \(9\) and knot \(11\) is \(2\). The L1 distance between time period \(9\) and knot \(6\) is \(3\). So, the knot at \(6\) gets weight \(0.4 = 1 - \frac{3}{2+3}\) and the knot at \(11\) gets weight \(0.6 = 1 - \frac{2}{2+3}\): the closer knot gets the larger weight. The weighted average of these two neighboring knots determines the value of \(\mu_9\).
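The worked example above can be reproduced with a short NumPy sketch. This is an illustration of the weighting rule described here, not Meridian's internal code. The function name l1_weight_matrix, the zero-based time indices, the example knot locations, and the choice to give time periods outside the outermost knots the full weight of the nearest knot are all assumptions made for this example.

```python
import numpy as np


def l1_weight_matrix(n_times, knot_locations):
    """Builds a T x K weight matrix from L1 distances to neighboring knots.

    Hypothetical helper for illustration only. Time periods are treated as
    zero-based integer indices.
    """
    knots = np.asarray(sorted(knot_locations), dtype=float)
    weights = np.zeros((n_times, len(knots)))
    for t in range(n_times):
        if t <= knots[0]:
            # Assumption: periods at or before the first knot use only that knot.
            weights[t, 0] = 1.0
        elif t >= knots[-1]:
            # Assumption: periods at or after the last knot use only that knot.
            weights[t, -1] = 1.0
        else:
            right = np.searchsorted(knots, t, side="right")
            left = right - 1
            d_left = t - knots[left]      # L1 distance to the left knot
            d_right = knots[right] - t    # L1 distance to the right knot
            total = d_left + d_right
            weights[t, left] = 1.0 - d_left / total
            weights[t, right] = 1.0 - d_right / total
    return weights


# 16 time periods and 4 knots, including the knots at 6 and 11 from the text.
W = l1_weight_matrix(n_times=16, knot_locations=[1, 6, 11, 15])

# Hypothetical knot values b (K x 1); in practice these are inferred.
b = np.array([0.0, 1.0, 2.0, 3.0])

# Time effects mu = W b (T x 1).
mu = W @ b

print(W[9])   # weights for time period 9: nonzero only at knots 6 and 11
print(mu[9])  # weighted average of the two neighboring knot values
```

Each row of W sums to one, so every time effect is a weighted average of at most two knot values.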
Notice that when knots < n_times, some dimensionality reduction takes place: the n_times periods are modeled with fewer than n_times parameters. The weight function determines how the time periods are combined.
Choosing the number of knots for time effects in the model
When you think about how to set knots of ModelSpec, it is helpful to think
of the two extremes: knots can be anywhere from one to the number of time
periods (n_times). When knots = n_times, there is no dimensionality
reduction and each time period gets its own parameter. In a geo-level model,
having as many knots as time periods is identifiable because you have multiple
geos, and therefore multiple observations, per time period. When knots = 1,
all time periods are measured with a single parameter, which is equivalent to
saying time has no effect. This absence of effect becomes a common intercept for
all time periods.
When 1 < knots < n_times, you are in the middle of these two extremes. You can
try a range of values that span the space of eligible values. For information
about how to think about the middle of these two extremes, see
Bias-variance trade-off.
We recommend that you try the following:
- Geo-level models should start at the default (knots = n_times). If you notice that overfitting is extreme or media effect estimates are unrealistic, then consider reducing the number of knots. The need to reduce the number of knots is more likely to apply as the number of geos per time point decreases.
- National-level models should start at the default 1 knot and increase the number of knots from there. Continue to increase until overfitting becomes extreme or media effect estimates become unrealistic.
- A similar number of knots can return similar results, such as knots = 10 and knots = 11, so it can be helpful to spread out the values that you want to try.
For information that might help you develop algorithms for knot selection, see Knot selection in sparse Gaussian processes with a variational objective function in the Wiley online library.
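One way to spread out the values you try is to space the candidate knot counts geometrically between 1 and n_times, so that neighboring candidates differ by a meaningful ratio. The following is a minimal sketch of that idea only. The geometric spacing, the helper name candidate_knot_counts, and the number of candidates are assumptions made for this example, not a Meridian recommendation.

```python
import numpy as np


def candidate_knot_counts(n_times, n_candidates=6):
    """Returns spread-out knot counts between 1 and n_times (hypothetical helper)."""
    raw = np.geomspace(1, n_times, num=n_candidates)
    # Round to integers and drop duplicates that can appear for small n_times.
    return sorted({int(round(float(x))) for x in raw})


# Weekly data over three years: n_times = 52 * 3 = 156.
candidates = candidate_knot_counts(n_times=156)
print(candidates)
```

You would then fit one model per candidate value and compare the fit and the media effect estimates across candidates.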
Bias-variance trade-off
It can be helpful to think of setting the number of knots as a bias-variance
trade-off. When knots = n_times, each time period gets its own parameter and
so the effect of a given time period is estimated using only data from that time
period, which keeps bias low. However, knots = n_times is high-variance because
fewer data points are available at a given time period.
When knots < n_times, each knot is estimated using the data of nearby time
periods, with closer time periods getting more weight. Since the two closest
knots determine the inference of a particular time period, the effect of a given
time period is estimated by that time period's data and by nearby time periods'
data. As the number of knots decreases, nearby time points are more and more
influential on the inference for a particular time point, with the closer time
points getting more weight. This decreases variance because more and more time
points are used to estimate the effect of a given time period. However, the data
isn't from the given time period, which increases bias.
In summary, more knots reduce bias in time effect estimates, while fewer knots reduce variance in time effect estimates. As an analyst, you can tune where on the bias-variance trade-off you want to be. If time is an important confounder between media and the KPI, then the bias-variance trade-off in estimating time effects translates to a bias-variance trade-off in estimating the causal effects of media.
Additionally, you can choose to have different bias-variance trade-offs for
different time regions. You do this by setting knots to a list, which
specifies knot locations. Knot locations can be dense in areas where you prefer
low bias in the estimates (such as a holiday season), and sparse in areas where
you prefer low variance in the estimates (such as an off-holiday season).
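As a rough sketch of how such a list could be built, the following example places a knot at every week of an assumed holiday window and only one knot per quarter elsewhere. The helper name build_knot_locations, the zero-based week indices, the holiday window (weeks 46 to 51 of each year), and the spacing values are all assumptions chosen for illustration.

```python
def build_knot_locations(n_times, dense_ranges, sparse_step=13, dense_step=1):
    """Returns sorted knot locations, dense inside dense_ranges (hypothetical helper).

    dense_ranges is a list of (start, stop) index pairs; stop is exclusive.
    """
    # Sparse backbone: one knot every sparse_step periods, plus the last period.
    locations = set(range(0, n_times, sparse_step))
    locations.add(n_times - 1)
    # Dense regions: one knot every dense_step periods.
    for start, stop in dense_ranges:
        locations.update(range(start, min(stop, n_times), dense_step))
    return sorted(locations)


# Weekly data over three years, with an assumed holiday window each year.
n_times = 52 * 3
holiday_ranges = [(year * 52 + 46, year * 52 + 52) for year in range(3)]
knot_list = build_knot_locations(n_times, holiday_ranges)

# The resulting list is what you would supply as the knots setting of ModelSpec.
print(len(knot_list), knot_list[:12])
```

The list uses far fewer knots than time periods, while the holiday weeks still each get their own knot.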
When to consider using fewer knots
When you set the number of knots, it also can be helpful to think about how the time period affects the media execution. Control variables should be confounding variables that impact both media execution and the KPI. For more information about control variables, see Selecting control variables.
Similar logic applies for time. If the time period isn't a factor for media execution, then time isn't a true confounding variable and you can avoid spending too many degrees of freedom on modeling time with many knots. Advertisers need to consider whether the time period plays a role in planning around media execution. For example, a travel brand's media planning likely depends on the time period, whereas a snack brand may have more consistent media planning across time periods. Also, consider whether time is really the important confounding variable, or if time is a proxy for some other variable that can be directly modeled, likely with fewer degrees of freedom. For example, was time really the confounding variable that drove media execution? Or was it how many COVID cases there were nationwide? Advertisers know their own media planning strategy and have insight into these topics.
When you must use knots < n_times
There are situations where you must set knots < n_times, for example, in a
national-level model where you don't have multiple observations per time period
and there are not enough degrees of freedom for each time period to get its own
parameter. In that case, some level of dimensionality reduction is necessary.
Another example is when you must include a national-level media or
national-level control variable. By definition, national-level variables change
over time but not over geo. Such a variable is perfectly collinear with time and
is thus redundant with a model that has a parameter for each time period. If you
set knots close to n_times, you technically might have an identifiable
model. However, it still might be weakly identifiable
and lead to problems. Given the concerns around estimating time effects in a
national model, it is even more important to have high quality controls in a
national model than in a geo model. For more information about high quality
controls, see
Selecting control variables.
Other approaches for modeling time effects: binary indicators and periodic functions
You can create and input binary indicators or periodic functions as control variables to model time effects in Meridian. Each offers advantages in certain cases.
Binary indicators
A binary indicator takes a value of 1 when a condition is met and 0 when it is not. For example, an indicator can take a value of 1 for all time periods in December and a value of 0 otherwise. In Meridian, binary indicators can be used as control variables to model time effects that are consistent over a set of time periods and optionally vary by geo. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects.
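The December example can be sketched with the Python standard library. The helper names, the weekly start date, and the rule of assigning each week to the month of its start date are assumptions made for this illustration. How you attach the resulting column to your control data depends on your own data pipeline.

```python
from datetime import date, timedelta


def weekly_dates(start, n_weeks):
    """Returns the start date of each week (hypothetical helper)."""
    return [start + timedelta(weeks=i) for i in range(n_weeks)]


def december_indicator(dates):
    """Returns 1 for weeks that start in December and 0 otherwise."""
    return [1 if d.month == 12 else 0 for d in dates]


# One year of weekly data starting on Monday, 2 January 2023.
dates = weekly_dates(date(2023, 1, 2), n_weeks=52)
is_december = december_indicator(dates)

# Show the flagged weeks.
print([d.isoformat() for d, flag in zip(dates, is_december) if flag])
```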
Consistent effects
A binary indicator can cover multiple time periods, which assumes that the KPI effect (per capita in a geo model) is consistent across all of the indicated time periods. A binary indicator uses multiple time periods to estimate a consistent effect, which improves estimates and uses degrees of freedom efficiently, provided that the assumption of a consistent effect is approximately correct.
The indicator has no effect on any time period outside of those indicated, whereas placing a knot at a specific time period will affect the neighboring time periods up to the next adjacent knot.
Modeling time effects as consistent over a set of time periods may be appealing for a national model, where imposing structure can help stabilize estimates. For a geo model, the flexibility of using many knots is often preferred.
Geographical variation
When a binary indicator is used as a control variable in a geo-level model, it is estimated as having a geo-dependent effect. This is ideal for events where you expect the impact to differ by region, like the Super Bowl having a larger impact in the host city. In contrast, knots estimate time effects that are not geo-dependent. Knots create a flexible spline function to capture time-based patterns that are shared across all geographies. This makes them more parameter-efficient if you don't expect geo-dependent time effects.
Turning off geographical variation for a binary indicator
You may want to use a binary indicator without geo-dependent time effects.
You can do this by setting the prior for its hierarchical variance, xi_c, to a
point mass at zero. In that case, each geo-specific coefficient for the binary
indicator will be identical. To turn off geo-effects for all control variables,
set their variance prior to a deterministic value of zero:
xi_c = tfp.distributions.Deterministic(0)
To turn off the geo-effect for just one specific control variable, you can set
the scale of its prior to zero. For example, if you have four control
variables and want to turn off geo-dependent effects for the first:
xi_c = tfp.distributions.HalfNormal(scale=[0, 5, 5, 5])
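To see why a zero scale removes geo variation, consider the following NumPy sketch. It assumes a hierarchical form in which each geo-level coefficient equals a shared mean plus xi_c times a standard normal deviation. That form, the variable names, and the numeric values are assumptions for illustration, not a restatement of Meridian's implementation. A half-normal draw is simulated here as the absolute value of a standard normal draw multiplied by the scale, so a scale of zero always yields zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_geos, n_controls = 5, 4

# Prior scales from the example above: zero for the first control variable.
prior_scale = np.array([0.0, 5.0, 5.0, 5.0])

# Simulated half-normal draws of xi_c; a scale of 0 gives xi_c = 0 exactly.
xi_c = np.abs(rng.standard_normal(n_controls)) * prior_scale

# Hypothetical shared (cross-geo) mean coefficient for each control variable.
gamma_c = np.array([0.3, -0.1, 0.2, 0.5])

# Assumed hierarchical form: geo coefficient = shared mean + xi_c * deviation.
z_gc = rng.standard_normal((n_geos, n_controls))
gamma_gc = gamma_c + xi_c * z_gc

print(gamma_gc[:, 0])  # identical across geos
print(gamma_gc[:, 1])  # varies across geos
```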
Periodic functions
Another option is to add a periodic function, like a Fourier series, as a control variable. Periodic functions can be an attractive alternative to knots, especially in national models.
Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI (per capita KPI in the case of a geo model). Using a periodic function makes a strong parametric assumption about how time affects the KPI. This may be appropriate for a national model, where imposing structure can help get a stable estimate of seasonality. For a geo model, the flexibility of using many knots is often preferred, as it doesn't force a smooth and cyclical pattern on the KPI.
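As a minimal sketch of what such control variables could look like, the following example builds sine and cosine columns for weekly data with a yearly cycle. The helper name fourier_features, the period of 52 weeks, and the number of harmonics are assumptions chosen for illustration. Each column would be supplied as one control variable.

```python
import numpy as np


def fourier_features(n_times, period=52.0, n_harmonics=2):
    """Returns an (n_times, 2 * n_harmonics) array of sine and cosine terms.

    Hypothetical helper; columns are ordered sin_1, cos_1, sin_2, cos_2, ...
    """
    t = np.arange(n_times)
    columns = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# Weekly data over three years with an assumed yearly cycle.
features = fourier_features(n_times=156)
print(features.shape)
```

With two harmonics, seasonality over all 156 weeks is described by four coefficients, which is the kind of parsimony that makes this approach attractive for a national model. More harmonics allow sharper seasonal shapes at the cost of more parameters.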
Practical recommendations
Recommendations depend on whether the model is a geo or national model.
Geo models
Binary indicators can be used in geo models to model geo-dependent time effects. Knots and binary indicators can be used together, but be mindful of the total number of parameters used to model time effects.
- For time effects that are not geo-dependent, use knots: if you want to model a temporal pattern that is consistent across geos, knots provide flexibility without risking over-parameterization.
- For time effects that are geo-dependent, use a binary indicator: if you have a strong hypothesis that an event's impact varies by geo, a binary indicator as a control variable is the right tool.
National models
National models especially benefit from parsimony, which can be achieved with binary indicators, periodic functions, or a few well-placed knots. These can be used together, but be especially mindful of the total number of parameters given the importance of parsimony in a national model. Each of these options improves estimates and uses degrees of freedom efficiently, provided that the assumption is approximately correct. The following summarizes the assumptions for each of these:
- Periodic functions model the time effects as having a smooth and cyclical pattern on the KPI.
- Binary indicators model time effects as consistent across affected time periods.
- Knots model time effects as a piecewise linear trend over time.