Actuarial Expertise
Explorations

Smoothing Splines for Trend

Business Problem

Severity trend for insurance losses is typically estimated using loglinear regression of average loss amounts on historical period (accident or calendar year). This produces a single trend estimate for a block of years, which is sufficient if the trend can be assumed to be constant over time.

If the trend is changing over time, a simple regression model may no longer be appropriate. An alternative is to use the actual year-to-year changes to produce an inflation trend index. The main disadvantage of that approach is that year-to-year changes can be highly volatile and can produce unusual patterns if used in pricing.

Smoothing splines provide a simple tool for striking a compromise between a single trend for all years and a different trend for each year.

Smoothing Splines as a Blending Method

Smoothing splines are useful in smoothing noisy time series. The degree of smoothing is controlled by a single parameter, which plays a role much like the K value in Bühlmann credibility.[1] A very small smoothing parameter means that each historical period stands on its own; a very large smoothing parameter means that the individual periods move towards a simple linear regression.

The smoothing spline model results in a curve that comes as close to the data as possible (by minimizing squared error) while also being subject to a penalty to avoid too much wiggle in the curve (penalizing the second derivative or curvature).
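In symbols (following Hastie et al.), the fitted curve f minimizes the penalized sum of squares

    sum over i of [ y_i - f(x_i) ]^2  +  lambda * integral of f''(t)^2 dt,

where lambda is the smoothing parameter. A small lambda rewards fidelity to the data; a large lambda pushes the curvature f'' toward zero, flattening the fit toward a straight line.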

By construction, the resulting smoothing spline is a natural cubic spline with linear extensions beyond the last data point. This allows us to extrapolate to prospective periods using a constant trend that is most heavily influenced by the latest points. The extrapolation to future periods works with the smoothed data points, rather than the actual data points, and so the forecast to future periods has some stability. The use of the spline for forecasting has been well-studied and found to be a special case of an ARIMA time-series model (see Hyndman et al.).

The smoothing parameter can also be mapped to a measure of degrees of freedom or “effective number of parameters.” A large smoothing parameter leads to a two-parameter model (a linear fit), whereas a small smoothing parameter leads to a curve that interpolates the actual data, with the number of parameters equal to the number of points in the time series.
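In R's smooth.spline() function (listed at the end of this article), the effective degrees of freedom can be targeted directly, which makes the two limiting cases easy to see. A minimal sketch, assuming x holds the years and y the log severities:

    # df near 2: heavy smoothing, essentially the loglinear regression
    fit_smooth <- smooth.spline(x, y, df = 2)

    # df near the number of points: light smoothing, close to interpolation
    # (values this close to length(x) may trigger convergence warnings)
    fit_rough <- smooth.spline(x, y, df = length(x) - 1)

    # with no df supplied, generalized cross-validation picks the parameter
    fit_gcv <- smooth.spline(x, y)
    fit_gcv$df  # the resulting effective number of parameters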

When the spline is fit to the logarithms of the severities, a very large smoothing parameter recovers the traditional loglinear regression as a limiting case.

The algorithm for calculating smoothing splines is available in most statistical software. The central calculation is the “Reinsch algorithm,” which requires a bit of linear algebra but can easily be programmed by an actuary (even in Visual Basic for Excel).
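As a flavor of how little code is involved, the sketch below implements not the Reinsch algorithm itself but its discrete analogue for equally spaced periods, which actuaries will recognize as Whittaker-Henderson graduation; y is assumed to be the vector of log severities:

    # Discrete analogue of the smoothing spline for equally spaced periods
    # (Whittaker-Henderson graduation), standing in for the Reinsch algorithm
    whittaker <- function(y, lambda = 4) {
      n <- length(y)
      D <- diff(diag(n), differences = 2)  # (n-2) x n second-difference operator
      # minimize ||y - f||^2 + lambda * ||D f||^2  =>  (I + lambda D'D) f = y
      solve(diag(n) + lambda * crossprod(D), y)
    }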

An Example

To illustrate this technique, the graph below shows data from the U.S. Bureau of Labor Statistics for inflation in hospital costs, one of the components that go into the Consumer Price Index. We first take the logarithms of the cost index and then calculate the spline on that sequence.

In this example, a smoothing parameter of 4.0 was selected, implying a curve with effectively 8.7 parameters on a time series of 31 points. While there are many discussions of how best to choose the smoothing parameter, it is easy to simply try a few values and select a desired level of smoothing based on the aesthetics of the resulting graph.

The smoothed curve is extended to future periods as a constant trend, represented in the graph as a flat line for 2021 to 2023. The analyst is not required to take this smoothing spline as the final answer, but it does provide a quick way of identifying changes in the inflation rate over time without overreacting to any single point.
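The fit and extrapolation take only a few lines in R. The BLS series itself is not reproduced here, so the sketch below substitutes synthetic data with a deliberate trend shift; the df = 8.7 target mirrors the effective parameter count quoted above:

    set.seed(1)
    years <- 1990:2020  # 31 annual points, as in the example
    # synthetic log cost index: trend drops from 6% to 3% midway (hypothetical)
    log_index <- cumsum(c(rep(0.06, 15), rep(0.03, 16)) + rnorm(31, 0, 0.01))

    fit <- smooth.spline(years, log_index, df = 8.7)

    # predict() extrapolates the natural spline linearly beyond the data
    pred <- predict(fit, x = 1990:2023)
    tail(exp(diff(pred$y)) - 1, 4)  # smoothed annual trend rates through 2023

The last three trend rates come out identical, reflecting the constant prospective trend from the linear extension of the spline.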

The reader is encouraged to try this technique on their own severity series and see if it offers insight into changing trends over time.

Further Extensions and Research

Beyond the generic version of the smoothing spline, additional tools are available:

  • Weighting functions can be introduced to give more weight to some periods and de-emphasize others.
  • Confidence intervals or ranges around the smoothed curve can be produced (see Hyndman et al. or Wahba (1983)).
  • Alternative smoothers, such as local regression (LOESS) or regression splines (where “knots” are assigned by the user), can also be tried; a short sketch of these follows this list.
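The weighting and alternative-smoother ideas translate directly into short R calls. A minimal sketch, again with x the years and y the log severities, and with the weights and knot locations purely illustrative:

    # weights: emphasize recent periods, de-emphasize older ones (illustrative)
    wts <- seq(0.5, 1.5, length.out = length(x))
    fit_wtd <- smooth.spline(x, y, w = wts, df = 8.7)

    # alternative smoothers
    library(splines)
    fit_loess <- loess(y ~ x, span = 0.75)          # local regression (LOESS)
    fit_rs <- lm(y ~ ns(x, knots = c(2000, 2010)))  # regression spline, user knots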

References

Hastie, T., R. Tibshirani, and J.H. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2009, https://web.stanford.edu/~hastie/ElemStatLearn/. [See Chapter 5: “Basis Expansions and Regularization.”]

Hyndman, R.J., M.L. King, I. Pitrun, and B. Billah, “Local linear forecasts using cubic smoothing splines,” Australian and New Zealand Journal of Statistics 47:1, 2005, pp. 87-99, https://robjhyndman.com/papers/splinefcast.pdf.

Wahba, Grace, “Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression,” Journal of the Royal Statistical Society, Series B (Methodological) 40:3, 1978, pp. 364-372.

Wahba, Grace, “Bayesian ‘Confidence Intervals’ for the Cross-validated Smoothing Spline,” Journal of the Royal Statistical Society, Series B (Methodological) 45:1, 1983, pp. 133-150.

Wu, Tongtong, “Introduction to Smoothing Splines,” 2004, online lecture slides, https://www.scribd.com/presentation/421924201/smsp-ppt.

R Packages

smooth.spline() function in the stats package (included with base R)
smooth.Pspline() function in the pspline package

[1]  This similarity is not coincidental. Smoothing splines can be interpreted in a Bayesian framework; see Wahba (1978) for the pioneering work on this connection. Hastie et al. show that smoothing splines can also be cast as a generalized ridge regression, which likewise has a Bayesian interpretation.


Dave Clark, FCAS, is a senior actuary in corporate pricing & underwriting services for Munich Re America Services, Inc. in Princeton, New Jersey.