
“We had a large dataset and a pricing model that fit the data well. But then the results on our overall portfolio came in way outside the predicted range. What went wrong?”
One possible explanation for this hypothetical scenario is the assumption that our historical data are “independent and identically distributed” (iid). Most statistical models assume by default that all of the observed data points are independent. This is a convenience that keeps the math, and therefore the computation, relatively simple.
In reality, the data collected for actuarial work are subject to multiple sources of dependence. For example, a company’s loss data come only from policies that the company wrote — business quoted, but not written, is not included in the data. There may also be dependence among losses subject to the same inflationary pressures or the same methods for setting case reserves.
These dependencies can be difficult to model, but what do we lose if they are ignored and we make a naïve independence assumption?
Generalized least squares
The tool for including correlation in a regression model is to move from ordinary least squares (OLS) to generalized least squares (GLS).[1]
In OLS, with its independence assumption, for a response variable y and design matrix X, the estimated model parameters are:
\[
\hat{b} = \left(X^{T} X\right)^{-1} X^{T} y
\]
If the observations for the response variable y are related by a known covariance matrix V, then the solution is generalized from the OLS calculation as:
\[
\hat{b} = \left(X^{T} V^{-1} X\right)^{-1} X^{T} V^{-1} y
\]
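To make the two formulas concrete, here is a minimal NumPy sketch of both estimators. The design matrix, response and the AR(1)-style covariance matrix below are made-up inputs chosen purely for illustration.

```python
import numpy as np

# Toy data: intercept plus one rating variable (made-up numbers, illustration only).
X = np.column_stack([np.ones(5), np.arange(1.0, 6.0)])
y = np.array([10.2, 11.9, 14.1, 15.8, 18.3])

# OLS: b_hat = (X'X)^{-1} X'y  -- assumes independent, equal-variance errors.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# GLS: b_hat = (X'V^{-1}X)^{-1} X'V^{-1}y for a known covariance matrix V.
# Here V is an assumed AR(1)-style structure, corr = 0.5**|i-j|, purely for illustration.
i = np.arange(5)
V = 0.5 ** np.abs(i[:, None] - i[None, :])
Vinv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

print("OLS:", b_ols)
print("GLS:", b_gls)  # differs from OLS because the weighting reflects the correlation
```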
So, what happens if we should be including the covariance V, but ignore it to keep things simple?
The good news is that getting the covariance structure wrong does not introduce bias. The fitted parameters are not “wrong” in any systematic way. But it does mean that we are not using the best (lowest-variance) estimate of the model parameters.
This has two implications for our modeled results:
- We are not using our data efficiently: Too much weight is assigned to “noisy” parts of the data and not enough weight is assigned to the more stable parts of the data.
- Standard errors on the parameters (and therefore also p-values) are understated, giving false signals about which predictors to include and overly narrow confidence intervals around predictions.
In practice, ignoring covariance means that the model parameters may change in surprisingly large ways when new data come in.
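To see the understatement in action, here is a small simulation sketch. All of the parameter values (sample size, correlation, true coefficients) are assumptions chosen for illustration: each simulated data set shares a single common shock, and the average “naive” OLS standard error of the intercept is compared with the actual spread of the fitted intercepts.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims, r = 100, 2000, 0.2            # assumed sample size, simulations and correlation
X = np.column_stack([np.ones(n), rng.normal(size=n)])

intercepts, naive_ses = [], []
for _ in range(n_sims):
    shock = rng.normal(scale=np.sqrt(r))              # one shock shared by every observation
    eps = rng.normal(scale=np.sqrt(1 - r), size=n)    # independent noise
    y = 5.0 + 2.0 * X[:, 1] + shock + eps             # true intercept 5, slope 2

    b = np.linalg.solve(X.T @ X, X.T @ y)             # naive OLS fit
    resid = y - X @ b
    sigma2 = resid @ resid / (n - 2)
    naive_ses.append(np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0]))
    intercepts.append(b[0])

print("average naive SE of the intercept:", np.mean(naive_ses))
print("actual spread of fitted intercepts:", np.std(intercepts))  # several times larger
```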
A special case: compound symmetry
The GLS mathematics (or extensions into generalized linear models, or GLMs) give us a method of accounting for correlations in a predictive model, but the question of how much correlation to include is more difficult. This problem is addressed by selecting a parsimonious correlation structure with as few parameters as needed. The “compound symmetry” structure is a particularly interesting special case.
The case in which there is an equal correlation coefficient r between any two observations is called the “exchangeable” or “compound symmetry” structure. The covariance matrix has a common variance on the diagonal and the common correlation r everywhere else:
\[
V \;=\; \sigma^{2}
\begin{pmatrix}
1 & r & \cdots & r \\
r & 1 & \cdots & r \\
\vdots & \vdots & \ddots & \vdots \\
r & r & \cdots & 1
\end{pmatrix}
\]
In actuarial language, the compound symmetry covariance structure corresponds to a “common shock” or “common mixture” model. This idea has been described in papers by Wang (1998), Meyers (2007) and Ferrara and Li (2015). We can think of this as each risk in our model having a random component that is independent of other risks, plus a common random variable shared by all of the risks.
One way to think of this common shock (in a multiplicative model) is to suppose that I model a complete class rating plan and then, at the last minute, I am told that the data provided was in Euros rather than U.S. dollars, as I had assumed. I would not need to refit the whole model, because I could just rescale it for the correct exchange rate.
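Here is a small additive version of the common shock idea as a simulation sketch; the number of risks and the variance split are assumptions for illustration. Each observation is its own independent noise plus one shock shared across all risks in the same period, and the induced pairwise correlation is the compound symmetry value r = var(shock) / (var(shock) + var(noise)).

```python
import numpy as np

rng = np.random.default_rng(1)
n_risks, n_periods = 4, 20000          # assumed: 4 risks observed over many periods
var_shock, var_indiv = 0.25, 0.75      # assumed variance split; implies r = 0.25

shock = rng.normal(scale=np.sqrt(var_shock), size=(n_periods, 1))        # shared each period
indiv = rng.normal(scale=np.sqrt(var_indiv), size=(n_periods, n_risks))  # independent parts
losses = shock + indiv                 # every risk in a period shares the same shock

# The correlation between any two risks is var_shock / (var_shock + var_indiv) = 0.25.
print(np.corrcoef(losses, rowvar=False).round(2))
```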
A feature of the compound symmetry structure is that the estimated model parameters $\hat{b}$ will be the same for GLS and OLS (provided the model includes an intercept), regardless of the selected correlation value r. The only change in the resulting statistics is that the standard error on the intercept term is greater when r > 0.
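This property is easy to check numerically. The sketch below, again with made-up inputs, builds a compound symmetry matrix V, confirms that the OLS and GLS coefficients agree to machine precision, and compares the standard errors implied by the iid assumption with those from GLS; only the intercept is affected.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 50, 0.10                                   # assumed sample size and correlation
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 5.0 + 2.0 * X[:, 1] + rng.normal(size=n)      # made-up response

V = (1 - r) * np.eye(n) + r * np.ones((n, n))     # compound symmetry covariance (sigma^2 = 1)
Vinv = np.linalg.inv(V)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(b_ols, b_gls))                  # True: identical point estimates

# Standard errors, taking the error variance as known (= 1) for simplicity.
se_naive = np.sqrt(np.diag(np.linalg.inv(X.T @ X)))        # iid assumption
se_gls = np.sqrt(np.diag(np.linalg.inv(X.T @ Vinv @ X)))   # accounts for correlation
print(se_naive.round(3))   # [intercept, slope]
print(se_gls.round(3))     # intercept SE noticeably larger; slope SE barely changes
```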
So how much does it increase by? A lot.
The best way to see this is by considering the effective number of points in the model. In OLS, the variance of the parameter estimate shrinks in proportion to 1/n, where n is the number of observed points. When correlation is introduced, we instead divide by an effective sample size, $n_{\mathrm{eff}}$. The relationship between the two under compound symmetry, together with its limit for large n, is given below.[2]
\[
n_{\mathrm{eff}} \;=\; \frac{n}{(n-1)\,r + 1}
\;\xrightarrow{\;n \to \infty\;}\; \frac{1}{r}
\]
The effective number of points is bounded above whenever the correlation coefficient r is positive: specifically, $n_{\mathrm{eff}} < 1/r$. This means that even if we have a “big data” set with, say, n = 1,000,000 observations, a small correlation of r = 0.01 will reduce the effective sample size to only about $n_{\mathrm{eff}} = 100$ points.[3] Even a seemingly small amount of correlation can greatly increase the standard error on the intercept term.
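A quick numerical check of the $n_{\mathrm{eff}}$ formula (the sample sizes and correlation below are arbitrary choices) shows how quickly the effective sample size plateaus near 1/r:

```python
# Effective sample size under compound symmetry: n_eff = n / ((n - 1) * r + 1).
r = 0.01
for n in [100, 10_000, 1_000_000, 100_000_000]:
    n_eff = n / ((n - 1) * r + 1)
    print(f"n = {n:>11,}  ->  n_eff = {n_eff:,.1f}")  # approaches 1 / r = 100
```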
So, the good news under compound symmetry is that the rating relativities are all correct; the bad news is that the overall level (the model intercept) is highly variable. When there is a common shock — say, a spike in inflation — it is cold comfort to know that all segments of the portfolio are going bad at the same time.
References
Faes, C., Molenberghs, G., Aerts, M., Verbeke, G., & Kenward, M. G. (2009). The Effective Sample Size and an Alternative Small Sample Degrees of Freedom Method. The American Statistician, 63(4).
Ferrara, P. G. & Li, Z. M. (2015). Advances in Common Shock Models. Variance, 9(1). https://variancejournal.org/article/127621
Hilbe, J. M., & Hardin, J. W. (2013). Generalized Estimating Equations (2nd ed.). CRC Press.
Meng, X.-L. (2018). Statistical Paradises and Paradoxes in Big Data (I): Law of Large Populations, Big Data Paradox, and the 2016 US Presidential Election. The Annals of Applied Statistics, 12(2), 685–726.
Meyers, G. (2007). The Common Shock Model for Correlated Insurance Losses. Variance, 1(1), 40–52. https://www.casact.org/abstract/common-shock-model-correlated-insurance-losses
Wang, S. (1998). Aggregation of Correlated Risk Portfolios: Models and Algorithms. Proceedings of the Casualty Actuarial Society, LXXXV, 848–939. https://www.casact.org/abstract/aggregation-correlated-risk-portfolios-models-and-algorithms
Dave Clark, FCAS, is a senior actuary with Munich Re.
[1] These ideas also apply to GLMs, which extend to generalized estimating equations (GEE) when a covariance structure is introduced. The correlation matrix enters through the “iteratively reweighted least squares” (IRLS) step of the GLM calculation. Hilbe and Hardin (2013) provide a book-length treatment of GEE.
[2] A fuller discussion can be found in Faes et al. (2009).
[3] Meng (2018) derives a similarly surprising result for “big data” cases where data may be subject to hidden dependencies because of nonrandomized collection. He coined the term “big data paradox” for this effect.