Actuarial Expertise

In Praise of Value at Risk

VaR can misbehave,
Hiding dragons in the tail.
Many views reveal.

Propose value at risk (VaR) as a risk measure and you will be the fool in the room. Peers will roll their eyes and whisper behind your back, “Don’t they know… not subadditive?” Instinctively, we reach for tail value at risk (tVaR), confident in its well-named coherence: would a rose risk measure smell as sweet? Nonactuaries have fewer qualms: VaR is alive and well in the capital models of Solvency II, A. M. Best (in both its original and revised capital adequacy ratio) and Standard & Poor’s. The Swiss Solvency Test is an exception, using coherent tVaR.

This Explorations column begins to examine three questions:

  • When does VaR fail to be subadditive in real applications? That is, when is the VaR of a sum greater than the sum of the VaRs?
  • How significant is VaR’s failure?
  • Is tVaR the only good alternative or are there others?

Insurance is based on diversification, and subadditivity expresses the idea that a risk measure respects diversification: The risk of a sum is no greater than the sum of the risks of its parts. “Less risky” can be measured in a number of ways, broadly classified into location, dispersion and tail measures. Insurers are often regulated and internally managed using tail risk measures, which motivates our interest in VaR.
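In symbols, writing $\rho$ for a generic risk measure, subadditivity is the requirement that, for all risks X and Y,

$$\rho(X + Y) \le \rho(X) + \rho(Y),$$

and VaR at confidence level $\alpha$ is the $\alpha$ quantile of the loss distribution, using the common convention

$$\mathrm{VaR}_\alpha(X) = \inf\{x : \Pr(X \le x) \ge \alpha\}.$$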

Our experience with “tame” loss distributions and normal random variables leads us to expect that VaR should be subadditive. Indeed, this is the case for the family of elliptically contoured distributions that greatly generalizes the multivariate normal, but it is not true for all distributions.

How can a portfolio possibly be more risky than the sum of its parts? A well-known risk management text (McNeil, Embrechts and Frey, 2005) lists three cases for which VaR can fail to be subadditive:

  • Case 1: When the dependence structure is of a special, highly asymmetric form.
  • Case 2: When the marginals have a very skewed distribution.
  • Case 3: When the marginals are very heavy-tailed.

Case 1 is a circus trick. Asymmetric dependence is spectacular and alarming, but generally not a dragon. It can be controlled using tVaR but also, as our opening haiku suggests, by using many views, e.g., VaR at several different return periods. It is, however, a very instructive trick to learn and an ever-present possibility to consider.

Case 2 is where the dragons live. Dire consequences can follow if they pass unnoticed and the potential skewness of the marginals is ignored. Again, the risk can be controlled by using tVaR or by using many views.

Case 3 is where the really big dragons live. When the marginals are heavy-tailed, there is a complete breakdown of diversification. In this case, I don’t want to pool risk because I want to minimize the number of samples I draw. Glyn Holton offered a great mental picture: You have a choice of drinking from several wells but one of them is poisoned. You clearly won’t “diversify” your risk by mixing water from all the wells — you’ll try one and if you survive you’ll stick with it. For very thick-tailed distributions, tVaR is of no use. The distributions involved do not have a mean and therefore tVaR is not defined. However, many views will still ring alarm bells.

In this issue’s column, we will explore Case 1 in more detail. Subsequent articles will consider the other two cases.

Using VaR at a range of return periods (“many views”) will slay all dragons, whereas tVaR will fail in the face of a particularly ferocious foe. Reporting VaR at a number of return periods has long been standard practice in reinsurance (if your broker or reinsurer isn’t showing you a range of return periods, it is time for an RFP!), and A. M. Best has recently adopted the idea of assessing tail risk through many views in its stochastic BCAR. It is a theoretically sound approach that works in all circumstances, coherence be damned.
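To make “many views” concrete, here is a minimal Python sketch, assuming NumPy and a hypothetical sample of 10,000 simulated yearly losses. The lognormal parameters, the seed and the list of return periods are illustrative choices only, not taken from any regulator or rating agency model. A T-year return period corresponds to α = 1 − 1/T, so the T-year VaR is the (10,000/T)th largest simulated loss.

```python
import numpy as np

def var_at_return_period(sample, T):
    """Empirical VaR at a T-year return period (alpha = 1 - 1/T).

    Sorts the sample from largest to smallest and returns the (n / T)-th
    largest value; assumes n is a multiple of T.
    """
    ordered = np.sort(np.asarray(sample, dtype=float))[::-1]
    k = len(ordered) // T
    return ordered[k - 1]

# Hypothetical yearly losses: 10,000 lognormal draws (illustrative only).
rng = np.random.default_rng(seed=1)
losses = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)

return_periods = [10, 25, 50, 100, 250]
views = {T: var_at_return_period(losses, T) for T in return_periods}
for T, v in views.items():
    print(f"{T:>4}-year VaR: {v:8.3f}")
```

Reading the whole row of values, rather than a single α, is what lets a reviewer notice a tail that steepens unexpectedly between return periods.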

Case 1: Failure of subadditivity driven by dependence structure
Given two nontrivial marginal distributions, X and Y, and a confidence level, α, it is always possible to find a particular form of dependence that makes VaR fail to be subadditive! This is very surprising, as it shows that dependence can trump the characteristics of the marginal distributions. We shall see that this form of dependence has several peculiar characteristics.

To be concrete, think of X and Y as samples from their underlying distributions. In cat-model speak, they are samples from the yearly loss table. More specifically, suppose that we have 10,000 draws each from X and Y and that we are interested in the α = 0.99 VaR. From the definition, we can compute $v_X = \mathrm{VaR}_{0.99}(X)$ by sorting the X sample from largest to smallest and selecting the 100th observation, and similarly for Y. (In general, we would select the 10,000 × (1 − α)th largest observation.)
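The sorting recipe translates directly into code. Below is a minimal Python sketch, assuming NumPy, equally likely draws, and that 10,000 × (1 − α) is a whole number; the gamma sample and the helper name `empirical_var` are hypothetical, for illustration only.

```python
import numpy as np

def empirical_var(sample, alpha):
    """VaR_alpha from equally likely draws: sort largest to smallest and take
    the n * (1 - alpha)-th largest observation (assumes n * (1 - alpha) is a
    whole number, as with n = 10,000 and alpha = 0.99)."""
    ordered = np.sort(np.asarray(sample, dtype=float))[::-1]
    k = int(round(len(ordered) * (1 - alpha)))  # round() guards against float error
    return ordered[k - 1]

rng = np.random.default_rng(seed=2)
x = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # hypothetical sample of X
v_x = empirical_var(x, 0.99)                      # the 100th largest draw
print(v_x)
```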

It is widely appreciated that positive dependence between variables increases the risk of their sum. Therefore, a reasonable first guess for the “worst” possible dependence structure is when X and Y are comonotonic. Comonotonic means that we order the samples X and Y separately from highest to lowest and pair off the resulting elements: The largest value of X with the largest value of Y, second largest of X with second largest of Y and so forth. In many senses, this pairing does produce the riskiest sum X + Y; for example, it has the greatest variance and the worst tVaR. However, it does not result in a failure of VaR subadditivity at any threshold α! In fact, it makes VaR exactly additive: The α percentile of the sum is simply the sum of the α percentiles of X and Y. There is no diversification benefit, but there is also no failure of subadditivity. The pairing that maximizes α-VaR has a more subtle and surprising form.
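A quick numerical check of exact additivity under the comonotonic pairing, as a Python sketch with NumPy and hypothetical lognormal and Pareto samples: sorting each sample separately and adding rank by rank means the 100th largest sum is exactly the 100th largest X plus the 100th largest Y.

```python
import numpy as np

def empirical_var(sample, alpha):
    """n * (1 - alpha)-th largest observation (assumes a whole number)."""
    ordered = np.sort(np.asarray(sample, dtype=float))[::-1]
    k = int(round(len(ordered) * (1 - alpha)))
    return ordered[k - 1]

rng = np.random.default_rng(seed=3)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # hypothetical X sample
y = rng.pareto(3.0, size=10_000)                     # hypothetical Y sample

# Comonotonic pairing: sort each sample separately, then add rank by rank.
s_comonotonic = np.sort(x) + np.sort(y)

alpha = 0.99
var_sum = empirical_var(s_comonotonic, alpha)
sum_of_vars = empirical_var(x, alpha) + empirical_var(y, alpha)
print(var_sum, sum_of_vars)  # the two numbers agree
```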

To find a failure of subadditivity, let’s start by solving a more general problem: How should we combine observations from X and Y so that the α-VaR of the sum is as large as possible? That is, given our samples $x_i$, $y_i$, $i = 1, 2, \ldots, 10{,}000$, we want to form pairs $(x_i, y_{k(i)})$, which define a bivariate distribution of X and Y, so that the VaR of X + Y, which has samples $x_i + y_{k(i)}$, is as large as possible. The function $k(i)$ defines a shuffle of $\{1, 2, \ldots, 10{,}000\}$ as $i$ varies. In other words, we want the 100th largest observation of X + Y to be as big as possible.

The first thing to observe is that we should pair the 100 largest observations of X only with the 100 largest observations of Y. If a candidate pairing does not satisfy this condition, we can make a candidate that is at least as good by swapping any pairings that use observations outside the “top 100” with unused top-100 entries.

We can now abstract the problem as follows: We have n = 100 values from each of X and Y that we want to pair so as to maximize the minimum pairwise sum. How should we pair these n entries? An obvious contender is the crossed pairing: Pair the largest value of X with the smallest of Y, the second largest of X with the second smallest of Y and so forth, ending with a pairing of the smallest value of X with the largest of Y. (Order tied elements arbitrarily.) The crossed pairing makes sense; it does not “waste” any large values by needlessly pairing them together.
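The claim that crossing is optimal can be checked by brute force on small problems. The Python sketch below uses hypothetical helper names and small random integer samples, so that exhaustive search over all n! pairings stays cheap; it compares the crossed pairing’s minimum sum with the best minimum over every possible pairing. This is a sanity check, not a proof.

```python
import itertools
import random

def crossed_min(xs, ys):
    """Minimum pairwise sum under the crossed pairing: largest x with
    smallest y, second largest x with second smallest y, and so on."""
    return min(a + b for a, b in zip(sorted(xs, reverse=True), sorted(ys)))

def brute_force_max_min(xs, ys):
    """Best achievable minimum pairwise sum, found by trying every pairing."""
    return max(min(a + b for a, b in zip(xs, perm))
               for perm in itertools.permutations(ys))

random.seed(4)
for _ in range(200):
    n = random.randint(2, 6)
    xs = [random.randint(0, 20) for _ in range(n)]
    ys = [random.randint(0, 20) for _ in range(n)]
    assert crossed_min(xs, ys) == brute_force_max_min(xs, ys)
print("crossed pairing matched brute force in all 200 trials")
```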

It is easy to see that the crossed pairing gives the right answer if we are just trying to pair n = 2 values from X and Y (see Figure 1). If the X values are $x_1 < x_2$ and the Y values are $y_1 < y_2$, then there are two possible pairings: The uncrossed pairing $x_1 \leftrightarrow y_1$, $x_2 \leftrightarrow y_2$ and the crossed pairing $x_1 \leftrightarrow y_2$, $x_2 \leftrightarrow y_1$. But clearly $x_1 + y_1 \le x_1 + y_2 \le x_2 + y_2$ and $x_1 + y_1 \le x_2 + y_1 \le x_2 + y_2$, so the minimum for the uncrossed pairing, $x_1 + y_1$, is no larger than either sum in the crossed pairing; hence the minimum value for the crossed pairing is greater than or equal to that for the uncrossed pairing. It turns out that the crossed pairing is optimal for any number of points n ≥ 2 (see the Appendix for details).

Figure 1: Crossed (cyan) and uncrossed or comonotonic (gray) combinations of $(x_1, x_2)$ and $(y_1, y_2)$. The filled cyan circles represent the aggregate assuming crossed dependence, and the filled gray circles the aggregate assuming uncrossed dependence. The largest minimum value is the lower cyan circle, corresponding to the crossed arrangement.
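A small worked example with hypothetical values $x_1 = 1$, $x_2 = 3$, $y_1 = 2$ and $y_2 = 5$ shows the two minima side by side:

$$\text{uncrossed: } \min(x_1 + y_1,\ x_2 + y_2) = \min(1 + 2,\ 3 + 5) = 3,$$
$$\text{crossed: } \min(x_1 + y_2,\ x_2 + y_1) = \min(1 + 5,\ 3 + 2) = 5,$$

so crossing raises the smallest sum from 3 to 5.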

It is a general theorem, first proved independently by Makarov in 1981 and Ruschendorf in 1982, that an analog of the crossed arrangement gives the maximum VaR for any two distributions X and Y, and not just for equally likely discrete samples. The proof relies on a famous paper by Strassen written in 1965. It is surprising that this result was not known until the early 1980s.
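Informally, and assuming continuous, strictly increasing marginal distribution functions $F_X$ and $F_Y$, the analog of crossing the top $(1 - \alpha)$ share of the samples is to match the $(\alpha + u)$ quantile of X with the $(1 - u)$ quantile of Y for $0 \le u \le 1 - \alpha$. Taking the smallest of these sums suggests the following expression for the largest α-VaR over all dependence structures with these marginals; it is a sketch of the result, not its precise statement, which needs care with quantile conventions:

$$\overline{\mathrm{VaR}}_\alpha(X + Y) = \inf_{0 \le u \le 1 - \alpha} \left\{ F_X^{-1}(\alpha + u) + F_Y^{-1}(1 - u) \right\}.$$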

Getting back to our original problem, note that the crossed pairing will violate subadditivity if the samples from X and Y at or above their respective α-VaRs are all distinct, because each sum in the crossed pairing is then greater than the sum of the individual VaRs! There are several important points to note about this failure of subadditivity.

  • The dependence structure works for any nontrivial marginal distributions X and Y; it is universal.
  • The dependence structure is tailored to a specific value of α and does not work for other values of α. It will actually produce relatively thinner tails for higher values of α than either the comonotonic copula or independence. In this sense it is a peculiar example. It is not hiding dragons; in a way, it creates a phantom dragon at a particular α.
  • The implied dependence structure only specifies how the larger values of X and Y are related; for values below the α-VaRs of X and Y, any dependence structure can be used.
  • The dependence structure does not have “right tail dependence”; in fact it is the exact opposite.
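The following Python sketch, assuming NumPy, builds such a dependence structure from two hypothetical samples of 10,000 draws. The top 100 values of X and Y are crossed, while the remaining values are paired comonotonically; that second choice is arbitrary, since any pairing below the VaRs would do. With distinct values, every crossed sum exceeds $v_X + v_Y$ and every other sum falls below it, so the 100th largest sum, the VaR of X + Y, is the smallest crossed sum. The helper names are hypothetical.

```python
import numpy as np

def empirical_var(sample, alpha):
    """n * (1 - alpha)-th largest observation (assumes a whole number)."""
    ordered = np.sort(np.asarray(sample, dtype=float))[::-1]
    k = int(round(len(ordered) * (1 - alpha)))
    return ordered[k - 1]

def crossed_tail_pairing(x, y, alpha):
    """Samples of X + Y when the top n * (1 - alpha) values are crossed
    (largest x with smallest of the top y's) and the rest are paired
    comonotonically; the body could use any pairing."""
    n = len(x)
    m = int(round(n * (1 - alpha)))
    xs, ys = np.sort(x), np.sort(y)          # ascending order
    x_top, y_top = xs[n - m:], ys[n - m:]    # top m of each, ascending
    body = xs[: n - m] + ys[: n - m]         # comonotonic below the VaRs
    tail = x_top + y_top[::-1]               # crossed within the top m
    return np.concatenate([body, tail])

rng = np.random.default_rng(seed=5)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # hypothetical X sample
y = rng.gamma(shape=2.0, scale=1.0, size=10_000)     # hypothetical Y sample

alpha = 0.99
s = crossed_tail_pairing(x, y, alpha)
var_sum = empirical_var(s, alpha)
sum_of_vars = empirical_var(x, alpha) + empirical_var(y, alpha)
print(f"VaR(X + Y) = {var_sum:.3f} > VaR(X) + VaR(Y) = {sum_of_vars:.3f}")
```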

The crossed dependence is hard to generalize to three or more marginal distributions. Whereas it is easy to create maximal positive dependence for any number of variables (the comonotonic copula), it is much harder to create maximal negative dependence among three or more variables. The reason is that if X and Y are negatively correlated and Y and Z are negatively correlated, then X and Z will tend to be positively correlated. Recently, Embrechts, Puccetti and Ruschendorf (2013) have shown that iteratively making each marginal cross with the sum of the other marginal distributions gets close to the optimal solution and provides a usable algorithm to compute the worst VaR dependence structure for n ≥ 3 variables. Their method is called the “rearrangement algorithm,” which will be explained in a future column. Future columns will also explore skewness and thick-tailed exceptions to subadditivity.
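For readers who want to experiment in the meantime, here is a simplified Python sketch of the idea just described, assuming NumPy. It is not the published algorithm in full; among other refinements, the published method uses two discretizations to bracket the answer. Column j of the input matrix holds discretized upper-tail quantiles of the jth marginal; each column is repeatedly reordered to run opposite to the sum of the others, and the smallest row sum approximates the worst-case VaR. With two columns, one pass reproduces the crossed pairing. The function name and the three-Pareto example are hypothetical.

```python
import numpy as np

def rearrangement_worst_var(tail_quantiles, max_iter=1000):
    """Simplified rearrangement-style search for the worst-case VaR of a sum.

    tail_quantiles: (N, d) array whose column j holds N discretized quantiles
    of marginal j from its upper (1 - alpha) tail. Each column is reordered to
    be oppositely ordered to the sum of the other columns, sweeping until no
    column changes (or max_iter sweeps pass). The smallest row sum then
    approximates the worst-case VaR_alpha of the sum.
    """
    m = np.array(tail_quantiles, dtype=float)
    for _ in range(max_iter):
        changed = False
        for j in range(m.shape[1]):
            others = np.delete(m, j, axis=1).sum(axis=1)
            # Largest value of column j goes to the row whose other columns
            # sum to the least, second largest to the next row, and so on.
            new_col = np.empty_like(m[:, j])
            new_col[np.argsort(others, kind="stable")] = np.sort(m[:, j])[::-1]
            if not np.array_equal(new_col, m[:, j]):
                m[:, j] = new_col
                changed = True
        if not changed:
            break
    return m.sum(axis=1).min()

# Hypothetical example: three Pareto marginals with F(x) = 1 - x**(-2), x >= 1.
alpha, n_points = 0.99, 200
levels = alpha + (1 - alpha) * (np.arange(n_points) + 0.5) / n_points
pareto_tail = (1 - levels) ** (-1 / 2.0)      # quantiles at those levels
tail = np.column_stack([pareto_tail] * 3)

worst = rearrangement_worst_var(tail)
sum_of_vars = 3 * (1 - alpha) ** (-1 / 2.0)   # comonotonic (additive) VaR
print(f"approx. worst VaR: {worst:.2f}, sum of VaRs: {sum_of_vars:.2f}")
```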

Editor’s Note: An appendix to this Explorations column is available for download.


Embrechts, Paul, Giovanni Puccetti, and Ludger Ruschendorf. 2013. “Model uncertainty and VaR aggregation.” Journal of Banking and Finance 37(8): 2750–64. doi:10.1016/j.jbankfin.2013.03.014.

Makarov, G. D. 1982. “Estimates for the Distribution Function of a Sum of Two Random Variables When the Marginal Distributions Are Fixed.” Theory of Probability & Its Applications 26(4): 803–6.

McNeil, Alexander J., Paul Embrechts, and Rudiger Frey. 2005. Quantitative Risk Management: Concepts, Techniques, and Tools. Princeton University Press.

Ruschendorf, Ludger. 1982. “Random Variables with Maximum Sums.” Advances in Applied Probability 14(3): 623–32. doi:10.2307/1426677.

Strassen, V. 1965. “The Existence of Probability Measures with Given Marginals.” The Annals of Mathematical Statistics 36(2): 423–39.