Kicking off the first annual meeting of the Casualty Actuarial Society’s second century, outgoing president Bob Miccolis announced the formation of the CAS Institute (abbreviated “iCAS”), a subsidiary that (to start) will offer credentials in data science. At the meeting the questions “what is data science, anyway?” and “how does it relate to actuarial science?” were on many people’s minds. As an actuary whose job title includes “data scientist” and as one with a long-standing interest in the history and philosophy of science, I cannot resist weighing in.
Let’s start with our own field. I often hear actuarial science described as a branch of applied mathematics focused on modeling and pricing insurance risks. I used to parrot this implicit definition myself, but now find it way too narrow a frame. Insurance was an early adopter of probabilistic and statistical methods because of a distinctive feature of insurance products: One does not know the cost of selling an insurance contract at the time of sale. Therefore, costing insurance contracts and reserving for insurance liabilities involve more than accounting; they involve statistical inference and forecasting. Actuarial science is inherently a form of data science.
To be sure, insurance has many distinctive aspects. But the use of probability and statistics is no longer one of them. To illustrate, consider two stories. The first is classic adverse selection: An insurer that uses (say) credit score or (say) chess club memberships to selectively market insurance to young male motorcycle drivers can adversely select against its competitors. It can skim off the best risks and offer them attractive rates, while its competitors must raise the rates for their deteriorating books of business. The second story is Michael Lewis’ Moneyball. Billy Beane, the general manager of the cash-strapped Oakland A’s, realized that by basing scouting decisions on data analysis, he could hire talented baseball players that richer teams were blind to. (The “blindness” was in the minds’ eyes of the richer teams’ baseball scouts, who systematically used biased unaided judgment, rather than publicly available data, to make multimillion dollar decisions.)
Each story involves what behavioral scientists and economists call decision making under uncertainty. At the time of sale, we don’t know which driver will crash his motorcycle, and at the time of hire, we don’t know which employee will perform well or poorly on the job. The spoils go to the competitor who makes the best use of data. Just as more sophisticated use of data enables nimble insurers to profitably grow, it enabled the cash-poor Oakland A’s to rise up in the ranks. Paraphrasing Michael Lewis, better, data-enabled management can run circles around taller piles of cash. More generally, analytically sophisticated competitors can thrive in inefficient markets, improve inefficient business processes and sometimes even achieve breakthrough innovations.
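The adverse selection story can be made concrete with a little arithmetic. The sketch below is purely illustrative: the segment sizes, expected losses, the segmented rate and the share of riders who switch are all hypothetical numbers chosen for the example, not figures from any real book of business. It shows how a competitor that charges one pooled rate sees its break-even rate rise once a better-informed insurer lures away some of the low-risk riders.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    riders: int
    expected_loss: float  # expected annual claim cost per rider (hypothetical)


def pooled_rate(segments):
    """Break-even rate for an insurer that cannot tell the segments apart."""
    total_loss = sum(s.riders * s.expected_loss for s in segments)
    total_riders = sum(s.riders for s in segments)
    return total_loss / total_riders


# Hypothetical book of young motorcycle riders held by the competitor.
low = Segment("low risk", riders=600, expected_loss=400.0)
high = Segment("high risk", riders=400, expected_loss=900.0)

# The competitor charges everyone the pooled break-even rate.
rate_before = pooled_rate([low, high])

# The segmenting insurer can identify low-risk riders and offers them a rate
# between their true cost and the pooled rate (assumed values).
segmented_rate = 500.0
switch_share = 0.5  # assumed fraction of low-risk riders who switch

switchers = int(low.riders * switch_share)
remaining_low = Segment("low risk", low.riders - switchers, low.expected_loss)

# The competitor's remaining book is now riskier on average.
rate_after = pooled_rate([remaining_low, high])

# Expected margin earned by the segmenting insurer on the riders it won.
segmenter_margin = switchers * (segmented_rate - low.expected_loss)

# Expected shortfall for the competitor if it keeps charging the old rate.
competitor_shortfall = (remaining_low.riders + high.riders) * (rate_after - rate_before)

print(f"Competitor break-even rate before: {rate_before:.2f}")
print(f"Competitor break-even rate after:  {rate_after:.2f}")
print(f"Segmenting insurer expected margin: {segmenter_margin:.2f}")
print(f"Competitor expected shortfall at old rate: {competitor_shortfall:.2f}")
```

Under these assumed numbers the competitor's break-even rate climbs from 600 to roughly 686, which is the "deteriorating book of business" in miniature. If the competitor raises its rate to compensate, still more low-risk riders have reason to leave.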
These are two classic illustrations of data science enabling better business decisions; they can equally well be viewed as examples of what I think of as “greater actuarial science.” The idea is threefold: First, 21st century actuarial science takes on board the continually evolving tools and methods of modern data science. Second, greater actuarial science is not restricted to the insurance industry; it is about professionals making better, more evidence-based, decisions under uncertainty in a variety of private and public sector domains. Third, though quantitative, greater actuarial science is not a branch of applied mathematics; it is an applied quantitative social science, akin to, and overlapping with, such fields as marketing science, people analytics, behavioral economics, and personalized health and wellness.
Data Science
Following Drew Conway’s famous Venn diagram,1 data science is often described as the intersection of mathematical and statistical methods, computing with data and domain knowledge. Data science encompasses each of what the late Leo Breiman called the “two cultures”2 of statistics: using data to estimate parametric models and applying non-parametric “statistical learning” methods to rich datasets (big data). Actuarial applications of generalized linear models, copula models, multilevel/hierarchical models and Bayesian data analysis all fall in the former category. Thanks to the skewed nature of insurance losses, credibility issues, the heterogeneous and/or emergent character of many insurance risks, and the need to forecast uncertain quantities into the future, the use of what statisticians call “generative models” will always be core to our field. But it is equally true that insurance data scientists routinely use such statistical learning techniques as random forests, boosted trees and regularized regression to build better pricing, underwriting, claim triage and price elasticity models.
Contiguous Disciplines
In recent decades, the availability of computing power, data and open-source statistical and statistical learning algorithms has grown at a roughly exponential rate. Perhaps the same could be said of the awareness of the power of data-driven decision making in many areas of business and public policy. This has resulted in a rapidly growing demand for creative professionals who are equally fluent in the language of business and the methods of data science. Data science actuaries who have built claim fraud, customer churn, price elasticity, predictive hiring or customer segmentation solutions for insurance organizations can do the same for noninsurance organizations. In my own experience, seasoned data scientists can successfully work outside their domains by collaborating with nontechnical subject matter experts. Doing so requires more than technical skills alone; also required are creativity and associative thinking, the intellectual curiosity needed to learn new domain-specific concepts, and the ability to communicate with colleagues who are nontechnical or specialists from other domains. In short, the data science revolution enables actuaries of a certain stripe not just to deepen their foundations, but also to expand their professional footprint to include new applications both within and beyond insurance.
Computational Social Science
In many ways, insurance risks pertain to physical things: expensive cars cost more to repair; wood frame houses are more likely to burn down than brick ones; and injured workers with multiple comorbidities are likely to be out of work longer. And yet insurance company underwriting, fraud investigation, marketing, strategic, claims adjusting and hiring decisions are made by people subject to both cognitive biases and organizational pressures. Insureds’ purchasing decisions are influenced by both the way choices are arranged (the “choice architecture”) and such cognitive biases as the availability heuristic3 (one’s estimate of an event’s probability is often a function of how easily it comes to mind). Furthermore, previously unimaginably detailed analyses of insureds’ risk behavior are now possible thanks to the “digital breadcrumbs” we all leave behind as we go about our digitally mediated existences. All of which is to say: There is more to “greater actuarial science” than big data and algorithms. Twenty-first century actuarial science should be viewed as one of the social sciences, not a branch of applied mathematics.
1 http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
2 https://projecteuclid.org/euclid.ss/1009213726
3 http://www.casact.org/community/affiliates/CANE/0412/Guszcza_Rethinking_Rationality.pdf