
This fall the CAS released four new reports in Phase II of the CAS Research Paper Series on Race & Insurance Pricing. While three of the papers focus heavily on regulation, model governance and approaches to avoid bias in pricing, the fourth presents a case study with actual data and aims to give a real-life example of how new variables could remove bias in pricing.
“Balancing Risk Assessment and Social Fairness: An Auto Telematics Case Study” by Jean-Philippe Boucher, Ph.D. and Mathieu Pigeon, Ph.D., does not directly explore the question of race in pricing but does look at other potentially sensitive variables such as gender, marital status, age, credit score and territory of residence to see if their ability to differentiate expected cost in auto insurance pricing can be replaced through the use of newer telematics data. The authors point out that there are concerns about the strong links between ethnicity and variables such as territory, marital status and credit score.
There are many good reasons to read this paper:
1) The paper includes a short overview of the current state of telematics-based pricing in the Canadian market and dispels the popular notion that telematics-based pricing programs will be chosen primarily by drivers who know they are good drivers.
2) There is a nice overview of the types of telematics variables available in the data and ways to understand and normalize them for analysis. The paper also shows the relationship of these variables to frequency and severity, as well as how including telematics variables changes the residual contribution of sensitive variables to overall model lift.
3) It provides a good example of the practical application of newer approaches to building pricing models, including GLM-net (GLMs with an elastic net penalty term) and a tree-based XGBoost (gradient boosting) approach. The traditional GLM does not do well at fitting models with highly collinear variables, but as the paper demonstrates, some of the variables an insurer may wish to add with telematics are highly correlated with existing pricing variables. This paper could be read as an example of the value that can be gained by moving to these more complex approaches, some of which are now available in off-the-shelf actuarial pricing software packages. The authors share their theoretical approach and tuning process for the models selected and also share their code in a GitHub project.
4) In addition to the paper’s narrative conclusions and charts, the authors share detailed assumptions and a link to their GitHub project and associated website, which allows readers to explore the code and the synthetic dataset. A reader who wanted to try their hand at these newer methods could use this paper as a guide to learn more.
5) On the website associated with the paper, readers can see the initial project proposal that was submitted. This could serve as a template for readers who are interested in responding to future research proposal requests in hopes of furthering the actuarial literature.
6) It’s not as long as it looks! Much of the length comes from useful charts and tables as well as appendices describing the modeling approach and giving brief synopses of prior papers on relevant topics. You might find that the small amount of time spent reading the paper satisfies your bias topics continuing education requirements.
My reading of this paper got me thinking about other questions that are not directly answered in the results. A curious reader might be able to explore these questions in the synthetic data used for the study or when doing analyses on a non-synthetic dataset:
- When we include telematics variables, can we show the degree to which pricing within various sensitive classes becomes more equitable by identifying the higher-risk drivers within each class (e.g., unmarried men aged 20-25 with bad credit scores) rather than putting all of them in a single unfavorable rate bucket?
- Are there special pockets of drivers that stick out as having a significant change in expected cost that would be valuable for an insurer to understand either in terms of segmentation of price or in marketing or program design?
- Could an actuary cite a paper like this to convince other stakeholders of the value of moving to more complex modeling techniques or of including telematics data? Here I’m thinking of two main categories of stakeholders: insurance companies and regulators.
I personally found that reading this paper was worth my time, and I commend the authors and, in turn, the CAS for sponsoring this research and making available these results, synthetic data and code.
To read this paper and others in the CAS Research Paper Series on Race & Insurance Pricing, visit the CAS website at https://www.casact.org/publications-research/research/research-paper-series-race-and-insurance-pricing.