Can a Machine Learn to Do Actuarial Work? Is That the Right Question?

This essay is one of five essays that were submitted in response to the CAS Publications Department’s call for essays on the “Intersection of Artificial Intelligence and Actuarial Science.” See the CAS 2024 Summer E-Forum for three other essays, including the prize-winning essay by Ronald Richman.

During the COVID lockdown, I found myself unemployed as an actuary for the first time in my career when my previous employer surprised us by shutting down our division. Although I could have taken a different position in the company, I really loved what I was doing and had no interest in doing what I consider more mundane work. Yes, I'm a bit of an actuarial snob, and frankly this has been a lucrative career for me. But after working for 30 years in a dynamic profession that exercised my brain every day, it was hard to just hit the brakes and learn to play pickleball. Instead, I enrolled in an online data science boot camp with Vanderbilt University and learned how to pickle a machine learning (ML) model. Yes, that is a thing. Google it!
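
For the curious, here is a minimal sketch of what pickling a model looks like in Python; the model and data are made up for illustration and are not from any real project:

```python
import pickle

from sklearn.linear_model import LinearRegression

# Toy training data: exposure vs. losses (invented for illustration).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [110.0, 205.0, 310.0, 390.0]

model = LinearRegression().fit(X, y)

# Serialize ("pickle") the fitted model to disk ...
with open("loss_model.pkl", "wb") as f:
    pickle.dump(model, f)

# ... and later load it back to score new risks without refitting.
with open("loss_model.pkl", "rb") as f:
    reloaded = pickle.load(f)

print(reloaded.predict([[5.0]]))
```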

My interest in learning what those data scientists were doing hatched when my prior employer was looking for projects that their new data science team could take on. Although they had hired a team of very bright data scientists, management didn't seem to know what to do with them! Recently, I ran into two similar situations at other companies. I spoke with a young man who had been hired as a data scientist at a very large multinational company. He said that there was no job description and the company didn't know what to do with him, so he was considering the actuarial track instead. Another colleague had a large group of data scientists who didn't understand that lots of data didn't necessarily mean lots of valuable insights. Oh yes, they could program beautiful interfaces, but they just weren't doing anything to help analyze the product. It seems that the insurance industry, in some cases, may feel it needs to get on the data science bandwagon but doesn't know what to do once on board.

In my case, I had a great project that I did not have time to do myself, so I jumped on the offer to get some free help. For this project, in collaboration with the underwriters, I had already created a rating model that uses government data to calculate a rating score for individual risks. The underwriters were spending many hours looking up the data online and then manually typing it into the rating model. With a little digging, I found the online database containing the needed information. In collaboration with the data scientists, we added a button to the rating model that goes out to the internet, grabs the data directly, and loads it into the rating model, thereby saving the company hours of underwriter work on each account.
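
I can't share the company's actual code, but the pattern is simple. The sketch below is a hedged illustration of that button's job, using the popular requests library; the URL and field names are hypothetical placeholders, not the real government database:

```python
import requests


def fetch_rating_input(risk_id: str) -> float:
    """Pull one rating input from a public API instead of retyping it."""
    # Placeholder endpoint; the real project used a government database.
    url = f"https://api.example.gov/records/{risk_id}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    record = response.json()
    # "score_input" is a hypothetical field name for illustration.
    return float(record["score_input"])


# The underwriter clicks the button; a script calls this function and
# drops the value into the rating model, with no manual lookup or typing.
```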

So why am I sharing this with other actuaries? Well, "data science" may be last year's buzzword for technology, but AI is the current one. From my experience working with data scientists, I could see that data science technology could really help with the efficiency of both underwriting and actuarial work. However, it was clear that neither the underwriters nor the data scientists could identify how to bring it all together. It is the actuaries who have the big picture and really need to be involved with all this evolving technology. As my underwriter colleague puts it, "The main problem is data scientists need to have context with regards to the data they are using. There needs to be a real understanding of the risk and insurance product." Actuaries learn this context through the exam process and master it as part of their jobs. After the immediate success of this collaborative project, I became curious to learn about the technology that data scientists are using. Little did I know that the data science boot camp I signed up for would also lead me down the path of machine learning, which is the backbone of AI. I decided to write this essay to share my observations on machine learning and the role of the actuary.

Background
It is easy to find a definition of artificial intelligence online. According to Britannica, "Research in AI has focused chiefly on the following components of intelligence: learning, reasoning, problem solving, perception, and using language." Frankly, when I was studying machine learning, I wasn't thinking that it was the same as the artificial intelligence we hear about today. However, according to Google Cloud, "ML is an application of AI that allows machines to extract knowledge from data and learn from it autonomously" (Google, n.d.). One definition of machine learning I found while studying how to deploy an ML model is "Machine Learning models are powerful tools to make predictions based on available data" (Sahakyan, 2019). This definition should sound familiar to actuaries, since that is exactly what we do — use available data to predict what next year's results, loss costs, trends, etc., will be. We could say that actuaries are powerful humans who make predictions based on available data. So, it seems that maybe machine learning might be used to replace actuaries . . . or maybe not.

In case you don’t know, machine learning involves dividing a dataset into a training portion and a test portion , programming around the training dataset, running the program on the test dataset and evaluating the results of the model using statistical analysis. This process is just like what we do every day as actuaries. We study the data we have, which is our training dataset and come up with models and equations that we then apply to future or new data to make predictions and test the results to see if the model works. It has been shown that AI can make art with DALL.E 2 and ChatGPT can write a great term paper and create an outline for a book. But can AI do actuarial work? I was intrigued.

The boot camp experience
At the end of the boot camp, we had to do a group project using machine learning to create a tool with which users could interact and retrieve desired information. We had to find an online database that was available to the public. Our team chose to use the Airbnb API (application programming interface), which provides information on historical rentals. The first takeaway from this is that there are many datasets available everywhere, for free or for a fee. For example, if you want Census data from the government, you can directly query the data using their APIs rather than downloading Excel files. We also learned that programmers (and the boot camp instructor) used Google as their main source for figuring out how to code something or find data. So just Google whatever data you are looking for followed by "API."
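
As an illustration of pulling government data directly, here is a hedged sketch of a Census API query in Python. The dataset path and variable names below are illustrative; check the Census Bureau's API documentation for the exact endpoints you need:

```python
import requests

# 2020 decennial census redistricting file (illustrative endpoint).
url = "https://api.census.gov/data/2020/dec/pl"
params = {
    "get": "NAME,P1_001N",  # state name and total population (illustrative)
    "for": "state:*",
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

# The API returns a list of lists, with the header row first.
rows = response.json()
header, data = rows[0], rows[1:]
print(header)
print(data[:3])
```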

Using publicly available free data for this project was interesting because the other team members did not seem to have any experience using datasets that had not been vetted and cleaned. Part of data science is using programming skills, usually in Python, to clean the data. This means you have to identify what is wrong with the data in the first place before you can clean it. I found that the other members were not too interested in that part of the project. We ran the common data-cleaning algorithms we had learned. But after that, the group kept trying to come up with different ways to look at the data to see if they could improve the statistics, rather than looking at the underlying data itself to see if, in fact, it was good and could be used for predicting anything. When I did a deep dive into the data, it was full of errors, such as duplicate entries and outliers that clearly were not correct. Even though data scientists have tools to clean the data, it takes asking the right questions to find out if there is a problem. For example, once we saw that the averages looked strange, we needed to check the full range of values. When you find a location that rents for $100,000 a night, you might want to question that! (It turned out to be a mansion in Nashville that rented out for movie/music shoots.) So just as in actuarial work, someone must look closely at the data to make sure that it is descriptive of what is being estimated. But that is not usually part of the data scientist's or the programmer's job. So, who is responsible for data integrity?
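
Here is a minimal sketch of that kind of cleaning and sanity checking in pandas; the table is a toy stand-in for the Airbnb data, not the real dataset:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "listing_id": [1, 1, 2, 3, 4],
        "nightly_rate": [150.0, 150.0, 95.0, 100000.0, 210.0],
    }
)

# Drop exact duplicate entries.
df = df.drop_duplicates()

# Don't stop at the average; look at the full range of values.
print(df["nightly_rate"].describe())

# Flag suspicious outliers for human review rather than silently
# deleting them. (Remember, the $100,000 "outlier" turned out to be
# a real mansion.)
suspect = df[df["nightly_rate"] > df["nightly_rate"].quantile(0.99)]
print(suspect)
```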

The end result of the project was that we developed a nice interface with which users could interact, but the statistics underlying the information it provided showed little credibility. The machine didn't learn very well because the input data was poor. However, there is no reason or way for any user to know that it is a statistically poor app, which means the results cannot be trusted! This, of course, is the problem with AI models — which is the same as with any actuarial insurance model: garbage in, garbage out. Think about this real-world example. When pricing large accounts, my experience was that the underwriters would remove the largest claims because they felt that these types of claims were "one-offs" and would never happen again. On one account, as usual, we had only 10 years of data, and there was a very large claim. The underwriters were arguing to remove this claim. So, when I went back through 20 years of submissions (30 years of data), I found that this one risk had an $80 million claim every three years! And lo and behold, the underwriter experienced an $80 million loss on the risk after they wrote it. What if we had used the underwriters' dataset to train a model to price these risks in the future?

In machine learning, there are all sorts of ways to attack the learning problem and all sorts of statistics with which to evaluate the results. In the project above, we used multiple types of ML models and all sorts of statistics to evaluate them. But none of the different models or statistics ever improved the results, because the input data was faulty. This is the problem with all AI. You can go out and grab data everywhere via databases or web-scraping tools (yes, we learned that too!), but if the data is not properly vetted, you just don't know what you are getting. We were able to see the results of other groups' projects, which had sleek front ends with lots of impressive graphics. There is something very alluring about a dashboard with all the bells and whistles and a sleek appearance. But when the data is fundamentally flawed, that front end is all smoke and mirrors.
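
To make the point concrete, the sketch below fits two different model types to data that is pure noise. The data and model choices are illustrative; the lesson is that no algorithm or evaluation statistic rescues data with no usable signal:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 3))
y = rng.normal(0, 1, size=300)  # pure noise: nothing real to learn

for name, model in [
    ("linear regression", LinearRegression()),
    ("random forest", RandomForestRegressor(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    # Every model scores near (or below) zero; switching algorithms
    # or metrics does not fix faulty input data.
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")
```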

AI has created a whole new field of employment: prompt engineering. AI is so dependent on being asked the right question in order to give an intelligent response that companies are paying people a lot of money to help design how to ask the chatbot the right questions. This is recognition, to some extent, that the data underlying these AI applications is not complete, and one needs to be very specific about how a question is phrased in order to find the answer. I recommend that everyone go and try ChatGPT themselves. If you ask it something like, "What are current actuarial loss cost trends?" the answer is: "I don't have real-time data, and my training only includes information up to January 2022."

ChatGPT goes on to make a further recommendation:

Keep in mind that actuarial analysis is a complex field that involves predicting future events based on historical data and statistical models. Therefore, consulting with actuaries or experts in the field may provide more detailed and accurate insights into the specific trends you are interested in.

So, currently your actuarial job is secure, at least from ChatGPT. On the other hand, it does a very nice job of giving you a book outline on any subject. I asked it to write a book about my grandmother, who has an unusual name. It came back with an interesting outline about a person living in outer space.

To be fair, ChatGPT provides a disclaimer about data integrity.

Remember that while I strive to provide accurate and helpful information, I may not always have the most up-to-date or real-time data. If you have a specific task or question in mind, feel free to let me know how I can assist you!

Now that I have learned some of these technologies, I can see how we can use them to streamline a project I have worked on for the last 20 years. I recently published the trend model I developed for medical malpractice indemnity trends. Using data science tools, my colleague, Kristen Clark, and I have put together a Python-based model that organizes and cleans all the National Practitioner Data Bank data, including accumulating related claims and Fund state claims, and then produces a trend analysis. It used to take two months of work to process one state at a time. Now, it is done in 15 minutes. Our next step is to create an ML model into which we feed additional external data to see if we can teach it to predict when the indemnity loss cost will increase again. As AI methods and models become more mainstream, I hope that actuaries will take the time and effort to learn them through a data science or AI boot camp. The CAS currently offers seminars on some of these tools, which is a valuable resource.
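
As a generic illustration (and emphatically not our published model), a simple log-linear indemnity trend fit in Python looks like this, with invented severity values:

```python
import numpy as np

years = np.arange(2014, 2024)
# Illustrative average indemnity values (in $000s), not real NPDB results.
severity = np.array(
    [310, 318, 325, 340, 352, 360, 375, 381, 402, 415], dtype=float
)

# Log-linear fit: ln(severity) = a + b * year, so the implied
# annual trend is exp(b) - 1.
b, a = np.polyfit(years, np.log(severity), 1)
print(f"Implied annual trend: {np.exp(b) - 1:.1%}")
```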

Conclusion
AI and data science techniques will have a huge impact on the actuary of the future. We will no longer be spending days and nights programming and cleaning data, because we will have data scientists to do that for us. Actuaries will be the insurance professionals who design the questions and structure the problems that artificial intelligence will be used to help solve, because we have the context with which to do that. With the advent of more advanced data science and AI tools, the actuarial job may very well move to what it should be: actuaries spending their time doing the thinking and analytics and letting the machine do the processing work.

Actuaries need to keep up with new technologies and learn to employ them in their own jobs. So instead of being fearful of what AI might do in the future, actuaries should consider learning about the technology and seeing how it can make their jobs more efficient. Quite possibly, the role of the actuary in the future may include serving as chief data integrity officer. ●

Betsy Wellington, FCAS, is a retired actuary and independent consultant.

Works cited
Google. (n.d.). Artificial intelligence vs. machine learning. Retrieved from https://cloud.google.com/learn/artificial-intelligence-vs-machine-learning

Sahakyan, E. T. (2019, September 19). Create an API to deploy machine learning models using Flask and Heroku. Retrieved from https://lizzie.codes/author/lizziecodes/

Sidebar:

I have to say that the boot camp was quite intense in terms of how much we learned in a 10-week span. Here is a sampling of tools and resources we explored:

• Anaconda
• APIs
• AWS
• Cloning repositories
• Deploying models
• Flask
• Git Bash
• GitHub
• Hadoop
• Heroku
• HTML
• JavaScript
• JSON
• Jupyter Notebook
• Machine learning
• Matplotlib
• Neural networks
• pandas
• pgAdmin
• Pickling an ML model
• pip install
• Plotly
• PostgreSQL
• Python
• R
• Regex
• scikit-learn
• Spark
• SQL
• Tableau
• Unsupervised learning
• VBA
• WeatherPy
• Web apps
• Web scraping

If this list seems like a foreign language to you, I recommend you consider a boot camp experience. You can learn a lot and put it to use quickly. Although I am not an expert in any of these applications now, I was able to see how much these tools could help in an actuarial environment. I have never counted how many programming languages I have had to learn over the last 30 years, but it was a lot. It has become too easy to rely simply on Excel to do everything in our jobs. These new technologies can be very useful and can in fact make the actuarial job much more efficient.