If you are reading this article, it is most likely because it contains the words “big data,” even if you are not sure what those words mean.
Take heart. The latest information revolution has a lot of people trying to sort hope from hype. Two actuarial thought leaders lent their perspectives on big data and its massive potential to more than 2,000 actuaries at the opening session of the Casualty Actuarial Society’s Centennial Celebration and Annual Meeting in New York in November.
They also touched on big data’s challenges and the privacy concerns the topic raises.
First, here is some hype, culled from the media by James Guszcza, FCAS, U.S. chief data scientist at Deloitte:
“Data is the new oil.”
“Big data is one of the greatest sources of power in the 21st century.”
“In the past few years we have produced more data than in all of human history.”
The rhetoric sounds overblown, but maybe it is not — not entirely, anyway. It is certainly true that we now have unprecedented ability to gather and store staggering amounts of information. We have computers and algorithms that can sift, arrange and analyze the data in ways that did not exist even a few years ago.
“It’s easy to dismiss all of this as a lot of hype,” said Guszcza, “but there’s something new here.”
Guszcza offered three definitions of big data:
1) A dictionary-style definition: “Data sets with size beyond the capability of standard IT tools to capture and analyze.”
2) A conceptual definition: “Data with a high volume, plus velocity and variety.” Velocity, said Guszcza, means “it comes at you all the time”; variety means not just numbers, but text, photos and videos.
3) A half-joking definition: “Anything that doesn’t fit in Excel.”
How big is big? Around a petabyte, said casualty actuary Steve Mildenhall, FCAS. That is a million gigabytes, or all of the hard drives of about 10,000 laptops combined.
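The laptop comparison is easy to check, assuming decimal (SI) units. The figure of roughly 100 GB per laptop that falls out is an inference from the quoted numbers, not a figure Mildenhall gave.

```latex
1\ \text{PB} = 10^{15}\ \text{bytes} = 10^{6}\ \text{GB},
\qquad
\frac{10^{6}\ \text{GB}}{10^{4}\ \text{laptops}} = 100\ \text{GB per laptop}
```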
Big data is different, Mildenhall said, not only because there is more of it.
Traditional data were expensive to collect and store, said Mildenhall, who is CEO of analytics for Aon. At the same time, the information was valuable in pricing and underwriting, which justified the time and expense of gathering it.
Insurance claims are a good example of traditional data, Mildenhall said. Adjusters update claim estimates regularly. Actuaries summarize that information quarterly and then estimate ultimate claims. The process is laborious and each step is taken with great care, but the results are quite valuable.
By contrast, he said, big data (Facebook likes, Twitter hashtags and smartphone pings) is cheap or free, but no individual datum is particularly valuable.
Invoking an image from eminent statistician David Hand, Mildenhall said, “Raw data is like iron ore, a large, bulky useless thing.” The tweets of a teen, for example, are worthless unless combined with a million others. Like drops of water in an ocean, they have little meaning until you see the wave they form together.
Mathematical models — the actuary’s specialty — detect the wave.
Right now the most famous wave detectors come from Silicon Valley. Google, for example, noticed that it could spot where people were getting the flu faster than government researchers could. The company cleverly tracks spikes in searches for flu-related terms such as “fever” or “cough.”
The best analysis goes past the obvious, Mildenhall said. If a Netflix bot does no more than recommend sci-fi action movies to fans of sci-fi action, it is not doing much of a job. It has to find not only something a viewer might like, but also something the viewer would not otherwise have considered.
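To make the idea of detecting a “spike” concrete, here is a minimal Python sketch. It is not how Google Flu Trends worked; it simply flags any week whose count of flu-related searches rises well above its recent baseline. The function name `flag_spikes`, the four-week window, the two-standard-deviation threshold and the weekly counts are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def flag_spikes(weekly_counts: list[float], window: int = 4, k: float = 2.0) -> list[int]:
    """Return indices of weeks whose count exceeds the trailing mean by more than k standard deviations.

    Hypothetical toy rule for illustration only. `window` must be at least 2,
    because statistics.stdev needs two or more data points.
    """
    spikes = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]          # the preceding `window` weeks
        baseline, spread = mean(history), stdev(history)
        if weekly_counts[i] > baseline + k * spread:   # unusually high relative to recent weeks
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Made-up weekly counts of searches for terms like "fever" or "cough".
    searches = [100, 104, 98, 102, 101, 99, 180, 240, 103]
    print(flag_spikes(searches))  # → [6, 7]
```

On the made-up series, the rule flags only the two surge weeks. A trailing window keeps the baseline current, but the threshold and window are arbitrary and would need tuning against real outcomes.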
The key is turning the information into insight, using analysis and models, which is familiar territory for property-casualty actuaries. Driving behavior has been linked to age and gender for decades, a connection so well known today that it seems obvious. More recently, credit score data has been linked to auto insurance claims. That link was not well understood at first, but today credit data is increasingly viewed as reflecting underlying behavioral traits that can also show up as “risky” driving. Noting that data volume is an imperfect proxy for useful information, Guszcza suggested that “behavioral data” might be a more useful organizing principle than big data for thinking about the “digital breadcrumbs” that people increasingly leave behind as they go about their daily activities.
Distilling raw data into actionable insight won’t always be as straightforward as some think.
Far from being a panacea, big data can actually exacerbate common pitfalls of data analysis. As an example, Guszcza again pointed to Google Flu Trends. Though valuable, the algorithm began to overestimate flu outbreaks because no method was in place to recalibrate the model as the Internet search behavior that generated its data changed.
Another pitfall: if an analyst tests enough hypotheses, random chance alone makes it likely that some relationships will appear significant even when nothing is actually happening. This is a major reason many medical, psychological and sociological findings fail to replicate, and as big data becomes more prevalent, the risk of such false discoveries grows. To illustrate, Guszcza pointed to a peer-reviewed publication reporting that women tended to wear red or pink when they were at peak fertility. When evaluating such findings, it is worth asking what other hypotheses might have been tested along the way.
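The multiple-testing trap can be sketched in a few lines of Python. This is a minimal illustration, assuming every hypothesis tested is actually false (pure noise), the tests are independent, and “significant” means a p-value below the conventional 0.05; the function names are hypothetical. Under a true null hypothesis, a well-calibrated p-value is uniformly distributed between 0 and 1, so the sketch draws p-values directly rather than simulating raw data.

```python
import random

ALPHA = 0.05  # conventional significance threshold (assumed for illustration)


def prob_at_least_one_false_positive(num_tests: int, alpha: float = ALPHA) -> float:
    """Chance that at least one of num_tests independent tests of true-null hypotheses looks 'significant'."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_discoveries(num_tests: int, alpha: float = ALPHA, seed: int = 0) -> int:
    """Count spurious 'discoveries' when every hypothesis tested is actually null.

    Null p-values are drawn uniformly from [0, 1); any value below alpha
    counts as a false discovery.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    p_values = [rng.random() for _ in range(num_tests)]
    return sum(p < alpha for p in p_values)


if __name__ == "__main__":
    for m in (1, 20, 100, 1000):
        print(
            f"{m:>5} tests: P(at least one false positive) = "
            f"{prob_at_least_one_false_positive(m):.3f}, "
            f"simulated false hits = {simulate_false_discoveries(m)}"
        )
```

With 20 independent tests, the formula gives about a 64% chance of at least one spurious “finding”; with 100 tests it exceeds 99%. Big data makes it cheap to run thousands of such tests, which is exactly the danger Guszcza described.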
Big data also raises privacy concerns. Everyone leaves behind “digital breadcrumbs” from their shopping, Internet searching, networking, driving and travel. “If you have a smartphone, all bets are off,” Mildenhall said. People are more likely to compromise on privacy if they trust whoever is using the data and receive value in exchange, he added. But data live forever once they are stored, and no one can predict how they could ultimately be used.
The ultimate risk for insurers would be too much knowledge, Mildenhall said. If one could predict exactly which drivers will crash or which homes will flood, then the basis for insurance disappears. People who are not at risk would not buy insurance; imperiled people could not afford insurance.
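A simplified pure-premium view, ignoring expenses, profit and investment income, shows why. Here p_i stands for policyholder i's probability of a loss, L for a fixed loss amount and n for the number of policyholders; these symbols are illustrative assumptions, not figures from the session. Pooling charges everyone the average expected cost, while perfect foresight drives each probability to 0 or 1.

```latex
\text{Pooled premium} = \bar{p}\,L, \qquad \bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i
\\[4pt]
\text{Perfect foresight: } p_i \in \{0,\,1\} \;\Longrightarrow\; \text{premium}_i = p_i L \in \{0,\,L\}
```

A premium of 0 means there is nothing to insure, and a premium of L simply prepays the entire loss, so neither group gains anything from pooling risk.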
The perfect forecaster, though, seems unlikely. Regardless of what happens in the insurance world, Mildenhall said, big data holds enormous potential, and property-casualty actuaries have the skills to capitalize on it.
They could end up as the statistical forecasters of the future, both inside insurance and out.
To view the complete CAS conference session, “Big Data — What It Is, and What It Means for the Insurance Industry,” visit the CAS website.
James P. Lynch, FCAS, is chief actuary and director of research and information services for the Insurance Information Institute in New York.