“For all the damage that misapplied data can do, data used correctly is a powerful positive force.” — Cathy O’Neil, On Being a Data Skeptic
Big data is one of the signature issues of our time and also one of the most poorly understood. My previous column discussed what might be called “two dogmas of big data.”
First: Data volume, variety and velocity are at best an imperfect proxy for useable information.
Second: Big data does not diminish the need for scientific and statistical methodology.
If anything, the opposite is the case. It is a sign of our data-infused times that this point is often at the heart of major news stories. For example, the clever Google Flu Trends algorithm, long a poster child for big data innovation, began overestimating flu outbreaks because suitable methodology was not in place to account for changes in Internet search algorithms and behavior. Another example is the replication crisis in science: The more analytical options you explore and hypotheses you test, the more random chance tends to yield false discoveries. Most notoriously, a prestigious academic journal recently published a study reporting statistically significant evidence for “psi phenomena”: a precognitive ability to anticipate the future. Unsurprisingly, the findings subsequently failed to replicate. (Readers in the mood for a playful take on the episode can try Googling “Daryl Bem Colbert Report.”)
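The point about testing many hypotheses can be made concrete with a little arithmetic. The sketch below is illustrative only and is not drawn from any of the studies mentioned. It assumes the tests are independent, that no real effect exists in any of them, and that each uses the conventional 0.05 significance threshold. The function name is hypothetical.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent,
    so each test avoids a false positive with probability (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Show how quickly the chance of a spurious "discovery" grows.
    for m in (1, 5, 20, 100):
        rate = familywise_error_rate(0.05, m)
        print(f"{m:>3} tests at alpha=0.05: {rate:.3f}")
```

Under these assumptions, 20 tests already carry roughly a 64% chance of at least one false discovery, which is why more analytical options call for more methodological care, not less.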
Does this mean that “big data” is meaningless or irrelevant? No. But naïve thinking about how “more is different” can lead to both poor scientific methodology and muddled strategic planning for data analytics.
I propose that, particularly in personal insurance and health care analytics, “behavioral data” would be a more useful organizing principle than big data (or at least a complementary one). The familiar use of credit data to help underwrite and price personal motor and homeowners insurance policies is a case in point. Credit is more than modestly predictive; it’s highly predictive of insurance claim experience. The most likely reason is that credit scores serve as outward proxies for underlying behavioral traits that in turn influence insurance risk behavior. Figuratively speaking, credit functions as a sort of “window into the soul.”
Is credit data big data? Who cares? This semantic question is much less interesting than the observation that credit is behavioral data. While the observation might be obvious today, it was not always so. It took the insurance industry over three decades to adopt this powerful data source that had long ago revolutionized loan underwriting practices.
In hindsight, we can see the story of credit scoring as a bellwether example of a process that has rapidly become pervasive. Once upon a time, people paid cash for items and records of transactions were relatively few, far between and laborious to maintain. With the advent of digital computers and credit cards, bill-paying behavior began to leave behind “digital exhaust” that was later used in innovative ways to make predictions in numerous domains. Leap forward to today, and ever more aspects of our daily lives are digitally mediated. When we text a friend, binge-watch a season of a streaming TV show, make a social or professional network connection, shortchange ourselves on REM sleep or take a corner too fast while driving, we increasingly leave behind digital exhaust. These digital traces can be mashed up and used to make powerful inferences about individuals’ psychology and predictions of their future behaviors, health states, financial positions and insurance risk.
A study performed at the University of Cambridge’s Psychometrics Centre dramatically illustrates the power of behavioral data. Social networking “likes” of various bits of online content for 58,000 American subjects were matched with indicators of whether they were black or white, married or divorced, substance abusers or not, gay or straight, Democrat or Republican, and Christian or Muslim. Principal components regression applied to the “likes” was able to predict many of these attributes with 80-90% accuracy (as measured by the area under the receiver operating characteristic curve, or AUC). Like, you know.
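For readers unfamiliar with AUC, one way to read it is as the probability that a randomly chosen person who has an attribute receives a higher model score than a randomly chosen person who does not. The sketch below computes it directly from that definition. It is not the Cambridge study’s code, and the function name and toy scores are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: model scores, higher meaning "more likely positive".
    labels: 1 for a positive case, 0 for a negative case.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores for four people, two with the attribute.
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    toy_labels = [1, 1, 0, 0]
    print(auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to guessing at random and 1.0 to a perfect ranking, so values in the 0.8 to 0.9 range indicate that the model ranks people correctly most of the time.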
Alex “Sandy” Pentland, a prominent computational social scientist at the MIT Media Lab, puts the matter nicely:
I believe that the power of big data is that it is information about people’s behavior instead of information about their beliefs. It’s about the behavior of customers, employees, and prospects for your new business. It’s not about the things you post on Facebook, and it’s not about your searches on Google, which is what most people think about, and it’s not data from internal company processes and RFIDs [radio-frequency identifications]. This sort of big data comes from things like location data off of your cell phone or credit card: It’s the little data breadcrumbs that you leave behind you as you move around in the world. What those breadcrumbs tell is the story of your life … Who you actually are is determined by where you spend time and which things you buy. Big data is increasingly about real behavior, and, by analyzing this sort of data, scientists can tell an enormous amount about you. They can tell whether you are the sort of person who will pay back loans. They can tell you if you’re likely to get diabetes … .
The implications for insurance are obvious, as are the broader societal implications. Pentland himself goes on to comment, “George Orwell was not nearly creative enough when he wrote 1984.”
Considerations of social responsibility should therefore be viewed as part and parcel of the topic of innovation with behavioral big data. The behavioral content of big data accounts for the unease and controversy surrounding it. But viewing the situation simply as a tug-of-war between societal and industrial interests would be a missed opportunity. Telematics data is an example. Insurers might view telematics data as the ultimate actuarial segmentation machine: We can now track how quickly individual drivers accelerate, how they take corners, even whether they text while driving. On the other hand, individuals might view this as creepily invasive. An innovative mindset can help break the impasse by envisioning new products and services that simultaneously benefit individual drivers, the greater society and the insurer. For example, periodic feedback reports could be digitally delivered to drivers providing specific suggestions for how they can improve their driving behavior and potentially enjoy lower premiums.
Generally speaking, if a risk score benefits a company for underwriting and pricing, it can in principle also benefit the individual as a way to manage his or her own risks. Design principles suggested by behavioral nudge science (“Did you know that your lane-changing behavior is riskier than 80% of similar drivers?”) could be A/B tested to help ensure that the digital delivery of information prompts the desired safe driving behavior change. Everything can be opt-in, and such arrangements can simultaneously benefit individuals, companies and the greater society. And perhaps the book of actuarial science will add a chapter on the science of behavior change.
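To give a flavor of what A/B testing a nudge might involve, here is a minimal sketch of a two-proportion z-test comparing two versions of a feedback message. The counts are invented for illustration, the function name is hypothetical, and a real program would need a proper experimental design (randomization, pre-specified outcomes, adequate sample size) rather than this bare calculation.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in two proportions.

    Uses the pooled-proportion standard error and a normal approximation,
    which is reasonable only when both groups are fairly large.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")
    rate_a = success_a / n_a
    rate_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if std_err == 0.0:
        raise ValueError("no variation in outcomes; test is undefined")
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed with the error function.
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - cdf)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: drivers who improved their lane-changing score
    # after receiving message A (plain report) or B (peer-comparison nudge).
    z, p = two_proportion_z_test(success_a=120, n_a=1000,
                                 success_b=150, n_b=1000)
    print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Consistent with the caution about false discoveries earlier in this column, a single borderline result from one such comparison would be a reason to replicate, not to declare victory.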
Endnote: for more on the behavioral data theme, see “The personalized and the personal: socially responsible innovation through big data,” Deloitte Review 14, 2014, and “Two dogmas of big data: understanding the power of analytics for predicting human behavior,” Deloitte Review 15, 2014.
Jim Guszcza, FCAS, is U.S. chief data scientist for Deloitte Consulting LLP.