Google Street View has become a surprisingly useful way to learn about the world without stepping into it. People use it to plan journeys, to explore holiday destinations, and to virtually stalk friends and enemies alike.
But researchers have found more insidious uses. In 2017 a team used the images to study the distribution of car types in the US and then used that data to estimate the demographic makeup of the country. It turns out that the car you drive is a reliable proxy for your income level, your education, your occupation, and even the way you vote in elections.
Now a different group has gone even further. Łukasz Kidziński at Stanford University in California and Kinga Kita-Wojciechowska at the University of Warsaw in Poland have used Street View images of people’s houses to determine how likely they are to be involved in a car accident. That’s valuable information that an insurance company could use to set premiums.
The result raises important questions about the way personal information can leak from seemingly innocent data sets and whether organizations should be able to use it for commercial purposes.
The researchers’ method is straightforward. They began with a data set of 20,000 records of people who had taken out car insurance in Poland between 2013 and 2015. These were randomly selected from the database of an undisclosed insurance company.
Each record included the address of the policyholder and the number of damage claims he or she made during the 2013–2015 period. The insurer also shared its own prediction of future claims, calculated using its state-of-the-art risk model that takes into account the policyholder’s zip code and the driver’s age, sex, claim history, and so on.
The question that Kidziński and Kita-Wojciechowska investigated is whether they could make a more accurate prediction using a Google Street View image of the policyholder’s house.
To find out, the researchers entered each policyholder’s address into Google Street View and downloaded an image of the residence. They classified this dwelling according to its type (detached house, terraced house, block of flats, etc.), its age, and its condition. Finally, the researchers number-crunched this data set to see how it correlated with the likelihood that a policyholder would make a claim.
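The tabulation step can be sketched in a few lines of Python. This is a minimal illustration, not the researchers’ code: the records, the category labels, and the helper `claim_frequency_by` are all hypothetical, and the analysis in the paper is more involved. The sketch simply groups policyholders by a house feature and compares their average number of claims.

```python
from collections import defaultdict

# Hypothetical, invented records: (dwelling type, condition, number of claims).
# These values are for illustration only and are not taken from the study.
records = [
    ("detached house", "good", 0),
    ("detached house", "good", 1),
    ("detached house", "poor", 0),
    ("terraced house", "good", 0),
    ("terraced house", "poor", 1),
    ("block of flats", "good", 0),
    ("block of flats", "poor", 2),
    ("block of flats", "poor", 0),
]


def claim_frequency_by(records, field_index):
    """Return the mean number of claims per policyholder for each category."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for record in records:
        category = record[field_index]
        totals[category] += record[2]  # index 2 holds the claim count
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in counts}


by_type = claim_frequency_by(records, 0)       # group by dwelling type
by_condition = claim_frequency_by(records, 1)  # group by condition
```

With these invented records, `by_condition` gives 0.25 claims per policyholder for houses in good condition and 0.75 for houses in poor condition. A real analysis would need far more records, and a check that any difference is not already explained by variables such as age or zip code.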
The results are something of a surprise. It turns out that a policyholder’s residence is a good predictor of the likelihood that he or she will make a claim. “We found that features visible on a picture of a house can be predictive of car accident risk, independently from classically used variables such as age or zip code,” say Kidziński and Kita-Wojciechowska.
When these factors are added to the insurer’s state-of-the-art risk model, they improve its predictive power by 2%. To put that in perspective, the insurer’s model is better than a null model by only 8% and is based on a much larger data set that includes variables such as age, sex, and claim history.
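One way to picture a comparison against a null model is with a ranking score such as the Gini coefficient. It is zero for a null model that assigns everyone the same risk, and it rises as a model ranks claimants above non-claimants. The choice of metric here is an assumption made for illustration, as are the function `gini` and all of the toy numbers below; none of them come from the insurer’s data.

```python
def gini(claims, scores):
    """Gini coefficient of a risk ranking, computed as 2 * AUC - 1.

    AUC is the share of (claimant, non-claimant) pairs in which the claimant
    received the higher risk score; tied scores count as half a pair.
    """
    claimant_scores = [s for c, s in zip(claims, scores) if c > 0]
    other_scores = [s for c, s in zip(claims, scores) if c == 0]
    concordant = 0.0
    for p in claimant_scores:
        for q in other_scores:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    auc = concordant / (len(claimant_scores) * len(other_scores))
    return 2 * auc - 1


# Invented toy data: claim counts for six hypothetical policyholders.
claims = [0, 0, 1, 0, 1, 0]

# Hypothetical risk scores from three models.
null_scores = [0.20] * 6                                   # same risk for everyone
baseline_scores = [0.10, 0.30, 0.20, 0.10, 0.40, 0.30]     # stand-in for an insurer's model
augmented_scores = [0.10, 0.20, 0.30, 0.10, 0.40, 0.30]    # stand-in for model plus house features

null_gini = gini(claims, null_scores)
baseline_gini = gini(claims, baseline_scores)
augmented_gini = gini(claims, augmented_scores)
```

Here the null model scores 0, the baseline 0.5, and the augmented model 0.875. The gaps are exaggerated on purpose to show the mechanics; the improvements reported in the study are far smaller.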
So, measured against that 8% margin, the Google Street View technique has the potential to significantly improve the prediction. And the current work is merely a proof of principle. The researchers say its accuracy could be improved using larger data sets and better data analysis.
The researchers’ approach raises a number of important questions about how personal data should be used. Policyholders in Poland might be startled to learn that their home addresses had been fed into Google Street View to obtain and analyze an image of their residence.
An interesting question is whether they gave informed consent to this activity and whether an insurance company can use data in this way, given Europe’s strict data privacy laws. “The consent given by the clients to the company to store their addresses does not necessarily mean a consent to store information about the appearance of their houses,” say Kidziński and Kita-Wojciechowska.
And the approach could open a Pandora’s box of data analytics. If insurance companies can benefit, why not other businesses? “The insurance industry could be quickly followed by the banks, as there is a proven correlation between insurance risk models and credit-risk scoring,” say Kidziński and Kita-Wojciechowska.
The ability to collect, analyze, and exploit information has increased dramatically in recent years. This ability has outstripped most people’s understanding of what is possible with their data, and it has certainly outstripped the speed at which legislation can be passed to control it.
Of course, Google is not the only company to collect street-level images. “Such practice, however, raises concerns about the privacy of data stored in publicly available Google Street View, Microsoft’s Bing Maps Streetside, Mapillary, or equivalent privately held data sets like CycloMedia,” say Kidziński and Kita-Wojciechowska.
This kind of work is likely to raise the question of whether these companies should be able to collect and store these images at all. In Germany, where privacy is an important issue of public debate, Google stopped collecting new Street View images years ago after widespread objections from residents. It may not be the last place where such imagery meets resistance.
Ref: arxiv.org/abs/1904.05270: Google Street View Image of a House Predicts Car Accident Risk of Its Resident