Nine days before the World Health Organization announced that it had identified the novel coronavirus, a Toronto-based startup called BlueDot, which uses artificial intelligence to track the spread of diseases, picked up a local news article about an unusual cluster of pneumonia cases in Wuhan, China.
Researchers at Boston Children’s Hospital, who have been using similar technology to scrape disease-related chatter from social media and chat rooms since 2006, also flagged the news story. A third machine learning tool picked it up too: The WHO’s own Epidemic Intelligence from Open Sources project, which is now scraping information on the global spread of COVID-19 from up to 120,000 articles each day.
The promise of melding artificial intelligence with digital epidemiology, which studies how diseases spread by collecting and analyzing large amounts of online data, has long been established. But most of the information currently available to health researchers falls under the category of open-source data, meaning it’s publicly available.
The largest sets of nonpublic data — things like search queries, website access logs, private social media posts and location data — belong to large technology companies such as Facebook and Google. According to a new report from Duke University’s Center on Science and Technology Policy, online platforms hold a gold mine of data that could help digital epidemiologists track the coronavirus more accurately.
Unfortunately, getting data from the internal servers of some of Silicon Valley’s biggest companies into the hands of government and academic researchers isn’t so simple. The primary obstacle is a set of concerns over the privacy of social media users whose data might be handed over by the companies. And the companies themselves must ensure they aren’t jeopardizing the trade secrets of their own technology.
“This data is a public good that should be shared,” Sarah Rispin Sedlak, one of the Duke researchers who on March 19 published the report on information sharing during an epidemic, told CQ Roll Call. “But something like that has to be done within a framework that ensures protections for both the companies and the individual users.”
“You need a legal and ethical framework that allows the company to share data in a way that is sufficiently protective of individual privacy and that the digital epidemiologists agree not to use that data for other purposes or send it elsewhere,” Rispin Sedlak said.
In 2008, Google unveiled a tool called Google Flu Trends that aggregated specific search queries related to flu-like symptoms to estimate how many people throughout the United States were infected at a given time, promising “an early-warning system for outbreaks of influenza.” But the tool never fulfilled its own promise — its estimates for the 2013 flu season were off by 140 percent, according to WIRED — and it was the subject of complaints by privacy advocates.
Since then, data privacy has become a hot-button political issue. On Capitol Hill, members of both parties are working on comprehensive data privacy legislation, which could include provisions allowing better information sharing with the aim of improving public health. But those efforts have stalled because of the coronavirus emergency and the vagaries of an election year.
“I don’t think what we want to do is wade into the waters of broad privacy legislation,” Rispin Sedlak said.
However, as the coronavirus spreads and deaths from COVID-19 continue to increase, lawmakers could be spurred to action on a more narrowly focused measure related to public health.
Last week, Facebook said it would begin sharing aggregated, anonymized location data and high-resolution population density maps with researchers at Harvard University’s School of Public Health, the National Tsing Hua University in Taiwan, the Gates Foundation and others trying to understand how the coronavirus is spreading around the world.
A Google spokesperson told CQ Roll Call that Google has not shared any location data but that the company is “exploring ways that aggregated anonymized location information could help in the fight against COVID-19.”
“One example could be helping health authorities determine the impact of social distancing, similar to the way we show popular restaurant times and traffic patterns in Google Maps,” the spokesperson said. “This work would follow our stringent privacy protocols and would not involve sharing data about any individual’s location, movement, or contacts.”
Rispin Sedlak, however, says that search data like that which powered Google Flu Trends remains the most promising data for tracking diseases like COVID-19. Unless rules are established for how companies should anonymize it and how researchers should keep it secure, it may not be shared anytime soon.
“Some sort of framework enabled by laws about how this exact type of data would be used for this exact type of purpose would be very helpful,” Rispin Sedlak said. “It would give the tech companies the rules of the road and some comfort that if they shared the data while following the rules, they would be safe from criticism over privacy concerns.”