The early detection of infectious disease outbreaks is essential to optimizing public health response and control. While established biosurveillance systems such as the US Outpatient Influenza-like Illness Surveillance Network (ILINet) generally have high accuracy and reliability, they often require clinician reporting of data, and the data collected may be unavailable for up to 2 weeks.1 Google Flu Trends was launched in 2009, with the ability to estimate weekly US influenza activity by region with only a 1-day delay.2 Subsequently, there has been an increasing focus on the use of internet-based biosurveillance – also referred to as digital disease detection or digital epidemiology – to provide data in real time or near real time.3-5
A review published in 2017 in PLOS Neglected Tropical Diseases examined research pertaining to internet-based biosurveillance of vector-borne diseases (VBD) specifically.1 “When considering the future role of internet big data in enhancing VBD surveillance and control, it is important to recognize that the developers of these digital surveillance methods have long cautioned that they are designed to augment rather than replace conventional surveillance data,” the authors wrote.
Infectious Disease Advisor spoke with Nicholas Generous, MPH, a digital epidemiologist at Los Alamos National Laboratory, New Mexico, who has co-authored numerous journal articles on the topic, about this emerging field and findings from recent publications.5-7
Infectious Disease Advisor: What is generally known about internet-based surveillance methods for infectious diseases, including potential benefits?
Nicholas Generous: Traditional public health surveillance systems rely heavily on clinician-submitted data on infectious diseases. These data are often considered the gold standard in epidemiology and are used to monitor levels of disease and to determine if there is an outbreak. While these data are reliable and mostly accurate, they can take up to a week or two to be made available and require a functioning public health system to be in place. Internet data offers a potential complement to traditional data.
If one knows where to look, digital traces of health are shed all over the internet. The use of internet data to monitor disease spread is built on the insight that when people get sick, they search for information online and sometimes share it online. These digital traces of health get collected and can be accessed via websites like Google Trends, Twitter, and others. For some diseases, such as flu and dengue, there is an association between the number of digital traces and the levels of disease in a population.
Internet data can be broken into 2 types: information seeking about a health condition, on sites such as Google, Wikipedia, and WebMD; and information sharing about [one’s] health on social media platforms such as Facebook, Instagram, and Twitter.
The primary advantage of using internet data is that it is inexpensive and abundant, and it can be obtained in near real time. Scientists hope that by combining traditional data with internet data, we will be able to monitor and forecast diseases better than with traditional data alone.
However, there are some shortcomings with the use of internet data. Excessive news media interest can drive searches about health that are not reflective of disease levels in the population, diseases with few cases are not easily monitored, and symptoms of the disease need to be somewhat recognizable to the individual.
Infectious Disease Advisor: What are some of the main takeaways from the new review?
Mr Generous: At a high level, the takeaway from this paper is that internet data streams do seem to have utility and can be used to supplement traditional VBD surveillance approaches, but that it is unclear how to transition this research work into an actionable public health tool.
One of the shortcomings of the field is that many studies are myopic, focusing only on monitoring or forecasting a single disease, in a single location, with a single internet data source; for example, Google search queries for dengue in Mexico and Twitter for Lyme disease in the United States. While this is adequate for demonstrating feasibility, it does not provide the systematic rigor to understand how to transition these one-off research studies into a functional and actionable public health surveillance system. The 2017 review provides an important step toward this transition by surveying how internet data streams have been used for VBD surveillance and offering insight into when these data work, when they do not, and how they can be used.
Infectious Disease Advisor: What are remaining needs in terms of research and development in this area?
Mr Generous: There are several. More attention needs to be given to practical demonstration of internet-based disease surveillance approaches in public health decision-making and response: How accurate do these approaches need to be to be actionable? What level of uncertainty is acceptable? If a trial can be conducted demonstrating practical use of internet data streams, it would clear up a lot of questions.
Additionally, there need to be more studies systematically evaluating the use of internet data streams for surveillance. This includes looking at multiple geographic scales (local and state), socioeconomic and demographic heterogeneity, and different diseases and levels of disease burden.
Overall, internet data streams seem to work best for monitoring large seasonal diseases in relatively wealthier areas of the world. It is unclear at what point they stop working; for example, can you detect 10 people sick in a city of 100,000? What about 1000 people?
It is also important to understand more about how news media bias can drive search traffic online and affect internet data. For example, if a disease panic is reported in the news media, many people who are not sick will search for information and possibly post about it on social media. How do we separate true health observations from people just searching?
We also need to better understand the dynamics of how these digital health traces get generated. How do people search for and share health information about their condition online? Do they also search on behalf of people they know (primary vs secondary health observations)? Do people search before they go to the doctor when they suspect they have something, or do they search after they go to the doctor? How accurate are people at self-diagnosing their disease condition when searching online?
There are many behavioral questions that should be explored so that we can better understand what is happening when people are leaving digital traces of their health online.
1. Pollett S, Althouse BM, Forshey B, Rutherford GW, Jarman RG. Internet-based biosurveillance methods for vector-borne diseases: are they novel public health tools or just novelties? PLoS Negl Trop Dis. 2017;11(11):e0005871.
2. Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457(7232):1012-1014.
3. Salathé M, Bengtsson L, Bodnar TJ, et al. Digital epidemiology. PLoS Comput Biol. 2012;8(7):e1002616.
4. Brownstein JS, Freifeld CC, Madoff LC. Digital disease detection—harnessing the Web for public health surveillance. N Engl J Med. 2009;360(21):2153-2157.
5. Generous N, Fairchild G, Deshpande A, Del Valle SY, Priedhorsky R. Global disease monitoring and forecasting with Wikipedia. PLoS Comput Biol. 2014;10(11):e1003892.
6. Daughton AR, Generous N, Priedhorsky R, Deshpande A. An approach to and web-based tool for infectious disease outbreak intervention analysis. Sci Rep. 2017;7:46076.
7. Daughton AR, Priedhorsky R, Fairchild G, et al. An extensible framework and database of infectious disease for biosurveillance. BMC Infect Dis. 2017;17(1):549.