Flu seasons are unpredictable, but according to the Centers for Disease Control and Prevention, the early climb in flu activity indicates that we are in for a bad flu year. In fact, this is the earliest the flu season has taken off in almost a decade (since the 2003-2004 season). Fortunately, the currently circulating flu strains are a good match for this year’s vaccine (90%), and it is not too late to get vaccinated.
The percentage of patient visits due to influenza-like illness (ILI) is one measure of flu activity. Several states are experiencing high ILI levels (defined as six to eight standard deviations above the mean ILI level). One limitation of using ILI to track influenza activity is the delay between patient visits and reporting by public health officials: at least one week can pass before ILI data have been analyzed and made publicly available. Our research team is investigating whether tweets can be used to track flu activity, since tweets are both massive in number and available in real time. To test our methods, we compared weekly ILI with weekly aggregated counts of tweets containing the word “flu” from several US cities.
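To make the weekly aggregation step concrete, here is a minimal Python sketch. It assumes tweets arrive as hypothetical (date, text) pairs, treats “containing the word flu” as a whole-word, case-insensitive match, and groups tweets into Sunday-to-Saturday weeks labeled by their ending Saturday (so that 2013’s week 4 ends on January 26th). These are illustrative choices, not necessarily the ones used in our analysis.

```python
import re
from collections import Counter
from datetime import date, timedelta

# Assumption: whole-word, case-insensitive match, so "flu" and "FLU" count
# but words such as "influenced" or "fluid" do not.
FLU_PATTERN = re.compile(r"\bflu\b", re.IGNORECASE)


def week_ending(day: date) -> date:
    """Return the Saturday that ends the Sunday-to-Saturday week containing `day`."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_until_saturday = (5 - day.weekday()) % 7
    return day + timedelta(days=days_until_saturday)


def weekly_flu_counts(tweets):
    """Count tweets that mention "flu", grouped by week-ending date.

    `tweets` is an iterable of (date, text) pairs (a hypothetical input format).
    """
    counts = Counter()
    for day, text in tweets:
        if FLU_PATTERN.search(text):
            counts[week_ending(day)] += 1
    return dict(counts)


# Toy tweets for illustration only (not study data).
tweets = [
    (date(2013, 1, 20), "I have the flu"),
    (date(2013, 1, 22), "Get your flu shot http://example.com"),
    (date(2013, 1, 24), "Feeling influenced by coffee"),
    (date(2013, 1, 27), "RT @someone: flu season is brutal"),
]
print(weekly_flu_counts(tweets))
# {datetime.date(2013, 1, 26): 2, datetime.date(2013, 2, 2): 1}
```

Because tweet counts are available as soon as a week ends, this series can be computed without the reporting delay that affects ILI.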
The “flu” tweets were further subdivided into four datasets: (1) re-tweets only, (2) tweets without re-tweets (original tweets), (3) tweets with a URL web address, and (4) tweets without a URL web address. Analysis shows consistently higher correlations between ILI and original tweets than between ILI and re-tweets. Correlations for tweets with a URL versus those without were more variable, but for the majority of the cities ILI was better correlated with tweets without a URL. We hypothesize that this may be because the original-tweet and non-URL groups include more personal tweets, such as “I have the flu.”
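The four-way split can be sketched as two independent yes/no tests applied to each tweet. The heuristics below are assumptions for illustration, not the study’s documented rules: a re-tweet is taken to start with the conventional “RT @user” prefix, and a tweet “has a URL” if it contains an http:// or https:// link. The function name `split_datasets` is hypothetical.

```python
import re

# Assumed heuristics, not necessarily those used in the study:
# a re-tweet begins with the conventional "RT @user" prefix, and a tweet
# "has a URL" if it contains an http:// or https:// link.
RETWEET_PATTERN = re.compile(r"^\s*RT\s+@\w+", re.IGNORECASE)
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def split_datasets(texts):
    """Split "flu" tweets into the four (overlapping) datasets.

    Every tweet lands in exactly one of retweets/original and in exactly
    one of with_url/without_url.
    """
    datasets = {"retweets": [], "original": [], "with_url": [], "without_url": []}
    for text in texts:
        is_retweet = RETWEET_PATTERN.match(text) is not None
        has_url = URL_PATTERN.search(text) is not None
        datasets["retweets" if is_retweet else "original"].append(text)
        datasets["with_url" if has_url else "without_url"].append(text)
    return datasets


# Toy tweets for illustration only.
texts = [
    "I have the flu",
    "RT @someone: Get vaccinated http://example.com/flu",
    "Flu shot done! https://example.com/pic",
]
groups = split_datasets(texts)
print({name: len(items) for name, items in groups.items()})
# {'retweets': 1, 'original': 2, 'with_url': 2, 'without_url': 1}
```

Note that the datasets overlap: a personal tweet such as “I have the flu” falls into both the original and the non-URL groups, which fits the hypothesis above. If the tweet source exposes re-tweet metadata, that is likely more reliable than matching text prefixes.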
Correlations were variable but reached 0.8 as of week 4 (the week ending January 26th, 2013) in San Diego when ILI was compared with original tweets (Figure 1). We suspect that correlations will improve as the flu season continues.
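A correlation like this can be computed by aligning the two weekly series and taking a correlation coefficient. The sketch below assumes Pearson’s r (the coefficient is not named above) and assumes both series are dicts keyed by week number; the numbers are toy values, not San Diego data. Weeks present in only one series, such as a recent week whose ILI has not yet been reported, are dropped before correlating.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlate_weekly(ili_by_week, tweets_by_week):
    """Correlate weekly ILI with weekly tweet counts over the weeks both series cover."""
    weeks = sorted(set(ili_by_week) & set(tweets_by_week))
    return pearson_r([ili_by_week[w] for w in weeks],
                     [tweets_by_week[w] for w in weeks])


# Toy numbers keyed by week number (not study data).
ili = {1: 1.8, 2: 2.4, 3: 3.1, 4: 3.9}  # % of patient visits due to ILI
original_tweets = {1: 140, 2: 150, 3: 260, 4: 300, 5: 320}  # week 5: no ILI yet
print(round(correlate_weekly(ili, original_tweets), 3))
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient; the explicit version is shown here to make the formula visible. With only a few weeks of data, r can swing considerably from week to week.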
Using tweets to track flu activity could enable outbreak detection weeks before more traditional surveillance methods, such as ILI alone. This would allow health services to better prepare for an influx of ill patients and to better organize prevention efforts in vulnerable populations and nearby cities.