
Researchers have developed software that predicts when and where disease outbreaks are likely to occur, drawing on a 22-year archive of New York Times articles and other Internet data, reports Mashable. The system was developed by Microsoft and the Technion - Israel Institute of Technology.
The system performs impressively when tested on historical data. For example, reports of a drought in Angola in 2006 triggered a warning about a possible cholera outbreak in the country, because past events had taught the system that cholera outbreaks are more likely in the years following a drought. A second warning about cholera in Angola was triggered by news of storms in Africa in early 2007; less than a week later, there were reports that cholera had indeed spread in the region. In similar tests of predicting disease outbreaks, violence, and large numbers of deaths, the system's warnings were correct in 70–90% of cases.
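To make the "drought, then cholera" idea concrete, here is a minimal sketch of how a precursor rule could be scored from an event log: count how often a precursor event in a location is followed by an outcome event there within a year, and warn when that rate crosses a threshold. This is not the researchers' method; the event log, the locations "A" to "D", the one-year window, and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy event log: (year, location, event_type).
# These records are invented for illustration; they are not from the study.
EVENTS = [
    (2001, "A", "drought"), (2002, "A", "cholera"),
    (2003, "B", "drought"), (2004, "B", "cholera"),
    (2005, "C", "drought"),
    (2004, "D", "cholera"),
]


def precursor_rate(events, precursor, outcome, max_lag_years=1):
    """Fraction of `precursor` events followed by an `outcome` event in the
    same location between 1 and `max_lag_years` years later."""
    by_location = defaultdict(list)
    for year, location, kind in events:
        by_location[location].append((year, kind))

    triggers = 0
    hits = 0
    for history in by_location.values():
        outcome_years = [y for y, k in history if k == outcome]
        for year, kind in history:
            if kind != precursor:
                continue
            triggers += 1
            if any(1 <= oy - year <= max_lag_years for oy in outcome_years):
                hits += 1
    return hits / triggers if triggers else 0.0


def should_warn(events, precursor, outcome, threshold=0.5):
    """Issue a warning when the historical precursor rate meets the
    (hypothetical) threshold."""
    return precursor_rate(events, precursor, outcome) >= threshold


if __name__ == "__main__":
    print(round(precursor_rate(EVENTS, "drought", "cholera"), 2))  # 0.67
    print(should_warn(EVENTS, "drought", "cholera"))  # True
```

In this toy log, two of the three droughts are followed by cholera the next year, so the rate is 2/3. A real system would also need to handle noisy extraction from news text and many competing precursors, which this sketch ignores.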
In the future, the system could help humanitarian organizations respond more effectively to disease outbreaks and other crises, says Eric Horvitz, a scientist and co-director of Microsoft Research. Horvitz conducted the study together with Kira Radinsky, a researcher at the Technion - Israel Institute of Technology.
According to Horvitz, the system's current performance is good enough to suggest that an improved version could be used in real-world conditions. The system was built on 22 years of the New York Times archive, from 1986 to 2007, and also draws on data from the Web to learn what tends to lead up to notable events.
“One of the sources we found useful was DBpedia, a crowdsourced project that presents information from Wikipedia in structured form,” says Radinsky. “It lets us see where the places mentioned in news articles are located, how much people earn there, and even something about their politics.” Other sources included WordNet, which helps the system understand the meaning of words, and OpenCyc, a common-sense knowledge base.
All of these provide valuable context that is missing from the news articles themselves and that is needed to work out general rules about which events tend to precede others. For example, the system may infer a connection between events in cities in Rwanda and Angola because both are African countries with similar GDP and other shared characteristics. This approach led the system to conclude that predictions of cholera outbreaks should take into account the location of the country or city, the share of its area covered by water, population density, GDP, and whether there was a drought in the previous year.
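One simple way to picture this kind of generalization is a nearest-neighbor comparison over the features the article lists: scale each feature to a common range, then find the location whose profile is closest to the one in question, so that a rule learned for one place can be tentatively carried over to a similar place. This is an illustrative stand-in, not the study's actual technique; the locations "X", "Y", "Z" and all of their feature values are hypothetical.

```python
import math

# Hypothetical feature values, invented for illustration only.
# Columns: GDP per capita (USD), population density (people/km^2),
# share of area covered by water (0-1), drought in previous year (0 or 1).
LOCATIONS = {
    "X": (900.0, 400.0, 0.05, 1),
    "Y": (1000.0, 450.0, 0.04, 1),
    "Z": (40000.0, 30.0, 0.20, 0),
}


def normalize(table):
    """Min-max scale each feature column to [0, 1] so that large-valued
    features such as GDP do not dominate the distance."""
    columns = list(zip(*table.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    scaled = {}
    for name, row in table.items():
        scaled[name] = tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
    return scaled


def most_similar(table, target):
    """Return the location whose scaled feature vector is closest
    (Euclidean distance) to that of `target`."""
    scaled = normalize(table)
    ref = scaled[target]
    others = [name for name in scaled if name != target]
    return min(others, key=lambda name: math.dist(scaled[name], ref))


if __name__ == "__main__":
    print(most_similar(LOCATIONS, "X"))  # Y
```

If a drought-to-cholera rule had been learned for "Y", this comparison would suggest it might also apply to "X". Min-max scaling is just one reasonable choice here; without some scaling, GDP alone would decide which places count as similar.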
The idea of predicting disease outbreaks is not new, nor is the idea of mining data for forecasting, but the scale of this project could make it especially useful. Because the system can successfully correlate events and generalize from the data well enough to produce useful results, it could be applied in a variety of fields.