Analyzing Evolving Stories in News Articles
Roberto Camacho Barranco, UTEP CS Ph.D. program
An overwhelming number of news articles are published every day around the globe. Following the evolution of a news story is difficult because no mechanism is available to trace back in time and to discover and study the hidden relationships between relevant events in digital news feeds. The techniques developed so far to extract meaningful information from a massive corpus rely on similarity search. This results in a myopic loop back to the same topic and does not provide the insights needed to hypothesize the origin of a story, which may be completely different from today's news. In this talk, I will present an algorithm that mines historical news data to detect the origin of an event, segments the timeline into disjoint groups of coherent news articles, and outlines the most important documents in a timeline with a soft probability, giving a better understanding of how a story evolves. Qualitative and quantitative evaluations of this framework demonstrate that the algorithm discovers statistically significant and meaningful stories in reasonable time. Additionally, I will present a case study on a set of news articles showing that the algorithm's output holds promise for aiding the prediction of future entities (e.g., actors) in a story.