Text Analysis of E-Mail

Stéfan Sinclair has blogged an interesting New York Times story, "Enron Offers an Unlikely Boost to E-Mail Surveillance." Researchers, including Dr. Skillicorn at Queen’s, are using a large collection of Enron e-mail posted by the Federal Energy Regulatory Commission to experiment with e-mail tracking and analysis. A large corpus like the Enron one (over a million messages) can serve as a testbed for social network analysis or diachronic trend analysis. The article also discusses fears that Echelon-style government surveillance of e-mail may become available to corporate intelligence types. I wonder whether we could develop useful text analysis tools optimized for e-mail collections, such as a thread of messages on a single subject or the Humanist archives. Something for TAPoRware.
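To make "diachronic trend analysis" a little more concrete, here is a minimal Python sketch of one simple approach. It is not the method used by any of the researchers mentioned. It buckets messages by period, counts word tokens per period, and flags words whose share of all tokens jumps sharply between consecutive periods. The input format (pairs of a period label such as "2001-09" and a message body), the function names, and the `ratio` and `min_count` thresholds are all hypothetical choices made for illustration.

```python
import re
from collections import Counter, defaultdict


def word_frequencies_by_period(messages):
    """messages: iterable of (period, body) pairs, e.g. ("2001-09", "...").
    Returns {period: Counter of lowercase word tokens}."""
    freqs = defaultdict(Counter)
    for period, body in messages:
        freqs[period].update(re.findall(r"[a-z']+", body.lower()))
    return freqs


def sudden_changes(freqs, ratio=3.0, min_count=3):
    """Flag (period, word) pairs where a word's share of all tokens rises
    by at least `ratio` from one period to the next. Periods are compared
    in sorted order, so labels like "2001-09" should sort chronologically."""
    periods = sorted(freqs)
    flagged = []
    for prev, curr in zip(periods, periods[1:]):
        prev_total = sum(freqs[prev].values())
        curr_total = sum(freqs[curr].values())
        for word, count in freqs[curr].items():
            if count < min_count:
                continue  # ignore rare words; their ratios are noisy
            # Add one to the earlier count and total so that a word absent
            # from the previous period does not cause division by zero.
            prev_share = (freqs[prev][word] + 1) / (prev_total + 1)
            curr_share = count / curr_total
            if curr_share / prev_share >= ratio:
                flagged.append((curr, word))
    return flagged
```

Because the sketch only counts tokens, it never surfaces the content of any one message. A tool aimed at e-mail would also need to strip quoted replies and signatures before counting, or repeated quotations would inflate the counts.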

Scientists had long theorized that tracking the e-mailing and word usage patterns within a group over time – without ever actually reading a single e-mail – could reveal a lot about what that group was up to. The Enron material gave Mr. Skillicorn’s group and a handful of others a chance to test that theory, by seeing, first of all, if they could spot sudden changes.

For example, would they be able to find the moment when someone’s memos, which were routinely read by a long list of people who never responded, suddenly began generating private responses from some recipients? Could they spot when a new person entered a communications chain, or if old ones were suddenly shut out, and correlate it with something significant?
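The excerpt does not say how these changes were detected. The two questions it raises can be sketched in Python under deliberately simplified assumptions. Each message is reduced to a period label, a sender, a set of recipients, and a hypothetical `is_reply` flag, which might in practice be derived from an In-Reply-To header. A "private response" is taken to mean a reply addressed to the original author alone. `membership_changes` reports who joined or left the set of correspondents between consecutive periods. `first_private_replies` finds the first period in which an author whose messages had drawn no private replies starts receiving them. All names here are illustrative and not part of any existing tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Message(NamedTuple):
    """Header-level view of one e-mail; the body is never needed."""
    period: str              # e.g. "2001-10"; periods are compared in sorted order
    sender: str
    recipients: frozenset    # hypothetical: addresses from the To/Cc headers
    is_reply: bool           # hypothetical: e.g. derived from an In-Reply-To header


def participants_by_period(messages):
    """Map each period to everyone who sent or received a message in it."""
    people = defaultdict(set)
    for m in messages:
        people[m.period].add(m.sender)
        people[m.period].update(m.recipients)
    return people


def membership_changes(messages):
    """For each consecutive pair of periods, return (period, joined, dropped)."""
    people = participants_by_period(messages)
    periods = sorted(people)
    return [
        (curr, people[curr] - people[prev], people[prev] - people[curr])
        for prev, curr in zip(periods, periods[1:])
    ]


def first_private_replies(messages, author):
    """Return (period, repliers) for the first period in which `author`
    receives replies addressed to them alone, after at least one earlier
    period in which they sent messages that drew no such reply.
    Returns None if that never happens."""
    sent_periods = set()
    private = defaultdict(set)
    for m in messages:
        if m.sender == author:
            sent_periods.add(m.period)
        elif m.is_reply and m.recipients == frozenset({author}):
            private[m.period].add(m.sender)

    quiet_seen = False
    for period in sorted(sent_periods | set(private)):
        if private[period]:
            if quiet_seen:
                return period, private[period]
        elif period in sent_periods:
            quiet_seen = True
    return None
```

Both functions work from header information alone, which matches the excerpt's point that such patterns can be found without reading any individual e-mail. Correlating a flagged change "with something significant" would still require outside context.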