The New York Times has an article by John Markoff about commercial text analysis systems, “Armies of Expensive Lawyers, Replaced by Cheaper Software” (March 5, 2011, A1 in the New York edition; March 4 online). He describes how companies are building systems that can analyze the immense numbers of documents shared in lawsuits. Traditionally an army of people would comb through the documents. “Now, thanks to advances in artificial intelligence, ‘e-discovery’ software can analyze documents in a fraction of the time for a fraction of the cost.”
Some programs go beyond just finding documents with relevant terms at computer speeds. They can extract relevant concepts — like documents relevant to social protest in the Middle East — even in the absence of specific terms, and deduce patterns of behavior that would have eluded lawyers examining millions of documents.
A nice graphic accompanies the article. Markoff mentions companies like Blackstone Discovery and Cataphora. He also argues that the availability of a large email archive from Enron has made it possible for teams to experiment on a real dataset.