EPIC: Carnivore Documents

Omnivore Source Code FOIA Document
Did the FBI use text analysis for network tapping? I found an interesting page at the Electronic Privacy Information Center (EPIC) about Carnivore and Omnivore (its predecessor), two Internet monitoring systems created by the FBI. EPIC has a Carnivore page with a summary and scans of documents received through Freedom of Information Act requests. See also EPIC Carnivore FOIA Documents. The documents are fascinating: so many lines are blacked out that you can try to guess at what was redacted. There is a beauty to these documents, with their heavy black regions and “Secret” crossed out all over. Note how EPIC uses this aesthetic in its annual report.

Text Analysis and Alzheimer’s

Both The Globe and Mail and CBC ran stories about researchers who compared word lists from Iris Murdoch’s novels to measure the variety of her vocabulary. See CBC News: Iris Murdoch novel may be evidence of Alzheimer’s. Now that computers index our files (Spotlight in Mac OS X Tiger, for example), could we get them to warn us when our word variety declines? Could my e-mail client or blog be fitted to alert me to changes in my use of language?
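One common measure of word variety is the type-token ratio (TTR): the number of distinct words divided by the total number of words. Here is a rough sketch of the self-monitoring idea. The choices are my own assumptions, not the researchers’ method: a crude regex tokenizer, ratios averaged over fixed-size windows so that texts of different lengths are comparable, and an arbitrary 15% drop as the alert threshold.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and pull out runs of letters/apostrophes (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct words (types) divided by total words (tokens); 0.0 for empty input."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def windowed_ttr(tokens: list[str], window: int = 1000) -> float:
    """Mean TTR over consecutive full windows of `window` tokens.

    Raw TTR tends to fall as a text gets longer, so comparing whole texts of
    different lengths would be misleading. Texts shorter than one window fall
    back to the raw TTR.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [type_token_ratio(tokens[i:i + window])
              for i in range(0, len(tokens) - window + 1, window)]
    return sum(ratios) / len(ratios)


def vocabulary_alert(baseline_text: str, new_text: str,
                     window: int = 1000, drop: float = 0.15) -> bool:
    """True if new_text's windowed TTR is more than `drop` (a fraction) below the baseline's.

    The 15% default threshold is an arbitrary assumption, not a validated value.
    """
    base = windowed_ttr(tokenize(baseline_text), window)
    new = windowed_ttr(tokenize(new_text), window)
    return base > 0 and new < base * (1 - drop)
```

A mail client might call something like `vocabulary_alert(last_years_mail, this_months_mail)`, where both variable names are hypothetical, to compare recent writing against a baseline. Any real threshold would need careful validation, because a falling ratio can have many ordinary causes.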

Comparison Engine and Clustering Engine

Antonio Gulli has two interesting tools on the web. The first is a Rank Comparison Engine, which queries several search engines, retrieves their lists of hits, and builds a table of points (“pills”) showing which hits are unique to a single index and which are shared. The results are interactive: you can mouse over a point to see its short description.
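The bookkeeping behind such a comparison table is simple once each engine’s hits are in hand. Here is a minimal sketch. It assumes the result lists have already been fetched as ordered lists of URLs. The engine names and URLs are hypothetical placeholders, and this is not Gulli’s code.

```python
from collections import defaultdict


def compare_rankings(results):
    """Map each URL to {engine: rank} for every engine that returned it (ranks are 1-based)."""
    table = defaultdict(dict)
    for engine, urls in results.items():
        for rank, url in enumerate(urls, start=1):
            # Keep the best (first) rank if an engine lists the same URL twice.
            table[url].setdefault(engine, rank)
    return dict(table)


def unique_and_shared(table):
    """Split URLs into those returned by exactly one engine and those returned by several."""
    unique = defaultdict(list)  # engine -> URLs that only this engine returned
    shared = []
    for url, ranks in table.items():
        if len(ranks) == 1:
            (engine,) = ranks  # unpack the single engine name
            unique[engine].append(url)
        else:
            shared.append(url)
    return dict(unique), shared


def render_grid(table, engines):
    """Plain-text stand-in for the 'pills' table: one row per URL, one column per engine."""
    rows = ["URL".ljust(24) + "".join(e.rjust(10) for e in engines)]
    for url, ranks in table.items():
        # Show the engine's rank for this URL, or '-' if the engine did not return it.
        cells = "".join(str(ranks.get(e, "-")).rjust(10) for e in engines)
        rows.append(url.ljust(24) + cells)
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical result lists for three hypothetical engines.
    hits = {
        "engine_a": ["example.org/1", "example.org/2", "example.org/3"],
        "engine_b": ["example.org/2", "example.org/4"],
        "engine_c": ["example.org/1", "example.org/2"],
    }
    table = compare_rankings(hits)
    print(render_grid(table, list(hits)))
    print(unique_and_shared(table))
```

Keeping the rank, not just presence, lets the display show both overlap and how differently each engine orders the same hit.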
The second is the SnakeT Clustering Engine (SNippet Aggregation for Knowledge ExTraction). It searches various indexes and builds a list of high-frequency words that cluster with the query word. You can then navigate by these co-occurring words. A neat use of text analysis for concept exploration.
My one complaint is the design – he needs a graphic designer to make these sing.
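I don’t know the details of SnakeT’s actual algorithm. As a rough illustration of navigating by co-occurring words, here is a sketch that counts which words appear in the same result snippets as the query word. It assumes the snippets have already been fetched and uses a tiny hand-picked stopword list. Both are my assumptions, and there is no stemming, so “car” and “cars” count separately.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption; real systems use much larger ones).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "for", "is", "are", "with", "by"}


def words(snippet: str) -> list[str]:
    """Lowercase alphanumeric tokens of a snippet."""
    return re.findall(r"[a-z0-9]+", snippet.lower())


def cooccurring_terms(snippets: list[str], query: str, k: int = 5) -> list[tuple[str, int]]:
    """Words found in the same snippet as `query`, counted once per snippet, most frequent first."""
    q = query.lower()
    counts = Counter()
    for snippet in snippets:
        ws = set(words(snippet))
        if q in ws:
            counts.update(w for w in ws if w != q and w not in STOPWORDS)
    return counts.most_common(k)


def refine(snippets: list[str], query: str, term: str) -> list[str]:
    """One navigation step: keep only snippets containing both the query and a chosen term."""
    wanted = {query.lower(), term.lower()}
    return [s for s in snippets if wanted <= set(words(s))]


if __name__ == "__main__":
    # Hypothetical snippets for an ambiguous query.
    snippets = [
        "Jaguar the big cat of South America",
        "Jaguar cars: a British car maker",
        "The jaguar is a big cat",
        "Used Jaguar car prices",
    ]
    print(cooccurring_terms(snippets, "jaguar", k=3))
    print(refine(snippets, "jaguar", "cat"))
```

Picking a co-occurring word such as “cat” or “car” narrows the results to one sense of an ambiguous query, which is the kind of exploration the engine supports.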

Getty Thesaurus of Geographic Names Online (TGN)

The Getty Thesaurus of Geographic Names is “a hierarchical vocabulary of around 1.1 million names, and coordinates and other information for around 892,000 geographic places.” (From the Getty Vocabularies Download Center.)
In other words, it is a controlled vocabulary of place names that can be searched online or, with permission, downloaded as XML (or as a relational database or MARC records). I wonder whether it could be used to build text engines that search by place and use the TGN records, which contain hierarchical information, to provide context. To put it another way, is TGN an ontology?
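To make the search-by-place idea concrete, here is a sketch that uses parent links to give each place name its chain of broader places as context. The records are a tiny hand-made stand-in, not the TGN XML schema, and the IDs are invented. A real system would load the downloaded TGN data instead.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Place:
    place_id: str             # invented identifier, NOT a real TGN subject ID
    name: str
    parent_id: Optional[str]  # the broader place this one sits inside


# Hand-made stand-in for TGN records (hypothetical IDs, simplified hierarchy).
GAZETTEER: Dict[str, Place] = {
    p.place_id: p
    for p in [
        Place("x1", "North America", None),
        Place("x2", "Canada", "x1"),
        Place("x3", "Ontario", "x2"),
        Place("x4", "Hamilton", "x3"),
        Place("x5", "Europe", None),
        Place("x6", "United Kingdom", "x5"),
        Place("x7", "Scotland", "x6"),
        Place("x8", "Hamilton", "x7"),
    ]
}


def hierarchy(place_id: str) -> List[str]:
    """Names from the place itself up to its top-level ancestor (empty if unknown)."""
    chain = []
    current = GAZETTEER.get(place_id)
    while current is not None:
        chain.append(current.name)
        current = GAZETTEER.get(current.parent_id) if current.parent_id else None
    return chain


def lookup(name: str) -> List[List[str]]:
    """Every place with this name, each paired with its hierarchy as context."""
    return [hierarchy(p.place_id) for p in GAZETTEER.values()
            if p.name.lower() == name.lower()]


def tag_places(text: str) -> Dict[str, List[List[str]]]:
    """Naively find gazetteer names in text (whole-word, case-sensitive) and attach context."""
    found = {}
    for name in dict.fromkeys(p.name for p in GAZETTEER.values()):
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found[name] = lookup(name)
    return found


if __name__ == "__main__":
    print(lookup("Hamilton"))
    print(tag_places("A workshop in Hamilton, Ontario"))
```

A search for “Hamilton” then comes back with two candidates. Each carries the chain of broader places, which a search interface could use to disambiguate or group results.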

Jason Lewis: ActiveText

At the Textologies workshop organized here at McMaster by Travis Kroeker and Andrew Mactavish, I saw a neat project, ActiveText, demonstrated by Jason E. Lewis of Concordia. ActiveText is a C++ library for making active text. Jason has gotten it right: the objects it handles range from glyphs up to passages, and they can have behaviors so that segments of text are activated. See the animation.
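The design idea is a composite hierarchy in which any level, from a single glyph to a whole passage, can carry behaviors. Here is a sketch of that idea in Python. It is a hypothetical model, not ActiveText’s C++ API: the class, the `tick` method, and the `bob` behavior are all names I made up for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List

# A behavior is a function called with the unit it is attached to and the current time.
Behavior = Callable[["TextUnit", float], None]


@dataclass
class TextUnit:
    kind: str                      # "glyph", "word", "passage", ...
    text: str = ""                 # only glyphs carry characters directly
    sep: str = ""                  # separator used when joining children's text
    children: List["TextUnit"] = field(default_factory=list)
    behaviors: List[Behavior] = field(default_factory=list)
    dy: float = 0.0                # vertical offset relative to the parent

    def content(self) -> str:
        """The text this unit spans, rebuilt from its children."""
        if not self.children:
            return self.text
        return self.sep.join(c.content() for c in self.children)

    def tick(self, t: float) -> None:
        """Run this unit's behaviors, then recurse into its children, at time t."""
        for behavior in self.behaviors:
            behavior(self, t)
        for child in self.children:
            child.tick(t)


def word(s: str) -> TextUnit:
    return TextUnit("word", children=[TextUnit("glyph", ch) for ch in s])


def passage(s: str) -> TextUnit:
    return TextUnit("passage", sep=" ", children=[word(w) for w in s.split()])


def bob(amplitude: float) -> Behavior:
    """Hypothetical behavior: move a unit up and down sinusoidally over time."""
    def apply(unit: TextUnit, t: float) -> None:
        unit.dy = amplitude * math.sin(t)
    return apply


if __name__ == "__main__":
    p = passage("text can move")
    p.children[1].behaviors.append(bob(5.0))               # activate one word
    p.children[2].children[0].behaviors.append(bob(2.0))   # or a single glyph
    p.tick(math.pi / 2)
    print(p.content(), [w.dy for w in p.children])
```

Because behaviors attach to nodes rather than to the whole text, the same mechanism works at every level of the hierarchy.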

Copernic: NRC Summarizing Tools

Copernic is a company that has licensed text summarization technology from the Institute for Information Technology at the National Research Council of Canada. It offers agent and summarizer tools that help with searching the web and managing results. The Copernic Summarizer, in particular, looks like an interesting application of summarization for everyday use, including the ability to summarize web pages in real time. Neat!
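I don’t know how the NRC/Copernic technology works internally. For a feel of what automatic summarization involves, here is a generic frequency-based extractive summarizer, a classic textbook approach that is not Copernic’s method. It scores each sentence by the average document frequency of its non-stopword words and keeps the top sentences in their original order. The sentence splitter and stopword list are crude assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "for",
             "is", "are", "was", "with", "by", "that", "this", "it"}


def split_sentences(text: str) -> list[str]:
    """Naive split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n: int = 2) -> list[str]:
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average frequency, so long sentences are not favoured just for length.
        toks = content_words(sentence)
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    doc = ("Summarization picks key sentences. "
           "Summarization tools score sentences by word frequency. "
           "The weather was pleasant. "
           "Frequency based scoring favours sentences with common words.")
    print(summarize(doc, 2))
```

Running this on a fetched web page’s text is the basic shape of summarizing pages on the fly. Commercial tools presumably do much more, such as handling page structure and choosing sentences more carefully.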