Clusty: Cluster Searching

Clusty the Clustering Engine is a meta-search engine built on Vivísimo, which is based on technology from Carnegie Mellon. Clusty does a nice job of clustering results from multiple search engines into folders that actually make sense. There are some other neat interface features that Google could learn from.
They do the clustering by crawling and running some sort of cluster processing on the information. I’m not sure how this works across the other engines’ results, though it makes sense over a single domain. Vivísimo also offers enterprise solutions – I wonder if they could be adapted to crawl and cluster humanities texts?
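To make the idea of "folders" concrete, here is a minimal Python sketch of how results merged from several engines might be grouped by a shared term. This is not Vivísimo's method, which I don't know; the function names, the tiny stopword list, and the labelling rule (pick the non-query term that appears in the most snippets) are all assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a real linguistic resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on", "is", "with"}


def tokens(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def merge_results(*result_lists):
    """Combine (url, snippet) results from several engines, keeping the first copy of each URL."""
    seen = {}
    for results in result_lists:
        for url, snippet in results:
            seen.setdefault(url, snippet)
    return list(seen.items())


def cluster_by_label(results, query):
    """Group URLs into folders named after a term shared with other snippets."""
    query_terms = set(tokens(query))

    # Count how many snippets each non-query term appears in.
    doc_freq = Counter()
    for _, snippet in results:
        doc_freq.update(set(tokens(snippet)) - query_terms)

    folders = defaultdict(list)
    for url, snippet in results:
        candidates = set(tokens(snippet)) - query_terms
        shared = [t for t in candidates if doc_freq[t] > 1]
        # Prefer the most widely shared term; break ties alphabetically.
        label = max(shared, key=lambda t: (doc_freq[t], t)) if shared else "other"
        folders[label].append(url)
    return dict(folders)


if __name__ == "__main__":
    # Made-up results from two hypothetical engines.
    engine_a = [
        ("http://a.example/1", "Jaguar car dealership"),
        ("http://a.example/2", "Jaguar cat habitat in rainforest"),
    ]
    engine_b = [
        ("http://a.example/1", "Jaguar car dealership"),
        ("http://b.example/3", "Jaguar car reviews"),
        ("http://b.example/4", "Jaguar cat conservation"),
    ]
    merged = merge_results(engine_a, engine_b)
    for label, urls in cluster_by_label(merged, "jaguar").items():
        print(label, urls)
```

A real clustering engine would presumably work with phrases, weighting, and hierarchical folders; the sketch only shows the basic move of labelling each result with something it has in common with its neighbours.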

DARPA Global Autonomous Language Exploitation

DARPA seeks strong, responsive proposals from well-qualified sources for a new research and development program called GALE (Global Autonomous Language Exploitation) with the goal of eliminating the need for linguists and analysts and automatically providing relevant, distilled actionable information to military command and personnel in a timely fashion.

Global Autonomous Language Exploitation (GALE) is an unbelievably ambitious DARPA project from the same office that brought us the ARPANET (the Information Processing Technology Office). Imagine if they succeed! Thanks to Greg Crane for pointing this out.

Update – the DARPA Information Processing Technology Office page on GALE is here. Under the GALE Proposer Pamphlet (BAA 05-28) there is a description of the types of discourse that should be processed and the desired results.

Engines must be able to process naturally-occurring speech and text of all the following types:

  • Broadcast news (radio, television)
  • Talk shows (studio, call-in)
  • Newswire
  • Newsgroups
  • Weblogs
  • Telephone conversations

. . .

DARPA’s desired end result includes:

  • A transcription engine that produces English transcripts with 95% accuracy
  • A translation engine producing English text with 95% accuracy
  • A distillation engine able to fill knowledge bases with key facts and to deliver useful information as proficiently as humans can.

    TADA talk

    Here is a blog entry on a short talk I gave about text analysis and collaboration. Stéfan Sinclair had the neat idea of having students enter notes about the conference into a blog on the Text Analysis Developers Alliance as the conference went along.

    My talk began by offering a model for how computing practices change interpretation and the role of text analysis. I then went on to talk about different types of interpretation – between developers, between developers and researchers and between researchers.

    EPIC: Carnivore Documents

    Omnivore Source Code FOIA Document
    Did the FBI use text analysis for network tapping? I found an interesting page at the Electronic Privacy Information Center about Carnivore and Omnivore (its predecessor), two Internet monitoring systems created by the FBI. EPIC has an EPIC Carnivore Page with a summary and scans of documents received through Freedom of Information requests. See also EPIC Carnivore FOIA Documents. The documents are fascinating given all the blacked-out lines that you can try to guess at. There is a beauty to these documents, with their heavy black regions and “Secret” crossed out all over. Note how EPIC uses this aesthetic in their annual report.