Participants are blogging the discussion on the Text Analysis Developers Alliance blog.
Category: Text Analysis
Text Analysis Summit
For today and the next two days, I am at the Text Analysis Summit that I blogged about earlier.
I am typing my notes into a wiki page on a new wiki about text analysis; see wikiTA.
TADA: Text Analysis Summit
My colleague Stéfan Sinclair is organizing a Text Analysis Summit, which promises to be a great retreat from busyness.
Software, Tools and Lists for Text Analysis
Software, Tools, Lists, Resources is a good collection of resources for computational linguistics. It includes a useful list of lists, such as stop-word and function-word lists.
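For readers new to stop-word lists, here is a minimal Python sketch of how one is typically used: dropping function words before counting word frequencies. The tiny `STOP_WORDS` set and the `content_word_counts` helper are hypothetical illustrations; a real list from the resources above would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def content_word_counts(text: str) -> Counter:
    """Count lowercase word tokens, skipping any word in STOP_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)

sample = "The cat sat on the mat and the cat slept."
print(content_word_counts(sample).most_common(3))
```

Without the filter, "the" would top the list; with it, "cat" does.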
I should check the functionality of these tools against TAPoR.
This came from Stéfan Sinclair.
Using TACT with Electronic Texts (for free)
Using TACT with Electronic Texts, a classic introduction and manual, is now available for free as a PDF from the MLA! The MLA and the authors (Ian Lancashire et al.) should be congratulated for putting it up. Even if you don't use TACT, the opening chapters are relevant to anyone interested in text analysis. Bravo! This is thanks to Judith Altreuter.
EPIC: Carnivore Documents
Omnivore Source Code FOIA Document
Did the FBI use text analysis for network tapping? I found an interesting page at the Electronic Privacy Information Center (EPIC) about Carnivore and Omnivore (its predecessor), two Internet monitoring systems created by the FBI. EPIC has an EPIC Carnivore Page with a summary and scans of documents received through Freedom of Information requests. See also EPIC Carnivore FOIA Documents. The documents are fascinating for all the blacked-out lines you can try to guess at. There is a beauty to these documents, with their heavy black regions and "Secret" crossed out all over. Note how EPIC uses this aesthetic in its annual report.
Comparing: Jack Lynch
Text Analysis with Compare is an essay by Jack Lynch on approaches to comparing texts to find where one alludes to another. It lays out some simple methods and their advantages and disadvantages. I think we are going to try to implement some of these in TAPoRware.
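One simple way to look for allusions, offered here as a generic sketch and not necessarily one of the methods in Lynch's essay, is to find word sequences (n-grams) that two texts share. The helper names `word_ngrams` and `shared_phrases`, and the default of three-word phrases, are assumptions for illustration.

```python
import re

def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text (punctuation ignored)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(source: str, target: str, n: int = 3) -> list[str]:
    """List n-word phrases found in both texts, as candidate allusions."""
    common = word_ngrams(source, n) & word_ngrams(target, n)
    return sorted(" ".join(gram) for gram in common)

source = "To be, or not to be, that is the question."
target = "He paused: to be or not to be was never his question."
print(shared_phrases(source, target))
```

Shorter n-grams catch looser echoes but also more coincidental matches ("of the", "in a"), which is one of the trade-offs any such method has to weigh.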
Google vs. Microsoft
What's Next for Google is an in-depth article by Charles H. Ferguson in the January 2005 issue of MIT's Technology Review. The article looks at Google and how it might respond if Microsoft makes a serious bid to dominate the search engine business.
Text Analysis and Alzheimer’s
Both The Globe and Mail and CBC ran stories about researchers who compared word lists from Iris Murdoch's books to look at changes in her word variety. See CBC News: Iris Murdoch novel may be evidence of Alzheimer's. Now that computers index our files (Spotlight in Mac OS X Tiger, for example), could we get them to warn us when our word variety goes down? Could my e-mail client or blog be fitted to alert me to changes in my use of language?
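The stories don't say exactly which measure the researchers used, so this is only a sketch of how such an alert might work. It assumes the type-token ratio (distinct words divided by total words) as the measure of word variety. Because the raw ratio falls as a text gets longer, it is averaged over fixed-size windows. The `variety_dropped` alert and its 10% tolerance are hypothetical choices.

```python
import re

def windowed_ttr(text: str, window: int = 100) -> float:
    """Mean type-token ratio over consecutive, non-overlapping windows of `window` words.

    Fixed-size windows keep texts of different lengths comparable.
    Returns 0.0 if the text is shorter than one window.
    """
    words = re.findall(r"[a-z']+", text.lower())
    ratios = [
        len(set(words[i:i + window])) / window
        for i in range(0, len(words) - window + 1, window)
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0

def variety_dropped(baseline: str, recent: str,
                    window: int = 100, tolerance: float = 0.1) -> bool:
    """Hypothetical alert: True if recent writing's variety falls more than
    `tolerance` (as a fraction) below the baseline's."""
    base = windowed_ttr(baseline, window)
    return base > 0 and windowed_ttr(recent, window) < base * (1 - tolerance)

varied = " ".join(["one", "two", "three", "four", "five",
                   "six", "seven", "eight", "nine", "ten"] * 2)
repetitive = "same word " * 10
print(variety_dropped(varied, repetitive, window=10))
```

An e-mail client could run something like this over last year's messages as the baseline and this month's as the recent sample. Whether such a signal means anything clinically is, of course, a separate question.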
Comparison Engine and Clustering Engine
Antonio Gulli has two interesting tools up on the web. The first is a Rank Comparison Engine, which queries several search engines, collects their lists of hits, and builds a table of points ("pills") showing which hits are unique to a single index and which are shared. The results are interactive: you can mouse over a point to see its short description.
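The core of a rank comparison like this can be sketched in a few lines of Python. This is a guess at the idea, not Gulli's implementation. The sketch assumes each engine's results are already available as an ordered list of URLs. The engine names and URLs are made-up placeholders, and `compare_rankings` is a hypothetical helper.

```python
def compare_rankings(results: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Map each hit to {engine: 1-based rank} for every engine that returned it.

    If an engine lists the same hit twice, its first (best) rank is kept.
    """
    table: dict[str, dict[str, int]] = {}
    for engine, hits in results.items():
        for rank, hit in enumerate(hits, start=1):
            table.setdefault(hit, {}).setdefault(engine, rank)
    return table

# Hypothetical result lists; real ones would come from querying each engine.
results = {
    "engineA": ["x.org", "y.org", "z.org"],
    "engineB": ["y.org", "w.org"],
}
for hit, ranks in compare_rankings(results).items():
    status = "shared" if len(ranks) > 1 else "unique to " + next(iter(ranks))
    print(hit, ranks, status)
```

Each entry in the table corresponds roughly to one "pill": its engines say which indexes found the hit, and its ranks say where.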
The second is the SnakeT Clustering Engine (SNippet Aggregation for Knowledge ExTraction). It searches various indexes and builds a list of high-frequency words that cluster with the query word. You can then navigate by these co-occurring words. A neat use of text analysis for concept exploration.
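To show the basic idea of navigating by co-occurring words, here is a simplified Python sketch. It is not SnakeT's actual algorithm, which I have not seen. It assumes the search snippets are already in hand and counts how many snippets each word shares with the query word. The function name and the optional `stop_words` parameter are hypothetical.

```python
import re
from collections import Counter

def cooccurring_words(snippets: list[str], query: str, top: int = 5,
                      stop_words: frozenset = frozenset()) -> list[tuple[str, int]]:
    """Return the words that most often appear in the same snippet as `query`.

    Each word is counted at most once per snippet, so the count is the
    number of snippets in which it co-occurs with the query.
    """
    query = query.lower()
    counts = Counter()
    for snippet in snippets:
        words = set(re.findall(r"[a-z']+", snippet.lower()))
        if query in words:
            counts.update(words - {query} - stop_words)
    return counts.most_common(top)

# Made-up snippets for illustration.
snippets = ["jaguar car speed", "jaguar cat jungle",
            "jaguar car price", "apple pie"]
print(cooccurring_words(snippets, "Jaguar", top=1))
```

On real snippets, a stop-word list matters here, or words like "the" will dominate every cluster.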
My one complaint is the design – he needs a graphic designer to make these sing.