The NEW MLAgazine: About Computers and the People Who Made Them is a strange magazine with long, interesting articles on the history of Netscape and a lot of stories about Macintosh history – including a good one on the history of the Mac OS. It looks to be mostly the work of one fan. I’m not sure what the title stands for or in what sense it is a magazine, but the content is interesting.
Text Analysis of E-Mail
Stéfan Sinclair has blogged an interesting story from the New York Times on how Enron Offers an Unlikely Boost to E-Mail Surveillance. Researchers, including Dr. Skillicorn at Queen’s, are using a large collection of Enron e-mail posted by the Federal Energy Regulatory Commission to experiment with e-mail tracking and analysis. A large corpus like the Enron one (over a million messages) can be used as a testbed for social network analysis or diachronic trend analysis. The article also talks about fears that government Echelon-style surveillance of e-mail may become available to corporate intelligence types. I wonder if we can develop useful text analysis tools optimized for e-mail collections, such as a dialogue of messages on a subject or the Humanist archives. Something for TAPoRware.
Scientists had long theorized that tracking the e-mailing and word usage patterns within a group over time – without ever actually reading a single e-mail – could reveal a lot about what that group was up to. The Enron material gave Mr. Skillicorn’s group and a handful of others a chance to test that theory, by seeing, first of all, if they could spot sudden changes.
For example, would they be able to find the moment when someone’s memos, which were routinely read by a long list of people who never responded, suddenly began generating private responses from some recipients? Could they spot when a new person entered a communications chain, or if old ones were suddenly shut out, and correlate it with something significant?
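The first question in that passage can be sketched with header data alone. This is a minimal illustration, not the researchers' method: the message fields (`id`, `sender`, `period`, `in_reply_to`) are hypothetical names I am assuming for a parsed e-mail collection, and the function never looks at message bodies.

```python
from collections import defaultdict

def reply_counts_by_period(messages):
    """Count replies drawn by each sender's messages, per period.

    Each message is a dict with hypothetical keys: 'id', 'sender',
    'period' (e.g. a 'YYYY-MM' string) and 'in_reply_to' (parent id or None).
    Only header metadata is used; no message text is read.
    """
    by_id = {m["id"]: m for m in messages}
    counts = defaultdict(int)
    for m in messages:
        parent = by_id.get(m["in_reply_to"])
        if parent is not None:
            # Credit the reply to the author of the message being answered.
            counts[(parent["sender"], m["period"])] += 1
    return counts

def first_reply_period(messages, sender):
    """Return the earliest period in which `sender` drew a reply, or None."""
    counts = reply_counts_by_period(messages)
    periods = sorted(p for (s, p), n in counts.items() if s == sender and n > 0)
    return periods[0] if periods else None

messages = [
    {"id": 1, "sender": "alice", "period": "2001-01", "in_reply_to": None},
    {"id": 2, "sender": "alice", "period": "2001-02", "in_reply_to": None},
    {"id": 3, "sender": "alice", "period": "2001-03", "in_reply_to": None},
    {"id": 4, "sender": "bob", "period": "2001-03", "in_reply_to": 3},
]
print(first_reply_period(messages, "alice"))
```

A real tool would also have to recover reply threading from headers and decide what counts as a "sudden" change, which this sketch leaves out.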
NITLE: National Institute for Technology and Liberal Education: Semantic Indexing
The National Institute for Technology and Liberal Education or NITLE (pronounced “nightly”?) has a free semantic indexing tool written in Perl that you can download. Their page also has useful starting links on semantic analysis. The project was/is funded by Mellon.
In particular I recommend the introduction to latent semantic indexing they have put up, Patterns in Unstructured Data: Discovery, Aggregation, and Visualization by Yu, Cuadrado, Ceglowski, and Payne.
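For readers who want to see the core idea of latent semantic indexing in a few lines, here is a rough sketch. It is not the NITLE tool (which is in Perl) and the function names are my own; it assumes NumPy is installed, uses raw word counts with no stop-word removal or weighting, and simply truncates the SVD of a term-by-document matrix.

```python
import numpy as np

def lsi_document_vectors(docs, k=2):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-document matrix of raw counts (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Keep the k largest singular values; return one row per document.
    return (np.diag(s[:k]) @ Vt[:k]).T

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

docs = [
    "cat dog",
    "dog cat pet",
    "stock market",
    "market stock trade price",
]
vecs = lsi_document_vectors(docs, k=2)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy collection the two animal documents land close together in the reduced space and far from the two finance documents, which is the behaviour the introduction describes.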
Clusty: Cluster Searching
Clusty the Clustering Engine is a meta-search engine that uses Vivísimo, which is based on technology from Carnegie Mellon. Clusty does a nice job of clustering results from multiple search engines into folders that actually make sense. There are some other neat interface features that Google could learn from.
They do the clustering by crawling and running some sort of cluster processing on the information. I’m not sure how this works across multiple engines, though it makes sense over a single domain. Vivísimo also offers enterprise solutions – I wonder if they could be adapted to crawl and cluster humanities texts?
Toogle and Woogle
Woogle – Words in pictures is an art project that takes a phrase and builds it from Google-retrieved images.
Toogle Image Search takes a word, finds an image from Google and then converts it to a text version of the image/word where the word is repeated in different colours to make up the image.
Both are neat art toys by C6.org and Gu Jian that play nicely on Google. I’m not clear as to who C6 is – an art collective in the UK? – but they have a number of clean and irreverent projects. Thanks to Robert for pointing this out to me.
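The Toogle effect is easy to approximate. The sketch below is my own guess at the idea, not C6's code: it assumes the image has already been reduced to a small grid of (r, g, b) tuples (fetching and resizing the image is left out) and writes each pixel as one letter of the word in that pixel's colour.

```python
from html import escape

def image_to_word_html(pixels, word):
    """Render a pixel grid as HTML text built from repeated letters of `word`.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples (0-255).
    """
    if not word:
        raise ValueError("word must be non-empty")
    lines = []
    i = 0  # running position in the repeated word
    for row in pixels:
        spans = []
        for r, g, b in row:
            ch = escape(word[i % len(word)])
            spans.append(f'<span style="color:#{r:02x}{g:02x}{b:02x}">{ch}</span>')
            i += 1
        lines.append("".join(spans))
    return "<pre>" + "\n".join(lines) + "</pre>"

pixels = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (0, 0, 0)],
]
html = image_to_word_html(pixels, "cat")
print(html)
```

Saving the output to an .html file and opening it in a browser shows the word tiled in the image's colours.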
Is radio done?
An article in Marketing Magazine online has the intriguing title Is radio done? (David Chilton, May 23, 2005.)
The article quotes the Generation M: Kaiser Family Foundation Study on youth and media that I blogged about. It suggests that, with iPods and so on, youth are not listening to radio.
ACH/ALLC Text Analysis Texts
The ACH/ALLC Conference 2005 program is now up. Martin Holmes has set up a neat page with access to raw XML and plain text for text analysis of the abstracts. I have been playing around with the text using the TAPoRware Tools. I find the plain text tools work well on the plain text of the prose. Very cool and reflexive in a way that suits our community.
DARPA Global Autonomous Language Exploitation
DARPA seeks strong, responsive proposals from well-qualified sources for a new research and development program called GALE (Global Autonomous Language Exploitation) with the goal of eliminating the need for linguists and analysts and automatically providing relevant, distilled actionable information to military command and personnel in a timely fashion.
Global Autonomous Language Exploitation (GALE) is an unbelievably ambitious DARPA project from the same office that brought us the ARPANET (the Information Processing Technology Office). Imagine if they succeed. Thanks to Greg Crane for pointing this out.
Update – the DARPA Information Processing Technology Office page on GALE is here. Under the GALE Proposer Pamphlet (BAA 05-28) there is a description of the types of discourse that should be processed and the desired results.
Engines must be able to process naturally-occurring speech and text of all the following types:
- Broadcast news (radio, television)
- Talk shows (studio, call-in)
- Newswire
- Newsgroups
- Weblogs
- Telephone conversations
. . .
DARPA’s desired end result includes
- A transcription engine that produces English transcripts with 95% accuracy
- A translation engine producing English text with 95% accuracy
- A distillation engine able to fill knowledge bases with key facts and to deliver useful information as proficiently as humans can.
Google and Publishers
According to CNET, Publishers balk at Google book copy plan. They are worried that Google’s Google Print project to scan books will violate copyright.
Face of Text: Streaming Video and Podcast
We have added a Media section to The Face of Text web site – from there you can launch a Quicktime application that lets you see streaming video of selected talks at the conference with synchronized slides and text. The application was developed with LiveStage Pro – an interesting authoring environment for Quicktime applications. You can also hear podcasts/MP3 audio of selected talks.
The streaming media was developed by Zack Melnick as a Multimedia senior thesis project. Drew Paulin has been updating the web site.