BBC: Text analysis of texting to catch criminals

The BBC have a story, Texting study to catch criminals, about researchers in the School of Psychology at Leicester University. They are studying the individual styles of text messages in order to help with forensic investigations. The BBC had previously reported, in Text messages examined in Danielle case, how this kind of analysis had helped in the prosecution of 15-year-old Danielle Jones’ uncle, who seems to have sent text messages from Danielle’s phone in order to throw investigators off the scent.

The Leicester University researchers have a web page that welcomes people willing to submit samples. I wonder how useful anonymously submitted messages will be. I imagine, if they get enough messages, it will give them a control sample for the study of particular suspect messages.

RDUES: WebCorp: The Web as Corpus

The Research and Development Unit for English Studies (RDUES) at UCE Birmingham has a tool, WebCorp: The Web as Corpus, which searches Google for a term, then goes to the top 199 documents Google identifies and searches them. It takes a while and works like our Googlizer, but produces more verbose results: a concordance organized by document, with links to a full word frequency list for each document. The advanced search form has some interesting features, including the ability to point it at other engines.
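To make the idea concrete, here is a minimal sketch of a concordance organized by document, plus a word frequency list for one document. It is not WebCorp's code; the function names, the context width, and the two sample "pages" are assumptions made up for illustration, and it works on text already in hand rather than fetching anything from the web.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def word_frequencies(text):
    """Return a Counter mapping each word to its number of occurrences."""
    return Counter(tokenize(text))

def concordance(docs, term, width=3):
    """Return {doc_name: [keyword-in-context lines]} for `term`.

    `docs` maps a document name to its plain text; `width` is the number
    of words of context kept on each side of a hit.
    """
    term = term.lower()
    results = {}
    for name, text in docs.items():
        words = tokenize(text)
        lines = []
        for i, word in enumerate(words):
            if word == term:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left} [{word}] {right}".strip())
        if lines:
            results[name] = lines
    return results

# Hypothetical sample documents standing in for fetched web pages.
docs = {
    "page-a": "The corpus is large. A corpus of web pages grows daily.",
    "page-b": "Search engines index pages, not a corpus.",
}
hits = concordance(docs, "corpus", width=2)
freqs = word_frequencies(docs["page-a"])
```

Here `hits` groups the keyword-in-context lines under each document name, and `freqs.most_common()` would give the frequency list for a single document.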

UCE Birmingham is a strange place on the web. UCE stands for “University of Central England,” and you have to dig down to the At A Glance : History Of UCE Birmingham page to find this out. (There’s no point explaining it to outsiders anywhere on the web page.) It seems to have been formed in 1992 out of all the little colleges, polytechnics and schools in the area.

RefViz: Bibliographic Visualization

RefViz is a visualization tool from Thomson ResearchSoft (who also publish EndNote and ProCite). RefViz lets you visualize “galaxies” of bibliographic references showing clusters of references by keywords. It also has a matrix view where you can see how keywords correlate.

Save time and learn more about what is happening in the literature with RefViz. With this powerful text analysis and visualization software program, you get an intuitive framework for exploring reference collections based on content. (From the Product Info page.)

Rollyo: Roll Your Own Search Engine

Rollyo: Roll Your Own Search Engine is another service that lets you create custom search engines that only search a “Searchroll” set of domains/sites. You can have multiple Searchrolls and you can create small Rollyo search panels for your site.

They are onto something, and would be even more so if they allowed hierarchies of shared (public) Searchrolls. The downside of the service is the limit on the number of sites you can include (25) and the advertising that shows up in the results.

Here is one I created for my personal website www.geoffreyrockwell.com:



Amazon Text Analysis

Amazon has recently added a neat feature to the pages on certain books. If you go, for example, to Amazon.com: The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture: Books: John Battelle and mouse over the image of the book, it gives you:

  • SIPs: Statistically Improbable Phrases
  • CAPs: Capitalized Phrases
  • and the ability to search inside the book and get a concordance.

They are providing a simple form of text analysis right on the book page. You can click on a SIP and see what other books (for sale on Amazon) have a high frequency of that improbable phrase.
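I don't know how Amazon actually computes its SIPs, so the following is only a rough sketch of one way to find phrases that are improbable relative to other text. It assumes two-word phrases, a ratio of relative frequencies with add-one smoothing, and two tiny made-up sample texts; all names here are hypothetical.

```python
import re
from collections import Counter

def bigrams(text):
    """Return the list of adjacent word pairs in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))

def improbable_phrases(book, background, top=3):
    """Rank the book's two-word phrases by how much more frequent they
    are in the book than in the background text."""
    book_counts = Counter(bigrams(book))
    bg_counts = Counter(bigrams(background))
    book_total = sum(book_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(book_counts) | set(bg_counts))
    scores = {}
    for phrase, count in book_counts.items():
        p_book = count / book_total
        # Add-one smoothing so phrases unseen in the background
        # do not cause a division by zero.
        p_bg = (bg_counts[phrase] + 1) / (bg_total + vocab)
        scores[phrase] = p_book / p_bg
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(p) for p in ranked[:top]]

# Made-up sample texts, standing in for one book and "all other books".
book = "the search engine ranks pages and the search engine sells ads"
background = "the dog and the cat and the bird sat in the sun"
top_phrases = improbable_phrases(book, background, top=3)
all_phrases = improbable_phrases(book, background, top=100)
```

Phrases that are common everywhere (like "and the") sink to the bottom, while phrases that recur in the book but not in the background rise to the top.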

Tools to the Rescue?

A Kaleidoscope of Digital American Literature by Martha Brogan with assistance from Daphnée Rentfrow (Council on Library and Information Resources, Digital Library Federation, Washington, D.C., Sept. 2005) is a deep report on the state of digital resources for the study of American literature. It concludes that while there are some excellent resources, things are fragmented and there need to be better tools. The MLA is criticized as “missing in action” compared to other organizations, which is probably not fair, but indicates a problem of perception. The MLA isn’t viewed as leading in this area.

Summit on Digital Tools: Final Report

The Final Report for the University of Virginia Summit on Digital Tools for the Humanities is now posted as a PDF at the site. The report describes how we identified the opportunity for new tools to support these areas of humanities scholarship:

  • Interpretation
  • Exploration of Resources
  • Collaboration
  • Visualization of Time, Space, and Uncertainty

I posted on this before at grockwel: Research Notes: Virginia Tool Summit.

Narus: Data-Mining IP

Who creates the software for real-time IP traffic monitoring? Narus is a company named in the Wired story Whistle-Blower Outs NSA Spy Room. The page about NarusInsight says they provide:

CALEA- and ETSI-compliant modules for lawful intercept featuring a robust warrant management system. Capabilities include playback of streaming media (for example, VoIP), rendering of Web pages, examination of e-mails and the ability to analyze the payload/attachments of e-mail or file transfer protocols.

The Wired story is about an EFF Class-Action Lawsuit Against AT&T that accuses “the telecom giant of violating the law and the privacy of its customers by collaborating with the National Security Agency (NSA) in its massive and illegal program to wiretap and data-mine Americans’ communications.”