Text Technology and TAPoR – Page 8

Trip Report on Face of Text

Vika at Brown posted a series of trip reports on the conference we organized here; see Trip report: The Face of Text. It is one of several posts and is the most thorough trip report I think I have ever read. For the conference site, see The Face of Text. Courtesy of James Chartrand.

PubSub web tracking

PubSub is a cool site that will track keywords. You subscribe to a set of words, it tracks them for you, and you can read the results in a news aggregator.
What is particularly impressive is the opening interface, which lets you start using it without creating an account or anything. As an example of how to get someone started with a service, it is one of the best I have seen. Pity you don’t actually get any results immediately.
How can we do this for humanities research?
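As one rough way to think about that question, here is a minimal sketch of keyword subscriptions matched against incoming items. It is not how PubSub works internally; the function names, the subscription names, and the sample items are all hypothetical, and it assumes a subscription matches only when every one of its keywords appears in the item.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_subscriptions(subscriptions, items):
    """Map each subscription name to the titles of items containing all its keywords.

    subscriptions: dict of name -> list of keywords (hypothetical structure)
    items: list of (title, body) pairs, e.g. entries already read from a feed
    """
    hits = defaultdict(list)
    for title, body in items:
        words = set(tokenize(title + " " + body))
        for name, keywords in subscriptions.items():
            if all(keyword.lower() in words for keyword in keywords):
                hits[name].append(title)
    return dict(hits)


# Made-up sample data for illustration only.
subscriptions = {"text-analysis": ["text", "analysis"], "tei": ["tei"]}
items = [
    ("Face of Text report", "A conference on text analysis tools."),
    ("Xaira released", "An XML search engine; TEI documents are supported."),
    ("Mac news", "Nothing relevant here."),
]
result = match_subscriptions(subscriptions, items)
```

A humanities version would presumably swap the sample items for entries pulled from journals, archives, or blogs, and write the matches out as a feed for the aggregator.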

Mac Web Mining

METAfy has a Mac OS X Web Mining tool called Anthracite (http://www.metafy.com/index.html) which, from the screenshots, uses a visual programming paradigm. Looks neat.
I found this on a page on Data Mining Resources that is a Subject Tracer Information Blog. See Deep Web Research Subject Tracer.

TAPoRware Features

We are releasing version 1.0 of the TAPoRware Tools. (You can get version 1.0 now, but there are some loose ends to clean up.) That got me thinking about the next version. Stan Ruecker and Zachary Devereux of the University of Alberta gave a paper at the Face of Text on Scraping Google and Blogstreet for Just-in-Time Text Analysis, which showed the potential for certain tools and included a list of features they would like. Stan kindly sent me the list so I could weave it into my own.

Text Analysis Spiders

One of the most exciting directions in text analysis is the adaptation of spiders, trackers and aggregators so that they can gather just-in-time texts (jitexts) for further analysis. This could open up text analysis to cultural studies researchers and make it a playful way to comb the internet. Most of the tools out there start with Google as their spider – do we need to create our own index so as to avoid depending on Google?
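To make the idea concrete, here is a minimal sketch of the analysis step once a spider has gathered some pages. It assumes the pages have already been fetched as HTML strings (no spidering or Google querying is shown), and the class, function names, and sample pages are hypothetical. It uses only the Python standard library.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the plain text of an HTML string (script/style content is not filtered)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_frequencies(pages, stopwords=frozenset()):
    """Count word frequencies across a list of already-gathered HTML pages."""
    counts = Counter()
    for html in pages:
        words = re.findall(r"[a-z']+", html_to_text(html).lower())
        counts.update(word for word in words if word not in stopwords)
    return counts


# Made-up pages standing in for whatever a spider or aggregator returned.
pages = [
    "<html><body><h1>Text analysis</h1><p>Spiders gather text.</p></body></html>",
    "<p>Analysis of gathered <b>text</b></p>",
]
freqs = word_frequencies(pages, stopwords={"of"})
```

Keeping the gathering and the analysis separate like this means the source of the pages (Google, a feed, or our own index) can change without touching the counting code.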

Swish-E

SWISH-E is the Simple Web Indexing System for Humans – Enhanced, a web page indexing and retrieval system available for download. Eric Lease Morgan of infomotions recommended it in a note to TEI-L.

Xaira

Xaira is a project at Oxford adapting the SARA XML search engine for general XML retrieval. This is a great idea: SARA early on had a lot of the functionality we are all looking for, but was limited to the BNC. Now the Oxford folks are getting support to adapt it and make it available.

Robinson: Anastasia

Anastasia by Peter Robinson (and Andrew West) at De Montfort University is now open source. It is a server and XML document processing system that can be used to publish, render, and search an XML corpus.

John Willinsky: Public Knowledge Project

This last weekend I was at the Humanities Computing Summer Institute organized by Ray Siemens at the University of Victoria. I heard a great lecture by John Willinsky of UBC on the Public Knowledge Project. This project is developing an open source e-journal tool that lets you manage the review process and publication. They also have a conference tool. This is worth supporting!

PhiloLogic

PhiloLogic – The ARTFL Project XML/SGML Full-Text System is now available for download. PhiloLogic is an extremely fast search engine for large corpora that was developed by Mark Olsen and company at Chicago for the ARTFL collection of French literature from the revolution to the 1900s. The original textbase was one of the first (if not the first) large-scale diachronic literary full-text databases. PhiloLogic has proven its worth over the years.