Antonio Gulli has two interesting tools up on the web. The first is a Rank Comparison Engine, which queries a bunch of search engines, gets their lists of hits, and builds a table of points (pills) showing which hits are unique to which index and which are shared. The results are interactive, allowing you to mouse over points to see the short description.
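To make the comparison idea concrete, here is a minimal sketch of how hits from several engines could be tabulated. It is not how the Rank Comparison Engine is implemented; the engine names, URLs, and function names are all placeholders I made up, and the result lists are hard-coded rather than fetched.

```python
from collections import defaultdict

def compare_rankings(results):
    """Map each URL to {engine: rank}, given {engine: [urls in rank order]}."""
    table = defaultdict(dict)
    for engine, urls in results.items():
        for rank, url in enumerate(urls, start=1):
            # Keep the best (first) rank if an engine repeats a URL.
            table[url].setdefault(engine, rank)
    return dict(table)

def unique_and_shared(table):
    """Split URLs into those found by one engine and those found by several."""
    unique = {url: next(iter(ranks)) for url, ranks in table.items() if len(ranks) == 1}
    shared = {url: sorted(ranks) for url, ranks in table.items() if len(ranks) > 1}
    return unique, shared

# Made-up result lists; the engine names and URLs are placeholders.
results = {
    "engine_a": ["http://example.org/1", "http://example.org/2", "http://example.org/3"],
    "engine_b": ["http://example.org/2", "http://example.org/4"],
    "engine_c": ["http://example.org/3", "http://example.org/2"],
}
table = compare_rankings(results)
unique, shared = unique_and_shared(table)
```

Each entry in `table` corresponds to one point in the display: the URL, the engines that returned it, and the rank each engine gave it.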
The second is the SnakeT Clustering Engine (SNippet Aggregation for Knowledge ExTraction). It searches various indexes and builds a list of high-frequency words that cluster with the query word. You can then navigate by the co-occurring words. Neat use of text analysis for concept exploration.
My one complaint is the design – he needs a graphic designer to make these sing.
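The co-occurrence idea behind a clustering engine like this can be sketched in a few lines. This is only an illustration under my own assumptions, not SnakeT's algorithm: the snippets are invented, the stopword list is a tiny stand-in, and words are counted once per snippet.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "with"}

def cooccurring_words(snippets, query, top_n=5):
    """Count words that appear in snippets containing the query word."""
    query = query.lower()
    counts = Counter()
    for snippet in snippets:
        words = re.findall(r"[a-z]+", snippet.lower())
        if query not in words:
            continue
        # Count each word once per snippet so long snippets do not dominate.
        counts.update(set(words) - STOPWORDS - {query})
    return counts.most_common(top_n)

# Invented snippets standing in for search-engine results.
snippets = [
    "Jaguar cars and the history of the Jaguar brand",
    "The jaguar is a big cat native to the Americas",
    "Jaguar cars for sale",
    "Big cat conservation in the Americas",
]
top = cooccurring_words(snippets, "jaguar", top_n=3)
```

Here "cars" comes out on top because it appears in two of the three snippets that mention the query. Navigating by a co-occurring word would amount to filtering the snippets by that word and counting again.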
The Getty Thesaurus of Geographic Names is “a hierarchical vocabulary of around 1.1 million names, and coordinates and other information for around 892,000 geographic places.” (From Getty Vocabularies Download Center)
In other words, it is a controlled vocabulary of place names that can be searched online or, with permission, downloaded in XML form (or as a relational database or MARC). I wonder if this could be used to create text engines that search by place and use the TGN records (which contain hierarchical information) to provide context? To put it another way, is TGN an ontology?
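Here is a rough sketch of what "search by place with hierarchical context" might look like. The record layout and the ids are invented for illustration and are not TGN's actual schema; only the idea of a parent pointer per place is taken from the description above.

```python
# Toy records in the spirit of a place-name thesaurus: id -> (name, parent id).
# The ids and the flat tuple layout are invented for this sketch, not TGN's schema.
PLACES = {
    1: ("World", None),
    2: ("Canada", 1),
    3: ("Ontario", 2),
    4: ("Hamilton", 3),
    5: ("New Zealand", 1),
    6: ("Waikato", 5),
    7: ("Hamilton", 6),
}

def path(place_id):
    """Return the names from the top of the hierarchy down to the place."""
    names = []
    while place_id is not None:
        name, parent = PLACES[place_id]
        names.append(name)
        place_id = parent
    return list(reversed(names))

def lookup(name):
    """Return every hierarchical path whose last element matches name."""
    return [path(pid) for pid, (n, _) in sorted(PLACES.items()) if n == name]

def places_within(ancestor_name):
    """Return the names of all places that sit under the named ancestor."""
    return [n for pid, (n, _) in sorted(PLACES.items())
            if ancestor_name in path(pid)[:-1]]

hamiltons = lookup("Hamilton")
in_canada = places_within("Canada")
```

A search for "Hamilton" returns two paths, so the hierarchy supplies the context needed to tell the Ontario city from the one in New Zealand, while a search within "Canada" can be expanded to every place beneath it.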
Continue reading Getty Thesaurus of Geographic Names Online (TGN)
At the Textologies workshop organized here at McMaster by Travis Kroeker and Andrew Mactavish, I saw a neat project, ActiveText, that was demonstrated by Jason E. Lewis at Concordia. ActiveText is a C++ library that can be used to make active text. Jason has gotten it right – the objects he handles go from glyphs up to passages. They can have behaviors so that segments of text are activated. See the animation.
Continue reading Jason Lewis: ActiveText
Copernic is a company that has licensed text summarization technology from the Institute for Information Technology at the National Research Council. They have agent and summarizer tools that can help searching the web and managing results. The Copernic Summarizer, in particular, looks like an interesting application of summarization for everyday use, including the ability to summarize web pages in real time. Neat!
Continue reading Copernic: NRC Summarizing Tools
XML for Overlapping Structures (XfOS) using a non XML Data Model by Alexander Czmiel was an interesting paper at the 2004 ALLC/ACH on implementing systems with overlapping hierarchies.
While overlapping hierarchies would seem to be an obscure or advanced issue in markup, I think it is important to opening up markup practices to match existing intellectual practices, especially exploratory practices.
LMNL (Layered Markup and Annotation Language) is what Alexander ended up using, and his paper gave me an introduction to this fascinating language developed by Wendell Piez. LMNL looks like it could be used for exploratory markup and then built up into sophisticated interpretations of text.
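To see why overlap is a problem for a single tree, it helps to model markup as named ranges over the text rather than as nested elements. The sketch below is my own illustration of that range idea, not LMNL syntax or any LMNL tool; the sample text, the `Range` class, and the helper names are all made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Range:
    name: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset

text = "The river runs. It runs past the mill"

def mark(name, fragment):
    """Create a range covering the first occurrence of fragment in text."""
    start = text.index(fragment)
    return Range(name, start, start + len(fragment))

def overlaps(a, b):
    """True when two ranges cross without one containing the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)

# A verse line that ends mid-sentence, and the two sentences it touches.
line = mark("line", "The river runs. It runs")
sentence1 = mark("sentence", "The river runs.")
sentence2 = mark("sentence", "It runs past the mill")

nested = not overlaps(line, sentence1)   # the first sentence fits inside the line
crossing = overlaps(line, sentence2)     # the second sentence straddles the line end
```

The first sentence nests inside the line and could be expressed as an XML child element. The second crosses the line boundary, which is exactly the case a single hierarchy cannot represent, while a set of ranges holds it without complaint. That is also what makes ranges attractive for exploratory markup: you can add a new layer without restructuring the old ones.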
Continue reading LMNL and exploratory markup
One way to ask about the place of computing in the humanities is to ask about method. I am reading Plato and the Good by my old prof Rosemary Desjardins. The second chapter nicely teases out Platonic dialectic from the Philebus in a way that fits what I am going to call neon-baroque theories of folded interruption. Dialectic involves division of the stuff of the continuum into threads (analysis or digitization) and then the weaving of these threads into a fabric (synthesis or processing). The problem with dialectic that Rosemary teases out is the problem I have with Deleuze’s interruption of the flow – how do you get a flow to divide in the first place?
To the weaver, therefore, we now put our question: what must be the case in order that she be able first to pick out the appropriate fleece, secondly to measure off the divisions that will yield the threads of warp and woof, and then finally to interweave those threads so as to produce the web of the finished fabric? (p. 42)
Method is not just analysis and synthesis of a continuum, just as humanities computing is not just digitizing and processing the analog. Method, from meta (above, after) + hodos (way, path), involves a capacity to foresee the form you want to generate in the confusion. This is a looking back (after the way) so as to look forward (above the path). You need to have an idea of what you want to weave before you start dividing (pro-video), and that comes from a recollection of what has been done. Thus Rosemary connects dialectic to Socratic recollection. Method in the humanities is circular – it involves a re-searching – a looking back to look forward. To analyze the flow into discrete digits you need to pull a flow out of chaos – you need to create a particular continuum for sampling, whether it be a flow of sound or colour.
How does this help us with computing in the humanities? Well … let’s go slow here and leave that to later.
Continue reading Method and Technology
Another great paper at the Brown conference was by Domenico Fiormonte on “Textual genesis and the writing process: The Magrelli Genetic Machine”. After giving us a background on philology and textual criticism in Italy, he showed a Flash variant machine that allows one to see manuscript and text interact. Domenico led the development of the Digital Variants site at the University of Edinburgh which has information about tools, theory, texts, and projects.
Continue reading Fiormonte: Genetic Machines
The Listening Post is a networked installation that culls text from online sources, then displays and synthesizes it.
This anticipates a project on the sonification of text that I am working on with Bill Farkas, who has developed some cool sonification systems.
A9.com (http://a9.com/) is a new search engine site from Amazon that lets you search inside books in addition to searching the web. There is supposed to be a feature that lets you link notes to what you find, and, if you get an account, you can keep information about your search history.
Remember when people speculated that Netscape could become your OS? As Google and other (pseudo) portals add features we are returning to the possibility of a network portal OS. My kids use MSN for more and more, I use Google for more and more – at what point do I ditch the “personal” computer for an environment available through any networked device?
As always, someone else has already implemented any good idea. WebCorp: The Web as Corpus is an aggregator like the TAPoRware Googlizer that we are developing. We do more on the post-processing; theirs has other strengths. What can we learn from this tool? (Thanks to Ian Lancashire for this.)