Text Analysis with Compare is an essay by Jack Lynch on approaches to comparing texts to find allusions from one to another. It lays out some simple methods and their advantages/disadvantages. I think we are going to try to implement some of these in TAPoRware.
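As a rough illustration of the kind of simple method involved, here is a minimal sketch that finds word sequences shared by two texts. It is not taken from Lynch's essay; the function names `word_ngrams` and `shared_phrases` and the default phrase length of three words are my own assumptions.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams found in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrases(source, target, n=3):
    """Return the n-word phrases that occur in both texts, sorted."""
    return sorted(word_ngrams(source, n) & word_ngrams(target, n))

# Hypothetical example: a famous line and a sentence that echoes it.
source = "To be, or not to be, that is the question"
target = "He wondered whether to be or not to be a poet"
for phrase in shared_phrases(source, target, n=4):
    print(" ".join(phrase))
```

Exact matching like this is easy to implement but misses paraphrased or reordered allusions, which is the sort of trade-off such methods carry.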
Both The Globe and Mail and CBC ran stories about researchers who compared word lists from Iris Murdoch’s books looking at word variety. See CBC News: Iris Murdoch novel may be evidence of Alzheimer’s. Now that computers index our files (a feature in Tiger, for example), could we get them to warn us when our word variety goes down? Could my e-mail client or blog be fitted to alert me to changes in my use of language?
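A crude version of such a warning could be sketched with the type-token ratio (distinct words divided by total words). This is not necessarily the measure the researchers used; the function names and the 0.8 threshold are assumptions made for illustration, and the ratio is only comparable between samples of similar length.

```python
import re

def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for empty text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)

def variety_dropped(baseline, recent, threshold=0.8):
    """True if the recent sample's ratio falls below threshold * the baseline ratio.

    The threshold is an arbitrary assumption, not an established cutoff.
    """
    return type_token_ratio(recent) < threshold * type_token_ratio(baseline)

print(type_token_ratio("the cat sat on the mat"))
```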
Continue reading Text Analysis and Alzheimer’s
Antonio Gulli has two interesting tools up on the web. The first is a Rank Comparison Engine, which will query a bunch of search engines, get their list of hits and build a table of points (pills) showing which hits are unique to which index and which shared. The results are interactive, allowing you to mouse-over points to see the short description.
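The core of such a comparison can be sketched in a few lines, assuming the hit lists have already been fetched. This is not Gulli's code; the engine names and URLs below are hypothetical placeholders.

```python
def compare_hits(results):
    """Map each hit to the sorted list of engines that returned it."""
    table = {}
    for engine, hits in results.items():
        for url in hits:
            table.setdefault(url, set()).add(engine)
    return {url: sorted(engines) for url, engines in table.items()}

def unique_hits(results):
    """Map each hit returned by exactly one engine to that engine."""
    return {url: engines[0]
            for url, engines in compare_hits(results).items()
            if len(engines) == 1}

# Hypothetical hit lists for two made-up engines.
results = {
    "EngineA": ["example.org/a", "example.org/b"],
    "EngineB": ["example.org/b", "example.org/c"],
}
print(compare_hits(results))
print(unique_hits(results))
```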
The second is SnakeT Clustering Engine (SNippet Aggregation for Knowledge ExTraction.) It searches various indexes and builds a list of high frequency words that cluster with the query word. You can then navigate by the co-occurring words. Neat use of text analysis for concept exploration.
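To make the idea concrete, here is a minimal sketch that counts the words appearing alongside a query word in a set of result snippets. It is not SnakeT's algorithm, which I have not seen; the stopword list, function name, and sample snippets are all assumptions.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}

def cooccurring_words(snippets, query, top=5):
    """Count words that share a snippet with the query word.

    Each word is counted at most once per snippet.
    """
    query = query.lower()
    counts = Counter()
    for snippet in snippets:
        words = set(re.findall(r"[a-z]+", snippet.lower()))
        if query in words:
            counts.update(words - STOPWORDS - {query})
    return counts.most_common(top)

# Hypothetical snippets for the query "jaguar".
snippets = [
    "Jaguar is a car maker",
    "The jaguar is a big cat",
    "Jaguar car dealers",
    "Cats and dogs",
]
print(cooccurring_words(snippets, "jaguar", top=1))
```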
My one complaint is the design – he needs a graphic designer to make these sing.
The Getty Thesaurus of Geographic Names is “a hierarchical vocabulary of around 1.1 million names, and coordinates and other information for around 892,000 geographic places.” (From Getty Vocabularies Download Center)
In other words it is a controlled vocabulary of place names that can be searched online or, with permission, downloaded in XML form (or relational database or MARC.) I wonder if this could be used to create text engines that search by place and use the TGN records (which contain hierarchical information) to provide context? To put it another way, is TGN an ontology?
Continue reading Getty Thesaurus of Geographic Names Online (TGN)
At the Textologies workshop organized here at McMaster by Travis Kroeker and Andrew Mactavish, I saw a neat project, ActiveText, that was demonstrated by Jason E. Lewis of Concordia. ActiveText is a C++ library that can be used to make active text. Jason has gotten it right – the objects he handles go from glyphs up to passages. They can have behaviors so that segments of text are activated. See the animation.
Continue reading Jason Lewis: ActiveText
Copernic is a company that has licensed text summarization technology from the Institute for Information Technology at the National Research Council. They have agent and summarizer tools that can help searching the web and managing results. The Copernic Summarizer, in particular, looks like an interesting application of summarization for everyday use, including the ability to summarize web pages in real time. Neat!
Continue reading Copernic: NRC Summarizing Tools
XML for Overlapping Structures (XfOS) using a non XML Data Model by Alexander Czmiel was an interesting paper at the 2004 ALLC/ACH on implementing systems with overlapping hierarchies.
While overlapping hierarchies would seem to be an obscure or advanced issue in markup, I think it is important to opening up markup practices to match existing intellectual practices, especially exploratory practices.
LMNL (Layered Markup anNotation Language) is what Alexander ended up using and his paper provided me an introduction to this fascinating language developed by Wendell Piez. LMNL looks like it could be used for exploratory markup and then built up into sophisticated interpretations of text.
Continue reading LMNL and exploratory markup
One way to ask about the place of computing in the humanities is to ask about method. I am reading Plato and the Good by my old prof Rosemary Desjardins. The second chapter nicely teases out Platonic dialectic from the Philebus in a way that fits what I am going to call neon-baroque theories of folded interruption. Dialectic involves division of the stuff of the continuum into threads (analysis or digitization) and then the weaving of these threads into a fabric (synthesis or processing.) The problem with dialectic that Rosemary teases out is the problem I have with Deleuze’s interruption of the flow – how do you get a flow to divide in the first place?
To the weaver, therefore, we now put our question: what must be the case in order that she be able first to pick out the appropriate fleece, secondly to measure off the divisions that will yield the threads of warp and woof, and then finally to interweave those threads so as to produce the web of the finished fabric? (p. 42)
Method is not just analysis and synthesis of a continuum, just as humanities computing is not just digitizing and processing the analog. Method, from meta (above, after) + hodos (way, path) involves a capacity to foresee the form you want to generate in the confusion. This is a looking back (after the way) so as to look forward (above the path.) You need to have an idea of what you want to weave before you start dividing (pro-video) and that comes from a recollection of what has been done. Thus Rosemary connects dialectic to Socratic recollection. Method in the humanities is circular – it involves a re-searching – a looking back to look forward. To analyze the flow into discrete digits you need to pull a flow out of chaos – you need to create a particular continuum for sampling, whether it be a flow of sound or colour.
How does this help us with computing in the humanities? Well … let’s go slow here and leave that to later.
Continue reading Method and Technology
Another great paper at the Brown conference was by Domenico Fiormonte on “Textual genesis and the writing process: The Magrelli Genetic Machine”. After giving us a background on philology and textual criticism in Italy, he showed a Flash variant machine that allows one to see manuscript and text interact. Domenico led the development of the Digital Variants site at the University of Edinburgh which has information about tools, theory, texts, and projects.
Continue reading Fiormonte: Genetic Machines