H. P. Luhn, KWIC and the Concordance

We all know that the Google display comes indirectly from the Concordance, but I have found in Luhn’s 1966 “Keyword-in-Context Index for Technical Literature (Kwic Index)” the explicit recognition of the link and the reason for drawing on the concordance.

the significance of such single keywords could, in most instances, be determined only by referring to the statement from which the keyword had been chosen. This somewhat tedious procedure may be alleviated to a significant degree by listing selected keywords together with surrounding words that act as modifiers pointing up the more specific sense in which a keyword has been applied. This method of indexing words is well established in the process of compiling concordances of important works of literature of the past. The added degree of information conveyed by such keyword-in-context indexes, or “KWIC Indexes” for short, can readily be provided by automatic processing. (p. 161)

The problem for Luhn is that simply retrieving words doesn’t give you a sense of their use. His solution, first shown in the late 1950s, was to provide some context (hence “keyword-in-context”) so that readers can disambiguate the keywords for themselves and decide which index items to follow. It is from the KWIC that we ultimately get the concordance features of the Google display, though it should be noted that Luhn was proposing KWIC as a way of printing automatically generated literature indexes where the keywords were in the titles. In this quote Luhn explicitly acknowledges that this is a method well established in concordances.
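To make the idea concrete, here is a minimal sketch of a keyword-in-context listing in Python. It is not Luhn's procedure or anyone's production code; the function names (`kwic`, `format_kwic`) and the `width` parameter are my own assumptions for illustration. It simply finds each occurrence of a keyword and keeps a few words on either side.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) for each occurrence of keyword.

    `width` is a hypothetical parameter: the number of context words
    kept on each side of the keyword.
    """
    words = re.findall(r"[\w'-]+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


def format_kwic(hits):
    """Right-align the left context so the keywords line up in a column."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left:>{pad}}  {kw}  {right}" for left, kw, right in hits]
```

Calling `format_kwic(kwic(some_text, "index"))` gives one line per occurrence with the keyword in a fixed column, which is the layout that lets a reader scan the contexts quickly.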

There is also a link between Luhn and Father Busa. According to Black, quoted in Marguerite Fischer, “The Kwic Index Concept: A Retrospective View”,

the Pontifical Faculty of Philosophy in Milan decided that they would make an analytical index and concordance to the Summa Theologica of St. Thomas Aquinas, and approached IBM about the possibility of having the operations performed on Data Processing. Experience gained in this project contributed towards the development of the KWIC Index. (This is a quote on page 123 from Black, J. D., 1962, “The Keyword: Its Use in Abstracting, Indexing, and Retrieving Information”.)

From the concordance to KWIC through to Google?

For some historical notes on Luhn, see H. P. Luhn and Automatic Indexing.

Vice President Al Gore

Peter O sent me a link to the original 1994 web page for Vice President Al Gore kept by NARA, the National Archives and Records Administration (of the USA). What is amusing is that this copy of Gore’s page looks really dated and positions him as a pioneer of the Internet:

Vice President Gore, having first coined the term “information superhighway” 17 years ago, is the recognized public leader in the development of the National Information Infrastructure (NII).

Not quite the same as saying he invented it. To see the page that linked to Gore’s page, go to the White House page. Many of the links work, though not Clinton’s page.

Globe and Mail: The big ideas of 2009

Saturday’s Globe and Mail had a full page on The Big Ideas of 2009. They listed five, three of which have to do with information technology and two with biology.

  1. Do-It-Yourself DNA
  2. The 3-D Revolution (as in 3-D movies and screens)
  3. The Age of Avatars (as in your avatars will become transportable across virtual worlds)
  4. Grow Your Own Tissue
  5. Reality Check for Social Networks (as in Social Networks aren’t getting the advertising and will lose momentum)

These ideas seem to be about the body and space with the possible exception of the 5th which is not really a big idea so much as a correction. I would like to suggest a different list around time:

  1. 3-D Social Year It’s Facebook
  2. Genome Online Networks Technology
  3. DNA Cells Web Tissue Users
  4. 000 Second Time World Human User Sites
  5. Life Canada said Ko using virtual advertising avatars

This list was generated scientifically. I took the text of the Globe story (edited it down to just the titles, text and authors), ran it through the TAPoRware List Words (with a stop word list), and then took the sequence of high frequency words in the order they appeared and broke it into phrases (without deleting any). This is a technique I learned from David Hoover who performed it at the Face of Text conference. It is surprising how often you can find suggestive phrases in a frequency-sorted word list. I will let you interpret this oracle, but remember that you read “Second Time” here first. This list is what the Globe authors really meant for 2009.
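For anyone who wants to consult the oracle themselves, here is a rough sketch of the procedure in Python. It does not reproduce TAPoRware's List Words tool; the tiny stop word list, the function names and the `sizes` argument are all assumptions of mine, and the phrase breaks (which I chose by eye) are passed in by hand.

```python
import re
from collections import Counter

# Hypothetical, very short stop word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that"}


def frequent_words(text, stop_words=STOP_WORDS, top=20):
    """Return the `top` most frequent non-stop words, most frequent first."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    kept = [w for w in words if w not in stop_words]
    return [w for w, _ in Counter(kept).most_common(top)]


def break_into_phrases(words, sizes):
    """Split the word list into consecutive phrases of the given sizes.

    No word is deleted: anything left over goes into a final phrase.
    """
    phrases, start = [], 0
    for size in sizes:
        phrases.append(" ".join(words[start:start + size]))
        start += size
    if start < len(words):
        phrases.append(" ".join(words[start:]))
    return phrases
```

The interpretive work is in choosing `sizes`; the code only guarantees that the words stay in frequency order and that none are dropped.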

As an aside, I should say that the reason I am blogging this today (January 9th) is because Saturday’s paper (January 3rd) was delivered to our house today. I didn’t confuse things as we were travelling Saturday and the paper was cancelled until Monday. When we called the circulation desk they told us other people in Edmonton had had the wrong papers delivered. Here is the note I sent the editors this morning:

 I would like to thank the Globe and Mail for delivering Saturday’s (Jan. 3rd) paper to my house today (Jan. 9th.) As the Globe knows, we are behind in Edmonton and need the chance to catch up with all the timeless opinions gathered. It was particularly kind of the Globe since I hadn’t read Saturday’s edition as I was traveling. I managed to get half way through the paper before realizing that I was reading old news.

I do want to take issue with your list of 5 burgeoning ideas (A 10). Two of “the big ideas” have to do with the compression of space (“The 3-D Revolution” and “The Age of Avatars”) but you neglected the big ideas in the compression of time. I would suggest that the really big idea is the “New News” otherwise known as nNews or iNews. What matters in this day of personalization is what news is new to the individual avatar, and what time they are in (like the burgeoning age of avatars.) In Second Life my avatar wants second news, and today you delivered.

What I don’t understand is why we got Saturday’s paper while others apparently got Monday’s. (This is according to the kind and real human at the circulation desk who told us others got their New News too, but a different edition.) How did you know I was exactly 6 days behind?

Blog: Infolet – Informatica e letteratura


My friend Domenico Fiormonte at l’Università di Roma Tre, Dipartimento di Italianistica, has a blog with Paolo Sordi that I just found out about, called Infolet – Informatica e letteratura (Informatics and Literature). They write longer thoughtful entries (in Italian) rather than short ones like mine.

In an entry Dai margini dell’Impero (From the margins of the Empire) Domenico criticizes “anglonorthern” computing humanists at DH 2008 for excessive specialization and excessive focus on electronic texts (and a particularly narrow version of text at that.) He goes on to say that we have known there is an anglo-american hegemony (of two or three centres) in the management, both political and scientific of the digital. (See the paper, “The international debate on humanities computing: education, technology and the primacy of languages” PDF in English for a longer discussion of this). These are strong words that, at the very least, reflect a sense of marginalization of researchers working in the European South on Romance languages and coming from a philological tradition.

I am torn as to how to respond to Domenico, but respond we should because he is willing to say things that many feel. Whether we believe the colonialization rhetoric or not, we should be willing to talk about internationalization internationally (and in multiple languages.) My response to the entry and the subsequent comments can be read in the comment I left.

The issue of internationalization and marginalization resonates partly because I work in Canada and here we have a close, but not always equal, relationship with researchers in the US and the UK. To be fair, I think we feel in Canada that we are welcome in digital humanities societies and that US colleagues are more than willing to collaborate. We also are aware of our own fetish of the issue that can distract from meaningful collaboration. If anything we may have a greater role internationally than the size of the population would merit. Our problem is that we ourselves can get caught marginalizing our Québécois colleagues. We have our own two-nations version of this marginalization problem – how to foster a truly bilingual research community avoiding “two solitudes” of research silos, an English rest-of-Canada community and a francophone Québécois community? Our Society for Digital Humanities / Société pour l’étude des médias interactifs is a real and sustained attempt to address bilingual research. Ray Siemens and Christian Vandendorpe deserve a lot of credit for their ongoing efforts in this regard, but we have a ways to go.

How to dispose of your computer: In Loving Memory of the Mainframe (aka IMS)

In Loving Memory of the Mainframe (aka IMS) is a site with a YouTube video of the goodbye New Orleans jazz funeral that was held outside in the snow at the University of Manitoba for their IBM 650 mainframe. See the Network World story How to really bury a mainframe. The Network World site provides a transcript of the eulogy including this,

Farewell IMS, we’ll remember you well. After forty-seven years, there are many stories to tell. Like when Tel Reg nearly shut down MTS, and when the Y2K bug put us under duress. You helped us achieve our academic objectives, and gave our admin processes a proper perspective.

But now we must lay you under the flora, because we have to go deal with this bloody Aurora. So we commit your parts to be recycled. Earth to Earth. Ashes to Ashes. Dust to Dust. To the god of computers, please bless it and keep it, and give it grace and peace, but please do not resurrect it.

Now, how do we bury projects this gracefully?

Rome Reborn in Google Earth

Ever wondered what it was like to stand in the Roman forum back in 320 CE? Well, growing up in Rome and being dragged through the now hot and dusty forum, I have wondered many times what it was like back then. Now I can fly around imperial Rome thanks to a collaboration between the Rome Reborn project led by Bernie Frischer at Virginia and Google Earth. You can download the latest Google Earth viewer and relevant layers at Google Earth Rome. All that is missing is people.

This project has received a lot of press like the BBC story, Google Earth revives ancient Rome. I first noticed it on the Italian Google News where it made the Top Stories front page yesterday (called Prima Pagina in the Italian). The mayor of Rome, Gianni Alemanno, even blogged it on the Google blog, inviting people to tour.

The idea that virtual technologies now let people experience the city that I guide as it appeared in 320 A.D. fills me with pride — a pride that I inherited from Rome’s glorious past.

As a humorous aside, there is an interesting view to be had if you go through the “floor” of ancient Rome. Then you see the satellite view of modern Rome (flattened) below the ancient 3D model in an interesting inversion of the archaeological layers.

Screen Shot from Google Earth Rome

Here you see the distinctive design of Michelangelo’s Campidoglio beneath the model. The lines are the flags for items of interest that you can click on to get descriptions of the buildings.