JSTOR: Data for Research Visualization

"Dialogue" in Philosophy Journals

Thanks to Judith I have been playing with JSTOR’s Data for Research (DfR). They provide a faceted way of visualizing and searching the entire JSTOR database. Features include:

  • Full-text and fielded searching of the entire JSTOR archive using a powerful faceted search interface. Using this interface one can quickly and easily define content of interest through an iterative process of searching and results filtering.
  • Online viewing of document-level data including word frequencies, citations, key terms, and ngrams.
  • Request and download datasets containing word frequencies, citations, key terms, or ngrams associated with the content selected.
  • API for content selection and retrieval. (from the About page)

I’m impressed by how much they expose. They even have a Submit Data Request and an API. This is important – we are seeing a large scale repository exposing its information to new types of queries other than just search.

Rock-afire Explosion Clip – Rockafiremovie.com

Shannon pointed me to The Rock-afire Explosion, an animatronic band from the 80s that was one of the entertainments at Showbiz Pizza. Rock-afire Explosion has been resurrected by a fan and one of the original creators of Creative Engineering who are programming tunes and uploading video to YouTube. See, for example, Madonna’s 4 Minutes. They take bids on New Shows to Program at a strange and not very clear site. If you bid high enough and it isn’t “dirty” they will program the animatronic band to do a song you want. (Would they do Plato’s dialogues?)

I cannot begin to describe how strangely captivating this all is. Perhaps the documentary made about it (see Rockafiremovie.com) captures the passion. Or, for a computing perspective, see the clip about Programming the Rock-afire Explosion.

Whatever happened to animatronics? Will it make a comeback now that we all carry around smartphones that can control things?

Internet Archive: Movies from the History of Computing

Willard McCarty on Humanist (Vol. 23, No. 116.) pointed to some early films about computing which are worth looking at. One is “The Information Machine” from IBM in 1956. It is an animated cartoon which presents the computer in a history of human information invention. It presents three functions for computing:

  1. Control or Balance (controlling complex systems)
  2. Design (helping us design and think)
  3. Simulation (modelling and predicting)

Another film is On Guard! The Story of SAGE, also from IBM. This is about IBM’s contributions to air defense, specifically the SAGE system and the development of airborne modular computing. There is a fun part about the interactive operator terminal that visualizes data (as opposed to a TV that shows video). The narrator actually talks about visualization (though not interactivity).

RFCs: How the Internet Got Its Rules

Stephen D. Crocker has written an Op-Ed on How the Internet Got Its Rules (April 6, 2009) about the Request for Comments or R.F.C.’s of the Internet. He looks back on writing the first R.F.C. 40 years ago as a student assigned to write up notes from a meeting. He chose to call it an R.F.C. because:

What was supposed to be a simple chore turned out to be a nerve-racking project. Our intent was only to encourage others to chime in, but I worried we might sound as though we were making official decisions or asserting authority. In my mind, I was inciting the wrath of some prestigious professor at some phantom East Coast establishment. I was actually losing sleep over the whole thing, and when I finally tackled my first memo, which dealt with basic communication between two computers, it was in the wee hours of the morning.

Calling them R.F.C.’s set the tone for the consensual culture.

The early R.F.C.’s ranged from grand visions to mundane details, although the latter quickly became the most common. Less important than the content of those first documents was that they were available free of charge and anyone could write one. Instead of authority-based decision-making, we relied on a process we called “rough consensus and running code.” Everyone was welcome to propose ideas, and if enough people liked it and used it, the design became a standard.

Another feature was layering for independence that allowed people to build new technologies on older ones without asking permission.

Thanks to Dan Cohen on Twitter for this.

H. P. Luhn, KWIC and the Concordance

We all know that the Google display comes indirectly from the Concordance, but I have found in Luhn’s 1966 “Keyword-in-Context Index for Technical Literature (Kwic Index)” the explicit recognition of the link and the reason for drawing on the concordance.

the significance of such single keywords could, in most instances, be determined only by referring to the statement from which the keyword had been chosen. This somewhat tedious procedure may be alleviated to a significant degree by listing selected keywords together with surrounding words that act as modifiers pointing up the more specific sense in which a keyword has been applied. This method of indexing words is well established in the process of compiling concordances of important works of literature of the past. The added degree of information conveyed by such keyword-in-context indexes, or “KWIC Indexes” for short, can readily be provided by automatic processing. (p. 161)

The problem for Luhn is that simply retrieving words doesn’t give you a sense of their use. His solution, first shown in the late 1950s, was to provide some context (hence “keyword-in-context”) so that readers can disambiguate for themselves and make decisions about which index items to follow. It is from the KWIC that we ultimately get the concordance features of the Google display, though it should be noted that Luhn was proposing KWIC as a way of printing automatically generated literature indexes where the keywords were in the titles. In this quote Luhn explicitly acknowledges that this is a method well established in concordances.
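The core of a KWIC display is simple enough to sketch in a few lines. The following is a minimal illustration, not Luhn’s actual system: for each occurrence of a keyword it prints a window of surrounding words, which is all the context a reader needs to disambiguate the sense.

```python
# A minimal keyword-in-context (KWIC) sketch: for each occurrence of a
# keyword, show a window of surrounding words as modifiers of its sense.

def kwic(text, keyword, window=3):
    """Return keyword-in-context lines: each occurrence of `keyword`
    with up to `window` words of context on either side."""
    words = text.split()
    lines = []
    for i, w in enumerate(words):
        # strip trailing punctuation so "concordance." still matches
        if w.lower().strip(".,;:!?\"'") == keyword.lower():
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left} [{w}] {right}")
    return lines

sample = ("The concordance lists every word. "
          "A concordance shows each word in context.")
for line in kwic(sample, "concordance"):
    print(line)
```

Real KWIC indexes go further, rotating titles so that every significant word gets its turn in the keyword column, but the principle is the same.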

There is also a link between Luhn and Father Busa. According to Black, quoted in Marguerite Fischer, “The Kwic Index Concept: A Retrospective View”,

the Pontifical Faculty of Philosophy in Milan decided that they would make an analytical index and concordance to the Summa Theologica of St. Thomas Aquinas, and approached IBM about the possibility of having the operations performed on Data Processing. Experience gained in this project contributed towards the development of the KWIC Index. (This is a quote on page 123 from Black, J. D., 1962, “The Keyword: Its Use in Abstracting, Indexing, and Retrieving Information”.)

From the concordance to KWIC through to Google?

For some historical notes on Luhn see, H. P. Luhn and Automatic Indexing.

Vice President Al Gore

Peter O sent me a link to the original 1994 web page for Vice President Al Gore kept by NARA, the National Archives and Records Administration (of the USA). What is amusing is that this copy of Gore’s page looks really dated and positions him as a pioneer of the Internet:

Vice President Gore, having first coined the term “information superhighway” 17 years ago, is the recognized public leader in the development of the National Information Infrastructure (NII).

Not quite the same as saying he invented it. To see the page that Gore’s page was linked from, go to the White House page. Many of the links work, though not Clinton’s page.

Globe and Mail: The big ideas of 2009

Saturday’s Globe and Mail had a full page on The Big Ideas of 2009. They listed five, three of which have to do with information technology and two with biology.

  1. Do-It-Yourself DNA
  2. The 3-D Revolution (as in 3-D movies and screens)
  3. The Age of Avatars (as in your avatars will become transportable across virtual worlds)
  4. Grow Your Own Tissue
  5. Reality Check for Social Networks (as in Social Networks aren’t getting the advertising and will lose momentum)

These ideas seem to be about the body and space with the possible exception of the 5th which is not really a big idea so much as a correction. I would like to suggest a different list around time:

  1. 3-D Social Year It’s Facebook
  2. Genome Online Networks Technology
  3. DNA Cells Web Tissue Users
  4. 000 Second Time World Human User Sites
  5. Life Canada said Ko using virtual advertising avatars

This list was generated scientifically. I took the text of the Globe story (edited down to just the titles, text and authors), ran it through the TAPoRware List Words (with a stop word list), and then took the sequence of high frequency words in the order they appeared and broke it into phrases (without deleting any). This is a technique I learned from David Hoover, who performed it at the Face of Text conference. It is surprising how often you can find suggestive phrases in a frequency sorted word list. I will let you interpret this oracle, but remember that you read “Second Time” here first. This list is what the Globe authors really meant for 2009.
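The oracle technique above can be sketched in a few lines. This is a rough illustration under my own assumptions, not TAPoRware’s code: the function names and the (deliberately tiny) stop word list are mine, and the phrase size is arbitrary.

```python
# A sketch of the frequency-phrase "oracle": count words minus stop words,
# take the top words in frequency order, then break that sequence into
# fixed-size "phrases". The stop word list here is illustrative only.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "that", "as", "with", "will", "be", "are", "your"}

def top_words(text, n=20):
    """Return the n most frequent non-stop words, highest frequency first."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(n)]

def phrases(words, size=5):
    """Break the frequency-sorted sequence into fixed-size 'phrases'."""
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

Run `phrases(top_words(story))` on any article and you get the same kind of found poetry as the list above; the suggestiveness comes from the reader, not the algorithm.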

As an aside, I should say that the reason I am blogging this today (January 9th) is because Saturday’s paper (January 3rd) was delivered to our house today. I didn’t confuse things as we were travelling Saturday and the paper was cancelled until Monday. When we called the circulation desk they told us other people in Edmonton had had the wrong papers delivered. Here is the note I sent the editors this morning:

 I would like to thank the Globe and Mail for delivering Saturday’s (Jan. 3rd) paper to my house today (Jan. 9th). As the Globe knows, we are behind in Edmonton and need the chance to catch up with all the timeless opinions gathered. It was particularly kind of the Globe since I hadn’t read Saturday’s edition as I was traveling. I managed to get halfway through the paper before realizing that I was reading old news.

I do want to take issue with your list of 5 burgeoning ideas (A 10). Two of “the big ideas” have to do with the compression of space (“The 3-D Revolution” and “The Age of Avatars”) but you neglected the big ideas in the compression of time. I would suggest that the really big idea is the “New News” otherwise known as nNews or iNews. What matters in this day of personalization is what news is new to the individual avatar, and what time they are in (like the burgeoning age of avatars.) In Second Life my avatar wants second news, and today you delivered.

What I don’t understand is why we got Saturday’s paper while others apparently got Monday’s. (This is according to the kind and real human at the circulation desk who told us others got their New News too, but a different edition.) How did you know I was exactly 6 days behind?

Blog: Infolet – Informatica e letteratura

My friend Domenico Fiormonte at l’Università di Roma Tre, Dipartimento di Italianistica, has a blog with Paolo Sordi that I just found out about, called Infolet – Informatica e letteratura (Informatics and Literature). They write longer, thoughtful entries (in Italian) rather than short ones like mine.

In an entry, Dai margini dell’Impero (From the Margins of the Empire), Domenico criticizes “anglonorthern” computing humanists at DH 2008 for excessive specialization and an excessive focus on electronic texts (and a particularly narrow version of text at that). He goes on to say that we have known there is an Anglo-American hegemony (of two or three centres) in the management, both political and scientific, of the digital. (See the paper “The international debate on humanities computing: education, technology and the primacy of languages”, PDF in English, for a longer discussion of this.) These are strong words that, at the very least, reflect a sense of marginalization among researchers working in the European South on Romance languages and coming from a philological tradition.

I am torn as to how to respond to Domenico, but respond we should, because he is willing to say things that many feel. Whether we believe the colonization rhetoric or not, we should be willing to talk about internationalization internationally (and in multiple languages). My response to the entry and the subsequent comments can be read in the comment I left.

The issue of internationalization and marginalization resonates partly because I work in Canada and here we have a close, but not always equal, relationship with researchers in the US and the UK. To be fair, I think we feel in Canada that we are welcome in digital humanities societies and that US colleagues are more than willing to collaborate. We also are aware of our own fetish of the issue that can distract from meaningful collaboration. If anything we may have a greater role internationally than the size of the population would merit. Our problem is that we ourselves can get caught marginalizing our Québécois colleagues. We have our own two-nations version of this marginalization problem – how to foster a truly bilingual research community avoiding “two solitudes” of research silos, an English rest-of-Canada community and a francophone Québécois community? Our Society for Digital Humanities / Société pour l’étude des médias interactifs is a real and sustained attempt to address bilingual research. Ray Siemens and Christian Vandendorpe deserve a lot of credit for their ongoing efforts in this regard, but we have a ways to go.