Research in Support of Digital Libraries at Xerox PARC Part II: Technology

I came across an interesting article in D-Lib that summarizes some of the work at Xerox PARC, Research in Support of Digital Libraries at Xerox PARC Part II: Technology. This is, as the title suggests, the second part of an extended survey. The article covers projects on subjects like “visualization of large text collections, summarization, and automatic detection of thematic structure.” There are some interesting examples of citation browsing tools like the Butterfly Citation Browser.

Another humanities computing centre is dissolved

On Humanist there was an announcement that John Dawson, the Manager of the Literary and Linguistic Computing Centre (LLCC) at Cambridge, was retiring and that a conference and retirement party would mark the centre’s 45th year. What the announcement doesn’t say is that with Dawson’s retirement the Cambridge Computing Service is decommissioning the LLCC. I found this on a Computing Service page dedicated to the LLCC:

John Dawson, Manager of the centre will be retiring in October 2009. The LLCC will then cease to exist as a distinct unit, but Rosemary Rodd, current Deputy Manager, will continue to provide support for Humanities computing as a member of the Computing Service’s Technical User Services. She will be based on the New Museums Site.

It seems symptomatic of a larger shift away from dedicated humanities computing centres.

Gaming as Actions: Students Playing a Mobile Educational Computer Game

The online journal Human IT has an issue on gaming with an interesting article about mobile gaming (or augmented reality gaming) for education. See Elisabet M. Nilsson & Gunilla Svingby: Gaming as Actions: Students Playing a Mobile Educational Computer Game. The article has a clear and short summary of the literature around serious games and education that points out that there isn’t yet much evidence for the theoretical claims.

The overall conclusion seems to be that even if several studies show effects on learning as well as on attitudes, empirical evidence is still lacking in support of the assumption that computer games are advantageous for use in educational settings. (p. 28-9)

The article touches on the problem we all have when we ask students to role play (whether as part of a game or simulation), which is how seriously they take it.

Some of the groups had a clear ironic touch on almost all of their utterances, at the same time as they were taking on the assignment with a serious attitude. When playing the game, they seemed to constantly oscillate back and forth between the imagined game world and their own reality. They played their alloted fictive role, and at the same time referred to their own personal experiences. (p. 43)

I’m convinced this irony has to do with how comfortable students feel playing roles before others. What does it mean in the web of class relationships to ask a student to act before others? Should they have a choice? Obviously they handle the discomfort with irony as a way of preserving their identity in the class. That they can do both (play a fictive role and be their ironic selves) at the same time is impressive. On page 53 the authors suggest that a context where students can alternate between motivations could make for an “engaging learning experience.”

Almost Augmented Reality

Augmented reality is almost real according to a BBC story by Michael Fitzpatrick, Mobile phones get cyborg vision. Developers like Layar have made it possible to get real-time information about your surroundings overlaid on what your camera sees.

Launched this June in Amsterdam, residents and visitors can now see houses for sale, popular bars and shops, jobs on offer in the area, and a list of local doctors and ATMs by scanning the landscape with the software.

The social media implications are tremendous – imagine having a myPlace site where I can add meaning to locations that others can view. Historical tours, ghost stories, contextual music, political rants and so on could be added to real locations.
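What might a “myPlace” layer look like as data? A toy sketch in Python (the name, the fields, and the example records are my own invention for illustration, not an existing service or API): each annotation is just text pinned to coordinates, and a viewer asks for whatever sits near where the camera is pointing.

```python
import math
from dataclasses import dataclass

@dataclass
class Annotation:
    """A piece of user-added meaning pinned to a real-world location."""
    lat: float
    lon: float
    author: str
    kind: str   # e.g. "historical tour", "ghost story", "contextual music"
    text: str

def nearby(annotations, lat, lon, radius_m=100):
    """Return annotations within roughly radius_m metres of a point."""
    def dist_m(a):
        # Equirectangular approximation; good enough at city scale.
        dx = math.radians(a.lon - lon) * math.cos(math.radians(lat))
        dy = math.radians(a.lat - lat)
        return 6371000 * math.sqrt(dx * dx + dy * dy)
    return [a for a in annotations if dist_m(a) <= radius_m]

# Hypothetical data: a historical note attached to a building.
places = [
    Annotation(53.5232, -113.5263, "someuser", "historical tour",
               "An invented note about an old campus building."),
]
print(nearby(places, 53.5230, -113.5260))
```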

Thanks to Sean for this.

BBC links to other news sites: Moreover Technology

The BBC News has an interesting feature where their stories link to other stories on the same subject from other news sources. See, for example, the story Chavez backer held over TV attack – on the right there are links to stories on the same subject from other news venues like the Philadelphia Inquirer. They even explain why the BBC links to other news sites.

How does it work?

The Newstracker system uses web search technology to identify content from other news websites that relates to a particular BBC story. A news aggregator like Google News or Yahoo News uses this type of technique to compare the text of stories and group similar ones together.

BBC News gets a constantly updating feed of stories from around 4000 different news websites. The feed is provided to us by Moreover Technologies. The company provides a similar service for other clients.

Our system takes the stories and compares their text with the text of our own stories. Where it finds a match, we can provide a link directly from our story to the story on the external site.

Because we do this comparison very regularly, our stories contain links to the most relevant and latest articles appearing on other sites.

Sounds like an interesting use of “real time” text analysis and an alternative to Google News. Could we implement something like that for blogs? The company that provides them with this is Moreover Technologies.
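The underlying matching need not be anything exotic. As a rough illustration (not how Newstracker actually works – the BBC gives no details), here is a small Python sketch that scores blog posts against a story using bag-of-words cosine similarity; the texts and the threshold are invented for the example.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words frequency vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def related_posts(story, posts, threshold=0.3):
    """Return posts similar enough to the story to be worth linking to."""
    story_vec = vectorize(story)
    scored = [(cosine_similarity(story_vec, vectorize(p)), p) for p in posts]
    return [p for score, p in sorted(scored, reverse=True) if score >= threshold]

# Invented example texts.
story = "Supporters of Chavez protest outside a Venezuelan television station."
posts = [
    "A post about Venezuelan television, politics and protest.",
    "A post about mobile augmented reality applications.",
]
print(related_posts(story, posts))
```

A real system would refresh this comparison as new posts arrive, which is presumably what the BBC means by doing it “very regularly.”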

Extracts from original TEI planning proposal

I recently discovered (thanks to a note from Lou Burnard to the TEI list) a document online with extracts from the Funding Proposal for Phase 1 (Planning Conference) for the Text Encoding Initiative which led to the Poughkeepsie conference of 1987 that laid out the plan for the TEI.

The document is an appendix to the 1988 full Proposal for Funding for An Initiative to Formulate Guidelines for the Encoding and Interchange of Machine-Readable Texts. The planning proposal led to the Poughkeepsie conference where consensus was developed that led to the full proposal that funded the initial development of the TEI Guidelines. (Get that?)

The doubled document (the Extracts of the first proposal is an appendix to the 1988 proposal) is fascinating to read 20 years later. In section “3.4 Concrete Results” of the full proposal they describe the outcomes of the full grant thus:

Ultimately, this project will produce a single potentially large document which will:

  • define a format for encoded texts, into which texts prepared using other schemes can be translated,
  • define a formal metalanguage for the description of encoding schemes,
  • describe existing schemes (and the new scheme) formally in that metalanguage and informally in prose,
  • recommend the encoding of certain textual features as minimal practice in the encoding of new texts,
  • provide specific methods for encoding specific textual features known empirically to be commonly used, and
  • provide methods for users to encode features not already provided for, and for the formal definition of these extensions.

I am struck by how the TEI has achieved most of these goals (and others, like a consortial structure for sustainable evolution). It is also interesting to note what seems to have been done differently, like the second and third bullet points – the development of a “metalanguage for the description of encoding schemes” and “describing existing schemes” with it. I hadn’t thought of the TEI Guidelines as a project to document the variety of encoding schemes. Have they done that?

Another interesting wrinkle is in the first proposal extracts where the document talks about “What Text ‘Encoding’ Is”. First of all, why the single quotation marks around “encoding” – was this a new use of the term then? Second, they mention that “typically, a scheme for encoding texts must include:”

Conventions for reducing texts to a single linear sequence wherever footnotes, text-critical apparatus, parallel columns of text (as in polyglot texts), or other complications make the linear sequence problematic.

It is interesting to see linearity creep into what encoding schemes “must” do, even in a scheme like the TEI that ended up hierarchical and non-linear. I wonder how to interpret this – is it simply a pragmatic matter of how you organize the linear sequence of text and code in the TEI document, especially when what you are trying to represent is not linear? Could it be the need for encoded text to be a “string” for the computer to parse? Time to ask someone.
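One way to read that requirement is mechanical: a footnote has to live somewhere in the character stream, so an inline markup scheme embeds it at its point of attachment and leaves it to software to pull the layers apart again. A minimal sketch in Python – the element names only loosely follow TEI conventions and the text is invented:

```python
import xml.etree.ElementTree as ET

# A non-linear feature (a footnote) encoded inline, at its point of
# attachment, so the document remains one linear sequence of characters.
doc = """<p>The whale is a mammal<note place="foot">On cetology,
see chapter 32.</note> and not a fish.</p>"""

root = ET.fromstring(doc)

def running_text(elem):
    """Reassemble the main reading text, skipping embedded notes."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "note":
            parts.append(running_text(child))
        parts.append(child.tail or "")
    return "".join(parts)

print(running_text(root))      # the linear reading text
print(root.find("note").text)  # the footnote, pulled back out
```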

Drawing attention to the things that seem strange risks obscuring the fact that these two proposals were immensely important for the digital humanities. They describe how the proposers imagined problems of text representation could be solved by an international project. We can look back and admire the clarity of vision that led to the achievements of the TEI – achievements not just of a few people, but of many, organized as the proposal laid out. These are beautiful (and influential) administrative documents, if we dare say there is such a thing. I would say that they and the Guidelines themselves are some of the most important scholarship in our field.

Sperberg-McQueen: Making a Synthesizer Sound like an Oboe

Michael Sperberg-McQueen has an interesting colloquium paper that I just came across, The State of Computing in the Humanities: Making a Synthesizer Sound like an Oboe. There is an interesting section on “Document Geometries” where he describes the different ways we represent texts on a computer, from linear representations to typed hierarchies (like the TEI).

The entire TEI Guidelines can be summed up in one phrase, which we can imagine directed at producers of commercial text processing software: “Text is not simple.”

The TEI attempts to make text complex — or, more positively, the TEI enables the electronic representation of text to capture more complexity than is otherwise possible.

The TEI makes a few claims of a less vague nature, too.

  • Many levels of text, many types of analysis or interpretation, may coexist in scholarship, and thus must be able to coexist in markup.
  • Text varies with its type or genre; for major types the TEI provides distinct base tag sets.
  • Text varies with the reader, the use to which it is put, the application software which must process it; the TEI provides a variety of additional tag sets for these.
  • Text is linear, but not completely.
  • Text is not always in English. It is appalling how many software developers forget this.

None of these claims will surprise any humanist, but some of them may come as a shock to many software developers.

This paper also got me thinking about the obviousness of structure. Sperberg-McQueen criticizes the “tagged linear” geometry (as in COCOA-tagged text) thus:

The linear model captures the basic linearity of text; the tagged linear model adds the ability to model, within limits, some non-linear aspects of the text. But it misses another critical characteristic of text. Text has structure, and textual structures can contain other textual structures, which can contain still other structures within themselves. Since as readers we use textual structures to organize text and reduce its apparent complexity, it is a real loss if software is incapable of recognizing structural elements like chapters, sections, and paragraphs and insists instead on presenting text as an undifferentiated mass.

I can’t help asking if text really does have structure or if it is in the eye of the reader. Or perhaps, to be more accurate, whether text has structure in the way we mean when we tag text using XML. If I were to naively talk about text structure I would actually be more likely to think of material things like pages, covers, tabs (in files), and so on. I might think of things that visually stand out like sidebars, paragraphs, indentations, coloured text, headings, or page numbers. None of these are really what gets encoded in “structural markup.” Rather, what gets encoded is a logic or a structure in the structuralist sense of some underlying “real” structure.

Nonetheless, I think Sperberg-McQueen is onto something about how readers use textual structures and the need, therefore, to give them similar affordances. I would rephrase the issue as a matter of giving readers affordances with which to manage the complexity and amount of text. A book gives you things like a Table of Contents and an Index. An electronic text (or electronic book) doesn’t have to give you exactly the same affordances, but we do need some ways of managing the excess complexity of text. In fact, we should be experimenting with what the computer can do well rather than reimplementing what paper does well. You can’t flip pages on the computer or find a coffee stain near something important, but you can scroll or search for a pattern. The TEI and logical encoding are about introducing computationally useful structure, not reproducing print structures. That’s why pages are so awkward in the TEI.
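A small illustration of what that “computationally useful structure” buys you: given nested divisions in TEI-like markup, a few lines of code can derive a table of contents – the affordance a printed book provides by other means. The markup below is simplified for the example, not canonical TEI.

```python
import xml.etree.ElementTree as ET

# Simplified TEI-like nesting: divisions contain divisions contain paragraphs.
doc = """<text>
  <div type="chapter"><head>Loomings</head>
    <div type="section"><head>Call me Ishmael</head><p>...</p></div>
  </div>
  <div type="chapter"><head>The Carpet-Bag</head><p>...</p></div>
</text>"""

def table_of_contents(elem, depth=0):
    """Walk the nested divisions and print an indented table of contents."""
    for div in elem.findall("div"):
        head = div.find("head")
        print("  " * depth + (head.text if head is not None else "(untitled)"))
        table_of_contents(div, depth + 1)

table_of_contents(ET.fromstring(doc))
```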

Update: The original link to the paper no longer works; try this SSOAR Link – they have a PDF. (Thanks to Michael for pointing out the link rot.)

What would Dante think? EA puts sexual bounty on booth babes


Ars Technica has a story about how EA puts sexual bounty on the heads of its own booth babes.

EA has a new way to annoy its own models: give out prizes for Comic Con attendees who commit acts of lust with their booth babes. Also, if you win, you get to take the lady out to dinner! This is going to end well for everyone involved.

All this to promote Dante’s Inferno, their new game. I wish I had the time to work out in which circle of hell Dante would have put the marketing idiot who came up with this embarrassment.

Light Industry

Edward Kienholz’s Friendly Grey Computer (1965)

While reading about Edward Kienholz’s “Friendly Grey Computer” (1965) I came across Light Industry’s Theater of Code.

Theater of Code will present three performance/interventions that explore how computer code, scripting language, and software applications relate to the movement of bodies and the staging and choreography of our lives.

The one that intrigues me the most is Website Impersonations: The Ten Most Visited, where a performer embodies a web site, translating the HTML code on the fly. For example, here is www.yahoo.com being performed in 2008. Here is a documentary that explains the www.facebook.com performance.

Web Impersonations Stage Diagram