BBC links to other news sites: Moreover Technology

BBC News has an interesting feature where their stories link to other stories on the same subject from other news sources. See, for example, the story Chavez backer held over TV attack – on the right there are links to stories on the same subject from other news venues like the Philadelphia Inquirer. They even explain why the BBC links to other news sites.

How does it work?

The Newstracker system uses web search technology to identify content from other news websites that relates to a particular BBC story. A news aggregator like Google News or Yahoo News uses this type of technique to compare the text of stories and group similar ones together.

BBC News gets a constantly updating feed of stories from around 4000 different news websites. The feed is provided to us by Moreover Technologies. The company provides a similar service for other clients.

Our system takes the stories and compares their text with the text of our own stories. Where it finds a match, we can provide a link directly from our story to the story on the external site.

Because we do this comparison very regularly, our stories contain links to the most relevant and latest articles appearing on other sites.

Sounds like an interesting use of “real time” text analysis and an alternative to Google News. Could we implement something like that for blogs?
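A minimal sketch of this kind of story matching – bag-of-words cosine similarity, my own illustration rather than the actual Newstracker or Moreover algorithm – might look like this:

```python
import math
import re
from collections import Counter

def tokens(text):
    # crude tokenizer: lowercase words
    return re.findall(r"[a-z']+", text.lower())

def cosine(a, b):
    # cosine similarity between two bag-of-words Counters
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def related_stories(our_story, feed_stories, threshold=0.3):
    """Return feed stories similar enough to ours to link to,
    best match first."""
    ours = Counter(tokens(our_story))
    scored = [(cosine(ours, Counter(tokens(s))), s) for s in feed_stories]
    return [s for score, s in sorted(scored, reverse=True)
            if score >= threshold]
```

Run against each incoming feed item on a schedule and you get the "constantly updating" links the BBC describes; a real system would weight terms (TF-IDF) and index the feed rather than rescanning it.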

Extracts from original TEI planning proposal

I recently discovered (thanks to a note from Lou Burnard to the TEI list) a document online with extracts from the Funding Proposal for Phase 1 (Planning Conference) for the Text Encoding Initiative which led to the Poughkeepsie conference of 1987 that laid out the plan for the TEI.

The document is an appendix to the 1988 full Proposal for Funding for An Initiative to Formulate Guidelines for the Encoding and Interchange of Machine-Readable Texts. The planning proposal led to the Poughkeepsie conference where consensus was developed that led to the full proposal that funded the initial development of the TEI Guidelines. (Get that?)

The doubled document (the Extracts of the first proposal are an appendix to the 1988 proposal) is fascinating to read 20 years later. In section “3.4 Concrete Results” of the full proposal they describe the outcomes of the full grant thus:

Ultimately, this project will produce a single potentially large document which will:

  • define a format for encoded texts, into which texts prepared using other schemes can be translated,
  • define a formal metalanguage for the description of encoding schemes,
  • describe existing schemes (and the new scheme) formally in that metalanguage and informally in prose,
  • recommend the encoding of certain textual features as minimal practice in the encoding of new texts,
  • provide specific methods for encoding specific textual features known empirically to be commonly used, and
  • provide methods for users to encode features not already provided for, and for the formal definition of these extensions.

I am struck by how the TEI has achieved most of these goals (and others, like a consortial structure for sustainable evolution.) It is also interesting to note what seems to have been done differently, like the second and third bullet points – the development of a “metalanguage for the description of encoding schemes” and “describing existing schemes” with it. I hadn’t thought of the TEI Guidelines as a project to document the variety of encoding schemes. Have they done that?

Another interesting wrinkle is in the first proposal extracts where the document talks about “What Text ‘Encoding’ Is”. First of all, why the single quotation marks around “encoding” – was this a new use of the term then? Second, they mention that “typically, a scheme for encoding texts must include:”

Conventions for reducing texts to a single linear sequence wherever footnotes, text-critical apparatus, parallel columns of text (as in polyglot texts), or other complications make the linear sequence problematic.

It is interesting to see linearity creep into what encoding schemes “must” do, including one that is ultimately hierarchical and non-linear. I wonder how to interpret this – is it simply a pragmatic matter of how you organize the linear sequence of text and code in the TEI document, especially when what you are trying to represent is not linear? Could it be the need for encoded text to be a “string” for the computer to parse? Time to ask someone.

Drawing attention to the things that seem strange obscures the fact that these two proposals were immensely important for digital humanities. They describe how the proposers imagined problems of text representation could be solved by an international project. We can look back and admire the clarity of vision that led to the achievements of the TEI – achievements of not just a few people, but of many organized as per the proposal. These are beautiful (and influential) administrative documents, if we dare say there is such a thing. I would say that they and the Guidelines themselves are some of the most important scholarship in our field.

Sperberg-McQueen: Making a Synthesizer Sound like an Oboe

Michael Sperberg-McQueen has an interesting colloquium paper that I just came across, The State of Computing in the Humanities: Making a Synthesizer Sound like an Oboe. There is an interesting section on “Document Geometries” where he describes different ways we represent texts on a computer from linear ways to typed hierarchies (like TEI.)

The entire TEI Guidelines can be summed up in one phrase, which we can imagine directed at producers of commercial text processing software: “Text is not simple.”

The TEI attempts to make text complex — or, more positively, the TEI enables the electronic representation of text to capture more complexity than is otherwise possible.

The TEI makes a few claims of a less vague nature, too.

  • Many levels of text, many types of analysis or interpretation, may coexist in scholarship, and thus must be able to coexist in markup.
  • Text varies with its type or genre; for major types the TEI provides distinct base tag sets.
  • Text varies with the reader, the use to which it is put, the application software which must process it; the TEI provides a variety of additional tag sets for these.
  • Text is linear, but not completely.
  • Text is not always in English. It is appalling how many software developers forget this.

None of these claims will surprise any humanist, but some of them may come as a shock to many software developers.

This paper also got me thinking about the obviousness of structure. Sperberg-McQueen criticizes the “tagged linear” geometry (as in COCOA tagged text) thus,

The linear model captures the basic linearity of text; the tagged linear model adds the ability to model, within limits, some non-linear aspects of the text. But it misses another critical characteristic of text. Text has structure, and textual structures can contain other textual structures, which can contain still other structures within themselves. Since as readers we use textual structures to organize text and reduce its apparent complexity, it is a real loss if software is incapable of recognizing structural elements like chapters, sections, and paragraphs and insists instead on presenting text as an undifferentiated mass.

I can’t help asking if text really does have structure or if it is in the eye of the reader. Or perhaps, to be more accurate, if text has structure in the way we mean when we tag text using XML. If I were to naively talk about text structure I would actually be more likely to think of material things like pages, cover, tabs (in files), and so on. I might think of things that visually stand out like sidebars, paragraphs, indentations, coloured text, headings, or page numbers. None of these are really what gets encoded in “structural markup.” Rather what gets encoded is a logic or a structure in the structuralist sense of some underlying “real” structure.

Nonetheless, I think Sperberg-McQueen is onto something about how readers use textual structures and the need to therefore give them similar affordances. I would rephrase the issue as a matter of giving readers affordances with which to manage the complexity and amount of text. A book gives you things like a Table of Contents and Index. An electronic text (or electronic book) doesn’t have to give you exactly the same affordances, but we do need some ways of managing the excess complexity of text. In fact, we should be experimenting with what the computer can do well rather than reimplementing what paper does well. You can’t flip pages on the computer or find a coffee stain near something important, but you can scroll or search for a pattern. The TEI and logical encoding is about introducing computationally useful structure, not reproducing print structures. That’s why pages are so awkward in the TEI.
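The contrast between the “tagged linear” and hierarchical geometries can be made concrete with a toy example (invented tags, not actual COCOA codes): in the flat stream, markers merely flip state, and the nesting readers rely on has to be reconstructed by software.

```python
# Tagged linear geometry (COCOA-style, invented tags): a flat stream
# where markers flip state; containment is only implicit.
tagged_linear = [
    ("C", "1"), ("P", "1"), "Call me Ishmael.",
    ("P", "2"), "Some years ago...",
    ("C", "2"), ("P", "1"), "...",
]

def to_hierarchy(stream):
    """Recover the nesting the flat stream leaves implicit."""
    chapters = []
    for item in stream:
        if isinstance(item, tuple):
            tag, n = item
            if tag == "C":                 # new chapter
                chapters.append({"n": n, "paragraphs": []})
            elif tag == "P":               # new paragraph slot
                chapters[-1]["paragraphs"].append("")
        else:                              # running text
            chapters[-1]["paragraphs"][-1] += item
    return {"chapters": chapters}

# The hierarchical geometry (the TEI/XML view) is the output of
# to_hierarchy(): structures contain structures, so software can
# address "chapter 1, paragraph 2" directly.
```

Note the asymmetry: the flat stream needs this recovery step before software can treat chapters and paragraphs as units, which is exactly the loss Sperberg-McQueen describes.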

Update: The original link to the paper doesn’t work now, try this SSOAR Link – they have a PDF. (Thanks to Michael for pointing out the link rot.)

What would Dante think? EA puts sexual bounty on booth babes


Ars Technica has a story about how EA puts sexual bounty on the heads of its own booth babes.

EA has a new way to annoy its own models: give out prizes for Comic Con attendees who commit acts of lust with their booth babes. Also, if you win, you get to take the lady out to dinner! This is going to end well for everyone involved.

All this to promote Dante’s Inferno, their new game. I wish I had the time to identify which circle of hell Dante would have put the marketing idiot who came up with this embarrassment.

Light Industry

Edward Kienholz’s Friendly Grey Computer (1965)

Browsing about Edward Kienholz’s “Friendly Grey Computer” (1965) I came across Light Industry’s Theatre of Code.

Theater of Code will present three performance/interventions that explore how computer code, scripting language, and software applications relate to the movement of bodies and the staging and choreography of our lives.

The one that intrigues me the most is Website Impersonations: The Ten Most Visited, where a performer embodies a web site, translating the HTML code on the fly. For example, here is one being performed in 2008. Here is a documentary that explains the performance.

Web Impersonations Stage Diagram

579m Virtual World Registered Accounts: Kzero


Kzero have released some interesting information about the “virtual worlds sector”, see 579m Virtual World Registered Accounts. If they are right, it is kids between 10 and 15 who are the big joiners, at “57% of the overall total.”

Also check out their Universe visualizations of the different virtual worlds by age segment. World of Warcraft doesn’t seem to be on their charts.

Lilian Edwards: Facebook When You Die

Thomas Crampton has a fascinating video interview with Lilian Edwards of panGloss on the subject of what happens to your online identity (like Facebook) when you die. He wrote notes on the interview on his blog here. What intrigued me was the emergence of online services like the (free) Dead Man’s Switch, which will send a batch of e-mails to whomever you want if you don’t respond to a regular ping. A more commercial option is Legacy Locker which,

is a safe, secure repository for your vital digital property that lets you grant access to online assets for friends and loved ones in the event of loss, death, or disability.

They use “Verifiers” as in people you trust to confirm if you are dead or disabled. They also offer more services.

An alternative is to add an envelope with a list of your passwords to your will. I’m also told that you should explicitly will your domain names to your heirs if you don’t want them contested. (Who would contest “”?)
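The ping-and-timeout mechanism behind a service like Dead Man’s Switch is simple enough to sketch (my own illustration with hypothetical names, not the actual service):

```python
from datetime import datetime, timedelta

class DeadMansSwitch:
    """Minimal sketch: release messages if the owner misses check-ins."""

    def __init__(self, grace_period_days=30):
        self.grace = timedelta(days=grace_period_days)
        self.last_checkin = datetime.now()
        self.messages = []          # (recipient, body) pairs to release

    def check_in(self):
        # owner answered the regular ping; reset the clock
        self.last_checkin = datetime.now()

    def add_message(self, recipient, body):
        self.messages.append((recipient, body))

    def due(self, now=None):
        # True once the owner has been silent past the grace period
        now = now or datetime.now()
        return now - self.last_checkin > self.grace

    def release(self, send, now=None):
        # send() would be an e-mail hook; fires only when overdue
        if not self.due(now):
            return 0
        for recipient, body in self.messages:
            send(recipient, body)
        return len(self.messages)
```

The interesting design question is really the verification policy – how long a silence, and confirmed by whom – which is where Legacy Locker’s human “Verifiers” come in.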

Office of the Privacy Commissioner » Blog Archive » Report of Findings with respect to Facebook

The Office of the Privacy Commissioner of Canada has issued a Report of Findings with respect to Facebook. The OPC investigated Facebook after a complaint by the Canadian Internet Policy and Public Interest Clinic (CIPPIC) and concluded that four aspects of the complaint were well founded. In some cases Facebook has agreed to change things, but they have not agreed to recommendations about third-party applications which have access not only to your information, but that of friends who have not agreed.

When users add an application, they consent to giving the application’s developer access to some of their personal information, as well as that of their “friends.” Moreover, the only way that users can refuse to share personal information when their friends add applications is by opting completely out of all applications, or blocking specific applications.

Michael Geist has a nice summary of the finding at Privacy Commissioner Finds Facebook Violating Canadian Privacy Law, though he doesn’t mention explicitly what bothers me – the ability of an application to mine information about me if a “friend” agrees. There is, in the comments, a discussion of what can be done if Facebook doesn’t comply that is interesting.

YouTube: Possible ou Probable

The French publishing company editis put together a short video about books, Possible ou Probable (in French.) The video presents a world where different formats of e-book readers exist. You can go to a bookstore and scan the bar code of a paper book and then buy the electronic version. You can edit your own guide books or scrap books. The e-book readers are actually touch-screen tablets, like an iPhone, for reading, editing, and multimedia.

The video is professionally done and takes place in Paris and Bruges. Thanks to Stéfan and Matt K for this.