Cornell Web Lab: Large scale web research

Diagram from Web Lab Paper

The Cornell Web Lab is an interesting example of a high performance computing project in the humanities and social sciences. As they say,

The Web Laboratory is a joint project of Cornell University and the Internet Archive to provide data and computing tools for research about the Web and the information on the Web.

In a paper on the project, A Research Library Based on the Historical Collections of the Internet Archive, William Arms and colleagues point out that the data challenge of the social sciences (and humanities) is that the data is poorly structured and there is a lot of it. The Internet Archive is a case in point; as of 2006 they had 5 to 6 petabytes of web page data. While it is amazing that we have such archives in computer (and human) readable form, it is hard to do anything with that much data. The Web Lab approach is to provide basic HPC services for extracting subsets of the whole that can then be used by other tools.

Flamenco: Faceted browsing

Screen shot of Flamenco

FLAMENCO stands for FLexible information Access using MEtadata in Novel COmbinations and is an open source faceted browser framework. It is developed in Python and uses Lucene and MySQL to give developers a framework to develop browsing interfaces for collections. There are some nice demos, from which the image above was taken (it is from the Nobel Prize Winners demo and I chose to see only women winners.)

We experimented with a faceted browser for the second generation of the McMaster Museum of Art Online Roman Coin Collection. The model works well with smaller and visual collections, but with items that are harder to represent in small representations we need to think about visual cues. How, for example, might we represent a web page as a small icon in a visual browser so that you can recognize it? Stan Ruecker has been working on interesting models.

Beyond Analogue: Current Research in Humanities Computing


Beyond Analogue: Current Graduate Research in Humanities Computing is a conference being organized by the Humanities Computing graduate students at the University of Alberta on February 13th. Daniel O’Donnell from U of Lethbridge and Paul Youngman of U of North Carolina-Charlotte will be the keynote speakers. If you are a grad student you might want to submit a proposal for a poster or paper. Either way you are welcome to attend the full day conference if you are in Edmonton that day.

Pliny: Welcome

Screen Shot of Pliny

Pliny, the annotation and note management tool by John Bradley at King’s College London, just got a Mellon Award for Technology Collaboration.

The Mellon Awards honour not-for-profit organisations for leadership in the collaborative development of open source software tools with application to scholarship in the arts and humanities, as well as cultural-heritage not-for-profit activities.

Pliny is free and you can try it out on the Mac or PC. John has thought a lot about how tools fit in the research process of humanists.

Debategraph: social mapping debates

Screen Shot

On the Independent I came across the interactive visualization above on Mapping the crisis in Gaza. The visualization environment looks like your standard bubblegraph, but has lots of other features as you can see from the toolbar at the bottom. Here is another view:

Screen Shot

The maps can be edited by users – they have wiki features for those who register accounts. In some ways they are communal mind maps. The software comes from Debategraph.org.

Globe and Mail: The big ideas of 2009

Saturday’s Globe and Mail had a full page on The Big Ideas of 2009. They listed five, three of which have to do with information technology and two with biology.

  1. Do-It-Yourself DNA
  2. The 3-D Revolution (as in 3-D movies and screens)
  3. The Age of Avatars (as in your avatars will become transportable across virtual worlds)
  4. Grow Your Own Tissue
  5. Reality Check for Social Networks (as in Social Networks aren’t getting the advertising and will lose momentum)

These ideas seem to be about the body and space with the possible exception of the 5th which is not really a big idea so much as a correction. I would like to suggest a different list around time:

  1. 3-D Social Year It’s Facebook
  2. Genome Online Networks Technology
  3. DNA Cells Web Tissue Users
  4. 000 Second Time World Human User Sites
  5. Life Canada said Ko using virtual advertising avatars

This list was generated scientifically. I took the text of the Globe story (edited down to just the titles, text and authors), ran it through the TAPoRware List Words tool (with a stop word list), and then took the sequence of high frequency words in the order they appeared and broke it into phrases (without deleting any). This is a technique I learned from David Hoover, who performed it at the Face of Text conference. It is surprising how often you can find suggestive phrases in a frequency sorted word list. I will let you interpret this oracle, but remember that you read “Second Time” here first. This list is what the Globe authors really meant for 2009.
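For those who want to try the oracle at home, here is a rough Python sketch of the procedure described above. It is not how TAPoRware List Words is implemented; the function names, the tiny stop word list and the tokenizing rule are all my own assumptions, and the phrase breaks are still chosen by hand, which is where the art lies.

```python
import re
from collections import Counter

# A tiny illustrative stop word list (an assumption; a real list is much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "for", "on", "as", "with"}


def list_words(text, stop_words=STOP_WORDS, top_n=20):
    """Return the top_n most frequent non-stop words, most frequent first.

    Words with equal counts keep the order in which they first appear.
    """
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return [word for word, _ in counts.most_common(top_n)]


def break_into_phrases(words, lengths):
    """Cut the word sequence into phrases of the given lengths.

    No word is deleted: anything left over becomes a final phrase.
    """
    phrases = []
    start = 0
    for n in lengths:
        phrases.append(" ".join(words[start:start + n]))
        start += n
    if start < len(words):
        phrases.append(" ".join(words[start:]))
    return phrases
```

To use it, pass the story text to list_words and then hand the result to break_into_phrases with whatever phrase lengths look suggestive, for example break_into_phrases(list_words(story_text), [5, 4, 5]), where story_text is a hypothetical variable holding the edited article.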

As an aside, I should say that the reason I am blogging this today (January 9th) is because Saturday’s paper (January 3rd) was delivered to our house today. I didn’t confuse things as we were travelling Saturday and the paper was cancelled until Monday. When we called the circulation desk they told us other people in Edmonton had had the wrong papers delivered. Here is the note I sent the editors this morning:

 I would like to thank the Globe and Mail for delivering Saturday’s (Jan. 3rd) paper to my house today (Jan. 9th.) As the Globe knows, we are behind in Edmonton and need the chance to catch up with all the timeless opinions gathered. It was particularly kind of the Globe since I hadn’t read Saturday’s edition as I was traveling. I managed to get half way through the paper before realizing that I was reading old news.

I do want to take issue with your list of 5 burgeoning ideas (A 10). Two of “the big ideas” have to do with the compression of space (“The 3-D Revolution” and “The Age of Avatars”) but you neglected the big ideas in the compression of time. I would suggest that the really big idea is the “New News” otherwise known as nNews or iNews. What matters in this day of personalization is what news is new to the individual avatar, and what time they are in (like the burgeoning age of avatars.) In Second Life my avatar wants second news, and today you delivered.

What I don’t understand is why we got Saturday’s paper while others apparently got Monday’s. (This is according to the kind and real human at the circulation desk who told us others got their New News too, but a different edition.) How did you know I was exactly 6 days behind?

LRB: John Lanchester: Is it Art?

Willard McCarty in Humanist (Vol. 22, No. 410) pointed us to the London Review of Books essay, Is it Art? by John Lanchester about videogames. The essay starts by pointing out how videogames have been ignored and segregated despite their economic effect. He then goes on to ask why that is the case, leading to thoughts about what might make games art. He makes an interesting connection between the conventions of games (which drive people new to games wild) and the ways games are becoming like work. The repetitive worklike aspect of games is something he draws from Steven Poole (see Working for the Man.) Here are some quotes:

From the economic point of view, this was the year video games overtook music and video, combined, in the UK. The industries’ respective share of the take is forecast to be £4.64 billion and £4.46 billion. (For purposes of comparison, UK book publishers’ total turnover in 2007 was £4.1 billion.) As a rule, economic shifts of this kind take a while to register on the cultural seismometer; and indeed, from the broader cultural point of view, video games barely exist. …

There is no other medium that produces so pure a cultural segregation as video games, so clean-cut a division between the audience and the non-audience. Books, films, TV, dance, theatre, music, painting, photography, sculpture, all have publics which either are or aren’t interested in them, but at least know that these forms exist, that things happen in them in which people who are interested in them are interested. They are all part of our current cultural discourse. Video games aren’t. …

Northrop Frye once observed that all conventions, as conventions, are more or less insane; Stanley Cavell once pointed out that the conventions of cinema are just as arbitrary as those of opera. Both those observations are brought to mind by video games, which are full, overfull, of exactly that kind of arbitrary convention. Many of these conventions make the game more difficult. Gaming is a much more resistant, frustrating medium than its cultural competitors. Older media have largely abandoned the idea that difficulty is a virtue; if I had to name one high-cultural notion that had died in my adult lifetime, it would be the idea that difficulty is artistically desirable. …

They have a tightly designed structure in which the player has to earn points to win specific rewards, on the way to completing levels which earn him the right to play on other levels, earn more points to win other rewards, and so on, all of it repetitive, quantified and structured. The trouble with these games – the majority of them – isn’t that they are maladapted to the real world, it’s that they’re all too well adapted. The people who play them move from an education, much of it spent in front of a computer screen, full of competitive, repetitive, quantifiable, measured progress towards goals determined by others, to a work life, much of it spent in front of a computer screen, full of competitive, repetitive, quantifiable, measured progress towards goals determined by others, and for recreation sit in front of a computer screen and play games full of competitive, repetitive, quantifiable, measured progress towards goals determined by others. Most video games aren’t nearly irresponsible enough.

There is a strong sense in Wright’s work that the most interesting thing about his games is what is done with them by the user; that the user’s experiences and reactions and creativity are the most important thing about the game. …

The other way in which games might converge on art is through the beauty and detail of their imagined worlds, combined with the freedom they give the player to wander around in them. Already quite a few games offer what’s known as ‘sandbox’ potential, to allow the player to ignore specific missions and tasks and just to roam around. …

UNESCO: Intangible Cultural Heritage

If you were to ask what cultural practices are incompatible with information technology you would come up with something like UNESCO’s Intangible Cultural Heritage. ICH is the culture that isn’t material like books, paintings, sculpture and buildings. It is the folk practices and oral traditions. ICH is defined in Article 2 of the Convention for the Safeguarding of the Intangible Cultural Heritage (Paris, 17 October 2003),

For the purposes of this Convention,

1. The “intangible cultural heritage” means the practices, representations, expressions, knowledge, skills – as well as the instruments, objects, artefacts and cultural spaces associated therewith – that communities, groups and, in some cases, individuals recognize as part of their cultural heritage. This intangible cultural heritage, transmitted from generation to generation, is constantly recreated by communities and groups in response to their environment, their interaction with nature and their history, and provides them with a sense of identity and continuity, thus promoting respect for cultural diversity and human creativity. For the purposes of this Convention, consideration will be given solely to such intangible cultural heritage as is compatible with existing international human rights instruments, as well as with the requirements of mutual respect among communities, groups and individuals, and of sustainable development.

2. The “intangible cultural heritage”, as defined in paragraph 1 above, is manifested inter alia in the following domains:

(a) oral traditions and expressions, including language as a vehicle of the intangible cultural heritage;

(b) performing arts;

(c) social practices, rituals and festive events;

(d) knowledge and practices concerning nature and the universe;

(e) traditional craftsmanship.

The history of this convention is rooted in finding ways to preserve heritage that, not being material, can’t be preserved through physical preservation or representation. It is therefore concerned with preserving that which resists technologies of information.

Picture of the Tenores di Bitti

I came across this on the site of Tenores di Bitti “Mialinu Pira”, a voice group singing in the pastoral oral tradition of Sardinia that has been added to the Intangible Heritage list (as of 2008). As the UNESCO site puts it,

Canto a tenore has developed within the pastoral culture of Sardinia. It represents a form of polyphonic singing performed by a group of four men using four different voices called bassu, contra, boche and mesu boche. One of its characteristics is the deep and guttural timbre of the bassu and contra voices. It is performed standing in a close circle. The solo singers chants a piece of prose or a poem while the other voices form an accompanying chorus.

What is interesting is that this group is named after an Italian anthropologist, Michelangelo “Mialinu” Pira whose best known book, La rivolta dell’oggetto: antropologia della Sardegna (The revolt of the object: an anthropology of Sardinia) is partly about the effects of technology on pastoral culture. (The book is online.)

We will know the digital culture partly by what it is not, and UNESCO’s Intangible Cultural Heritage list is a bureaucratic process for defining that which is oral, practiced, and local.

Tupi or not Tupi: the Cannibal Manifesto

At a Global Dialogue meeting Clarissa introduced us to Oswald de Andrade’s Cannibal Manifesto. This is one of those rare documents we should all read. The Manifesto Antropófago dates from 1928 and celebrates Brazilian remediation (such a stuffy word compared to “cannibalism”) of other literatures. The third line, which is in English in the original, captures the idea:

Tupi or not tupi that is the question.

The Tupi were an indigenous people of Brazil who were supposed to have ritually eaten their enemies. Not to belabor the point, but the joke eats Shakespeare and English into a modernist manifesto that simultaneously rejects Western patterns. The manifesto starts with:

Only Cannibalism unites us. Socially. Economically. Philosophically.

The unique law of the world. The disguised expression of all individualisms, all collectivisms. Of all religions. Of all peace treaties.

It could be the law of blogging that eats the web or the law of social media that eat their versions. Remediation with teeth.