Next Steps for E-Science and the Textual Humanities

D-Lib Magazine has a report on next steps for high performance computing (or as they call it in the UK, “e-science”) and the humanities, Next Steps for E-Science, the Textual Humanities and VREs. The report summarizes four presentations on what is next. Some quotes and reactions,

The crucial point they made was that digital libraries are far more than simple digital surrogates of existing conventional libraries. They are, or at least have the potential to be, complex Virtual Research Environments (VREs), but researchers and e-infrastructure providers in the humanities lack the resources to realize this full potential.

I would call this the cyberinfrastructure step, but I’m not sure it will be libraries that lead. Nor am I sure about the “virtual” in research environments. Space matters and real space is so much more high-bandwidth than the virtual. In fact, subsequent papers made something like this point about the shape of the environment to come.

Loretta Auvil from the NCSA is summarized to the effect that the Software Environment for the Advancement of Scholarly Research (SEASR) is,

API-driven approach enables analyses run by text mining tools, such as NoraVis (http://www.noraproject.org/description.php) and Featurelens (http://www.cs.umd.edu/hcil/textvis/featurelens/) to be published to web services. This is critical: a VRE that is based on digital library infrastructure will have to include not just text, but software tools that allow users to analyse, retrieve (elements of) and search those texts in ever more sophisticated ways. This requires formal, documented and sharable workflows, and mirrors needs identified in the hard science communities, which are being met by initiatives such as the myExperiment project (http://www.myexperiment.org). A key priority of this project is to implement formal, yet sharable, workflows across different research domains.

While I agree, of course, on the need for tools, I’m not sure it follows that this “requires” us to be able to share workflows. Our data from TAPoR is that the simple environment, TAPoRware, is being used most, not the portal, though simple tools may be a way in to VREs. I’m guessing that the idea of workflows is more of a hypothesis about what will enable the rapid development of domain-specific research utilities (where a utility does a task of the domain, while a tool does something more primitive). Workflows could turn out to be perceived as domain-specific composite tools rather than flows, just as most “primitive” tools have some flow within them. What may happen is that libraries and centres hire programmers to develop workflows for particular teams, in consultation with researchers, for specific resources, and this is the promise of SEASR. When it crosses the Rubicon of reality it will give support units a powerful way to rapidly deploy sophisticated research environments. But if it is programmers who do this, will they want a flow-model application development environment, or will they default back to something familiar like Java? (What is the research on the success of visual programming environments?)
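
To make the report’s “tools published to web services” idea a bit more concrete, here is a minimal sketch of what calling such a text-analysis service from a browser-based VRE might look like. The endpoint, parameters and response format are my own placeholders, not SEASR’s or NoraVis’s actual interfaces.

    // Hedged sketch: calling a hypothetical text-analysis web service.
    // The endpoint '/services/list-words' and its JSON response shape are
    // invented for illustration; they are not any project's real API.
    function listWords(text, callback) {
      var xhr = new XMLHttpRequest();
      xhr.open('POST', '/services/list-words', true);
      xhr.setRequestHeader('Content-Type', 'text/plain');
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          // Hypothetical response: [{"word": "whale", "count": 120}, ...]
          callback(JSON.parse(xhr.responseText));
        }
      };
      xhr.send(text);
    }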

Bontcheva is reported as presenting the General Architecture for Text Engineering (GATE).

A key theme of the workshop was the well documented need researchers have to be able to annotate the texts upon which they are working: this is crucial to the research process. The Semantic Annotation Factory Environment (SAFE) by GATE will help annotators, language engineers and curators to deal with the (often tedious) work of SA, as it adds information extraction tools and other means to the annotation environment that make at least parts of the annotation process work automatically. This is known as a ‘factory’, as it will not completely substitute the manual annotation process, but rather complement it with the work of robots that help with the information extraction.

The alternative to the tool model of what humanists need is the annotation environment. John Bradley has been pursuing a version of this with Pliny. It is premised on the view that humanists want to closely mark up, annotate, and manipulate smaller collections of texts as they read. Tools have a place, but within a reading environment. GATE is doing something a little different – they are trying to semi-automate linguistic annotation – but their tools could be used in a more exploratory environment.

What I like about this report is that it shows three complementary and achievable visions of the next steps in digital humanities:

  • The development of cyberinfrastructure building on the library, but also digital humanities centres.
  • The development of application development frameworks that can create domain-specific interfaces for research that takes advantage of large-scale resources.
  • The development of reading and annotation tools that work with and enhance electronic texts.

I think there is a fourth agenda item we need to consider: how we will enable reflection on, and preservation of, the work of the last 40 years. Willard McCarty has asked how we will write the history of humanities computing, and I don’t think he means a list of people and dates. I think he means how we will develop from a start-up and unreflective culture to one that tries to understand itself in change. That means we need to start documenting and preserving what Julia Flanders has called the craft projects of the first generations, which prepared the way for these large-scale visions.

Toy Chest (Online or Downloadable Tools for Building Projects)

Alan Liu and others have set up a Knowledge Base for the Department of English at UCSB, which includes a neat Toy Chest (Online or Downloadable Tools for Building Projects) for students. The idea is to collect free or very cheap tools that students can use, and they have done a nice job documenting them.

The idea of a departmental knowledge base is also a good one. I assume the idea is that this can be an informal place for the public knowledge that faculty, staff and students gather.

Extjs

JavaScript frameworks have been with us for a while now, and anyone developing standards-based web interfaces has probably learned to love one or another of them. Beyond abstracting common chores like JavaScript native object prototype improvements, DOM tinkering, Ajax object management and so on (which, I daresay, is pretty darn appealing on its own), they shield the developer from the maddening caprice of cross-platform/cross-browser compatibility issues.
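
As a rough illustration of the kind of chore these libraries hide, here is the hand-rolled, cross-browser Ajax boilerplate a developer would otherwise write (the URL is just a placeholder). A framework typically collapses all of this into a single call along the lines of request(url, callback).

    // Without a framework: create an XHR object that works in standards
    // browsers and in older IE, then wire up the response handling by hand.
    function getFragment(url, callback) {
      var xhr;
      if (window.XMLHttpRequest) {                 // Firefox, Safari, Opera, IE7+
        xhr = new XMLHttpRequest();
      } else {                                     // IE6 and earlier
        xhr = new ActiveXObject('Microsoft.XMLHTTP');
      }
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          callback(xhr.responseText);
        }
      };
      xhr.open('GET', url, true);
      xhr.send(null);
    }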

My favourite from the growing list of libraries has recently been mootools; I like it for its consistency, simplicity, and style. Lately, however, I have been checking out Extjs. Created by Jack Slocum, Extjs was originally an extension to the Yahoo UI, but has been a stand-alone library since version 1.1. Ext 2.0 was recently released.

While Ext does pretty much everything the other libraries do, a bit of poking around reveals an astonishing wealth of features. Ext is designed as an application development library, whereas most of its competitors are better described as utility libraries. Though Ext features a bunch of impressive application management classes, like internal namespace management and garbage collection, as well as a vast range of function, object, and DOM extension classes, what draws most developers to Ext is its collection of exquisite controls, the most popular of which are probably Ext’s beautiful data grids.

Ext panels, tree and grid

Ext’s grids, trees, form panels and window layout panels all have themeable styles included, so they look great out of the box. The control classes also feature powerful configuration parameters, like XHR URL fields (where applicable), data store record object references, data field formatting and so on.
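
As a sketch of what configuring one of those grids looks like in Ext 2.x (as I read the API docs), with the sample data, field names and the ‘grid-div’ element being my own placeholders:

    Ext.onReady(function () {
      // A small in-memory store; in a real application this would more
      // likely be a store with an HttpProxy pointed at an XHR URL.
      var store = new Ext.data.SimpleStore({
        fields: ['author', 'title'],
        data: [
          ['Willard McCarty', 'Humanities Computing'],
          ['Jerome McGann', 'Radiant Textuality']
        ]
      });

      // A themed, sortable grid bound to that store.
      new Ext.grid.GridPanel({
        store: store,
        columns: [
          { header: 'Author', dataIndex: 'author', sortable: true, width: 160 },
          { header: 'Title', dataIndex: 'title', sortable: true, width: 200 }
        ],
        renderTo: 'grid-div',   // an empty <div id="grid-div"> in the page
        width: 400,
        height: 150,
        title: 'Sample grid'
      });
    });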

For casual developers, getting past “Hello world” with Ext is intimidating, and it takes some persistence to get comfortable with the library, but the payoff is a serious arsenal of high-performance development tools for producing powerful, stable, good-looking web applications. The Extjs site has numerous tutorials and excellent API documentation. Check out Jack’s description of building an Ext app using Aptana, Adobe AIR and Google Gears.
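
For what it’s worth, the first step itself is small; something like this (a sketch of the Ext 2.x idiom as I understand it) is enough to confirm the library is wired up before tackling the bigger controls:

    // Minimal Ext "Hello world": wait for the DOM, then show a message box.
    Ext.onReady(function () {
      Ext.MessageBox.alert('Hello', 'Ext is loaded and ready.');
    });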

Tech and the Humanities: The MLA at Chicago

Right after Christmas I was involved in two events at the MLA. I organized and presided over a session on Open Digital Communities which was nicely written up by the Chronicle of Higher Education, Tech and the Humanities: A Report from the Front Lines – Chronicle.com.

I also participated in a newish format for the MLA – what is now being called a digital roundtable – on Textual Visualization, organized by Maureen Jameson, where I showed visualization tools available through TAPoRware and the TAPoR portal.

Spreading the load – volunteer computing

Martin Mueller and James Chartrand both pointed me to an article in the Economist on volunteer computing, Spreading the load. The article nicely covers a number of projects that enlist volunteers over the web, like those I noted in Tagging Games. They don’t really distinguish the projects like BOINC that enlist volunteer processing from the ones like BOSSA (and the Mechanical Turk) that enlist volunteer human contributions, and perhaps there isn’t such a difference. It is always a human volunteering some combination of their time and computing to a larger project.

What Martin has suggested is that we think about how humanities computing projects might be enabled by distributed skill support. Could we enlist volunteer taggers for electronic texts with the right set-up? Would we need to make it a game like ESP to check tagging choices against each other? The only example I can think of in the humanities is the Suda On Line (SOL), a project where volunteers are translating the “Byzantine encyclopedia known as the Suda, a 10th century CE compilation of material on ancient literature, history, and biography” (from the SOL About page). Can that infrastructure be generalized into a translation and enrichment engine for language, literature, history and philosophy?

The End of the Netscape Era

Stories like this one from CNET, Is this the end of Netscape?, are saying that AOL won’t support Netscape past February. See BBC News and Tom Drapeau’s blog entry announcing this.

Netscape Navigator was created by Marc Andreessen (after he co-authored Mosaic at the NCSA) and released in 1994. Netscape’s going public in 1995 marks the beginning of the dot-com bubble. AOL bought Netscape in 1998 for billions of dollars. What were they thinking?

Tagging Games

ESP Help Screen

Peter O pointed me to a new phenomenon on the web that I’ve been meaning to blog for a while: the leveraging of human players for tasks that can’t be easily automated. Perhaps the best example is the ESP Game. The online game is described in “How to Play”:

The ESP Game is a two-player game. Each time you play you are randomly paired with another player whose identity you don’t know. You can’t communicate with your partner, and the only thing you have in common with them is that you can both see the same image. The goal is to guess what your partner is typing on each image. Once you both type the same word(s), you get a new image.

The game (and its Google Image Labeler spin-off) leverages fun to get image tagging done. Remember when we thought computer image recognition would do that? Now we are using online games to make it fun for humans to do what we do best – make instant, complex judgements about the visual. If enough people play, we could make serious inroads into tagging the visual web.
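
The mechanic underneath is just an agreement check between two independent players: a word only becomes a tag once both have typed it. A toy sketch of that idea (entirely my own, not the ESP Game’s actual code):

    // Toy sketch of an ESP-style round: a word becomes a tag for the image
    // only when both partners have typed it independently.
    function EspRound(imageUrl) {
      this.imageUrl = imageUrl;
      this.guesses = { a: {}, b: {} };   // words each player has typed so far
      this.agreedTag = null;
    }

    EspRound.prototype.submitGuess = function (player, word) {
      var partner = (player === 'a') ? 'b' : 'a';
      word = word.toLowerCase();
      this.guesses[player][word] = true;
      if (this.guesses[partner][word]) {   // partner already typed the same word
        this.agreedTag = word;             // record the tag; time for a new image
      }
      return this.agreedTag;
    };

    // Usage: round.submitGuess('a', 'dog') -> null; round.submitGuess('b', 'dog') -> 'dog'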

What is impressive about ESP is what a simple and powerful idea it is, and this is Luis von Ahn’s second sweet contribution, the first being CAPTCHA and reCAPTCHA.

While it isn’t quite as clean, a generalized version of the idea of people power is Amazon’s Mechanical Turk. The idea is that people can,

Complete simple tasks that people do better than computers. And, get paid for it.

Choose from thousands of tasks, control when you work, and decide how much you earn.

Developers can register tasks, people can work on HITs (Human Intelligence Tasks) and get paid for the work, and Amazon can become the largest labour market for small tasks.

netzspannung.org | Archive | Archive Interfaces

Image of Semantic Map

netzspannung.org is a German new media group with an archive of “media art, projects from IT research, and lectures on media theory as well as on aesthetics and art history.” They have a number of interfaces to this archive; for an explanation see Archive Interfaces. The most interesting is the Java Semantic Map (see picture above).

netzspannung.org is an Internet platform for artistic production, media projects, and intermedia research. As an interface between media art, media technology and society, it functions as an information pool for artists, designers, computer scientists and cultural scientists. Headed by Monika Fleischmann and Wolfgang Strauss, at the MARS Exploratory Media Lab, interdisciplinary teams of architects, artists, designers, computer scientists, art and media scientists are developing and producing tools and interfaces, artistic projects and events at the interface between art and research. All developments and productions are realised in the context of national and international projects.

See The Semantic Map Interface for more on their Java Web Start archive browser.
