HC Summit, Illinois: Trip Report

This weekend I was at a summit on humanities computing (HC) at the University of Illinois, organized by John Unsworth, Orville Vernon Burton, and the folks at the NCSA.
One difference between HC in the US on the one hand and HC in Canada and Europe on the other is that in the US there hasn’t been a national organization that could help organize the various centres for the purpose of presenting ideas to national funding agencies. (Perhaps the ACH once functioned that way, but now it is international.) One set of questions we discussed was the need for, the scope of, and the activities of a national (US) gathering. I also got a quick tour through a number of NCSA initiatives of interest to humanities computing. Some of them follow.

VIAS is crawler technology that can track and process information in a domain like visualization. (To try it, see VIAS – type in a word related to visualization and then look at the results. Note especially how the system can extract information from result pages.) Can we imagine crawler agents with intelligent summarization that could track issues of interest to humanists and maintain a running summary of each topic? Such personal research agents would take significant resources, but could create a new type of text – what I call JITTeR (Just In Time Text Research) – for analysis.
Tom Finholt presented the Sakai Project, a major initiative to produce a “community source” e-learning portal to replace Blackboard or WebCT. It is built on uPortal and other projects. If we believe in erasing the difference between learning and research, this is an environment worth tracking, supporting, and writing services for. Tom also presented some lessons from NEESgrid (Virtual Collaboratory for Earthquake Engineering):

  • You must have a user interface – you need an organizing metaphor
  • Developers must walk in the shoes of users
  • Users know a lot about CS – listen to them
  • The real work begins after software is delivered
  • Data models, data repositories, and long term curation are difficult socio-technical problems

Randy Butler talked about Grid Computing in a way that made sense to me. He quoted Dan Reed to the effect that “Grids are good for bringing together things that necessarily cannot be collocated.” In other words, grids need not be high-performance projects – they could be other types of gatherings of processing services. What then is the difference between a grid and a portal with distributed web services? He summarized grid services in this list:

  • Information Discovery – what where
  • Monitoring
  • Authentication and Authorization – who can do what
  • Job management
  • Scheduling
  • Data Management – where do I get data and what do I do with it
  • Collaboration

The issue for grids is not just technical but social. For me a major issue is privacy of data in an international context – can US grids assure international users that their data/use will not become meat for security data mining? Does it matter to anyone?
Stephen Downie gave a very clear picture of music information retrieval (music-ir.org, “virtual home of music information retrieval research”). Their issues parallel those of text retrieval research, and it is instructive to see how they see the domain. I wish I could summarize text representation and analysis as well. Stephen’s talk was followed by an intriguing demo of D2K being used on music streams to cluster by genre. I wish I understood Fourier transforms!
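For anyone else in the same boat, here is a minimal sketch of the basic idea, not of what the D2K demo actually does (I don't know which audio features it used). A discrete Fourier transform re-expresses a sampled signal as the strength of each frequency it contains, and frequency strengths are the kind of feature a clustering tool could compare across music streams. The function names `dft` and `dominant_bin` and the two toy "tones" are made up for illustration.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: one complex coefficient per frequency bin."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def dominant_bin(samples):
    """Index of the strongest frequency bin in the lower half of the spectrum."""
    spectrum = dft(samples)
    half = len(samples) // 2
    # Skip bin 0 (the constant offset) and the mirrored upper half.
    magnitudes = [abs(c) for c in spectrum[1:half]]
    return 1 + magnitudes.index(max(magnitudes))


# Two toy signals: 3 cycles and 10 cycles over a 64-sample window.
n = 64
low_tone = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
high_tone = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]

print(dominant_bin(low_tone), dominant_bin(high_tone))  # prints: 3 10
```

The transform recovers the number of cycles in each tone from nothing but the raw samples. Real audio work would use a fast Fourier transform library rather than this slow double loop, and a genre clusterer would presumably look at much richer features than a single peak.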
Duane Searsmith (and Loretta Auvil) presented D2K (Data 2 Knowledge) and T2K (ThemeWeaver), a project of the NCSA Automated Learning Group. This is a layered project that has developed a visual programming environment for data mining applications. Can this be adapted to mining encoded text? That’s what John Unsworth and Stephen Ramsay are trying to do.

As an aside – we discussed later the coming importance of “W” (wisdom) in computerese. The history of claims about computing can be summarized by the shift from data to information to knowledge, with wisdom next. It’s time to start trademarking wisdom IT phrases.

We got demos of Walls and a CAVE from Alan Craig. Caves seem to be a dead end as they are difficult to scale, but Walls could become accessible. (In fact we saw a Wall in the lobby of the new CS building – it wasn’t working, of course.)
Duane Gran showed the Ivanhoe Game software, which is very cool and uses a TextArc-inspired interface in a way that I think improves on the circularity of TextArc. The interface uses the inside and outside of a circle to represent the discourse field (outside the circle) and the users (weighted points inside).
John Cuadrado showed latent semantic indexing and analysis tools developed for/with NITLE. Kevin Kiernan showed the editing tools being developed within/for the ECLIPSE environment – these are the best I’ve seen. Kim Tryka talked about GIS in history at UVa. (There were other projects, but my notes are not as good as they should be.)
In short, we could see an interesting encounter between high-performance computing (data mining, grids, large portals) and humanities computing. The two communities could come together around tools, collaborative grids, and political need post dot-boom.