centerNet and Google Book Search

centerNet met with a representative from Google Book Search, Jon Orwant, about how Google could support the humanities. I believe there are four levels of collaboration.

  • Content Curation Interface: We could partner to make possible the careful cleaning and encoding of the scanned books. In most cases the quality of the OCRed text is still poor. It would be nice to have a social layer that allowed people to voluntarily sign out texts and clean them up. We could also help with the selection of editions that are scanned.
  • Collections Research Interface: Google could make it possible to build tools that let users create research collections, subsets of Google Books that can be studied as a unit. For this we need access to an API so that research portals can access collections, not just individual texts. Google will want assurance that those who have access don’t abuse it.
  • Social Research Tools Interface: We need a way to run tools against texts and collections. We need an API so that tools can be plugged in and then access texts and collections. Again there is an issue of access. Perhaps OpenSocial could become a standard for tool plug-ins.
  • Republication Interface: We need a way to be able to create study sites for research groups or courses that make some subset of texts and tools available for a specific purpose.
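To make the collections and tools interfaces above more concrete, here is a minimal sketch in Python of what a tool plug-in contract might look like. Everything in it is hypothetical: the `Text` and `Collection` types, the `Tool` signature, and the `run_tool` function are invented for illustration and do not correspond to any real Google Books API.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical types for illustration only; not a real Google Books API.
@dataclass
class Text:
    text_id: str
    title: str
    body: str


@dataclass
class Collection:
    """A named subset of texts that a researcher wants to study together."""
    name: str
    texts: List[Text] = field(default_factory=list)


# Assumed plug-in contract: a tool takes one text and returns named counts.
Tool = Callable[[Text], Dict[str, int]]


def word_frequencies(text: Text) -> Dict[str, int]:
    """A toy tool: count lowercased, whitespace-separated words."""
    return dict(Counter(text.body.lower().split()))


def run_tool(tool: Tool, collection: Collection) -> Dict[str, Dict[str, int]]:
    """Run one plugged-in tool against every text in a collection."""
    return {t.text_id: tool(t) for t in collection.texts}


if __name__ == "__main__":
    demo = Collection("demo", [Text("a", "First", "the cat and the dog")])
    print(run_tool(word_frequencies, demo))
```

The point of the sketch is only that the platform would own the texts and collections while outside developers supply the tools, which is the division of labour discussed here.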

In all these cases it is clear that Google doesn’t want to read applications, correct lots of texts, or build tools. For that matter none of us know what tools should be written. They see themselves doing smart engineering to create a platform on which others can build layers (research tools, collections portals, and so on) that still others might use.

Jon spoke to the centerNet meeting at DH 2009. The motto of Google is to organize the world’s information and make it accessible and useful. They crawl, index, and search the web. One can index and search the world’s books, but it is hard to crawl books (or newspapers or movies).

There are about 120 million works in the world and 165 million manifestations. They have an agreement in principle with the publishers that has still not been ruled on. (I think I have that right.) If it is approved in court then Google will be able to do some cool things:

  • Authors/publishers will be able to opt in or out.
  • If authors/publishers opt in then Google could sell their books if they are still under copyright. They have algorithmic pricing to figure out what to charge.
  • They could give universities access to the full text of collections of out of date works for a license.
  • They could create a terminal at every library that has every book that is out of copyright.
  • They could create a “research corpus” that could be released for experimentation under a Creative Commons license. This could be used in contests like T-REX.

Jon gave some fascinating examples of things his intern has been doing from within the firewall.

Tools for Data-Driven Scholarship » Final Report

I’ve been meaning to blog about the Final Report of the Tools for Data-Driven Scholarship Workshop. This workshop was organized by the Center for History and New Media at George Mason and the Maryland Institute for Technology in the Humanities in October of 2008. They have put up the final report with a number of sensible recommendations. The report summarizes the issues around tool development and the need for reward systems, and it discusses the idea of an “invisible college” of scholars/tool developers who would exchange ideas and support. They distilled the problems down to:

1. Tools need to work better with other tools.

2. Tools need to connect better with content and use that content in a more robust way.

3. Tools need better mechanisms for being found by the scholars who need them.  They are not currently finding their audience(s).

They acknowledge that “There may be intellectual and even practical value in reinvention – in ‘recreating the wheel.’” This is a tack we need to take seriously since tool development in the humanities has been going on since the 70s (or earlier if you count Busa’s work). Perhaps the reinvention in the humanities is like reinterpretation – a sign of life not a problem.

Prezi – The zooming presentation editor

Screenshot of Prezi

Prezi – The zooming presentation editor is a neat PowerPoint alternative. You build a presentation that is one large map, then you script a tour that zooms in and out. This avoids the problem of fragmented discourse that Tufte points out. Users can zoom in and out. Prezi is also presented as a service where you craft and store the presentations online rather than in local software.

Springer Exemplar: Search Results

Screenshot of Exemplar

Springer has an interesting tool that lets you search for a pattern and see its distribution. When you search for a term it shows you the distribution over time in the upper left, then the distribution over disciplinary categories. It also shows a KWIC (Keyword in Context) display. See Springer Exemplar: Search Results for Interactivity. The design is clean and easy to explore, but the content seems to be only recent materials in Springer journals.
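For readers who have not met these displays before, here is a minimal sketch in Python of a KWIC listing and a distribution over time. It is not how Exemplar works internally, which I don't know. The function names, the simple word tokenizer, and the `(year, text)` input format are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def kwic(text: str, keyword: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for each hit.

    Assumes a crude tokenizer: runs of word characters, matched
    case-insensitively. `width` is the number of context words per side.
    """
    tokens = re.findall(r"\w+", text)
    target = keyword.lower()
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


def distribution_by_year(docs: Iterable[Tuple[int, str]], keyword: str) -> Dict[int, int]:
    """Count, per year, how many documents contain the keyword."""
    counts: Counter = Counter()
    for year, text in docs:
        if kwic(text, keyword, width=0):
            counts[year] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = "Interactivity is key. We study interactivity in games."
    for left, word, right in kwic(sample, "interactivity", width=2):
        # Right-align the left context so the keywords line up in a column.
        print(f"{left:>20} | {word} | {right}")
```

Lining the keyword up in a central column is what makes a KWIC display easy to scan, and the per-year counts are the raw numbers behind a distribution-over-time chart.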