NiCHE: The Programming Historian

NiCHE (Network in Canadian History & Environment) has a useful wiki called The Programming Historian by William Turkel and Alan MacEachern. The wiki is a “tutorial-style introduction to programming for practicing historians”, but it could also be used by textual scholars who want to be able to program their own tools. It takes you through learning and using Python for text processing tasks like word frequencies and KWICs (keyword-in-context listings). It reminds me of Susan Hockey’s book, Snobol Programming for the Humanities (Oxford: Oxford University Press, 1985), which I loved at the time, even if I couldn’t find a Snobol interpreter for the Mac.
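
To give a flavour of the kind of text processing mentioned above, here is a minimal sketch of word frequencies and a KWIC listing in Python. It is not taken from The Programming Historian; the function names (`tokenize`, `word_frequencies`, `kwic`), the simple tokenizing rule, and the sample sentence are all my own assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of keyword.

    width is the number of tokens of context kept on each side.
    """
    tokens = tokenize(text)
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, made up for this example.
sample = "The historian reads the archive, and the archive shapes the historian."
freqs = word_frequencies(sample)
concordance = kwic(sample, "archive", width=2)

for left, word, right in concordance:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} [{word}] {right}")
```

A real tool would need a better tokenizer (hyphens, accented characters, numbers) and probably a stop-word list, but the basic shape of both techniques is visible here.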

We need more of such books/wikis.

Conference Report: Tools For Data-Driven Scholarship

I just got back from the Tools For Data-Driven Scholarship meeting organized by MITH and the Center for History and New Media. This meeting was funded by the NEH, NSF, and the IMLS and brought together tool developers, content providers (like museums and public libraries), and funders (NEH, JISC, Mellon, NSF and IMLS). The goal was to imagine initiatives that could advance humanities tool development and connect tools better with audiences. I have written a Conference Report with my notes on the meeting. One of the interesting questions asked by a funder was “What do the developers really want?” It was unclear whether developers really wanted some of the proposed solutions, like a directory of tools or a code repository. The three things the breakout group I was in came up with were:

  • Recognition, credit and rewards for tool development – mechanisms to get academic credit for tool development. This could take the form of tool review, competitions, prizes or just citation when our tool is used. In other words we want attention.
  • Long-term Funding so that tool development can be maintained. A lot of tool development takes place in grants that run out before the tool can really be tested and promoted to the community. In other words we want funding to continue tool development without constantly writing grants.
  • Methods, Recipes, and Training that are documented and that bring tools together in the context of humanities research practices. We want others with the outreach and writing skills to weave stories about the use of tools to help introduce them to others. In other words we want others to do the marketing of our tools.

A bunch of us sitting around after the meeting waiting for a plane had the usual debriefing about such meetings. What do they achieve, even if they don’t lead to initiatives? From my perspective these meetings are useful in unexpected ways:

  • You meet unexpected people and hear about tools that you didn’t know about. The social dimension is important to meetings organized by others that bring people together from different walks of life. I, for example, finally met William Turkel of Digital History Hacks.
  • Reports are generated that can be used to argue for support without quoting yourself. There should be a report from this meeting.
  • Ideas for initiatives are generated that can get started in unexpected ways. Questions emerge that you hadn’t thought of. For example, the question of audience (both for tools and for initiatives) came up over and over.

Fortune of the Day – Fortune Hunting

Lisa Young, with the support of the Brown University Scholarly Technology Group (STG), has developed Fortune of the Day – Fortune Hunting, an interactive art site based on a collection of scanned fortune cookie slips she created. It has elements of a public textuality site like the Dictionary, though it is focused completely on fortunes. The interface is simple and elegant. I believe it has been exhibited recently for the first time. The project uses the TAPoRware Visual Collocator for one of its interfaces.

University Affairs: MLA changes course on web citations

University Affairs has a story by Tim Johnson on the latest MLA Style Manual, titled “MLA changes course on web citations”, where they quote me about the new MLA recommendation that URLs aren’t needed in citations (because they aren’t reliable). I had a long discussion with Tim – being interviewed when they have talked to other people is a strange way to learn about a subject. In retrospect it would have been more useful to point out the emerging alternatives to URLs, some of which are designed to be more stable. Some that I know of:

  • TinyURL and similar projects let you get a short (“tiny”) URL that redirects to the full location. A list of such tools is at http://daverohrer.com/15-tinyurl-alternatives-shorten-your-urls/
  • The Digital Object Identifier (DOI®) System allows unique identifiers to be allocated and then has a resolution system to point each identifier to one or more locations. To quote from their Overview, a DOI “is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.”
  • The WayBack Machine grabs copies of web pages at regular intervals if allowed. You can thus see changes in the document over time.

In short, we don’t have a clear standard that has emerged, but we have alternatives that could provide us with a stable system.

I should add that the point of a citation is not what is in it, but whether it lets you easily find the referenced research so that we can recapitulate the research.
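
As a rough illustration of how two of the alternatives listed above could be turned into citable links, here is a small Python sketch. The URL patterns are my assumptions about how the public DOI resolver (`https://doi.org/`) and the Wayback Machine (`https://web.archive.org/web/<timestamp>/<url>`) are addressed; they are not part of the MLA guidance or the article. The helper names and the example values are hypothetical.

```python
from urllib.parse import quote

# Assumed public entry points; check the services' own documentation before relying on them.
DOI_RESOLVER = "https://doi.org/"
WAYBACK_PREFIX = "https://web.archive.org/web/"


def doi_to_url(doi):
    """Turn a DOI name into a link that the resolver redirects to the current location."""
    # DOIs may contain characters that are not URL-safe, so escape everything but "/".
    return DOI_RESOLVER + quote(doi, safe="/")


def wayback_url(url, timestamp):
    """Build a link to an archived copy of url near the given timestamp.

    timestamp is a string of digits such as "20081014" (year, month, day).
    """
    return f"{WAYBACK_PREFIX}{timestamp}/{url}"


# Example values chosen for illustration only.
doi_link = doi_to_url("10.1000/182")
archived_link = wayback_url("http://example.org/", "20081014")

print(doi_link)
print(archived_link)
```

The point of the sketch is the difference in what gets cited: the DOI link names the work and leaves the location to the resolver, while the archive link names a location at a particular moment in time.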

CaSTA 2008: New Directions in Text Analysis

I am at the CaSTA 2008 New Directions in Text Analysis conference at the University of Saskatchewan in Saskatoon. The opening keynote by Meg Twycross was a thorough and excellent tour through manuscript digitization and forensic analysis techniques.

My notes are in a conference report (being written as it happens).

Today is Open Access Day

Today, October 14th, 2008, is Open Access Day, which, thanks to Erika, I discovered the University of Alberta library is promoting.

The Canadian libraries supporting OAD are listed on the Open Access Day 2008 wiki. I love the U of Calgary comment, “We’re considering options but will definitely mark the day.” U of Alberta, by contrast, has a number of initiatives, including an Open Access blog and a We Support Open Access (PDF) poster.

Of particular interest is the SPARC Author’s Addendum, which is a form for authors to fill out to assert their copyright when signing agreements with publishers. It basically attaches an addendum to whatever agreement you are signing that asserts that you retain copyright and that you retain the right to reproduce the article for non-commercial purposes. It is a nice little “tool”. Now we need one like that for graduate students when they are signing the Theses Canada license. What would it assert?

University Libraries in Google Project to Offer Backup Digital Library – Chronicle.com

From Bethany I discovered this story in the Chronicle of Higher Education about the HathiTrust, titled University Libraries in Google Project to Offer Backup Digital Library (Jeffrey R. Young, Oct. 13, 2008). “Hathi” is the Hindi word for elephant, suggesting memory and size. Here is a quote from the HathiTrust site:

As a digital repository for the nation’s great research libraries, HathiTrust (pronounced hah-TEE) brings together the immense collections of partner institutions.

HathiTrust was conceived as a collaboration of the thirteen universities of the Committee on Institutional Cooperation and the University of California system to establish a repository for these universities to archive and share their digitized collections. Partnership is open to all who share this grand vision.

The repository, among other things, will pool the volumes digitized by Google in collaboration with the universities so there is a backup should Google lose interest. Large-scale search is being studied now, and they expect to have a preview version available in November.

A Companion to Digital Literary Studies

A Companion to Digital Literary Studies, edited by Ray Siemens and Susan Schreibman, is available online in full text. This is a tremendous resource with too many excellent contributions to list individually. Chapters range from Reading on the Screen by Christian Vandendorpe to Algorithmic Criticism by Stephen Ramsay.

There is a good Annotated Overview of Selected Electronic Resources by Tanya Clement and Gretchen Gueguen with links to projects like TAPoR.