Tasman: Literary Data Processing

I came across a 1957 article by an IBM scientist, P. Tasman, on the methods used in Roberto Busa’s Index Thomisticus project. It is titled Literary Data Processing (IBM Journal of Research and Development, 1(3): 249-256) and appears in the third issue of the journal. It includes an illustration of how they used punch cards for this project.

Image of Punch Card

The reproduction is poor, but you can still read the fields encoded on the card for each word:

  • Location in text
  • Special reference mark
  • Word
  • Number of word in text
  • First letter of preceding word
  • First letter of following word
  • Form card number
  • Entry card number
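To make the card layout concrete, here is a minimal Python sketch that punches one "card" per word with the fields listed above and then sorts the deck into a simple concordance. This is my own illustration, not Tasman's procedure: the location format, the tokenizing, and the sort order are assumptions, and the form and entry card numbers are left as empty placeholders because the article's reproduction doesn't make their use clear to me.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordCard:
    location: str            # location in text (format such as "1.1" is assumed)
    reference_mark: str      # special reference mark (left blank in this sketch)
    word: str                # the word itself, lower-cased here
    word_number: int         # sequential number of the word in the text
    preceding_initial: str   # first letter of preceding word ("" at the start)
    following_initial: str   # first letter of following word ("" at the end)
    form_card: Optional[int] = None   # form card number (placeholder, not modelled)
    entry_card: Optional[int] = None  # entry card number (placeholder, not modelled)


def punch_cards(lines: List[Tuple[str, str]]) -> List[WordCard]:
    """Make one card per word from (location, text) pairs."""
    tokens = []
    for loc, text in lines:
        for raw in text.split():
            word = raw.strip(".,;:").lower()  # crude tokenizing, an assumption
            if word:
                tokens.append((loc, word))
    cards = []
    for i, (loc, word) in enumerate(tokens):
        prev = tokens[i - 1][1][0] if i > 0 else ""
        nxt = tokens[i + 1][1][0] if i + 1 < len(tokens) else ""
        cards.append(WordCard(loc, "", word, i + 1, prev, nxt))
    return cards


def concordance(cards: List[WordCard]) -> List[WordCard]:
    """Sort the deck by word, then by position in the text."""
    return sorted(cards, key=lambda c: (c.word, c.word_number))


if __name__ == "__main__":
    deck = punch_cards([("1.1", "Deus est ens."), ("1.2", "Ens est verum.")])
    for card in concordance(deck):
        print(card.word, card.location, card.preceding_initial, card.following_initial)
```

Sorting on the word and then the word number is roughly what a mechanical card sorter would do column by column; all occurrences of a word end up together, in text order, with their locations ready to print.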

At the end Tasman speculates on how the methods developed for the project could be used in other areas:

Apart from literary analysis, it appears that other areas of documentation such as legal, chemical, medical, scientific, and engineering information are now susceptible to the methods evolved. It is evident, of course, that the transcription of the documents in these other fields necessitates special sets of ground rules and codes in order to provide for information retrieval, and the results will depend entirely upon the degree and refinement of coding and the variety of cross referencing desired.

The indexing and coding techniques developed by this method offer a comparatively fast method of literature searching, and it appears that the machine-searching application may initiate a new era of language engineering. It should certainly lead to improved and more sophisticated techniques for use in libraries, chemical documentation, and abstract preparation, as well as in literary analysis.

Busa’s project may have been more than just the first humanities computing project. It seems to be one of the first projects to use computers to handle textual information, and one that showed the possibilities for searching any sort of literature. I should note that the next issue of the journal has an article by H. P. Luhn (developer of KWIC indexing) on A Statistical Approach to Mechanized Encoding and Searching of Literary Information (IBM Journal of Research and Development 1(4): 309-317). Luhn specifically mentions the Tasman article and the concording methods developed on the project as useful for the larger statistical text mining that he proposes. For IBM researchers, Busa’s project was an important first experiment in handling unstructured text.

I learned about the Tasman article in a journal paper deposited by Thomas Nelson Winter on Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance. The paper gives an excellent account of Busa’s project and its significance to concording. Well worth the read!

Digital Humanities Talks at the 2013 MLA Convention

The ACH has put together a useful Guide to Digital-Humanities Talks at the 2013 MLA Convention. I will be presenting at various events, including:

Short Guide To Evaluation Of Digital Work

The Journal of Digital Humanities has republished my Short Guide to Evaluation of Digital Work as part of an issue on Closing the Evaluation Gap (Vol. 1, No. 4). I first wrote the piece for my wiki, and you can find the old version here. It is far more useful bundled with the other articles in this issue of JDH.

The JDH is a welcome experiment in peer-reviewed republication. One thing they do is select content that has been published in other forms (blogs, online essays and so on) and then edit it for recombination in a thematic issue. The JDH builds on the neat Digital Humanities Now, which showcases interesting work from the web. Both are projects of the Roy Rosenzweig Center for History and New Media. The CHNM deserves credit for thinking through what we can do with the openness of the web.

Conference Report of DH 2012

I’m at Digital Humanities 2012 in Hamburg and am writing a conference report on philosophi.ca. The conference started with a keynote by Claudine Moulin that touched on research infrastructure. Moulin was the lead author of the European Science Foundation report on Research Infrastructure in the Humanities (link to my entry on this). She talked about the need for a cultural history of research infrastructure (which the report actually provides). The humanities should not just import ideas and stories about infrastructure. We should use this infrastructure turn to help us understand the types of infrastructure we already have; we should think about the place of infrastructure in the humanities as humanists.

Pundit: A novel semantic web annotation tool

Susan pointed me to Pundit: A novel semantic web annotation tool. Pundit (which has a great domain name, “thepund.it”) is an annotation tool that lets people create and share annotations on web materials. The annotations are triples that can be saved and linked to resources like DBpedia. I’m not sure I entirely understand how it works, but the demo is impressive. It could be the killer app of semantic web technologies for the digital humanities.

Digital Infrastructure Summit 2012

A couple of weeks ago I gave a talk at Digital Infrastructure Summit 2012, which was hosted by the Canadian University Council of Chief Information Officers (CUCCIO). This short conference was very different from any other I’ve been at. CUCCIO, by its nature, is a group of people (university CIOs) who are used to getting things done. They seemed committed to defining a common research infrastructure for Canadian universities and to prototyping it. It seemed that all the right people were there to start moving in the same direction.

For this talk I prepared a set of questions for auditing whether a university has good support for digital research in the humanities. See Check IT Out! The idea is that anyone from a researcher to an administrator can use these questions to check out the IT support for humanists.

My conference notes are here.

Globalization Compendium Archive

I have been working for a while on archiving the Globalization Compendium, a project I worked on. Yesterday I got it archived in two institutional repositories:

In both cases there is a ZIP of a BagIt bag with the XML files, code and other documentation from the site. My first major deposit.
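For anyone who hasn't seen one, here is a minimal Python sketch of what goes into a BagIt bag, following the BagIt specification (RFC 8493): the files go in a data/ directory, a bagit.txt declaration sits at the top, and a manifest lists a checksum for every payload file. This is not necessarily the tool used for these deposits; the function names make_bag and zip_bag and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def make_bag(source_dir, bag_dir):
    """Copy source_dir into bag_dir/data and write the required tag files."""
    bag = Path(bag_dir)
    data = bag / "data"
    shutil.copytree(Path(source_dir), data)  # creates bag_dir as needed
    # Bag declaration required by the BagIt spec.
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )
    # Payload manifest: one "checksum  path" line per file under data/.
    lines = []
    for f in sorted(p for p in data.rglob("*") if p.is_file()):
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        lines.append(f"{digest}  {f.relative_to(bag).as_posix()}\n")
    (bag / "manifest-sha256.txt").write_text("".join(lines), encoding="utf-8")
    return bag


def zip_bag(bag_dir):
    """Zip the bag so the archive unpacks to a single top-level bag directory."""
    bag = Path(bag_dir)
    return shutil.make_archive(str(bag), "zip", root_dir=bag.parent, base_dir=bag.name)
```

The manifest is what makes the deposit checkable later: a repository (or a future me) can recompute the checksums and confirm that nothing in the bag has changed.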

Digital Humanities in Italy: Tito Orlandi

I just got a complimentary copy of La macchina nel tempo: Studi di informatica umanistica in onore di Tito Orlandi (The Time Machine: Studies in humanities computing in honour of Tito Orlandi), which I blogged about before. This got me wondering how much of Prof. Tito Orlandi’s writing is available online and what his legacy is. It turns out that Orlandi has put together a list of his publications with links to online versions where possible. There are even some in English, like the excellent Is Humanities Computing a Discipline?

But how might one summarize Orlandi’s contribution? In his prefatory “Controcanto,” one of the editors of The Time Machine, Domenico Fiormonte, writes about first encountering Orlandi in a bunker, where Fiormonte went on to spend a summer. During that summer he learned three things:

  1. Everything that in the humanities is taken for granted (starting with the concept of text) has to be formalized in informatics.
  2. The passage from analogue to digital is a process of profound redefinition for the “cultural object”.
  3. Thus, every act of encoding (or digital representation) presupposes (or forces us into) a hermeneutical act. (p. VI, my translation)

These three lessons seem about as good a starting place for the digital humanities as any. They also suggest some of what Tito Orlandi was interested in, namely formalization, redefinition, and interpretation. But surveying Orlandi’s writings, using the list of digital humanities publications on his personal site, you can see other themes. He believed that we needed to develop the theoretical foundations of humanities computing, and that we should build them on the mathematical model of the computer rather than on how computers work in practice. (See Informatica, Formalizzazione e Discipline Umanistiche, in Italian.) He believed this would help us understand how one can model culture on a computer. He also discussed the importance of modelling before Willard McCarty did in Humanities Computing, something that should be recognized out of fairness to the pioneering work of Italian digital humanists since Busa.

Reading Orlandi, and reading about him, I also sense an impatience with those who followed him. Here is what he wrote in an unpublished talk given in London in 2000, where he was responding to how other scholars were discussing the digital humanities:

I feel a sense of inadequateness, even disorder, in the overall change as presented by the same scholars. In fact, when they proceed to propose a definition of humanities computing, they tend to consider the products of computation, be they hardware (the Net) or software (applications like concordance programs or statistical packages), rather than the first principles of computing.

Orlandi wanted to ground the digital humanities in mathematics, a language common to informatics, science and potentially the digital humanities. That the digital humanities wandered off into hypertext, new media and so on seems to have annoyed him. He was also irritated that ideas he had been teaching and writing about for years were being ignored in the English-speaking world. Take a look at The Scholarly Environment of Humanities Computing: A Reaction to Willard McCarty’s talk on The computational transformation of the humanities. On this web page Orlandi discusses his outburst at a paper by McCarty that presented what he felt were ideas he had been discussing for at least a decade. It is instructive how he sets aside his pride to get at the issues that matter. He might be irritated, but he also wants to use the occasion to reflect on more important issues.

Perilli and Fiormonte have done a great job bringing together a festschrift in honour of Orlandi. The Time Machine isn’t really about Orlandi’s thought so much as about his legacy in Italy. What we need now are translations of his foundational works and a retrospective interpretation of his contributions.

Collaborative Research in the Digital Humanities by Marilyn Deegan and Willard McCarty

A new digital humanities collection focusing on collaboration, Collaborative Research in the Digital Humanities, has been published by Ashgate. The collection is edited by Marilyn Deegan and Willard McCarty and was developed in honour of Harold Short, who retired a few years ago from King’s College London, where he set up the Centre for Computing in the Humanities (now called the Department of Digital Humanities).

I contributed a chapter on crowdsourcing entitled “Crowdsourcing the humanities: social research and collaboration”.