Replication as a way of knowing in the digital humanities

Poster for Replication Talk

At the end of April I gave a talk at the University of Würzburg on Replication as a way of knowing in the digital humanities. The talk was sponsored by Dr. Fotis Jannidis, who holds the Chair of computer philology and modern German literature there. He and others have built a digital humanities program and an interesting research agenda around text mining and German literature. The talk tried out some new ideas Stéfan Sinclair and I are working on. The abstract read:

Much new knowledge in the digital humanities comes from the practices of encoding and programming not through discourse. These practices can be considered forms of modelling in the active sense of making by modelling or, as I like to call them, practices of thinking-through. Alas, these practices and the associated ways of knowing are not captured or communicated very well through the usual academic forms of publication which come out of discursive knowledge traditions. In this talk I will argue for “replication” as a way of thinking-through the making of code. I will give examples and conclude by arguing that such thinking-through replication is critical to the digital literacy needed in the age of big data and algorithms.

The Rise and Fall of Tool-Related Topics in CHum

Tool Network Image
Tool network with COCOA selected

I just found out that a paper we gave in 2014 has now been published. See The Rise and Fall of Tool-Related Topics in CHum. Here is the abstract:

What can we learn from the discourse around text tools? More than might be expected. The development of text analysis tools has been a feature of computing in the humanities since IBM supported Father Busa’s production of the Index Thomisticus (Tasman 1957). Despite the importance of tools in the digital humanities (DH), few have looked at the discourse around tool development to understand how the research agenda changed over the years. Recognizing the need for such an investigation a corpus of articles from the entire run of Computers and the Humanities (CHum) was analyzed using both distant and close reading techniques. By analyzing this corpus using traditional category assignments alongside topic modelling and statistical analysis we are able to gain insight into how the digital humanities shaped itself and grew as a discipline in what can be considered its “middle years,” from when the field professionalized (through the development of journals like CHum) to when it changed its name to “digital humanities.” The initial results (Simpson et al. 2013a; Simpson et al. 2013b), are at once informative and surprising, showing evidence of the maturation of the discipline and hinting at moments of change in editorial policy and the rise of the Internet as a new forum for delivering tools and information about them.

Literature Measured

I finally got around to reading the latest of the Pamphlets of the Stanford Literary Lab. This pamphlet, 12. Literature Measured (PDF), written by Franco Moretti, is a reflection on the Lab’s research practices and why they chose to publish pamphlets. It is apparently the introduction to a French edition of the pamphlets. The pamphlet makes some important points about their work and the digital humanities in general.

Images come first, in our pamphlets, because – by visualizing empirical findings – they constitute the specific object of study of computational criticism; they are our “text”; the counterpart to what a well-defined excerpt is to close reading. (p. 3)

I take this to mean that the image shows the empirical findings or the model drawn from the data. That model is studied through the visualization. The visualization is not an illustration or supplement.

By frustrating our expectations, failed experiments “estrange” our natural habits of thought, offering us a chance to transform them. (p. 4)

The pamphlet has a good section on failure and how it is not just a rhetorical ploy, but important to research. I would add that only certain types of failure are productive in this way. There are dumb failures too. He then moves on to the question of successes in the digital humanities and ends with an interesting reflection on how the digital humanities and Marxist criticism don’t seem to have much to do with each other.

But he (Bourdieu) also stands for something less obvious, and rather perplexing: the near-absence from digital humanities, and from our own work as well, of that other sociological approach that is Marxist criticism (Raymond Williams, in “A Quantitative Literary History”, being the lone exception). This disjunction – perfectly mutual, as the indifference of Marxist criticism is only shaken by its occasional salvo against digital humanities as an accessory to the corporate attack on the university – is puzzling, considering the vast social horizon which digital archives could open to historical materialism, and the critical depth which the latter could inject into the “programming imagination”. It’s a strange state of affairs; and it’s not clear what, if anything, may eventually change it. For now, let’s just acknowledge that this is how things stand; and that – for the present writer – something needs to be done. It would be nice if, one day, big data could lead us back to big questions. (p. 7)

The Index Thomisticus as Project

This is a story from early in the technological revolution, when the application was out searching for the hardware, from a time before the Internet, a time before the PC, before the chip, before the mainframe. From a time even before programming itself. (Winter 1999, 3)


Father Busa is rightly honoured as one of the first humanists to use computing for a humanities research task. He is considered the founder of humanities computing for his innovative application of information technology and for the considerable influence of his project and methods, not to mention his generosity to others. He not only worked out how to use the information technology of the late 1940s and 1950s, but also pioneered a relationship with IBM around language engineering and, with their support, shared his knowledge widely. Ironically, while we have all heard his name and the origin story of his research into presence in Aquinas, we know relatively little about what actually occupied his time – the planning and implementation of what was for its time one of the major research computing projects, the Index Thomisticus.

This blog essay is an attempt to outline some of the features of the Index Thomisticus as a large-scale information technology project as a way of opening a discussion on the historiography of computing in the humanities. This essay follows from a two-day visit to the Busa Archives at the Università Cattolica del Sacro Cuore. This visit was made possible by Marco Carlo Passarotti who directs the “Index Thomisticus” Treebank project in CIRCSE (Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione – Interdisciplinary Centre for Research into the Computerization of Expressive Signs) which evolved out of GIRCSE (Gruppo not Centro – or Group not Centre), the group that Father Busa helped form in the 1980s. Passarotti not only introduced me to the archives, he also helped correct this blog as he is himself an archive of stories and details. He grew up in Gallarate, where his family knew Busa; he studied under Busa, took over the project, and is one of the few who can read Busa’s handwriting.


Original GIRCSE Plaque kept by Passarotti


The Digital Humanist


On Thursday I was part of a conference here in Verona (see my conference notes) that celebrated the seminar I led at the University of Verona and the English publication of The Digital Humanist by Domenico Fiormonte, Francesca Tomasi, and Teresa Numerico (with a Preface by me). The book is an updated adaptation and translation of their 2010 Italian book. Here is the edited text of my presentation. (Thanks to Domenico for helping me with the Italian!)

Dear Friends and Colleagues,

Today we are here to celebrate the end of a laboratory on digital humanities and a beginning with the publication of The Digital Humanist: A Critical Inquiry by Domenico Fiormonte, Teresa Numerico and Francesca Tomasi.

Today we celebrate the end of this laboratory that we created together and the publication in English of the book L’umanista digitale, which was first published in Italian in 2010 and then updated and translated into English by Desmond Schmidt and Christopher Ferguson.

The English publication of this book is important to the book because part of what makes it “A Critical Inquiry” is that it questions the universality of English. I use the word universality in two senses, both of which are to be questioned:

First, that there is an assumption that we need a universal language or metalanguage – a dream of philosophers, a dream that can be said to have led to the idea of a universal machine or computer,

And second, I use the word universal for the way in which English invades computing, from search engines to programming languages, as we heard today in the students’ presentations.

The philosopher of science and technology Langdon Winner wrote a fine essay titled “Do Artifacts Have Politics?” In this article Winner tries to navigate between two opposing positions – that of technological determinism, which holds that every message is determined by the technology –

And, he argues that neither can technologies be said to be neutral – the argument of so many technologists that relieves them of the need to take responsibility for what they develop.

Instead Winner argues that we have to attend to the artefacts themselves – some bring baggage or structure experience and some less so.

One of the great contributions of this book is just such a critical attending to the digital artefacts themselves – especially those like search engines or electronic texts that are important to us in the humanities.

This book, instead of talking about computing in general, talks about the technologies we use as humanists and helps us understand the importance of our work – indeed I would say that it helps us understand how we must take responsibility for our technologies.

As Heidegger and others point out, sometimes the hardest thing to do is to notice technologies that we use every day like the glasses on the end of our nose. We need to find ways back to noticing the systems of ready-to-hand in which we navigate our desires and dreams. That includes for Heidegger also noticing the way language itself structures our thinking.

But how can we do that? How can we attend? What practices can we draw on from the humanities?

Lev Manovich in an online essay talks about the comedy of breakdowns as an interruption that forces us to notice technology – something that was normal in Russia, but isn’t normal in the West.

Siegfried Zielinski, in Deep Time of the Media, proposes an archaeology that pays attention to the failed technologies – the branches that have been left out of the origin myths.

This book provides, I think, three other, uniquely humanities ways into thinking again about technology:

First, it is written from the margins – at least the linguistic margins of an Anglophone discourse of technology (and digital humanities). It was first written in Italian and draws on an Italian humanities computing tradition. The book reminds us to pay attention to language, so important to the humanities and technology too.

Second, it historicizes the technologies we take for granted – looking, for example, at key figures who imagined our cybernetic future.

Third, this book not only looks critically at artefacts and systems, but also looks at the ways in which we organize academic discourse about humanities computing – I would say that it treats the digital humanities as a human artefact that must also be critiqued, especially because we are blind to the ways in which the organization of the discipline follows Anglo-Saxon culture. The Digital Humanist asks us to critique how we are, and how we could be, digital humanists. This is a question of ethos – how we live with technology, how we organize ourselves to pay attention to technology.

That is why I recommend this book especially to you doctoral students.

For those of you just discovering the digital as a subject for humanities attention I recommend this book – it is a way in for humanists.

I want to conclude with a comment on the presentation of books – if a book is like a newborn – a natio as Vico spoke of it – it also matters how the book is raised, taught and interpreted.

Remember the lesson of Frankenstein. The tragedy is not that he was made of parts, but that he was abandoned at birth. The same can be said of the digital humanities – a field made of parts.

This is the second time I have helped present this book. The first time was last week in Rome. I would say that I have now become a presenter with experience in child-rearing. May I announce the tour?

As I was just saying in Italian, this is the second time I have presented this book – and I’ve chosen to do it in two tongues – English and Italian. In this I’m drawing on a Canadian political tradition of bilingual presentations which I have always admired. Such bilingual talks weave two languages to make something that is not a universal language but is free of the particular blindness of a particular language.

My reason for switching is that if we are to avoid the universalizing tendency of technologies of thinking like language we have to habituate ourselves to travel back and forth translating and thinking across. That used to be obvious to the humanities, but we seem to have forgotten that discipline.

Crossing between languages is something that you Italians have to do of necessity – for us Anglo-Saxons it is a new experience – too often we wait for the other to come to us instead of meeting halfway.

In the meantime, The Digital Humanist is an important attempt that crosses Italian and English to invite us all into dialogue.


Edoardo Ferrarini on the Digital Humanities in Italy

Edoardo Ferrarini gave a talk yesterday on “Lo statuto disciplinare dell’Informatica umanistica” or “The Status of Humanities Informatics” (with a possible pun on status/statute). Ferrarini works in the area of Latin Literature of the Middle Ages and Humanism at the University of Verona. The talk was interesting and important in three ways:

  • First, he gave an Italian history of humanities computing which looked both at what happened (and is happening) in Italy and at what could happen given the current regulations around programs. The second part I didn’t quite follow as it assumed a knowledge of the statutes that govern the academy here, but my sense was that they are constrained by national definitions of what is allowed. In particular they are dealing with a changing, but rigid definition of what is allowed in the way of programs.
  • Second, he provided a definition of Humanities Informatics (IU) that drew on a long Italian tradition that we (in the English-speaking world) are largely ignorant of. His definition draws on definitional work of Tito Orlandi, though I’m not sure how closely. More on the definition below.
  • Third, he used this definition as a lens with which to review what IU should be and what it could be in the face of the statutes and status of the field in Italy. He argued for it being an interdisciplinary field available across humanities disciplines.

Digital Humanities Concepts 2015

TU Darmstadt MA LLC Structure

Just left a most delightful conference on Key ideas and concepts of Digital Humanities in Darmstadt, Germany. My conference notes are at Digital Humanities Concepts 2015. The conference brought together an extraordinary set of speakers who were influential in the field when I entered it: Susan Hockey, Michael Sperberg-McQueen, Nancy Ide, George Landow, Wilhelm Ott, and the list goes on. I would be hard pressed to imagine a conference I have been at better able to reflect on the history and ideas of humanities computing. The organizers Andrea Rapp, Michael Sperberg-McQueen, Sabine Bartsch and Michael Bender deserve much more praise than I was able to lavish on them.

Among all the great papers I will mention:

  • Michael Sperberg-McQueen gave a very smart and well-argued paper on descriptive markup, arguing against its dismissal as enforcing hierarchies.
  • Marco Passarotti talked about the Index Thomisticus (which he directs) and the Busa Archive. He brought some documents including some Gantt charts and early letters. I am definitely going to visit him and the archive in Milan.
  • Fotis Jannidis gave a great paper on topic modelling and its temptations. He has very interesting stuff to say about how the method has been adopted by humanists.
  • Julia Flanders gave a paper on “Looking for Gender in the History of DH” that when published will, I predict, become mandatory reading. She gives us a way forward after what happened at DH 2015. It was a truly wise and humble talk that could go a long way to providing an inclusive way forward.
  • Nancy Ide gave a great overview of the separate trajectories taken by DH and Corpus Linguistics.
  • Peter Robinson gave a call for open editions and walked us through what that might mean.

Given the speakers, there was a lot of reflection on the history of humanities computing and disciplinarity, though enframed by a German context. TU Darmstadt has an MA in Linguistic and Literary Computing (see image of the structure of the degree above) and is now developing an undergrad degree.

What Ever Happened to Project Bamboo?

What Ever Happened to Project Bamboo? by Quinn Dombrowski is one of the few honest discussions about the end of a project. I’ve been meaning to blog this essay, which follows on her important conference paper at DH 2013 in Nebraska (see my conference report, which comments on her paper). The issue of how projects fail or end rarely gets dealt with and Dombrowski deserves credit for having the courage to document the end of a project that promised so much.

I blog about this now as I just finished a day-long meeting of the Leadership Council for Digital Infrastructure where we discussed a submission to Industry Canada that calls for coordinated digital research infrastructure. While the situation is different, we need to learn from projects like Bamboo when we imagine massive investment in research infrastructure. We all know it is important, but doing it right is not as easy as it sounds.

Which brings me back to failure. There are three types of failure:

  • The simple type we are happy to talk about where you ran an experiment based on a hypothesis and didn’t get positive results. This type is based on a simplistic model of the scientific process which pretends to value negative results as much as positive ones. We all know the reality is not that simple and, for that matter, that the science model doesn’t really apply to the humanities.
  • The messy type where you don’t know why you failed or what exactly failed. This is the type where you promised something in a research or infrastructure proposal and didn’t deliver. This type is harder to report because it reflects badly on you. It is an admission that you were confused or oversold your project.
  • The third and bitter type is the project that succeeds on its own terms, but is surpassed by the disciplines. It is when you find your research isn’t current any longer and no one is interested in your results. It is when you find yourself ideologically stranded doing something that someone important has declared critically flawed. It is a failure of assumptions, or theory, or positioning and no one wants to hear about this failure, they just want to avoid it.

When people like Willard McCarty and John Unsworth call for a discussion of failure in the digital humanities they describe the first type, but often mean the second. The idea is to describe a form of failure reporting similar to negative results – or to encourage people to describe their failure as simply negative results. What we need, however, is honest description of the second and third types of failure, because those are expensive. To pretend some expensive project that slowly disappeared in misunderstanding was simply an experiment is missing what was at stake. This is doubly true of infrastructure because infrastructure is not supposed to be experimental. No one pays for roads and their maintenance as an experiment to see if people will take the road. You should be sure the road is needed before building.

Instead, I think we need to value research into infrastructure as something independent of the project owners. We need to do in Canada what the NSF did – bring together research on the history and theory of infrastructure.