Torn Apart: Nimble Digital Humanities

Torn Apart is a curation and visualization of publicly available data concerning ICE, CBP facilities, and usages. Also lists of allied and pro-immigrant facilities.

At DH 2018 I heard Roopika Risam speak about the impressive critical digital humanities Torn Apart / Separados project she is part of. (See my conference notes here.) The project is rightly getting attention. For example, Inside Higher Ed has a story on Digital Humanities for Social Good. This story presents Torn Apart / Separados as an answer to critiques that the digital humanities are not critical enough and/or lack interpretative value. (See Stanley Fish’s Stop Trying to Sell the Humanities.) The Inside Higher Ed article rightly points out that there have been socially engaged digital humanities projects for some time.

What I find impressive and think is truly important is how nimble the project is. This project was imagined and implemented in “real” time – i.e. it was developed in response to events unfolding in the news. It was also developed without a grant and by a distributed team of volunteers. That’s what computing in the humanities should be – a way to think through issues critically, not a way to get funding.

CSDH and CGSA 2018

This year we had busy CSDH and CGSA meetings at Congress 2018 in Regina. My conference notes are here. Some of the papers I was involved in include:


  • “Code Notebooks: New Tools for Digital Humanists” was presented by Kynan Ly and made the case for notebook-style programming in the digital humanities.
  • “Absorbing DiRT: Tool Discovery in the Digital Age” was presented by Kaitlyn Grant. The paper made the case for tool discovery registries and explained the merger of DiRT and TAPoR.
  • “Splendid Isolation: Big Data, Correspondence Analysis and Visualization in France” was presented by me. The paper talked about FRANTEXT and correspondence analysis in France in the 1970s and 1980s. I made the case that the French were doing big data and text mining long before we were in the Anglophone world.
  • “TATR: Using Content Analysis to Study Twitter Data” was a poster presented by Kynan Ly, Robert Budac, Jason Bradshaw and Anthony Owino. It showed IPython notebooks for analyzing Twitter data.
  • “Climate Change and Academia – Joint Panel with ESAC” was a panel I was on that focused on alternatives to flying for academics.


  • “Archiving an Untold History” was presented by Greg Whistance-Smith. He talked about our project to archive John Szczepaniak’s collection of interviews with Japanese game designers.
  • “Using Salience to Study Twitter Corpora” was presented by Robert Budac who talked about different algorithms for finding salient words in a Twitter corpus.
  • “Political Mobilization in the GG Community” was presented by ZP who talked about a study of a Twitter corpus that looked at the politics of the community.

Also, a PhD student I’m supervising, Sonja Sapach, won the CSDH-SCHN (Canadian Society for Digital Humanities) Ian Lancashire Award for Graduate Student Promise at CSDHSCHN18 at Congress. The Award “recognizes an outstanding presentation at our annual conference of original research in DH by a graduate student.” She won the award for a paper on “Tagging my Tears and Fears: Text-Mining the Autoethnography.” She is completing an interdisciplinary PhD in Sociology and Digital Humanities. Bravo Sonja!

Too Much Information and the KWIC

A paper that Stéfan Sinclair and I wrote about Peter Luhn and the Keyword-in-Context (KWIC) has just been published by the Fudan Journal of the Humanities and Social Sciences: Too Much Information and the KWIC. The paper is part of a series that replicates important innovations in text technology, in this case, the development of the KWIC by Peter Luhn at IBM. We use that as a moment to reflect on the datafication of knowledge after WW II, drawing on Lyotard.
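
For readers who have not met the KWIC, here is a minimal Python sketch of the general idea: every occurrence of a keyword is listed with a few words of context on either side, lined up so the keyword forms a column. This is only an illustration under my own assumptions, not Luhn's system and not the code behind the paper. The function names (kwic, format_kwic), the window parameter, and the sample text are all made up for the example.

```python
import re


def kwic(text, keyword, window=4):
    """Return (left context, keyword, right context) for each occurrence.

    Hypothetical helper: 'window' is the number of words of context kept
    on each side of the keyword. Matching ignores case.
    """
    # Crude tokenizer: runs of word characters, allowing one apostrophe.
    tokens = re.findall(r"\w+(?:'\w+)?", text)
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def format_kwic(hits):
    """Right-align the left contexts so the keywords line up in a column."""
    width = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left:>{width}}  {kw}  {right}" for left, kw, right in hits]


# Made-up sample text, purely for illustration.
sample = ("Information wants to be free. "
          "Too much information is a problem that the index tries to solve.")

for line in format_kwic(kwic(sample, "information", window=3)):
    print(line)
```

The sketch throws away punctuation and sentence boundaries, which a real concordancer would likely want to keep.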

John Stuart Mill marginalia project

Project to digitise and publish his marginalia online will allow scholars to see his cutting remarks on Ralph Waldo Emerson

The Guardian has a story on an interesting digital humanities project: JS Mill scribbles reveal he was far from a chilly Victorian intellectual. The project, Mill Marginalia Online, is digitizing an estimated 40,000 comments, doodles, and other marks that John Stuart Mill wrote in his collection of 1,700 books, now at Somerville College, Oxford. The collection was donated to Somerville in 1905, roughly 30 years after his death, because the women of the college weren’t allowed to access the Oxford libraries at the time.

His comments are not just scholarly notes. For example, above is an image of the title page of Emerson’s Essays, to which Mill added text in order to mock it. The new title page, combining Mill’s penciled-in elaboration with the original, reads,

Philosophy Bourgeois,
Sentimental Essays: in the art of
Intimately blending
Sense and Nonsense:
R. W. Emerson,
of Concord, Massachusetts.
A clever + well organised youth brought up
in the old traditions.
In thought “all’s fish that comes to net.”
With Fog Preface
By Thomas Carlyle.
“Patent Divine-light Self-acting Foggometer”
To the Court of
Her mAJESTy Queen Vic.

A JEST indeed. The Daily Nous has an article on this with the title, Mill’s Myriad Marginalia: Mundane, Mysterious, Mocking.

The Ethics of Datafication

Information Wants to Be Free, Or Does It? The Ethics of Datafication has just come out in the Electronic Book Review. This article was written with Bettina Berendt at KU Leuven and is about the ethics of digitization. The article first looks at the clichéd phrase “information wants to be free” and then moves on to survey a number of arguments why some things should be digitized.

Distant Reading after Moretti

The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement.

Lauren Klein has posted an important blog entry on Distant Reading after Moretti. This essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general or the humanities, for that matter. Klein then suggests some things we can do about it:

  • We need more accessible corpora that better represent the varieties of human experience.
  • We need to question our models and ask about what is assumed or hidden.



Cooking Up Literature: Talk at U of South Florida

Last week, at the University of South Florida, I presented a paper based on work that Stéfan Sinclair and I are doing. The talk, titled “Cooking Up Literature: Theorizing Statistical Approaches to Texts,” looked at a neglected period of French innovation in the 1970s and 1980s. During this period the French were developing a national corpus, FRANTEXT, while there was also a developing school of exploratory statistics around Jean-Paul Benzécri. While Anglophone humanities computing was concerned with hypertext, the French were looking at using statistical methods like correspondence analysis to explore large corpora. This was long before Moretti and “distant reading.”

The talk was organized by Steven Jones, who holds the DeBartolo Chair in Liberal Arts and is a Professor of Digital Humanities. Steven Jones leads an NEH-funded project called RECALL that Stéfan and I are consulting on. Jones and colleagues at USF are creating a 3D model of Father Busa’s original factory/laboratory.

Are Algorithms Building the New Infrastructure of Racism?

3quarksdaily, one of the better web sites for extracts of interesting essays, pointed me to the essay Are Algorithms Building the New Infrastructure of Racism? in Nautilus by Aaron M. Bornstein (Dec. 21, 2017). The article reviews some of the terrain covered by Cathy O’Neil’s book Weapons of Math Destruction, but it also points out that AIs are becoming infrastructure, and infrastructure with bias baked in is very hard to change. An example is the low bridges that Robert Moses built to make it hard for public transit to reach certain areas of NYC. Algorithmic decisions that are biased and visible can be studied and corrected. Decisions that get built into infrastructure disappear and get much harder to fix.

a fundamental question in algorithmic fairness is the degree to which algorithms can be made to understand the social and historical context of the data they use …

Just as important is paying attention to the data that is used to train the AIs in the first place. Historical data carries the biases of earlier generations, and those biases need to be questioned as they get woven into our infrastructure.