Information Wants to Be Free, Or Does It? The Ethics of Datafication has just come out in the Electronic Book Review. This article was written with Bettina Berendt at KU Leuven and considers the ethics of digitization. The article first looks at the clichéd phrase “information wants to be free” and then surveys a number of arguments why some things should be digitized.
The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement.
Lauren Klein has posted an important blog entry on Distant Reading after Moretti. This essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general or the humanities, for that matter. Klein then suggests some things we can do about it:
- We need more accessible corpora that better represent the varieties of human experience.
- We need to question our models and ask about what is assumed or hidden.
Last week I presented a paper at the University of South Florida based on work that Stéfan Sinclair and I are doing. The talk, titled “Cooking Up Literature: Theorizing Statistical Approaches to Texts,” looked at a neglected period of French innovation in the 1970s and 1980s. During this period the French were developing a national corpus, FRANTEXT, while a school of exploratory statistics was also developing around Jean-Paul Benzécri. While Anglophone humanities computing was concerned with hypertext, the French were looking at using statistical methods like correspondence analysis to explore large corpora. This was long before Moretti and “distant reading.”
The talk was organized by Steven Jones, who holds the DeBartolo Chair in Liberal Arts and is a Professor of Digital Humanities. Jones leads an NEH-funded project called RECALL that Stéfan and I are consulting on. Jones and colleagues at USF are creating a 3D model of Father Busa’s original factory/laboratory.
3quarksdaily, one of the better web sites for extracts of interesting essays, pointed me to the essay Are Algorithms Building the New Infrastructure of Racism? in Nautilus by Aaron M. Bornstein (Dec. 21, 2017). The article reviews some of the terrain covered by Cathy O’Neil’s book Weapons of Math Destruction, but it also points out how AIs are becoming infrastructure. Infrastructure with bias baked in is very hard to change, like the low bridges that Robert Moses built to make it hard for public transit to reach certain areas of NYC. Algorithmic decisions that are biased and visible can be studied and corrected. Decisions that get built into infrastructure disappear from view and get much harder to fix.
a fundamental question in algorithmic fairness is the degree to which algorithms can be made to understand the social and historical context of the data they use …
Just as important is paying attention to the data that is used to train the AIs in the first place. Historic data carries the biases of past generations, and those biases need to be questioned as they get woven into our infrastructure.
Domenico Fiormonte has recently blogged about an interesting document he has by Father Busa that relates to a difficult moment in the history of the digital humanities in Italy in 2002. The two-page “Conditional Agreement”, which I translate below, was given to Domenico and explained the terms under which Busa would agree to sign a letter to the Minister (of Education and Research) Moratti in response to Moratti’s public statement about the uselessness of humanities informatics. A letter was being prepared, to be signed by a large number of Italian (and foreign) academics, explaining the value of what we now call the digital humanities. Busa had the connections to get the letter published and taken seriously, which is why Domenico visited him to ask for his help. That help ended up being conditional on certain things being made clear, as laid out in the document. Domenico kept the two pages Busa wrote. As he points out in his blog, these two pages are a mini-manifesto of Father Busa’s later views of the place and importance of what he called textual informatics. Domenico also points out how political the context of these notes, and of the letter eventually signed and published, was. Defining the digital humanities is often about positioning the field in the larger academic and public political spheres we operate in.
I’ve just come across some important blog essays by David Gaertner. One is Why We Need to Talk About Indigenous Literature in the Digital Humanities where he argues that colleagues from Indigenous literature are rightly skeptical of the digital humanities because DH hasn’t really taken to heart the concerns of Indigenous communities around the expropriation of data.
Steven Jones has just put up a historic flowchart from the Busa Archive at the Università Cattolica del Sacro Cuore, Milan, Italy. See A flow chart for Busa’s “Mechanized Linguistic Analysis”. Jones has been posting important historical images associated with his book Roberto Busa, S.J., and the Emergence of Humanities Computing. This flow chart shows the logic of the processing using punched cards and tape that was developed by Busa and Paul Tasman (who is probably one of the designers of this chart). The folks at the Busa Archive had shared this flow chart with me for a paper I gave on Busa’s Methods at the Instant History conference in Chicago. Now Steven has shared it openly with permission.
For more on the Busa Archives and what they show us about the Index Thomisticus as Project see here.
I’ve just come back from the Chicago Colloquium on Digital Humanities and Computer Science at the University of Illinois, Chicago. The Colloquium is a great little conference where a lot of new projects get shown. I kept conference notes on the Colloquium here.
I was struck by the number of sessions devoted to papers on mapping projects. I don’t know if I have ever seen so many geospatial projects. Many of the papers talked about how mapping is a different way of analyzing the data, whether it is the location of eateries in Roman Pompeii or German construction projects before 1924.
I gave a paper on “Information Wants to Be Free, Or Does It? Ethics in the Digital Humanities.”