Replication as a way of knowing in the digital humanities

May 5th, 2016
Poster for Replication Talk

At the end of April I gave a talk at the University of Würzburg on Replication as a way of knowing in the digital humanities. The talk was sponsored by Dr. Fotis Jannidis, who holds the Chair of Computer Philology and Modern German Literature there. He and others have built a digital humanities program and an interesting research agenda around text mining and German literature. The talk tried out some new ideas Stéfan Sinclair and I are working on. The abstract read:

Much new knowledge in the digital humanities comes from the practices of encoding and programming, not through discourse. These practices can be considered forms of modelling in the active sense of making by modelling or, as I like to call them, practices of thinking-through. Alas, these practices and the associated ways of knowing are not captured or communicated very well through the usual academic forms of publication, which come out of discursive knowledge traditions. In this talk I will argue for “replication” as a way of thinking-through the making of code. I will give examples and conclude by arguing that such thinking-through replication is critical to the digital literacy needed in the age of big data and algorithms.

The Rise and Fall of Tool-Related Topics in CHum

May 3rd, 2016
Tool network with COCOA selected

I just found out that a paper we gave in 2014 has now been published. See The Rise and Fall of Tool-Related Topics in CHum. Here is the abstract:

What can we learn from the discourse around text tools? More than might be expected. The development of text analysis tools has been a feature of computing in the humanities since IBM supported Father Busa’s production of the Index Thomisticus (Tasman 1957). Despite the importance of tools in the digital humanities (DH), few have looked at the discourse around tool development to understand how the research agenda changed over the years. Recognizing the need for such an investigation a corpus of articles from the entire run of Computers and the Humanities (CHum) was analyzed using both distant and close reading techniques. By analyzing this corpus using traditional category assignments alongside topic modelling and statistical analysis we are able to gain insight into how the digital humanities shaped itself and grew as a discipline in what can be considered its “middle years,” from when the field professionalized (through the development of journals like CHum) to when it changed its name to “digital humanities.” The initial results (Simpson et al. 2013a; Simpson et al. 2013b), are at once informative and surprising, showing evidence of the maturation of the discipline and hinting at moments of change in editorial policy and the rise of the Internet as a new forum for delivering tools and information about them.
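
For those curious about what such an analysis looks like in practice, here is a minimal sketch of the kind of topic-modelling pass the abstract describes, in Python with gensim; the toy documents and the number of topics are illustrative assumptions, not what we actually used:

    from gensim import corpora, models

    # One list of tokens per CHum article (placeholder data)
    texts = [
        ["concordance", "program", "text", "analysis"],
        ["database", "humanities", "computing", "tool"],
    ]

    # Build the vocabulary and the bag-of-words corpus
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]

    # Fit an LDA topic model and inspect the topics it finds
    lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic in lda.print_topics():
        print(topic)

A real run would use the full journal corpus and experiment with the number of topics before reading anything into the results.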

Three dimensional dynamic data exploration for DH research

April 23rd, 2016

I’m blogging now at Three dimensional dynamic data exploration for DH research. This is the project that brought me to Hamburg for these three months, so most of my blog entries will be on that site. The project is developing ideas for next-generation visualizations for the humanities.

IBM to close Many Eyes

April 23rd, 2016

I just discovered that IBM is closing Many Eyes. This is a pity. It was a great environment that let people upload data and visualize it in different ways. I blogged about it ages ago (in computer ages, anyway). In particular I liked their Word Tree, which seems to be one of the best ways to explore language use.
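
To give a sense of why I like it, a word tree takes a root word and branches out through everything that follows it in the text. Here is a minimal sketch of that underlying idea in Python; the function name and parameters are mine, not Many Eyes’ code:

    import re
    from collections import Counter

    def word_tree_branches(text, root, depth=3):
        """Collect the phrases that follow `root`: the branches of a word tree."""
        words = re.findall(r"[\w']+", text.lower())
        branches = Counter()
        for i, w in enumerate(words):
            if w == root:
                branches[tuple(words[i + 1 : i + 1 + depth])] += 1
        # Most frequent continuations first, as a word tree draws them largest
        return branches.most_common()

    print(word_tree_branches("to be or not to be, that is the question", "to"))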

It seems that some of the programmers moved on and that IBM is now focusing on Watson Analytics.

What’s in a number? William Shakespeare’s legacy analysed

April 22nd, 2016

The Guardian published an article on What’s in a number? William Shakespeare’s legacy analysed (April 22, 2016). This article is part of a Shakespeare 400 series in honour of the 400th anniversary of the bard’s death. The article is introduced thus:

Shakespeare’s ability to distil human nature into an elegant turn of phrase is rightly exalted – much remains vivid four centuries after his death. Less scrutiny has been given to statistics about the playwright and his works, which tell a story in their own right. Here we analyse the numbers behind the Bard.

The authors offer a series of visualizations of statistics about Shakespeare that are rather more of a tease than anything really interesting. They also ignore the long history of using quantitative methods to study Shakespeare, going back to Mendenhall’s study of authorship using word lengths.

Mendenhall, T. C. (1901). “A Mechanical Solution of a Literary Problem.” The Popular Science Monthly LX(7): 97–105.
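
Mendenhall’s method is simple enough to sketch: count how often words of each length occur and compare the resulting “characteristic curve” across authors. A minimal version in Python (my own illustrative code, not Mendenhall’s mechanical tabulation):

    import re
    from collections import Counter

    def characteristic_curve(text):
        """Relative frequency of each word length: Mendenhall's curve."""
        lengths = [len(w) for w in re.findall(r"[A-Za-z]+", text)]
        counts = Counter(lengths)
        total = len(lengths)
        return {n: counts[n] / total for n in sorted(counts)}

    print(characteristic_curve("To be or not to be that is the question"))

Comparing such curves for different candidate authors was Mendenhall’s “mechanical solution” to the attribution problem.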

CAA and SAH Release Guidelines for the Evaluation of Digital Scholarship in Art and Architectural History

April 8th, 2016

The College Art Association (CAA) and the Society of Architectural Historians (SAH) have released Guidelines for the Evaluation of Digital Scholarship in Art and Architectural History. The guidelines include attention to process:

A work of digital scholarship often requires developing or refining a methodology. That work should be evaluated as a contribution to scholarship, just as methodological innovations in traditional scholarship are given weight in assessments of achievement. By extension, digital scholarship may need to be evaluated by the process of analysis in addition to the results of the analysis. (p. 5)

The guidelines go on to discuss how to convey the importance of process through things like project narratives. They also note that the “inadequacy of existing peer review for digital scholarship is directly related to the changing nature of publications. In many cases, peer review for a digital publication is little different from that of a print publication,…” It sounds like the arts are going through the same discussions as we are.

Literature Measured

April 8th, 2016

I finally got around to reading the latest of the Pamphlets of the Stanford Literary Lab. This pamphlet, 12. Literature Measured (PDF), written by Franco Moretti, is a reflection on the Lab’s research practices and on why they chose to publish pamphlets. It is apparently the introduction to a French edition of the pamphlets. The pamphlet makes some important points about their work and about the digital humanities in general.

Images come first, in our pamphlets, because – by visualizing empirical findings – they constitute the specific object of study of computational criticism; they are our “text”; the counterpart to what a well-defined excerpt is to close reading. (p. 3)

I take this to mean that the image shows the empirical findings or the model drawn from the data. That model is studied through the visualization. The visualization is not an illustration or supplement.

By frustrating our expectations, failed experiments “estrange” our natural habits of thought, offering us a chance to transform them. (p. 4)

The pamphlet has a good section on failure and how it is not just a rhetorical ploy, but important to research. I would add that only certain types of failure are; there are dumb failures too. He then moves on to the question of successes in the digital humanities and ends with an interesting reflection on how the digital humanities and Marxist criticism don’t seem to have much to do with each other.

But he (Bourdieu) also stands for something less obvious, and rather perplexing: the near-absence from digital humanities, and from our own work as well, of that other sociological approach that is Marxist criticism (Raymond Williams, in “A Quantitative Literary History”, being the lone exception). This disjunction – perfectly mutual, as the indifference of Marxist criticism is only shaken by its occasional salvo against digital humanities as an accessory to the corporate attack on the university – is puzzling, considering the vast social horizon which digital archives could open to historical materialism, and the critical depth which the latter could inject into the “programming imagination”. It’s a strange state of affairs; and it’s not clear what, if anything, may eventually change it. For now, let’s just acknowledge that this is how things stand; and that – for the present writer – something needs to be done. It would be nice if, one day, big data could lead us back to big questions. (p. 7)

SFPC: School for Poetic Computation

April 7th, 2016

The School for Poetic Computation is where I would study if I had the time (and money). Courses include:

  • Generative Text
  • Radical Computer Science
  • Physical Computing
  • Concepts and Theory
  • Recreating the Past

Information Geographies

April 1st, 2016

Thanks to a note from Domenico Fiormonte to Humanist, I came across the Information Geographies page at the Oxford Internet Institute. The OII has been producing interesting maps that show aspects of the internet. One of these maps shows the distribution of Geographic Knowledge in Freebase. Given the importance of Freebase to Google’s Knowledge Graph, it is important to understand how its information is biased towards certain locations.

Geographic content in Freebase is largely clustered in certain regions of the world. The United States accounts for over 45% of the overall number of place names in the collection, despite covering about 2% of the Earth, less than 7% of the land surface, and less than 5% of the world population, and about 10% of Internet users. This results in a US density of one Freebase place name for every 1500 people, and far more place names referring to Massachusetts than referring to China.
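
The quoted density figure is easy to check with back-of-the-envelope arithmetic; the population number below is my rough assumption for the period, while the ratios come from the quote:

    # Back-of-the-envelope check of the OII's quoted Freebase figures
    us_population = 310_000_000    # rough US population (my assumption)
    people_per_place_name = 1_500  # density quoted by the OII

    implied_us_names = us_population / people_per_place_name
    print(f"{implied_us_names:,.0f} US place names")  # roughly 207,000

    # The US accounts for "over 45%" of all place names, implying a
    # worldwide total somewhere under:
    print(f"{implied_us_names / 0.45:,.0f} place names worldwide")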

Domenico Fiormonte’s email to Humanist (Humanist Discussion Group, Vol. 29, No. 824) argues that “It is our responsibility to preserve cultural diversity, and even relatively small players can make a difference by building more inclusive ‘representations’.” He argues that we need to be open about the cultural and linguistic biases of the tools and databases we build.

Godwin’s Bot: Recent stories on AI

March 29th, 2016

Godwin’s Bot is a good essay by Misha Lepetic on 3QuarksDaily about artificial intelligence (AI). The essay reflects on the recent Microsoft debacle with @TayandYou, an AI chat bot that was “targeted at 18 to 24 year olds in the US.” (About Tay & Privacy) For a New Yorker story on how Microsoft shut it down after Twitter trolls trained it to be offensive, see I’ve Seen the Greatest A.I. Minds of My Generation Destroyed By Twitter. Lepetic calls her Godwin’s Bot after Godwin’s Law, which asserts that in any online conversation there will eventually be a comparison to Hitler.

What is interesting about the essay is that it then moves to an interview with Stephen Wolfram on AI & The Future of Civilization, where Wolfram distinguishes between inventing a goal, which is difficult to automate, and (once one can articulate a goal clearly) executing it, which can be automated.

How do we figure out goals for ourselves? How are goals defined? They tend to be defined for a given human by their own personal history, their cultural environment, the history of our civilization. Goals are something that are uniquely human.

Lepetic then asks if Tay had a goal or who had goals for Tay. Microsoft had a goal, and that had to do with “learning” from and about a demographic that uses social media. Lepetic sees it as a “vacuum cleaner for data.” In many ways the trolls did us a favor by misleading it.

Or … TayandYou was troll-bait to train a troll filter.
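
If one takes that reading seriously, the filter itself is a standard text-classification problem. A minimal sketch, assuming scikit-learn and a hypothetical handful of labeled messages (placeholder data, not anything from Tay):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder training data: messages labeled 1 (troll) or 0 (benign)
    messages = [
        "have a lovely day everyone",
        "you people are the worst, get lost",
        "thanks for sharing this article",
        "nobody wants you here, loser",
    ]
    labels = [0, 1, 0, 1]

    # Bag-of-words features feeding a logistic regression classifier
    troll_filter = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(),
    )
    troll_filter.fit(messages, labels)

    print(troll_filter.predict(["get lost, nobody wants your opinion"]))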

My question is whether anyone has done a good analysis of how the Tay campaign actually worked.