What’s in a number? William Shakespeare’s legacy analysed


The Guardian published an article, What’s in a number? William Shakespeare’s legacy analysed (April 22, 2016), as part of a Shakespeare 400 series in honour of the 400th anniversary of the Bard’s death. The article is introduced thus:

Shakespeare’s ability to distil human nature into an elegant turn of phrase is rightly exalted – much remains vivid four centuries after his death. Less scrutiny has been given to statistics about the playwright and his works, which tell a story in their own right. Here we analyse the numbers behind the Bard.

The authors offer a series of visualizations of statistics about Shakespeare that are rather more of a tease than anything really interesting. They also ignore the long history of using quantitative methods to study Shakespeare, going back to Mendenhall’s study of authorship using word lengths:

Mendenhall, T. C. (1901). “A Mechanical Solution of a Literary Problem.” The Popular Science Monthly. LX(7): 97-105.
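Mendenhall’s method is easy to sketch: count how often words of each length occur in a text and compare the resulting distributions across authors as a crude stylistic fingerprint. A minimal illustration in Python (the tokenizing regex and the sample line are my own, not Mendenhall’s procedure verbatim):

```python
from collections import Counter
import re

def word_length_profile(text):
    """Count how often each word length occurs in a text.
    Mendenhall compared such length distributions across
    authors as a stylistic fingerprint."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(len(w) for w in words)

profile = word_length_profile("To be or not to be, that is the question")
```

Two profiles can then be compared author against author; Mendenhall did all of this counting by hand.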

Literature Measured

I finally got around to reading the latest Pamphlets of the Stanford Literary Lab. This pamphlet, 12. Literature Measured (PDF) written by Franco Moretti, is a reflection on the Lab’s research practices and why they chose to publish pamphlets. It is apparently the introduction to a French edition of the pamphlets. The pamphlet makes some important points about their work and the digital humanities in general.

Images come first, in our pamphlets, because – by visualizing empirical findings – they constitute the specific object of study of computational criticism; they are our “text”; the counterpart to what a well-defined excerpt is to close reading. (p. 3)

I take this to mean that the image shows the empirical findings or the model drawn from the data. That model is studied through the visualization. The visualization is not an illustration or supplement.

By frustrating our expectations, failed experiments “estrange” our natural habits of thought, offering us a chance to transform them. (p. 4)

The pamphlet has a good section on failure and how that is not just a rhetorical ploy but important to research. I would add that only certain types of failure are so. There are dumb failures too. He then moves on to the question of successes in the digital humanities and ends with an interesting reflection on how the digital humanities and Marxist criticism don’t seem to have much to do with each other.

But he (Bourdieu) also stands for something less obvious, and rather perplexing: the near-absence from digital humanities, and from our own work as well, of that other sociological approach that is Marxist criticism (Raymond Williams, in “A Quantitative Literary History”, being the lone exception). This disjunction – perfectly mutual, as the indifference of Marxist criticism is only shaken by its occasional salvo against digital humanities as an accessory to the corporate attack on the university – is puzzling, considering the vast social horizon which digital archives could open to historical materialism, and the critical depth which the latter could inject into the “programming imagination”. It’s a strange state of affairs; and it’s not clear what, if anything, may eventually change it. For now, let’s just acknowledge that this is how things stand; and that – for the present writer – something needs to be done. It would be nice if, one day, big data could lead us back to big questions. (p. 7)

Where Probability Meets Literature and Language: Markov Models for Text Analysis

3quarksdaily, one of my favourite sites to read, just posted a very nice essay by Sanjukta Paul on Where Probability Meets Literature and Language: Markov Models for Text Analysis. The essay starts with Markov, who in the early twentieth century was doing linguistic analysis by hand, and moves on to authorship attribution by people like Fiona Tweedie (the image above is from a study she co-authored). It also explains Markov models along the way.
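For readers who want the gist of the technique the essay explains: a first-order word-level Markov model simply records, for each word, which words have been observed to follow it, then generates text by a random walk over those transitions. A toy sketch (the sample sentence and function names are mine, not from the essay):

```python
import random
from collections import defaultdict

def build_markov(words):
    """First-order word-level Markov model: map each word to the
    list of words observed to follow it in the training text."""
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, start, n, seed=0):
    """Random walk through the model, starting from `start`,
    for up to n further words."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

words = "the cat sat on the mat and the cat ran".split()
model = build_markov(words)
```

The same transition counts, read as probabilities rather than used for generation, are what authorship studies compare between candidate authors.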

Which Words Are Used To Describe White And Black NFL Prospects?


I’ve been meaning to blog this 2014 use of Voyant Tools for some time: Which Words Are Used To Describe White And Black NFL Prospects? Deadspin did a neat project where they gathered pre-draft scouting reports on black and white football players and then analyzed them with Voyant, showing how some words are used more often for white or for black players.
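The underlying method is straightforward comparative word frequency: compute each word’s relative frequency in the two sets of reports and look at the difference. A sketch of that idea in Python (my own illustration, not Deadspin’s or Voyant’s actual code):

```python
from collections import Counter

def relative_freq(words):
    """Word counts normalized by corpus size."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def compare(corpus_a, corpus_b):
    """Difference in relative frequency for every word in either
    corpus: positive means the word is more characteristic of
    corpus_a, negative of corpus_b."""
    fa, fb = relative_freq(corpus_a), relative_freq(corpus_b)
    return {w: fa.get(w, 0.0) - fb.get(w, 0.0) for w in set(fa) | set(fb)}

diff = compare("smart smart tough".split(), "fast fast tough".split())
```

Sorting the words by this difference surfaces the vocabulary most skewed toward one corpus or the other, which is essentially what the Deadspin graphic shows.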


Blockbusters: how Rutherford Chang became the second best Tetris player in the world

The Guardian has a story about Blockbusters: how Rutherford Chang became the second best Tetris player in the world. Chang is an artist who has been playing Tetris over and over and filming it. His hundreds of thousands of games can be viewed on YouTube here.

How is this art? I suspect it is in the way he plays with repetition. Another project, Alphabetized Newspaper, takes all the words in stories on the cover of The New York Times and rearranges them in alphabetical order, creating a sort of sorted word list.



He also did this with video of NBC nightly news, which produces a bizarre effect. Imagine all the very short clips of people saying “and” in a row.

I am struck by how he has humanly recreated what an algorithm could do.
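The algorithmic equivalent of Alphabetized Newspaper is nearly a one-liner; what Chang does laboriously by hand, a sort does instantly (sample text mine):

```python
def alphabetize(text):
    """Rearrange every word of a text into alphabetical order,
    as Chang did by hand with newspaper front pages."""
    return " ".join(sorted(text.lower().split()))
```

The contrast between the trivial computation and the labour of the hand-made version is, I suspect, part of the point of the art.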


LOTRProject: Visualizing the Lord of the Rings


Emil Johansson, a student in Gothenburg, has created a fabulous site called the LOTRProject (or Lord Of The Rings Project). The site provides different types of visualizations of Tolkien’s world (Silmarillion, Hobbit, and LOTR), from maps to family trees to character mentions (see image above).


Literary Analysis and the Wolfram Language


Lately I’ve been using Wolfram Mathematica more and more for analytics. I was introduced to Mathematica by Bill Turkel and Ian Graham, who have done some impressive stuff with it. Bill Turkel has now created an open access, open content, and open source textbook, Digital Research Methods with Mathematica. The text is a Mathematica notebook itself, so if you have Mathematica you can use the text to do analytics on the spot.

Wolfram has also posted an interesting blog entry on Literary Analysis and the Wolfram Language: Jumping Down a Reading Rabbit Hole. They show how you can generate word clouds and sentiment analysis graphs easily.

While I am still learning Mathematica, some of the features that make it attractive include:

  • It uses a “literate programming” model where you write notebooks meant to be read by humans with embedded code rather than writing code with awkward comments embedded.
  • It has a lot of convenient Web, Language, and Visualization functions that let you do things we want to do in the digital humanities.
  • You can call on Wolfram Alpha in a notebook to get real world knowledge like capital cities or maps or language information.

philosophi.ca : Digital Humanities Concepts 2015

TU Darmstadt MA LLC Structure

Just left a most delightful conference on Key ideas and concepts of Digital Humanities in Darmstadt, Germany. My conference notes are on philosophi.ca : Digital Humanities Concepts 2015. The conference brought together an extraordinary set of speakers who were influential in the field when I entered it. Susan Hockey, Michael Sperberg-McQueen, Nancy Ide, George Landow, Wilhelm Ott and the list goes on. I would be hard pressed to imagine a conference I have been at better able to reflect on the history and ideas of humanities computing. The organizers Andrea Rapp, Michael Sperberg-McQueen, Sabine Bartsch and Michael Bender deserve much more praise than I was able to lavish on them.

Among all the great papers I will mention:

  • Michael Sperberg-McQueen gave a very smart and well argued paper on descriptive markup arguing against its dismissal as enforcing hierarchies.
  • Marco Passarotti talked about the Index Thomisticus (which he directs) and the Busa Archive. He brought some documents including some Gantt charts and early letters. I am definitely going to visit him and the archive in Milan.
  • Fotis Jannidis gave a great paper on topic modelling and its temptations. He had very interesting things to say about how the method has been adopted by humanists.
  • Julia Flanders gave a paper on “Looking for Gender in the History of DH” that when published will, I predict, become mandatory reading. She gives us a way forward after what happened at DH 2015. It was a truly wise and humble talk that could go a long way to providing an inclusive way forward.
  • Nancy Ide gave a great overview of the separate trajectories taken by DH and Corpus Linguistics.
  • Peter Robinson gave a call for open editions and walked us through what that might mean.

Given the speakers, there was a lot of reflection on the history of humanities computing and disciplinarity, though framed by a German context. TU Darmstadt has an MA in Linguistic and Literary Computing (see image of the structure of the degree above) and is now developing an undergrad degree.

Text Mining The Novel 2015


On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here: Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic). Here are some general reflections:

  • The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string). We need to find ways to also study the novel as material object (thing), as a social object, as a performance (of the reader), and as an economic object in a marketplace. Then we also have to find ways to connect these.
  • So many analytical and mining processes depend on bags of words, from dictionaries to topics. Is this a problem or a limitation? Can we also try to abstract characters, plot, or argument?
  • I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.
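To make the bag-of-words limitation above concrete: the representation keeps only word counts, so any two texts containing the same words in a different order are indistinguishable, which is exactly why plot and argument fall outside its reach. A minimal sketch (the tokenization and sample sentences are my own):

```python
from collections import Counter

def bag_of_words(text):
    """Reduce a text to unordered word counts, the representation
    underlying dictionary methods and topic models. Word order,
    and with it plot and argument, is thrown away."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(w for w in words if w)

bag = bag_of_words("The dog bit the man.")
```

Since `bag_of_words("The dog bit the man.")` and `bag_of_words("The man bit the dog.")` are identical, anything that distinguishes those two stories has to come from somewhere other than the bag.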