Literary Analysis and the Wolfram Language


Lately I’ve been trying Wolfram Mathematica more and more for analytics. I was introduced to Mathematica by Bill Turkel and Ian Graham, who have done some impressive work with it. Bill Turkel has now created an open access, open content, and open source textbook, Digital Research Methods with Mathematica. The text is itself a Mathematica notebook, so if you have Mathematica you can use the text to do analytics on the spot.

Wolfram has also posted an interesting blog entry on Literary Analysis and the Wolfram Language: Jumping Down a Reading Rabbit Hole. They show how you can generate word clouds and sentiment analysis graphs easily.

While I am still learning Mathematica, some of the features that make it attractive include:

  • It uses a “literate programming” model where you write notebooks meant to be read by humans, with code embedded in the prose, rather than writing code with awkward comments embedded in it.
  • It has a lot of convenient Web, Language, and Visualization functions that let you do things we want to do in the digital humanities.
  • You can call on Wolfram Alpha in a notebook to get real world knowledge like capital cities or maps or language information.

Text Mining The Novel 2015


On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here: Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic). Here are some general reflections:

  • The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string). We need to find ways to also study the novel as a material object (thing), as a social object, as a performance (of the reader), and as an economic object in a marketplace. Then we also have to find ways to connect these.
  • So many analytical and mining processes, from dictionaries to topics, depend on bags of words. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument?
  • I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.
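To make the bag-of-words point above concrete, here is a minimal Python sketch. It is only an illustration, not a method presented at the workshop; the function name `bag_of_words` and the sample strings are assumptions made up for this example. It shows both what a bag of words is and what it throws away: word order, and with it much of what we mean by character, plot, or argument.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, pull out word tokens, and count them.

    The result keeps word frequencies but discards word order.
    (Hypothetical helper for illustration only.)
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

sample = "It was the best of times, it was the worst of times."
bag = bag_of_words(sample)
print(bag)

# Two sentences with opposite meanings produce the same bag,
# which is the limitation the reflection above points to.
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))
```

Dictionary methods and topic models both start from counts like these, which is why the question of abstracting anything above the level of the word keeps coming up.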


diyMatrix: Bertin’s Manual


I have long been interested in Jacques Bertin, a pioneer in thinking about visualization. His Semiology of Graphics is a classic. I had been thinking it would be great to try or simulate his way of doing cluster analysis with physical matrices, which he called “dominos”. I was therefore pleased to see that someone has recreated his matrices; see DIY Matrix.

Charles Perin, Pierre Dragicevic, and Jean-Daniel Fekete have updated the matrices and fabricated a version for a CHI’15 workshop on Investigating the Challenges of Making Data Physical (PDF).

Update: They also have a web application called Bertifier that allows you to try it virtually. This interactive tool lets you choose different ways of decorating the blocks and will then also reorder them. It is fascinating to play with.
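For readers who want a feel for what reordering a matrix involves, here is a rough Python sketch. It is not Bertifier's algorithm, which I have not examined; it uses a simple barycenter heuristic that I am substituting as an assumption, and the names `barycenter` and `reorder`, along with the toy matrix, are invented for illustration. Rows are sorted by where their weight sits among the columns, then columns by where their weight sits among the rows, and the two steps are repeated a few times.

```python
def barycenter(values):
    """Weighted mean position of the values in a row or column."""
    total = sum(values)
    if total == 0:
        return 0.0  # empty rows/columns sort to the front
    return sum(i * v for i, v in enumerate(values)) / total

def reorder(matrix, row_labels, col_labels, passes=5):
    """Permute rows and columns so similar ones drift together.

    Assumption: a barycenter heuristic, chosen for brevity. It only
    permutes rows and columns; cell values are never changed.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    for _ in range(passes):
        rows.sort(key=lambda r: barycenter([matrix[r][c] for c in cols]))
        cols.sort(key=lambda c: barycenter([matrix[r][c] for r in rows]))
    new_matrix = [[matrix[r][c] for c in cols] for r in rows]
    return new_matrix, [row_labels[r] for r in rows], [col_labels[c] for c in cols]

# Toy data: two interleaved groups of rows and columns.
row_labels = ["A", "B", "C", "D"]
col_labels = ["w", "x", "y", "z"]
matrix = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

new_matrix, new_rows, new_cols = reorder(matrix, row_labels, col_labels)
print("   " + " ".join(new_cols))
for label, row in zip(new_rows, new_matrix):
    print(label + "  " + " ".join("#" if v else "." for v in row))
```

The physical version does the same thing by hand: you slide whole rows and columns around until blocks of similar cells appear.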


Now I have something I want to print on a fabricator.

Dennis Cooper: Zac’s Haunted House (A Novel)

Dennis Cooper has created an interesting novel of looping animated gifs called Zac’s Haunted House (A Novel). The novel is published by Kiddiepunk. I’m not sure why he deliberately calls it a novel when it has so little language, though one can think of the animated gifs as some sort of linked visual language. Perhaps animated gifs are becoming the visual equivalent of words with which we can compose.

I found this courtesy of 3QuarksDaily.

Wilkens: Literary Attention Lag

Matthew Wilkens has posted a nice blog essay about his short MLA paper on geography and memory, Literary Attention Lag. He looked at how some cities get far more literary attention than their population merits despite a general correlation between population and attention. For example, in 1860 Chicago and New Orleans had about the same population, but New Orleans gets a lot more attention.
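One way to think about "more attention than population merits" is as a residual from a fitted trend. The Python sketch below is my own illustration and not Wilkens's code or data: the cities are placeholders, the numbers are invented toy values, and the log-log least-squares fit is an assumption about how one might model the general correlation. A positive residual means a city gets more mentions than the trend predicts for its size.

```python
import math

def log_log_residuals(populations, mentions):
    """Fit log(mentions) = a + b * log(population) by ordinary least
    squares and return each observation's residual.

    Assumption: a simple log-log linear model, used only for illustration.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(m) for m in mentions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Invented toy values, NOT historical data.
cities = ["City A", "City B", "City C", "City D"]
populations = [100_000, 100_000, 200_000, 400_000]
mentions = [50, 200, 150, 300]

residuals = log_log_residuals(populations, mentions)
ranked = sorted(zip(cities, residuals), key=lambda pair: pair[1], reverse=True)
for city, r in ranked:
    print(f"{city}: {r:+.2f}")
```

In this toy set City A and City B have the same population, so the difference in their residuals comes entirely from the difference in mentions, which is the shape of the Chicago and New Orleans comparison.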

What is particularly useful is that he provides an iPython notebook with a documented version of his code here. He also provides a link to his data so you can edit and recapitulate his study.

Stéfan Sinclair and I are experimenting with Mathematica notebooks and iPython notebooks as a way to share research thinking with code woven in.

Is GamerGate About Media Ethics or Harassing Women? Harassment, the Data Shows


Among all the GamerGate stories, an interesting move by Newsweek was to commission a study of GamerGate tweets. Taylor Wofford reported the results in an article from October 25th, 2014, titled Is GamerGate About Media Ethics or Harassing Women? Harassment, the Data Shows. The study was run by BrandWatch, and they looked at who was the target of tweets with #gamergate. Lo and behold, the GamerGate community seemed more concerned with female game designers than with journalists, which calls into question the claim that GamerGate is really about ethics and games journalism.
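The basic counting step in a study like this can be sketched in a few lines of Python. This is not BrandWatch's method, which the article does not spell out in detail; the function name `count_mentions`, the handles, and the sample tweets are all made up for illustration. It only counts who is addressed in tweets that carry the hashtag and says nothing about the tone of those tweets.

```python
import re
from collections import Counter

def count_mentions(tweets, hashtag="#gamergate"):
    """Count how many hashtagged tweets mention each @handle.

    Each handle is counted at most once per tweet. Matching is
    case-insensitive. (Hypothetical helper for illustration only.)
    """
    counts = Counter()
    for tweet in tweets:
        text = tweet.lower()
        if hashtag not in text:
            continue  # skip tweets outside the hashtag
        counts.update(set(re.findall(r"@\w+", text)))
    return counts

# Invented sample tweets with placeholder handles.
tweets = [
    "@designer_a this is about ethics #GamerGate",
    "#gamergate @designer_a @journalist_b please respond",
    "@journalist_b nice article",  # no hashtag, so it is ignored
]

counts = count_mentions(tweets)
for handle, n in counts.most_common():
    print(handle, n)
```

Comparing such counts across designers and journalists gives the kind of chart Newsweek published, though judging whether a tweet is hostile or supportive would need a further step.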

We are now gathering tweets too and we will see if we can reproduce the results. At first glance the number of GamerGate tweets seems really low – they seem to be sampling. It will also be interesting to see if there has been a shift in emphasis in the discussion.

bookworm

The folks behind the Google Ngram Viewer have developed a new tool called Bookworm. It has a number of corpora (the example above is from bills from beta.congress.gov). It lets you describe more complex queries, and you can upload your own data.

Bookworm is hosted by the Cultural Observatory at Harvard, directed by Erez Lieberman Aiden and Jean-Baptiste Michel, who were behind the Ngram Viewer. They have recently published a book, Uncharted, where they talk about different cultural trends they studied using the Ngram Viewer. The book is accessible, though a bit light.

NovelTM: Text Mining the Novel

This week SSHRC announced the newly awarded partnership grants, including one on which I am a co-investigator, NovelTM: Text Mining the Novel.

This project brings together researchers and partners from 21 different academic and non-academic institutions to produce the first large-scale quantitative history of the novel. Our aim is to bring new computational approaches in the field of text mining to the study of literature as well as bring the unique knowledge of literary studies to bear on larger debates about data mining and the place of information technology within society.

NovelTM is led by Andrew Piper at McGill University. At the University of Alberta I will be gathering a team that will share the resulting computing methods through TAPoR and develop recipes or tutorials so that others can try them.