NovelTM: Text Mining the Novel

This week SSHRC announced the newly awarded partnership grants, including one I am a co-investigator on, NovelTM: Text Mining the Novel.

This project brings together researchers and partners from 21 different academic and non-academic institutions to produce the first large-scale quantitative history of the novel. Our aim is to bring new computational approaches in the field of text mining to the study of literature as well as bring the unique knowledge of literary studies to bear on larger debates about data mining and the place of information technology within society.

NovelTM is led by Andrew Piper at McGill University. At the University of Alberta I will be gathering a team that will share the resulting computing methods through TAPoR and develop recipes or tutorials so that others can try them.

Replaying Japan 2014

Last week we organized Replaying Japan 2014 here in Edmonton. This was the second international conference on Japanese game studies and the third event we co-organized with the Ritsumeikan Center for Game Studies (in Japanese with English pamphlet).

The opening keynote was by Tomohiro Nishikado, the designer of Space Invaders – the 1978 game that launched specialty arcades in Japan. He talked about the design process and showed his notebooks which he had brought. Here you can see the page on his notebook with the sketches of the aliens and then the bitmap versions. I kept my conference notes on his talk and others here.

[Photos: pages from Nishikado's notebook]

The conference was a huge success with over 100 attendees from 6 countries and over 20 universities. We had people from industry, academia and government too. We had a significant number of Japanese speakers despite English being the language of the conference. After the conference we met to plan for next year’s conference in Kyoto. See you there!

This conference was supported by the Japan Foundation, the GRAND Network of Centres of Excellence, the Prince Takamado Centre, the Ritsumeikan Center for Game Studies, CIRCA, and the University of Alberta.

NYTimes: Inequality and Web Search Trends

The Upshot in the New York Times has a nice article titled In One America, Guns and Diet. In the Other, Cameras and ‘Zoolander.’: Inequality and Web Search Trends by David Leonhardt (August 18, 2014). They combined data from Google on favorite searches by county with socioeconomic data to show which searches correlate with richer and poorer areas. While few of the correlations are surprising, they provide details that one wouldn’t think of. Not only are religious searches more common in poorer areas, but so are searches for “about hell” and “antichrist.” In wealthy areas, by contrast, they search for “holiday greetings,” presumably because they are more likely to live far from family.

Anyway, a neat study that illustrates how the aggregation of different datasets can work.
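
To make the idea of aggregating datasets concrete, here is a minimal sketch in Python of joining two tables on a shared key and measuring how they move together. It is not the method The Upshot used, which the article does not spell out at this level. The county labels and all of the numbers are made up for illustration, and the helper names (join_by_key, pearson) are my own.

```python
from math import sqrt

# Made-up illustrative numbers, NOT data from the article.
# Share of searches for some term, keyed by a hypothetical county label.
search_share = {"A": 0.12, "B": 0.30, "C": 0.22, "D": 0.05}
# Median household income for the same (and one extra) hypothetical county.
median_income = {"A": 71000, "B": 38000, "C": 49000, "D": 90000, "E": 55000}


def join_by_key(left, right):
    """Keep only the keys present in both datasets and pair up their values."""
    keys = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in keys]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rows = join_by_key(search_share, median_income)
r = pearson([share for _, share, _ in rows], [income for _, _, income in rows])
print(f"{len(rows)} counties joined, correlation = {r:.2f}")
```

County "E" drops out of the join because it has no search data, which is the kind of quiet loss one has to watch for when combining sources.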

DH 2014, Dagstuhl, and Exploiting Text

Over the last month I’ve been to a number of conferences that I have been writing conference notes on.

  • At the beginning of July I was at DH 2014 in Lausanne, Switzerland, where I gave a workshop with Stéfan Sinclair on Your Very Own Voyant, participated in some panels and gave a paper (also with Stéfan).
  • I was at a Dagstuhl around data science and digital humanities at the end of July. We had a fascinating conversation. I ended up in a workshop on the ethics of big data which is going to become yet something else I wish I had the time to study properly.
  • At the beginning of August I went to a workshop at Waterloo that was in honour of Frank Wm. Tompa, Exploiting Text. This workshop had speakers, including myself, who spoke to issues that Tompa was interested in from dictionaries to algorithms for text retrieval. I was often lost in the algorithm talks but it was fascinating to listen to a different view of text.

Text Analysis with Topic Models

Fotis pointed me to this set of tutorials on Text Analysis with Topic Models for the Humanities and Social Sciences. The tutorials are built around Python, but most of it could be done with other tools. While I haven’t followed through the set of tutorials, they look like a great primer on text mining, visualization and interpretation. I particularly like how they include different datasets (British Novels, French plays …) to play with.

Topic Modeling and Gephi

Veronica Poplawski has posted a nice blog essay on Topic Modeling and Gephi: A Work in Progress : Digital Environmental Humanities. She walks through a project she did on 358 Environmental Humanities documents related to a workshop I was part of in the Fall (see my conference report here.) First she used Mallet to generate topics and then she created an XML file to bring the topics and associated words into Gephi for visualization. Nice work!
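
For readers who want to try something similar, here is a rough sketch of the second step using only the Python standard library. It is not Poplawski's code. I am assuming that the topics come from a Mallet topic-keys file (one topic per line: topic number, a weight, then the top words, separated by tabs) and that the XML is GEXF, a graph format Gephi can open; her post may well have used a different layout. The function names and the sample topics are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample in the assumed topic-keys layout: id <TAB> weight <TAB> words.
sample_keys = "0\t0.5\tforest river water\n1\t0.5\twater climate policy\n"


def parse_topic_keys(text, top_n=5):
    """Map each topic label to its first top_n words."""
    topics = {}
    for line in text.strip().splitlines():
        topic_id, _weight, words = line.split("\t", 2)
        topics[f"topic_{topic_id}"] = words.split()[:top_n]
    return topics


def topics_to_gexf(topics):
    """Build a GEXF graph linking every topic node to its word nodes."""
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")
    seen_words = set()
    edge_id = 0
    for topic, words in topics.items():
        ET.SubElement(nodes, "node", id=topic, label=topic)
        for word in words:
            word_id = f"word_{word}"
            # A word shared by several topics becomes a single node,
            # so shared vocabulary pulls those topics together in the layout.
            if word not in seen_words:
                ET.SubElement(nodes, "node", id=word_id, label=word)
                seen_words.add(word)
            ET.SubElement(edges, "edge", id=str(edge_id), source=topic, target=word_id)
            edge_id += 1
    return ET.tostring(gexf, encoding="unicode")


topics = parse_topic_keys(sample_keys, top_n=3)
gexf_xml = topics_to_gexf(topics)
print(gexf_xml)
```

Saving the printed XML to a file with a .gexf extension should give Gephi something to open, though I have only sketched the structure here and real topic output will need more care (edge weights, for instance).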

Weaponizing the Digital Humanities

Jan Christoph Meister has posted a blog entry about Weaponizing the Digital Humanities. His entry comes from an exchange we had, initially around the paper about using stylistics to psychologically profile people. (See my conference report on DH2014.) After the session we ended up talking with someone probably from the intelligence community. It is a bit startling to realize that we merit attention, if that is what it is. Certainly research on recognition of typing patterns might be of interest, but it is hard to imagine what else would be.

The other side of intelligence interest in our field is our interest in surveillance. What can we learn from the intelligence agencies and the techniques they develop? I’m certainly intrigued by what they might have been able to do. What responsibilities do we have to engage the ethical and interpretative issues raised by Snowden’s revelations? My blog entry Interpreting the CSEC Presentation: Watch Out Olympians in the House! is an attempt to interpret the Snowden documents – perhaps a paleography of the documents.

Meister rightly opens the ethical issue of whether our organization should have a code of ethics that touches on how our research is used. We have a code of conduct; should it extend to issues of surveillance? The humanist in me asks how other fields in the humanities have dealt with the sudden military application of their research. There was/is an issue around the involvement of anthropologists and sociologists in Pentagon-funded projects.

My Very Own Voyant Workshop

Stéfan Sinclair and I just finished a workshop on My Very Own Voyant. The workshop focused on how to run VoyantServer on your own machine. There are all sorts of reasons to run Voyant locally:

  • It runs faster
  • You can upload large texts faster
  • It can process larger text corpora
  • You can control the server
  • You can keep your corpora confidential

You can download VoyantServer and read instructions here.