Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences

Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences has just been published online. It was written with Bettina Berendt and Marco Büchler and came out of a Dagstuhl retreat where a group of us started talking about ethics and big data. Here is the abstract:

“How to be a knowledge scientist after the Snowden revelations?” is a question we all have to ask as it becomes clear that our work and our students could be involved in the building of an unprecedented surveillance society. In this essay, we argue that this affects all the knowledge sciences such as AI, computational linguistics and the digital humanities. Asking the question calls for dialogue within and across the disciplines. In this article, we will position ourselves with respect to typical stances towards the relationship between (computer) technology and its uses in a surveillance society, and we will look at what we can learn from other fields. We will propose ways of addressing the question in teaching and in research, and conclude with a call to action.

A PDF of our author version is here.

Wilkens: Literary Attention Lag

Matthew Wilkens has posted a nice blog essay about his short MLA paper on geography and memory, Literary Attention Lag. He looked at how some cities get far more literary attention than their population merits despite a general correlation between population and attention. For example, in 1860 Chicago and New Orleans had about the same population, but New Orleans gets a lot more attention.

What is particularly useful is that he provides an IPython notebook with a documented version of his code here. He also provides a link to his data so you can rerun and adapt his study.

Stéfan Sinclair and I are experimenting with Mathematica notebooks and IPython notebooks as a way to share research thinking with code woven in.

Is GamerGate About Media Ethics or Harassing Women? Harassment, the Data Shows

Among all the GamerGate stories, an interesting move by Newsweek was to commission a study of GamerGate tweets. Taylor Wofford reported the results in an article from October 25th, 2014, titled Is GamerGate About Media Ethics or Harassing Women? Harassment, the Data Shows. The study was run by Brandwatch, which looked at who was the target of tweets with #GamerGate. Lo and behold, the GamerGate community seemed more concerned with female game designers than with journalists, which calls into question the claim that GamerGate is really about ethics in games journalism.

We are now gathering tweets too and we will see if we can reproduce the results. At first glance the number of GamerGate tweets seems really low – they seem to be sampling. It will also be interesting to see if there has been a shift in emphasis in the discussion.
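
For anyone who wants to try a similar tally on their own collection, here is a minimal sketch of counting which accounts are mentioned in tweets that carry the hashtag. It is not the Brandwatch method, which the article does not describe in detail. The function name `count_targets`, the sample tweets and the handles are all made up for illustration, and it assumes you already have the tweet texts as a list of strings.

```python
import re
from collections import Counter

# Matches Twitter-style @handles (letters, digits, underscore).
MENTION = re.compile(r"@(\w+)")


def count_targets(tweets, hashtag="#gamergate"):
    """Count how often each account is mentioned in tweets carrying the hashtag."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        if hashtag not in lowered:
            continue  # skip tweets without the hashtag
        counts.update(MENTION.findall(lowered))
    return counts


# Hypothetical tweets with invented handles, for illustration only.
sample = [
    "@designer_a this is about ethics #GamerGate",
    "#gamergate @designer_a @journalist_b respond!",
    "unrelated tweet mentioning @designer_a",
]
targets = count_targets(sample)
print(targets.most_common())
```

A real comparison would also need to deal with retweets, sampling and the time window of the collection, none of which this sketch handles.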

bookworm

The folks behind the Google Ngram Viewer have developed a new tool called Bookworm. It has a number of corpora (the example chart is drawn from bills from beta.congress.gov). It lets you describe more complex queries and you can upload your own data.

Bookworm is hosted by the Cultural Observatory at Harvard, directed by Erez Lieberman Aiden and Jean-Baptiste Michel, who were behind the Ngram Viewer. They have recently published a book, Uncharted, where they talk about different cultural trends they studied using the Ngram Viewer. The book is accessible though a bit light.

Checktext.org

I was sent a note about Checktext.org, a web site where you paste (or upload) some text and it gives you basic analytical information like Flesch-Kincaid Grade Level. One neat feature is that it will do a plagiarism check against a database. It isn’t clear how they build their database or if they are just using Google, but it caught a web page I passed it.
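
For those curious about what a Flesch-Kincaid Grade Level actually measures, here is a rough sketch of the standard formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. This is not how Checktext.org computes it (the site does not say). The syllable counter in particular is a crude vowel-group heuristic I am assuming for illustration, so scores will differ from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level from word, sentence and syllable counts."""
    # Naive sentence split on terminal punctuation; abbreviations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = flesch_kincaid_grade("The cat sat on the mat. It was a sunny day.")
dense = flesch_kincaid_grade(
    "Computational approaches to literary interpretation necessitate "
    "considerable methodological sophistication."
)
print(round(simple, 2), round(dense, 2))
```

Longer sentences and longer words both push the grade up, which is why the second sample scores much higher than the first.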

NovelTM: Text Mining the Novel

This week SSHRC announced the new partnership grants awarded including one I am a co-investigator on, NovelTM: Text Mining the Novel.

This project brings together researchers and partners from 21 different academic and non-academic institutions to produce the first large-scale quantitative history of the novel. Our aim is to bring new computational approaches in the field of text mining to the study of literature as well as bring the unique knowledge of literary studies to bear on larger debates about data mining and the place of information technology within society.

NovelTM is led by Andrew Piper at McGill University. At the University of Alberta I will be gathering a team that will share the resulting computing methods through TAPoR and develop recipes or tutorials so that others can try them.

Text Analysis with Topic Models

Fotis pointed me to this set of tutorials on Text Analysis with Topic Models for the Humanities and Social Sciences. The tutorials are built around Python, but most of the steps could be done with other tools. While I haven’t followed through the set of tutorials, they look like a great primer on text mining, visualization and interpretation. I particularly like how they include different datasets (British Novels, French plays …) to play with.

Topic Modeling and Gephi

Veronica Poplawski has posted a nice blog essay on the Digital Environmental Humanities site, Topic Modeling and Gephi: A Work in Progress. She walks through a project she did on 358 Environmental Humanities documents related to a workshop I was part of in the Fall (see my conference report here). First she used Mallet to generate topics and then she created an XML file to bring the topics and associated words into Gephi for visualization. Nice work!
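
To give a sense of what the XML step might look like, here is a minimal sketch that writes topics and their words as a GEXF graph, an XML format that Gephi can open. This is not her code, and I am assuming the topics have already been pulled out of the Mallet output into a Python dictionary. The function name `build_gexf`, the topic labels, the words and the weights are all invented for illustration.

```python
import xml.etree.ElementTree as ET


def build_gexf(topics):
    """Build a topic-word graph as a GEXF element tree.

    `topics` maps a topic label to a list of (word, weight) pairs.
    Words shared between topics become a single node linked to each topic.
    """
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", mode="static", defaultedgetype="undirected")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    seen = set()

    def add_node(node_id, label):
        # Add each node only once, even if several topics share a word.
        if node_id not in seen:
            seen.add(node_id)
            ET.SubElement(nodes, "node", id=node_id, label=label)

    edge_id = 0
    for topic, words in topics.items():
        topic_id = f"topic:{topic}"
        add_node(topic_id, topic)
        for word, weight in words:
            word_id = f"word:{word}"
            add_node(word_id, word)
            ET.SubElement(edges, "edge", id=str(edge_id), source=topic_id,
                          target=word_id, weight=str(weight))
            edge_id += 1
    return gexf


# Hypothetical topics and weights, for illustration only.
example_topics = {
    "topic_0": [("water", 0.12), ("river", 0.08)],
    "topic_1": [("water", 0.05), ("climate", 0.10)],
}
root = build_gexf(example_topics)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text[:80])

# To save a file for Gephi, something like this should work:
# ET.ElementTree(root).write("topics.gexf", encoding="utf-8", xml_declaration=True)
```

Because shared words become a single node, topics that overlap in vocabulary end up connected in the Gephi layout.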

My Very Own Voyant Workshop

Stéfan Sinclair and I just finished a workshop on My Very Own Voyant. The workshop focused on how to run VoyantServer, a version of Voyant that runs on your local machine. There are all sorts of reasons to run locally:

  • It runs faster
  • You can upload large texts faster
  • It can process larger text corpora
  • You can control the server
  • You can keep your corpora confidential

You can download VoyantServer and read instructions here.