The folks behind the Google Ngram Viewer have developed a new tool called Bookworm. It has a number of corpora, lets you describe more complex queries, and lets you upload your own data.

Bookworm is hosted by the Cultural Observatory at Harvard, directed by Erez Lieberman Aiden and Jean-Baptiste Michel, who were behind the Ngram Viewer. They have recently published a book, Uncharted, in which they talk about different cultural trends they studied using the Ngram Viewer. The book is accessible, though a bit light.

Evgeny Morozov: How much for your data?

Evgeny Morozov has a nice essay in Le Monde Diplomatique (English edition, August 2014) titled Whilst you whistle in the shower: How much for your data? (article on LMD here). He raises questions about the monetization of all of our data and how we are willing to give up more and more of it. He describes the limited options being debated on the issue of data and privacy,

the future offered to us by Lanier and Pentland fits into the German “ordoliberal” tradition, which sees the preservation of
market competition as a moral project, and treats all monopolies as dangerous. The Google approach fits better with the American school of neoliberalism that developed at the University of Chicago. Its adherents are mostly focused on efficiency and consumer welfare, not morality; and monopolies are never assumed to be evil just because they are monopolies, some might be socially beneficial.

The essay covers some of the same ground that Mike Bulajewski covered in The Cult of Sharing about how the gift economy rhetoric is being hijacked by monetization interests.

Since established taxi and hotel industries are detested, the public
debate has been framed as a brave innovator taking on sluggish,
monopolistic incumbents. Such skewed presentation, while not inaccurate
in all cases, glosses over the fact that the start-ups of the “sharing
economy” operate on the pre-welfare model: social protections for
workers are minimal, they have to take on risks previously assumed by
their employers, and there are almost no possibilities for collective bargaining.

NovelTM: Text Mining the Novel

This week SSHRC announced the new partnership grants awarded, including one I am a co-investigator on: NovelTM: Text Mining the Novel.

This project brings together researchers and partners from 21 different academic and non-academic institutions to produce the first large-scale quantitative history of the novel. Our aim is to bring new computational approaches from the field of text mining to the study of literature, as well as to bring the unique knowledge of literary studies to bear on larger debates about data mining and the place of information technology within society.

NovelTM is led by Andrew Piper at McGill University. At the University of Alberta I will be gathering a team that will share the resulting computing methods through TAPoR and develop recipes or tutorials so that others can try them.

NYTimes: Inequality and Web Search Trends

The Upshot in the New York Times has a nice article titled In One America, Guns and Diet. In the Other, Cameras and ‘Zoolander.’: Inequality and Web Search Trends by David Leonhardt (August 18, 2014). They combined data from Google on favorite searches by county with socio-economic data to show which searches correlate with richer and poorer areas. While few of the correlations are surprising, they provide details that one wouldn’t think of. Not only are religious searches more common in poorer areas, but so are searches for “about hell” and “antichrist.” In wealthy areas, by contrast, people search for “holiday greetings,” presumably because they are more likely to live far from family.

Anyway, a neat study that illustrates how the aggregation of different datasets can work.
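The core of this kind of analysis is just correlating one variable across counties with another. Here is a minimal sketch with entirely made-up numbers (these are not the Times' data): given a hypothetical median income and a hypothetical search frequency per county, compute the Pearson correlation.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: one entry per county.
median_income = [38000, 42000, 55000, 71000, 90000, 112000]
searches = {
    "antichrist": [0.9, 0.8, 0.5, 0.3, 0.2, 0.1],
    "holiday greetings": [0.1, 0.2, 0.4, 0.6, 0.8, 0.9],
}

for phrase, freq in searches.items():
    print(f"{phrase}: r = {pearson(median_income, freq):+.2f}")
```

With toy data like this, "antichrist" comes out negatively correlated with income and "holiday greetings" positively, mirroring the pattern the article reports.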

DH 2014, Dagstuhl, and Exploiting Text

Over the last month I’ve been to a number of conferences that I have been writing conference notes on.

  • At the beginning of July I was at DH 2014 in Lausanne, Switzerland, where I gave a workshop with Stéfan Sinclair on Your Very Own Voyant, participated in some panels, and gave a paper (also with Stéfan).
  • I was at a Dagstuhl seminar on data science and digital humanities at the end of July. We had a fascinating conversation. I ended up in a workshop on the ethics of big data, which is going to become yet another thing I wish I had the time to study properly.
  • At the beginning of August I went to a workshop at Waterloo that was in honour of Frank Wm. Tompa, Exploiting Text. This workshop had speakers, including myself, who spoke to issues that Tompa was interested in from dictionaries to algorithms for text retrieval. I was often lost in the algorithm talks but it was fascinating to listen to a different view of text.

Text Analysis with Topic Models


Fotis pointed me to this set of tutorials on Text Analysis with Topic Models for the Humanities and Social Sciences. The tutorials are built around Python, but most of them could be done with other tools. While I haven’t worked through the set of tutorials, they look like a great primer on text mining, visualization and interpretation. I particularly like how they include different datasets (British Novels, French plays …) to play with.
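To give a sense of what topic modeling involves, here is a tiny, self-contained collapsed Gibbs sampler for LDA in plain Python. The documents, topic count, and hyperparameters are all toy values of my own invention, not taken from the tutorials; real work would use a library like Mallet or gensim.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns topic-word counts."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    v = len(vocab)
    # Random initial topic assignment for every word token.
    z = [[rng.randrange(k) for _ in d] for d in docs]
    ndk = [[0] * k for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(k)]  # topic-word counts
    nk = [0] * k                                # tokens per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]  # remove token, resample, put it back
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta)
                           / (nk[j] + v * beta) for j in range(k)]
                t = rng.choices(range(k), weights)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return nkw

docs = [
    "love heart marriage love".split(),
    "heart love wedding".split(),
    "ship sea captain sea".split(),
    "sea voyage ship".split(),
]
topics = lda_gibbs(docs)
for j, counts in enumerate(topics):
    top = sorted(counts, key=counts.get, reverse=True)[:3]
    print(f"topic {j}:", top)
```

Even on four miniature "documents" the sampler tends to separate the romance vocabulary from the seafaring vocabulary, which is the basic intuition behind topic models.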

Topic Modeling and Gephi

Veronica Poplawski has posted a nice blog essay on Topic Modeling and Gephi: A Work in Progress : Digital Environmental Humanities. She walks through a project she did on 358 Environmental Humanities documents related to a workshop I was part of in the fall (see my conference report here). First she used Mallet to generate topics, and then she created an XML file to bring the topics and associated words into Gephi for visualization. Nice work!
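The Mallet-to-Gephi step she describes can be sketched as building a bipartite topic-word graph and writing it out as GEXF, an XML format Gephi opens directly. The topic data below is hypothetical, and this is my own minimal sketch, not her actual file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical Mallet-style output: topic -> top words with weights.
topics = {
    "topic0": {"climate": 0.12, "energy": 0.08, "policy": 0.05},
    "topic1": {"river": 0.10, "water": 0.09, "policy": 0.04},
}

gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
graph = ET.SubElement(gexf, "graph", defaultedgetype="undirected")
nodes = ET.SubElement(graph, "nodes")
edges = ET.SubElement(graph, "edges")

ids = {}
def node(label):
    """Create a node element once per label and return its id."""
    if label not in ids:
        ids[label] = str(len(ids))
        ET.SubElement(nodes, "node", id=ids[label], label=label)
    return ids[label]

eid = 0
for topic, words in topics.items():
    for word, weight in words.items():
        ET.SubElement(edges, "edge", id=str(eid), source=node(topic),
                      target=node(word), weight=str(weight))
        eid += 1

ET.ElementTree(gexf).write("topics.gexf", encoding="utf-8",
                           xml_declaration=True)
```

Opening the resulting topics.gexf in Gephi gives a graph where topics that share words (here, "policy") are linked through common nodes, which is what makes the layout algorithms useful for seeing topic overlap.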

Scopeware Vision Professional

I was reading about the Yale Lifestreams project, which may have been one of the first life-tracking projects. Lifestreams was developed by Eric Freeman (it was his 1997 PhD project) and David Gelernter. They had some interesting ideas about how the computer should organize your data into streams rather than making you file things away. The streams could take advantage of the flow of your life. Here is how a lifestream is defined:

A lifestream is a time-ordered stream of documents that functions as a diary of your electronic life; every document you create and every document other people send you is stored in your lifestream.
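That definition is really a data structure: a single sequence of documents kept in timestamp order, with "now" at one end. A minimal sketch (class and field names are my own, not from the Lifestreams system):

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(order=True)
class Doc:
    timestamp: datetime
    title: str = field(compare=False)  # ordering is by timestamp only

class Lifestream:
    """Every document goes into one time-ordered stream; no filing."""
    def __init__(self):
        self._docs = []

    def add(self, doc):
        bisect.insort(self._docs, doc)  # insert in timestamp order

    def latest(self, n=5):
        """The 'now' end of the stream: most recent documents first."""
        return list(reversed(self._docs[-n:]))

stream = Lifestream()
stream.add(Doc(datetime(2014, 8, 1), "Morozov essay notes"))
stream.add(Doc(datetime(2014, 6, 15), "DH 2014 abstract"))
stream.add(Doc(datetime(2014, 8, 20), "NYTimes search trends"))
print([d.title for d in stream.latest(2)])
```

The point of the design is that insertion order doesn't matter: whenever a document arrives, it lands in its place in time, and browsing is always a matter of moving along the stream.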

Freeman and Gelernter tried to commercialize the ideas through Scopeware released by Mirror Worlds. If you search Google Images for Scopeware you can see a number of screenshots that give an idea of how the interface organized files into streams.

Many of their interface ideas seem to have reappeared in things like Apple’s Cover Flow and Time Machine, which explains why Mirror Worlds sued Apple (unsuccessfully).

The idea is supposed to have come from Gelernter’s semi-philosophical book Mirror Worlds: Or the Day Software Puts the Universe in a Shoebox…How It Will Happen and What It Will Mean (1991) in which he reflects on the change from small personal software to large networked software that “mirrors” the world. Google Street View and all the virtual surrogates available on the web would seem to prove him right, though he may have been imagining more of a VR type implementation. (Admission: I haven’t read the book, just reviews.)

What intrigues me is the focus on time and the move away from representations of time as a line that traverses from left to right. In streams you are in time and can swim back, like driving down a road into the past.

Around the World Conference


Today we are running the Around the World Conference from the University of Alberta. This year’s topic is privacy and surveillance in the digital age. The Kule Institute for Advanced Study is hosting this online conference. Here are some of my opening comments,

I would like to welcome you to our second Around the World Conference. This year’s conference is on Privacy and Surveillance in the Digital Age.

The ATW conference was the idea of the Founding Director of KIAS, Jerry Varsava. The idea is to support a truly international discussion around a topic that concerns us all around the world.

This year we have speakers from 11 countries including Nigeria, the Netherlands, Japan, Australia, Italy, Israel, Ireland, Germany, Brazil, the US, and of course Canada.

This ATW conference is an experiment. It is an experiment because it is difficult to coordinate the technology across so many countries and institutions. It is an experiment in finding ways to move ideas without moving bodies. It is an experiment in global discussion.

International Ethics Roundtable 2014

Last week I was at a great little conference, the International Ethics Roundtable 2014. My conference notes are at Information Ethics And Global Citizenship. I gave a paper titled “Watching Olympia” about the CSEC slides that showed the Olympia system developed by the Communications Security Establishment Canada. You can see the blog entry that my paper came from here.