Prism: Collaborative Interpretation

Prism is the coolest idea I have come across in a long time. Coming from the University of Virginia Scholars’ Lab, Prism is a collaborative interpretation environment. Someone comes up with categories like “Rhetoric”, “Orientalism” and “Social Darwinism” for a text like Notes on the State of Virginia. Then people with accounts (which are free) go through and mark passages. This creates overlapping interpretative markup of the sort you used to get with COCOA in TACT, but unlike TACT, many people can do the interpretation – it can be crowdsourced.

They are planning some visualizations of the results, including what look like the kind of visualizations that TACT gave, where you can see words distributed over tagged areas.

Bethany Nowviskie explains the background to the project in this Scholars’ Lab post.

Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails

According to William Binney, a whistleblower from the National Security Agency (of the USA), the NSA probably has most of our email. See the video Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails. The question then is what they are doing with it. He mentions that the email can be “put it into forms of graphing, which is building relationships or social networks for everybody, and then you watch it over time, you can build up knowledge about everyone in the country.” (See the transcript on the page.) In other words, they could be (or are) building a large social graph that they can use in various ways.

In the transcript of the longer video Binney talks about various programs developed to filter out all the information:

Well, it was called Thin Thread. I mean, Thin Thread was our—a test program that we set up to do that. By the way, I viewed it as we never had enough data, OK? We never got enough. It was never enough for us to work at, because I looked at velocity, variety and volume as all positive things. Volume meant you got more about your target. Velocity meant you got it faster. Variety meant you got more aspects. These were all positive things. All we had to do was to devise a way to use and utilize all of those inputs and be able to make sense of them, which is what we did.

Binney goes on to talk about the code-named Stellar Wind program that Bush authorized and was then forced to change after a revolt of some sort in the Justice Department in 2004. Stories tell of senior Bush advisors trying to get Ashcroft to sign authorization papers for the program while he was in the hospital. As for Stellar Wind, it seems to be mostly about metadata – the date, to, and from fields of emails – which you could use to build a diachronic social graph, which is what Binney was talking about. Strictly speaking this would be social network analysis rather than text analysis, but they might have supplemented the system with some keyword capabilities. Another story from Time points out the problem with such analysis: it generates too many vague false positives. “Leads from the Stellar Wind program were so vague and voluminous that field agents called them ‘Pizza Hut cases’ — ostensibly suspicious calls that turned out to be takeout food orders.”
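To make the idea of a diachronic social graph concrete, here is a minimal sketch of how date, to, and from metadata alone could be turned into a network that changes over time. Nothing here reflects how Stellar Wind actually worked; the function name, the monthly time slices, the undirected edges, and the sample addresses are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def build_diachronic_graph(messages):
    """Count messages per pair of correspondents, one graph per month.

    `messages` is an iterable of (sent_on, sender, recipients) tuples.
    This is a hypothetical layout chosen for the sketch.
    """
    graph = defaultdict(Counter)
    for sent_on, sender, recipients in messages:
        period = (sent_on.year, sent_on.month)  # assumed slice: one month
        for recipient in recipients:
            # Treat the tie as undirected so (a, b) and (b, a) are one edge.
            edge = tuple(sorted((sender, recipient)))
            graph[period][edge] += 1
    return graph


# Invented sample metadata; no message content is needed at all.
messages = [
    (date(2012, 1, 5), "alice@example.org", ["bob@example.org"]),
    (date(2012, 1, 9), "bob@example.org",
     ["alice@example.org", "carol@example.org"]),
    (date(2012, 2, 2), "carol@example.org", ["alice@example.org"]),
]
graph = build_diachronic_graph(messages)

for period in sorted(graph):
    print(period, dict(graph[period]))
```

Comparing the edge counts from one month to the next is the "watch it over time" part: new edges, strengthening ties, and ties that go quiet all show up without reading a single message.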

Either way, these hints give us a tantalizing view into how text and network analysis are being experimented with. Are there any useful research applications?

Collaborative Research in the Digital Humanities by Marilyn Deegan and Willard McCarty

A new digital humanities collection focusing on collaboration, Collaborative Research in the Digital Humanities, has been published by Ashgate. The collection is edited by Marilyn Deegan and Willard McCarty and was developed in honour of Harold Short, who retired a few years ago from King’s College London, where he set up the Humanities Computing Centre (now called the Department of Digital Humanities).

I contributed a chapter on crowdsourcing entitled “Crowdsourcing the humanities: social research and collaboration”.

Luis von Ahn on reCaptcha and Duolingo

Patrizia pointed me to a TEDxCMU talk by Luis von Ahn on The Next Chapter in Human Computation. von Ahn is known for Captcha and reCaptcha (which he talks about in the first 8 minutes of the talk). In this talk he introduces his team’s new crowdsourcing project, Duolingo, which aims to translate the web while teaching people a second language. Instead of paying $500 for Rosetta Stone software, you can learn a language by translating progressively more complex sentences from the web.

von Ahn also calls this a “Fair Business Model for Education”. (There is actually a slide with this phrase.) His argument is that since most of the world doesn’t have the money for software, Duolingo presents a fair way for them to contribute labour in return for learning a language. I note that the fair business model could apply not just to language education, but to other types of education. How could you monetize the teaching of philosophy (or ethics)? What would people do to learn that could also benefit someone else?

@MentionMachine: Who’s up, who’s down on Twitter?

Reading the Washington Post I was annoyed by a panel at the bottom of my screen for their feature “@MentionMachine tracks the presidential candidates: Who’s up, who’s down on Twitter?”. The @MentionMachine tracks Twitter mentions using the Twitter API and also media mentions using Trove. This is real-time social media text analysis. The Washington Post blog page on @MentionMachine argues that “Twitter was the real-time warning system” that could tell us which candidates were trending up or down. I wonder if that is reliably true or only true in selective cases.
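The basic "who's up, who's down" calculation is simple enough to sketch. The following is a rough illustration only: it works on tweets already held in memory rather than calling the Twitter API or Trove, and the handles, the sample tweets, and the function names are invented. I have no idea how the Post actually computes its numbers.

```python
from collections import Counter


def count_mentions(tweets, handles):
    """Count how many tweets mention each handle (case-insensitive)."""
    wanted = {handle.lower(): handle for handle in handles}
    counts = Counter({handle: 0 for handle in handles})
    for text in tweets:
        # Split on whitespace and strip trailing punctuation so that
        # "@candidate_a," still matches "@candidate_a".
        tokens = {word.strip(".,!?:;\"'").lower() for word in text.split()}
        for token in tokens & wanted.keys():
            counts[wanted[token]] += 1
    return counts


def trend(previous, current):
    """Change in mentions per handle between two periods."""
    return {handle: current[handle] - previous[handle] for handle in current}


handles = ["@candidate_a", "@candidate_b"]  # hypothetical accounts
yesterday = count_mentions(
    ["Big rally for @candidate_a today", "@candidate_b on the economy"],
    handles,
)
today = count_mentions(
    ["@candidate_a, again?", "Watching @Candidate_A debate", "lunch time"],
    handles,
)
change = trend(yesterday, today)
print(change)
```

A raw count like this says nothing about whether the mentions are favourable, which is part of why I wonder how reliable the "warning system" claim is.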

Happy Words Trump Negativity in the English Language

Happy Words Trump Negativity in the English Language is an interesting story about a study by Kloumann and colleagues on Positivity of the English Language. They used Mechanical Turk to get people to assess whether the high frequency words used in Twitter, Books, the New York Times and Music Lyrics were positive. Their study showed that overwhelmingly English is a positive language. Thanks to Stan for this.
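As a toy illustration of how crowd ratings could be turned into a claim about positivity, here is a sketch that averages several ratings per word and then asks what share of words land above a neutral midpoint. The 1 to 9 scale, the midpoint of 5, the sample words, and the scores are all assumptions made up for the example, not data or methods taken from the study.

```python
from statistics import mean

NEUTRAL = 5.0  # assumed midpoint of a hypothetical 1-9 rating scale


def average_scores(ratings):
    """Average the crowd ratings collected for each word."""
    return {word: mean(scores) for word, scores in ratings.items()}


def positive_share(averages, neutral=NEUTRAL):
    """Fraction of words whose average rating is above the midpoint."""
    positive = sum(1 for score in averages.values() if score > neutral)
    return positive / len(averages)


# Invented ratings, three hypothetical raters per word.
ratings = {
    "love": [9, 8, 9],
    "laugh": [8, 9, 7],
    "the": [5, 5, 5],
    "war": [2, 1, 3],
}
averages = average_scores(ratings)
share = positive_share(averages)
print(averages, share)
```

A real analysis would presumably also weight words by how often they occur in each corpus, which this sketch leaves out.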

Gamers’ discovery could generate anti-HIV drugs – Health – CBC News

CBC has a story about how Gamers’ discovery could generate anti-HIV drugs (Sept. 19, 2011). The story is about how players of Fold.It have solved a protein folding problem related to AIDS, a result which has recently been published. (The paper is here.)

What is neat about this project is that it is an example of “citizen science” or crowdsourcing for research. Rather than use the computer to analyze the data, the computer/network was used to make it easier for humans to solve the problems. They turned protein folding into a game that enticed volunteers to play for science.

Day of Archaeology

Megan pointed me to the Day of Archaeology project. This project was conceived during one of our Day of Digital Humanities projects and builds on the idea. It serves partly as community outreach for archaeology:

The Day of Archaeology 2011 aims to give a window into the daily lives of archaeologists. Written by over 400 contributors, it chronicles what they did on one day, July 29th 2011, from those in the field through to specialists working in laboratories and behind computers. This date coincides with the Festival of British Archaeology, which runs from 16th – 31st July 2011.

I also note that they had far more participants in their first year than we had even in our third! We need to learn from them.