Every story has a beginning

Every story has a beginning is the text of a keynote by Tim Sherratt that nicely weaves individual stories together as an example of what we can do with information technology. I highly recommend it. He quotes Steve Ramsay and Tim Hitchcock to the effect that what matters are the stories of individuals, like those he paints through the digital archives he has access to. He sets this humanistic view of how we can use the technology against the Culturomics approach, which tries to turn history and its archives into grist for cultural science. Sherratt calls the culturomic vision “barren” and I tend to agree. He ends by asking,

But who defines the problems?

His answer is Linked Data which “gives us a way to present an alternative to Google’s version of the world. We can argue back against the search engines, defining our own criteria for relevance, and building our own discovery networks.” (And his talk has a link for those who want to view the triples…) I would say that we can also build tools like Voyant (formerly Voyeur, which he uses) to help us begin to tell the stories.

Canadian Writing Research Collaboratory Launch

I am at the Canadian Writing Research Collaboratory (CWRC) launch. CWRC is building a collaborative editing environment that will allow editorial projects to manage the editing of electronic scholarly editions. Among other things, CWRC is developing an online XML editor, editorial workflow management tools, and an integrated repository.

The keynote speakers for the event include Shawna Lemay and Aritha Van Herk.

Happy Words Trump Negativity in the English Language

Happy Words Trump Negativity in the English Language is an interesting story about a study by Kloumann and colleagues on Positivity of the English Language. They used Mechanical Turk to get people to assess whether the high frequency words used in Twitter, Books, the New York Times and Music Lyrics were positive. Their study showed that overwhelmingly English is a positive language. Thanks to Stan for this.

Old Bailey Trials Are Tabulated for Scholars Online

The New York Times now has an article on the Criminal Intent project I was part of. See Old Bailey Trials Are Tabulated for Scholars Online. They quote a historian who is sceptical of the results of mining, though he appreciates the resource.

“The Old Bailey Online project has done a great service in making those sources widely (and costlessly) available,” Mr. Langbein wrote in an e-mail. But he complained that the claims about data mining have “a breathless quality: ‘you can expect big things from us,’ but as yet it’s all method and no results.” He said that the new findings belittle the work of a generation of scholars who focused on the 18th century as the turning point in the evolution of the criminal justice system.

Alas, it seems he didn’t read our report, only the summary in the Chronicle. It is easy to use cute phrases like “breathless quality”, but is he right? Time will tell, but I think the historians on our team have backed up the results found with mining, and they never belittled the work of previous scholars – we saw ourselves building on it.

What can mining do? I think mining can give you a big picture so that you see the forest rather than the trees in a way that no one could before. Conclusions about the shape of the forest have to be checked against other evidence, but the results of mining are evidence that is not breathless even if it takes your breath away. As Bill Turkel put it,

Mr. Turkel, who developed some of the digital tools, said that data mining reveals unexpected trends and connections that no one would have thought to look for before. Previous scholars “tended to cherry-pick anecdotes without having a sense that it was possible to measure all of that text and treat the whole archive as a single unit,” he said.

Of course, if you then leverage traditional evidence to buttress your argument then the mining is forgotten or trivialized.

The Garden of Error and Decay

The Garden of Error and Decay is a real-time visualization of disasters mentioned in Twitter and other feeds. The text about the interactive says “this innovative moving image format is something like a real-time data driven narrative. This project is not a film, not a game, and not a nonlinear interactive story.” The visualization uses pictograms that represent the type of disaster. You can also see the original Twitter text.

Thanks to Scott for this.

Father Busa is dead

From Humanist I just found out that Father Roberto Busa has died. See Stop the reader, Fr. Busa has died in L’Osservatore Romano (English) or Morto padre Busa, è stato il pioniere dell’informatica linguistica from the Corriere del Veneto (Italian). Father Busa was a pioneer in humanities computing who started a project in the 1940s with help from IBM to create a complete concordance of Aquinas. The Index Thomisticus was arguably the first (big) humanities project to benefit from computing methods. For that reason the author of Stop the reader argues that,

If you surf the Internet, you owe it to him and if you use a PC to write emails and documents, you owe it to him. And if you can read this article, you owe it to him, we owe it to him

While it may be an exaggeration to say that we owe hypertext and the web to Father Busa, he was certainly one of the first to use computers to manipulate texts on a large scale. He saw the

Father Busa was also involved in developing the humanities computing field, which is why we have named a prize after him (see ADHO Roberto Busa Award). He wrote articles for journals like CHUM and Literary and Linguistic Computing. He was generous with his time and ideas. He was influential in Italy; others will know more about this. I met him in 1998 at the ACH/ALLC conference in Debrecen, Hungary, where he was awarded the first Busa Award. As I speak Italian, I was asked to join an executive dinner and had a pleasant evening talking with him about his ideas on hermeneutical text analysis, which he presented in his Award talk and which were later published in “Picture a Man …” in Literary and Linguistic Computing (14:1, 1999). At the end of his talk he played with the Cinderella metaphor for interpretative text analysis,

Metaphor is a linguistic phenomenon: when the name of one reality is chosen to signify another and different reality, because of some similarity between the two. I in fact applied the name of Cinderella to hermeneutical informatics, the two having in common youth, health, beauty, and poverty. Cinderella eventually got married to a prince. (p. 8)

Busa was a prince or perhaps a Cinderella who has now left the party.

Digging Into Data, Day 2: Making Tools and Using Them

I just discovered (thanks to the Digging Into Data site) that the Chronicle of Higher Education Wired Campus Blog has a nice story on the Digging Into Data Challenge Conference (2011) that talks about the Criminal Intent project I am on. See Digging Into Data, Day 2: Making Tools and Using Them. The article nicely summarizes the talk by Steve Ramsay, who was our respondent,

Mr. Ramsay’s talk celebrated how this kind of Big Data work can enhance rather than diminish the humanities’ traditional engagement with human experience. “The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy,” he said in his response. “But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java.”

The article is by Jennifer Howard and was published June 12, 2011. It contrasts nicely with the Nature article on the event, which focused on the culturomics keynote by Erez Lieberman-Aiden & JB Michel from Harvard rather than the serious work of digging into data. You can see my earlier post on this conference (with a link to my conference report) here.

From Metadata to Linked Data Summer School | Digital Humanities Observatory

This week (July 4th, 2011) I’m instructing at the From Metadata to Linked Data Summer School at Trinity College, Dublin. I’m teaching a half-day hands-on workshop on Voyeur. You can see my workshop script here. I am trying a new version of our workshop script which will include worksheets.

I’m writing my notes at http://www.philosophi.ca/pmwiki.php/Main/FromMetadataToLinkedData – these are not a conference report so much as reflections on stuff I’m learning.