My Very Own Voyant Workshop

Stéfan Sinclair and I just finished a workshop on My Very Own Voyant. The workshop focused on how to run VoyantServer on your own machine, so that you can use Voyant locally. There are all sorts of reasons to run locally:

  • It runs faster
  • You can upload large texts faster
  • It can process larger text corpora
  • You can control the server
  • You can keep your corpora confidential

You can download VoyantServer and read instructions here.

The Isolator, A Bizarre Helmet For Encouraging Concentration (1925)

From Geoff I learned about The Isolator, A Bizarre Helmet For Encouraging Concentration (1925). The Isolator was developed in 1925 by Hugo Gernsback, a science fiction pioneer (and editor of Science and Invention magazine). The idea is to force you to focus on your writing (with lots of oxygen).

One wonders if it works? Could it be even more useful now?

Front Row to Fashion Week –

The New York Times has an interesting way of visualizing fashion that you can see in their article Front Row to Fashion Week – Interactive Feature. They have abstracted the colour hues to create small swatches of different designers who showed at the New York Fashion Week. These “sparklines” or sparkboxes are an interesting way to compare the shows by designers.

Social Digital Scholarly Editing

On July 11th and 12th I was at a conference in Saskatoon on Social Digital Scholarly Editing. This conference was organized by Peter Robinson and colleagues at the University of Saskatchewan. I kept conference notes here.

I gave a paper on “Social Texts and Social Tools.” My paper argued for text analysis tools as a “reader” of editions. I took the extreme case of big data text mining and what scraping/mining tools want in a text and don’t want in a text. I took this extreme view to challenge the scholarly editing view that the more interpretation you put into an edition the better. Big data wants to automate the process of gathering and mining texts – big data wants “clean” texts that don’t have markup, annotations, metadata and other interventions that can’t be easily removed. The variety of markup in digital humanities projects makes it very hard to clean them.

The response was appreciative of the provocation, but (thankfully) not convinced that big data was the audience of scholarly editors.

Tool Discourse

Character Density by Year in Tool Discourse

We are finally getting results in a long slow process of trying to study tool discourse in the digital humanities. Amy Dyrbe and Ryan Chartier are building a corpus of discourse around tools that includes tool reviews, articles about what people are doing with tools, web pages about tools and so on. We took the first coherent chunk and Ryan has been analyzing it with R. The graph above shows which years have the most characters. My hypothesis was that tool reviews and discourse dropped off in the 1990s as the web became more important. This seems to be wrong.
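For readers who want to try a similar count on their own corpus, here is a minimal sketch of tallying characters per year. It is in Python rather than the R that Ryan used, and it is not his code: the function name, the (year, text) layout and the sample documents are all assumptions made up for illustration.

```python
from collections import defaultdict


def characters_per_year(documents):
    """Sum the number of characters across all documents for each year.

    `documents` is an iterable of (year, text) pairs. This layout is an
    assumption for the sketch, not the structure of the actual corpus.
    """
    totals = defaultdict(int)
    for year, text in documents:
        totals[year] += len(text)
    # Return a plain dict ordered by year so it is easy to plot or print.
    return dict(sorted(totals.items()))


# Hypothetical sample documents, invented for illustration only.
sample = [
    (1988, "A review of a concordance program."),
    (1988, "Notes on text retrieval."),
    (1995, "Web pages about tools."),
]

for year, count in characters_per_year(sample).items():
    print(year, count)
```

Plotting these totals by year would give a chart of the same general kind as the one described here, though the real analysis may have normalized or cleaned the texts differently.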

Here are the high-frequency words (with stop words removed). Note the modal verbs “can”, “will”, and “may.” They indicate the potentiality of tools.

“can” 2305
“one” 1996
“text” 1940
“word” 1931
“words” 1859
“program” 1606
“ii” 1514 (Not sure why)
“will” 1361
“language” 1307
“data” 1285
“two” 1188
“system” 1183
“computer” 1116
“used” 1115
“use” 942
“user” 939
“file” 890
“first” 870
“may” 853
“also” 837
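
A list like this can be produced with a few lines of code. The following is a minimal sketch in Python, not the R analysis Ryan ran: the tiny stop word list, the letters-only tokenizer and the function name are assumptions chosen for illustration, and a real analysis would use a much fuller stop list.

```python
import re
from collections import Counter

# A deliberately tiny stop word list, assumed for this sketch only.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "with"}


def top_words(text, n=20, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring stop words."""
    # Lowercase the text and keep runs of ASCII letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(token for token in tokens if token not in stop_words)
    return counts.most_common(n)


# Hypothetical sample text, invented for illustration only.
sample = "The tool can parse the text and the tool can count"
for word, count in top_words(sample, n=3):
    print(word, count)
```

Note that a tokenizer of this kind lowercases everything, so a Roman numeral such as "II" would be counted as the word "ii". That is one possible, unconfirmed, source of an entry like “ii” in the list above.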

Globalization Compendium Archive

I have been working for a while on archiving the Globalization Compendium which I worked on. Yesterday I got it archived in two Institutional Repositories:

In both cases there is a Zip of a BagIt bag with the XML files, code and other documentation from the site. My first major deposit.

Old Bailey Trials Are Tabulated for Scholars Online

The New York Times now has an article on the Criminal Intent project I was part of. See, Old Bailey Trials Are Tabulated for Scholars Online. They quote a historian who is sceptical of the results of mining, though he appreciates the resource.

“The Old Bailey Online project has done a great service in making those sources widely (and costlessly) available,” Mr. Langbein wrote in an e-mail. But he complained that the claims about data mining have “a breathless quality: ‘you can expect big things from us,’ but as yet it’s all method and no results.” He said that the new findings belittle the work of a generation of scholars who focused on the 18th century as the turning point in the evolution of the criminal justice system.

Alas, it seems he didn’t read our report, only the summary in the Chronicle. It is easy to use cute phrases like “breathless quality”, but is he right? Time will tell, but I think the historians on our team have backed up the results found with mining, and they never belittled the work of previous scholars; we saw ourselves building on it.

What can mining do? I think mining can give you a big picture so that you see the forest rather than the trees in a way that no one could before. Conclusions about the shape of the forest have to be checked against other evidence, but the results of mining are evidence that is not breathless even if it takes your breath away. As Bill Turkel put it,

Mr. Turkel, who developed some of the digital tools, said that data mining reveals unexpected trends and connections that no one would have thought to look for before. Previous scholars “tended to cherry-pick anecdotes without having a sense that it was possible to measure all of that text and treat the whole archive as a single unit,” he said.

Of course, if you then leverage traditional evidence to buttress your argument then the mining is forgotten or trivialized.

Turkel: A research workflow with off-the-shelf tools

I had heard about Bill Turkel’s ‘super secret’ project and how he had decided to keep the idea of the project secret but share the method, which is the opposite of what we usually do. As I am now on research leave (sabbatical) and working on 5 books (ha!) I thought I should learn from Bill. Here is the link to his excellent research workflow, How To « William J Turkel. What I like is that it is all stuff you can do with off-the-shelf tools, though not necessarily free ones.

Digitization Day

The CIRCA Histories and Archives group I am part of is organizing the University of Alberta’s first Digitization Day.

This one-day event is a chance for research projects that are digitizing evidence to meet up with each other and with units on campus that provide relevant research services. Projects that are creating digital archives of different sorts will give short presentations as will units on campus that support research.

The idea is to bring a lot of digitization projects together to learn about each other and what is happening on campus. My sense is that we have hit a critical mass on campus, and now that we have a trusted digital repository, ERA (Education and Research Archive), it is time to start talking and sharing knowledge. Each project should not have to reinvent itself.

Australian R18+ games rating gets govt support « GamePron

From Slashdot I came across a story in GamePron about how Australian R18+ games rating gets govt support. In Australia any game that isn’t classified MA 15+ or below is refused classification and thus can’t be sold. (The Australian system is law, unlike the voluntary industry ESRB system.) The Australian government is now considering adding a new R 18+ designation based on government-supported studies and consultations.

Of particular interest is the Literature review on the impact of playing violent video games on aggression (PDF). This excellent review concludes that “research into the effects of VVGs (Violent Video Games) on aggression is contested and inconclusive.” (p. 5) This 50-page review by the Australian Government Attorney-General’s Department is a model of clarity and balance. It is worth quoting in greater detail:

There is some consensus in the research that some members of the community, such as people with psychotic personality traits, may be more affected by VVGs than others. However, there is mixed evidence as to whether VVGs have a greater impact on children.
A number of other findings of this review arguably reduce the policy relevance of VVG research.

  • There is stronger evidence of short-term VVG effects than of long-term effects.
  • The possibility that third variables (like aggressive personality, family and peer influence, socio-economic status) are behind the effect has not been well explored.
  • Researchers who argue that VVGs cause aggression have not engaged with or disproved alternative theories propagated by their critics.
  • There is little evidence that violent video games have a greater impact than other violent media. (p. 5)