U of A text mining project could help businesses

Well, I made it into the computer press in Canada. An article on the Digging Into Data project I am working on has been published: U of A text mining project could help businesses (Rafael Ruffolo, March 25, 2010, for ComputerWorld Canada).

It is always interesting to see what the media pick up on in a story. They usually have a better idea of what their audience wants to read about, so they adapt the story for that audience.

Oxford English Dictionary: The first crowdsourced humanities project?

As we think about how to use crowdsourcing in humanities research it is useful to look back at the pre-digital projects that used networks of volunteers to assist in research tasks. The development of the Oxford English Dictionary is an early example that comes to mind as it benefited from volunteer support in the time-consuming work of reading works to find early uses of words.

The OED makes a useful example to think about for a number of reasons:

  • First of all, looking at pre-digital projects lets us see the importance of how people are managed, motivated, and trained. According to the Wikipedia article, for example, “Furnivall then became editor; he was enthusiastic and knowledgeable, yet temperamentally ill-suited for the work. Many volunteer readers eventually lost interest in the project as Furnivall failed to keep them motivated. Furthermore, many of the slips had been misplaced.” It is easy to think that the technology is what makes crowdsourcing, but I suspect that it often distracts us from the ways we chunk the problem for volunteers, bring them in, motivate them, manage them, and recognize them.
  • It is an example in the humanities with an outcome that we still recognize as useful and relevant. It was initiated by a scholarly society, the Philological Society, and it was also an important project in the switch to digital methods, when the dictionary's team worked with the University of Waterloo to develop the SGML-based New OED.
  • There is a literature about the human dimensions of the project, including The Professor and the Madman: A Tale of Murder, Insanity, and the Making of the Oxford English Dictionary, which tells the story of a prolific and mad contributor, W. C. Minor. Thus we can learn from the stories told about the human aspects of the project.

Of course, it probably isn’t the “first” such project. What are some other examples? Can we recover a history of the human in the development of humanities resources?

Online Humanities Scholarship: The Shape of Things to Come

I’m at a conference organized by Jerome McGann, Online Humanities Scholarship: The Shape of Things to Come, at the University of Virginia. The focus is on sustainability, and Mellon is supporting the conference. My conference report is at http://www.philosophi.ca/pmwiki.php/Main/ShapeOfThings.

Who’s your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling

On the 18th of March we ran the second Day of Digital Humanities, which seems to have been a success. We had more participants and some interesting analysis. Matt Jockers, for example, tried Latent Dirichlet Allocation on the blogs and wrote up the results on his blog in a post, Who’s your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling. Neat!

Algorithms are Thoughts, Chainsaws are Tools on Vimeo

Steve Ramsay has put up an interesting video essay about live coding at Algorithms are Thoughts, Chainsaws are Tools on Vimeo. He provides commentary on the live coding of Andrew Sorensen, which in turn is controlling electronic music. Very neat!

Note (April 2020): The video is no longer available. There is an Electronic Book Review essay, Critical Code Studies Week Five Opener – Algorithms are thoughts, Chainsaws are tools, that talks about the original video essay, but it too links to the missing Vimeo video.

Publishing scholarly projects using Google Sites

Thomas Crombez has a post on his Doing Digital History site, Publishing scholarly projects using Google Sites. His argument and instructions make a lot of sense. The idea is that you use something like TEI to encode your scholarly data and then you publish it on Google Sites instead of setting up something fancy at your university or lobbying for research infrastructure that doesn’t exist. Google provides stable infrastructure that you don’t have to maintain, at an unbeatable price. It is also “off-campus” (which can have advantages) and is as likely to survive as a university service.

Either way — running your website on a university server, a private hosting solution, or your own server — you are basically into self-publishing. Will you use an established platform aka CMS (Content Management System, e.g., WordPress or Drupal) or do you prefer to grow your own HTML/CSS? What is the most advantageous and flexible place to host it? If you run your own server, when does it need to be updated? Do you really need that latest Apache update? If you are doing a dynamic website, will the database continue to behave as it does today? When to update your database software? Is it possible that your website will one day attract a lot of traffic, necessitating more than one server? What search engine do you use for your collection of texts? Do you simply plug in a Google search box, or do you want some more searching power for your users? If so, what software do you choose?

I see more and more people moving to Google (and other commercial solutions) as a way of doing projects quickly and with modest resources. I call it Computing With The Infrastructure At Hand.