Well, my vacation is over and I’m facilitating a retreat on text methods across disciplines. (See Towards a Methods Commons.) With support from the ITST program at SSHRC we brought together 15 linguists, philosophers, historians, and literary scholars to discuss methods in a structured way. The goal is to sketch a commons that gathers “recipes” showing people how to do research tasks with electronic texts. Stay tuned for a draft web site in about 6 months.
Google: Our commitment to the digital humanities
Google has announced the first projects it is funding to use Google Books, along with a commitment of nearly a million dollars to the digital humanities. See Official Google Blog: Our commitment to the digital humanities.
we’d like to see the field blossom and take advantage of resources such as Google Books that are becoming increasingly available. We’re pleased to announce that Google has committed nearly a million dollars to support digital humanities research over the next two years.
Society for Digital Humanities Papers
With my graduate students and colleagues I was involved in a number of papers at the SDH-SEMI (The Society for Digital Humanities / La Société pour l’Étude des Médias Interactifs) conference at Congress 2010 in Montreal. They included:
- “Exclusionary Practices: A Historical Look at Public Representations of Computers in the 1950s and Early 1960s” presented by Sophia Hoosien
- “Before the Moments of Beginning” presented by Victoria Smith
- I presented on “Cyberinfrastructure for Research in the Humanities: Expectations and Capacity”
- “Text Analysis for me Too: An embeddable text analysis widget” presented by Peter Organisciak
- Daniel Sondheim talked about how the interface of the citation has changed from print to the web, as part of a panel on INKE Interface Design.
- “Theorizing Analytics” was presented by Stéfan Sinclair
- “Academic Capacity in Canada’s Digital Humanities Community: Opportunities and Challenges” was presented by Lynne Siemens
- “What do we say about ourselves? An analysis of the Day of DH 2009 data” was presented by Peter Organisciak
- I presented on “The Unreality of the Timeline” as part of a panel on temporal modeling at the CHA
As the papers get posted, I’ll blog them.
U of A text mining project could help businesses
Well, I made it into the computer press in Canada. An article on the Digging Into Data project I am working on has been published; see U of A text mining project could help businesses (Rafael Ruffolo, March 25, 2010, ComputerWorld Canada).
It is always interesting to see what the media pick up on in a story. They usually have a better idea of what their audience wants to read about, so they adapt the story for that audience.
Who’s your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling
On the 18th of March we ran the second Day of Digital Humanities, which seems to have been a success. We had more participants and some interesting analysis. Matt Jockers, for example, tried Latent Dirichlet Allocation on the blogs and wrote up the results on his blog in a post, Who’s your DH Blog Mate: Match-Making the Day of DH Bloggers with Topic Modeling. Neat!
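For readers curious about the general idea, here is a minimal sketch of topic-model matching (not Jockers’s actual code): fit an LDA model over a small set of blog texts and pair each blogger with the most similar topic mixture. It uses scikit-learn; the sample texts, number of topics, and cosine-similarity matching are my own placeholder assumptions.

```python
# Sketch: fit LDA over blog texts, then match each blogger to the blogger
# whose topic mixture is most similar. Sample texts are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

blogs = {
    "blogger_a": "text encoding TEI markup manuscripts editions transcription",
    "blogger_b": "topic modeling corpus text mining visualization tools",
    "blogger_c": "teaching students classroom syllabus assignments markup TEI",
}

names = list(blogs)
counts = CountVectorizer(stop_words="english").fit_transform(blogs.values())

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # one topic mixture per blogger

sims = cosine_similarity(doc_topics)
np.fill_diagonal(sims, -1)               # ignore self-matches
for i, name in enumerate(names):
    print(name, "->", names[sims[i].argmax()])
```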
The General Inquirer
Reading John B. Smith’s “Computer Criticism” (Style, Vol. XII, No. 4), I came across a reference to a content analysis program from the 1960s called The General Inquirer. This program still has a following and has been rewritten in Java. See the Inquirer Home Page. There is a web version where you can try it here (DO NOT USE A LARGE TEXT).
The General Inquirer “maps” a text to a thesaurus of categories, disambiguating on the way. The web page about How the General Inquirer is used describes what it does thus:
The General Inquirer is basically a mapping tool. It maps each text file with counts on dictionary-supplied categories. The currently distributed version combines the “Harvard IV-4” dictionary content-analysis categories, the “Lasswell” dictionary content-analysis categories, and five categories based on the social cognition work of Semin and Fiedler, making for 182 categories in all. Each category is a list of words and word senses. A category such as “self references” may contain only a dozen entries, mostly pronouns. Currently, the category “negative” is our largest with 2291 entries. Users can also add additional categories of any size.
As they say later on, their categories were developed for “social-science content-analysis research applications” and not for other uses like literary study. The original developer published a book on the tool in 1966:
Philip J. Stone, The General Inquirer: A Computer Approach to Content Analysis. (Cambridge: M. I. T. Press, 1966).
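To give a sense of the mechanics, here is a toy sketch of dictionary-based category counting in the spirit of the General Inquirer. The categories and word lists below are invented placeholders, not the Harvard IV-4 or Lasswell dictionaries, and no word-sense disambiguation is attempted.

```python
# Toy sketch of General Inquirer-style mapping: count how many tokens of a
# text fall into each dictionary-supplied category. Categories are made up.
import re
from collections import Counter

CATEGORIES = {
    "self_references": {"i", "me", "my", "mine", "myself"},
    "negative": {"bad", "wrong", "fail", "poor", "never"},
    "positive": {"good", "right", "succeed", "fine", "always"},
}

def categorize(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for category, words in CATEGORIES.items():
        counts[category] = sum(1 for t in tokens if t in words)
    return counts

print(categorize("I never said my results were bad; I think they are good."))
# Counter({'self_references': 3, 'negative': 2, 'positive': 1})
```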
Ritsumeikan: Possibilities in Digital Humanities
For the last week and a bit I have been in Kyoto to give a talk at a conference on the “Possibilities in Digital Humanities,” which was organized by Professor Kozaburo Hachimura and sponsored by the Information Processing Society of Japan and the Ritsumeikan University Digital Humanities Center for Japanese Arts and Culture.
While the talks were in Japanese, I was able to follow most of the sessions with the help of Mitsuyuki Inaba and Keiko Suzuki. I was impressed by the quality of the research and the involvement of new scholars. There seemed to be a much higher participation of postdoctoral fellows and graduate students than at similar conferences in Canada, which bodes well for digital humanities in Japan.
Teaching Literature and Language Online
A paper that Stéfan Sinclair and I wrote on “Between Language and Literature: Digital Text Exploration” has just been published by the MLA in a volume edited by Ian Lancashire, Teaching Literature and Language Online.
Information Visualization for Text Analysis
Googling around, I came across a nice, succinct chapter on Information Visualization for Text Analysis from a book called Search User Interfaces by Marti Hearst (Cambridge University Press, 2009).
The chapter goes from visualizations for text mining to concordances and then to citation relationships. It shows some of the usual suspects like TextArc and Wordle.
Text Analysis in the Wild
The Globe and Mail on November 13th had an interesting example of text analysis in the wild. Crossing pages A10 and A11, they had a box with the high-frequency words in the old citizenship guide and the new one, with a word cloud in the middle. Here is what the description says:
Discover Canada, a different look at the country
The new citizenship guide, Discover Canada, is a much more comprehensive look at Canada’s history and system of government than its predecessor, A Look at Canada, which was produced under the Liberals in 1995. It’s longer (17,536 words to 10,433), with 10 pages devoted to Canadian history, compared to two in the previous version. Its emphasis also differs, with more attention paid to the military, the Crown and Quebec, and less to the environment.
>> Below is a graphic representation of the most frequently used words in the new citizenship guide. The bigger the word, the more often it appears.
I had to fold the page to scan it as it is longer than my scanner, but you get the idea. The PDF is here. I would have preferred the two lists at either edge of the box to be closer together so we could compare them. Note the small print: they used Many Eyes and WriteWords, which has a word frequency counting tool.
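For anyone who wants to reproduce this kind of comparison, here is a hedged sketch of counting and listing the high-frequency words of two texts side by side. The file names and the tiny stop word list are placeholders, not the actual guides or the tools the Globe used.

```python
# Sketch: compare the top words of two documents, in the spirit of the
# newspaper's side-by-side frequency lists. File names are placeholders.
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "for", "are", "with"}

def top_words(text, n=10):
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS).most_common(n)

old_guide = open("a_look_at_canada.txt").read()   # placeholder file name
new_guide = open("discover_canada.txt").read()    # placeholder file name

for (w_old, c_old), (w_new, c_new) in zip(top_words(old_guide), top_words(new_guide)):
    print(f"{w_old:<15}{c_old:<8}{w_new:<15}{c_new}")
```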