Using Zotero and TAPOR on the Old Bailey Proceedings

The Digging Into Data program commissioned CLIR (Council on Library and Information Resources) to study and report on the first round of the programme. The report includes case studies on the eight initial projects, including one on our Criminal Intent project titled Using Zotero and TAPOR on the Old Bailey Proceedings: Data Mining with Criminal Intent (DMCI). More interesting are some of the reflections on big data and research in the humanities that the authors make:

1. One Culture. As the title hints, one of the conclusions is that in digital research the lines between disciplines and sectors have been blurred to the point where it is more accurate to say there is one culture of e-research. This is obviously a play on C. P. Snow’s Two Cultures. The two cultures of the sciences and the humanities, which have been alienated from each other for a century or two, are now coming back together around big data.

Rather than working in silos bounded by disciplinary methods, participants in this project have created a single culture of e-research that encompasses what have been called the e-sciences as well as the digital humanities: not a choice between the scientific and humanistic visions of the world, but a coherent amalgam of people and organizations embracing both. (p. 1)

2. Collaborate. A clear message of the report is that to do this sort of e-research people need to learn to collaborate and by that they don’t just mean learning to get along. They mean deliberate collaboration that is managed. I know our team had to consciously develop patterns of collaboration to get things done across 3 countries and many more universities. It also means collaborating across disciplines and this is where the “one culture” of the report is aspirational – something the report both announces and encourages. Without saying so, the report also serves as a warning that we could end up with a different polarization just as the separation of scientific and humanistic culture is healed. We could end up with polarization between those who work on big data (of any sort) using computational techniques and those who work with theory and criticism in the small. We could find humanists and scientists who use statistical and empirical methods in one culture while humanists and scientists who use theory and modelling gather as a different culture. One culture always spawns two and so on.

3. Expand Concepts. The recommendations push the idea that all sorts of people/stakeholders need to expand their ideas about research. We need to expand our ideas about what constitutes research evidence, what constitutes research activity, what constitutes research deliverables and who should be doing research in what configurations. The humanities and other interpretative fields should stop thinking of research as a process that turns the reading of books and articles into the writing of more books and articles. The new scale of data calls for a new scale of concepts and a new scale of organization.

It is interesting how this report follows the creation of the Digging Into Data program. It is a validation of the act of creating the programme and creating it as it was. The funding agencies, led by Brett Bobley, ran a consultation and then gambled on a programme designed to encourage and foreground certain types of research. By and large their design had the effect they wanted. To some extent CLIR reports that research is becoming what Digging encouraged us to think it should be. Digging took seriously Greg Crane’s question, “what can you do with a million books”, but they abstracted it to “what can you do with gigabytes of data?” and created incentives (funding) to get us to come up with compelling examples, which in turn legitimize the program’s hypothesis that this is important.

In other words we should acknowledge and respect the politics of granting. Digging set out to create the conditions where a certain type of research thrived and got attention. The first round of the programme was, for this reason, widely advertised, heavily promoted, and now carefully studied and reported on. All the teams had to participate in a small conference in Washington that got significant press coverage. Digging is an example of how granting councils can be creative and change the research culture.

The Digging into Data Challenge presents us with a new paradigm: a digital ecology of data, algorithms, metadata, analytical and visualization tools, and new forms of scholarly expression that result from this research. The implications of these projects and their digital milieu for the economics and management of higher education, as well as for the practices of research, teaching, and learning, are profound, not only for researchers engaged in computationally intensive work but also for college and university administrations, scholarly societies, funding agencies, research libraries, academic publishers, and students. (p. 2)

The word “presents” can mean many things here. The new paradigm is both a creation of the programme and a result of changes in the research environment. The very presentation of research is changed by the scale of data. Visualizations replace quotations as the favored way into the data. And, of course, granting councils commission reports that re-present a heady mix of new paradigms and case studies.

Digital Infrastructure Summit 2012

A couple of weeks ago I gave a talk at Digital Infrastructure Summit 2012, which was hosted by the Canadian University Council of Chief Information Officers (CUCCIO). This short conference was very different from any other I’ve attended. CUCCIO, by its nature, is a group of people (university CIOs) who are used to doing things. They seemed committed to defining a common research infrastructure for Canadian universities and trying to prototype it. It seemed all the right people were there to start moving in the same direction.

For this talk I prepared a set of questions for auditing whether a university has good support for digital research in the humanities. See Check IT Out!. The idea is that anyone from a researcher to an administrator can use these questions to check out the IT support for humanists.

My conference notes are here.

Check IT Out!

I posted on 4Humanities a questionnaire that I call Check IT Out!. The idea is to give administrators and researchers a tool for checking out the research information technology (IT) that they have at their university. I developed it for a talk I give tomorrow at the Digital Infrastructure Summit 2012 in Saskatoon. I’m on the “Reality Check Panel” that presents realities faced by researchers. Check IT Out! is meant to address the issue of getting basic computing support and infrastructure for research. It is often sexier to build something new than to make sure that researchers have the basics. That raises the question of what the basics are, which is why I thought I would frame Check IT Out! as a series of questions, not assertions. Often people in computing services know the answers to these, but our colleagues don’t even know how to frame the question.

Save Library and Archives Canada

The Canadian Association of University Teachers has a campaign to Save Library and Archives Canada from the “Badly conceived restructuring, a redefinition of its mandate, and financial cutbacks (that) are undermining LAC’s ability to acquire, preserve and make publicly available Canada’s full documentary heritage.” The issue is not just cuts, but how LAC is dealing with the cuts.

Daniel Caron, Librarian and Archivist of Canada, has announced that “the new environment is totally decentralized and our monopoly as stewards of the national documentary heritage is over.”

LAC will be decentralizing a large portion of its collections to both public and private institutions. LAC documents refer to this voluntary group of “memory institutions” as a “coalition of the willing.”

Go to the site now, read up on the issues, and consider taking action!

On Graduate Education in the Humanities, by a Graduate Student in the Humanities

Lindsay Thomas, the hard-working blogger for 4Humanities, has written an excellent piece, On Graduate Education in the Humanities, by a Graduate Student in the Humanities. She talks about how hard it is to complete quickly when you are making ends meet by TAing and teaching constantly. She talks about the “casualization” of academic labor.

I would add to her essay that we need to think about expanding outcomes for graduate students. We design graduate programs to produce junior faculty (or casual labor who hang on in hopes of getting full-time faculty jobs). What we don’t do is design programs so that they prepare people for knowledge work outside the academy. This is not rocket science; there are all sorts of ways to do it, and digital humanities programs could take the lead as our students acquire skills of broader relevance. But, as Lindsay points out, if you start changing or adding to graduate programs you can just extend the time to completion, and students might end up no better off.

Robo-Readers Used to Grade Test Essays

A nice story from the New York Times by Michael Winerip, Robo-Readers Used to Grade Test Essays (April 22, 2012), talks about automated essay scoring (AES) software. The story first reports a study from the University of Akron that showed that AES software is comparable to human graders (see A Win for the Robo-Readers by Steve Kolowich from Inside Higher Ed). The NYT story then goes on to report how Les Perelman, a director of writing at MIT, has shown how you can game AES tools. Among other things, they don’t check facts or truth, so you can write all sorts of outrageous things and still get a good score from AES. The story discusses some of the patterns that get good scores, like lexical variety and long sentences. The story ends with the possibility that AES could be matched by essay writing software:

Two former students who are computer science majors told him (Perelman) that they could design an Android app to generate essays that would receive 6’s from e-Rater. He says the nice thing about that is that smartphones would be able to submit essays directly to computer graders, and humans wouldn’t have to get involved.

Particularly interesting is an essay Perelman wrote to show how poor essays can game the system. I wish I could say that I never saw writing like this and that therefore there was no danger of AES systems rewarding the poor writing found in real essays:

In today’s society, college is ambiguous. We need it to live, but we also need it to love. Moreover, without college most of the world’s learning would be egregious. College, however, has myriad costs. One of the most important issues facing the world is how to reduce college costs. Some have argued that college costs are due to the luxuries students now expect. Others have argued that the costs are a result of athletics. In reality, high college costs are the result of excessive pay for teaching assistants.
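
To make the point about surface patterns concrete, here is a rough sketch of my own (not how e-Rater or any real AES product works, which I don't know in detail) that measures two of the features the story mentions: lexical variety, approximated as a type-token ratio, and average sentence length. The function name `surface_features` and the crude tokenization are assumptions made for illustration. Nothing in it checks whether a sentence is true or even sensible, which is exactly the weakness Perelman exploits.

```python
import re


def surface_features(text: str) -> dict:
    """Return crude surface measures of a text (illustrative only)."""
    # Split into sentences on runs of ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Lowercase word tokens; straight and curly apostrophes stay inside words.
    words = re.findall(r"[a-z'’]+", text.lower())
    if not words or not sentences:
        return {"words": 0, "type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        "words": len(words),
        # Lexical variety: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    # Opening sentences of Perelman's essay as quoted above.
    sample = (
        "In today’s society, college is ambiguous. "
        "We need it to live, but we also need it to love."
    )
    print(surface_features(sample))
```

A scorer that rewarded only numbers like these would happily rate fluent nonsense highly, since neither measure depends on meaning.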

Faculty Advisory Council Memorandum on Journal Pricing § THE HARVARD LIBRARY TRANSITION

From Slashdot, a story about how the Faculty Advisory Council to the Library (of Harvard) sent around a Memorandum on Journal Pricing arguing that periodical subscriptions are not sustainable and that faculty should therefore publish in open-access journals.

The Faculty Advisory Council to the Library, representing university faculty in all schools and in consultation with the Harvard Library leadership, reached this conclusion: major periodical subscriptions, especially to electronic journals published by historically key providers, cannot be sustained: continuing these subscriptions on their current footing is financially untenable. Doing so would seriously erode collection efforts in many other areas, already compromised.

HUMlab Space – a set on Flickr

The HUMlab at Umeå University is one of the best designed computing labs I have seen. The director Patrik Svensson has created a multi-purpose space out of a library basement. When there in February I took some photos to document the space – see a Flickr set on the HUMlab Space.

Computer labs used to be rows of desktop computers all facing a shared projection space. Now that most students have laptops we don’t need those sorts of labs. The HUMlab instead features all sorts of shared spaces with different screens and projectors. The idea is that a lab should support different arrangements of people around shared social screens. You have private spaces, couches, small seminar tables, exhibit screens, and larger presentation spaces.

Voyant at Georgia Tech

Today I Skyped into a class by Lauren Klein on Digital Humanities at Georgia Tech. The students all had to use Voyant for an assignment and they had a great set of questions to ask me. See Questions for Professor Rockwell.

Klein also had her students post short essays on using Voyant on Sherlock Holmes under the category Sherlock Holmes Text Analysis. You can see the range of reactions from frustration with the tool, to “so what”, to students who find the “surfing and stumbling” creative. I’m impressed at how Professor Klein has put together a reasonable exercise in text analysis for undergrads.

In the spirit of Voyant, here is a word cloud of the student assignments on the course blog: