Supporting Digital Scholarship

The Tri-Council Agencies (the research councils of Canada) and selected other institutions (going under the rubric TC3+) have released an important Consultation Document titled Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship in Canada. You can see a summary blog entry from the CommerceLab, How big data is reshaping the future of digital scholarship in Canada. The document suggests that we have many of the components of a “well-functioning digital infrastructure ecosystem for research and innovation”, but that these are not coordinated and Canada is not keeping up. They propose three initiatives:

  • Establishing a Culture of Stewardship
  • Coordination of Stakeholder Engagement
  • Developing Capacity and Future Funding Parameters

The first initiative is about research data management, something we have been working on in the digital humanities for some time. It is great to see our funding agencies calling for it.

The Wedding Data: What Marriage Notices Say About Social Change

Reading a collection of stories in the Atlantic about women and technology, I came across a story about The Wedding Data: What Marriage Notices Say About Social Change. This article talks about Wedding Crunchers, a database of New York Times wedding announcements since 1981 that you can search in an environment much like Google’s Ngram viewer. In the chart above you can see that I searched for different professions. Note how “teacher” takes off, probably because of the popularity of Teach for America.

I can’t help wondering if we are seeing the emergence of a genre of text visualization: the diachronic word viewer. This type of visualization depends on an association between orthographic words (the actual words in texts) and concepts.
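To make the word/concept point concrete, here is a minimal sketch of the core computation behind a diachronic word viewer. It assumes a corpus of (year, text) pairs and is not how Wedding Crunchers or the Ngram viewer are actually implemented. It counts orthographic word forms per year and divides by that year's total, so it tracks the string “teacher”, not the concept of teaching.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    # Lower-cased orthographic words; a real viewer needs a more careful tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def diachronic_frequencies(docs, terms):
    """Return {term: {year: relative frequency}} for each term.

    docs is an iterable of (year, text) pairs (an assumed input format).
    """
    totals = Counter()              # total tokens per year
    counts = defaultdict(Counter)   # counts[term][year]
    for year, text in docs:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        for tok in tokens:
            if tok in terms:
                counts[tok][year] += 1
    # Normalize by each year's size so years with more text are comparable.
    return {
        term: {year: counts[term][year] / totals[year]
               for year in sorted(totals) if totals[year]}
        for term in terms
    }


# Toy, invented announcements purely for illustration.
docs = [
    (1981, "the bride is a lawyer"),
    (1981, "the groom is a banker"),
    (2010, "the bride is a teacher and the groom is a teacher"),
]
print(diachronic_frequencies(docs, {"teacher", "lawyer"}))
```

Note that “teachers” or “schoolteacher” would not be counted; deciding which word forms stand for a concept is exactly the interpretive step the viewer hides.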

Multipoint Touch Variorum


Luciano Frizzera has posted the video he showed in our INKE panel at Digital Humanities 2013. It shows his multi-point touch variorum edition prototypes. He has been prototyping how we could use gestures on large screens, especially table-top screens. He has interesting ideas about how people on different sides of a table can discuss something.

Rowling and “Galbraith”: an authorial analysis

JK Rowling has recently been uncovered as the author of The Cuckoo’s Calling, which was submitted under the name Robert Galbraith. The Sunday Times revealed this after a hint on Twitter and some forensic stylometry. Patrick Juola, one of the two people who did the analysis, has a guest blog post where he talks about what he did: Rowling and “Galbraith”: an authorial analysis. It is a great short description of an authorship attribution project.
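For readers who have not seen stylometry up close, here is a minimal sketch of one common family of authorship attribution techniques: compare the relative frequencies of the most frequent words in a disputed text with those of candidate authors and pick the closest. This is not Juola’s actual procedure; the cosine measure and the cut-off `k=50` are assumptions chosen for illustration, and real attribution needs far more text and several independent features.

```python
from collections import Counter
import math
import re


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def profile(text, vocabulary):
    # Relative frequency of each vocabulary word in this text.
    toks = words(text)
    counts = Counter(toks)
    n = len(toks) or 1
    return [counts[w] / n for w in vocabulary]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def attribute(unknown, candidates, k=50):
    """Return (best candidate, scores) using the k most frequent words overall.

    candidates maps an author name to a sample of their writing; k is an
    illustrative assumption, not a recommended setting.
    """
    all_text = " ".join([unknown, *candidates.values()])
    vocabulary = [w for w, _ in Counter(words(all_text)).most_common(k)]
    target = profile(unknown, vocabulary)
    scores = {name: cosine(target, profile(text, vocabulary))
              for name, text in candidates.items()}
    return max(scores, key=scores.get), scores


# Toy, invented samples; real samples would be whole novels.
candidates = {
    "A": "the cat sat on the mat and the dog sat on the log",
    "B": "of course it was of no use to any of us",
}
best, scores = attribute("the bird sat on the fence and the cow sat on the grass", candidates)
print(best, scores)
```

Frequent words are used because writers deploy them largely unconsciously, which is why they can betray an author writing under another name.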

NSA slides explain the PRISM data-collection program

The Washington Post has been publishing NSA slides that explain the PRISM data-collection program. These slides not only explain aspects of PRISM, but also let us see how the rhetoric of text analysis unfolds. How do people present PRISM to others? Note the imperative voice of “You Should Use Both”.

Vicar – Access to Abbot TEI-A Conversion!

The brilliant folk at Nebraska and at Northwestern have teamed up to use Abbot and EEBO-MorphAdorner on a collection of TCP-ECCO texts. The Abbot tool is available here: Vicar – Access to Abbot TEI-A Conversion! Abbot tries to convert texts with different forms of markup into a common form. MorphAdorner does part-of-speech tagging. Together they have made 2,000 ECCO texts available that can be studied as a single collection.

I’m still not sure I understand the collaboration completely, but I know from experience that analyzing XML documents can be difficult if each document uses XML differently. Abbot tries to convert XML texts into a common form that preserves as much of the local tagging as possible.
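A minimal sketch of the general idea, normalizing to a common form while keeping the local tagging, might look like this. It is not how Abbot works internally; the `COMMON` mapping and the `orig-tag` attribute are hypothetical names invented for illustration. Known local tags are renamed to a shared vocabulary with the original name recorded, and unknown tags are left untouched rather than discarded.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one project's local tags to a shared vocabulary.
COMMON = {"para": "p", "stanza": "lg", "verse": "l"}


def normalize(elem, mapping=COMMON):
    """Rename mapped elements in place, recording the original tag name.

    Unmapped elements are kept as-is so that no local tagging is lost.
    """
    for e in elem.iter():
        if e.tag in mapping:
            e.set("orig-tag", e.tag)  # hypothetical attribute preserving local markup
            e.tag = mapping[e.tag]
    return elem


root = ET.fromstring("<doc><para>Hi</para><stanza><verse>a</verse><seg>b</seg></stanza></doc>")
normalize(root)
print(ET.tostring(root, encoding="unicode"))
```

The design choice is the trade-off described above: a shared form makes documents comparable, while recording what was there keeps each project’s interpretation recoverable.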

Social Digital Scholarly Editing

On July 11th and 12th I was at a conference in Saskatoon on Social Digital Scholarly Editing. This conference was organized by Peter Robinson and colleagues at the University of Saskatchewan. I kept conference notes here.

I gave a paper on “Social Texts and Social Tools.” My paper argued for text analysis tools as a “reader” of editions. I took the extreme case of big data text mining and asked what scraping and mining tools want, and don’t want, in a text. I took this extreme view to challenge the scholarly editing view that the more interpretation you put into an edition, the better. Big data wants to automate the process of gathering and mining texts: it wants “clean” texts without markup, annotations, metadata and other interventions that can’t be easily removed. The variety of markup in digital humanities projects makes it very hard to clean them.

The response was appreciative of the provocation, but (thankfully) not convinced that big data was the audience of scholarly editors.

Data Analytics’ Next Big Feat: Sarcasm Detection

Slashdot has a story about Data Analytics’ Next Big Feat: Sarcasm Detection. The BBC article that this draws from says the French company Spotter has algorithms for 29 different languages and that they can “identify sentiment up to an 80% accuracy rate.”

 

A screen shot from Spotter shows a tool running on an iPad with a word cloud for exploration and selection tools.

The same Slashdot story also sent me to a Wall Street Journal story about how the Obama 2012 campaign used Salesforce for sentiment analysis on email coming into the campaign.

World Development Indicators – Google Public Data Explorer

Ryan sent me a link to World Development Indicators – Google Public Data Explorer. This is a great visual data explorer with lots of data already available. It looks like the Gapminder Trendanalyzer, which Google bought in 2007. (Gapminder is now focused on keeping statistical data up-to-date and producing related media.) In Google Public Data you can search for datasets and then play with the type of visualization and so on. I’m struck by how simply this model of weaving datasets and tools together works, with the tools adapting to the datasets. I wonder if we could do something like this for texts?

Gapminder’s Hans Rosling has a TED talk worth watching, Stats that reshape your worldview, in which he talks about the preconceptions we have about the world. He is really good at showing how much things have changed, so that preconceptions true in the 1960s are no longer valid.

As Megan Garber explains in Dataviz, democratized: Google opens Public Data Explorer, Google now also lets us upload our own data, so the Explorer ceases to be such a passive interpretation tool. The trick is the Dataset Publishing Language, which lets uploaders describe their data so the Public Data Explorer can present it properly.