Text Technology and TAPoR

Replication, Repetition, or Revivification

A short essay I wrote with Stéfan Sinclair on “Recapitulation, Replication, Reanalysis, Repetition, or Revivification” is now up in preprint form. The essay is part of a longer work on “Anatomy of tools: A closer look at ‘textual DH’ methodologies.” The longer work is a set of interventions looking at text tools. These came out of a ADHO SIG-DLS (Digital Literary Studies) workshop that took place in Utrecht in July 2019.

Our intervention at the workshop had the original title “Zombies as Tools: Revivification in Computer Assisted Interpretation” and concentrated on practices of exploring old tools – a sort of revivification or bringing back to life of zombie tools.

The full paper should be published soon by DHQ.

Ryan Cordell: Programmable Type: the Craft of Printing, the Craft of Code

A line of R code set in movable type

I want situate the kinds of programming typically practiced in digital humanities research and teaching in relation to practices more familiar to book historians and bibliographers, such as the work of compositors and printers working with moveable type.

Ryan Cordell sent me a link to a talk on Programmable Type: the Craft of Printing, the Craft of Code. The talk looks at the “modes of thought and labor” of composing movable type and programming. He is careful to warns us about the simplistic story that has movable type and the computer as two information technologies that caused revolutions in how we think about knowledge. What is particularly interesting is how he weaves hands-on work into his course Technologies of Text. He asks students to not just read about printing, but to try doing it. Likewise for programming in R. There is a knowing that comes from doing something and attending to the labor of that doing. Replicating the making of texts gives students (and researchers) a sense of the materiality and contexts of media. It is a way of doing media archaeology.

In the essay, Cordell writes about the example of the visual poem “A Dude” and its many iterations composed with different type. I had blogged about “A Dude”, but hadn’t thought about how the poem would have been a way for the compositor to show of their craft much like a twitterbot might be a way for a programmer to show off theirs.

Cordell frames this discussion by considering the controversy around whether digital humanists should need to be able to code. He raises an interesting challenge – whether learning the craft of programming (or letterpress printing) might make it harder to view the craft critically. In committing time and labour to learning a craft does one get implicated or corrupted by the craft? Doesn’t one want end up valuing the craft simply because it is something one can now do, and to critique it would be to critique oneself.

260,000 Words, Full of Self-Praise, From Trump on the Virus

The New York Times has a nice content analysis study of Trump’s Coronavirus briefings, 260,000 Words, Full of Self-Praise, From Trump on the Virus. They tagged the corpus for different types of utterances including:

Self-congratulations
Exaggerations and falsehoods
Displays of empathy or appeals to national unity
Blaming others
Credits others

Needless to say they found he spent a fair amount of time congratulating himself.

They then created a neat visualizations with colour coded sections showing where he shows empathy or congratulates himself.

According to the article they looked at 42 briefings or other remarks from March 9 to April 17, 2020 giving them a total of 260,000 words.

I decided to replicate their study with Voyant and I gathered 29 Coronavirus Task Force Briefings (and one Press Conference) from February 29 to April 17. These are all the Task Force Briefings I could find at the White House web site. The corpus has 418,775 words, but those include remarks by people other than Trump, questions, and metadata.

Some of the things that struck me are the absence of medical terminology in the high frequency words. I was also intrigued by the prominence of “going to”. Trump spends a fair amount of time talking about what he and others are going to be doing rather than what is done. Here you have a Contexts panel from Voyant.

Embedded Voyant panel

This post is a demonstration of how a Voyant panel or hermeneutica can be embedded in a WordPress post. See our Voyant tutorials at dialogi.ca.

To embed the panel I created a custom HTML block. In it I pasted the <iframe> element exported from the Voyant panel I wanted. While editing I see the HTML code, when I Preview (either the block or the whole post) or publish then I see the Voyant panel in place. Try playing with it!

Digitization in an Emergency: Fair Use/Fair Dealing and How Libraries Are Adapting to the Pandemic

In response to unprecedented exigencies, more systemic solutions may be necessary and fully justifiable under fair use and fair dealing. This includes variants of controlled digital lending (CDL), in which books are scanned and lent in digital form, preserving the same one-to-one scarcity and time limits that would apply to lending their physical copies. Even before the new coronavirus, a growing number of libraries have implemented CDL for select physical collections.

The Association of Research Libraries has a blog entry on Digitization in an Emergency: Fair Use/Fair Dealing and How Libraries Are Adapting to the Pandemic by Ryan Clough (April 1, 2020) with good links. The closing of the physical libraries has accelerated a process of moving from a hybrid of physical and digital resources to an entirely digital library. Controlled digital lending (where only a limited number of patrons can read an digital asset at a time) seems a sensible way to go.

To be honest, I am so tired of sitting on my butt that I plan to spend much more time walking to and browsing around the library at the University of Alberta. As much as digital access is a convenience, I’m missing the occasions for getting outside and walking that a library affords. Perhaps we should think of the library as a labyrinth – something deliberately difficult to navigate in order to give you an excuse to walk around.

Perhaps I need a book scanner on a standing desk at home to keep me on my feet.

Show and Tell at CHRIN

Stéphane Pouyllau’s photo of me presenting

Michael Sinatra invited me to a “show and tell” workshop at the new Université de Montréal campus where they have a long data wall. Sinatra is the Director of CRIHN (Centre de recherche interuniversitaire sur les humanitiés numériques) and kindly invited me to show what I am doing with Stéfan Sinclair and to see what others at CRIHN and in France are doing.

Continue reading Show and Tell at CHRIN

Conference notes for CSDH 2019

In early June I was at the Congress for the Humanities and Social Sciences. I took conference notes on the Canadian Society for Digital Humanities 2019 event and on the Canadian Game Studies Association conference, 2019. I was involved in a number of papers:

Exploring through Markup: Recovering COCOA. This paper looked at an experimental Voyant tool that allows one to use COCOA markup as a way of exploring a text in different ways. COCOA markup is a simple form of markup that was superseded by XML languages like those developed with the TEI. The paper recovered some of the history of markup and what we may have lost.
Designing for Sustainability: Maintaining TAPoR and Methodi.ca. This paper was presented by Holly Pickering and discussed the processes we have set up to maintain TAPoR and Methodi.ca.
Our team also had two posters, one on “Generative Ethics: Using AI to Generate” that showed a toy that generates statements about artificial intelligence and ethics. The other, “Discovering Digital Methods: An Exploration of Methodica for Humanists” showed what we are doing with Methodi.ca.

txtlab Multilingual Novels

This directory contains 450 novels that appeared between 1770 and 1930 in German, French and English. It is designed for us in teaching and research.

Andrew Piper mentioned a corpus that he put together, txtlab Multilingual Novels. This corpus is of some 450 novels from the late 18th century to the early 20th (1920s). It has a gender mix and is not only English novels. This corpus was supported by SSHRC through the Text Mining the Novel project.

Text Mining The Novel 2015

On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic.) Here are some general reflections:

The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string.) We need to find ways to also study the novel as material object (thing), as a social object, as a performance (of the reader), and as an economic object in a market place. Then we also have to find ways to connect these.
So many analytical and mining processes depend on bags of words from dictionaries to topics. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument.
I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.

NovelTM: Text Mining the Novel

This week SSHRC announced the new partnership grants awarded including one I am a co-investigator on, NovelTM: Text Mining the Novel.

This project brings together researchers and partners from 21 different academic and non-academic institutions to produce the first large-scale quantitative history of the novel. Our aim is to bring new computational approaches in the field of text mining to the study of literature as well as bring the unique knowledge of literary studies to bear on larger debates about data mining and the place of information technology within society.

NovelTM is led by Andrew Piper at McGill University. At the University of Alberta I will be gathering a team that will share the resulting computing methods through TAPoR and developing recipes or tutorials so that others can try them.