This directory contains 450 novels that appeared between 1770 and 1930 in German, French and English. It is designed for use in teaching and research.
Andrew Piper mentioned a corpus that he put together, txtlab Multilingual Novels. The corpus contains some 450 novels from the late 18th century to the early 20th (1920s). It has a gender mix and is not limited to English novels. This corpus was supported by SSHRC through the Text Mining the Novel project.
Domenico Fiormonte has recently blogged about an interesting document he has by Father Busa that relates to a difficult moment in the history of the digital humanities in Italy in 2002. The two-page “Conditional Agreement”, which I translate below, was given to Domenico and explained the terms under which Busa would agree to sign a letter to the Minister (of Education and Research) Moratti in response to Moratti’s public statement about the uselessness of humanities informatics. A letter was being prepared to be signed by a large number of Italian (and foreign) academics explaining the value of what we now call the digital humanities. Busa had the connections to get the letter published and taken seriously, which is why Domenico visited him to get his help. That help ended up being conditional on certain things being made clear, as laid out in the document. Domenico kept the two pages Busa wrote and recently blogged about them. As he points out in his blog, these two pages are a mini-manifesto of Father Busa’s later views of the place and importance of what he called textual informatics. Domenico also points out how political the context of these notes, and of the letter eventually signed and published, was. Defining the digital humanities is often about positioning the field in the larger academic and public political spheres we operate in.
Yesterday I gave a talk at Access 2016. This conference brings together archivists and librarians interested in library technology. I was honoured to give the Dave Binkley Memorial Lecture at the end of the conference. My conference notes are here. My talk was about the ethics of digitization, or more generally datafication.
I gave the first talk on “Tremendous Labour: Busa’s Methods” – a paper coming from the work Stéfan Sinclair and I are doing. I talked about the reconstruction of Busa’s Index project. I claimed that Busa and Tasman made two crucial innovations. The first was figuring out how to represent data on punched cards so that it could be processed (the data structures). The second was figuring out how to use the punched card machines at hand to tokenize unstructured text. I walked through what we know about their actual methods and talked about our attempts to replicate them.
The Canadian Writing Research Collaboratory (CWRC) today launched its Collaboratory. The Collaboratory is a distributed editing environment that allows projects to edit scholarly electronic texts (using CWRC Writer), manage editorial workflows, and publish collections. There are also links to other tools like CWRC Catalogue and Voyant (which I am involved in). There is an impressive set of projects already featured in CWRC, but it is open to new projects and designed to help them.
Susan Brown deserves a lot of credit for imagining this, writing the CFI (and other) proposals, leading the development and now managing the release. I hope it gets used as it is a fabulous layer of infrastructure designed by scholars for scholars.
One important component in CWRC is CWRC-Writer, an in-browser XML editor that can be hooked into content management systems like the CWRC back-end. It allows for stand-off markup and connects to entity databases for tagging entities in standardized ways.
The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string). We need to find ways to also study the novel as a material object (thing), as a social object, as a performance (of the reader), and as an economic object in a marketplace. Then we also have to find ways to connect these.
So many analytical and mining processes, from dictionaries to topics, depend on bags of words. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument?
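To make the bag-of-words worry concrete, here is a minimal sketch in Python. It is only an illustration and not taken from any of the projects discussed; the function name `bag_of_words` and the two sample sentences are my own assumptions. It shows that once a text is reduced to word counts, two sentences with opposite meanings become indistinguishable.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding order, case and punctuation.

    Hypothetical helper for illustration only.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two sentences that describe opposite events...
first = bag_of_words("The dog bit the man.")
second = bag_of_words("The man bit the dog.")

# ...produce exactly the same bag, so who did what to whom is lost.
print(first == second)
```

Anything like character, plot, or argument depends on the order and relations that this representation throws away, which is why it would need a different abstraction.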
I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.
Thanks to 3quarksdaily.com I came across the wonderful short film by Alain Resnais, Toute la mémoire du monde (1956). The short is about memory and the Bibliothèque nationale (of France). It starts at the roof of this fortress of knowledge and travels down through the architecture. It follows a book from when it arrives from a publisher to when it is shelved. It shows another book called by pneumatique to the reading room where it crosses a boundary to be read. All of this comes with a philosophical narration on information and memory.
The short shows big analogue information infrastructure at its technological and rational best, before digital informatics disrupted the library.
The Economist has a nice essay on The future of the book. (Thanks to Lynne for sending this along.) The essay has three interfaces:
A listening interface
A remediated book interface where you can flip pages
A scrolling interface
As much as we have moved beyond skeuomorphic interfaces that carry over design cues from older objects, the book interface is actually attractive. It suits the topic, which is captured in the title of the essay, “From Papyrus to Pixels: The Digital Transformation Has Only Just Begun.”
The content of the essay looks at how books have been remediated over time (from scroll to print) and then discusses the current shifts to ebooks. It points out that the ebook market is not like the digital music market. People still like print books and they don’t like to pick them apart like they do albums. The essay is particularly interesting on the self-publishing phenomenon and how authors are bypassing publishers and stores by publishing through Amazon.
The last chapter talks about audio books, one of the formats of the essay itself, and other formats (like treadmill forms that flash words at speed). This is where they get to the “transformation that has only just begun.”
The other is the site Book Traces where people upload interesting examples of marginal marks. Here is their call for examples:
Readers wrote in their books, and left notes, pictures, letters, flowers, locks of hair, and other things between their pages. We need your help identifying them because many are in danger of being discarded as libraries go digital. Books printed between 1820 and 1923 are at particular risk. Help us prove the value of maintaining rich print collections in our libraries.