Paolo showed me a neat demonstration, Word2Vec Vis of Pride and Prejudice. Lynn Cherny trained a Word2Vec model on Jane Austen’s novels and then used it to find close matches for key words. She then shows the text of a novel with the words replaced by their closest match in the language of Austen. It serves as a sort of demonstration of how Word2Vec works.
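To make the replacement idea concrete, here is a minimal sketch of swapping each word for its nearest neighbour by cosine similarity. The tiny three-dimensional vectors are hand-made for illustration and do not come from any trained model; `TOY_VECTORS`, `nearest` and `replace_words` are hypothetical names, and this is not Cherny's code.

```python
import math

# Toy, hand-made "embeddings" (assumed values, not from a trained Word2Vec model).
TOY_VECTORS = {
    "house": (0.9, 0.1, 0.0),
    "estate": (0.85, 0.2, 0.05),
    "letter": (0.1, 0.9, 0.1),
    "note": (0.15, 0.85, 0.2),
    "walk": (0.0, 0.2, 0.9),
    "ramble": (0.05, 0.25, 0.85),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest(word, vectors):
    """Return the most similar *other* word, or the word itself if it has no vector."""
    if word not in vectors:
        return word
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


def replace_words(text, vectors):
    """Replace every known word in a whitespace-separated text with its nearest neighbour."""
    return " ".join(nearest(token, vectors) for token in text.split())


print(replace_words("the house and the letter", TOY_VECTORS))
```

With a real model the vectors would have hundreds of dimensions and be learned from the novels, but the lookup step is the same in spirit.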
“…it’s like writing with a deranged but very well-read parrot on your shoulder.”
Robin Sloan, author of Mr. Penumbra’s 24-Hour Bookstore, has been doing some interesting work with recurrent neural nets in order to generate text. See Writing with the machine. He trained a machine on science fiction and then hooked it into a text editor so it can complete sentences. The New York Times has a nice story on Sloan’s experiments, Computer Stories: A.I. Is Beginning to Assist Novelists.
One wonders what it would be like if you trained it on your own writing. Would it help you be yourself or discourage you from rereading your prose?
From my students I heard about the game Mastaba Snoopy, created in Twee and TiddlyWiki and being taught in another Humanities Computing course (our students are vectors of influence). Here is a review where you can download the single HTML page that is the bizarre text adventure: Mastaba Snoopy is a Cronenbergian nightmare vision of childhood. The story takes place 14,000 years in the future, when a mutable alien has destroyed us and then reinvented itself following a collection of Peanuts comics. Play it.
An article about authorship attribution led me to this nice site on Common Errors in English Usage. The site is for a book with that title, but the author Paul Brians has organized all the errors into a hypertext here. For example, here is the entry on why you shouldn’t use enjoy to.
What does this have to do with authorship attribution? In a paper on Authorship Identification on the Large Scale the authors try using common errors as features to discriminate between potential authors.
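As a rough illustration of what treating errors as features could look like, the sketch below counts a few usage errors in a text and returns the counts as a feature dictionary. The three patterns are my own examples and `error_features` is a hypothetical helper; this is not the feature set or method of the paper.

```python
import re

# Illustrative error patterns only (assumed for this sketch, not taken from the paper).
ERROR_PATTERNS = {
    "could_of": r"\bcould of\b",
    "alot": r"\balot\b",
    "enjoy_to": r"\benjoy to\b",
}


def error_features(text):
    """Count how often each known error pattern occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return {
        name: len(re.findall(pattern, lowered))
        for name, pattern in ERROR_PATTERNS.items()
    }


print(error_features("I could of gone. I enjoy to read alot, alot."))
```

Each author's texts would yield such a vector of counts, which a classifier could then use alongside more conventional features such as function-word frequencies.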
The Canadian Writing Research Collaboratory (CWRC) today launched its Collaboratory. The Collaboratory is a distributed editing environment that allows projects to edit scholarly electronic texts (using CWRC Writer), manage editorial workflows, and publish collections. There are also links to other tools like CWRC Catalogue and Voyant (that I am involved in.) There is an impressive set of projects already featured in CWRC, but it is open to new projects and designed to help them.
Susan Brown deserves a lot of credit for imagining this, writing the CFI (and other) proposals, leading the development and now managing the release. I hope it gets used as it is a fabulous layer of infrastructure designed by scholars for scholars.
One important component in CWRC is CWRC-Writer, an in-browser XML editor that can be hooked into content management systems like the CWRC back-end. It allows for stand-off markup and connects to entity databases for tagging entities in standardized ways.
Cover of Playboy Roasted à la Edo
The Hamburg Museum of Arts and Crafts has a well designed exhibit called Hokusai x Manga that looks at the history of comics from the first kibyōshi to current manga and videogames. The exhibit draws on an extensive collection of woodblock books and prints from the Edo period. They mix a historical approach with themes like the depiction of demons (Yokai) in print and the franchise Yo-kai Watch.
One of the earliest comic picture books that they have is Kyōden’s Playboy, Roasted à la Edo (1785). Harvard has put up a Flash version of this in English and Japanese. (Second book here, third here.) They also have copies of Hokusai Manga (or Hokusai’s Sketches), published starting in 1814, which was a sort of manual on how to draw with lots of examples. Note that “manga” at the time didn’t mean what it means now.
There is an excellent catalogue with useful essays including one at the end on “Manga in Transition” by Jaqueline Berndt.
Schulze, S., et al. (2016). Hokusai X Manga: Japanese Pop Culture since 1680. Munich, Hirmer.
The Guardian published an article on What’s in a number? William Shakespeare’s legacy analysed (April 22, 2016). This article is part of a Shakespeare 400 series in honour of the 400th anniversary of the bard’s death. The article is introduced thus:
Shakespeare’s ability to distil human nature into an elegant turn of phrase is rightly exalted – much remains vivid four centuries after his death. Less scrutiny has been given to statistics about the playwright and his works, which tell a story in their own right. Here we analyse the numbers behind the Bard.
The authors offer a series of visualizations of statistics about Shakespeare that are rather more of a tease than anything really interesting. They also ignore the long history of using quantitative methods to study Shakespeare going back to Mendenhall’s study of authorship using word lengths.
Mendenhall, T. C. (1901). “A Mechanical Solution of a Literary Problem.” The Popular Science Monthly. LX(7): 97-105.
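For readers unfamiliar with the word-length approach, here is a minimal sketch of the kind of count involved: the relative frequency of words of each length in a text. The function name `word_length_spectrum` and the simple letters-only tokenizer are my assumptions, not Mendenhall's procedure, which was carried out by hand.

```python
import re
from collections import Counter


def word_length_spectrum(text):
    """Return {word length: relative frequency} for the alphabetic words in a text."""
    # Assumption: a word is a run of ASCII letters, so "bard's" counts as "bard" and "s".
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {}
    counts = Counter(len(word) for word in words)
    total = len(words)
    return {length: counts[length] / total for length in sorted(counts)}


print(word_length_spectrum("To be or not to be"))
```

Comparing such distributions across large samples from different authors is the basic move; a serious study would need far more text than a single line.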
On Thursday I was part of a conference here in Verona (see my conference notes) that celebrated the seminar I led at the University of Verona and the English publication of The Digital Humanist by Domenico Fiormonte, Francesca Tomasi, and Teresa Numerico (with a Preface by me). This is the English adaptation/translation of their 2010 Italian book which has finally come out in English. Here is the edited text of my presentation. (Thanks to Domenico for helping me with the Italian!)
Dear Friends and Colleagues,
Today we are here to celebrate the end of a laboratory on digital humanities and a beginning with the publication of The Digital Humanist: A Critical Inquiry by Domenico Fiormonte, Teresa Numerico and Francesca Tomasi.
Today we celebrate the end of this laboratory that we created together and the publication in English of the book L’umanista digitale, which was first published in Italian in 2010 and then updated and translated into English by Desmond Schmidt and Christopher Ferguson.
The English publication of this book matters because part of what makes it “A Critical Inquiry” is that it questions the universality of English. I use the word universality in two senses, both of which are to be questioned:
First, that there is an assumption that we need a universal language or metalanguage – a dream of philosophers, a dream that can be said to have led to the idea of a universal machine or computer,
And second, I use the word universal for the way in which English invades computing, from search engines to programming languages, as we heard today in the student presentations.
The philosopher of science and technology Langdon Winner wrote a fine essay entitled “Do Artifacts Have Politics?” In this article Winner tries to navigate between two opposing positions – that of technological determinism, which holds that every message is determined by the technology –
And, he argues that neither can technologies be said to be neutral – the argument of so many technologists that relieves them of the need to take responsibility for what they develop.
Instead Winner argues that we have to attend to the artefacts themselves – some bring baggage or structure experience and some less so.
One of the great contributions of this book is just such a critical attending to the digital artefacts themselves – especially those like search engines or electronic texts that are important to us in the humanities.
This book, instead of talking about computing in general, talks about the technologies that we use as humanists and helps us understand the importance of our work. Indeed, I would say that it helps us understand how we must take responsibility for our technologies.
As Heidegger and others point out, sometimes the hardest thing to do is to notice technologies that we use every day like the glasses on the end of our nose. We need to find ways back to noticing the systems of ready-to-hand in which we navigate our desires and dreams. That includes for Heidegger also noticing the way language itself structures our thinking.
But how can we do that? How can we attend? What practices can we draw on from the humanities?
Lev Manovich in an online essay talks about the comedy of breakdowns as an interruption that forces us to notice technology – something that was normal in Russia, but isn’t normal in the West.
Siegfried Zielinski – in Deep Time of the Media proposes an archaeology that pays attention to the failed technologies – the branches that have been left out of the origin myths.
This book provides, I think, three other, uniquely humanities ways into thinking again about technology:
First, it is written from the margins – at least the linguistic margins of an Anglophone discourse of technology (and digital humanities.) It was first written in Italian and draws on an Italian humanities computing tradition. The book reminds us to pay attention to language, so important to the humanities and technology too.
Second, it historicizes the technologies we take for granted – looking, for example, at key figures who imagined our cybernetic future.
Third, this book not only looks critically at artefacts and systems, but also looks at the ways in which we organize academic discourse about humanities computing. I would say that it treats the digital humanities as a human artefact that must also be critiqued, especially because we are blind to the ways in which the organization of the discipline follows Anglo-Saxon culture. The Digital Humanist asks us to critique how we are, and how we could be, digital humanists. This is a question of ethos: how we live with technology, and how we organize ourselves to pay attention to technology.
It is for this reason that I recommend this book especially to you doctoral students.
For those of you just discovering the digital as a subject for humanities attention I recommend this book – it is a way in for humanists.
I want to conclude with a comment on the presentation of books: if a book is like a newborn – a natio, as Vico spoke of it – it also matters how the book is raised, taught and interpreted.
Remember the lesson of Frankenstein. The tragedy is not that he was made of parts, but that he was abandoned at birth. The same can be said of the digital humanities – a field made of parts.
This is the second time that I have helped present this book. The first time was last week in Rome. I would say that I have now become a presenter with experience in child-rearing. May I announce the tour?
As I was just saying in Italian, this is the second time I present this book – and I’ve chosen to do it in two tongues – English and Italian. In this I’m drawing on a Canadian political tradition of bilingual presentations which I have always admired. Such bilingual talks weave two languages to make something that is not a universal language but is free of the particular blindness of a particular language.
My reason for switching is that if we are to avoid the universalizing tendency of technologies of thinking like language we have to habituate ourselves to travel back and forth translating and thinking across. That used to be obvious to the humanities, but we seem to have forgotten that discipline.
Crossing languages is something that you Italians are obliged to do; for us Anglo-Saxons it is a new experience. Too often we wait for the other to come to us instead of meeting halfway.
In the meantime, The Digital Humanist is an important attempt that crosses Italian and English to invite us all into dialogue.
The tiny figure crawls out from under the sands. It’s dead.
“You win,” it says. “Okay, my turn again.”
Nothing left to do. Time passes.
The sun crawls higher.
*** SHADE ***
I just finished playing the interactive fiction (IF) Shade (2000) by Andrew Plotkin. It is a poetic work that plays with the genre without playing for the sake of playing. The meditation on life and the end of the game is both real and fictional. You can see other fictions by Plotkin at Zarf’s Interactive Fiction and/or read a nice review, Enlightening Interactive Fiction: Andrew Plotkin’s Shade by Jeremy Douglass (electronic book review: 2008). I also recommend the review as a nice introduction to IF in general.
If you need some hints (as I did) see the comments here (and then enjoy his other posts).
On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic.) Here are some general reflections:
- The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string.) We need to find ways to also study the novel as material object (thing), as a social object, as a performance (of the reader), and as an economic object in a market place. Then we also have to find ways to connect these.
- So many analytical and mining processes depend on bags of words, from dictionaries to topics. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument?
- I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.
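The bag-of-words dependence noted in the reflections above can be made concrete with a small sketch. The `bag_of_words` helper and its simple tokenizer are my own illustration, not code from the project; the point is only that word order, and with it anything like plot, is discarded.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, ignoring case and word order."""
    # Assumption: a word is a run of lowercase letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


first = bag_of_words("The dog bit the man")
second = bag_of_words("The man bit the dog")
print(first == second)  # two different events, one and the same bag
```

Two sentences describing opposite events produce identical bags, which is why abstracting characters, plot, or argument would need representations that keep some structure.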