Eder has a nice page about the work he and others in the Computational Stylistics Group are doing. In the workshop sessions I was able to attend he showed us how to set up and run his “stylo” package (PDF) that provides a simple user interface over R for doing stylometry. He also showed us how to then use Gephi for network visualization.
Information is Beautiful has a great interactive on World’s Biggest Data Breaches & Hacks. The interactive shows how data breaches are getting worse, but it also lets you look at different types of breaches.
I could see in my daily work how difficult it was to inform people about their privacy issues. Nobody seemed to care. My hypothesis was that the whole subject was too complex. There were no examples, no images that could help the audience to understand the process behind the mass surveillance.
The answer is to mock up a design fiction of an NSA surveillance dashboard based on what we know and then a video describing a fictional use of it to track an architecture student from Berlin. It seems to me the video and mock designs nicely bring together a number of things we can infer about the tools they have.
Past Visions: penned by Frederick William IV is a lovely visualization of his historical sketches and doodles. The visualization has a rich prospect view where you see miniatures of all the sketches arranged over time. You can zoom in and out or use the keywords to see subsets. There is information available about each sketch (in German).
At the end of April I gave a talk at the University of Würzburg on Replication as a way of knowing in the digital humanities. This was sponsored by Dr. Fotis Jannidis, who holds the position of Chair of computer philology and modern German literature there. He and others have built a digital humanities program and an interesting research agenda around text mining and German literature. The talk tried out some new ideas Stéfan Sinclair and I are working on. The abstract read:
Much new knowledge in the digital humanities comes from the practices of encoding and programming, not through discourse. These practices can be considered forms of modelling in the active sense of making by modelling or, as I like to call them, practices of thinking-through. Alas, these practices and the associated ways of knowing are not captured or communicated very well through the usual academic forms of publication which come out of discursive knowledge traditions. In this talk I will argue for “replication” as a way of thinking-through the making of code. I will give examples and conclude by arguing that such thinking-through replication is critical to the digital literacy needed in the age of big data and algorithms.
What can we learn from the discourse around text tools? More than might be expected. The development of text analysis tools has been a feature of computing in the humanities since IBM supported Father Busa’s production of the Index Thomisticus (Tasman 1957). Despite the importance of tools in the digital humanities (DH), few have looked at the discourse around tool development to understand how the research agenda changed over the years. Recognizing the need for such an investigation, we analyzed a corpus of articles from the entire run of Computers and the Humanities (CHum) using both distant and close reading techniques. By analyzing this corpus using traditional category assignments alongside topic modelling and statistical analysis we are able to gain insight into how the digital humanities shaped itself and grew as a discipline in what can be considered its “middle years,” from when the field professionalized (through the development of journals like CHum) to when it changed its name to “digital humanities.” The initial results (Simpson et al. 2013a; Simpson et al. 2013b) are at once informative and surprising, showing evidence of the maturation of the discipline and hinting at moments of change in editorial policy and the rise of the Internet as a new forum for delivering tools and information about them.
I just discovered that IBM is to close Many Eyes. This is a pity. It was a great environment that let people upload data and visualize it in different ways. I blogged about it ages ago (in computer ages anyway). In particular I liked their Word Tree, which seems one of the best ways to explore language use.
It seems that some of the programmers moved on and that IBM is now focusing on Watson Analytics.
Shakespeare’s ability to distil human nature into an elegant turn of phrase is rightly exalted – much remains vivid four centuries after his death. Less scrutiny has been given to statistics about the playwright and his works, which tell a story in their own right. Here we analyse the numbers behind the Bard.
The authors offer a series of visualizations of statistics about Shakespeare that are rather more of a tease than anything really interesting. They also ignore the long history of using quantitative methods to study Shakespeare going back to Mendenhall’s study of authorship using word lengths.
Mendenhall, T. C. (1901). “A Mechanical Solution of a Literary Problem.” The Popular Science Monthly. LX(7): 97-105.
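To give a rough sense of what a word-length approach involves, here is a minimal sketch in Python that computes the relative frequency of each word length in a text. It is only an illustration of the general idea, not a reconstruction of Mendenhall's actual procedure. The function name `word_length_profile`, the regular-expression tokenizer, and the cap of 15 letters are all my own assumptions.

```python
import re
from collections import Counter


def word_length_profile(text, max_len=15):
    """Return relative frequencies of word lengths 1..max_len.

    Hypothetical helper for illustration. Words are taken to be runs of
    ASCII letters (an assumption), and any word longer than max_len is
    counted in the last bin.
    """
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(min(len(w), max_len) for w in words)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * max_len
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


sample = "To be or not to be that is the question"
profile = word_length_profile(sample)

# Print the share of words at each length that actually occurs.
for length, share in enumerate(profile, start=1):
    if share > 0:
        print(f"{length:2d} letters: {share:.2f}")
```

Comparing such profiles across texts of known and disputed authorship would need much larger samples than a single line, and some measure of distance between the resulting curves.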