The death of a woman hit by a self-driving car highlights an unfolding technological crisis, as code piled on code creates ‘a universe no one fully understands’
The Guardian has a good essay by Andrew Smith, “Franken-algorithms: the deadly consequences of unpredictable code.” The essay starts with the obvious problems of biased algorithms like those documented by Cathy O’Neil in Weapons of Math Destruction. It then goes further to talk about cases where algorithms are learning on the fly or are so complex that their behaviour becomes unpredictable. One example is the high-frequency trading algorithms that trade on the stock market. These algorithmic traders try to outwit each other and learn as they go, which can lead to unpredictable “flash crashes” when they go rogue.
The problem, he (George Dyson) tells me, is that we’re building systems that are beyond our intellectual means to control. We believe that if a system is deterministic (acting according to fixed rules, this being the definition of an algorithm) it is predictable – and that what is predictable can be controlled. Both assumptions turn out to be wrong.
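Dyson’s point, that a deterministic system need not be predictable, can be seen in a few lines of code. The following is a minimal Python sketch using the logistic map, a standard textbook example of deterministic chaos. It is my illustration, not an example from the essay, and the parameter values (r = 4, a starting point of 0.2, a perturbation of 1e-10) are arbitrary choices.

```python
def logistic_trajectory(x0: float, r: float = 4.0, steps: int = 50) -> list[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n), a fixed and fully deterministic rule."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# The same rule applied to the same input always gives the same numbers,
# but the tiny initial difference grows until the trajectories diverge.
for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: {a[n]:.6f} vs {b[n]:.6f}  (gap {abs(a[n] - b[n]):.2e})")
```

Every run produces exactly the same output, yet for r = 4 the gap between the two runs roughly doubles on average at each step, so after a few dozen steps knowing the rule tells you almost nothing about where the second trajectory is.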
The good news is that, according to one of the experts consulted, this could lead to “a golden age for philosophy” as we try to sort out the ethics of these autonomous systems.
Emulation as a strategy for digital preservation is about to become an accepted technology for memory institutions as a method for coping with a large variety of complex digital objects. Hence, the demand for ready-made and especially easy-to-use emulation services will grow. In order to provide user-friendly emulation services, a scalable, distributed system model that can run on heterogeneous Grid or Cluster infrastructure is required.
The Emulation-as-a-Service architecture simplifies access to preserved digital assets allowing end users to interact with the original environments running on different emulators. Ready-made emulation components provide a flexible web service API allowing for development of individual and tailored digital preservation workflows.
Having just finished teaching a course on Big Data and Text Analysis, where I taught students Python, I can appreciate a well-written tutorial on Python. Python Programming for the Humanities by Folgert Karsdorp is a great tutorial for humanists new to programming; it takes the form of a series of Jupyter notebooks that students can download. Because the tutorials are notebooks, students who have set up Python on their computers can use them interactively. Karsdorp has done a nice job of weaving in cells where students have to write code and quizzes that reinforce the material, which strikes me as an excellent use of the IPython notebook model.
Text Analysis with Topic Models for the Humanities and Social Sciences (TAToM) consists of a series of tutorials covering basic procedures in quantitative text analysis. The tutorials cover the preparation of a text corpus for analysis and the exploration of a collection of texts using topic models and machine learning.
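For readers new to this, preparing a corpus usually means turning each text into counts of words, a document-term matrix, which is the kind of input topic models such as LDA typically work from. Here is a minimal Python sketch of that step using only the standard library. It is not code from TAToM, and the toy texts, the tiny stopword list, and the function names are illustrative assumptions.

```python
import re
from collections import Counter

# Toy corpus: invented sentences, not data from the tutorials.
corpus = {
    "doc1": "The ship sailed and the ship sank.",
    "doc2": "A truth is a truth, however told.",
    "doc3": "The sailor told the truth about the ship.",
}

# A tiny illustrative stopword list (real lists are much longer).
STOPWORDS = {"the", "a", "and", "is", "about"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text, keep runs of letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def document_term_matrix(docs: dict[str, str]) -> tuple[list[str], list[str], list[list[int]]]:
    """Return (doc ids, vocabulary, counts) where counts[i][j] is how
    often vocabulary[j] occurs in document i."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    vocab = sorted(set().union(*counts.values()))
    names = list(docs)
    matrix = [[counts[name][term] for term in vocab] for name in names]
    return names, vocab, matrix

names, vocab, matrix = document_term_matrix(corpus)
print(vocab)
for name, row in zip(names, matrix):
    print(name, row)
```

Each row is one document and each column one word, so questions about which words cluster together across documents become questions about this table of counts.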
Stéfan Sinclair and I (mostly Stéfan) have also produced a textbook for teaching programming to humanists called The Art of Literary Text Analysis. These tutorials are also written as Jupyter notebooks so you can download them and play with them.
I’ve been playing with DataCamp’s Python lessons and they are quite good. Python is taught in the context of data analysis rather than through the turtle drawing of How to Think Like a Computer Scientist. They have a nice mix of video tutorials and exercises where you get a tripartite screen: an explanation and instructions on the left, a short script to fill in on the upper right, and an interactive Python shell below it where you can try things out.
An article about authorship attribution led me to this nice site on Common Errors in English Usage. The site is for a book with that title, but the author, Paul Brians, has organized all the errors into a hypertext here. For example, here is the entry on why you shouldn’t use “enjoy to.”
I gave the first talk on “Tremendous Labour: Busa’s Methods” – a paper coming from the work Stéfan Sinclair and I are doing. I talked about the reconstruction of Busa’s Index project. I claimed that Busa and Tasman made two crucial innovations. The first was figuring out how to represent data on punched cards so that it could be processed (the data structures). The second was figuring out how to use the punched card machines at hand to tokenize unstructured text. I walked through what we know about their actual methods and talked about our attempts to replicate them:
ProPublica has a great op-ed, “Making Algorithms Accountable.” The story starts from a decision by the Wisconsin Supreme Court on computer-generated risk (of recidivism) scores. The scores used in Wisconsin come from Northpointe, which provides them as a service based on a proprietary algorithm that appears to be biased against black defendants and not very accurate. The story highlights the lack of any legislation regarding algorithms that can affect our lives.
What can we learn from the discourse around text tools? More than might be expected. The development of text analysis tools has been a feature of computing in the humanities since IBM supported Father Busa’s production of the Index Thomisticus (Tasman 1957). Despite the importance of tools in the digital humanities (DH), few have looked at the discourse around tool development to understand how the research agenda changed over the years. Recognizing the need for such an investigation, we analyzed a corpus of articles from the entire run of Computers and the Humanities (CHum) using both distant and close reading techniques. By analyzing this corpus using traditional category assignments alongside topic modelling and statistical analysis, we are able to gain insight into how the digital humanities shaped itself and grew as a discipline in what can be considered its “middle years,” from when the field professionalized (through the development of journals like CHum) to when it changed its name to “digital humanities.” The initial results (Simpson et al. 2013a; Simpson et al. 2013b) are at once informative and surprising, showing evidence of the maturation of the discipline and hinting at moments of change in editorial policy and the rise of the Internet as a new forum for delivering tools and information about them.
This is a story from early in the technological revolution, when the application was out searching for the hardware, from a time before the Internet, a time before the PC, before the chip, before the mainframe. From a time even before programming itself. (Winter 1999, 3)
Father Busa is rightly honoured as one of the first humanists to use computing for a humanities research task. He is considered the founder of humanities computing for his innovative application of information technology and for the considerable influence of his project and methods, not to mention his generosity to others. He not only worked out how to use the information technology of the late 1940s and 1950s, but also pioneered a relationship with IBM around language engineering and, with its support, generously shared his knowledge widely. Ironically, while we have all heard his name and the origin story of his research into presence in Aquinas, we know relatively little about what actually occupied his time: the planning and implementation of what was, for its time, one of the major research computing projects, the Index Thomisticus.
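The kind of index Busa was after, where every word form is listed with all the places it occurs, rests on the two steps I attributed to Busa and Tasman in the “Tremendous Labour” talk: breaking unstructured text into one record per word, and then sorting those records. As a rough modern analogy only, here is a minimal Python sketch. It is not a reconstruction of their punched-card procedures; the record layout (form, work, line, position), the function names, and the sample text are my own illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical record layout: (word form, work, line number, position in line).
Record = tuple[str, str, int, int]

def word_records(text: str, work: str) -> list[Record]:
    """Tokenize running text into one record per word occurrence,
    loosely analogous to punching one card per word."""
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pos, match in enumerate(re.finditer(r"\w+", line), start=1):
            records.append((match.group().lower(), work, line_no, pos))
    return records

def build_index(records: list[Record]) -> dict[str, list[tuple[str, int, int]]]:
    """Sort the records by form (as one would sort a deck of cards) and
    collect every location under its word form."""
    index: dict[str, list[tuple[str, int, int]]] = defaultdict(list)
    for form, work, line_no, pos in sorted(records):
        index[form].append((work, line_no, pos))
    return dict(index)

# Sample text: the opening words of John 1:1 in the Latin Vulgate, over two lines.
sample = "In principio erat verbum\net verbum erat apud Deum"
index = build_index(word_records(sample, "sample"))
for form, locations in index.items():
    print(form, locations)
```

Separating the two steps mirrors the point of the talk: once each word carries its own reference, building the index is largely a matter of sorting.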