Show and Tell at CRIHN


Stéphane Pouyllau’s photo of me presenting

Michael Sinatra invited me to a “show and tell” workshop at the new Université de Montréal campus where they have a long data wall. Sinatra is the Director of CRIHN (Centre de recherche interuniversitaire sur les humanités numériques) and kindly invited me to show what I am doing with Stéfan Sinclair and to see what others at CRIHN and in France are doing.

The End of Agile

I knew the end of Agile was coming when we started using hockey sticks.

From Slashdot I found my way to a good essay on The End of Agile by Kurt Cagle in Forbes.

The Agile Manifesto, like most such screeds, started out as a really good idea. The core principle was simple – you didn’t really need large groups of people working on software projects to get them done. If anything, beyond a certain point extra people just added to the communication impedance and slowed a project down. Many open source projects that did really cool things were done by small development teams of between a couple and twelve people, with the ideal size being about seven.

Cagle points out that certain types of enterprise projects don’t lend themselves to agile development. In a follow-up article he provides links to rebuttals and supporting articles, including one on Agile and Toxic Masculinity (it turns out there is a lot of sporting/speed talk in agile). He proposes the Studio model as an alternative, a model based on how creative works like movies and games get made. There is an emphasis on creative direction and vision.

I wonder how this critique of agile could be adapted to critique agile-inspired management techniques?

$432 000 painting “by AI” sold at Christie’s

A painting created using GANs (generative adversarial networks) sold for $432 000 at Christie’s today.

Last year a $432 000 painting “by AI” sold at Christie’s. The painting was created by a collective called Obvious. They used a Generative Adversarial Network. In an essay titled A naive yet educated perspective on Art and Artificial Intelligence, they talk about how they created the work.

Generative Adversarial Networks (GANs) analyze tens of thousands of images, learn from their features, and are trained with the aim to create new images that are undistinguishable from the original data source.

They also point out that many of the same concerns people have about AI art today were voiced about photography in the 19th century. Photography automated the image-making business much as AIs are automating other tasks.

Can we use these GANs for other generative scholarship?
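Since the quoted description is brief, here is a minimal sketch of the adversarial setup in PyTorch, using toy two-dimensional points rather than images (my own illustration, assuming PyTorch is installed; it is not the network Obvious trained). A generator proposes samples, a discriminator learns to tell them from real data, and the two are trained against each other.

import torch
import torch.nn as nn

torch.manual_seed(0)

# "Real" data: points drawn from a 2-D Gaussian centred at (2, 2).
def real_batch(n=128):
    return torch.randn(n, 2) * 0.5 + 2.0

# The generator maps random noise to fake 2-D points;
# the discriminator scores how "real" a point looks.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

loss = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    # Train the discriminator to tell real points from generated ones.
    real = real_batch()
    fake = G(torch.randn(128, 8)).detach()
    d_loss = loss(D(real), torch.ones(128, 1)) + loss(D(fake), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator to fool the discriminator.
    fake = G(torch.randn(128, 8))
    g_loss = loss(D(fake), torch.ones(128, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)))  # generated samples should drift toward (2, 2)

Scaled up to images and convolutional networks, the same two-player loop is what produces the portraits sold as AI art.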

AI Weirdness

I just came across a neat site called AI Weirdness. The site describes all sorts of “weird” experiments in training neural networks.

The site has a nice FAQ that describes the author’s tools and how to learn to do this kind of experiment yourself.

Franken-algorithms: the deadly consequences of unpredictable code

The death of a woman hit by a self-driving car highlights an unfolding technological crisis, as code piled on code creates ‘a universe no one fully understands’

The Guardian has a good essay by Andrew Smith about Franken-algorithms: the deadly consequences of unpredictable code. The essay starts with the obvious problems of biased algorithms like those documented by Cathy O’Neil in Weapons of Math Destruction. It then goes further to talk about cases where algorithms are learning on the fly or are so complex that their behaviour becomes unpredictable. An example is the high-frequency trading algorithms that trade on the stock market. These algorithmic traders try to outwit each other and learn as they go, which leads to unpredictable “flash crashes” when they go rogue.

The problem, he (George Dyson) tells me, is that we’re building systems that are beyond our intellectual means to control. We believe that if a system is deterministic (acting according to fixed rules, this being the definition of an algorithm) it is predictable – and that what is predictable can be controlled. Both assumptions turn out to be wrong.

The good news is that, according to one of the experts consulted, this could lead to “a golden age for philosophy” as we try to sort out the ethics of these autonomous systems.

EaaSI | The Software Preservation Network

I just learned about a new project called EaaSI | The Software Preservation Network. Stanford will be one of the nodes. They are looking at how to provide emulation as a service. They are using technology from Freiburg called bwFLA Emulation as a Service.

Emulation as a strategy for digital preservation is about to become an accepted technology for memory institutions as a method for coping a large variety of complex digital objects. Hence, the demand for ready-made and especially easy-to-use emulation services will grow. In order to provide user-friendly emulation services a scalable, distributed system model is required to be run on heterogeneous Grid or Cluster infrastructure.

The Emulation-as-a-Service architecture simplifies access to preserved digital assets allowing end users to interact with the original environments running on different emulators. Ready-made emulation components provide a flexible web service API allowing for development of individual and tailored digital preservation workflows.
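The exact endpoints differ between deployments, so the following is a purely hypothetical sketch (the server URL, paths and field names are all invented for illustration; consult the EaaSI/bwFLA documentation for the real API). It only shows the general shape of a preservation workflow that requests an emulation session over a web service and waits for it to come up.

import time
import requests  # third-party HTTP library, assumed installed

BASE = "https://eaas.example.org/api"  # placeholder server

# Ask the service to start an emulated environment for a preserved object
# (both identifiers below are invented for the example).
resp = requests.post(f"{BASE}/sessions", json={
    "environmentId": "win95-office97",
    "objectId": "archive-item-1234",
})
resp.raise_for_status()
session = resp.json()

# Poll until the emulator is running, then hand the user its access URL.
while True:
    status = requests.get(f"{BASE}/sessions/{session['id']}").json()
    if status.get("state") == "running":
        print("Connect to the emulated machine at:", status.get("viewerUrl"))
        break
    time.sleep(2)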

Emulation is going to be important to game preservation. Already the Internet Archive is making games and other software available through emulation. There is also MAME (Multiple Arcade Machine Emulator), a community project that has traditionally allowed people to play older games right from the bit sequences read off cartridges.

Python Programming for the Humanities by Folgert Karsdorp

Having just finished teaching a course on Big Data and Text Analysis where I taught students Python, I can appreciate a well-written tutorial on Python. Python Programming for the Humanities by Folgert Karsdorp is a great tutorial for humanists new to programming that takes the form of a series of Jupyter notebooks that students can download. As the tutorials are notebooks, students who have set up Python on their computers can work through them interactively. Karsdorp has done a nice job of weaving in cells where the student has to code and quizzes that reinforce the material, which strikes me as an excellent use of the IPython notebook model.
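To give a sense of the format, here is a toy exercise cell of my own devising (not one of Karsdorp’s): the student fills in a few lines and a quiz-like check complains until the answer is right.

# Illustrative notebook-style exercise (not taken from the tutorial):
# count how often each word occurs in the sentence.
sentence = "the cat sat on the mat because the mat was warm"

# --- the student's code goes here -------------------------------------
counts = {}
for word in sentence.split():
    counts[word] = counts.get(word, 0) + 1
# -----------------------------------------------------------------------

# Quiz-style check: raises an AssertionError until the answer is correct.
assert counts["the"] == 3
assert counts["mat"] == 2
print("Well done!")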

I learned about this while reading a more advanced set of tutorials from Allen Riddell for Dariah-DE, Text Analysis with Topic Models for the Humanities and Social Sciences. The title doesn’t do this collection of tutorials justice because it includes a lot more than just topic models. There are advanced tutorials on all sorts of topics like machine learning and classification. See the index for the range of tutorials.

Text Analysis with Topic Models for the Humanities and Social Sciences (TAToM) consists of a series of tutorials covering basic procedures in quantitative text analysis. The tutorials cover the preparation of a text corpus for analysis and the exploration of a collection of texts using topic models and machine learning.

Stéfan Sinclair and I (mostly Stéfan) have also produced a textbook for teaching programming to humanists called The Art of Literary Text Analysis. These tutorials are also written as Jupyter notebooks so you can download them and play with them.

We are now reimplementing them with our own Voyant-based notebook environment called Spyral. See The Art of Literary Text Analysis with Spyral Notebooks. More on this in another blog entry.
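To give a flavour of what such notebooks walk students through, here is a minimal sketch of my own (not an excerpt from the book): load a plain-text novel, tokenize it crudely, and look at vocabulary size and the most frequent content words.

import re
from collections import Counter

# Placeholder path: point this at any plain-text novel, for example one
# downloaded from Project Gutenberg.
with open("novel.txt", encoding="utf-8") as f:
    text = f.read()

tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenization
stopwords = {"the", "of", "and", "to", "a", "in", "that", "is", "it", "as"}
content = [t for t in tokens if t not in stopwords and len(t) > 2]

print("Tokens:", len(tokens), "Types:", len(set(tokens)))
print("Top content words:", Counter(content).most_common(10))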

DataCamp

I’ve been playing with DataCamp’s Python lessons and they are quite good. Python is taught in the context of data analysis rather than the turtle drawing of How to Think Like a Computer Scientist. They have a nice mix of video tutorials and exercises where you get a tripartite screen: an explanation and instructions on the left, a short script to fill in on the upper right, and an interactive Python shell where you can try things out below.
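As an illustration of the kind of fill-in exercise this produces (a toy example of my own, not an actual DataCamp exercise), the instructions might ask you to compute summary statistics for a small list of measurements:

from statistics import mean

temperatures = [12.5, 14.1, 9.8, 15.3, 13.0, 11.7]

# --- fill in the script -------------------------------------------------
avg_temp = mean(temperatures)
max_temp = max(temperatures)
# -------------------------------------------------------------------------

print(f"Average: {avg_temp:.1f} C, maximum: {max_temp:.1f} C")

You run the script in the embedded shell and the lesson checks whether the result is what it expects.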

Common Errors in English Usage

An article about authorship attribution led me to this nice site on Common Errors in English Usage. The site is for a book with that title, but the author Paul Brians has organized all the errors into a hypertext here. For example, here is the entry on why you shouldn’t use “enjoy to.”

What does this have to do with authorship attribution? In a paper on Authorship Identification on the Large Scale the authors try using common errors as features to discriminate between potential authors.
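To make the idea concrete, here is a small sketch of my own (not the feature set or method from the paper): count a handful of the usage errors Brians catalogues and turn the rates into a feature vector that any standard classifier could then use to compare candidate authors.

import re

# Each pattern matches one common error; counts per 1,000 words become features.
ERROR_PATTERNS = {
    "enjoy to": r"\benjoy to\b",               # should be "enjoy" + -ing form
    "could care less": r"\bcould care less\b",
    "should of": r"\bshould of\b",             # should be "should have"
    "alot": r"\balot\b",
}

def error_features(text):
    words = max(len(text.split()), 1)
    return {name: 1000 * len(re.findall(pattern, text.lower())) / words
            for name, pattern in ERROR_PATTERNS.items()}

sample = "I enjoy to read, and I could care less what critics think."
print(error_features(sample))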

Instant History conference

This weekend I gave a talk at a lovely one-day conference on Instant History: The Postwar Digital Humanities and Their Legacies. My conference notes are here. The conference was organized by Paul Eggert, among others. Steve Jones, Ted Underwood and Laura Mandell also talked.

I gave the first talk on “Tremendous Labour: Busa’s Methods” – a paper coming from the work Stéfan Sinclair and I are doing. I talked about the reconstruction of Busa’s Index project. I claimed that Busa and Tasman made two crucial innovations. The first was figuring out how to represent data on punched cards so that it could be processed (the data structures). The second was figuring out how to use the punched card machines at hand to tokenize unstructured text. I walked through what we know about their actual methods and talked about our attempts to replicate them.
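Here is a toy sketch of those two moves in modern terms (my own illustration, not Busa and Tasman’s actual card layout): tokenize a bit of running text and emit one fixed-width “card” per word, each record carrying the word plus a locator so that the cards can later be sorted into a concordance.

# Toy illustration only: one "card" per word with a reference back to its
# location, loosely imitating fixed-width punched-card fields.
text = {
    ("ST I q.1 a.1", 1): "utrum sit necessarium praeter philosophicas disciplinas",
    ("ST I q.1 a.1", 2): "aliam doctrinam haberi",
}

cards = []
for (ref, line_no), line in text.items():
    for pos, word in enumerate(line.split(), start=1):
        cards.append(f"{word:<20}{ref:<15}{line_no:>3}{pos:>3}")

# Sorting the word-per-card records groups all occurrences of each word,
# which is essentially how a concordance gets assembled.
for card in sorted(cards):
    print(card)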

I was lucky to have two great respondents (Kyle Roberts and Shlomo Argamon) who both pointed out important contextual issues to consider, such as:

  • We need to pay attention to the Jesuit and spiritual dimensions of Busa’s work.
  • We need to think about the dialectic between those critical of computing and those optimistic about it.