Having just finished teaching a course on Big Data and Text Analysis where I taught students Python I can appreciate a well written tutorial on Python. Python Programming for the Humanities by Folgert Karsdorpis a great tutorial for humanists new to programming that takes the form of a series of Jupyter notebooks that students can download. As the tutorials are notebooks, if students have set up Python on their computers then they can use the tutorials interactively. Karsdorp has done a nice job of weaving in cells where the student has to code and Quizes which reinforce the materials which strikes me as an excellent use of the IPython notebook model.
Text Analysis with Topic Models for the Humanities and Social Sciences (TAToM) consists of a series of tutorials covering basic procedures in quantitative text analysis. The tutorials cover the preparation of a text corpus for analysis and the exploration of a collection of texts using topic models and machine learning.
Stéfan Sinclair and I (mostly Stéfan) have also produced a textbook for teaching programming to humanists called The Art of Literary Text Analysis. These tutorials are also written as Jupyter notebooks so you can download them and play with them.
I’ve been playing with DataCamp‘s Python lessons and they are quite good. Python is taught in the context of data analysis rather than the turtle drawing of How to Think Like a Computer Scientist. They have a nice mix of video tutorials and then exercises where you get a tripartite screen (see above.) You have an explanation and instructions on the left, a short script to fill in on the upper-right and interactive python shell where you can try stuff below.
An article about authorship attribution led me to this nice site on Common Errors in English Usage. The site is for a book with that title, but the author Paul Brians has organized all the errors into a hypertext here. For example, here is the entry on why you shouldn’t use enjoy to.
I gave the first talk on “Tremendous Labour: Busa’s Methods” – a paper coming from the work Stéfan Sinclair and I are doing. I talked about the reconstruction of Busa’s Index project. I claimed that Busa and Tasman made two crucial innovations. The first was figuring out how to represent data on punched cards so that it could be processed (the data structures). The second was figuring out how to use the punched card machines at hand to tokenize unstructured text. I walked through what we know about their actual methods and talked about our attempts to replicate them:
ProPublica has a great op-ed about Making Algorithms Accountable. The story starts from a decision from the Wisconsin Supreme Court on computer-generated risk (of recidivism) scores. The scores used in Wisconsin come from Northpointe who provide the scores as a service based on a proprietary alogorithm that seems biased against blacks and not that accurate. The story highlights the lack of any legislation regarding algorithms that can affect our lives.
What can we learn from the discourse around text tools? More than might be expected. The development of text analysis tools has been a feature of computing in the humanities since IBM supported Father Busa’s production of the Index Thomisticus (Tasman 1957). Despite the importance of tools in the digital humanities (DH), few have looked at the discourse around tool development to understand how the research agenda changed over the years. Recognizing the need for such an investigation a corpus of articles from the entire run of Computers and the Humanities (CHum) was analyzed using both distant and close reading techniques. By analyzing this corpus using traditional category assignments alongside topic modelling and statistical analysis we are able to gain insight into how the digital humanities shaped itself and grew as a discipline in what can be considered its “middle years,” from when the field professionalized (through the development of journals like CHum) to when it changed its name to “digital humanities.” The initial results (Simpson et al. 2013a; Simpson et al. 2013b), are at once informative and surprising, showing evidence of the maturation of the discipline and hinting at moments of change in editorial policy and the rise of the Internet as a new forum for delivering tools and information about them.
This is a story from early in the technological revolution, when the application was out searching for the hardware, from a time before the Internet, a time before the PC, before the chip, before the mainframe. From a time even before programming itself. (Winter 1999, 3)
Father Busa is rightly honoured as one of the first humanists to use computing for a humanities research task. He is considered the founder of humanities computing for his innovative application of information technology and for the considerable influence of his project and methods, not to mention his generosity to others. He did not only work out how use the information technology of the late 1940s and 1950s, but he pioneered a relationship with IBM around language engineering and with their support generously shared his knowledge widely. Ironically, while we have all heard his name and the origin story of his research into presence in Aquinas, we know relatively little about what actually occupied his time – the planning and implementation of what was for its time one of the major research computing projects, the Index Thomsticus.
Stephen Wolfram has written a nice long blog essay on Untangling the Tale of Ada Lovelace. He tackles the question of whether Ada really contributed or was overestimated. He provides a biography of both Ada and Babbage. He speculates about what they were like and could have been. He believes Ada saw the big picture in a way Babbage didn’t and was able to communicate it.
Ada Lovelace was an intelligent woman who became friends with Babbage (there’s zero evidence they were ever romantically involved). As something of a favor to Babbage, she wrote an exposition of the Analytical Engine, and in doing so she developed a more abstract understanding of it than Babbage had—and got a glimpse of the incredibly powerful idea of universal computation.
The essay reflects on what might have happened if Ada had not died prematurely. Wolfram thinks they would have finished the Analytical Engine and possibly explored building an electromechanical version.
We will never know what Ada could have become. Another Mary Somerville, famous Victorian expositor of science? A Steve-Jobs-like figure who would lead the vision of the Analytical Engine? Or an Alan Turing, understanding the abstract idea of universal computation?
That Ada touched what would become a defining intellectual idea of our time was good fortune. Babbage did not know what he had; Ada started to see glimpses and successfully described them.
Lately I’ve been trying Wolfram Mathematica more an more for analytics. I was introduced to Mathematica by Bill Turkel and Ian Graham who have done some impressive stuff with it. Bill Turkel has now created a open access, open content, and open source textbook Digital Research Methods with Mathematica. The text is a Mathematica notebook itself so, if you have Mathematica you can actually use the text to do analytics on the spot.