An alternate beginning to humanities computing

Reading Andrew Booth’s Mechanical Resolution of Linguistic Problems (1958), I came across some interesting passages about the beginnings of text computing that suggest an alternative to the canonical Roberto Busa origin story. Booth (the primary author) starts the book with a “Historical Introduction” in which he alludes to Busa’s project as part of a list of linguistic problems that run parallel to the problems of machine translation:

In parallel with these (machine translation) problems are various others, sometimes of a higher, sometimes of a lower degree of sophistry. There is, for example, the problem of the analysis of the frequency of occurrence of words in a given text. … Another problem of the same generic type is that of constructing concordances for given texts, that is, lists, usually in alphabetic order, of the words in these texts, each word being accompanied by a set of page and line references to the place of its occurrence. … The interest at Birkbeck College in this field was chiefly engendered by some earlier research work on the Dialogues of Plato … Parallel work in this field has been carried out by the I.B.M. Corporation, and it appears that some of this work is now being put to practical use in the preparation of a concordance for the works of Thomas Aquinas.
A more involved application of the same sort is to the stylistic analysis of a work by purely mechanical means. (pp. 5-6)
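Booth’s description amounts to an algorithm one can state in a few lines. Here is a minimal sketch in Python of the word-frequency counting and concordance building he describes; it is purely illustrative (my own simplification, not Booth’s program), with deliberately naive tokenization.

```python
# A toy concordance builder: word frequencies plus, for each word,
# the line numbers where it occurs, listed in alphabetic order.
from collections import Counter, defaultdict

def build_concordance(text):
    """Return (frequencies, word -> line numbers) for a text."""
    freq = Counter()
    conc = defaultdict(list)
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.lower().split():
            word = token.strip('.,;:!?"\'()')
            if word:
                freq[word] += 1
                conc[word].append(lineno)
    return freq, dict(sorted(conc.items()))  # alphabetic order

sample = ("The safest general characterization of the European\n"
          "philosophical tradition is that it consists of a series\n"
          "of footnotes to Plato.")
freq, conc = build_concordance(sample)
for word, lines in conc.items():
    print(f"{word}: {freq[word]} occurrence(s) at line(s) {lines}")
```

A real concordance would add page references and keywords in context, but the core bookkeeping is no more than this.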

In Mechanical Resolution he continues with a discussion of how to use computers to count words and to generate concordances. He has a chapter on the problem of Plato’s dialogues, which seems to have been a set problem at the time, and, of course, there are chapters on dictionaries and machine translation. He also describes experiments he began in the late 1940s which suggest that Searle’s Chinese Room Argument of 1980 may have had real precursors in human simulations of computing.

Although no machine was available at this time (1948), the ideas of Booth and Richens were extensively tested by the construction of limited dictionaries of the type envisaged. These were used by a human untutored in the languages concerned, who applied only those rules which could eventually be performed by a machine. The results of these early ‘translations’ were extremely odd, … (p. 2)

Did others run such simulations of computing with “untutored” humans in the early years when they didn’t have access to real systems? See also the PDF of Richens and Booth, Some Methods of Mechanized Translation.
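To make concrete what that “untutored” human was doing, here is a toy sketch of the word-for-word dictionary lookup involved. The miniature French-English dictionary is my own invention for illustration; the actual dictionaries and rules of Booth and Richens were more elaborate.

```python
# Word-for-word dictionary translation, as a human "machine" might
# apply it: look each word up, mark anything not in the dictionary.
TOY_DICTIONARY = {  # hypothetical entries, for illustration only
    "le": "the", "la": "the", "chat": "cat",
    "est": "is", "sur": "on", "table": "table",
}

def translate(sentence):
    """Replace each word with its dictionary entry, flagging unknowns."""
    return " ".join(
        TOY_DICTIONARY.get(word, f"[{word}?]")
        for word in sentence.lower().split()
    )

print(translate("Le chat est sur la table"))  # the cat is on the table
```

Anything outside the dictionary, or any idiom, produces exactly the kind of “extremely odd” output Booth mentions.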

As for Andrew D. Booth, he ended up in Canada working on French/English translation of the Hansard, the bilingual transcript of parliamentary debates. (Note that Bill Winder has also been working on these, but using them as source texts for bilingual collocations.) Andrew and Kathleen Booth wrote a contribution on The Origins of MT (PDF) that describes his early encounters with pioneers of computing around the possibilities of machine translation, starting in 1946.

We date realistic possibilities starting with two meetings held in 1946. The first was between Warren Weaver, Director of the Natural Sciences Division of the Rockefeller Foundation, and Norbert Wiener. The second was between Weaver and A.D. Booth in that same year. The Weaver-Wiener discussion centered on the extensive code-breaking activities carried out during World War II. The argument ran as follows: decryption is simply the conversion of one set of “words”–the code–into a second set, the message. The discussion between Weaver and A.D. Booth on June 20, 1946, in New York identified the fact that the code-breaking process in no way resembled language translation because it was known a priori that the decrypting process must result in a unique output. (p. 25)

Booth seems to have successfully raised funds from the Nuffield Foundation for a computer at Birkbeck College, University of London, which was used by L. Brandwood, among others, for work on Plato. In 1962 he and his wife moved to Saskatchewan to work on bilingual translation, and in 1972 to Lakehead in Ontario, where they “continued with emphasis on the construction of a large dictionary and the use of statistical techniques in linguistic analysis.” They retired to British Columbia in 1978, as most sensible Canadians do.

In short, Andrew Booth seems to have been involved in the design of early computers in order to get systems that could do machine translation, and that led him to support a variety of text-processing projects, including stylistic analysis and concording. His work has been picked up as important to the history of machine translation, but not to the history of humanities computing. Why is that?
In a 1960 paper on The future of automatic digital computers, he concludes:

My feeling on all questions of input-output is, however, the less the better. The ideal use of a machine is not to produce masses of paper with which to encourage Parkinsonian administrators and to stifle human inventiveness, but to make all decisions on the basis of its own internal operations. Thus computers of the future will communicate directly with each other and human beings will only be called on to make those judgements in which aesthetic considerations are involved. (p. 360)

Harpham: Science and the Theft of Humanity

American Scientist Online has a provocative essay by Geoffrey Harpham, Science and the Theft of Humanity. In it he argues that the humanities, which take “human beings and their thoughts, imaginings, capacities and works as its subject” (Page 2), are being poached by certain sciences, and that this is a good thing. This poaching is “the most exciting and unpredictable unintended consequence of disciplinarity” as new disciplines that don’t fit the old “gentleman’s agreement” as to who studies what begin to cross boundaries. (Page 3)

They–we–must understand that while scientists are indeed poaching our concepts, poaching in general is one of the ways in which disciplines are reinvigorated, and this particular act of thievery is nothing less than the primary driver of the transformation of knowledge today. (Page 4)

This poaching is not just a counterattack from sciences threatened by the “debunking attention” of humanities disciplines. It is a symptom of how the disciplinary divisions encoded in the post-World War II university don’t fit the fabric of current research. Humanities computing is but one case of an emerging discipline that doesn’t fit the humanities/sciences division. (For that matter, I don’t think computer science does either.)

One of the most striking features of contemporary intellectual life is the fact that questions formerly reserved for the humanities are today being approached by scientists in various disciplines such as cognitive science, cognitive neuroscience, robotics, artificial life, behavioral genetics and evolutionary biology. (Page 3)

I found this while looking at the ASC (Autonomy | Singularity | Creativity) site of the National Humanities Center.

Ada: The Enchantress of Numbers

Ada: The Enchantress of Numbers is a biography of Ada and a selection of her letters by Betty Alexandra Toole (Mill Valley, CA: Strawberry Press, 1998). The work is, as the author writes in the Acknowledgements, “the result of more than twenty years of addiction to Ada.” (page ix) This addiction shows itself in, for example, Toole’s e-mail address, “adatoole at well dot com”. Toole seems concerned to protect Ada from claims that she was a drug addict and compulsive gambler, though she doesn’t so much argue the case as unleash it. The paperback (and hardback) is published by Strawberry Press (“Strawberry Press publishes reference books for the succulent world …”), and much is made of the cover designer, Leah Schwartz, whose book Leah Schwartz: the life of a woman who managed to keep painting was also published by Strawberry Press. I’m not sure I would keep painting covers like the one for Ada.

Despite the strange presence of the author/editor and cover designer, the book nicely gathers Ada’s letters and her notes on Babbage’s Analytical Engine with biographical context. The correspondence with Babbage is startling: it is clear how firm Ada was with him about the publication of her translation and notes (she refused to let Babbage append a rant about funding and almost fell out with him over it). The annotated selections from her notes on the Analytical Engine also make clear that they were an original reflection on the Engine. I’m curious now about what they thought “analysis” was then.

What comes through about her personality is that she was a brilliant woman, constantly sick, struggling with her mother (who pushed her into mathematics), and socially connected to many of the leading scientists and mathematicians of the day (like Babbage). Her last months, as documented by Toole, are heartbreaking.

BBC: Fifteen years of the web

The BBC has a nice interactive timeline on Fifteen years of the web. It includes such “firsts” as the first webcam to go online – watching a coffee pot at Cambridge University (picture of it being disconnected here). Apparently the coffee percolator went offline in 2001 and was sold to Spiegel (who seem to have put it, or another one, back online).

I suspect the dot-com crash was due to a lack of coffeecams. The timing of the disconnection is suspicious.

Computer History Museum – Selling the Computer Revolution – Marketing Brochures in the Collection

Computer History Museum – Selling the Computer Revolution – Marketing Brochures in the Collection is a magnificent site that makes available brochures and manuals from the museum’s collection. These include the Apple-1 Operation Manual. The cover images alone make an interesting study.

There are many ways to study the history of a technological topic. One of the most neglected, though also the most revealing, is to look at the advertising materials companies have produced to promote their products. In a technical field such as computing, buying decisions, as expressed in such materials, are often based on a complex blend of ‘atmospheric’ messages focusing on status, and highly-detailed technical information about the product itself. (From the Overview)

Lowood: The Hard Work of Software History

Thanks to Matt K, who pointed me to this essay on the history and historiography of software, The Hard Work of Software History. In it, Henry Lowood documents the problems with studying the history of software, including the problems of preserving software for study.

A new twist in the Silicon Valley Project has been the acquisition of software in various forms, accompanied by research projects that seek to tell the story of the Silicon Valley in its own medium. In the first instance, the libraries have acquired materials such as data tapes from Engelbart’s ARC projects, hard-disk images along with collections of personal papers such as those of Jef Raskin and Mark Weiser, e-mail archives, … Each of these formats requires special strategies for evaluating, recovering, stabilizing, possibly reformatting, and indexing content. In some cases, the strategies do not yet exist … (p. 17)

The main problem is the medium of study. “Traditional models of access focused on the service desk and reading room as means of mediating complex systems of indexing and identification of materials, as well as supervised reading, fall apart in delivery contexts shaped by computer hardware and virtual libraries of born-digital materials.” (p. 18) The practices of historians are also formed by the medium of their archives. Software is used, not read, and software archives are more likely to look like the historical woodworking shop at Williamsburg, where tools are tried in traditional practices, than like library reading rooms.

This article cites two other important works: Weiser’s The Computer for the 21st Century from Scientific American (1991), which talks about “ubiquitous computing”; and Kittler’s There Is No Software from CTheory: Theory, Technology, Culture 32 (Oct. 1995). Lowood ends by countering Kittler to the effect that “Kittler’s admonition that ‘there is no software’ provides little relief to archivists and librarians who discover that there is more of it than they can handle.” (p. 20)

Innovation in Information Technology

The 2001 report from the Computer Science and Telecommunications Board of the National Research Council (of the National Academies of the USA), Innovation in Information Technology, has interesting charts showing how key technologies like the Internet benefited from government research support. See Figure 1. The report introduces the figure thus:

Figure 1 illustrates some of the many cases in which fundamental research in IT, conducted in industry and universities, led 10 to 15 years later to the introduction of entirely new product categories that became billion-dollar industries. It also illustrates the complex interplay between industry, universities, and government. The flow of ideas and people—the interaction between university research, industry research, and product development—is amply evident. (Chapter 1)