An alternate beginning to humanities computing

Reading Andrew Booth’s Mechanical Resolution of Linguistic Problems (1958), I came across some interesting passages about the beginnings of text computing that suggest an alternative to the canonical Roberto Busa origin story. Booth (the primary author) starts the book with a “Historical Introduction” in which he alludes to Busa’s project as part of a list of linguistic problems that run parallel to the problems of machine translation:

In parallel with these (machine translation) problems are various others, sometimes of a higher, sometimes of a lower degree of sophistry. There is, for example, the problem of the analysis of the frequency of occurrence of words in a given text. … Another problem of the same generic type is that of constructing concordances for given texts, that is, lists, usually in alphabetic order, of the words in these texts, each word being accompanied by a set of page and line references to the place of its occurrence. … The interest at Birkbeck College in this field was chiefly engendered by some earlier research work on the Dialogues of Plato … Parallel work in this field has been carried out by the I.B.M. Corporation, and it appears that some of this work is now being put to practical use in the preparation of a concordance for the works of Thomas Aquinas.
A more involved application of the same sort is to the stylistic analysis of a work by purely mechanical means. (pp. 5-6)

In Mechanical Resolution he continues with a discussion of how to use computers to count words and to generate concordances. He has a chapter on the problem of Plato’s dialogues, which seems to have been a set problem at that time, and, of course, there are chapters on dictionaries and machine translation. He describes some experiments he did starting in the late 1940s that suggest that Searle’s Chinese Room Argument of 1980 might have been based on real human simulations.
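
Booth’s description of the concordance problem is concrete enough that the core of such a program is easy to see. Here is a minimal sketch of the two tasks he names, counting word frequencies and indexing each word by line of occurrence; the code is mine (and in Python, which of course postdates Booth), not anything from the book:

```python
# A minimal sketch of frequency counting and concordance building
# as Booth describes them; the implementation is mine, not his.
from collections import Counter, defaultdict

def concordance(lines):
    """Return word frequencies and a word -> line-numbers index."""
    freq = Counter()
    index = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        for word in line.lower().split():
            word = word.strip('.,;:!?"()')
            if word:
                freq[word] += 1
                index[word].append(lineno)
    # Alphabetic order, as Booth notes concordances usually are.
    return freq, sorted(index.items())

text = ["In the beginning was the word", "and the word was with God"]
freq, index = concordance(text)
print(freq.most_common(3))  # e.g. [('the', 3), ('was', 2), ('word', 2)]
print(index[:3])            # [('and', [2]), ('beginning', [1]), ('god', [2])]
```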

Although no machine was available at this time (1948), the ideas of Booth and Richens were extensively tested by the construction of limited dictionaries of the type envisaged. These were used by a human untutored in the languages concerned, who applied only those rules which could eventually be performed by a machine. The results of these early ‘translations’ were extremely odd, … (p. 2)

Did others run such simulations of computing with “untutored” humans in the early years when they didn’t have access to real systems? See also the PDF of Richens and Booth, Some Methods of Mechanized Translation.
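
The procedure Booth and Richens were simulating, as described, amounts to dictionary lookup plus rules mechanical enough for an untutored human to follow. A toy reconstruction of that kind of rule-only, word-for-word translation might look like the following; the dictionary entries are invented for illustration, and their actual tables were far richer:

```python
# Toy reconstruction of rule-only, word-for-word translation of the
# kind Booth and Richens simulated by hand. The entries are invented.
DICTIONARY = {
    "le": "the", "la": "the", "chat": "cat", "noir": "black",
    "mange": "eats", "poisson": "fish",
}

def translate(sentence):
    """Gloss each word from the dictionary; flag unknown words."""
    return " ".join(DICTIONARY.get(w, f"[{w}?]")
                    for w in sentence.lower().split())

print(translate("le chat noir mange le poisson"))
# -> "the cat black eats the fish": word order is untouched, which is
# just the sort of "extremely odd" output Booth reports.
```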

As for Andrew D. Booth, he ended up in Canada working on French/English translation for the Hansard, the bilingual transcript of parliamentary debates. (Note that Bill Winder has also been working on these, but using them as source texts for bilingual collocations.) Andrew and Kathleen Booth wrote a contribution on The Origins of MT (PDF) that describes his early encounters with pioneers of computing around the possibilities of machine translation starting in 1946.

We date realistic possibilities starting with two meetings held in 1946. The first was between Warren Weaver, Director of the Natural Sciences Division of the Rockefeller Foundation, and Norbert Wiener. The second was between Weaver and A.D. Booth in that same year. The Weaver-Wiener discussion centered on the extensive code-breaking activities carried out during World War II. The argument ran as follows: decryption is simply the conversion of one set of “words”–the code–into a second set, the message. The discussion between Weaver and A.D. Booth on June 20, 1946, in New York identified the fact that the code-breaking process in no way resembled language translation because it was known a priori that the decrypting process must result in a unique output. (p. 25)

Booth seems to have successfully raised funds from the Nuffield Foundation for a computer at Birkbeck College in the University of London that was used, among others, by L. Brandwood for work on Plato. In 1962 he and his wife moved to Saskatchewan to work on bilingual translation, and then in 1972 to Lakehead in Ontario, where they “continued with emphasis on the construction of a large dictionary and the use of statistical techniques in linguistic analysis.” They retired to British Columbia in 1978, as most sensible Canadians do.

In short, Andrew Booth seems to have been involved in the design of early computers in order to get systems that could do machine translation, and that led him to support a variety of text processing projects including stylistic analysis and concording. His work has been recognized as important to the history of machine translation, but not to the history of humanities computing. Why is that?

In a 1960 paper on “The future of automatic digital computers” he concludes,

My feeling on all questions of input-output is, however, the less the better. The ideal use of a machine is not to produce masses of paper with which to encourage Parkinsonian administrators and to stifle human inventiveness, but to make all decisions on the basis of its own internal operations. Thus computers of the future will communicate directly with each other and human beings will only be called on to make those judgements in which aesthetic considerations are involved. (p. 360)

Epstein: Dialectics of “Hyper”

Mikhail Epstein’s Hyper in 20th Century Culture: The Dialectics of Transition From Modernism to Postmodernism (Postmodern Culture 6:2, 1996) explores “the intricate relationship of Modernism and Postmodernism as the two complementary aspects of one cultural paradigm which can be designated by the notion ‘hyper’ and which in the subsequent analysis will fall into the two connected categories, those of ‘super’ and ‘pseudo.'” (para 7) Epstein plays with “hyper” as a prefix for the excess that goes beyond a limit and then reflects back on itself. Modernist revolutions overturn inherited forms in search of the “super” and, in their excess of zeal, pass a limit, becoming simulations of themselves, the “pseudo.” The hyper encloses both the modernist search for the super truth and the postmodernist reaction to the simulations of modernity. The postmodern play on excess depends on the modernist move for its material, to the point where it serves to heighten (another meaning of hyper) the super-modern. Super and pseudo thus become intertwined in the ironic hyper.

In the final analysis, every “super” phenomenon sooner or later reveals its own reverse side, its “pseudo.” Such is the peculiarly postmodernist dialectics of “hyper,” distinct from both Hegelian dialectics of comprehensive synthesis and Leftist dialectics of pure negation. It is the ironic dialectics of intensification-simulation, of “super” turned into “pseudo.” (para 60)

Epstein looks at the different spheres where this hyper-unfolding takes place, using the word “hyper-textuality” in a different sense than the one usual for electronic literature. For Epstein hypertextuality describes a parallel process that happened in Russia and in the West where first modernist literary movements (Russian Formalism and Anglo-American New Criticism) stripped away the historical, authorial, and biographical to understand the pure “literariness” of literature. The purification of literature left only the text as something “wholly dependent on and even engendered by criticism.” (para 21) “Postmodernism emerged no sooner than the reality of text itself was understood as an illusionary projection of a critic’s semiotic power or, more pluralistically, any reader’s interpretative power (‘dissemination of meanings’).” (para 25)

Epstein quotes Baudrillard on the net of mass communication replacing reality with a hyperreality, but doesn’t explore how the hyper in his sense is connected to the excess of networked information. It is in another essay, “The Paradox of Acceleration,” that we see a clue,

Each singular fact becomes history the moment it appears, preserved in audio, visual, and textual images. It is recorded on tape, photographed, stored in the memory of a computer. It would be more accurate to say that each fact is generated in the form of history.

Ultimately, inscription of the fact precedes the occurrence of the fact, prescribing the forms in which it will be recorded, represented, and reflected. (p. 179)

The ironic tension of the modern and postmodern is magnified by the hyper-excess of automated inscription. The excess of information is deadening us to the human in history as an unfolding. We are in a baroque phase where the only thing valued is the hyper-excess itself. Excess of archiving, excess of theory, excess of reference, excess of quotation, excess of material, excess of publication, excess of criticism, excess of attention … but no time.

What next? Will we see the burning of books or a “simple thinking” movement? How do people react to an oppressive excess?

The essay in PMC is excerpted from “The Dialectics of Hyper: From Modernism to Postmodernism,” in Russian Postmodernism: New Perspectives on Post-Soviet Culture. Ed. M. Epstein, A. Genis, and S. Vladiv-Glover. New York: Berghahn Books, 1999. pp. 3-30.

The essay on acceleration is “The Paradox of Acceleration,” also in Russian Postmodernism, pp. 177-181.

Long Bets Now

Have you ever wanted to go on record with a prediction? Would you like to put money (that goes to charity) on your prediction? The Long Bets Foundation lets you do just that. It is a (partial) spin-off of The Long Now Foundation where you can register and make long-term predictions (up to thousands of years, I believe). The money staked by bettor and challenger goes to charity; all you get if you are right is credit and the choice of charity. An example prediction in the text analysis arena is:

Gregory W. Webster predicts: “That by 2020 a wearable device will be available that will use voice recognition capability and high-volume storage to monitor and index conversations you have or conversations which occur in your vicinity for later searching as supplemental memory.” (Prediction 16)

Some of the other predictions of interest to humanists are: 177 about print on demand, 179 about reading on digital devices, and 295 about a second renaissance.

Long Bets has some interesting people making predictions and bets (a prediction becomes a bet when formally challenged), including Ray Kurzweil betting against Mitch Kapor that “By 2029 no computer – or ‘machine intelligence’ – will have passed the Turing Test.” (Bet 1)

Just to make life interesting there is a prediction 137 that “The Long Bets Foundation will no longer exist in 2104.” 63% of the voters seem to agree!

International Network of Digital Humanities Centres

There is a call circulating to set up an International Network of Digital Humanities Centres, which looks like a good thing. It is in part a response to the Cyberinfrastructure report. The initiatives they imagine such a network being involved in are:

  • workshops and training opportunities for faculty, staff, and students
  • developing collaborative teams that are, in effect, pre-positioned to apply for predictable multi-investigator, multi-disciplinary, multi-national funding opportunities, beginning with an upcoming RFP that invites applications for supercomputing in the humanities
  • exchanging information about tools development, best practices, organizational strategies, standards efforts, and new digital collections, through a digital humanities portal

Towards a pattern language for text analysis and visualization

One outcome of the iMatter meeting in Montreal was a white paper I have started on TADA that tries to think towards a Pattern Language for Text Analysis and Visualization. This white paper is not the language itself or a catalogue of patterns, but an attempt to orient myself towards what such a pattern language would be and what the dangers of such a move would be.

Interactive Matter Meeting

This weekend I participated in an Interactive Matter (iMatter) meeting in Montreal. The meeting was to figure out next steps for the project after our SSHRC proposal was unsuccessful.

Lynn Hughes and Jane Tingley of Concordia organized meetings at, and tours of, some of the new media organizations in Montreal, including:

  • Hexagram, where we saw the textile labs, robot palace, machine shops, rapid prototyping lab, computer-controlled looms, and so on. Very impressive facilities and research projects.
  • OBORO, a new media artists centre with great video and sound facilities.
  • Fondation Daniel Langlois, where we got a tour of the Centre for Research and Documentation (CR+D), which collects materials (including grey literature) about new media art. I was disappointed to learn that, on the issue of new media preservation, they haven’t really advanced past the Variable Media Network discussion published in Permanence Through Change in 2003. They are just storing digital things in a cold dark room for the moment and concentrating on documentation.

One thing that is clear is the critical mass of artists, independent game developers, historians, philosophers, and organizations in Montreal. Montreal even has a Cité Multimédia, where the city is revitalizing an old industrial quarter to be a multimedia incubator. This public investment in multimedia arts, technology, and organizations stands in contrast to the lack of interest in cultural industries elsewhere.

Harpham: Science and the Theft of Humanity

American Scientist Online has a provocative essay by Geoffrey Harpham on Science and the Theft of Humanity. In it he argues that the humanities, which take “human beings and their thoughts, imaginings, capacities and works as its subject” (Page 2), are experiencing poaching from certain sciences, and that this is a good thing. This poaching is “the most exciting and unpredictable unintended consequence of disciplinarity” as new disciplines that don’t fit the old “gentleman’s agreement” as to who studies what begin to cross boundaries. (Page 3)

They–we–must understand that while scientists are indeed poaching our concepts, poaching in general is one of the ways in which disciplines are reinvigorated, and this particular act of thievery is nothing less than the primary driver of the transformation of knowledge today. (Page 4)

This poaching is not just a counterattack from sciences threatened by the “debunking attention” of humanities disciplines. It is a symptom of how the disciplinary divisions encoded in the post-World War II university don’t fit the fabric of current research. Humanities computing is but one case of an emerging discipline that doesn’t fit the humanities/science division. (For that matter, I don’t think computer science does either.)

One of the most striking features of contemporary intellectual life is the fact that questions formerly reserved for the humanities are today being approached by scientists in various disciplines such as cognitive science, cognitive neuroscience, robotics, artificial life, behavioral genetics and evolutionary biology. (Page 3)

I found this looking at the ASC (Autonomy | Singularity | Creativity) site of the National Humanities Center.

TextAnalyst – Text Mining or Text Analysis Software

TextAnalyst is text mining and analysis software from Megaputer. It is hard, without buying it, to tell what it does. They do have what sounds like a neat plug-in for IE that does analysis on the web page you are looking at (see the screenshot with this post). The plug-in, TextAnalyst for Microsoft Internet Explorer, summarizes web pages, provides a semantic network, and allows natural language querying.
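
Megaputer doesn’t document the method, but in tools of this kind a “semantic network” is typically a graph of salient terms weighted by how often they co-occur. The sketch below is a guess at that general idea, not TextAnalyst’s actual algorithm:

```python
# A generic co-occurrence network of the sort a "semantic network"
# feature might build; a guess, not Megaputer's actual algorithm.
import re
from collections import Counter
from itertools import combinations

def cooccurrence_network(text, min_count=1):
    """Link terms that appear in the same sentence; weight by count."""
    edges = Counter()
    for sentence in re.split(r"[.!?]", text):
        terms = {w for w in re.findall(r"[a-z]+", sentence.lower())
                 if len(w) > 3}
        for pair in combinations(sorted(terms), 2):
            edges[pair] += 1
    return {pair: n for pair, n in edges.items() if n >= min_count}

doc = "Text mining finds patterns. Text analysis finds patterns in language."
for (a, b), n in sorted(cooccurrence_network(doc).items()):
    print(f"{a} -- {b} ({n})")
```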