High Performance Computing in the Humanities

High Performance Computing and Grid Computing are two terms used to describe new approaches to the use of computing in research, primarily in the sciences and engineering. These terms refer to trends at the high end of research computing, where often unique systems are put together to solve computationally complex problems faster. Supercomputing, as it used to be called, focuses on grand challenge problems like protein folding and weather modeling, where computation can make a difference, but it is also concerned with computation and processing speed in and of themselves, developing new ways of solving problems quickly through parallel processing on grids and clusters of often off-the-shelf PCs.
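To make the idea of parallel processing a little more concrete, here is a minimal sketch in Python of the pattern that grids and clusters exploit: a large job is split into independent pieces, the pieces are processed at the same time, and the partial results are combined. The documents and worker count are invented for illustration; on a real cluster the workers would be separate machines rather than processes on one PC.

```python
# A minimal sketch of the "split, process in parallel, combine" pattern
# behind cluster and grid computing. The corpus is invented; on a real
# cluster each worker would be a separate off-the-shelf machine.
from collections import Counter
from multiprocessing import Pool

def count_words(text):
    # The independent piece of work: word frequencies for one document.
    return Counter(text.lower().split())

if __name__ == "__main__":
    documents = [
        "to be or not to be that is the question",
        "whether tis nobler in the mind to suffer",
        "the slings and arrows of outrageous fortune",
    ]

    # Hand each document to a worker; the workers run concurrently.
    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_words, documents)

    # Combine the partial results into one frequency table.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)

    print(total.most_common(5))
```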

Why is this of interest to literary and linguistic research? What is literary about the quest for computational speed?

First, high performance computing is a subculture of the university that influences cyberculture both through its metaphors, competitions and rhetoric, and through its engineering achievements. HPC consortia are often behind the dramatic visualizations that become visual icons for the future of computing. Competition to have one of the world’s fastest supercomputers is a matter of national technological pride and can lead to engineering breakthroughs that become available in systems we use every day. The challenge of speed is one of those iconic modernist challenges at the edge of computing culture that influences the language we use and the science fiction literature of cyberculture.

Second, HPC is used in everyday applications like Google, and these techniques are being adapted to large-scale text challenges. One textual challenge that is computationally tractable is that of indexing all of the world’s accessible web pages while allowing millions of people to search the ever-increasing index. While we don’t know what Google’s system looks like, it is bound to be some form of processing “farm” with teraflops (trillions of floating point operations per second) of processing power. On a smaller scale, digital humanities researchers have been developing distributed grid systems like TAPoR (Text Analysis Portal for Research) for text analysis and, like the NORA project, adapting data mining techniques to the study of large-scale literary collections. Developing a community of researchers able to work on the breadth of evidence of human language, literature and visual culture is a Grand Challenge, probably of greater interest to the wider community than tomorrow’s weather (or perhaps not). William A. Wulf, President of the National Academy of Engineering, wrote in a 1998 submission to Congress on High Performance Computing:

Many of the high visibility applications of IT currently are in the sciences and they deserve continued support. But the humanities offer a new opportunity to explore how information technology can be employed in fundamentally different ways that will provide fresh insights and enrich research in other applications. Effectively representing and enhancing an understanding of the human record presents an interesting challenge to information technology research. In addition, attention must be paid to how to archive digital materials and preserve them for the future. What is required is not only research into technologies that will assist in maintaining digital material over time, but also the creation of an infrastructure that will support digital archives.

The availability of a critical mass of the human cultural record in digital form gives us the chance to ask new types of questions of large-scale collections. With distributed computing, the scale of research could change from the individual author and work to geospatial and diachronic collections. What sort of difference would that make?
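As a rough illustration of the kind of diachronic question such collections invite, the sketch below tracks how often a word appears, per thousand words, decade by decade. The dated corpus is invented for the example; at the scale of millions of digitized texts the same computation would be distributed across a grid, but the shape of the question stays the same.

```python
# A sketch of a diachronic question: how does a word's relative frequency
# change over time in a dated collection? The corpus here is invented;
# a real study would draw on a large digital collection.
from collections import defaultdict

# (year, text) pairs standing in for a dated collection.
corpus = [
    (1805, "the engine of the state and the engine of war"),
    (1855, "the railway engine and the telegraph changed everything"),
    (1905, "the motor engine hums while the telegraph falls silent"),
    (1955, "the computer replaces the engine as the figure of the age"),
]

def frequency_by_decade(corpus, word):
    # Occurrences of the word per 1000 words, grouped by decade.
    word_counts = defaultdict(int)
    token_counts = defaultdict(int)
    for year, text in corpus:
        decade = (year // 10) * 10
        tokens = text.lower().split()
        token_counts[decade] += len(tokens)
        word_counts[decade] += tokens.count(word.lower())
    return {d: 1000 * word_counts[d] / token_counts[d] for d in sorted(token_counts)}

print(frequency_by_decade(corpus, "engine"))
```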

This was written for a possible MLA session in 2007 as I meditated on my previous post on High Performance Computing.

Some useful links are Grid Computing in the Wikipedia, Supercomputer, also in the Wikipedia, and the Top 500 Supercomputer Sites.