The Index Thomisticus as Project

This is a story from early in the technological revolution, when the application was out searching for the hardware, from a time before the Internet, a time before the PC, before the chip, before the mainframe. From a time even before programming itself. (Winter 1999, 3)


Father Busa is rightly honoured as one of the first humanists to use computing for a humanities research task. He is considered the founder of humanities computing for his innovative application of information technology and for the considerable influence of his project and methods, not to mention his generosity to others. He not only worked out how to use the information technology of the late 1940s and 1950s, but also pioneered a relationship with IBM around language engineering and, with their support, generously shared his knowledge widely. Ironically, while we have all heard his name and the origin story of his research into presence in Aquinas, we know relatively little about what actually occupied his time – the planning and implementation of what was for its time one of the major research computing projects, the Index Thomisticus.

This blog essay is an attempt to outline some of the features of the Index Thomisticus as a large-scale information technology project as a way of opening a discussion on the historiography of computing in the humanities. This essay follows from a two-day visit to the Busa Archives at the Università Cattolica del Sacro Cuore. This visit was made possible by Marco Carlo Passarotti, who directs the “Index Thomisticus” Treebank project in CIRCSE (Centro Interdisciplinare di Ricerche per la Computerizzazione dei Segni dell’Espressione – Interdisciplinary Centre for Research into the Computerization of Expressive Signs), which evolved out of GIRCSE (Gruppo not Centro – or Group not Centre), the group that Father Busa helped form in the 1980s. Passarotti not only introduced me to the archives, he also helped correct this blog as he is himself an archive of stories and details. He grew up in Gallarate, where his family knew Busa; he studied under Busa, took over the project, and is one of the few who can read Busa’s handwriting.


Original GIRCSE Plaque kept by Passarotti

Note that all the images are used with permission. See the note on Image permissions at the end.

I also have to thank Paolo Senna, the chief archivist at the U. Cattolica, who selected and gave me access to materials knowing my interests, and Angela Maria Contessi, who explained the work they are doing to stabilize and document the collection. The archive is extensive and they need our help to make it accessible. That help can take the form of imagining how the archive can help us understand the field, and in this regard I also have to thank Julianne Nyhan (see her blog entries here) and Steve Jones (see book here), in whose steps I follow. They brought back word of its riches and the generosity of its keepers. (links here)

Why focus on a project and not the archive or the man? First, I view the project as the defining unit of digital humanities work. We don’t imagine our work in terms of books or works of art or performances or even great individuals whose thoughts are represented in other works. We formalize and play out research questions through projects that involve shifting groups of people and deliverables. These projects are associated with leaders, but they are forms of distributed cognition where no one person can be said to have thought it all out. I believe this is as true of the Index Thomisticus as of any other project, though Father Busa’s leadership and management should not be discounted. Further, to understand the field as it understands itself you have to look at the projects, not the personalities, not the publications, or the databases. This is my beef with Stanley Fish’s take on the digital humanities – he looks to texts as a way to define the digital humanities and in critiquing them he misses the dispositifs that are actually influential. If we are to understand the impact of the field on the humanities we have to understand how projects like the Index Thomisticus were imagined, organized, implemented, evolved and eventually ended.

Which brings me back to the Index Thomisticus. I had suspected that it was for its time a major project, but I didn’t realize the scope of the project or its impact in its time, even by today’s standards. Depending on how you account for scope, the Index Thomisticus may have been the largest and longest lasting project in the field. This blog essay is in many ways an outline of that scope and of the questions about DH projects and their study that such a large project makes visible. I call it a historiography because the questions raised then trigger questions about what evidence we have of DH projects and what we should try to preserve of the projects of our times. The logic behind this essay therefore goes something like this:

  • Computing in the humanities can be followed through digital projects.
  • The Index Thomisticus project was not only the first such project, but a defining one that influenced the design of others. It is therefore an important place to start.
  • The organization of projects should be studied not only in retrospective descriptions, but in the original planning and implementation documents that explicitly discuss the goals of the project. In this case we can also study the organization of the archives themselves, an organization developed and documented by Busa himself during the project. The organization of the archive is thus part of the project and therefore important evidence for the organization of the project as it unfolded.
  • Our memory infrastructure is generally designed to preserve the retrospective celebrations of projects and their results – that which is published, not the resulting artefacts (including the digital artefacts) or the working documents. We preserve books and articles not software or project plans. Without archives of the workings of projects we don’t really have knowledge of the project (as opposed to, for example, the knowledge generated by the project.) Which is why such archives are important if we want to understand projects as projects.
  • The Busa Archive, being one of the most complete and one of the oldest, is therefore all the more important. We can learn from it about how Busa and his collaborators developed the idea of a humanities computing project and its archive. This can provide us a starting framework for asking about what we have and know about other projects – what I am calling the historiography of the field.
  • We can also learn from it how the project promoted itself, explained itself, and how Busa communicated ideas about how to use computing in the humanities. One of the side-effects of the support from IBM is that IBM wanted publicity and influence in return. I believe we can see in the archives a symbiosis between the researcher in need of support and a company looking to expand the applications of its technology and gain legitimacy by promoting innovative uses for its equipment. As such the archive gives us a way into understanding the small community of people and organizations looking at automation and language.

All of these points remain to be proved or at least discussed, but they provided me with a way to think about projects and the study of projects through the archive. Without a framework of hypotheses and questions we are tempted to believe what is left of projects or the stories told about them, at least those that survive. Only with a framework, however tentative, can one ask what is missing. Only with a framework can one begin to appreciate the complexity of the project, the work of preserving its traces, and the distortions inevitable in any archive. More concretely, in the planning documents, the correspondence, the bills, the staffing plans and other such documents one can find a more nuanced view of a pioneering project. One can also compare the agenda or plan of the project to what it became and what it ended up as. The differences between plans, beginnings, ends and retrospective descriptions are what Mahoney tells us we should look at to understand the richness of history. How were projects imagined in their time and how is that different from how they are celebrated after they are over?


For a project that can be said to be the beginning of the systematic use of computing in the humanities, there is relatively little of the beginning of the project in the archives. The archive seems to start after the beginning of the project. The project creates a context for organizing documents that can become an archive. Some letters exist from before 1949, but not many. Passarotti has identified a first letter from the 1st of November 1948 addressed to Pater (father) Peter O’Reilly, University of Notre Dame, Indiana (USA). In particular, in terms of the project, we are missing documents around the initial encounter with Watson at IBM, what support Busa requested, or what IBM agreed to do. Passarotti believes that Busa would have kept these materials and therefore they must have been misplaced or be at another site. Given how systematic Busa was, it stands to reason that he would have filed the IBM stuff. The question is whether it can be found. Thus the beginning of the beginning is hidden for now.

What we do have are all of Busa’s publications, including those from before the beginning of the project such as his PhD dissertation, which was published in 1949 under the title of La Terminologia Tomistica dell’Interiorità: Saggi di metodo per una interpretazione della metafisica della presenza (Milano, Bocca, 1949.) For his thesis he created his own card index by hand, which was supposed to be made up of 10,000 cards, but we don’t have that either.

Of course this raises questions about what really is the beginning of the project. One could say, as Winter (1999) does, that it begins in 1951 with the proof-of-concept first machine-generated concordance (more on this later), or before that in 1949 when Thomas J. Watson Sr. assigns Paul Tasman to help Busa or even in the mid-1940s when Busa writes his PhD thesis. These days much is made of beginnings because one can make genetic arguments from them along the lines of the internet is a military technology because it was imagined during the cold war. Beginnings are easier to interpret as you wish as they aren’t well documented and they don’t depend on something that came before. Beginnings are their own form of memory limit before which there is nothing but definition, by definition, even if there always is yet another candidate beginning or cause. Either way, they are fascinating both for their mystery and for the problems of defining just what is the beginning when that beginning will define the project. For this reason we look to archives to hold their traces or at least to have holes where they could be.

The historiographical question is what sorts of documents do we have for the beginnings of projects starting with Busa’s project. Grant applications seem an obvious candidate these days since they can signal the formal request for (and receipt of) funding with which to start a project, but we all know how grant proposals are rarely the real birth of the idea or, for that matter, a true description of the idea of the project. Busa’s project didn’t get a grant in the traditional sense, but it would be interesting to see what documentation Busa provided IBM (and his superiors for institutional support, for that matter) and what their response was.

For all these reasons, we should take better care of initial conceptual documents, be they grant proposals or notes from meetings at which a decision is taken. These documents are often neglected by project leaders because we want to control the stories of our origins, because we don’t have the time to organize documents when we are trying to get things off the ground, and because we like to keep grant proposals to ourselves for later adaptation and reuse. Despite the temptations to keep these to ourselves we really should be all the more willing to share the messy birthing documents if we care about the truth of projects.


I visited the archive in the hopes of being able to find documents that precisely described how the Index Thomisticus entered and then processed data in order to generate a concordance for publication. I was starting with Tasman’s important article from 1957 in which he discusses (and shows) some of the punch cards used and outlines how the cards are processed. This article gives an overview, but there isn’t enough detail to be able to reconstruct the entire method. I foolishly thought there would be more detailed materials in the archives to help me model the method, but alas what I found were fairly high-level descriptions sent to others enquiring how Busa did things. The punch cards themselves are not in the archive, though Passarotti kept a small bunch, which seem to include two different types of cards – phrase cards and word cards. Alas, neither seems to match the cards reproduced in Tasman’s article of 1957. Here are some pictures I took of the two types of cards.


Front of Phrase Card


Back of Phrase Card with Phrases


Word Card for “CLAUSIT”

Thanks to Passarotti I found a detailed discussion of the method Busa and Tasman developed for the 1951 concordance of Aquinas’ Hymns mentioned above. This concordance is subtitled “The First Index of Words Automatically Composed and Printed by IBM Punched Card Machines.” The Introduction to this concordance has parallel English and Italian text describing in detail how they used electro-mechanical sorting and printing machines to generate the concordance. I plan to discuss this and compare it to other descriptions of their method in another post; here I want to emphasize that the machines used for the 1951 concordance were not computers, but punched card processing machines that evolved from the methods Hollerith developed for the 1890 census. The 1951 concordance, and thus the Busa project, therefore occupies a liminal place between mechanical computing and digital computing. The method was imagined on hand-written cards for Busa’s thesis, prototyped on punched cards and card sorting machines and then adapted to computers with tape drives.
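To make the general logic of a card-based concordance more concrete, here is a minimal modern sketch in Python. It is not a reconstruction of Busa and Tasman’s procedure, which I have not been able to model in detail; it only assumes the broad idea suggested by the two card types, namely that each phrase yields one “word card” per word, that the cards are sorted alphabetically, and that the sorted deck is then listed with its phrase references. The function names, the card layout and the two Latin sample phrases are my own illustrative assumptions.

```python
from collections import defaultdict

# Illustrative sample phrases only; they stand in for "phrase cards".
PHRASES = [
    "Pange lingua gloriosi corporis mysterium",
    "Sui moras incolatus miro clausit ordine",
]


def make_word_cards(phrases):
    """Create one hypothetical 'word card' per word occurrence.

    Each card is a tuple: (word, phrase number, position in phrase, phrase).
    """
    cards = []
    for phrase_no, phrase in enumerate(phrases, start=1):
        for position, token in enumerate(phrase.split(), start=1):
            word = token.strip(".,;:!?").upper()
            if word:
                cards.append((word, phrase_no, position, phrase))
    return cards


def build_concordance(phrases):
    """Sort the word cards alphabetically and group them by headword."""
    # Sorting stands in for the passes through a mechanical card sorter.
    cards = sorted(make_word_cards(phrases), key=lambda c: (c[0], c[1], c[2]))
    concordance = defaultdict(list)
    for word, phrase_no, _position, phrase in cards:
        concordance[word].append((phrase_no, phrase))
    return dict(concordance)


def format_concordance(concordance):
    """Return the concordance as printable lines, one per occurrence."""
    lines = []
    for word, occurrences in concordance.items():
        for phrase_no, phrase in occurrences:
            lines.append(f"{word:<12} {phrase_no:>3}  {phrase}")
    return lines


if __name__ == "__main__":
    for line in format_concordance(build_concordance(PHRASES)):
        print(line)
```

The point of the sketch is the contrast: what is a single sort call here was, on the 1951 machines, a sequence of physical passes of card decks through sorters and reproducers that people had to set up by hand.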

What the project does have is tapes as the project transitioned from punch cards to tape. These might, if read, give us hints as to what was on the cards, if they preserved the data format from the cards. They would also, of course, help us understand the data structures and methods. Passarotti also has administrative data from Busa’s personal computer and showed me Busa’s address book which seems to have been printed in Courier from a computer file Busa kept. This address book included names of partners and children. Busa cared for his network of friends and colleagues.

As with beginnings, we can reflect on the gap in the archives. Why are the cards missing and what else of the method is missing? The reason for the missing cards is, I think, simple: the cards took up too much space and the data was better stored on tape. Why keep more than a few keepsake cards once the card processing machines are gone? Why keep a bulky form of data when the project had moved to a more efficient one? That said, there could be other sets of cards kept as keepsakes.

The other thing that we don’t have are the machines and how they were set up to run processes. Programming, before ideas about programmable computers influenced the design of the machines, meant physically setting up machines which also meant that you couldn’t easily save configurations. The programming, as we call it today, was therefore in the cards and machines and not in source code one can easily preserve, read, interpret and try to reimplement. The machines themselves, of course, would not have been kept once no longer needed. They probably would have been returned to IBM for use in other projects. What we do have is information about the types of machines (see the next section), but I didn’t see any documents describing the physical set up. This may have been obvious or tacit knowledge or something one could find in manuals. It could be that the programmers were all IBM technicians who then kept their documents out of the Busa archive. It could also be that I didn’t look in the right place. Either way, we can actually get fairly close to the methods from the careful description in the Introduction to the 1951 concordance. There Busa talks about the setting up of panels (for machines) and describes what the machines would do once set up. He points out that once the machines are set up they can process any number of cards without more set up.

It, must be borne in mind that the amount of human work entailed, by all this processing in the sorter and setting up of the reproducer panels – about two persons’ one day work — remains unchanged, notwithstanding the increased number of cards. (Busa 1951, 26)

Given that we have some descriptions of the machines (names and model numbers), and that documentation for them should be available elsewhere, it should be possible to infer how they might have been set up and what was involved in setting up the panels.

In terms of the historiography, we should ask what sorts of documents would help us bridge the pseudo-code of methods as described in publications to that which was actually done. We know from “Per Completare Lo Index Thomisticus Per L’Exposizione Mondiale Di New York 1964 – 1965” (To Complete the Index Thomisticus for the World Expo of New York 1964 – 1965) that there were flowcharts, instruction manuals for various phases, and a utility program 7090 PLANT (Program for Linguistic Analysis of Natural Texts). Particularly interesting would be documents that describe or instruct for actual practice (rather than the ideal of practice we share in publications.) We might ask why these documents weren’t preserved or whether they are preserved in some other way? What is clear is that the method of the first punch-card process was very different from what we call method today. It took far more people, and in different roles. It had a different material presence in the punch-card and large machines. Computing hadn’t yet been dematerialized into bits in the cloud. For Busa data was holes in cards. Stepping back, we can see that the full set of stages involved large groups of people and expensive machines in parallel. One can see why Busa was writing at this time about cybernetics – he was building a project that was a large hybrid of machines and people that had to be managed.

Budget, Machines and Staffing


Organigramma of the Proposed New York Expo Subproject

One of the few true project management documents that we have is a fascinating proposal “To Complete the Index Thomisticus for the World Expo of New York 1964 – 1965”. This 21-page document appears to be a proposal from 1962 to finish part of the Index Thomisticus in time to be shown at the Expo in the IBM pavilion. It was in a Miscellaneous binder with other items. The proposal outlines the work to be done, the costs of the project up to 1962, the costs for the rush two-year project, the machines needed and the personnel needed to complete a prototype for the Expo. As such the document doesn’t really describe the project so much as an imagined (and not implemented) sped-up version of the project. It does, however, give us a concrete idea of the costs, people and machines that went into the processes. Here is a chart of the people needed and for how long. Busa imagined ramping up from a staff of 33 already employed at the time to 70, for at least the 6 months it took to enter a lot of the texts and program the machines.


Personnel Needed Over Two + Years

A lot more can be inferred about the project from this document, and I plan a longer blog essay just on this document. For the moment it should be noted that this document could be compared to photographs of the project space at Gallarate and bills in the archives to build a clearer picture of the people (and types of roles) and machines actually involved over time.

Such a document again raises historiographic questions about planning and administrative documents. In two notes to the budget sheets Busa mentions how this differs from information presented by Mr. Tasman to IBM. This suggests that there were regular reports to the stakeholders and funders, notably IBM and the supporting church organizations. These documents may well be in boxes I didn’t consult or in the archives of the respective institutions. If we don’t have them then we need to ask why not? Either way, such documents form an important source of information about projects like the Index Thomisticus where there were stakeholders.


One type of document that Busa was careful to keep and file was his correspondence with others interested in the automated processing of linguistic data. I looked at two binders of early correspondence and it is beyond this blog essay to talk about the richness of interactions therein. Instead I will note some of the interactions that relate to how the project was managed. It should be noted that in the boxes I looked at the correspondence was organized by country and in reverse chronological order as letters received, copies of letters sent, translations and notes were added to a ring binder. It should also be noted that this is what I would call the scientific correspondence. As will be discussed later, there are also binders of administrative correspondence having to do with sales. Occasionally letters in one binder deal with things in other boxes as, for example, letters with invitations to conferences that are separate from the conference binders. Here are some of the letters that interested me as an illustration of his development of method and management of the project.

There are some letters from folk at the Cambridge Language Research Unit that Margaret Masterman led. One of them is from Mary Beasley thanking Father Busa after a visit to Gallarate. It would be intriguing to learn more about the connection to the CLRU, itself an extraordinary unit loosely connected to Cambridge University. Were they sharing methods? I am also curious as to who Mary Beasley was as I can’t find a mention of her in the documents I have found about the CLRU.

In 1964 Elisabeth Halsall wrote Busa a letter, to which he replied, about A. Q. Morton’s authorship study of Paul’s letters that was being reported on in the popular press at the time. In both letters there are some preliminary thoughts about stylistics or the statistical study of texts. Neither Halsall nor Busa seems to think much of Morton’s stylistics. Halsall was going to review the project and it would be interesting to find the review and, for that matter, follow Halsall’s career and see what else she did as she was interested in statistics.

The Halsall exchange was one of the few suggestions of criticism from Busa that I found in the correspondence. Most of the time Busa is building links and generously helping people. The other critical note is a strange missing note about an Italian philosopher, Silvio Ceccato, who had a lab in Milan that was funded for a while by the US Air Force. In an exchange with Allen Kent from April of 1959 it seems that Busa sent, along with a letter explaining why he couldn’t attend a conference, a confidential note (to be destroyed) giving his opinion of Ceccato’s work. Kent’s reply includes a closing paragraph about Ceccato and how “I hope that we will be able to keep him ‘contained’ if his contribution is not a positive one.” Reading about Ceccato online I suspect that Ceccato had a very different idea about how computers should be used to process language than Busa did. Ceccato was developing a model for machine translation that involved an intermediate metalanguage. Perhaps he thought that concordances weren’t innovative enough. The text of Ceccato’s essay “The Machine Which Observes and Describes” is also in the archive. It would be interesting to compare the different approaches to language computing these two took.


Letter from Howard Comfort to Busa from 1956

One set of exchanges important to understanding the project as project is the letters Busa sent to people asking them for opinions on the project and asking them to send letters supportive of the project to Watson at IBM. We have copies of the letters that Busa sent to friends of the project earlier in 1952 with suggestions as to what they could write to Watson. We have letters like the one above from the classicist Howard Comfort telling Busa that he wrote to Watson, though this is probably from a later prompt. (Personal disclaimer: I lived in Comfort Hall or one of the other identical halls my first year as a Haverford student, though Comfort Hall may have been named after Howard Comfort’s father.) We have copies of the letters some sent to Watson and what look to be replies from Tasman (in a lovely circularity.)


Example of copy of prompting note from Busa from 1952

We even have one nasty note from a Chair of Philosophy, also a Jesuit, who didn’t think much of the proposed project along with the many more positive notes. This was addressed to the Rector of Loyola University of Los Angeles who solicited his Philosophy Chair’s opinion and returned it to Busa. It is an example of the rather limited imagination of Busa’s contemporaries. Today I doubt many would think that attention to all words is “a sort of fetish.” This note gives us a sense of the assumptions around research practices of the time. Busa is ahead of his time imagining new methods.


Philosophical Comment on the Proposed Index

I describe these exchanges because they show how Busa was contacting scholars and interested parties for letters of support of different sorts for the project. I’m guessing he wanted to maintain the support from IBM by showing how the scholarly community (excepting philosophers) was supportive. This was but one way Busa recognized the Watsons and IBM. The Archives show us traces of how Busa maintained support in other ways over the years. In the correspondence and photographs of events we see how IBM took part in events, and I believe the Watsons were even formally knighted by the Church for their service.

Another thing that stands out in the part of the collection that I skimmed is the international range of people with whom Busa was in correspondence. If one takes Margaret Stevens’ important 1965 Automatic Indexing: A State-of-the-Art Report as a comparator, many of the people, including Stevens herself, are in touch with Busa. It was a fairly small world and they knew of each other’s work. Busa’s archives could be used to develop a social network of the research of the 50s and 60s around language engineering and indexing. It could help us discover the lost directions of research or lost projects, such as Ceccato’s. It could help us understand how they imagined the computer could help with language and literature at the time.
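As a hedged illustration of what developing such a social network might involve, here is a small Python sketch. It assumes that each letter in the binders could be transcribed as a record of sender, recipient and year; the records below are only placeholders loosely based on the letters mentioned in this essay (the year of Busa’s reply to Halsall is an assumption), and the function names are hypothetical. A real study would need a proper catalogue of the correspondence.

```python
from collections import Counter, defaultdict

# Placeholder records (sender, recipient, year), for illustration only.
LETTERS = [
    ("Comfort", "Busa", 1956),
    ("Kent", "Busa", 1959),
    ("Halsall", "Busa", 1964),
    ("Busa", "Halsall", 1964),  # year of the reply is assumed
]


def build_network(letters):
    """Build an undirected correspondence network from letter records.

    Returns (edge_weights, neighbours): the number of letters exchanged
    by each pair, and the set of correspondents for each person.
    """
    edge_weights = Counter()
    neighbours = defaultdict(set)
    for sender, recipient, _year in letters:
        # Sort the pair so that A->B and B->A count as the same tie.
        edge = tuple(sorted((sender, recipient)))
        edge_weights[edge] += 1
        neighbours[sender].add(recipient)
        neighbours[recipient].add(sender)
    return edge_weights, dict(neighbours)


def degree(neighbours):
    """Number of distinct correspondents per person."""
    return {person: len(others) for person, others in neighbours.items()}


if __name__ == "__main__":
    weights, neighbours = build_network(LETTERS)
    for (a, b), count in sorted(weights.items()):
        print(f"{a} - {b}: {count} letter(s)")
    print(degree(neighbours))
```

Because the archive is Busa’s own, any network built this way will be centred on him by construction; comparing it against other sources, such as the people surveyed in the Stevens report, would be one way to correct for that.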

Design and Printing

The Index was designed from the beginning to produce a printed concordance. This concordance would not simply be a printout from the computer, but was designed to be an attractive and efficient print concordance. I was able to go through one box of materials related to the design of the different types of pages and to the phototypeset printing of the final concordance. I am not sure of the right name for the technology, but it involved printed negatives and positives on transparencies. I don’t have the background in graphic design or the printing technologies of the time to appreciate the materials but it is clear that significant work went into designing the complex tables so that the information would be easy to consult. There are numerous printouts with marks on them showing that someone (possibly Busa) was continually trying to find the optimal fonts, line spacing and so on. There are some final test runs that record the speed at which pages were exposed or printed using photographic technology. I am assuming the idea was to find the highest speed that still provided a legible result. Of particular interest are the specifications and letters altering specifications. Someone with a knowledge of the technologies of the time might be able to reconstruct the series of decisions that went into designing the Index.


First Page of the Specifics for the Distribution Indexes

From the perspective of the project it is worth noting that they were not just duplicating a computer printout. Busa wanted to produce concordances that were as beautifully designed as those done by hand. He seems to have been experimenting with various computer typesetting systems and benefitted from the expertise of others. Passarotti tells me that recently an engineer who worked on the computer typesetting donated some of his archives related to the Index. Historiographically I consider these very important as design always seems to get left out of our understanding of the research value of projects. Whether the project goes to print or web, the design is seen as separate from the “content”. One could argue that the separation of design from scholarship is deeply encoded in our understanding of information and in our print infrastructure. In the case of the Index we can overcome that divide, a divide that I don’t think existed in Busa’s mind.


While the beginnings of the project may be lost, the endings are not. One collection of boxes I did go through was the one documenting the sales of the finished Index directly from Gallarate. While the Index was published and sold by the German publisher Frommann-Holzboog, the project was allowed to sell in Italy up to 30 subscriptions directly (and at a discount.) Here is a letter outlining the costs and discount.


Letter to the Library of the Teresianum

A number of things should be noted about the sales process:

  • A project like the Index Thomisticus is not over when the computing is done. There was the work of designing and printing the Index and then selling it through their German publisher and directly.
  • To buy the Index you subscribed to the volumes, paying for them as they came out. It isn’t clear to me whether they knew how many volumes there would be all told or not.
  • They had a nice postcard for subscribing that you could fill out.
  • There was an administrative unit that dealt with sales and correspondence around sales. R. Padre (Father) Enrico Pozzi ran this administrative unit of the CAAL (Centro Automazione Analisi Linguistica – Centre for the Automation of Linguistic Analysis) until 1979 when he passed away and was replaced by Padre Armando Cattaneo.
  • Others were also involved including Dr. Maddelena Garcéa who signs the letter above. It would be interesting to find out more about her background and role.

Between the budgets we have, the sales documentation, and the bills/receipts (which I didn’t look at) we could probably develop a financial overview of the project and perhaps even a more accurate picture of who was involved based on salaries paid. Appreciating the costs and income sources for the project is important in such a large project that had to pay for staff and machines. Humanists are loath to talk too much about money, except to boast about the grants they get, but I would suggest that budgets are one of the articulation points for projects. They may not make or break a project, but they certainly change it. If IBM had funded the sped-up New York Expo version of the project proposed in 1962 how would that have changed things?

A note on all the collaborators and administration: I suspect Father Pozzi and Dr. Garcéa were important to the project as administrators who handled a lot of the day-to-day work, but such carework gets overlooked unless we ask about how the project actually ran. Julianne Nyhan has interviewed some of the punch-card operators who are still alive. What other roles were central to the project and what can we learn about them? How about the priests from Busa’s order? How much of their work and organization came from the religious organization? And IBM and Tasman? What I did find were traces in the correspondence of Busa announcing the creation of a centre (CAAL) and using the creation of a formal centre as a sign of the support they were getting and should get. I didn’t find any formal description of the administration of the centre except in the New York Expo document, which described the administration of the project as separate from the staffing needed to do the project. I suspect one could infer a lot from the existing documents and we could ask those involved. I note that the formation of a centre also suggests a shift from a team doing one project to a team organized to do many projects, as the CAAL in fact did. In addition to the Index they did a number of smaller projects like the Dead Sea Scrolls concordance and we have correspondence in which people enquire whether Busa/CAAL would help with their projects.

Back to sales as the end of the project. The sales correspondence is not all dry numbers. There is an amusing exchange between the Teresianum Library and Father Pozzi. The librarian, in a note acknowledging receipt of a bill, writes “Speriamo che la pubblicazione dei volumi continuerà con maggiore tempestività nonostante il grosso lavoro.” (“We hope the publication of the volumes will continue with greater speed {tempestivity} notwithstanding the amount of work.” My translation, letter of June 10, 1978.) Pozzi’s answer of the 23rd of June gives us a sense of the issues involved in publishing such a large multi-volume concordance.


Letter to the Director of the Teresianum of June 23rd, 1978

In the letter Pozzi complains that the observation about the speed of publication isn’t fair. He notes that publishing 31 volumes of such importance in 4 years is a faster pace than other publications of this sort manage. Also, the German editor won’t deliver more than 5 volumes a year and they have to take into account the annual budgets of libraries.

Pozzi ends by saying that they are almost “in the port for the preparation of the end of the Index.” He says there will be “circa” 20 volumes appearing between 1979 and 1983, which suggests they didn’t actually know what the final number of volumes would be as the variety of indexes and tables unfolded from data to page design.

Speaking of ends, I should note that the project may be said to have ended with the publication of the Index, but like many successful projects it also continued by transforming itself, first into a digital publication (first on CD-ROM and then on the Web) and then into a computational linguistics project led by Passarotti. As I noted above, Busa set the stage for transformation when he created the initial Centre (CAAL) in Gallarate, which was joined by CAEL (Associazione per la Computerizzazione delle Analisi Ermeneutiche Lessicologiche), an association for the Computerization of Hermeneutical Lexicographical Analysis. This organization was set up to raise funds for the project once the IBM support ended. Passarotti informs me that its founding charter is in the archives. Later on Busa set up the group GIRCSE at the Cattolica, which then became the Centre CIRCSE.

In other words, one can see in the organizations the trace of Busa’s ideas about the work ahead, for which he and others needed organizations for support. I wasn’t looking, but it would be interesting to see what there is in the Busa Archives of what he imagined the project could become. Certainly in his publications there are predictions about what could be done with this new technology, but that is different from imagining what the groups and centres could be beyond just the means to the Index as end.

Note: Passarotti tells me that they have among the print publications what are called the “Libri dei Metodi” (Books of Methods) which bring together a number of works that, for example, describe the work to be done in the future of which the most important was the addition of syntactic annotation to the corpus. This is the work that Passarotti has done as part of the Index Thomisticus Treebank project.

Other Boxes

There are a number of other types of information that Busa kept in the archives that I didn’t have time to check.

  • Tapes: As mentioned above the archive doesn’t have more than a handful of cards, but they do have a lot of tapes which presumably have data (and programs) on them.
  • Photographs: The Busa Archives have an extensive and well documented collection of photographs. Some of these Julianne Nyhan and Melissa Terras have posted. These photographs often show groups of people at official presentations and could give us a sense of the formalities of the project: how and to whom it was presented.
  • Bills: There are apparently boxes of bills, other than those for sales of the Index. One can imagine these would provide a much more complete picture of the costs of the project.
  • News Clippings: One extraordinary collection of boxes contains press clippings about the project. This collection documents the impact the project had on the popular imagination, especially in Italy, where Busa was celebrated.
  • Conferences: A set of boxes I did not look at has materials from all of the conferences that Busa went to, starting with one in Tübingen. I was told that he kept programs and other ephemera. This could be an incredible resource to understand the fast-moving field of automatic language processing in the early years.
  • Publications: As mentioned above, all of Busa’s publications are preserved. Julianne Nyhan, Arianna Ciula, and Passarotti are editing a collection of 20 of the key publications with translations where necessary to be published by Springer. This should open us to reading more than Busa’s 1980 CHum article. Of particular interest and connecting with the news clippings are his writings on cybernetics for news venues. Busa was not just a digital humanist, he was someone with experience with computers who could and did explain their significance to the larger public. It would be interesting to look at what was written about him in the Italian press and what he wrote, especially in the 1960s when computers were a source of curiosity and anxiety. One wonders how he wove his philosophical and religious training into his cybernetic articles. What futures did he present?


In conclusion, the archives provide us with a framework for asking about projects, based on how Busa and colleagues felt their information should be organized during the project. Note that many of the categories, like the correspondence and receipts around sales, were developed by the project to manage itself. Likewise the scientific correspondence was kept by Busa to keep track of who he was meeting and interacting with. These categories therefore partly reflect the organizing done during the project, though a certain amount may have also happened retrospectively. Either way, I offer these broad categories, of which all but the first two are reflected in the archives in some fashion.

  • Beginnings and Proposals
  • Methods and Practices
  • Data and Programs
  • Correspondence
  • Design Documentation
  • Financial Documentation from Bills to Sales
  • Deliverable of the Project (the Index itself in this case)
  • Conferences and Presentations of the Project
  • Publications About Project (by its members)
  • News and Reviews of the Project

I will end this essay by proposing a few projects that I think could both help digitize the Archives and then use them to help us understand the early years of computing.

The Scholarly Network of Index Thomisticus and Language Engineering of the 1950s and 1960s. Busa and his collaborators corresponded widely, travelled to visit many of the early language computing projects, and presented at the key early conferences/meetings on the use of computers for language processing. Using the extensive documentation kept by Busa in the Archive we could build a picture of language engineering in the 1950s and 1960s at that liminal moment when it goes from punch-cards to computers and from demonstration projects to general purpose technology. A social network could be created that would document most of the key imagineers of those early years through the project that in many ways led the way.

The Index Thomisticus as Pioneering Information Technology. The Index Thomisticus project pioneered many of the basic techniques for processing language data that are now taken for granted. It was also one of the first large-scale research infrastructure projects to use computers, and it influenced subsequent projects. Through the Busa Archives we can reconstruct the project as it was conceived of and managed then. The Busa Archives could give us unique insight into informatics projects of the time and how they were imagined and managed.


Busa, R. (1991). Thomae Aquinatis Opera Omnia – cum hypertextibus – in CD-ROM – auctore Roberto Busa SJ. Milano, Editel. CD-ROM.

Busa, R. (1951). S. Thomae Aquinatis Hymnorum Ritualium Varia Specimina Concordantiarum. Primo saggio di indici di parole automaticamente composti e stampati da macchine IBM a schede perforate. Milano, Fratelli Bocca.

Busa, R. (1980). “The Annals of Humanities Computing: The Index Thomisticus.” Computers and the Humanities. 14:2. 83-90.

Stevens, M. E. (1965). Automatic Indexing: A State-of-the-Art Report. Washington, D.C., National Bureau of Standards: Center for Computer Sciences and Technology.

Tasman, P. (1957). “Literary Data Processing.” IBM Journal of Research and Development. 1:3. 249-256.

Winter, T. N. (1999). “Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance.” The Classical Bulletin 75:1. 3-20.

Image permissions: The images shown here are kindly made available under a Creative Commons CC-BY-NC license by permission of CIRCSE Research Centre, Università Cattolica del Sacro Cuore, Milan, Italy. The original documents pictured in the images are contained in the “Busa Archive”, held in the library of the same university. For further information, or to request permission for reuse, please contact Marco Passarotti, on marco.passarotti AT, or by post: Largo Gemelli 1, 20123 Milan, Italy.