I recently discovered (thanks to a note from Lou Burnard to the TEI list) a document online with extracts from the Funding Proposal for Phase 1 (Planning Conference) of the Text Encoding Initiative, the proposal that led to the 1987 Poughkeepsie conference at which the plan for the TEI was laid out.
The document is an appendix to the 1988 full Proposal for Funding for An Initiative to Formulate Guidelines for the Encoding and Interchange of Machine-Readable Texts. In other words, the planning proposal led to the Poughkeepsie conference, where consensus was developed; that consensus led to the full proposal, which funded the initial development of the TEI Guidelines. (Get that?)
The doubled document (the extracts from the first proposal appear as an appendix to the 1988 proposal) is fascinating to read 20 years later. In section “3.4 Concrete Results” of the full proposal they describe the outcomes of the full grant thus:
Ultimately, this project will produce a single potentially large document which will:
- define a format for encoded texts, into which texts prepared using other schemes can be translated,
- define a formal metalanguage for the description of encoding schemes,
- describe existing schemes (and the new scheme) formally in that metalanguage and informally in prose,
- recommend the encoding of certain textual features as minimal practice in the encoding of new texts,
- provide specific methods for encoding specific textual features known empirically to be commonly used, and
- provide methods for users to encode features not already provided for, and for the formal definition of these extensions.
I am struck by how the TEI has achieved most of these goals (and others, like a consortial structure for sustainable evolution). It is also interesting to note what seems to have been done differently, like the second and third bullet points: the development of a “metalanguage for the description of encoding schemes” and the formal description of “existing schemes” with it. I hadn’t thought of the TEI Guidelines as a project to document the variety of encoding schemes. Have they done that?
Another interesting wrinkle is in the first proposal extracts where the document talks about “What Text ‘Encoding’ Is”. First of all, why the single quotation marks around “encoding” – was this a new use of the term then? Second, they mention that “typically, a scheme for encoding texts must include:”
Conventions for reducing texts to a single linear sequence wherever footnotes, text-critical apparatus, parallel columns of text (as in polyglot texts), or other complications make the linear sequence problematic.
It is interesting to see linearity creep into what encoding schemes “must” do, even a scheme that is ultimately hierarchical and non-linear. I wonder how to interpret this: is it simply a pragmatic matter of how you organize the linear sequence of text and code in the TEI document, especially when what you are trying to represent is not linear? Could it be the need for encoded text to be a “string” for the computer to parse? Time to ask someone.
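To make the linearity requirement concrete, here is a minimal sketch (my own illustration, not drawn from either proposal) of how a footnote is linearized in TEI-style XML: the note, though printed elsewhere on the page, is encoded inline at its point of attachment, so the whole text remains one parseable sequence of characters:

```xml
<p>The argument continues here<note place="bottom">A footnote printed
at the bottom of the page, but encoded inline at the exact point where
it attaches to the text.</note> and then the sentence resumes.</p>
```

Parallel columns and critical apparatus get analogous treatments: the non-linear layout of the page is folded into a single linear stream, with markup recording where each piece belongs.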
Drawing attention to the things that seem strange risks obscuring the fact that these two proposals were immensely important for the digital humanities. They describe how the proposers imagined the problems of text representation could be solved by an international project. We can look back and admire the clarity of vision that led to the achievements of the TEI, achievements not just of a few people, but of many, organized as the proposal laid out. These are beautiful (and influential) administrative documents, if we dare say there is such a thing. I would say that they and the Guidelines themselves are some of the most important scholarship in our field.