Guido Milanese: Filologia, letteratura, computer

Cover of the book "Filologia, Letteratura, Computer"
Philology, Literature, Computer: Ideas and instruments for humanistic informatics

Un manuale ampio ed esauriente che illustra tra teoria e prassi il tema dell’informatica umanistica per l’insegnamento e l’apprendimento universitario.

The publisher (Vita e Pensiero) kindly sent me a copy of Guido Milanese’s Filologia, letteratura, computer (Philology, Literature, Computer), an introduction to thinking about and thinking through the computer and texts. The book is designed to work as a text book that introduces students to the ideas and to key technologies, and then provides short guides to further ideas and readings.

The book focuses, as the title suggests, almost exclusively on digital filology or the computational study of texts. At the end Milanese has a short section on other media, but he is has chosen, rightly I think, to focus on set of technologies in depth rather than try a broad overview. In this he draws on an Italian tradition that goes back to Father Busa, but more importantly includes Tito Orlandi (who wrote the preface) and Numerico, Fiormonte, and Tomasi’s L’umanista digitale (this has been translated into English- see The digital humanist).

Milanese starts with the principle from Giambattista Vico that knowledge is made (verum ipsum factum.) Milanese believes that “reflection on the foundations identifies instruments and operations, and working with instruments and methods leads redefining the reflection on foundations.” (p. 9 – my rather free translation) This is virtuous circle in the digital humanities of theorizing and praxis where either one alone would be barren. Thus the book is not simply a list of tools and techniques one should know, but a series of reflections on humanistic knowledge and how that can be implemented in tools/techniques which in turn may challenge our ideas. This is what Stéfan Sinclair and I have been calling “thinking-through” where thinking through technology is both a way of learning about the thinking and about the technology.

An interesting example of this move from theory to praxis is in chapter 7 on “The Markup of Text.” (“La codifica del testo”) He moves from a discussion of adding metadata to the datafied raw text to Minsky’s idea of frames of knowledge as a way of understanding XML. I had never thought of Minsky’s ideas about articial intelligence contributing to the thinking behind XML, and perhaps Milanese is the first to do so, but it sort of works. The idea, as I understand it, goes something like this – human knowing, which Minsky wants to model for AI, brings frames of knowledge to any situation. If you enter a room that looks like a kitchen you have a frame of knowledge about how kitchens work that lets you infer things like “there must be a fridge somewhere which will have a snack for me.” Frames are Minsky’s way of trying to overcome the poverty of AI models based on collections of logical statements. It is a way of thinking about and actually representing the contextual or common sense knowledge that we bring to any situation such that we know a lot more than what is strictly in sight.

Frame systems are made up of frames and connections to other frames. The room frame connects hierarchically to the kitchen-as-a-type-of-room frame which connects to the fridge frame which then connects to the snack frame. The idea then is to find a way to represent frames of knowledge and their connections such that they can be used by AI systems. This is where Milanese slides over to XML as a hierarchical way of adding metadata to a text that enriches it with a frame of knowledge. I assume the frame (or Platonic form?) would be the DTD or Schema which then lets you do some limited forms of reasoning about an instance of an encoded text. The markup explicitly tells the computer something about the parts of the text like this (<author>Guido Milanese</author>) is the author.

The interesting thing is to refect on this application of Minsky’s theory. To begin, I wonder if it is historically true that the designers of XML (or its parent SGML) were thinking of Minsky’s frames. I doubt it, as SGML is descended from GML that predates Minsky’s 1974 Memo on “A Framework for Representing Knowledge.” That said, what I think Milanese is doing is using Minsky’s frames as a way of explaining what we do when modelling a phenomena like a text (and our knowledge of it.) Modelling is making explicit a particular frame of knowledge about a text. I know that certain blocks are paragraphs so I tag them as such. I also model in the sense of create a paradigmatic version of what my perspective on the text is. This would be the DTD or Schema which defines the parts and their potential relationships. Validating a marked up text would be a way of testing the instance against the model.

This nicely connects back to Vico’s knowing is making. We make digital knowledge not by objectively representing the world in digital form, but by creating frames or models for what can be digital known and then apply those frames to instances. It is a bit like object-oriented programming. You create classes that frame what can be represented about a type of object.

There is an attractive correspondence between the idea of knowledge as a hierarchy of frames and an XML representation of a text as a hierarchy of elements. There is a limit, however, to the move. Minsky was developing a theory of knowing such that knowledge could be artificially represented on a computer that could then do knowing (in the sense of complete AI tasks like image recognition.) Markup and marking up strike me as more limited activities of structuring. A paragraph tag doesn’t actually convey to the computer all that we know about paragraphs. It is just a label in a hierarchy of labels to which styles and processes can be attached. Perhaps the human modeller is thinking about texts in all their complexity, but they have to learn not to confuse what they know with what they can model for the computer. Perhaps a human reader of the XML can bring the frames of knowledge to reconstitute some of what the tagger meant, but the computer can’t.

Another way of thinking about this would be Searle’s Chinese room paradox. The XML is the bits of paper handed under the door in Chinese for the interpreter in the room. An appropriate use of XML will provoke the right operations to get something out (like a legible text on the screen) but won’t mean anything. Tagging a string with <paragraph> doesn’t make it a real paragraph in the fullness of what is known of paragraphs. It makes it a string of characters with associated metadata that may or may not be used by the computer.

Perhaps these limitations of computing is exactly what Milanese wants us to think about in modelling. Frames in the sense of picture frames are a device for limiting the view. For Minsky you can have many frames with which to make sense of any phenomena – each one is a different perspective that bears knowledge, sometimes contradictory. When modelling a text for the computer you have to decide what you want to represent and how to do it so that users can see the text through your frame. You aren’t helping the computer understand the text so much as representing your interpretation for other humans to use and, if they read the XML, re-interpret. This is making a knowing.

References

Milanese, G. (2020). Filologia, Letteratura, Computer: Idee e strumenti per l’informatica umanistica. Milan, Vita e Pensiero.

Minsky, M. (1974, June). A Framework for Representing Knowledge. MIT-AI Laboratory Memo 306. MIT.

Searle, J. R. (1980). “Minds, Brains and Programs.” Behavioral and Brain Sciences. 3:3. 417-457.

Knowledge is a commons – Pour des savoirs en commun

The Canadian Comparative Literature Association (CCLA/ACLC) celebrated in 2019 its fiftieth anniversary. The association’s annual conference, which took place from June 2 to 5, 2019 as part of the Congress of the Humanities and Social Sciences of Canada at UBC (Vancouver), provided an opportunity to reflect on the place of comparative literature in our institutions. We organized a joint bilingual roundtable bringing together comparatists and digital humanists who think and put in place collaborative editorial practices. Our goal was to foster connections between two communities that ask similar questions about the modalities for the creation, dissemination and legitimation of our research. We wanted our discussions to result in a concrete intervention, thought and written collaboratively and demonstrating what comparative literature promotes. The manifesto you will read, “Knowledge is a commons – Pour des savoirs en commun”, presents the outcome of our collective reflexion and hopes to be the point of departure for more collaborative work.

Thanks to a panel on the Journal in the digital age at CSDH-SCHN 2020 I learned about the manifesto, Knowledge is a commons – Pour des savoirs en commun. The manifesto was “written colingually, that is, alternating between English and French without translating each element into both languages. This choice, which might appear surprising, puts into practice one of our core ideas: the promotion of active and fluid multilingualism.” This is important.

The manifesto makes a number of important points which I summarize in my words:

  • We need to make sure that knowledge is truly made public. It should be transparent, open and reversible (read/write).
  • We have to pay attention to the entire knowledge chain of research to publication and rebuild it in its entirety so as to promote access and inclusion.
  • The temporalities, spaces, and formats of knowledge making matter. Our tools and forms like our thought should be fluid and plural as they can structure our thinking.
  • We should value the collectives that support knowledge-making rather than just authoritative individuals and monolithic texts. We should recognize the invisible labourers and those who provide support and infrastructure.
  • We need to develop inclusive circles of conversation that cross boundaries. We need an ethics of open engagement.
  • We should move towards an active and fluid multilingualism (of which the manifesto is an example.)
  • Writing is co-writing and re-writing and writing beyond words. Let’s recognize a plurality of writing practices.

 

Welcome to Dialogica: Thinking-Through Voyant!

Do you need online teaching ideas and materials? Dialogica was supposed to be a text book, but instead we are adapting it for use in online learning and self-study. It is shared here under a CC BY 4.0 license so you can adapt as needed.

Stéfan Sinclair and I have put up a web site with tutorial materials for learning Voyant. See Dialogi.ca: Thinking-Through Voyant!.

Dialogica (http://dialogi.ca) plays with the idea of learning through a dialogue. A dialogue with the text; a dialogue mediated by the tool; and a dialogue with instructors like us.

Dialogica is made up of a set of tutorials that students should be able to alone or with minimal support. These are Word documents that you (instructors) can edit to suit your teaching and we are adding to them. We have added a gloss of teaching notes. Later we plan to add Spyral notebooks that go into greater detail on technical subjects, including how to program in Spyral.

Dialogica is made available with a CC BY 4.0 license so you can do what you want with it as long as you give us some sort of credit.

Show and Tell at CHRIN


Stéphane Pouyllau’s photo of me presenting

Michael Sinatra invited me to a “show and tell” workshop at the new Université de Montréal campus where they have a long data wall. Sinatra is the Director of CRIHN (Centre de recherche interuniversitaire sur les humanitiés numériques) and kindly invited me to show what I am doing with Stéfan Sinclair and to see what others at CRIHN and in France are doing.

Continue reading Show and Tell at CHRIN

Linked Infrastructure For Networked Cultural Scholarship Team Meeting 2019

This weekend I was at the Linked Infrastructure For Networked Cultural Scholarship (LINCS) Team Meeting 2019. The meeting/retreat was in Banff at the Banff International Research Station and I kept my research notes at philosophi.ca.

The goal of Lincs is to create a shared linked data store that humanities projects can draw on and contribute to. This would let us link our digital resources in ways that create new intellectual connections and that allow us to reason about linked data.

Peter Robinson, “Textual Communities: A Platform for Collaborative Scholarship on Manuscript Heritages”

Peter Robinson gave a talk on “Textual Communities: A Platform for Collaborative Scholarship on Manuscript Heritages” as part of the Singhmar Guest Speaker Program | Faculty of Arts.

He started by talking about whether textual traditions had any relationship to the material world. How do texts relate to each other?

Today stemata as visualizations are models that go beyond the manuscripts themselves to propose evolutionary hypotheses in visual form.

He then showed what he is doing with the Canterbury Tales Project and then talked about the challenges adapting the time-consuming transcription process to other manuscripts. There are lots of different transcription systems, but few that handle collation. There is also the problem of costs and involving a distributed network of people.

He then defined text:

A text is an act of (human) communication that is inscribed in a document.

I wondered how he would deal with Allen Renear’s argument that there are Real Abstract Objects which, like Platonic Forms are real, but have no material instance. When we talk, for example, of “hamlet” we aren’t talking about a particular instance, but an abstract object. Likewise with things like “justice”, “history,” and “love.” Peter responded that the work doesn’t exist except as its instances.

He also mentioned that this is why stand-off markup doesn’t work because texts aren’t a set of linear objects. It is better to represent it as a tree of leaves.

So, he launched Textual Communities – https://textualcommunities.org/

This is a distributed editing system that also has collation.

Distant Reading after Moretti

The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement.

Lauren Klein has posted an important blog entry on Distant Reading after MorettiThis essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general or the humanities, for that matter. Klein then suggests some thing we can do about it:

  • We need more accessible corpora that better represent the varieties of human experience.
  • We need to question our models and ask about what is assumed or hidden.

 

 

txtlab Multilingual Novels

This directory contains 450 novels that appeared between 1770 and 1930 in German, French and English. It is designed for us in teaching and research.

Andrew Piper mentioned a corpus that he put together, txtlab Multilingual NovelsThis corpus is of some 450 novels from the late 18th century to the early 20th (1920s). It has a gender mix and is not only English novels.  This corpus was supported by SSHRC through the Text Mining the Novel project.

 

Busa Letter Outlining Textual Informatics

Page 1 of “Conditional Agreement” by Father Busa

Domenico Fiormonte has recently blogged about an interesting document he has by Father Busa that relates to a difficult moment in the history of the digital humanities in Italy in 2002. The two page “Conditional Agreement”, which I translate below, was given to Domenico and explained the terms under which Busa would agree to sign a letter to the Minister (of Education and Research) Moratti in response to Moratti’s public statement about the uselessness of humanities informatics. A letter was being prepared to be signed by a large number of Italian (and foreign) academics explaining the value of what we now call the digital humanities. Busa had the connections to get the letter published and taken seriously for which reason Domenico visited him to get his help, which ended up being conditional on certain things being made clear, as laid out in the document. Domenico kept the two pages Busa wrote and recently blogged about them. As he points out in his blog, these two pages are a mini-manifesto of Father Busa’s later views of the place and importance of what he called textual informatics. Domenico also points out how political is the context of these notes and the letter eventually signed and published. Defining the digital humanities is often about positioning the field in the larger academic and public political spheres we operate in.

Continue reading Busa Letter Outlining Textual Informatics

A flow chart for Busa’s “Mechanized Linguistic Analysis”

Steven Jones has just put up a historic flowchart from the Busa Archive at the Università Cattolica del Sacro Cuore, Milan, Italy. See A flow chart for Busa’s “Mechanized Linguistic Analysis”. Jones has been posting important historical images associated with his book Roberto Busa, S.J., and the Emergence of Humanities Computing. This flow chart shows the logic of the processing using punched cards and tape that was developed by Busa and Paul Tasman (who is probably one of the designers of this chart.) The folks at the Busa Archive had shared this flow chart with me for a paper I gave at the Instant History conference in Chicago on Busa’s Methods. Now Steven has shared it openly with permission.

For more on the Busa Archives and what they show us about the Index Thomisticus as Project see here.