A flow chart for Busa’s “Mechanized Linguistic Analysis”

Steven Jones has just put up a historic flow chart from the Busa Archive at the Università Cattolica del Sacro Cuore, Milan, Italy. See A flow chart for Busa’s “Mechanized Linguistic Analysis”. Jones has been posting important historical images associated with his book Roberto Busa, S.J., and the Emergence of Humanities Computing. This flow chart shows the logic of the punched-card and tape processing that Busa and Paul Tasman developed (Tasman was probably one of the designers of the chart). The folks at the Busa Archive had shared this flow chart with me for a paper I gave at the Instant History conference in Chicago on Busa’s methods. Now Steven has shared it openly with permission.

For more on the Busa Archive and what it shows us about the Index Thomisticus as a project, see here.

Instant History conference

This weekend I gave a talk at a lovely one-day conference, Instant History: The Postwar Digital Humanities and Their Legacies. My conference notes are here. The conference was organized by Paul Eggert, among others. Steven Jones, Ted Underwood, and Laura Mandell also spoke.

I gave the first talk, on “Tremendous Labour: Busa’s Methods”, a paper coming out of the work Stéfan Sinclair and I are doing on the reconstruction of Busa’s Index Thomisticus project. I claimed that Busa and Tasman made two crucial innovations. The first was figuring out how to represent data on punched cards so that it could be processed (the data structures). The second was figuring out how to use the punched card machines at hand to tokenize unstructured text. I walked through what we know about their actual methods and talked about our attempts to replicate them.
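Part of that replication work means re-expressing the methods in modern code. To give a sense of what the second innovation involves, here is a minimal sketch in Python of tokenizing unstructured text into one word per card with a location reference back into the text. The 80-column layout, field widths, and location code are my assumptions for illustration, not Busa and Tasman’s actual card design.

```python
# Toy reconstruction of tokenizing unstructured text onto punched
# cards, one word per card. The 80-column layout, field widths, and
# location code below are illustrative assumptions, not Busa and
# Tasman's actual card specification.

def tokenize_to_cards(text, work="IT", width=80):
    """Emit one fixed-width 'card' per word, carrying the token plus
    a location reference back into the source text."""
    cards = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word_no, word in enumerate(line.split(), start=1):
            token = word.strip('.,;:?!"').upper()   # card codes were upper case
            loc = f"{work}{line_no:04d}{word_no:02d}"  # hypothetical locator
            cards.append(f"{token:<60}{loc}".ljust(width)[:width])
    return cards

sample = "In principio erat Verbum,\net Verbum erat apud Deum."
for card in tokenize_to_cards(sample):
    print(repr(card))
```

Once every word sits on its own card with a locator, the sorting and collating machines of the day could do the rest: sorting the cards alphabetically gives you the skeleton of a concordance.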

I was lucky to have two great respondents (Kyle Roberts and Shlomo Argamon) who both pointed out important contextual issues to consider, such as:

  • We need to pay attention to the Jesuit and spiritual dimensions of Busa’s work.
  • We need to think about the dialectic between those critical of computing and those optimistic about it.

CWRC/CSEC: The Canadian Writing Research Collaboratory

The Canadian Writing Research Collaboratory (CWRC) today launched its Collaboratory. The Collaboratory is a distributed editing environment that allows projects to edit scholarly electronic texts (using CWRC-Writer), manage editorial workflows, and publish collections. There are also links to other tools like the CWRC Catalogue and Voyant (which I am involved in). There is an impressive set of projects already featured in CWRC, but it is open to new projects and designed to help them.

Susan Brown deserves a lot of credit for imagining this, writing the CFI (and other) proposals, leading the development, and now managing the release. I hope it gets used, as it is a fabulous layer of infrastructure designed by scholars for scholars.

One important component in CWRC is CWRC-Writer, an in-browser XML editor that can be hooked into content management systems like the CWRC back-end. It allows for stand-off markup and connects to entity databases for tagging entities in standardized ways.
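To make the stand-off idea concrete: the annotations live apart from the text and point into it by character offsets, leaving the text itself untouched. Here is a minimal sketch in Python; the annotation format and entity URIs are invented for illustration and are not CWRC-Writer’s actual schema.

```python
# Minimal stand-off markup: the text is never modified; annotations
# reference it by character offsets and link spans to entity URIs.
# (Format and URIs are hypothetical, not CWRC-Writer's schema.)

text = "Susan Brown leads the Canadian Writing Research Collaboratory."

annotations = [
    {"start": 0, "end": 11, "type": "person",
     "uri": "http://example.org/entity/susan-brown"},   # hypothetical URI
    {"start": 22, "end": 61, "type": "organization",
     "uri": "http://example.org/entity/cwrc"},          # hypothetical URI
]

# Tools resolve the annotations against the text on demand.
for a in annotations:
    print(f'{a["type"]:>12}: "{text[a["start"]:a["end"]]}" -> {a["uri"]}')
```

The appeal of this design is that overlapping hierarchies and multiple layers of annotation become possible, which inline XML markup struggles with.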

Text Mining The Novel 2015

On Thursday and Friday (Oct. 22nd and 23rd) I was at the second workshop of the Text Mining the Novel project. My conference notes are here: Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic). Here are some general reflections:

  • The obvious weakness of text mining is that it operates on the novel as text, specifically as digital text (or string). We need to find ways to also study the novel as a material object (thing), as a social object, as a performance (of the reader), and as an economic object in a marketplace. Then we also have to find ways to connect these.
  • So many analytical and mining processes depend on bags of words, from dictionaries to topics (see the sketch after this list). Is this a problem or a limitation? Can we try to abstract characters, plot, or argument instead?
  • I was interested in the philosophical discussions around the epistemological dimensions of novels and around philosophical claims about language and literature.
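To show just how reductive the bag-of-words move mentioned above is, here is a minimal sketch using only the Python standard library; the passage is merely an example:

```python
# A bag of words keeps counts and discards order, character, and plot.
from collections import Counter

passage = "It was the best of times, it was the worst of times."
tokens = [w.strip(".,").lower() for w in passage.split()]
bag = Counter(tokens)
print(bag.most_common(4))  # e.g. [('it', 2), ('was', 2), ('the', 2), ('of', 2)]
```

The counts survive; the sentence, and anything like plot, does not. That gap is what the workshop discussions about abstracting character and argument were circling around.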

Alain Resnais: Toute la mémoire du monde

Thanks to 3quarksdaily.com I came across the wonderful short film by Alain Resnais, Toute la mémoire du monde (1956). The short is about memory and the Bibliothèque nationale (of France). It starts at the roof of this fortress of knowledge and travels down through the architecture. It follows a book from when it arrives from a publisher to when it is shelved. It shows another book summoned by pneumatique to the reading room, where it crosses a boundary to be read. All of this is accompanied by a philosophical narration on information and memory.

The short shows big analogue information infrastructure at its technological and rational best, before digital informatics disrupted the library.

The future of the book: An essay from The Economist

The Economist has a nice essay on The future of the book. (Thanks to Lynne for sending this along.) The essay has three interfaces:

  • A listening interface
  • A remediated book interface where you can flip pages
  • A scrolling interface

Much as we have moved beyond skeuomorphic interfaces that carry over design cues from older objects, the book interface here is actually attractive. It suits the topic, which is captured in the title of the essay, “From Papyrus to Pixels: The Digital Transformation Has Only Just Begun.”

The content of the essay looks at how books have been remediated over time (from scroll to print) and then discusses the current shift to ebooks. It points out that the ebook market is not like the digital music market: people still like print books, and they don’t pick them apart the way they pick albums apart into songs. The essay is particularly interesting on the self-publishing phenomenon and how authors are bypassing publishers and stores by publishing through Amazon.

The last chapter talks about audio books, one of the formats of the essay itself, and other formats (like treadmill forms that flash words at speed). This is where they get to the “transformation that has only just begun.”

The Material in Digital Books

Elika Ortega, in a talk at Experimental Interfaces for Reading 2.0, mentioned two web sites that gather interesting material traces in digital books. One is The Art of Google Books, which gathers interesting scans from Google Books.

The other is the site Book Traces where people upload interesting examples of marginal marks. Here is their call for examples:

Readers wrote in their books, and left notes, pictures, letters, flowers, locks of hair, and other things between their pages. We need your help identifying them because many are in danger of being discarded as libraries go digital. Books printed between 1820 and 1923 are at particular risk. Help us prove the value of maintaining rich print collections in our libraries.

Book Traces also has a Tumblr blog.

Why are these traces important? One reason is that they help us understand what readers were doing and thinking while reading.

A World Digital Library Is Coming True!

Robert Darnton has a great essay in The New York Review of Books titled, A World Digital Library Is Coming True! The essay asks about publication and the public interest. He mentions how expensive some journals are getting and how knowledge paid for by the public (through support for research) thereby becomes inaccessible to the very public that might benefit from it.

In the US this trend has been counteracted by initiatives to legislate that publicly funded research be made available through an open access venue like PubMed Central. Needless to say, lobbyists are fighting mandates like the Fair Access to Science and Technology Research Act (FASTR).

Darnton concludes that “In the long run, journals can be sustained only through a transformation of the economic basis of academic publishing.” He argues for “flipping” the costs and charging processing fees to those who want to publish.

By creating open-access journals, a flipped system directly benefits the public. Anyone can consult the research free of charge online, and libraries are liberated from the spiraling costs of subscriptions. Of course, the publication expenses do not evaporate miraculously, but they are greatly reduced, especially for nonprofit journals, which do not need to satisfy shareholders. The processing fees, which can run to a thousand dollars or more, depending on the complexities of the text and the process of peer review, can be covered in various ways. They are often included in research grants to scientists, and they are increasingly financed by the author’s university or a group of universities.

While I agree on the need to focus on the public good, I worry that “flipping” will limit who gets published. In STEM fields, where most research is funded, one can build the cost of processing fees into the funding, but in the humanities, where much research is not funded, many colleagues would have to pay out of pocket to get published. Darnton mentions how at Harvard (his institution) they have a program that subsidizes processing fees … they would, and therein lies the problem. Those at wealthy institutions will have an advantage in an environment where publishers need processing fees, while those without subsidies (whether private scholars, alternative academics, or instructors) will have to decide whether they can really afford to publish. Creating an economy where it is not the best ideas that get published but those of an elite caste is not a recipe for the public good.

I imagine Darnton recognizes the need for solutions other than processing fees and, in fact, he goes on to talk about the Digital Public Library of America and OpenEdition Books as initiatives that are making monographs available online for free.

I suspect that what will work in the humanities is finding funding for the editorial and publishing functions of journals as a whole rather than for individual articles. We have a number of journals in the digital humanities, like Digital Humanities Quarterly, where the costs of editing and publishing are borne by individuals like Julia Flanders, who have made it a labor of love, by their universities that support them, and by our scholarly association, which provides technical support and some funding. DHQ doesn’t charge processing fees, which means that all sorts of people who don’t have access to subsidies can be heard. It would be interesting to poll the authors published and see how many have access to processing fee subsidies. It is bad enough that our conferences are expensive to attend; let’s not skew the published record too.

Which brings me back to the public good. Darnton ends his essay writing about how the DPLA is networking all sorts of collections together. It is not just providing information as a good, but bringing together smaller collections from public libraries and universities. This is one of the possibilities of the internet: distributed resources can be networked into greater goods rather than having to be centralized. The DPLA doesn’t need to be THE PUBLIC LIBRARY that replaces all libraries the way Amazon is pushing out bookstores. The OpenEdition project goes further and offers infrastructure for publishing knowledge, keeping costs down for everyone. A combination of centrally supported infrastructure used by editors who get local support (and credit) will make more of a difference than processing fees, be more equitable, and do more for public participation, which is a good too.

Research Data Management Week

This week the University of Alberta is running a Research Data Management Week. They have sessions throughout the week. I will be presenting on “Weaving Data Management into Your Research.” The need for discussions around research data management is described on the web page:

New norms and practices are developing around the management of research data. Canada’s research councils are discussing the introduction of data management plans within their application processes. The University of Alberta’s Research Policy now addresses the stewardship of research records, with an emphasis on the long-term preservation of data. An increasing number of scholarly journals are requiring authors to provide access to the data behind their submissions for publication. Data repositories are being established in domains and institutions to support the sharing and preservation of data. The series of talks and workshops that have been organized will help you better prepare for this emerging global research data ecosystem.

The University now has language in the Research Policy stating that the University will:

Ensure that principles of stewardship are applied to research records, protecting the integrity of the assets.

The Research Records Stewardship Guidance Procedure then identifies concrete responsibilities of researchers.

These policies, and the larger issue of information stewardship, have become important to research infrastructure. See my blog entry about the TC3+ document Capitalizing on Big Data.