Ethics of Data Science

Colloque « DH@LLM: Grands modèles de langage et humanités numériques » @ IEA & Sorbonne U

DH@LLM: Grands modèles de langage et humanités numériques. Symposium organized by Alexandre Gefen (CNRS-Sorbonne Nouvelle), Glenn Roe (Sorbonne Université), Ayla Rigouts Terryn (Université de Montréal) and Michael Sinatra (Université de Montréal). In collaboration with the Observatoire des textes, des idées et des corpus (ObTIC), the Centre de recherche interuniversitaire sur les humanités numériques (CRIHN), the Institut d’Études […]

Today I gave a keynote to open this symposium on Large Language Models and the digital humanities, Colloque « DH@LLM: Grands modèles de langage et humanités numériques » @ IEA & Sorbonne U. I didn’t talk much about LLMs; instead I talked about “Care and Repair for Responsibility Practices in Artificial Intelligence”. I argued that the digital humanities has a role to play in developing the responsibility practices that address the challenges of LLMs. I argued for an ethics of care approach that looks at the relationships between stakeholders (both individual and institutional) and asks how we can care for those more vulnerable and how we can repair emergent systems.

Brandolini’s law

In a 3QuarksDaily post about Bullshit and Cons: Alberto Brandolini and Mark Twain Issue a Warning About Trump I came across Brandolini’s law of Refutation which states:

The amount of energy needed to refute bullshit is an order of magnitude bigger than to produce it.

This law or principle goes a long way to explaining why bullshit, conspiracy theories, and disinformation are so hard to refute. The very act of refutation becomes suspect as if you are protesting too much. The refuter is made to look like the person with an agenda that we should be skeptical of.

The corollary is that it is less work to lie about someone before they have accused you of lying than to try to refute the accusation. Better to accuse the media of purveying fake news early than to wait until they publish news about you.

As for AI hallucinations, which I believe should be called AI bullshit, we can imagine Rockwell’s corollary:

The amount of energy needed to correct for AI hallucinations in a prompted essay is an order of magnitude bigger than the work of just writing it yourself.

CSDH/SCHN Congress 2025: Reframing Togetherness

These last few days I have been at the CSDH/SCHN conference that is part of the Congress 2025. With colleagues and graduate research assistants I was part of a number of papers and panels. See CSDH/SCHN Congress 2025: Reframing Togetherness. The programme is here. Some of the papers I was involved in included:

Exploring the Deceptive Patterns of Chinook: Visualization and Storytelling Approaches Critical Software Study – Roya Sharifi; Ralph Padilla; Zahra Farhangfar; Yasmeen Abu-Laban; Eleyan Sawafta; and Geoffrey Rockwell
Building a Consortium: An Approach to Sustainability – Geoffrey Martin Rockwell; Michael Sinatra; Susan Brown; John Bradley; Ayushi Khemka; and Andrew MacDonald
Integrating Large Language Models with Spyral Notebooks – Sean Lis and Geoffrey Rockwell
AI-Driven Textual Analysis to Decode Canadian Immigration Social Media Discourse – Augustine Farinola & Geoffrey Martin Rockwell
The List in Text Analysis – Geoffrey Martin Rockwell; Ryan Chartier; and Andrew MacDonald

I was also part of a panel on Generative AI, LLMs, and Knowledge Structures organized by Ray Siemens. My paper was on Forging Interpretations with Generative AI. Here is the abstract:

Using large language models we can now generate fairly sophisticated interpretations of documents using natural language prompts. We can ask for classifications, summaries, visualizations, or specific content to be extracted. In short we can automate content analysis of the sort we used to count as research. As we play with the forging of interpretations at scale we need to consider the ethics of using generative AI in our research. We need to ask how we can use these models with respect for sources, care for transparency, and attention to positionality.

Welcome to the Artificial Intelligence Incident Database

The starting point for information about the AI Incident Database

Maria introduced me to the Artificial Intelligence Incident Database. It contains summaries and links regarding different types of incidents related to AI. Good place to get a sense of the hazards.

The AI Incident Database is dedicated to indexing the collective history of harms or near harms realized in the real world by the deployment of artificial intelligence systems. Like similar databases in aviation and computer security, the AI Incident Database aims to learn from experience so we can prevent or mitigate bad outcomes.

Moral Responsibility

On Thursday (the 29th of May) I gave a talk on Moral Responsibility and Artificial Intelligence at the Canadian AI 2025 conference in Calgary, Alberta.

I discussed what moral responsibility might be in the context of AI and argued for an ethic of care (and repair) approach to building relationships of responsibility and responsibility practices.

There was a Responsible AI track in the conference that had some great talks by Gideon Christian (U of Calgary in Law) and Julita Vassileva (U Saskatchewan).

News Media Publishers Run Coordinated Ad Campaign Urging Washington to Protect Content From Big Tech and AI

Today, hundreds of news publishers launched the “Support Responsible AI” ad campaign, which calls on Washington to make Big Tech pay for the content it takes to run its AI products.

I came across one of these ads about AI Theft from the News Media Alliance and followed it to this page, News Media Publishers Run Coordinated Ad Campaign Urging Washington to Protect Content From Big Tech and AI. They have three asks:

Require Big Tech and AI companies to fairly compensate content creators.

Mandate transparency, sourcing, and attribution in AI-generated content.

Prevent monopolies from engaging in coercive and anti-competitive practices.

Gary Marcus has a substack column on Sam Altman’s attitude problem that talks about Altman’s lack of a response when confronted with an example of what seems like IP theft. I think the positions are hardening as groups begin to use charged language like “theft” for what AI companies are doing.

A Superforecaster’s View on AGI – 3 Quarks Daily

Malcolm Murray has a nice discussion in 3 Quarks Daily, A Superforecaster’s View on AGI, about whether we will have AGI by 2030. He first spends some time defining Artificial General Intelligence. He talks about input and output definitions:

Input definitions would be based on what the AGI can do.
Output definitions would be based on the effect, often economic, of AI.

OpenAI has defined AGI as “a highly autonomous system that outperforms humans at most economically valuable work.” I think this would be an input definition. They and Microsoft are also supposed to have agreed that AGI will have been achieved when a system can generate $100 billion in profit. This would be an output definition.

He settles on two forecasting questions:

Will there exist by Dec 31, 2030, an AI that is able to do every cognitive digital task equivalently or better than the best human, in an equivalent or shorter time, and for an equivalent or cheaper cost?
Will by Dec 31, 2030, the U.S. have seen year-on-year full-year GDP growth rate of 19% or higher?

He believes the answer to the first is affirmative (Yes), but that such an AI will be a clean system working in the lab and not one deployed in the real world. He believes the answer to the second question is No because of the friction of the real world. It will take longer to see the deployment that would have such a level of effect on GDP growth.

Life, Liberty, and Superintelligence

Are American institutions ready for the AI age?

3QuarksDaily pointed me to an essay in Arena on Life, Liberty, and Superintelligence. The essay starts with the question that Dario Amodei tackled in Machines of Loving Grace, namely, what might be the benefits of artificial intelligence (AI). It then questions whether we could actually achieve the potential benefits without the political will and changes needed to nimbly pivot.

Benefits: Amodei outlined a set of domains where intelligence could make a real difference, including:

Biology and health,
Neuroscience and mind,
Economic development and poverty, and
Peace and governance.

Amodei concluded with some thoughts on Work and meaning, though the loss of work and meaning may not be a benefit.

It is important that we talk about the benefits as massive investments are made in infrastructure for AI. We should discuss what we think we are going to get other than some very rich people and yet more powerful companies. Discussion of benefits can also balance the extensive documentation of risks.

Institutions: The essay then focuses on whether we could actually see the benefits Amodei outlines even if we get powerful AI. Ball points out that everyone (like JD Vance) believes the USA should lead in AI, but questions whether we have the political will and appropriate institutions:

Viewed in this light, the better purpose of “AI policy” is not to create guardrails for AI — though most people agree some guardrails will be needed. Instead, our task is to create the institutions we will need for a world transformed by AI—the mechanisms required to make the most of a novus ordo seclorum. America leads the world in AI development; she must also lead the world in the governance of AI, just as our constitution has lit the Earth for two-and-a-half centuries. To describe this undertaking in shrill and quarrelsome terms like “AI policy” or, worse yet, “AI regulation,” falls far short of the job that is before us.

There could be other countries (read China) that lag when it comes to innovation, but are better able to deploy and implement the innovations. What sort of institutions and politics does one need to be able to flexibly and ethically redesign civil institutions?

US State Dept. to use AI to Revoke Visas of Foreign Students who appear “pro-Hamas”

Axios has a story about how the State Department is launching a programme to review social media of foreign students to see if they are “pro-Hamas.” If they appear to support Hamas then they may get their visas revoked.

A senior official is quoted as saying “it would be negligent for the department that takes national security seriously to ignore publicly available information about [visa] applicants in terms of AI tools. … AI is one of the resources available to the government that’s very different from where we were technologically decades ago.”

There are obvious free-speech issues, but I also wonder at the use of AI to police speech. What will be policed next? Pro-EDI speech?

Thanks to Gary Marcus’ Substack for this.

Responsible AI Lecture in Delhi

A couple of days ago I gave an Institute Lecture on What is Responsible About Responsible AI at the Indian Institute of Technology Delhi, India. In it I looked at how AI ethics governance is discussed in Canada under the rubric of Responsible AI and AI Safety. I talked about the emergence of AI Safety Institutes like CAISI (Canadian AI Safety Institute). Just when it seemed that “safety” was the emergent international approach to ethics governance, Vice President JD Vance’s speech at the Paris Summit made it clear that the Trump administration is not interested:

The AI future is not going to be won by hand-wringing about safety. (Vance)