The Ends of Safety

The AI future is not going to be won by hand-wringing about safety. (Vance)

At the Paris Summit, Vice President JD Vance gave a speech indicating a shift in US policy towards AI, including a move away from “safety” as a common ethical ground. He also made the following points:

  • The US intends to dominate in AI, including chip manufacturing, though other countries can join them.
  • They want to focus on opportunity, not guardrails.
  • They need reliable energy to power AI, which I take to mean non-green energy.
  • AI should be free of “ideology,” by which he probably means “woke” language.

He apparently didn’t stay to hear the response from the Europeans. Nor did the USA or the UK sign the summit Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet, which included language saying,

Harnessing the benefits of AI technologies to support our economies and societies depends on advancing Trust and Safety. We commend the role of the Bletchley Park AI Safety Summit and Seoul Summits that have been essential in progressing international cooperation on AI safety and we note the voluntary commitments launched there. We will keep addressing the risks of AI to information integrity and continue the work on AI transparency.

This marks a shift from the consensus that emerged from the 2023 Bletchley Declaration, which led to the formation of the UK AI Safety Institute, the US AI Safety Institute, the Canadian AI Safety Institute, and others.

With Vance signalling the shift away from safety, the UK has renamed its AISI the AI Security Institute (note the substitution of “security” for “safety”). The renamed unit is going to focus more on cybersecurity. It looks like the UK, unlike Europe, is going to try to stay ahead of a Trump ideological turn.

The US AI Safety Institute (AISI), which was set up with a small budget by NIST, is likely to also be affected (or have its acronym changed).

Trump eliminates Biden AI policies

Trump has signed an Executive Order “eliminating harmful Biden Administration AI policies and enhancing America’s global AI dominance” (Fact Sheet). The Fact Sheet calls Biden’s order(s) dangerous and onerous, using the usual stifling-innovation argument:

The Biden AI Executive Order established unnecessarily burdensome requirements for companies developing and deploying AI that would stifle private sector innovation and threaten American technological leadership.

There are, however, other components to the rhetoric:

  • It “established the commitment … to sustain and enhance America’s dominance to promote human flourishing, economic competitiveness, and national security.” The human flourishing seems to be an afterthought.
  • It directs the creation of an “AI Action Plan” within 180 days to sustain dominance. Nothing is mentioned about flourishing with regard to the plan. Presumably dominance is flourishing. This plan and the review of policies are presumably where we will see the details of implementation. It sounds like the Trump administration may keep some of the infrastructure and policies. Will they, for example, keep the AI Safety Institute in NIST?
  • There is an interesting historic section reflecting back to activities of the first Trump administration noting that “President Trump also took executive action in 2020 to establish the first-ever guidance for Federal agency adoption of AI to more effectively deliver services to the American people and foster public trust in this critical technology.” Note the use of the word “trust”. I wonder if they will return to trustworthy AI language.
  • There is language about how “development of AI systems must be free from ideological bias or engineered social agendas.” My guess is that what they want are AIs without “woke” guardrails.

It will be interesting to track what parts of the Biden orders are eliminated and what parts are kept.

 

Humanity’s Last Exam

Researchers with the Center for AI Safety and Scale AI are gathering submissions for Humanity’s Last Exam. The submission form is here. The idea is to develop an exam with questions from a breadth of academic specializations that current LLMs can’t answer.

While current LLMs achieve very low accuracy on Humanity’s Last Exam, recent history shows benchmarks are quickly saturated — with models dramatically progressing from near-zero to near-perfect performance in a short timeframe. Given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025. High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or “artificial general intelligence.” HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.

One wonders if it really will be the last exam. Perhaps we will get more complex exams that test for integrated skills. Andrej Karpathy criticises the exam on X. I agree that what we need are AIs able to do intern-level complex tasks rather than just answering questions.
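
Since the exam is closed-ended, scoring it comes down to comparing model answers to reference answers. Here is a minimal Python sketch of exact-match accuracy; `ask_model` is a hypothetical placeholder for whatever LLM client one might use, and the real HLE evaluation is more involved (it includes multiple-choice and some multi-modal questions, and more careful grading of free-form answers).

```python
# Minimal sketch of exact-match scoring for a closed-ended benchmark.
# `ask_model` is a hypothetical placeholder, not a real API; the actual
# HLE evaluation pipeline is more sophisticated than this.

def ask_model(question: str) -> str:
    """Hypothetical call to a language model; swap in your own client."""
    raise NotImplementedError

def exact_match_accuracy(items: list[dict]) -> float:
    """items is a list of {"question": ..., "answer": ...} dicts."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        prediction = ask_model(item["question"]).strip().lower()
        if prediction == item["answer"].strip().lower():
            correct += 1
    return correct / len(items)

# Illustrative only: real HLE questions are far harder than this.
sample = [{"question": "What is 7 * 8?", "answer": "56"}]
# print(exact_match_accuracy(sample))
```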

Do we really know how to build AGI?

Sam Altman, in a blog post titled Reflections, looks back at what OpenAI has done and then claims that they know how to build AGI,

We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies. We continue to believe that iteratively putting great tools in the hands of people leads to great, broadly-distributed outcomes.

It is worth noting that the definition of AGI (Artificial General Intelligence) is sufficiently vague that meeting this target could become a matter of semantics. Nonetheless, here are some definitions of AGI from OpenAI, or from others writing about OpenAI,

  • “OpenAI’s mission is to ensure that artificial general intelligence (AGI)—by which we mean highly autonomous systems that outperform humans at most economically valuable work—benefits all of humanity.” – Note the “economically valuable work”. I wonder if philosophizing or making art is valuable? Is intelligence being limited here to economics?
  • “AI systems that are generally smarter than humans” – This is somewhat circular, as it brings us back to defining “smartness”, another word for “intelligence”.
  • “any system that can outperform humans at most tasks” – This could be tied to the quote above and the idea of AI agents that work for companies outperforming humans. It seems to me we are nowhere near this if you include physical tasks.
  • “an AI system that can generate at least $100 billion in profits” – This is the definition used by OpenAI and Microsoft to help identify when OpenAI no longer has to share technology with Microsoft.

How safe is AI safety?

Today I gave a plenary talk on “How Safe is AI Safety?” to open a Workshop on AI and DH (Part 1) organized by the Centre de recherche interuniversitaire sur les humanités numériques (CRIHN) at the Université de Montréal.

In the paper I looked at how AI safety is being implemented in Canada and what the scope of the idea is. I talked about the shift from Responsible AI to AI Safety in the Canadian government’s rhetoric.

I’m trying to figure out what to call the methodology I have developed for this and other research excursions. It has elements of Foucault’s genealogy of ideas – trying to follow ideas that seem obvious through the ways they are structured in institutions. Or, it is an extension of Ian Hacking’s idea of historical ontology, where we try to understand ideas about things through their history.

 

Claudette – An Automated Detector of Potentially Unfair Clauses in Online Terms of Service

Randy Goebel gave a great presentation on the use of AI in Judicial Decision Making to my AI Ethics course on Friday. He showed us an example tool called Claudette, which can be used to identify potentially unfair clauses in a Terms and Conditions document. You can try it at the dedicated web site here.

Why is this useful? It provides a form of summary of documents none of us actually read, which could help us catch problematic clauses. It could help us be more careful users of applications.
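
To give a sense of the shape of the task, here is a rough keyword-based sketch in Python. Claudette itself relies on classifiers trained on annotated Terms of Service, so the categories and patterns below are my own illustrative assumptions, not the tool’s actual method.

```python
import re

# Crude illustration of flagging potentially unfair clauses with keyword
# patterns. Claudette uses trained classifiers over annotated Terms of
# Service; these categories and patterns are illustrative assumptions only.

PATTERNS = {
    "unilateral change": r"we may (change|modify|amend)[^.]*at any time",
    "unilateral termination": r"(terminate|suspend)[^.]*at (our|its) (sole )?discretion",
    "arbitration": r"binding arbitration",
    "limitation of liability": r"not (be )?liable",
}

def flag_clauses(terms_text: str) -> list[tuple[str, str]]:
    """Split the text into rough sentences and return (label, clause) pairs."""
    flagged = []
    for sentence in re.split(r"(?<=[.;])\s+", terms_text):
        for label, pattern in PATTERNS.items():
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                flagged.append((label, sentence.strip()))
    return flagged

example = ("We may modify these terms at any time without notice. "
           "Any dispute will be resolved through binding arbitration.")
for label, clause in flag_clauses(example):
    print(f"[{label}] {clause}")
```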

Can A.I. Be Blamed for a Teen’s Suicide?

The New York Times has a story about a youth who died by suicide after extended interactions with a character on Character.ai. The story, Can A.I. Be Blamed for a Teen’s Suicide?, describes how Sewell Setzer III had long discussions with a character called Daenerys Targaryen from the Game of Thrones series. He became isolated and grew attached to Daenerys. He eventually shot himself, and his mother is now suing Character.ai.

Here is an example of what he wrote in his journal,

I like staying in my room so much because I start to detach from this ‘reality,’ and I also feel more at peace, more connected with Dany and much more in love with her, and just happier.

The suit claims that Character.ai’s product was untested, dangerous and defective. It remains to be seen if these types of suits will succeed. In the meantime we need to be careful with these social AIs.

The 18th Annual Hurtig Lecture 2024: Canada’s Role in Shaping our AI Future

The video for the 2024 Hurtig Lecture is up. The speaker was Dr. Elissa Strome, Executive Director of the Pan-Canadian AI Strategy. She gave an excellent overview of the AI Strategy here in Canada and ended by discussing some of the challenges.

The Hurtig Lecture was organized by my colleague Dr. Yasmeen Abu-Laban. I got to moderate the panel discussion and Q & A after the lecture.

Dario Amodei: Machines of Loving Grace

Dario Amodei of Anthropic fame has published a long essay on AI titled Machines of Loving Grace: How AI Could Transform the World for the Better. In the essay he talks about how he doesn’t like the term AGI and prefers instead to talk about “powerful AI,” and he provides a set of characteristics he considers important, including the ability to work on issues in a sustained fashion over time.

Amodei also doesn’t worry much about the Singularity, as he believes powerful AI will still have to deal with real-world constraints, like building physical systems, when designing more powerful AI. I tend to agree.

The point of the essay is, however, to focus on five categories of positive applications of AI that are possible:

  1. Biology and physical health
  2. Neuroscience and mental health
  3. Economic development and poverty
  4. Peace and governance
  5. Work and meaning

The essay is long, so I won’t go into detail. What is important is that he articulates a set of positive goals that AI could help with in these categories. He calls his vision both radical and obvious. In a sense he is right – we have stopped trying to imagine a better world through technology, whether out of cynicism or attention only to details. As he writes,

Throughout writing this essay I noticed an interesting tension. In one sense the vision laid out here is extremely radical: it is not what almost anyone expects to happen in the next decade, and will likely strike many as an absurd fantasy. Some may not even consider it desirable; it embodies values and political choices that not everyone will agree with. But at the same time there is something blindingly obvious—something overdetermined—about it, as if many different attempts to envision a good world inevitably lead roughly here.

UNESCO – Artificial Intelligence for Information Accessibility (AI4IA) Conference

Yesterday I organized a satellite panel for the UNESCO – Artificial Intelligence for Information Accessibility (AI4IA) Conference. The full conference takes place on GatherTown, a conferencing system that feels like an 8-bit 80s game. You wander around our AI4IA conference space, talk with others who are nearby, and watch short prerecorded video talks, of which there are about 60. I’m proud that Amii and the University of Alberta provided the technical support and funding to make the conference possible. The videos will also be up on YouTube for those who can’t make the conference.

The event we organized at the University of Alberta on Friday was an online panel on What is Responsible in Responsible Artificial Intelligence with Bettina Berendt, Florence Chee, Tugba Yoldas, and Katrina Ingram.

Bettina Berendt looked at what the Canadian approach to responsible AI could be and how it might be short-sighted. She talked about a project that, like a translator, lets a person “translate” their writing in whistleblowing situations into prose that won’t identify them. It helps you remove the personally identifiable signal from the text. She then pointed out how this might be responsible, but might also lead to problems.
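
As an aside, one small piece of that idea can be illustrated with named-entity redaction, sketched below using spaCy (assuming the en_core_web_sm model is installed). This only scrubs explicit identifiers such as names, organizations, and places; the project Berendt described goes further and targets the stylistic signal that can give an author away, which this sketch does not attempt.

```python
import spacy

# Minimal sketch: redact explicit identifiers (people, organizations, places,
# dates) using spaCy named-entity recognition. Assumes the en_core_web_sm
# model is installed (python -m spacy download en_core_web_sm). This does NOT
# remove the stylistic signal that can identify an author, which is the
# harder problem the project described above addresses.

REDACT_LABELS = {"PERSON", "ORG", "GPE", "LOC", "DATE"}

def redact(text: str, nlp) -> str:
    doc = nlp(text)
    out = text
    # Replace entities from the end of the string so character offsets stay valid.
    for ent in reversed(doc.ents):
        if ent.label_ in REDACT_LABELS:
            out = out[:ent.start_char] + f"[{ent.label_}]" + out[ent.end_char:]
    return out

if __name__ == "__main__":
    nlp = spacy.load("en_core_web_sm")
    print(redact("I saw Jane Doe falsify the Acme Corp audit in Edmonton last March.", nlp))
```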

Florence Chee talked about how responsibility and ethics should be a starting point rather than an afterthought.

Tugba Yoldas talked about how meaningful human control is important to responsible AI and what it takes for there to be control.

Katrina Ingram of Ethically Aligned AI nicely wrapped up the short talks by discussing how she advises organizations that want to weave ethics into their work. She talked about the 4 Cs: Context, Culture, Content, and Commitment.