System Prompts – Anthropic

From a story on TechCrunch, it seems that Anthropic has made its system prompts public. See System Prompts – Anthropic. For example, the system prompt for Claude 3.5 Sonnet starts with:

<claude_info> The assistant is Claude, created by Anthropic. The current date is {}. Claude’s knowledge base was last updated on April 2024.

These system prompts are fascinating since they describe how Anthropic hopes Claude will behave. A set of commandments, if you will. Anthropic describes the purpose of the system prompts thus:

Claude’s web interface (Claude.ai) and mobile apps use a system prompt to provide up-to-date information, such as the current date, to Claude at the start of every conversation. We also use the system prompt to encourage certain behaviors, such as always providing code snippets in Markdown. We periodically update this prompt as we continue to improve Claude’s responses. These system prompt updates do not apply to the Anthropic API.

South Korea faces deepfake porn ’emergency’

The president has addressed the growing epidemic after Telegram users were found exchanging doctored photos of underage girls.

Once again, deepfake porn is in the news as South Korea faces a deepfake porn ’emergency’. Teenagers have been posting deepfake porn images of people they know, including minors, on platforms like Telegram.

South Korean President Yoon Suk Yeol on Tuesday instructed authorities to “thoroughly investigate and address these digital sex crimes to eradicate them”.

This has gone beyond a rare case in Spain or Winnipeg. In South Korea it has spread to hundreds of schools. Porn is proving to be a major use of AI.

When A.I.’s Output Is a Threat to A.I. Itself

As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results.

The New York Times has a terrific article on model collapse, When A.I.’s Output Is a Threat to A.I. Itself. They illustrate what happens when an AI is repeatedly trained on its own output.

Model collapse is likely to become a problem for new generative AI systems trained on the internet, which, in turn, is more and more a trash can full of AI-generated misinformation. That companies like OpenAI don’t seem to respect the copyright and creativity of others makes it likely that there will be less and less free human data available. (This blog may end up the last source of fresh human text. 🙂)

The article also has an example of how output can converge, and thus lose diversity, as a model is trained on its own output over and over.

Perhaps the biggest takeaway of this research is that high-quality, diverse data is valuable and hard for computers to emulate.

One solution, then, is for A.I. companies to pay for this data instead of scooping it up from the internet, ensuring both human origin and high quality.
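The convergence described above can be sketched with a toy simulation (my illustration, not the article's experiment): fit a simple statistical “model” to data, sample fresh “training data” from it, refit, and repeat. Each generation trains only on the previous generation's output, and the diversity of the distribution decays.

```python
import random
import statistics

# Toy model-collapse simulation: a "model" here is just a fitted Gaussian.
# Each generation is trained only on samples from the previous generation,
# and the spread (standard deviation) of its output steadily collapses.
random.seed(42)

def one_generation(samples: list[float], n: int = 50) -> list[float]:
    """Fit mean/std to samples, then generate n new samples from the fit."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # MLE estimate, slightly biased low
    return [random.gauss(mu, sigma) for _ in range(n)]

data = [random.gauss(0.0, 1.0) for _ in range(50)]  # the "human" data
start_std = statistics.pstdev(data)
for _ in range(1000):  # generations of training on own output
    data = one_generation(data)
end_std = statistics.pstdev(data)
print(f"std dev: {start_std:.3f} -> {end_std:.3f}")
```

The slight downward bias of each generation's variance estimate compounds over generations, which is one simple mechanism behind the loss of diversity the article illustrates.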

Words Used at the Democratic and Republican National Conventions

Counting frequently spoken words and phrases at both events.

The New York Times ran a neat story that used text analysis to visualize the differences between Words Used at the Democratic and Republican National Conventions. They used a number of different visualizations, including butterfly bar graphs like the one above. They also had a form of word bubbles that I thought was less successful.
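The kind of analysis behind the story can be sketched in a few lines: count word frequencies in two bodies of speech text and compare them side by side. The two “transcripts” below are made-up stand-ins, not the conventions' actual text.

```python
import re
from collections import Counter

# Illustrative word-frequency comparison; the sample sentences are invented.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "we", "our", "in", "is"}

def word_counts(text: str) -> Counter:
    """Lowercase, tokenize on letters/apostrophes, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

dnc = word_counts("We will protect freedom and we will protect democracy.")
rnc = word_counts("We will secure the border and secure our economy.")

# Words one side used more than the other -- the data a butterfly
# bar graph would plot on its two wings.
diff = dnc - rnc  # Counter subtraction keeps only positive differences
print(diff.most_common(3))
```

Scaled up to full convention transcripts, the two counters give the left and right wings of a butterfly chart.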

Replaying Japan 2024

I just got back from Replaying Japan 2024, which was held at the University at Buffalo, SUNY. Taro Yoko delivered one of the keynotes and was quite interesting on developing games like NieR: Automata, which are partly about AI, in this age of AI. I was a coauthor of two papers:

  • A paper on “Parachuting over the Angel: Nintendo in Mexico” presented by Victor Fernandez. This paper looked at the development of a newsletter, and then magazine, about Nintendo in Mexico that then spread across Spanish-speaking South America.
  • A second paper on “The Slogan Game: Missions, Visions and Values in Japanese Game Companies” presented by Keiji Amano. This paper built on work documented in this Spyral notebook, Japanese Game Company Slogans, Missions, Visions, and Values. We gathered various promotional statements of Japanese game companies and analyzed them.

The conference was one of the best Replaying Japan conferences thanks to Mimi Okabe’s hard work. There were lots of participants, including virtual ones, and great papers.

DH 2024: Visualization Ethics and Text Analysis Infrastructure

This week I’m at DH 2024 at George Mason University in the Washington, DC area. I presented as part of two sessions.

On Wednesday I presented a short paper with Lauren Klein on work a group of us are doing on Visualization Ethics: A Case Study Approach. We met at a Dagstuhl seminar on Visualization and the Humanities: Towards a Shared Research Agenda. We developed case studies for teaching visualization ethics, and that’s what our short presentation was about. The link above is to a Google Drive with drafts of our cases.

Thursday morning I was part of a panel on Text Analysis Tools and Infrastructure in 2024 and Beyond. (The link, again, takes you to a web page where you can download the short papers we wrote for this “flipped” session.) This panel brought together a bunch of text analysis projects like WordCruncher and Lexos to talk about how we can maintain and evolve our infrastructure.

How to Write Poetry Using Copilot

How to Write Poetry Using Copilot is a short guide on how to use Microsoft Copilot to write different genres of poetry. Try it out; it is rather interesting. Here are some of the reasons they give for asking Copilot to write poetry:

  • Create a thoughtful surprise. Why not surprise a loved one with a meaningful poem that will make their day?
  • Add poems to cards. If you’re creating a birthday, anniversary, or Valentine’s Day card from scratch, Copilot can help you write a unique poem for the occasion.
  • Create eye-catching emails. If you’re trying to add humor to a company newsletter or a marketing email that your customers will read, you can have Copilot write a fun poem to spice up your emails.
  • See poetry examples. If you’re looking for examples of different types of poetry, like sonnets or haikus, you can use Copilot to give you an example of one of these poems.

Home | Constellate

The new text and data analysis service from JSTOR and Portico.

Thanks to John, I have been exploring Constellate. It comes from ITHAKA, the organization that developed JSTOR. Constellate lets you build a dataset from their collections and then visualize the data (see image above). They also have a Jupyter Lab environment where you can run notebooks on your data.

They are now experimenting with AI tools.

In Ukraine War, A.I. Begins Ushering In an Age of Killer Robots

Driven by the war with Russia, many Ukrainian companies are working on a major leap forward in the weaponization of consumer technology.

The New York Times has an important story on how, In Ukraine War, A.I. Begins Ushering In an Age of Killer Robots. In short, the existential threat of the overwhelming Russian attack is creating a situation where Ukraine is developing a home-grown autonomous weapons industry that repurposes consumer technologies. Not only are all sorts of countries testing AI powered weapons in Ukraine, the Ukrainians are weaponizing cheap technologies and, in the process, removing a lot of the guardrails.

The pressure to outthink the enemy, along with huge flows of investment, donations and government contracts, has turned Ukraine into a Silicon Valley for autonomous drones and other weaponry.

There isn’t necessarily any “human in the loop” in the cheap systems they are developing. One wonders how the development of this industry will affect other conflicts. Could we see a proliferation of terrorist drone attacks put together following plans circulating on the internet?

ChatGPT is Bullshit.

The Hallucination Lie

Ignacio de Gregorio has a nice Medium essay about why ChatGPT is bullshit. The essay is essentially a short and accessible version of an academic article by Hicks, M. T., et al. (2024), ChatGPT is bullshit. They make the case that people make decisions based on their understanding of what LLMs are doing, and that “hallucination” is the wrong word because ChatGPT is not misperceiving the way a human would. Instead, people need to understand that LLMs are designed with no regard for the truth and are therefore bullshitting.

Because these programs cannot themselves be concerned with truth, and because they are designed to produce text that looks truth-apt without any actual concern for truth, it seems appropriate to call their outputs bullshit. (p. 1)

Given this process, it’s not surprising that LLMs have a problem with the truth. Their goal is to provide a normal-seeming response to a prompt, not to convey information that is helpful to their interlocutor. (p. 2)

At the end the authors make the case that if we adopt Dennett’s intentional stance then we would do well to attribute to ChatGPT the intentions of a hard bullshitter as that would allow us to better diagnose what it was doing. There is also a discussion of the intentions of the developers. You could say that they made available a tool that bullshitted without care for the truth.

Are we, as a society, at risk of being led by these LLMs, and their constant use, to mistake this simulacrum of “truthiness” for true knowledge?