Constellate Sunset

The neat ITHAKA Constellate project is being shut down. It sounds like it was not financially sustainable.

As of November 2024, ITHAKA made the decision to sunset Constellate on July 1, 2025. While we’re proud of the meaningful impact Constellate has had on individuals and institutions, helping advance computational literacy and text analysis skills across academia, we have concluded that continuing to support the platform and classes is not sustainable for ITHAKA in the long term. As a nonprofit organization, we need to focus our resources on initiatives that can achieve broad-scale impact aligned with our mission. Despite Constellate’s success with its participating institutions, we haven’t found a path to achieve this broader impact.

It sounds like this sort of analytical support is best provided within universities through courses, workshops, etc. Constellate developed cool notebooks (available on GitHub), courses built on the notebooks, and webinar recordings.

Words Used at the Democratic and Republican National Conventions

Counting frequently spoken words and phrases at both events.

The New York Times ran a neat story that used text analysis to visualize the differences between Words Used at the Democratic and Republican National Conventions. They used a number of different visualizations, including butterfly bar graphs like the one above. They also had a form of word bubbles that I thought was less successful.

Replaying Japan 2024

I just got back from Replaying Japan 2024 which was at the University at Buffalo, SUNY. Taro Yoko was one of the keynotes and he was quite interesting on developing games like Nier Automata that are partly about AI in this age of AI. I was a coauthor of two papers:

  • A paper on “Parachuting over the Angel: Nintendo in Mexico” presented by Victor Fernandez. This paper looked at the development of a newsletter and then magazine about Nintendo in Mexico that then spread around Spanish South America.
  • A second paper on “The Slogan Game: Missions, Visions and Values in Japanese Game Companies” presented by Keiji Amano. This paper built on work documented in this Spyral notebook, Japanese Game Company Slogans, Missions, Visions, and Values. We gathered various promotional statements of Japanese game companies and analyzed them.

The conference was one of the best Replaying Japan conferences thanks to Mimi Okabe’s hard work. There were lots of participants, including virtual ones, and great papers.

DH 2024: Visualization Ethics and Text Analysis Infrastructure

This week I’m at DH 2024 at George Mason in Washington DC. I presented as part of two sessions. 

On Wednesday I presented a short paper with Lauren Klein on work a group of us are doing on Visualization Ethics: A Case Study Approach. We met at a Dagstuhl seminar on Visualization and the Humanities: Towards a Shared Research Agenda. There we developed case studies for teaching visualization ethics, and that’s what our short presentation was about. The link above is to a Google Drive with drafts of our cases.

Thursday morning I was part of a panel on Text Analysis Tools and Infrastructure in 2024 and Beyond. (The link, again, takes you to a web page where you can download the short papers we wrote for this “flipped” session.) This panel brought together a bunch of text analysis projects like WordCruncher and Lexos to talk about how we can maintain and evolve our infrastructure.

How to Write Poetry Using Copilot

How to Write Poetry Using Copilot is a short guide on how to use Microsoft Copilot to write different genres of poetry. Try it out; it is rather interesting. Here are some of the reasons they give for asking Copilot to write poetry:

  • Create a thoughtful surprise. Why not surprise a loved one with a meaningful poem that will make their day?
  • Add poems to cards. If you’re creating a birthday, anniversary, or Valentine’s Day card from scratch, Copilot can help you write a unique poem for the occasion.
  • Create eye-catching emails. If you’re trying to add humor to a company newsletter or a marketing email that your customers will read, you can have Copilot write a fun poem to spice up your emails.
  • See poetry examples. If you’re looking for examples of different types of poetry, like sonnets or haikus, you can use Copilot to give you an example of one of these poems.

 

Home | Constellate

The new text and data analysis service from JSTOR and Portico.

Thanks to John I have been exploring Constellate. This comes from ITHAKA, the organization that developed JSTOR. Constellate lets you build a dataset from their collections and then visualize the data (see image above.) They also have a Jupyter lab where you can then run notebooks on your data.

They are now experimenting with AI tools.

Rebind | Read Like Never Before With Experts, AI, & Original Content

Experience the next chapter in reading with Rebind: the first AI-reading platform. Embark on expert-guided journeys through timeless classics.

From a NYTimes story I learned about John Kaag’s new initiative Rebind | Read Like Never Before With Experts, AI, & Original Content. The philosophers Kaag and Clancy Martin have teamed up with an investor to start a company that creates AI-enhanced “rebindings” of classics. They work with out-of-copyright books and then pay someone to interpret or comment on each book. The commentary is then used to train an AI with whom you can dialogue as you go through the book. The end result (which I am on the waitlist to try) will be a reading experience enhanced by interpretative videos and chances to interact. It answers Plato’s old critique of text, that you cannot ask questions of it. Now you can.

This reminds me of an experiment by Schwitzgebel, Strasser, and Crosby who created a Daniel Dennett chatbot. Here you can see Schwitzgebel’s reflections on the project.

This project raised ethical issues like whether it was ethical to simulate a living person. In this case they asked for Dennett’s permission and didn’t give people direct access to the chatbot. With the announcements about Apple Intelligence it looks like Apple may provide an AI that is part of the system and that will have access to your combined files so as to help with search and to help you talk with yourself. Internal dialogue, of course, is the paradigmatic manifestation of consciousness. Could one import one or two thinkers to have a multi-party dialogue about one’s thinking over time … “What do you think, Plato? Should I write another paper about ethics and technology?”

The Lives of Literary Characters

The goal of this project is to generate knowledge about the behaviour of literary characters at large scale and make this data openly available to the public. Characters are the scaffolding of great storytelling. This Zooniverse project will allow us to crowdsource data to train AI models to better understand who characters are and what they do within diverse narrative worlds to answer one very big question: why do human beings tell stories?

Today we are going live on Zooniverse with our Citizen Science (crowdsourcing) project, The Lives of Literary Characters. The goal of the project is to offer micro-tasks that allow volunteers to annotate literary passages, producing the training data we need. It will be interesting to see if we get a decent number of volunteers.

Before setting this up we did some serious reading around the ethics of crowdsourcing as we didn’t want to just exploit readers.

 

‘New York Times’ considers legal action against OpenAI as copyright tensions swirl : NPR

The news publisher and maker of ChatGPT have held tense negotiations over striking a licensing deal for the use of the paper’s articles to train the chatbot. Now, legal action is being considered.

Finally we are seeing a serious challenge to the way AI companies are exploiting written resources on the web, as the New York Times takes on OpenAI: ‘New York Times’ considers legal action against OpenAI as copyright tensions swirl.

A top concern for the Times is that ChatGPT is, in a sense, becoming a direct competitor with the paper by creating text that answers questions based on the original reporting and writing of the paper’s staff.

It remains to be seen what the legalities are. Does using a text in order to train a model constitute the making of a copy in violation of copyright? Does the model contain something equivalent to a copy of the original? These issues are being explored in the AI image generating space where Stability AI is being sued by Getty Images. I hope the New York Times doesn’t just settle quietly before there is a public airing of the issues around the exploitation/ownership of written work. I also note that the Authors Guild is starting to advocate on behalf of authors:

“It says it’s not fair to use our stuff in your AI without permission or payment,” said Mary Rasenberger, CEO of The Author’s Guild. The non-profit writers’ advocacy organization created the letter, and sent it out to the AI companies on Monday. “So please start compensating us and talking to us.”

This could also have repercussions in academia as many of us scrape the web and social media when studying contemporary issues. For that matter, what do we think about the use of our work? One could say that our work, supported as it is by the public, should be fair game for gathering, training and innovative reuse. Aren’t we supported for the public good? Perhaps we should assert that academic prose is available for training models?

What are our ethics?

OpenAI adds Code Interpreter to ChatGPT Plus

Upload datasets, generate reports, and download them in seconds!

OpenAI has just released a plug-in called Code Interpreter which is truly impressive. You need to have ChatGPT Plus to be able to turn it on. It then allows you to upload data and to use plain English to analyze it. You write requests/prompts like:

What are the top 20 content words in this text?

It then interprets your request and describes what it will try to do in Python. Then it generates the Python and runs it. When it has finished, it shows the results. You can see examples in this Medium article: 

ChatGPT’s Code Interpreter Was Just Released. Here’s How It Will Change Data Science Forever
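To give a sense of what sits behind a prompt like “What are the top 20 content words in this text?”, here is a minimal sketch of the kind of Python such a request might translate into. This is not the code Code Interpreter actually produced; the function name `top_content_words`, the tiny `STOP_WORDS` list, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list (an assumption for this sketch;
# real stop-word lists are much longer).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "had", "has", "have", "he", "her", "his", "i", "in", "is", "it", "its",
    "not", "of", "on", "or", "she", "that", "the", "their", "they", "this",
    "to", "was", "were", "which", "with", "you",
}


def top_content_words(text, n=20):
    """Return the n most frequent words that are not in STOP_WORDS."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Drop stop words and single-character tokens.
    content = [w for w in words if w not in STOP_WORDS and len(w) > 1]
    return Counter(content).most_common(n)


# Hypothetical sample text; in practice this would be the uploaded file.
sample = "The nature of the whale is the nature of the sea, and the sea is vast."
for word, count in top_content_words(sample, n=3):
    print(word, count)
```

What counts as a “content word” depends entirely on the stop-word list, which is one reason it helps to see the generated code rather than only the results.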

I’ve been trying to see how I can use it to analyze a text. Here are some of the limitations:

  • It can’t handle large texts. It can be used to study a book-length text, not a collection of books.
  • It frequently tries to load NLTK or other libraries and then fails. What is interesting is that it then tries other ways of achieving the same goal. For example, I asked for adjectives near the word “nature” and when it couldn’t load the NLTK POS library it then accessed a list of top adjectives in English and searched for those.
  • It can generate graphs of different sorts, but not interactives.
  • It is difficult to get the full transcript of an experiment, where by “full” I mean the Python code, the prompts, the responses, and any graphs generated. You can ask for an IPython notebook with the code, which you can download. Perhaps I can also get a PDF with the images.
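The fallback mentioned in the list above, searching for known adjectives near “nature” when a part-of-speech tagger isn’t available, can be sketched roughly as follows. This is my own reconstruction of the idea, not the code Code Interpreter wrote; the `COMMON_ADJECTIVES` list, the `adjectives_near` function, and the window size are hypothetical.

```python
import re

# A hypothetical, very short stand-in for a "top adjectives in English" list.
COMMON_ADJECTIVES = {
    "good", "new", "old", "great", "human", "wild",
    "beautiful", "dark", "true", "whole",
}


def adjectives_near(text, target="nature", window=3):
    """List known adjectives within `window` words of each `target`."""
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for i, word in enumerate(words):
        if word != target:
            continue
        # Collect the words just before and just after this occurrence.
        start = max(0, i - window)
        neighbours = words[start:i] + words[i + 1:i + 1 + window]
        found.extend(w for w in neighbours if w in COMMON_ADJECTIVES)
    return found


# Hypothetical sample sentence for illustration.
sample = "She loved wild nature, and human nature seemed dark to her."
print(adjectives_near(sample))
```

A word-list lookup like this misses any adjective not on the list and cannot tell when a listed word is being used as another part of speech, so it is a much cruder instrument than tagging.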

The Code Interpreter is in beta so I expect they will be improving it. It is nonetheless very impressive how it can translate prompts into processes. Particularly impressive is how it tries different approaches when things fail.

Code Interpreter could make data analysis and manipulation much more accessible. Without learning to code you can interrogate a data set and potentially run other processes. It is possible to imagine an unshackled Code Interpreter that could access the internet and do all sorts of things (like running a paper-clip business.)