Big Data – Page 2 – Theoreti.ca

UNESCO – Artificial Intelligence for Information Accessibility (AI4IA) Conference

Yesterday I organized a satellite panel for the UNESCO – Artificial Intelligence for Information Accessibility (AI4IA) Conference. The full conference takes place on GatherTown, a conferencing system that feels like an 8-bit 80s game. You wander around our AI4IA conference space, talk with others who are close by, and watch short prerecorded video talks, of which there are about 60. I’m proud that Amii and the University of Alberta provided the technical support and funding to make the conference possible. The videos will also be up on YouTube for those who don’t make the conference.

The event we organized at the University of Alberta on Friday was an online panel on What is Responsible in Responsible Artificial Intelligence with Bettina Berendt, Florence Chee, Tugba Yoldas, and Katrina Ingram.

Bettina Berendt looked at what the Canadian approach to responsible AI could be and how it might be short-sighted. She talked about a project that, like a translator, lets a person “translate” their writing in whistleblowing situations into prose that won’t identify them. It helps you remove the personally identifiable signal from the text. She then pointed out how this might be responsible, but might also lead to problems.

Florence Chee talked about how responsibility and ethics should be a starting point rather than an afterthought.

Tugba Yoldas talked about how meaningful human control is important to responsible AI and what it takes for there to be control.

Katrina Ingram of Ethically Aligned AI nicely wrapped up the short talks by discussing how she advises organizations that want to weave ethics into their work. She talked about the 4 Cs: Context, Culture, Content, and Commitment.

AI for Information Accessibility: From the Grassroots to Policy Action

It’s vital to “keep humans in the loop” to avoid humanizing machine-learning models in research

Today I was part of a panel organized by the Carnegie Council and the UNESCO Information for All Programme Working Group on AI for Information Accessibility: From the Grassroots to Policy Action. We discussed three issues starting with the issue of environmental sustainability and artificial intelligence, then moving to the issue of principles for AI, and finally policies and regulation. I am in awe of the other speakers who were excellent and introduced new ways of thinking about the issues.

Dariia Opryshko, for example, talked about the dangers of how Too Much Trust in AI Poses Unexpected Threats to the Scientific Process. We run the risk of limiting what we think is knowable to what can be researched by AI. We also run the risk that we trust only research conducted by AI. Alternatively, the misuse of AI could lead to science ceasing to be trusted. The Scientific American article linked to above is based on the research published in Nature on Artificial intelligence and illusions of understanding in scientific research.

I talked about the implications of the sorts of regulations we see in AIDA (AI and Data Act) in C-27. AIDA takes a risk-management approach to regulating AI: it defines a class of dangerous AIs called “high-risk” that will be treated differently. This allows the regulation to be “agile” in the sense that it can be adapted to emerging types of AIs. Right now we might be worried about LLMs and misinformation at scale, but five years from now it may be AIs that manage nuclear reactors. The issue with agility is that it depends on there being government officers who stay on top of the technology; otherwise the government will end up relying on the very companies it is supposed to regulate to advise it. We thus need continuous training and experimentation in government for it to be able to regulate in an agile way.

When A.I.’s Output Is a Threat to A.I. Itself

As A.I.-generated data becomes harder to detect, it’s increasingly likely to be ingested by future A.I., leading to worse results.

The New York Times has a terrific article on model collapse, When A.I.’s Output Is a Threat to A.I. Itself. They illustrate what happens when an AI is repeatedly trained on its own output.

Model collapse is likely to become a problem for new generative AI systems trained on the internet which, in turn, is more and more a trash can full of AI-generated misinformation. That companies like OpenAI don’t seem to respect the copyright and creativity of others makes it likely that there will be less and less free human data available. (This blog may end up the last source of fresh human text 🙂)

The article also has an example of how output can converge and thus lose diversity as a model is trained on its own output over and over.
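To get a feel for this loss of diversity, here is a toy sketch in Python. It is not the method used in the article or in the underlying research; it assumes a deliberately crude "model" that just memorizes the word frequencies of its training corpus and then generates a new corpus by sampling from them. The function name `simulate_collapse`, its parameters, and the made-up vocabulary (`w0`, `w1`, ...) are all hypothetical. Because each generation can only reuse words that survived the previous one, the number of distinct words can never go up, and in practice it drifts down.

```python
import random


def simulate_collapse(vocab_size=50, corpus_size=200, generations=30, seed=0):
    """Return the number of distinct words in each generation's corpus.

    Toy assumption: the "model" is just the empirical word distribution
    of its training corpus, and its "output" is a same-sized sample
    drawn from that distribution.
    """
    rng = random.Random(seed)

    # Generation 0 stands in for human-written data: every word in the
    # (invented) vocabulary appears equally often.
    corpus = [f"w{i % vocab_size}" for i in range(corpus_size)]
    distinct = [len(set(corpus))]

    for _ in range(generations):
        # "Train" on the current corpus and "generate" the next one by
        # sampling with replacement. Rare words tend to drop out, and
        # once a word is gone it can never come back.
        corpus = rng.choices(corpus, k=corpus_size)
        distinct.append(len(set(corpus)))

    return distinct


if __name__ == "__main__":
    counts = simulate_collapse()
    for generation, n in enumerate(counts):
        print(f"generation {generation:2d}: {n} distinct words")
```

Real generative models are far more complicated, but the same one-way ratchet is at work: whatever the model fails to reproduce in one generation is not available to learn from in the next.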

Perhaps the biggest takeaway of this research is that high-quality, diverse data is valuable and hard for computers to emulate.

One solution, then, is for A.I. companies to pay for this data instead of scooping it up from the internet, ensuring both human origin and high quality.

Words Used at the Democratic and Republican National Conventions

Counting frequently spoken words and phrases at both events.

The New York Times ran a neat story that used text analysis to visualize the differences between Words Used at the Democratic and Republican National Conventions. They used a number of different visualizations including butterfly bar graphs like the one above. They also had a form of word bubbles that I thought was less successful.
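The counting behind a butterfly bar graph is simple enough to sketch. The following Python is my own minimal illustration, not the Times's method: the function names `word_counts` and `butterfly_rows` are hypothetical, the tokenizer is a crude regular expression, and the two strings in the usage example are invented placeholders rather than convention transcripts. Each row pairs a word with its count in each text, which is what you would plot as left and right bars.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def butterfly_rows(text_a, text_b, top=5):
    """Return (word, count_in_a, count_in_b) for the most frequent words.

    Words are ranked by combined count, with ties broken alphabetically.
    """
    a, b = word_counts(text_a), word_counts(text_b)
    words = sorted(set(a) | set(b), key=lambda w: (-(a[w] + b[w]), w))
    return [(w, a[w], b[w]) for w in words[:top]]


if __name__ == "__main__":
    # Placeholder strings, not real speeches.
    side_a = "freedom freedom jobs"
    side_b = "jobs border border border"
    for word, left, right in butterfly_rows(side_a, side_b, top=3):
        print(f"{left:>3} | {word:<8} | {right}")
```

A real comparison would also need to normalize for the total number of words spoken at each event and decide how to handle phrases, which this sketch ignores.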

DH 2024: Visualization Ethics and Text Analysis Infrastructure

This week I’m at DH 2024 at George Mason in Washington DC. I presented as part of two sessions.

On Wednesday I presented a short paper with Lauren Klein on work a group of us are doing on Visualization Ethics: A Case Study Approach. We met at a Dagstuhl seminar on Visualization and the Humanities: Towards a Shared Research Agenda. We developed case studies for teaching visualization ethics and that’s what our short presentation was about. The link above is to a Google Drive with drafts of our cases.

Thursday morning I was part of a panel on Text Analysis Tools and Infrastructure in 2024 and Beyond. (The link, again, takes you to a web page where you can download the short papers we wrote for this “flipped” session.) This panel brought together a bunch of text analysis projects like WordCruncher and Lexos to talk about how we can maintain and evolve our infrastructure.

How to Write Poetry Using Copilot

How to Write Poetry Using Copilot is a short guide on how to use Microsoft Copilot to write different genres of poetry. Try it out; it is rather interesting. Here are some of the reasons they give for asking Copilot to write poetry:

Create a thoughtful surprise. Why not surprise a loved one with a meaningful poem that will make their day?

Add poems to cards. If you’re creating a birthday, anniversary, or Valentine’s Day card from scratch, Copilot can help you write a unique poem for the occasion.

Create eye-catching emails. If you’re trying to add humor to a company newsletter or a marketing email that your customers will read, you can have Copilot write a fun poem to spice up your emails.

See poetry examples. If you’re looking for examples of different types of poetry, like sonnets or haikus, you can use Copilot to give you an example of one of these poems.

Home | Constellate

The new text and data analysis service from JSTOR and Portico.

Thanks to John I have been exploring Constellate. This comes from ITHAKA, which developed JSTOR. Constellate lets you build a dataset from their collections and then visualize the data (see image above). They also have a Jupyter lab where you can then run notebooks on your data.

They are now experimenting with AI tools.

Why the pope has the ears of G7 leaders on the ethics of AI

Pope Francis is leaning on thinking of Paolo Benanti, a friar adept at explaining how technology can change world

The Guardian has some good analysis on Why the pope has the ears of G7 leaders on the ethics of AI | Artificial intelligence (AI). The PM of Italy, Meloni, invited the pope to address the G7 leaders on the issue of AI. I blogged about this here. It is worth pointing out that this is not the first time the Vatican has intervened on the issue of AI ethics. Here is a short timeline:

In 2020 a bunch of Catholic organizations and industry heavyweights sign the Rome Call (Call for AI Ethics). The Archbishop of Canterbury just signed.
In 2021 they create the RenAIssance Foundation building on the Rome Call. Its scientific director is Paolo Benanti, a charismatic Franciscan monk, professor, and writer on religion and technology. He is apparently advising both Meloni and the pope and he coined the term “algo-ethics”. Most of his publications are in Italian, but there is an interview in English. He is also apparently on the OECD’s expert panel now.
In 2022 Benanti publishes Human in the Loop: Decisioni umane e intelligenze artificiali (Human in the Loop: Human Decisions and Artificial Intelligences) which is about the importance of ethics to AI and the human in ethics.
In 2024 Meloni invites the Pope to address the G7 leaders gathered in Italy on AI.

UN launches recommendations for urgent action to curb harm from spread of mis and disinformation and hate speech Global Principles for Information Integrity address risks posed by advances in AI

United Nations, New York, 24 June 2024 – The world must respond to the harm caused by the spread of online hate and lies while robustly upholding human rights, United Nations Secretary-General António Guterres said today at the launch of the United Nations Global Principles for Information Integrity.

The UN has issued a press release announcing that the UN launches recommendations for urgent action to curb harm from spread of mis and disinformation and hate speech Global Principles for Information Integrity address risks posed by advances in AI. This press release marks the launch of the United Nations Global Principles for Information Integrity.

The recommendations in the press release include:

Tech companies should ensure safety and privacy by design in all products, alongside consistent application of policies and resources across countries and languages, with particular attention to the needs of those groups often targeted online. They should elevate crisis response and take measures to support information integrity around elections.

Tech companies should scope business models that do not rely on programmatic advertising and do not prioritize engagement above human rights, privacy, and safety, allowing users greater choice and control over their online experience and personal data.

Advertisers should demand transparency in digital advertising processes from the tech sector to help ensure that ad budgets do not inadvertently fund disinformation or hate or undermine human rights.

Tech companies and AI developers should ensure meaningful transparency and allow researchers and academics access to data while respecting user privacy, commission publicly available independent audits and co-develop industry accountability frameworks.

Rebind | Read Like Never Before With Experts, AI, & Original Content

Experience the next chapter in reading with Rebind: the first AI-reading platform. Embark on expert-guided journeys through timeless classics.

From a NYTimes story I learned about John Kaag’s new initiative Rebind | Read Like Never Before With Experts, AI, & Original Content. The philosophers Kaag and Clancy Martin have teamed up with an investor to start a company that creates AI-enhanced “rebindings” of classics. They work with out-of-copyright books and then pay someone to interpret or comment on each book. The commentary is then used to train an AI with whom you can dialogue as you go through the book. The end result (which I am on the waitlist to try) will be a reading experience enhanced by interpretative videos and chances to interact. It answers Plato’s old critique of text, that you can’t ask questions of it. Now you can.

This reminds me of an experiment by Schwitzgebel, Strasser, and Crosby who created a Daniel Dennett chatbot. Here you can see Schwitzgebel’s reflections on the project.

This project raised ethical issues like whether it was ethical to simulate a living person. In this case they asked for Dennett’s permission and didn’t give people direct access to the chatbot. With the announcements about Apple Intelligence it looks like Apple may provide an AI that is part of the system and that will have access to your combined files so as to help with search and to help you talk with yourself. Internal dialogue, of course, is the paradigmatic manifestation of consciousness. Could one import one or two thinkers to have a multi-party dialogue about one’s thinking over time … “What do you think, Plato; should I write another paper about ethics and technology?”