Big Data – Page 3 – Theoreti.ca

Fuck the Poetry Police: On the Index of Major Literary Prizes in the United States

The LARB has a nice essay by Dan Sinykin, titled Fuck the Poetry Police: On the Index of Major Literary Prizes in the United States, on how researchers have used data to track the unequal distribution of poetry prizes. The essay discusses the creation of the Post45 Data Collective, which provides peer review for post-1945 cultural datasets.

Sinykin describes this as an “act as groundbreaking as the research itself,” which seems a bit of an exaggeration. It is important that data is being reviewed and published, but this has been happening for a while in other fields. Nonetheless, this is a welcome initiative, especially if it gets attention like the LARB article. In 2013 the Tri-Council (of research agencies in Canada) called for a culture of research data stewardship. In 2015 I worked with Sonja Sapach and Catherine Middleton on a report on a Data Management Plan Recommendation for Social Science and Humanities Funding Agencies. That report looks more at the front end, requiring data management plans from people submitting grant proposals for data-driven projects, but the aim was the same: making data available for future research.

Sinykin’s essay looks at the poetry publishing culture in the US and how white it is. He shows how data can be used to study inequalities. We also need to ask about the privilege of poetry in English and of culture from the Global North, not to mention the privilege built into research and research infrastructure.

Why scientists are building AI avatars of the dead | WIRED Middle East

Advances in AI and humanoid robotics have brought us to the threshold of a new kind of capability: creating lifelike digital renditions of the deceased.

Wired Magazine has a nice article about Why scientists are building AI avatars of the dead. The article talks about digital twin technology designed to create an avatar of a particular person that could serve as a family companion. You could have your grandfather modelled so that you could talk to him and hear his stories after he has passed.

The article also talks about the importance of the body and ideas about modelling personas with bodies. Imagine wearing motion trackers and other sensors so that your bodily presence could be modelled. Then imagine your digital twin being instantiated in a robot.

Needless to say, we aren’t anywhere close yet. See this spoof video of the robot Sophia on a date with Will Smith. There are nonetheless issues about the legalities and ethics of creating bots based on people. What if one didn’t have permission from the original? Is it ethical to create a bot modelled on a historical person? On a living person?

We routinely animate other people in novels, in dialogues (of the dead), and in conversation. Is impersonating someone so wrong? Should people be able to control their name and likeness under all circumstances?

Then there is the possibility of manipulating a digital twin, or of manipulating people through one.

As for the issue of data breaches, digital resurrection opens up a whole new can of worms. “You may share all of your feelings, your intimate details,” Hickok says. “So there’s the prospect of malicious intent—if I had access to your bot and was able to talk to you through it, I could change your attitude about things or nudge you toward certain actions, say things your loved one never would have said.”

How AI image generators work, like DALL-E, Lensa and stable diffusion

Use our simulator to learn how AI generates images from “noise.”

The Washington Post has a nice explainer on how text-to-image generators work: How AI image generators work, like DALL-E, Lensa and stable diffusion. They let you play with the generator, though you have to stick with the predefined phrases. What I hadn’t realized was the role of static noise in the diffusion model. As I understand it, the model is trained by adding noise to images step by step and learning to predict and remove that noise; to generate a new image, it starts from pure noise and removes it bit by bit, guided by the text prompt.
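To make the role of noise a little more concrete, here is a minimal Python sketch of the forward (noising) half of a diffusion model. It assumes the linear noise schedule commonly used in the original DDPM formulation (1,000 steps, per-step variances rising from 0.0001 to 0.02); the function names are illustrative, and DALL-E, Lensa or Stable Diffusion may use different schedules. It shows an image being blended with Gaussian noise at a chosen step, and that knowing the exact noise lets you recover the image, which is why a denoising network is trained to predict that noise. The trained network and the text conditioning are left out.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly from beta_start to beta_end (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta[s]) for s <= t: how much original signal survives by step t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Jump straight to noise level t: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps.

    Returns the noisy pixels and the Gaussian noise eps that was mixed in; in training,
    a denoising network sees x_t (and t) and learns to predict eps.
    """
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * eps for x, eps in zip(pixels, noise)]
    return noisy, noise


def remove_known_noise(noisy, noise, t, alpha_bars):
    """Invert add_noise when eps is known exactly; a trained model only has its estimate of eps."""
    a_bar = alpha_bars[t]
    return [(y - math.sqrt(1.0 - a_bar) * eps) / math.sqrt(a_bar) for y, eps in zip(noisy, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    image = [-1.0, -0.5, 0.5, 1.0]  # a toy four-pixel "image" scaled to [-1, 1]
    for t in (0, 250, 500, 999):
        noisy, _ = add_noise(image, t, alpha_bars, rng)
        print(t, f"signal weight {math.sqrt(alpha_bars[t]):.4f}", [round(v, 2) for v in noisy])
```

By the last step the signal weight is close to zero, so the "image" is essentially static; generation runs this in reverse, starting from static and repeatedly subtracting the network's estimate of the noise.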

Character.AI: Dialogue on AI Ethics

Part of image generated from text, “cartoon pencil drawing of ethics professor and student talking” by Midjourney, Oct. 5, 2022.

Last week I created a character on Character.AI, a new artificial intelligence tool created by some ex-Google engineers who worked on LaMDA, the language model from Google that I blogged about before.

Character.AI, which is now down for maintenance because of the number of users, lets you quickly create a character and then enter into dialogue with it. It actually works quite well. I created “The Ethics Professor” and then wrote a script of questions that I used to engage the AI character. The dialogue is below.


Issues around AI text to art generators

A new art-generating AI system called Stable Diffusion can create convincing deepfakes, including of celebrities.

TechCrunch has a nice discussion of Deepfakes for all: Uncensored AI art model prompts ethics questions. The relatively sudden availability of AI text to art generators has provoked discussion on the ethics of creation and of large machine learning models. Here are some interesting links:

Ars Technica has an article on how Artists begin selling AI-generated artwork on stock photography websites. I note that Midjourney-generated images all seem to have a similar style. We may find that the style becomes more and more identifiable, like a smell in the background.
Ars Technica has another article on tools that let you see which original images may have been used to train AIs like Midjourney: Have AI image generators assimilated your art? New tool lets you check. The provenance of some of the training sets is documented here. It remains to be seen what you can do if your images have been used.
And of course some art communities are banning AI-generated art: Flooded with AI-generated images, some art communities ban them completely. This raises the question of whether one can even tell.

It is worth identifying some of the potential issues:

These art generating AIs may have violated copyright in scraping millions of images. Could artists whose work has been exploited sue for compensation?
The AIs are black boxes that are hard to query. You can’t tell if copyrighted images were used.
These AIs could change the economics of illustration. People who used to commission and pay for custom art for things like magazines, book covers, and posters could simply start using these AIs to save money. Just as Flickr changed the economics of photography, Midjourney could put commercial illustrators out of work.
We could see a lot more “original” art in situations where before people could not afford it. Perhaps poster stores could offer to generate a custom image for you and print it. Get your portrait done as a cyberpunk astronaut.
The AIs could reinforce bias in our visual literacy. Systems that always depict philosophers as old white guys with beards could limit our imagination of what could be.
These could be used to create pornographic deepfakes with people’s faces on them or other toxic imagery.

EU Artificial Intelligence Act

With the introduction of the Artificial Intelligence Act, the European Union aims to create a legal framework for AI to promote trust and excellence. The AI Act would establish a risk-based framework to regulate AI applications, products and services. The rule of thumb: the higher the risk, the stricter the rule. But the proposal also raises important questions about fundamental rights and whether to simply prohibit certain AI applications, such as social scoring and mass surveillance, as UNESCO has recently urged in the Recommendation on AI Ethics, endorsed by 193 countries. Because of the significance of the proposed EU Act and the CAIDP’s goal to protect fundamental rights, democratic institutions and the rule of law, we have created this informational page to provide easy access to EU institutional documents, the relevant work of CAIDP and others, and to chart the important milestones as the proposal moves forward. We welcome your suggestions for additions. Please email us.

The Center for AI and Digital Policy (CAIDP) has a good page on the EU Artificial Intelligence Act with links to different resources. I’m trying to understand this Act and the network of documents related to it, as the AI Act could have a profound impact on how AI is regulated, so I’ve put together some starting points.

First, the point about the potential influence of the AI Act is made in a slide by Giuliano Borter, a CAIDP Fellow. The slide deck is a great starting point that covers key points to know.

Key Point #1 – EU Shapes Global Digital Policy

• Unlike OECD AI Principles, EU AI legislation will have legal force with consequences for businesses and consumers

• EU has enormous influence on global digital policy (e.g. GDPR)

• EU AI regulation could have similar impact

Borter goes on to point out that the Proposal is based on a “risk-based approach”: the higher the risk, the stricter the regulation. This approach is supposed to leave legal room for innovative businesses that are not working on risky projects while controlling problematic (riskier) uses. Borter’s slides suggest that mass surveillance is an unresolved issue. I can imagine the danger that data collected or inferred by smaller (or less risky) services gets aggregated into something with a different level of risk. There are also issues around biometrics (from face recognition on) and AI weapons that might not be covered.

The Act is at the moment only a proposal for a Regulation laying down harmonised rules on artificial intelligence (Artificial Intelligence Act). The Proposal was launched in April 2021, and all sorts of entities, including the CAIDP, are suggesting amendments.

What was the reason for this AI Act? In the Reasons and Objective opening to the Proposal they write that “The proposal is based on EU values and fundamental rights and aims to give people and other users the confidence to embrace AI-based solutions, while encouraging businesses to develop them.” (p. 1) You can see the balancing of values, trust and business.

But I think it is really the economic/business side of the issue that is driving the Act. This can be seen in the Explanatory Statement at the end of the Report on artificial intelligence in a digital age (PDF) from the European Parliament Special Committee on Artificial Intelligence in a Digital Age (AIDA).

Within the global competition, the EU has already fallen behind. Significant parts of AI innovation and even more the commercialisation of AI technologies take place outside of Europe. We neither take the lead in development, research or investment in AI. If we do not set clear standards for the human-centred approach to AI that is based on our core European ethical standards and democratic values, they will be determined elsewhere. The consequences of falling further behind do not only threaten our economic prosperity but also lead to an application of AI that threatens our security, including surveillance, disinformation and social scoring. In fact, to be a global power means to be a leader in AI. (p. 61)

The AI Act may be seen as a way to catch up. AIDA makes the supporting case that “Instead of focusing on threats, a human-centric approach to AI based on our values will use AI for its benefits and give us the competitive edge to frame AI regulation on the global stage.” (p. 61) The idea seems to be that a values-based proposal that enables regulated, responsible AI will not only avoid risky uses, but also create the legal space to encourage low-risk innovation. In particular, I sense a link to the Green Deal, i.e. that AI is seen as a promising technology that could help reduce energy use through smart systems.

Access Now also has a page on the AI Act. They have a nice clear set of amendments that show where some of the weaknesses in the AI Act could be.

Colorado artist used artificial intelligence program Midjourney to win first place

When Jason Allen submitted his “Théâtre D’opéra Spatial” into the Colorado State Fair’s fine arts competition last week, the sumptuous print was an immediate hit. It also marked a new milestone in the growth of artificial intelligence.

There has been a lot of comment about how a Colorado artist used artificial intelligence program Midjourney to win first place. This is being seen as historic, but, as the Washington Post piece points out, people were once unsure whether photography was an art. You could say that in both cases the art lies in selection, not in the image making, which is taken over by a machine.

I can’t help thinking that an important part of art is the making. The things I make are amateurish and wouldn’t win any prizes, but I enjoy making them and getting better at it. Having played with Midjourney, I find it does offer some of the pleasures of creating, but now the creation happens through iteratively trying different combinations of words.

The New York Times has a story about the win too, An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy.

Vol. 31 No. 1 (2022): Ethics in the Age of Smart Systems: Special Issue | The International Review of Information Ethics

The Special Issue of the International Review of Information Ethics has just been fully posted at Vol. 31 No. 1 (2022): Ethics in the Age of Smart Systems: Special Issue. In addition to co-editing it, I co-authored an Editorial, On Dialogue and Artificial Intelligence, that deals with the question of whether LaMDA is sentient.

This special issue came out of a series of dialogues that AI4Society organized with our partners. These were followed by a symposium on “Ethics in the Age of Smart Machine.”

Workplace Productivity: Are You Being Tracked?

“We’re in this era of measurement but we don’t know what we should be measuring,” said Ryan Fuller, former vice president for workplace intelligence at Microsoft.

The New York Times has an essay on Workplace Productivity: Are You Being Tracked? The neat thing is that the article tracks your reading of it to give you a taste of the sorts of tracking now being deployed for remote (and on-site) workers. If you pause and don’t scroll, it puts up messages like “Hey are you still there? You’ve been inactive for 32 seconds.”
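The article’s own tracking can be imitated with a simple idle timer. The Python sketch below is hypothetical: the class name IdleMonitor, the 30-second threshold and the timestamps are assumptions, and the Times page presumably does this in the browser rather than in Python. It records the time of the last scroll and produces a nudge like the one quoted above once the gap passes the threshold.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleMonitor:
    """Hypothetical reading-activity monitor, not the Times' actual code."""
    idle_threshold: float = 30.0  # seconds without scrolling before a nudge (assumed value)
    last_activity: float = 0.0    # timestamp, in seconds, of the most recent scroll

    def record_scroll(self, timestamp: float) -> None:
        """Any scroll resets the idle clock."""
        self.last_activity = timestamp

    def check(self, now: float) -> Optional[str]:
        """Return a nudge if the reader has been idle for at least the threshold, else None."""
        idle = now - self.last_activity
        if idle >= self.idle_threshold:
            return f"Hey are you still there? You've been inactive for {int(idle)} seconds."
        return None


monitor = IdleMonitor()
monitor.record_scroll(100.0)
print(monitor.check(110.0))  # None: only 10 seconds idle
print(monitor.check(132.0))  # Hey are you still there? You've been inactive for 32 seconds.
```

The same few lines, logged to a server instead of shown to the reader, are all it takes to turn reading into a productivity metric, which is the point the article dramatizes.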

But Ms. Kraemer, like many of her colleagues, found that WorkSmart upended ideas she had taken for granted: that she would have more freedom in her home than at an office; that her M.B.A. and experience had earned her more say over her time.

What is new is the shift to remote work due to Covid. Many companies are fine with remote work if they can guarantee productivity. The other thing that is changing is the use of tracking for not just manual work, but also for white-collar work.

I’ve noticed that this goes hand in hand with self-tracking. My Apple Watch and iPhone offer a weekly summary of my browsing. They also offer to track my physical activity: if I go for a walk, then somewhere close to a kilometer in, the watch asks if I want the walk tracked as exercise.

The questions raised by the authors of the New York Times article include: Are we tracking the right things? What are we losing with all this tracking? What is happening to all this data? Can companies sell the data about their employees?

The article is by Jodi Kantor and Arya Sundaram. It is produced by Aliza Aufrichtig and Rumsey Taylor. Aug. 14, 2022

Zampolli Prize Awarded to Voyant Tools

Spyral Notebook Detail (showing code cell and stacked graphs)

Yesterday I gave the triennial Zampolli Prize lecture that honoured Voyant. The lecture is given at the annual ADHO Digital Humanities conference, which this year is being hosted by the University of Tokyo. The award notice is here: Zampolli Prize Awarded to Voyant Tools. Some of the things I touched on in the talk included:

The genius of Stéfan Sinclair, who passed away in August 2020. Voyant was his vision from the time of his dissertation, for which he developed HyperPo.
The global team of people involved in Voyant including many graduate research assistants at the U of Alberta. See the About page of Voyant.
How Voyant built on ideas Stéfan and I developed in Hermeneutica about collaborative research as opposed to the inherited solitary paradigm.
How we have now developed an extension to Voyant called Spyral. Spyral is a notebook programming environment built on JavaScript. It allows you to document your Voyant explorations, save parameters for corpora and tools, preprocess texts, postprocess results, and create new visualizations. It is, in short, a full data analysis and visualization environment built into Voyant so you can easily call up and explore results in Voyant’s already rich tool set.
In the image above you can see a Spyral code cell that outputs two stacked graphs where the same pattern of words is graphed over two different, but synchronized, corpora. You can thus compare the use of the pattern over time between the two datasets.
Replication as a practice for recovering an understanding of innovative technologies now taken for granted, like tokenization or the KWIC (keyword in context). I talked about how Stéfan and I have been replicating important text processing technologies as a way of understanding the history of computing and the digital humanities. Spyral was the environment we developed for documenting our replications.
I then backed up and talked about the epistemological questions about knowledge and knowledge things in the digital age that grew out of and then inspired our experiments in replication. These go back to attempts to think-through tools as knowledge things that bear knowledge in ways that discourse doesn’t. In this context I talked about the DIKW pyramid (data, information, knowledge, wisdom) that captures current views about the relationships between data and knowledge.
Finally I called for help to maintain and extend Voyant/Spyral. I announced the creation of a consortium to bring us together to sustain Voyant.
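Two of the ideas in the list above, tokenization with a KWIC display and graphing the same word pattern across two synchronized corpora, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the regular-expression tokenizer, the function names tokenize, kwic and pattern_trend, and the toy texts are all hypothetical. It is not Spyral code (Spyral runs JavaScript), nor a replication of any historical KWIC system.

```python
import re

# Word tokens: runs of word characters, optionally joined by an apostrophe or hyphen.
# An illustrative rule, not Voyant's or Spyral's actual tokenizer.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def kwic(tokens, keyword, width=3):
    """Keyword-in-context lines: each occurrence of `keyword` with `width` tokens on either side,
    the left context right-aligned so the keywords line up in a column."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


def pattern_trend(segments, pattern):
    """Relative frequency of tokens fully matching `pattern` in each segment (e.g. each decade).
    Running this on two corpora cut at the same boundaries yields two aligned series
    that can be graphed one above the other."""
    regex = re.compile(pattern)
    trend = []
    for segment in segments:
        tokens = tokenize(segment)
        hits = sum(1 for token in tokens if regex.fullmatch(token))
        trend.append(hits / len(tokens) if tokens else 0.0)
    return trend


tokens = tokenize("Voyant was his vision. Spyral extends Voyant so that Voyant results can be explored.")
for line in kwic(tokens, "voyant", width=2):
    print(line)
print(pattern_trend(["the machine thinks", "machines and more machines"], r"machines?"))
```

Even a toy like this forces decisions (what counts as a word, how wide the context is, how to normalize counts across segments of different sizes) that the original tools also had to make, which is part of what replication recovers.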

It was an honour to be able to give the Zampolli lecture on behalf of all the people who have made Voyant such a useful tool.