Big Data – Page 8 – Theoreti.ca

Pentagon believes its precognitive AI can predict events ‘days in advance’

The US military is testing AI that helps predict events days in advance, helping it make proactive decisions.

Engadget has a story on how the Pentagon believes its precognitive AI can predict events ‘days in advance’. It is clear that for most people the value of AI and surveillance lies in prediction, and yet there are some fundamental contradictions. As Hume pointed out centuries ago, all prediction is based on extrapolation from past behaviour. We simply don’t know the future; the best we can do is select features of past behaviour that seemed to do a good job predicting (retrospectively) and hope they will work in the future. Alas, we get seduced by the effectiveness of retrospective work. As Smith and Cordes put it in The Phantom Pattern Problem:

How, in this modern era of big data and powerful computers, can experts be so foolish? Ironically, big data and powerful computers are part of the problem. We have all been bred to be fooled—to be attracted to shiny patterns and glittery correlations. (p. 11)

What if machine learning and big data were really best suited for studying the past and not predicting the future? Would there be the hype? The investment?
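The gap between retrospective fit and prediction can be sketched in a few lines of Python. This is only an illustrative simulation under assumed settings, not anything taken from Smith and Cordes or from the Pentagon system: the function name phantom_pattern and its parameters are hypothetical, and every series is pure random noise. It picks the one "feature" that best tracks the past and then checks how well that same feature tracks the future.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def phantom_pattern(n_features=1000, n_past=20, n_future=200, seed=0):
    """Select the noise feature that best fits the past, then test it on the future.

    All names and default values here are hypothetical, chosen for illustration.
    Returns (absolute past correlation, absolute future correlation).
    """
    rng = random.Random(seed)
    total = n_past + n_future
    # The thing we want to "predict": pure noise, with no real structure.
    target = [rng.gauss(0, 1) for _ in range(total)]
    # Many candidate features, each also pure noise and unrelated to the target.
    features = [[rng.gauss(0, 1) for _ in range(total)] for _ in range(n_features)]

    def past_fit(feature):
        return abs(pearson(feature[:n_past], target[:n_past]))

    # Retrospective selection: keep whichever feature looks best on the past.
    best = max(features, key=past_fit)
    future_fit = abs(pearson(best[n_past:], target[n_past:]))
    return past_fit(best), future_fit


if __name__ == "__main__":
    past, future = phantom_pattern()
    print(f"fit on the past:   {past:.2f}")
    print(f"fit on the future: {future:.2f}")
```

With enough candidate features, one of them should look impressive on the past by chance alone, while its fit on the held-out future should typically fall back toward zero. Nothing was learned; a pattern was merely found.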

When the next AI winter comes we in the humanities could pick up the pieces and use these techniques to try to explain the past, but I’m getting ahead of myself and predicting another winter.

What Ever Happened to IBM’s Watson? – The New York Times

IBM’s artificial intelligence was supposed to transform industries and generate riches for the company. Neither has panned out. Now, IBM has settled on a humbler vision for Watson.

The New York Times has a story about What Ever Happened to IBM’s Watson? The story is a warning to all of us about the danger of extrapolating from intelligent behaviour in one limited domain to others. Watson got good enough at trivia question answering (or posing) to win at Jeopardy!, but that didn’t scale out.

IBM’s strategy is interesting to me. Developing an AI to win at a game like Jeopardy! repeats what IBM did with Deep Blue, which won at chess in 1997. Winning at a game considered paradigmatically a game of intelligence is a great way to get public relations attention.

Interestingly what seems to be working with Watson is not the moon shot game playing type of service, but the automation of basic natural language processing tasks.

Having recently read Edwin Black’s IBM and the Holocaust: The Strategic Alliance Between Nazi Germany and America’s Most Powerful Corporation, I must say that the choice of the name “Watson” grates. Thomas Watson was responsible for IBM’s ongoing engagement with the Nazis, for which he got a medal from Hitler in 1937. Watson didn’t seem to care how IBM’s data processing technology was being used to manage people, especially Jews. I hope the CEOs of AI companies today are more ethical.

ImageGraph: a visual programming language for the Visual Digital Humanities

Leonardo Impett has a nice demonstration here of ImageGraph: a visual programming language for the Visual Digital Humanities. ImageGraph is a visual programming environment that works with Google Colab. When you have your visual program you can compile it into Python in a Colab notebook and then run that notebook. The visual program is stored in your GitHub account and the Python code can, of course, be used in larger projects.

The visual programming language has a number of functions for handling images and using artificial intelligence techniques on them. It also has text functions, but they are apparently not fully worked out.

I love the way Impett combines off the shelf systems while adding a nice visual development environment. Very clean.

The ethics of regulating AI: When too much may be bad

By trying to put prior restraints on the release of algorithms, we will make the same mistake Milton’s censors were making in trying to restrict books before their publication. We will stifle the myriad possibilities inherent in an evolving new technology and the unintended effects that it will foster among new communities who can extend its reach into novel and previously unimaginable avenues. In many ways it will defeat our very goals for new technology, which is its ability to evolve, change and transform the world for the better.

3 Quarks Daily has another nice essay on ethics and AI by Ashutosh Jogalekar. This one is about The ethics of regulating AI: When too much may be bad. The argument is that we need to be careful about regulating algorithms preemptively. As the quote above makes clear, he makes three related points:

We need to be careful censoring algorithms before they are tried.
One reason is that it is very difficult to predict negative or positive outcomes of new technologies. Innovative technologies almost always have unanticipated effects and censoring them would limit our ability to learn about the effects and benefit from them.
Instead we should manage the effects as they emerge.

I can imagine some responses to this argument:

Unanticipated effects are exactly what we should be worried about. The reason for censoring preemptively is precisely to control for unanticipated effects. Why not encourage better anticipation of effects?
Unanticipated effects, especially network effects, often only manifest themselves when the technology is used at scale. By then it can be difficult to roll back the technology. Precisely when there is a problem is when we can’t easily change the way the technology is used.
One person’s unanticipated effect is another’s business or another’s freedom. There is rarely consensus about the effect of effects.

I also note how Jogalekar talks about the technology as if it had agency. He talks about the technology’s ability to evolve. Strictly speaking the technology doesn’t evolve, but our uses do. When it comes to innovation we have to be careful not to ascribe agency to technology as if it were some impersonal force we can resist.

Excavating AI

The training sets of labeled images that are ubiquitous in contemporary computer vision and AI are built on a foundation of unsubstantiated and unstable epistemological and metaphysical assumptions about the nature of images, labels, categorization, and representation. Furthermore, those epistemological and metaphysical assumptions hark back to historical approaches where people were visually assessed and classified as a tool of oppression and race science.

Excavating AI is an important paper by Kate Crawford and Trevor Paglen that looks at “The Politics of Images in Machine Learning Training Sets.” They look at different ways that politics and assumptions can creep into training datasets that are (were) widely used in AI.

There is the overall taxonomy used to annotate (label) the images
There are the individual categories used that could be problematic or irrelevant
There are the images themselves and how they were obtained

They point out how many of the image datasets used for face recognition have been trimmed or have disappeared as they got criticized, but they may still be influential as they were downloaded and are circulating in AI labs. These datasets with their assumptions have also been used to train commercial tools.

I particularly like how the authors discuss their work as an archaeology, perhaps in reference to Foucault (though they don’t mention him).

I would argue that we need an ethics of care and repair to maintain these datasets usefully.

Psychology, Misinformation, and the Public Square

Computational propaganda is ubiquitous, researchers say. But the field of psychology aims to help.

Undark has a fascinating article by Teresa Carr about using games to inoculate people against trolling and misinformation, Psychology, Misinformation, and the Public Square (May 3, 2021). The game is Breaking Harmony Square and the idea is to troll a community.

What’s the game like? The game feels like a branching, choose-your-own-adventure under the hood where a manager walks you through what you might or might not do and then compliments you when you are a good troll. There is a ticker so you can see the news about Harmony Square. It feels a bit pedantic when the managerial/editorial voice says things like “Kudos for paying attention to buzzwords. You ignored the stuff that isn’t emotionally manipulative.” Still, the point is to understand what can be done to manipulate a community so that you are inoculated against it.

An important point made by the article is that games, education and other interventions are not enough. Driver’s education is only part of safe roads. Laws and infrastructure are also important.

I can’t help feeling that we are repeating a pattern of panic and then literacy proposals in the face of new media politics. McLuhan drew our attention to manipulation by media and advertising and I remember well intentioned classes on reading advertising like this more current one. Did they work? Will misinformation literacy work now? Or, is the situation more complex with people like Trump willing to perform convenient untruths?

Whatever the effectiveness of games or literacy training, it is interesting how “truth” has made a comeback. At the very moment when we seem to be witnessing the social and political construction of knowledge, we are hearing calls for truth.

AI Dungeon and the Ethics of Text Generation

> You talk to yourself.

Hi I’m Rameau. You probably don’t know me, but since you’re reading this I’m going to assume you do indeed exist.

> You [ask why] i’m wondering what consciousness is.

I have no idea. But it’s fun to think about. Is this a person? No, because it’s conscious. Is it alive? No, because it doesn’t respond to external stimuli.

> You [ask] why does a person have to respond to external stimuli.

Because then we can interact with it!!!!! But isn’t consciousness an internal phenomenon? Then how would the person interact with anything else? Oh wait, that’s right. We’re not limited by our perception of reality.

I’ve been trying Latitude’s text generation game AI Dungeon. The game is built on OpenAI’s GPT-3. You can see a bit of conversation I had with it in a cyberpunk world.

Latitude has gotten into trouble with OpenAI because it seems that the game was generating erotic content featuring children. A number of people turned to AI Dungeon precisely because it could be used to explore adult themes, and that would seem to be a good thing, but then some may have gone too far. See the Wired story It Began as an AI-Fueled Dungeon Game. It Got Much Darker. This raises interesting ethical issues about:

Why do so many players use it to generate erotic content?
Who is responsible for the erotic content? OpenAI, Latitude, or the players?
Whether there are ethical reasons to generate erotic content featuring children? Do we forbid people from writing novels like Lolita?
How to prevent inappropriate content without crippling the AI? Are filters enough?

The problem of AIs generating toxic language is nicely shown by this web page on Evaluating Neural Toxic Degeneration in Language Models. The interactives and graphs on the page let you see how toxic language can be generated by many of the popular language generation AIs. The problem seems to be the data sets used to train the machines like those that include scrapes of Reddit.

This exploratory tool illustrates research reported on in a paper titled RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. You can see a neat visualization of the connected papers here.

Can’t Get You Out of My Head

I finally finished watching the BBC documentary series Can’t Get You Out of My Head by Adam Curtis. It is hard to describe this series which is cut entirely from archival footage with Curtis’ voice interpreting and linking the diverse clips. The subtitle is “An Emotional History of the Modern World” which is true in that the clips are often strangely affecting, but doesn’t convey the broad social-political connections Curtis makes in the narration. He is trying out a set of theses about recent history in China, the US, the UK, and Russia leading up to Brexit and Trump. I’m still digesting the 6 part series, but here are some of the threads of theses:

Conspiracies. He traces our fascination with, and now belief in, conspiracies back to a memo by Jim Garrison in 1967 about the JFK assassination. The memo, Time and Propinquity: Factors in Phase I, presents results of an investigative technique built on finding patterns of linkages between fragments of information. When you find strange coincidences you then weave a story (conspiracy) to join them rather than starting with a theory and checking the facts. This reminds me of what software like Palantir does – it makes (often coincidental) connections easy to find so you can tell stories. Curtis later follows the evolution of conspiracies as a political force leading to liberal conspiracies about Trump (that he was a Russian agent) and alt-right conspiracies like QAnon. We are all willing to surrender our independence of thought for the joys of conspiracies.
Big Data Surveillance and AI. Curtis connects this new mode of investigation to what the big data platforms like Google now do with AI. They gather lots of fragments of information about us and then a) use it to train AIs, and b) sell inferences drawn from the data to advertisers while keeping us anxious through the promotion of emotional content. Big data can deal with the complexity of the world which we have given up on trying to control. It promises to manage the complexity of fragments by finding patterns in them. This reminds me of discussions around the End of Theory and shift from theories to correlations.
Psychology. Curtis also connects this to emerging psychological theories about how our minds may be fragmented with different unconscious urges moving us. Psychology then offers ways to figure out what people really want and to nudge or prime them. This is what Cambridge Analytica promised – the ability to offer services we believed due to conspiracy theories. Curtis argues at the end that behavioural psychology can’t replicate many of the experiments undergirding nudging. Curtis suggests that all this big data manipulation doesn’t work, though the platforms can heighten our anxiety and emotional stress. A particularly disturbing section of the last episode is the discussion of how the US developed “enhanced” torture techniques based on these ideas after 9/11 to create “learned helplessness” in prisoners. The idea was to fragment their consciousness so that they would release a flood of these fragments, some of which might be useful intelligence.
Individualism. A major theme is the rise of individualism since the war and how individuals are controlled. China’s social credit model of explicit control through surveillance is contrasted to the Western consumer driven platform surveillance control. Either way, Curtis’ conclusion seems to be that we need to regain confidence in our own individual powers to choose our future and strive for it. We need to stop letting others control us with fear or distract us with consumption. We need to choose our future.

In some ways the series is a plea for everyone to make up their own stories from their fragmentary experience. The series starts with this quote,

The ultimate hidden truth of the world is that it is something we make, and could just as easily make differently. (David Graeber)

Of course, Curtis’ series could just be a conspiracy story that he wove out of the fragments he found in the BBC archives.

Ethics in the Age of Smart Systems

Today was the third day of a symposium I helped organize on Ethics in the Age of Smart Systems. For this we experimented with first organizing a “dialogue” or informal paper and discussion on a topic around AI ethics once a month. These led into the symposium that ran over three days. We allowed for an ongoing conversation after the formal part of the event each day. We were also lucky that the keynotes were excellent.

Veena Dubal talked about Proposition 22 and how it has created a new employment category of those managed by algorithm (gig workers). She talked about how this is a new racial wage code as most of the Uber/Lyft workers are people of colour or immigrants.
Virginia Dignum talked about how everyone is announcing their principles, but these principles are not enough. She talked about how we need standards; advisory panels and ethics officers; assessment lists (checklists); public awareness; and participation.
Rafael Capurro gave a philosophical paper about the smart in smart living. He talked about metis (the Greek for cunning) and different forms of intelligence. He called for hesitation in the sense of taking time to think about smart systems. His point was that there are time regimes of hype and determinism around AI and we need to resist them and take time to think freely about technology.

Addressing the Alarming Systems of Surveillance Built By Library Vendors

The Scholarly Publishing and Academic Resources Coalition (SPARC) are drawing attention to how we need to be Addressing the Alarming Systems of Surveillance Built By Library Vendors. This was triggered by a story in The Intercept that LexisNexis (is) to provide (a) giant database of personal information to ICE.

The company’s databases offer an oceanic computerized view of a person’s existence; by consolidating records of where you’ve lived, where you’ve worked, what you’ve purchased, your debts, run-ins with the law, family members, driving history, and thousands of other types of breadcrumbs, even people particularly diligent about their privacy can be identified and tracked through this sort of digital mosaic. LexisNexis has gone even further than merely aggregating all this data: The company claims it holds 283 million distinct individual dossiers of 99.99% accuracy tied to “LexIDs,” unique identification codes that make pulling all the material collected about a person that much easier. For an undocumented immigrant in the United States, the hazard of such a database is clear. (The Intercept)

That LexisNexis has been building databases on people isn’t new. Sarah Brayne has a book about predictive policing titled Predict and Surveil where, among other things, she describes how the LAPD use Palantir and how police databases integrated in Palantir are enhanced by commercial databases like those sold by LexisNexis. (There is an essay that is an excerpt of the book here, Enter the Dragnet.)

I suspect environments like Palantir make all sorts of smaller and specialized databases more commercially valuable which is leading what were library database providers to expand their business. Before, a database about repossessions might be of interest to only a specialized community. Now it becomes linked to other information and is another dimension of data. In particular these databases provide information about all the people who aren’t in police databases. They provide the breadcrumbs needed to surveil those not documented elsewhere.

The SPARC call points out that we (academics, university libraries) have been funding these database providers.

Dollars from library subscriptions, directly or indirectly, now support these systems of surveillance. This should be deeply concerning to the library community and to the millions of faculty and students who use their products each day and further underscores the urgency of privacy protections as library services—and research and education more generally—are now delivered primarily online.

This raises the question of our complicity and whether we could do without some of these companies. At a deeper level it raises questions about the curiosity of the academy. We are dedicated to knowledge as an unalloyed good and are at the heart of a large system of surveillance – surveillance of the past, of literature, of nature, of the cosmos, and of ourselves.