Can GPT-3 Pass a Writer’s Turing Test?

While earlier computational approaches focused on narrow and inflexible grammar and syntax, these new Transformer models offer us novel insights into the way language and literature work.

The Journal of Cultural Analytics has a nice article that asks Can GPT-3 Pass a Writer’s Turing Test? The authors didn’t actually get access to GPT-3, but they did test GPT-2 extensively in different projects, and they assessed the output of GPT-3 reproduced in an essay on Philosophers On GPT-3. At the end they marked and commented on a number of the published short essays GPT-3 produced in response to the philosophers. They reflect on how one would decide if GPT-3 were as good as an undergraduate writer.

What they never mention is Richard Powers’ novel Galatea 2.2 (Harper Perennial, 1996). In the novel an AI scientist and the narrator set out to see if they can create an AI that could pass a Masters English Literature exam. The novel is very smart and has a tragic ending.

Update: Here is a link to Awesome GPT-3 – a collection of links and articles.

Why Uber’s business model is doomed

Like other ridesharing companies, it made a big bet on an automated future that has failed to materialise, says Aaron Benanav, a researcher at Humboldt University

Aaron Benanav has an important opinion piece in The Guardian about Why Uber’s business model is doomed. Benanav argues that Uber and Lyft’s business model is to capture market share and then ditch the drivers they have employed for self-driving cars as they become reliable. In other words they are first disrupting the human taxi services so as to capitalize on driverless technology when it comes. Their current business is losing money as they feast on venture capital to get market share and if they can’t make the switch to driverless it is likely they go bankrupt.

This raises the question of whether we will see driverless technology good enough to oust the human drivers. I suspect that we will see it for certain geo-fenced zones where Uber and Lyft can pressure local governments to discipline the streets so as to be safe for driverless. In countries with chaotic streets that are hard to map accurately (think medieval Italian towns) it may never work well enough.

All of this raises the deeper ethical issue of how driverless vehicles in particular and AI in general are being imagined and implemented. While there may be nothing unethical about driverless cars per se, there IS something unethical about a company deliberately bypassing government regulations, sucking up capital, driving out the small human taxi businesses, all in order to monopolize a market that they can then profit on by firing the drivers that got them there for driverless cars. Why is this the way AI is being commercialized rather than trying to create better public transit systems or better systems for helping people with disabilities? Who do we hold responsible for the decisions or lack of decisions that see driverless AI technology implemented in a particularly brutal and illegal fashion? (See Benanav on the illegality of what Uber and Lyft are doing by forcing drivers to be self-employed contractors despite rulings to the contrary.)

It is this deeper set of issues around the imagination, implementation, and commercialization of AI that needs to be addressed. I imagine most developers won’t intentionally create unethical AIs, but many will create cool technologies that are commercialized by someone else in brutal and disruptive ways. Those commercializing and their financial backers (which are often all of us and our pension plans) will also feel no moral responsibility because we are just benefiting from (mostly) legal innovative businesses. Corporate social responsibility is a myth. At most corporate ethics is conceived of as a mix of public relations and legal constraints. Everything else is just fair game and the inevitable disruptions in the marketplace. Those who suffer are losers.

This then raises the issue of the ethics of anticipation. What is missing is imagination, anticipation and planning. If the corporate sector is rewarded for finding ways to use new technologies to game the system, then who is rewarded for planning for the disruption and, at a minimum, lessening the impact on the rest of us? Governments have planning units like city planning units, but in every city I’ve lived in these units are bypassed by real money from developers unless there is that rare thing – a citizens’ revolt. Look at our cities and their spread – despite all sorts of research and a history of spread, there is still very little discipline or planning to constrain the developers. In an age when government is seen as essentially untrustworthy, planning departments start from a deficit of trust. Companies, entrepreneurs, innovation and yes, even disruption, are blessed with innocence as if, like children, they just do their thing and can’t be expected to anticipate the consequences or have to pick up after their play. We therefore wait for some disaster to remind everyone of the importance of planning and systems of resilience.

Now … how can we teach this form of deeper ethics without sliding into political thought?

Automatic grading and how to game it

Edgenuity involves short answers graded by an algorithm, and students have already cracked it

The Verge has a story on how students are figuring out how to game automatic marking systems like Edgenuity. The story is titled, These students figured out their tests were graded by AI — and the easy way to cheat. The story describes a keyword salad approach where you just enter a list of words that the grader may be looking for. The grader doesn’t know whether what you wrote is legible or nonsense; it just looks for the right words. The students in turn get good at skimming the study materials for the keywords needed (or find lists shared by other students online).
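To see why a keyword salad works, here is a minimal sketch of a purely word-based grader. It is not Edgenuity's actual algorithm, which is not public; the function name, the sample question, and the keyword list are all invented for illustration. It assumes the grader simply checks what fraction of the expected keywords show up in the answer.

```python
import re

def keyword_score(answer: str, keywords: set[str]) -> float:
    """Return the fraction of expected keywords that appear in the answer.

    Hypothetical grader: it only checks for word presence and never
    looks at grammar, order, or meaning.
    """
    if not keywords:
        return 0.0
    words = set(re.findall(r"[a-z']+", answer.lower()))
    return len(keywords & words) / len(keywords)

# Invented keyword list for an imaginary question about photosynthesis.
KEYWORDS = {"photosynthesis", "chlorophyll", "sunlight", "glucose", "oxygen"}

# A real sentence that happens to leave out one expected word.
honest = "Plants use sunlight and chlorophyll to make glucose, releasing oxygen."
# A keyword salad: no sentence at all, just the expected words.
salad = "photosynthesis chlorophyll sunlight glucose oxygen"

honest_score = keyword_score(honest, KEYWORDS)
salad_score = keyword_score(salad, KEYWORDS)
print(f"honest answer: {honest_score:.2f}, keyword salad: {salad_score:.2f}")
```

Under these assumptions the nonsense answer outscores the coherent one, because nothing in the scoring looks at whether the words form a sentence.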

Perhaps we could build a tool called Edgenorance which you could feed the study materials to and it would generate the keyword list automatically. It could watch the lectures for you, do the speech recognition, then extract the relevant keywords based on the text of the question.

None of this should be surprising. Companies have been promoting algorithms that were probably word based for a while. The algorithm works if it is not understood and thus not gamed. Perhaps we will get AIs that can genuinely understand a short paragraph answer and assess it, but that will be close to an artificial general intelligence and such an AGI will change everything.

AI Dungeon

AI Dungeon, an infinitely generated text adventure powered by deep learning.

Robert told me about AI Dungeon, a text adventure system that uses GPT-2, a language model from OpenAI that got a lot of attention when it was “released” in 2019. OpenAI felt it was too good to release openly as it could be misused. Instead they released a toy version. Now they have GPT-3, about which I wrote before.

AI Dungeon allows you to choose the type of world you want to play in (fantasy, zombies …). It then generates an infinite game by basically generating responses to your input. I assume there is some memory as it repeats my name and the basic setting.

The Man Behind Trump’s Facebook Juggernaut

Brad Parscale used social media to sway the 2016 election. He’s poised to do it again.

I just finished reading important reporting about The Man Behind Trump’s Facebook Juggernaut in the March 9th, 2020 issue of the New Yorker. The long article suggests that it wasn’t Cambridge Analytica or the Russians who swung the 2016 election. If anything had an impact it was the extensive use of social media, especially Facebook, by the Trump digital campaign under the leadership of Brad Parscale. The Clinton campaign focused on TV spots and believed they were going to win. The Trump campaign gathered lots of data, constantly tried new things, and drew on their Facebook “embed” to improve their game.

If each variation is counted as a distinct ad, then the Trump campaign, all told, ran 5.9 million Facebook ads. The Clinton campaign ran sixty-six thousand. “The Hillary campaign thought they had it in the bag, so they tried to play it safe, which meant not doing much that was new or unorthodox, especially online,” a progressive digital strategist told me. “Trump’s people knew they didn’t have it in the bag, and they never gave a shit about being safe anyway.” (p. 49)

One interesting service Facebook offered was “Lookalike Audiences” where you could upload a spotty list of information about people and Facebook would first fill it out from their data and then find you more people who are similar. This lets you expand your list of people to microtarget (and Facebook gets you paying for more targeted ads.)

The end of the article gets depressing as it recounts how little the Democrats are doing to counter or match the social media campaign for Trump which was essentially underway right after the 2016 election. One worries, by the end, that we will see a repeat.

Marantz, Andrew. (2020, March 9). “#WINNING: Brad Parscale used social media to sway the 2016 election. He’s poised to do it again.” New Yorker. Pages 44-55.

Philosophers On GPT-3

GPT-3 raises many philosophical questions. Some are ethical. Should we develop and deploy GPT-3, given that it has many biases from its training, it may displace human workers, it can be used for deception, and it could lead to AGI? I’ll focus on some issues in the philosophy of mind. Is GPT-3 really intelligent, and in what sense? Is it conscious? Is it an agent? Does it understand?

On the Daily Nous (news by and for philosophers) there is a great collection of short essays on OpenAI’s recently released API to GPT-3, see Philosophers On GPT-3 (updated with replies by GPT-3). And … there is a response from GPT-3. Some of the issues raised include:

Ethics: David Chalmers raises the inevitable ethics issues. Remember that GPT-2 was considered so good as to be dangerous. I don’t know if it is brilliant marketing or genuine concern, but OpenAI is continuing to treat this technology as something to be careful about. Here is Chalmers on ethics,

GPT-3 raises many philosophical questions. Some are ethical. Should we develop and deploy GPT-3, given that it has many biases from its training, it may displace human workers, it can be used for deception, and it could lead to AGI? I’ll focus on some issues in the philosophy of mind. Is GPT-3 really intelligent, and in what sense? Is it conscious? Is it an agent? Does it understand?

Annette Zimmerman in her essay makes an important point about the larger justice context of tools like GPT-3. It is not just a matter of ironing out the biases in the language generated (or used in training.) It is not a matter of finding a techno-fix that makes bias go away. It is about care.

Not all uses of AI, of course, are inherently objectionable, or automatically unjust—the point is simply that much like we can do things with words, we can do things with algorithms and machine learning models. This is not purely a tangibly material distributive justice concern: especially in the context of language models like GPT-3, paying attention to other facets of injustice—relational, communicative, representational, ontological—is essential.

She also makes an important and deep point that any AI application will have to make use of concepts from the application domain and all of these concepts will be contested. There are no simple concepts just as there are no concepts that don’t change over time.

Finally, Shannon Vallor has an essay that revisits Hubert Dreyfus’s critique of AI as not really understanding.

Understanding is beyond GPT-3’s reach because understanding cannot occur in an isolated behavior, no matter how clever. Understanding is not an act but a labor.

In the realm of paper tigers – exploring the failings of AI ethics guidelines

But even the ethical guidelines of the world’s largest professional association of engineers, IEEE, largely fail to prove effective as large technology companies such as Facebook, Google and Twitter do not implement them, notwithstanding the fact that many of their engineers and developers are IEEE members.

AlgorithmWatch is maintaining an inventory of frameworks and principles. Their evaluation is that these are not making much of a difference. See In the realm of paper tigers – exploring the failings of AI ethics guidelines. They also note there are few from the Global South. It seems to be mostly countries that have an AI industry where principles are being published.

The International Review of Information Ethics

The International Review of Information Ethics (IRIE) has just published Volume 28 which collects papers on Artificial Intelligence, Ethics and Society. This issue comes from the AI, Ethics and Society conference that the Kule Institute for Advanced Study (KIAS) organized.

This issue of the IRIE also marks the first issue published on the PKP platform managed by the University of Alberta Library. KIAS is supporting the transition of the journal over to the new platform as part of its focus on AI, Ethics and Society in partnership with the AI for Society signature area.

We are still ironing out all the bugs and missing links, so bear with us, but the platform is solid and the IRIE is now positioned to sustainably publish original research in this interdisciplinary area.

The bad things that happen when algorithms run online shops

Smart software controls the prices and products you see when you shop online – and sometimes it can go spectacularly wrong, discovers Chris Baraniuk.

The BBC has a story about The bad things that happen when algorithms run online shops. The story describes how e-commerce systems designed to set prices dynamically (in comparison with someone else’s price, for example) can go wrong and end up charging customers much more than they will pay or charging them virtually nothing so the store loses money.

The story links to an instructive blog entry by Michael Eisen about how two algorithms pushed up the price on a book into the millions, Amazon’s $23,698,655.93 book about flies. The blog entry is a perfect little story about the problems you get when you have algorithms responding iteratively to each other without any sanity checks.
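The feedback loop is easy to reproduce in a few lines. The sketch below is only an illustration: the two pricing rules, the multipliers (0.998 and 1.27), the starting price, and the function name are assumptions chosen for the example, not the sellers' actual settings. One seller slightly undercuts the other, the other prices at a markup over the first, and because the product of the two multipliers is greater than 1 the price grows every round. An optional cap stands in for the missing sanity check.

```python
def simulate_pricing(price_a, price_b, undercut=0.998, markup=1.27,
                     rounds=25, cap=None):
    """Simulate two repricing bots reacting to each other.

    Seller A always prices just under seller B (undercut < 1).
    Seller B always prices at a markup over seller A (markup > 1).
    All parameters are hypothetical. If `cap` is given, both prices
    are clamped to it, acting as a simple sanity check.
    """
    history = []
    for _ in range(rounds):
        price_a = undercut * price_b   # A reacts to B's current price
        price_b = markup * price_a     # B reacts to A's new price
        if cap is not None:
            price_a = min(price_a, cap)
            price_b = min(price_b, cap)
        history.append((round(price_a, 2), round(price_b, 2)))
    return history

# Both sellers start at an invented price of 100.
uncapped = simulate_pricing(100.0, 100.0)
capped = simulate_pricing(100.0, 100.0, cap=150.0)

print("uncapped, last round:", uncapped[-1])
print("capped, last round:  ", capped[-1])
```

Without the cap the price compounds by the same factor each round and never stops; with it, the loop settles at the ceiling. Neither bot is doing anything unreasonable on its own, which is what makes the interaction instructive.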

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labelled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.

Another one of those “what were they thinking when they created the dataset” stories from The Register tells about how MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. The MIT Tiny Images dataset was created automatically using scripts that used the WordNet database of terms which itself held derogatory terms. Nobody thought to check either the terms taken from WordNet or the resulting images scoured from the net. As a result there are not only lots of images for which permission was not secured, but also racist, sexist, and otherwise derogatory labels on the images which in turn means that if you train an AI on these it will generate racist/sexist results.
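The missing check on the terms need not have been elaborate. As a rough sketch, and not a description of how Tiny Images was built or later audited, a script could compare candidate labels against a reviewed blocklist before any images are collected. The function name, the labels, and the placeholder blocklist entries below are all hypothetical, and a real audit would also need human review, since a word list will not catch everything.

```python
def audit_labels(labels, blocklist):
    """Split candidate labels into kept and flagged lists.

    Hypothetical helper: a label is flagged when it matches a blocklist
    entry, ignoring case. Flagged labels should go to a human reviewer
    rather than being silently dropped.
    """
    blocked = {term.lower() for term in blocklist}
    kept, flagged = [], []
    for label in labels:
        if label.lower() in blocked:
            flagged.append(label)
        else:
            kept.append(label)
    return kept, flagged

# Placeholder terms stand in for real derogatory words.
BLOCKLIST = {"slur_placeholder_1", "slur_placeholder_2"}
candidate_labels = ["teacup", "Slur_Placeholder_1", "bicycle", "slur_placeholder_2"]

kept, flagged = audit_labels(candidate_labels, BLOCKLIST)
print("kept:", kept)
print("flagged for review:", flagged)
```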

The article also mentions a general problem with academic datasets. Companies like Facebook can afford to hire actors to pose for images and can thus secure permissions to use the images for training. Academic datasets (and some commercial ones like the Clearview AI database) tend to be scraped and therefore will not have the explicit permission of the copyright holders or people shown. In effect, academics are resorting to mass surveillance to generate training sets. One wonders if we could crowdsource a training set by and for people?