Franken-algorithms: the deadly consequences of unpredictable code

The death of a woman hit by a self-driving car highlights an unfolding technological crisis, as code piled on code creates ‘a universe no one fully understands’

The Guardian has a good essay by Andrew Smith about Franken-algorithms: the deadly consequences of unpredictable code. The essay starts with the obvious problems of biased algorithms like those documented by Cathy O’Neil in Weapons of Math Destruction. It then goes further to talk about cases where algorithms are learning on the fly or are so complex that their behaviour becomes unpredictable. An example is the high-frequency trading algorithms that trade on the stock market. These algorithmic traders learn and try to outwit each other, which leads to unpredictable “flash crashes” when they go rogue.

The problem, he (George Dyson) tells me, is that we’re building systems that are beyond our intellectual means to control. We believe that if a system is deterministic (acting according to fixed rules, this being the definition of an algorithm) it is predictable – and that what is predictable can be controlled. Both assumptions turn out to be wrong.
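A small illustration of why determinism does not guarantee predictability. This is a sketch using the logistic map, a textbook example chosen here for illustration; it is not taken from the essay, and the function name and parameters are assumptions. The rule is fixed and each run is exactly repeatable, yet two starting points that differ by a tiny amount soon produce very different trajectories.

```python
def logistic_orbit(x0, r=4.0, steps=100):
    """Iterate the fixed rule x -> r * x * (1 - x), starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # almost the same starting point

# Same rule, same input: the run is exactly repeatable (deterministic).
assert a == logistic_orbit(0.2)

# Yet the tiny initial difference grows until the two runs disagree.
gaps = [abs(x - y) for x, y in zip(a, b)]
print(f"gap at the start: {gaps[0]:.2e}")
print(f"largest gap within {len(gaps) - 1} steps: {max(gaps):.2f}")
```

In practice we never know the starting state of a real system to ten decimal places, which is one reason a rule-following system can still surprise us.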

The good news is that, according to one of the experts consulted, this could lead to “a golden age for philosophy” as we try to sort out the ethics of these autonomous systems.

CSDH and CGSA 2018

This year we had busy CSDH and CGSA meetings at Congress 2018 in Regina. My conference notes are here. Some of the papers I was involved in include:

CSDH-SCHN:

  • “Code Notebooks: New Tools for Digital Humanists” was presented by Kynan Ly and made the case for notebook-style programming in the digital humanities.
  • “Absorbing DiRT: Tool Discovery in the Digital Age” was presented by Kaitlyn Grant. The paper made the case for tool discovery registries and explained the merger of DiRT and TAPoR.
  • “Splendid Isolation: Big Data, Correspondence Analysis and Visualization in France” was presented by me. The paper talked about FRANTEXT and correspondence analysis in France in the 1970s and 1980s. I made the case that the French were doing big data and text mining long before we were in the Anglophone world.
  • “TATR: Using Content Analysis to Study Twitter Data” was a poster presented by Kynan Ly, Robert Budac, Jason Bradshaw and Anthony Owino. It showed IPython notebooks for analyzing Twitter data.
  • “Climate Change and Academia – Joint Panel with ESAC” was a panel I was on that focused on alternatives to flying for academics.

CGSA:

  • “Archiving an Untold History” was presented by Greg Whistance-Smith. He talked about our project to archive John Szczepaniak’s collection of interviews with Japanese game designers.
  • “Using Salience to Study Twitter Corpora” was presented by Robert Budac who talked about different algorithms for finding salient words in a Twitter corpus.
  • “Political Mobilization in the GG Community” was presented by ZP who talked about a study of a Twitter corpus that looked at the politics of the community.

Also, a PhD student I’m supervising, Sonja Sapach, won the CSDH-SCHN (Canadian Society for Digital Humanities) Ian Lancashire Award for Graduate Student Promise at CSDHSCHN18 at Congress. The Award “recognizes an outstanding presentation at our annual conference of original research in DH by a graduate student.” She won the award for a paper on “Tagging my Tears and Fears: Text-Mining the Autoethnography.” She is completing an interdisciplinary PhD in Sociology and Digital Humanities. Bravo Sonja!

Re-Imagining Education In An Automating World conference at George Brown

On May 25th I had a chance to attend a gem of a conference organized by the Philosophy of Education (POE) committee at George Brown. The event mixed different modalities, from conversations to formal talks to group work. The topic was Re-Imagining Education in An Automating World (see my conference notes here) and this conference is a seed for a larger one next year.

I gave a talk on Digital Citizenship at the end of the day where I tried to convince people that:

  • Data analytics are now a matter of citizenship (we all need to understand how we are being manipulated).
  • We therefore need to teach data literacy in the arts and humanities, so that
  • Students are prepared to contribute to and critique the ways analytics are deployed.
  • This can be done by integrating data and analytical components in any course using field-appropriate data.

 

Too Much Information and the KWIC

A paper that Stéfan Sinclair and I wrote about Peter Luhn and the Keyword-in-Context (KWIC) has just been published by the Fudan Journal of the Humanities and Social Sciences: Too Much Information and the KWIC. The paper is part of a series that replicates important innovations in text technology, in this case the development of the KWIC by Peter Luhn at IBM. We use that as a moment to reflect on the datafication of knowledge after WW II, drawing on Lyotard.
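For readers who have not met a KWIC before, here is a minimal sketch of the basic idea: each occurrence of a keyword is listed with a fixed window of text on either side, aligned so that the keywords form a column. This is an illustration only, not Luhn’s implementation and not the code from the paper; the function name `kwic` and its `width` parameter are assumptions made for the example.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per keyword occurrence with left and right context."""
    # Collapse line breaks and runs of spaces so contexts stay on one line.
    text = " ".join(text.split())
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}".rstrip())
    return lines


sample = (
    "Information wants to be free. Too much information can overwhelm "
    "a reader, so an index of information in context helps."
)
for line in kwic(sample, "information", width=20):
    print(line)
```

Reading down the aligned column lets you compare how a word is used across a text without reading the whole text, which is what made the KWIC a response to having too much information.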

Duplex shows Google failing at ethical and creative AI design

Google CEO Sundar Pichai milked the woos from a clappy, home-turf developer crowd at its I/O conference in Mountain View this week with a demo of an in-the-works voice assistant feature that will e…

A number of venues, including TechCrunch, have discussed Google’s recent demonstration of Duplex, an intelligent agent that can make appointments. Many of the stories note how Duplex shows Google failing at ethical and creative AI design. The problem is that the agent didn’t (at least during the demo) identify as a robot. Instead it appeared to deceive the person it was talking to. As the TechCrunch article points out, there is really no good reason to deceive if the purpose is to make an appointment.

What I want to know is what are the ethics of dealing with a robot? Do we need to identify as human to the robot? Do we need to be polite and give them the courtesy that we would a fellow human? Would it be OK for me to hang up as I do on recorded telemarketing calls? Most of us have developed habits of courtesy when dealing with people, including strangers, that the telemarketers take advantage of in their scripts. Will the robots now take advantage of that? Or, to be more precise, will those that use the robots to save their time take advantage of us?

A second question is how Google considers the ethical implications of its research. It is easy to castigate them for this demonstration, but the demonstration tells us nothing about a line of research that has been going on for a while and what processes Google may have in place to check the ethics of what they do. As companies explore the possibilities for AI, how are they to check their ethics in the excitement of achievement?

I should note that Google’s parent Alphabet has apparently dropped the “Don’t be evil” motto from its code of conduct. There has also been news that a number of employees quit over a Google program to apply machine learning to drone footage for the military. This came after over 3000 Google employees signed a letter taking issue with the project. See also the Open Letter in Support of Google Employees and Tech Workers that researchers signed. As they say:

We are also deeply concerned about the possible integration of Google’s data on people’s everyday lives with military surveillance data, and its combined application to targeted killing. Google has moved into military work without subjecting itself to public debate or deliberation, either domestically or internationally. While Google regularly decides the future of technology without democratic public engagement, its entry into military technologies casts the problems of private control of information infrastructure into high relief.

 

The Ethics of Datafication


Information Wants to Be Free, Or Does It? The Ethics of Datafication has just come out in the Electronic Book Review. This article was written with Bettina Berendt at KU Leuven and reflects on the ethics of digitization. The article first looks at the clichéd phrase “information wants to be free” and then moves on to survey a number of arguments why some things should be digitized.

The Aggregate IQ Files, Part One: How a Political Engineering Firm Exposed Their Code Base

The Research Director for UpGuard, Chris Vickery (@VickerySec) has uncovered code repositories from AggregateIQ, the Canadian company that was building tools for/with SCL and Cambridge Analytica. See The Aggregate IQ Files, Part One: How a Political Engineering Firm Exposed Their Code Base and AggregateIQ Created Cambridge Analytica’s Election Software, and Here’s the Proof from Gizmodo.

The screenshots from the repository show one project called Ephemeral with the description “Because there is no such thing as THE TRUTH”. The “Primary Data Storage” of Ephemeral is called “Mamba Jamba”, presumably a joke on “mumbo jumbo”, which isn’t a good sign. What is more interesting is the description (see image above) of the data storage system as “The Database of Truth”. Here is a selection of that description:

The Database of Truth is a database system that integrates, obtains, and normalizes data from disparate sources including starting with the RNC data trust.  … This system will be created to make decisions based upon the data source and quality as to which data constitutes the accepted truth and connect via integrations or API to the source systems to acquire and update this data on a regular basis.

A robust front-end system will be built that allows an authorized user to query the Database of Truth to find data for a particular upcoming project, to see how current the data is, and to take a segment of that data and move it to the Escrow Database System. …

The Database of Truth is the Core source of data for the entire system. …

One wonders if there is a philosophical theory, of sorts, in Ephemeral. A theory where no truth is built on the mumbo jumbo of a database of truth(s).

Ephemeral would seem to be part of Project Ripon, the system that Cambridge Analytica never really delivered to the Cruz campaign. Perhaps the system was so ephemeral that it never worked and therefore the Database of Truth never held THE TRUTH. Ripon might be better called Ripoff.

After the Facebook scandal it’s time to base the digital economy on public v private ownership of data

In a nutshell, instead of letting Facebook get away with charging us for its services or continuing to exploit our data for advertising, we must find a way to get companies like Facebook to pay for accessing our data – conceptualised, for the most part, as something we own in common, not as something we own as individuals.

Evgeny Morozov has a great essay in The Guardian on how After the Facebook scandal it’s time to base the digital economy on public v private ownership of data. He argues that better data protection is not enough. We need “to articulate a truly decentralised, emancipatory politics, whereby the institutions of the state (from the national to the municipal level) will be deployed to recognise, create, and foster the creation of social rights to data.” In Alberta that may start with a centralized clinical information system called Connect Care managed by the Province. The Province will presumably limit access to our data to those researchers and health-care practitioners who commit to using that access appropriately. Can we imagine a model where Connect Care is expanded to include social data that we can then control and give others (businesses) access to?

An Evening with Edward Snowden on Security, Public Life and Research

This evening we are hosting a video conferencing talk by Edward Snowden at the University of Alberta. These are some live notes taken during the talk for which I was one of the moderators. Like all live notes they will be full of misunderstandings.

Joseph Wiebe of Augustana College gave the introduction. Wiebe asked what the place of cybersecurity is in public life.

“What an incredible time?” is how Snowden started, talking about the Cambridge Analytica and Facebook story. Technology is changing and connecting across borders. We are in the midst of the greatest redistribution of power in the history of humankind without anyone being asked for their vote or opinion. Large platforms take advantage of our need for human connection and turn our desires into a weakness. They have perfected the most effective system of control.

The revelations of 2013 were never about just surveillance, they were about democracy. We feel something has been neglected in the news and in politics. It is the death of influence. It is a system of manipulation that robs us of power by a cadre of the unaccountable. It works because it is largely invisible and is all connected to the use and abuse of our data. We are talking about power that comes from information.

He told us to learn from the mistake of 5 years ago and not focus too much on surveillance, but to look beyond the lever to those putting their weight on it.

Back to the problem of illiberal technologies. Information and control are meant to be distributed among the people. Change in surveillance technology has outstripped democratic institutions. Powerful institutions are trying to get as much control of these technologies as they can before there is a backlash. It will be very hard to take control back once everyone gets used to it.

Snowden talked about how Facebook was gathering all sorts of information from our phones. They (Facebook and Google) operate on our ignorance because there is no way we can keep up with changes in privacy policies. Governments are even worse with laws that allow mass surveillance.

There is an interesting interaction between governments with China modelling its surveillance laws on those of the US. Governments seem to experiment with clearly illegal technologies and the courts don’t do anything. Everything is secret so we can’t even know and make a decision.

What can we do when ordinary oversight breaks down and our checks and balances are bypassed? The public is left to rely on public resources like journalism and academia. We then depend on public facts. Governments can manipulate those facts.

This is the tragedy of our times. We are being forced to rely on the press. This press is being captured and controlled and attacked. And how does the press know what is happening? They depend on whistleblowers who have no protection. Governments see the press as a threat.  Journalists rank in the hierarchy of danger between hackers and terrorists.

What sort of world will we face when governments figure out how to manage the press? What will we not know without the press?

One can argue that extraordinary times call for extraordinary measures, but who gets to decide? We don’t seem to have a voice even through our elected officials.

National security is a euphemism. We are witnessing the construction of a world where the most common political value is fear. Everyone argues we are living in danger and uses that to control us. What is really happening is that morality has been replaced with legalisms. Rights have become a vulnerability.

Snowden disagrees. If we all disagree then things can change. Even in the face of real danger, there are limits to what should be allowed. Following Thoreau we need to resist. We don’t need a respect for the law, but for the right. The law is no substitute for justice or conscience.

Snowden would not be surprised if Facebook’s final defense is that “it’s legal.” But we need to ask if it is right. A wrong should not be turned into a right. We should be skeptical of those in power and the powers that shape our future. There are times in history and in our lives when the only possible decision is to break the law.

More on Cambridge Analytica

More stories are coming out about Cambridge Analytica and the scraping of Facebook data. The Guardian has some important new articles:

Perhaps the most interesting article is in The Conversation and argues that Claims about Cambridge Analytica’s role in Africa should be taken with a pinch of salt. The article carefully sets out evidence that CA didn’t have the effect they were hired to have in either the Nigerian election (when they failed to get Goodluck Jonathan re-elected) or the Kenyan election, where they may have helped Uhuru Kenyatta stay in power. The authors (Gabrielle Lynch, Justin Willis, and Nic Cheeseman) talk about how,

Ahead of the elections, and as part of a comparative research project on elections in Africa, we set up multiple profiles on Facebook to track social media and political adverts, and found no evidence that different messages were directed at different voters. Instead, a consistent negative line was pushed on all profiles, no matter what their background.

They also point out that the majority of Kenyans are not on Facebook and that negative advertising has a long history. They conclude that exaggerating what they can do is what CA does.

Mother Jones has another story, Cloak and Data, one of the best summaries around, that questions the effectiveness of Cambridge Analytica when it comes to the Trump election. They point out how CA’s earlier work in Virginia and for Cruz at the beginning of the primaries doesn’t seem to have worked. They go on to suggest that CA had little to do with the Trump victory, which Parscale, the head of digital operations, instead ascribed to investing heavily in Facebook advertising.

During an interview with 60 Minutes last fall, Parscale dismissed the company’s psychographic methods: “I just don’t think it works.” Trump’s secret strategy, he said, wasn’t secret at all: The campaign went all-in on Facebook, making full use of the platform’s advertising tools. “Donald Trump won,” Parscale said, “but I think Facebook was the method.”

The irony may be that Cambridge Analytica is brought down by its boasting, not what it actually did. Further irony is how it may bring down Facebook and finally draw attention to how our data is used to manipulate us, even though it didn’t work.

The story of Cambridge Analytica’s rise—and its rapid fall—in some ways parallels the ascendance of the candidate it claims it helped elevate to the presidency. It reached the apex of American politics through a mix of bluffing, luck, failing upward, and—yes—psychological manipulation. Sound familiar?