After the Facebook scandal it’s time to base the digital economy on public v private ownership of data

In a nutshell, instead of letting Facebook get away with charging us for its services or continuing to exploit our data for advertising, we must find a way to get companies like Facebook to pay for accessing our data – conceptualised, for the most part, as something we own in common, not as something we own as individuals.

Evgeny Morozov has a great essay in The Guardian, After the Facebook scandal it’s time to base the digital economy on public v private ownership of data. He argues that better data protection is not enough. We need “to articulate a truly decentralised, emancipatory politics, whereby the institutions of the state (from the national to the municipal level) will be deployed to recognise, create, and foster the creation of social rights to data.” In Alberta that may start with a centralized clinical information system called Connect Care managed by the Province. The Province will presumably limit access to our data to those researchers and health-care practitioners who commit to using that access appropriately. Can we imagine a model where Connect Care is expanded to include social data that we can then control and give others (businesses) access to?

An Evening with Edward Snowden on Security, Public Life and Research

This evening we are hosting a video conferencing talk by Edward Snowden at the University of Alberta. These are some live notes taken during the talk for which I was one of the moderators. Like all live notes they will be full of misunderstandings.

Joseph Wiebe of Augustana College gave the introduction. Wiebe asked what the place of cybersecurity is in public life.

“What an incredible time?” is how Snowden started, talking about the Cambridge Analytica and Facebook story. Technology is changing and connecting across borders. We are in the midst of the greatest redistribution of power in the history of humankind without anyone being asked for their vote or opinion. Large platforms take advantage of our need for human connection and turn our desires into a weakness. They have perfected the most effective system of control.

The revelations of 2013 were never about just surveillance, they were about democracy. We feel something has been neglected in the news and in politics. It is the death of influence. It is a system of manipulation that robs us of power by a cadre of the unaccountable. It works because it is largely invisible and is all connected to the use and abuse of our data. We are talking about power that comes from information.

He told us to learn from the mistake of 5 years ago and not focus too much on surveillance, but to look beyond the lever to those putting their weight on it.

Back to the problem of illiberal technologies. Information and control are meant to be distributed among the people. Change in surveillance technology has outstripped democratic institutions. Powerful institutions are trying to get as much control of these technologies as they can before there is a backlash. It will be very hard to take control back once everyone gets used to it.

Snowden talked about how Facebook was gathering all sorts of information from our phones. They (Facebook and Google) operate on our ignorance because there is no way we can keep up with changes in privacy policies. Governments are even worse with laws that allow mass surveillance.

There is an interesting interaction between governments with China modelling its surveillance laws on those of the US. Governments seem to experiment with clearly illegal technologies and the courts don’t do anything. Everything is secret so we can’t even know and make a decision.

What can we do when ordinary oversight breaks down and our checks and balances are bypassed? The public is left to rely on public resources like journalism and academia. We then depend on public facts. Governments can manipulate those facts.

This is the tragedy of our times. We are being forced to rely on the press. This press is being captured and controlled and attacked. And how does the press know what is happening? They depend on whistleblowers who have no protection. Governments see the press as a threat. Journalists rank in the hierarchy of danger between hackers and terrorists.

What sort of world will we face when governments figure out how to manage the press? What will we not know without the press?

One can argue that extraordinary times call for extraordinary measures, but who gets to decide? We don’t seem to have a voice even through our elected officials.

National security is a euphemism. We are witnessing the construction of a world where the most common political value is fear. Everyone argues we are living in danger and is using that to control us. What is really happening is that morality has been replaced with legalisms. Rights have become a vulnerability.

Snowden disagrees. If we all disagree then things can change. Even in the face of real danger, there are limits to what should be allowed. Following Thoreau we need to resist. We don’t need a respect for the law, but for the right. The law is no substitute for justice or conscience.

Snowden would not be surprised if Facebook’s final defense is that “it’s legal.” But we need to ask if it is right. A wrong should not be turned into a right. We should be skeptical of those in power and the powers that shape our future. There are times in history and in our lives when the only possible decision is to break the law.

More on Cambridge Analytica

More stories are coming out about Cambridge Analytica and the scraping of Facebook data. The Guardian has some important new articles:

Perhaps the most interesting article is in The Conversation and argues that Claims about Cambridge Analytica’s role in Africa should be taken with a pinch of salt. The article carefully sets out evidence that CA didn’t have the effect they were hired to have in either the Nigerian election (when they failed to get Goodluck Jonathan re-elected) or the Kenyan election, where they may have helped Uhuru Kenyatta stay in power. The authors (Gabrielle Lynch, Justin Willis, and Nic Cheeseman) talk about how,

Ahead of the elections, and as part of a comparative research project on elections in Africa, we set up multiple profiles on Facebook to track social media and political adverts, and found no evidence that different messages were directed at different voters. Instead, a consistent negative line was pushed on all profiles, no matter what their background.

They also point out that the majority of Kenyans are not on Facebook and that negative advertising has a long history. They conclude that exaggerating what it can do is what CA does.

Mother Jones has another story, one of the best summaries around, Cloak and Data, that questions the effectiveness of Cambridge Analytica when it comes to the Trump election. They point out how CA’s earlier work in Virginia and for Cruz at the beginning of the primaries doesn’t seem to have worked. They go on to suggest that CA had little to do with the Trump victory, which Parscale, the head of digital operations, instead ascribed to investing heavily in Facebook advertising.

During an interview with 60 Minutes last fall, Parscale dismissed the company’s psychographic methods: “I just don’t think it works.” Trump’s secret strategy, he said, wasn’t secret at all: The campaign went all-in on Facebook, making full use of the platform’s advertising tools. “Donald Trump won,” Parscale said, “but I think Facebook was the method.”

The irony may be that Cambridge Analytica is brought down by its boasting, not what it actually did. Further irony is how it may bring down Facebook and finally draw attention to how our data is used to manipulate us, even though it didn’t work.

The story of Cambridge Analytica’s rise—and its rapid fall—in some ways parallels the ascendance of the candidate it claims it helped elevate to the presidency. It reached the apex of American politics through a mix of bluffing, luck, failing upward, and—yes—psychological manipulation. Sound familiar?

How Trump Consultants Exploited the Facebook Data of Millions

Cambridge Analytica harvested personal information from a huge swath of the electorate to develop techniques that were later used in the Trump campaign.

The New York Times has just published a story about How Trump Consultants Exploited the Facebook Data of Millions. The story is about how Cambridge Analytica, the US arm of SCL, a UK company, gathered a massive dataset from Facebook with which to do “psychometric modelling” in order to benefit Trump.

The Guardian has been reporting on Cambridge Analytica for some time – see their Cambridge Analytica Files. The service they are supposed to have provided with this massive dataset was to model types of people and their needs/desires/politics and then help political campaigns, like Trump’s, through microtargeting to influence voters. Using the models a campaign can create content tailored to these psychometrically modelled micro-groups to shift their opinions. (See articles by Paul-Olivier Dehaye about what Cambridge Analytica does and has.)

What is new is that there is a (Canadian) whistleblower from Cambridge Analytica, Christopher Wylie, who was willing to talk to the Guardian and others. He is “the data nerd who came in from the cold” and he has a trove of documents that contradict what others said.

The Intercept has an earlier and related story about how Facebook Failed to Protect 30 Million Users From Having Their Data Harvested By Trump Campaign Affiliate. This tells how people were convinced to download a Facebook app that then took their data and that of their friends.

It is difficult to tell how effective psychometric profiling with data is and whether it can really be used to sway voters. What is clear, however, is that Facebook is not really protecting its users’ data. To some extent they are set up to monetize such psychometric data by convincing those who buy access to the data that they can use it to sway people. The problem is not that it can be done, but that Facebook didn’t get paid for this and is now getting bad press.

Distant Reading after Moretti

The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement.

Lauren Klein has posted an important blog entry on Distant Reading after Moretti. This essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general or the humanities, for that matter. Klein then suggests some things we can do about it:

  • We need more accessible corpora that better represent the varieties of human experience.
  • We need to question our models and ask about what is assumed or hidden.

Cooking Up Literature: Talk at U of South Florida

Last week at the University of South Florida I presented a paper based on work that Stéfan Sinclair and I are doing. The talk, titled “Cooking Up Literature: Theorizing Statistical Approaches to Texts,” looked at a neglected period of French innovation in the 1970s and 1980s. During this period the French were developing a national corpus, FRANTEXT, while there was also a developing school of exploratory statistics around Jean-Paul Benzécri. While Anglophone humanities computing was concerned with hypertext, the French were looking at using statistical methods like correspondence analysis to explore large corpora. This is long before Moretti and “distant reading.”

The talk was organized by Steven Jones, who holds the DeBartolo Chair in Liberal Arts and is a Professor of Digital Humanities. Steven Jones leads an NEH-funded project called RECALL that Stéfan and I are consulting on. Jones and colleagues at USF are creating a 3D model of Father Busa’s original factory/laboratory.

What a fossil revolution reveals about the history of ‘big data’

Example of Heinrich Georg Bronn’s Spindle Diagram

David Sepkoski has published a nice essay in Aeon about What a fossil revolution reveals about the history of ‘big data’. Sepkoski talks about his father (Jack Sepkoski), a paleontologist, who developed the first database to provide a comprehensive record of fossils. This data was used to interpret the fossil record differently. The essay argues that it changed how we “see” data and showed that there had been mass extinctions before (and that we might be in one now).

The analysis that he and his colleagues performed revealed new understandings of phenomena such as diversification and extinction, and changed the way that palaeontologists work.

Sepkoski (father) and colleagues

The essay then makes the interesting move of arguing that, in fact, Jack Sepkoski was not the first to do quantitative palaeontology. The son, a historian, argues that Heinrich Georg Bronn in the 19th century was collecting similar data on paper and visualizing it (see spindle diagram above), but his approach didn’t take.

This raises the question of why Sepkoski senior’s data-driven approach changed palaeontology while Bronn’s didn’t. Sepkoski junior’s answer is a combination of changes. First, palaeontology became more receptive to ideas like Stephen Jay Gould’s “punctuated equilibrium” that challenged Darwin’s gradualist view. Second, culture has become more open to data-driven approaches and to interpreting the visualizations needed to grasp them.

The essay concludes by warning us about the dangers of believing data black boxes and visualizations that you can’t unpack.

Yet in our own time, it’s taken for granted that the best way of understanding large, complex phenomena often involves ‘crunching’ the numbers via computers, and projecting the results as visual summaries.

That’s not a bad thing, but it poses some challenges. In many scientific fields, from genetics to economics to palaeobiology, a kind of implicit trust is placed in the images and the algorithms that produce them. Often viewers have almost no idea how they were constructed.

This leads me to ask about the warning as gesture. This is a gesture we see more and more, especially about the ethics of big data and about artificial intelligence. Every thoughtful person, myself included, has warned people about the dangers of these apparently new technologies. But what good are these warnings?

Johanna Drucker in Graphesis proposes what to my mind is a much healthier approach to the dangers and opportunities of visualization. She does what humanists do: she asks us to think of visualization as interpretation. If you think of it this way then it is no more or less dangerous than any other interpretation. And we have the tools to think-through visualization. She shows us how to look at the genealogy of different types of visualization. She shows us how all visualizations are interpretations and therefore need to be read. She frees us to be interpretative with our visualizations. If they are made by the visualizer and are not given by the data as by Moses coming down the mountain, then they are an art that we can play with and through. This is what the 3DH project is about.

Digital Cultures Big Data And Society

Last week I presented a keynote at the Digital Cultures, Big Data and Society conference. (You can see my conference notes at Digital Cultures Big Data And Society.) The talk I gave was titled “Thinking-Through Big Data in the Humanities” in which I argued that the humanities have the history, skills and responsibility to engage with the topic of big data:

  • First, I outlined how the humanities have a history of dealing with big data. As we all know, ideas have histories, and we in the humanities know how to learn from the genesis of these ideas.
  • Second, I illustrated how we can contribute by learning to read the new genres of documents and tools that characterize big data discourse.
  • And lastly, I turned to the ethics of big data research, especially as it concerns us as we are tempted by the treasures at hand.

America is about to kill the open internet – and towns like this will pay the price

Residents of Winlock, Washington can barely stream Spotify and Netflix. Changes to Obama’s net neutrality rules are going to make things even worse

There are lots of stories right now about net neutrality and how the FCC (of the USA) is repealing requirements on ISPs. I find it hard to explain why net neutrality is important, which is probably why there isn’t more of a public outcry. The Guardian has a story that makes this real, America is about to kill the open internet – and towns like this will pay the price. Global News has a nice story about Net neutrality: Why Canadians should care about the internet changes in the U.S. This story describes what happens in countries like Portugal which don’t have net neutrality regulations and it includes some John Oliver segments on how the FCC is going to fix the Internet (which isn’t broken).

Common Crawl

The Common Crawl is a project that has been crawling the web and making an open corpus of web data from the last 7 years available for research. Their crawl corpus is petabytes of data and available as WARC (Web ARChive) files. For example, their 2013 dataset is 102TB and has around 2 billion web pages, which works out to roughly 50KB per page. Their collection is not as complete as the Internet Archive, which goes back much further, but it is available in large datasets for research.
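
To give a feel for what is inside a WARC file, here is a minimal sketch in Python that walks through the records of an uncompressed WARC stream using only the standard library. It is not Common Crawl's own tooling: the function name iter_warc_records and the two sample records are made up for illustration, and the sketch assumes well-formed records in which each header block ends with a blank line and carries a Content-Length field.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration; assumes well-formed records.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        # Named header fields run until the first empty line.
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # Content-Length gives the size of the record body in bytes.
        payload = stream.read(int(headers.get("Content-Length", "0")))
        yield headers, payload


# Two made-up records standing in for a real crawl file.
SAMPLE = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/page\r\n"
    b"Content-Length: 11\r\n"
    b"\r\n"
    b"hello again"
    b"\r\n\r\n"
)

for headers, payload in iter_warc_records(io.BytesIO(SAMPLE)):
    print(headers["WARC-Target-URI"], len(payload))
```

The crawl files themselves are distributed gzip-compressed, so in practice the stream would first have to be decompressed (for instance with Python's gzip module), and for anything beyond a quick look a dedicated WARC library such as warcio is likely the safer choice.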