How Trump Consultants Exploited the Facebook Data of Millions

Cambridge Analytica harvested personal information from a huge swath of the electorate to develop techniques that were later used in the Trump campaign.

The New York Times has just published a story about How Trump Consultants Exploited the Facebook Data of MillionsThe story is about how Cambridge Analytica, the US arm of SCL, a UK company, gathered a massive dataset from Facebook with which to do “psychometric modelling” in order to benefit Trump.

The Guardian has been reporting on Cambridge Analytica for some time – see their Cambridge Analytica Files. The service they are supposed to have provided with this massive dataset was to model types of people and their needs/desires/politics and then help political campaigns, like Trump’s, through microtargeting to influence voters. Using the models a campaign can create content tailored to these psychometrically modelled micro-groups to shift their opinions. (See articles by Paul-Olivier Dehaye about what Cambridge Analytica does and has.)

What is new is that there is a (Canadian) whistleblower from Cambridge Analytica, Christopher Wylie who was willing to talk to the Guardian and others. He is “the data nerd who came in from the cold” and he has a trove of documents that contradict what other said.

The Intercept has a earlier and related story about how Facebook Failed to Protect 30 Million Users From Having Their Data Harvested By Trump Campaign Affiliate. This tells how people were convinced to download a Facebook app that then took your data and that of their friends.

It is difficult to tell how effective the psychometric profiling with data is and if can really be used to sway voters. What is clear, however, is that Facebook is not really protecting their users’ data. To some extent their set up to monetize such psychometric data by convincing those who buy access to the data that you can use it to sway people. The problem is not that it can be done, but that Facebook didn’t get paid for this and are now getting bad press.

Distant Reading after Moretti

The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement.

Lauren Klein has posted an important blog entry on Distant Reading after MorettiThis essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general or the humanities, for that matter. Klein then suggests some thing we can do about it:

  • We need more accessible corpora that better represent the varieties of human experience.
  • We need to question our models and ask about what is assumed or hidden.



Cooking Up Literature: Talk at U of South Florida

Last week I presented a paper based on work that Stéfan Sinclair and I are doing at the University of South Florida. The talk, titled, “Cooking Up Literature: Theorizing Statistical Approaches to Texts” looked at a neglected period of French innovation in the 1970s and 1980s. During this period the French were developing a national corpus, FRANTEXT, while there was also a developing school of exploratory statistics around Jean-Paul Benzécri. While Anglophone humanities computing was concerned with hypertext, the French were looking at using statistical methods like correspondence analysis to explore large corpora. This is long before Moretti and “distant reading.”

The talk was organized by Steven Jones who holds the DeBartolo Chair in Liberal Arts and is a Professor of Digital Humanities. Steven Jones leads a NEH funded project called RECALL that Stéfan and I are consulting on. Jones and colleagues at USF are creating a 3D model of Father Busa’s original factory/laboratory.

What a fossil revolution reveals about the history of ‘big data’

Example of Heinrich Georg Bronn’s Spindle Diagram

David Sepkoski has published a nice essay in Aeon about What a fossil revolution reveals about the history of ‘big data’. Sepkoski talks about his father (Jack Sepkoski), a paleontologist, who developed the first database to provide a comprehensive record of fossils. This data was used to interpret the fossil record differently. The essay argues that it changed how we “see” data and showed that there had been mass extinctions before (and that we might be in one now).

The analysis that he and his colleagues performed revealed new understandings of phenomena such as diversification and extinction, and changed the way that palaeontologists work.

Sepkoski (father) and colleagues

The essay then makes the interesting move of arguing that, in fact, Jack Sepkoski was not the first to do quantitative palaeontology. The son, a historian, argues that Heinrich Georg Bronn in the 19th century was collecting similar data on paper and visualizing it (see spindle diagram above), but his approach didn’t take.

This raises the question of why Sepkoski senior’s data-driven approach changed palaeontology while Bronn’s didn’t. Sepkoski junior’s answer is a combination of changes. First, that palaeontology became more receptive to ideas like Stephen Jay Gould’s “punctuated equillibrium” that challenged Darwin’s gradualist view. Second, that culture has become more open to data-driven approaches and the interpretation visualizations needed to grasp such approaches.

The essay concludes by warning us about the dangers of believing data black boxes and visualizations that you can’t unpack.

Yet in our own time, it’s taken for granted that the best way of understanding large, complex phenomena often involves ‘crunching’ the numbers via computers, and projecting the results as visual summaries.

That’s not a bad thing, but it poses some challenges. In many scientific fields, from genetics to economics to palaeobiology, a kind of implicit trust is placed in the images and the algorithms that produce them. Often viewers have almost no idea how they were constructed.

This leads me to ask about the warning as gesture. This is a gesture we see more and more, especially about the ethics of big data and about artificial intelligence. No thoughtful person, including myself, has not warned people about the dangers of these apparently new technologies. But what good are these warnings?

Johanna Drucker in Graphesis proposes what to my mind is a much healthier approach to the dangers and opportunities of visualization. She does what humanists do, she asks us to think of visualization as interpretation. If you think of it this way than it is no more or less dangerous than any other interpretation. And, we have the tools to think-through visualization. She shows us how to look at the genealogy of different types of visualization. She shows us how all visualizations are interpretations and therefore need to be read. She frees us to be interpretative with our visualizations. If they are made by the visualizer and are not given by the data as by Moses coming down the mountain, then they are an art that we can play with and through. This is what the 3DH project is about.

Digital Cultures Big Data And Society

Last week I presented a keynote at the Digital Cultures, Big Data and Society conference. (You can seem my conference notes at Digital Cultures Big Data And Society.) The talk I gave was titled “Thinking-Through Big Data in the Humanities” in which I argued that the humanities have the history, skills and responsibility to engage with the topic of big data:

  • First, I outlined how the humanities have a history of dealing with big data. As we all know, ideas have histories, and we in the humanities know how to learn from the genesis of these ideas.
  • Second, I illustrated how we can contribute by learning to read the new genres of documents and tools that characterize big data discourse.
  • And lastly, I turned to the ethics of big data research, especially as it concerns us as we are tempted by the treasures at hand.

America is about to kill the open internet – and towns like this will pay the price

Residents of Winlock, Washington can barely stream Spotify and Netflix. Changes to Obama’s net neutrality rules are going to make things even worse

There are lots of stories right now about net neutrality and how the FCC (of the USA) is repeal requirements of ISPs. I find it hard to explain why net neutrality is important which is probably why there isn’t more a public outcry. The Guardian has a story that makes this real,  America is about to kill the open internet – and towns like this will pay the price. Global News has a nice story about Net neutrality: Why Canadians should care about the internet changes in the U.S. This story describes what happens in countries like Portugal which don’t have net neutrality regulations and it includes some John Oliver segments on how the FCC is going to fix the Internet (which isn’t broken.)


Common Crawl

The Common Crawl is a project that has been crawling the web and making an open corpus of web data from the last 7 years available for research. There crawl corpus is petabytes of data and available as WARCs (Web Archives.) For example, their 2013 dataset is 102TB and has around 2 billion web pages. Their collection is not as complete as the Internet Archive, which goes back much further, but it is available in large datasets for research.


I’ve been playing with DataCamp‘s Python lessons and they are quite good. Python is taught in the context of data analysis rather than the turtle drawing of How to Think Like a Computer Scientist. They have a nice mix of video tutorials and then exercises where you get a tripartite screen (see above.) You have an explanation and instructions on the left, a short script to fill in on the upper-right and interactive python shell where you can try stuff below.

Continue reading DataCamp

The Real Threat of Artificial Intelligence – The New York Times

It’s not robot overlords. It’s economic inequality and a new global order.

Kai-Fu Lee has written a short and smart speculation on the effects of AI, The Real Threat of Artificial Intelligence . To summarize his argument:

  • AI is not going to take over the world the way the sci-fi stories have it.
  • The effect will be on tasks as AI takes over tasks that people are paid to do, putting them out of work.
  • How then will we deal with the unemployed? (This is a question people asked in the 1960s when the first wave computerization threatened massive unemployment.)
  • One solution is “Keynesian policies of increased government spending” paid for taxing the companies made wealthy by AI. This spending would pay for “service jobs of love” where people act as the “human interface” to all sorts of services.
  • Those in the jobs that can’t be automated and that make lots of money might also scale back on their time at work so as to provide more jobs of this sort.

Continue reading The Real Threat of Artificial Intelligence – The New York Times

Busa Letter Outlining Textual Informatics

Page 1 of “Conditional Agreement” by Father Busa

Domenico Fiormonte has recently blogged about an interesting document he has by Father Busa that relates to a difficult moment in the history of the digital humanities in Italy in 2002. The two page “Conditional Agreement”, which I translate below, was given to Domenico and explained the terms under which Busa would agree to sign a letter to the Minister (of Education and Research) Moratti in response to Moratti’s public statement about the uselessness of humanities informatics. A letter was being prepared to be signed by a large number of Italian (and foreign) academics explaining the value of what we now call the digital humanities. Busa had the connections to get the letter published and taken seriously for which reason Domenico visited him to get his help, which ended up being conditional on certain things being made clear, as laid out in the document. Domenico kept the two pages Busa wrote and recently blogged about them. As he points out in his blog, these two pages are a mini-manifesto of Father Busa’s later views of the place and importance of what he called textual informatics. Domenico also points out how political is the context of these notes and the letter eventually signed and published. Defining the digital humanities is often about positioning the field in the larger academic and public political spheres we operate in.

Continue reading Busa Letter Outlining Textual Informatics