Google has announced some cool text projects; see "Google AI experiment has you talking to books." One of them, Talk to Books, lets you ask questions or type statements and get back answers in the form of passages from books. This strikes me as a useful research tool, as it lets you see some (book) references that might be useful for defining an issue. The project is somewhat similar to the Veliza tool that we built into Voyant. Veliza is given a particular text and then uses an Eliza-like algorithm to answer you with passages from that text. Needless to say, Talk to Books is far more sophisticated and is not based simply on word searches. Veliza, on the other hand, can be reprogrammed, and you can specify the text to converse with.
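To make the contrast concrete, here is a minimal sketch of the word-overlap idea behind a Veliza-like responder. This is not Voyant's actual implementation; the function name, the stop-word list, and the sample text are all invented for illustration. It simply returns the passage that shares the most content words with your question, which is exactly the kind of simple word search that Talk to Books goes beyond.

```python
import re

def veliza_like_reply(question: str, text: str) -> str:
    """Answer with the sentence from `text` that shares the most
    content words with the question -- a crude word-overlap match."""
    stop = {"the", "a", "an", "of", "to", "is", "in", "and",
            "what", "how", "should", "do", "we", "you", "about"}
    q_words = set(re.findall(r"[a-z']+", question.lower())) - stop
    # split the text into sentence-like passages
    passages = re.split(r"(?<=[.!?])\s+", text.strip())
    # score each passage by overlap with the question's content words
    def score(p: str) -> int:
        return len(q_words & set(re.findall(r"[a-z']+", p.lower())))
    return max(passages, key=score)

text = ("Distant reading treats large corpora quantitatively. "
        "Visualization is a form of interpretation. "
        "Fossil databases changed palaeontology.")
print(veliza_like_reply("How should we think about visualization?", text))
# -> Visualization is a form of interpretation.
```

A system like Talk to Books instead matches the *meaning* of the question against passages, so it can answer even when no words are shared.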
In a nutshell, instead of letting Facebook get away with charging us for its services or continuing to exploit our data for advertising, we must find a way to get companies like Facebook to pay for accessing our data – conceptualised, for the most part, as something we own in common, not as something we own as individuals.
Evgeny Morozov has a great essay in The Guardian on how After the Facebook scandal it’s time to base the digital economy on public v private ownership of data. He argues that better data protection is not enough. We need “to articulate a truly decentralised, emancipatory politics, whereby the institutions of the state (from the national to the municipal level) will be deployed to recognise, create, and foster the creation of social rights to data.” In Alberta that may start with a centralized clinical information system called Connect Care managed by the Province. The Province will presumably control access to our data, granting it to those researchers and health-care practitioners who commit to using it appropriately. Can we imagine a model where Connect Care is expanded to include social data that we can then control and give others (businesses) access to?
The question I want to explore today is this: what do we do about distant reading, now that we know that Franco Moretti, the man who coined the phrase “distant reading,” and who remains its most famous exemplar, is among the men named as a result of the #MeToo movement?
Lauren Klein has posted an important blog entry on Distant Reading after Moretti. The essay is based on a talk delivered at the 2018 MLA convention for a panel on Varieties of Digital Humanities. Klein asks about distant reading and whether it shelters sexual harassment in some way. She asks us to put not just the persons, but the structures of distant reading and the digital humanities under investigation. She suggests that it is “not a coincidence that distant reading does not deal well with gender, or with sexuality, or with race.” One might go further and ask if the same isn’t true of the digital humanities in general, or of the humanities, for that matter. Klein then suggests some things we can do about it:
- We need more accessible corpora that better represent the varieties of human experience.
- We need to question our models and ask about what is assumed or hidden.
David Sepkoski has published a nice essay in Aeon about What a fossil revolution reveals about the history of ‘big data’. Sepkoski talks about his father (Jack Sepkoski), a paleontologist, who developed the first database to provide a comprehensive record of fossils. This data was used to interpret the fossil record differently. The essay argues that it changed how we “see” data and showed that there had been mass extinctions before (and that we might be in one now).
The analysis that he and his colleagues performed revealed new understandings of phenomena such as diversification and extinction, and changed the way that palaeontologists work.
Sepkoski (father) and colleagues
The essay then makes the interesting move of arguing that, in fact, Jack Sepkoski was not the first to do quantitative palaeontology. The son, a historian, argues that Heinrich Georg Bronn in the 19th century was collecting similar data on paper and visualizing it (see spindle diagram above), but his approach didn’t take hold.
This raises the question of why Sepkoski senior’s data-driven approach changed palaeontology while Bronn’s didn’t. Sepkoski junior’s answer is a combination of changes. First, that palaeontology became more receptive to ideas like Stephen Jay Gould’s “punctuated equilibrium” that challenged Darwin’s gradualist view. Second, that the culture has become more open to data-driven approaches and to the visualizations needed to interpret them.
The essay concludes by warning us about the dangers of believing data black boxes and visualizations that you can’t unpack.
Yet in our own time, it’s taken for granted that the best way of understanding large, complex phenomena often involves ‘crunching’ the numbers via computers, and projecting the results as visual summaries.
That’s not a bad thing, but it poses some challenges. In many scientific fields, from genetics to economics to palaeobiology, a kind of implicit trust is placed in the images and the algorithms that produce them. Often viewers have almost no idea how they were constructed.
This leads me to ask about the warning as gesture. It is a gesture we see more and more, especially around the ethics of big data and artificial intelligence. Every thoughtful person, myself included, has warned people about the dangers of these apparently new technologies. But what good are these warnings?
Johanna Drucker in Graphesis proposes what to my mind is a much healthier approach to the dangers and opportunities of visualization. She does what humanists do: she asks us to think of visualization as interpretation. If you think of it this way, then it is no more or less dangerous than any other interpretation, and we have the tools to think through interpretations. She shows us how to look at the genealogy of different types of visualization. She shows us how all visualizations are interpretations and therefore need to be read. She frees us to be interpretative with our visualizations. If they are made by the visualizer and are not given by the data as if by Moses coming down the mountain, then they are an art that we can play with and through. This is what the 3DH project is about.
3quarksdaily, one of the better web sites for extracts of interesting essays, pointed me to this essay on Are Algorithms Building the New Infrastructure of Racism? in Nautilus by Aaron M. Bornstein (Dec. 21, 2017). The article reviews some of the terrain covered by Cathy O’Neil’s book Weapons of Math Destruction, but it also points out how AIs are becoming infrastructure, and infrastructure with bias baked in is very hard to change, like the low bridges that Robert Moses built to keep public transit out of certain areas of NYC. Algorithmic decisions that are biased and visible can be studied and corrected. Decisions that get built into infrastructure disappear and become much harder to fix.
a fundamental question in algorithmic fairness is the degree to which algorithms can be made to understand the social and historical context of the data they use …
Just as important is paying attention to the data that is used to train the AIs in the first place. Historical data carries the biases of past generations, and it needs to be questioned as it gets woven into our infrastructure.
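The mechanism is easy to see in miniature. Here is a toy sketch, with entirely invented numbers, of how a model "trained" on biased historical decisions reproduces the bias even when qualifications are identical:

```python
# Past human decisions: group B applicants with the same qualifications
# were mostly denied. (All data invented for illustration.)
history = [
    # (group, qualified, approved)
    ("A", True, True), ("A", True, True), ("A", True, True),
    ("B", True, False), ("B", True, False), ("B", True, True),
]

def learned_rule(group: str, qualified: bool) -> bool:
    """'Train' by majority vote over similar past cases --
    a stand-in for fitting a model to historical labels."""
    similar = [ok for g, q, ok in history if g == group and q == qualified]
    return sum(similar) > len(similar) / 2

print(learned_rule("A", True))  # True  -- approved
print(learned_rule("B", True))  # False -- same qualifications, denied
```

The rule looks neutral (it never mentions the groups by name in its logic), yet it faithfully automates the disparity in its training data. Once such a rule is embedded in infrastructure, the original biased decisions are no longer visible to question.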
It’s not robot overlords. It’s economic inequality and a new global order.
Kai-Fu Lee has written a short and smart speculation on the effects of AI, The Real Threat of Artificial Intelligence. To summarize his argument:
- AI is not going to take over the world the way the sci-fi stories have it.
- The effect will be on tasks as AI takes over tasks that people are paid to do, putting them out of work.
- How then will we deal with the unemployed? (This is a question people asked in the 1960s, when the first wave of computerization threatened massive unemployment.)
- One solution is “Keynesian policies of increased government spending” paid for by taxing the companies made wealthy by AI. This spending would pay for “service jobs of love” where people act as the “human interface” to all sorts of services.
- Those in the jobs that can’t be automated and that make lots of money might also scale back on their time at work so as to provide more jobs of this sort.