The Digital Public Library of America (DPLA) has a fascinating collection of Primary Source Sets that bring together materials around a subject for teaching and historical thinking. For example, they have a set on Commodore Perry’s Expedition to Japan that allows you to see both American and Japanese representations of Perry and his important visit. These sets show how a digital archive can be repurposed in different ways.
Last week I presented a paper based on work that Stéfan Sinclair and I are doing at the University of South Florida. The talk, titled “Cooking Up Literature: Theorizing Statistical Approaches to Texts,” looked at a neglected period of French innovation in the 1970s and 1980s. During this period the French were developing a national corpus, FRANTEXT, while a school of exploratory statistics was also developing around Jean-Paul Benzécri. While Anglophone humanities computing was concerned with hypertext, the French were looking at using statistical methods like correspondence analysis to explore large corpora. This was long before Moretti and “distant reading.”
The talk was organized by Steven Jones, who holds the DeBartolo Chair in Liberal Arts and is a Professor of Digital Humanities. Jones leads an NEH-funded project called RECALL that Stéfan and I are consulting on. He and colleagues at USF are creating a 3D model of Father Busa’s original factory/laboratory.
David Sepkoski has published a nice essay in Aeon about What a fossil revolution reveals about the history of ‘big data’. Sepkoski talks about his father (Jack Sepkoski), a paleontologist, who developed the first database to provide a comprehensive record of fossils. This data was used to interpret the fossil record differently. The essay argues that it changed how we “see” data and showed that there had been mass extinctions before (and that we might be in one now).
The analysis that he and his colleagues performed revealed new understandings of phenomena such as diversification and extinction, and changed the way that palaeontologists work.
Sepkoski (father) and colleagues
The essay then makes the interesting move of arguing that, in fact, Jack Sepkoski was not the first to do quantitative palaeontology. The son, a historian, argues that Heinrich Georg Bronn in the 19th century was collecting similar data on paper and visualizing it (see spindle diagram above), but his approach didn’t take.
This raises the question of why Sepkoski senior’s data-driven approach changed palaeontology while Bronn’s didn’t. Sepkoski junior’s answer is a combination of changes. First, palaeontology became more receptive to ideas like Stephen Jay Gould’s “punctuated equilibrium” that challenged Darwin’s gradualist view. Second, culture has become more open to data-driven approaches and to the interpretation of visualizations needed to grasp such approaches.
The essay concludes by warning us about the dangers of believing data black boxes and visualizations that you can’t unpack.
Yet in our own time, it’s taken for granted that the best way of understanding large, complex phenomena often involves ‘crunching’ the numbers via computers, and projecting the results as visual summaries.
That’s not a bad thing, but it poses some challenges. In many scientific fields, from genetics to economics to palaeobiology, a kind of implicit trust is placed in the images and the algorithms that produce them. Often viewers have almost no idea how they were constructed.
This leads me to ask about the warning as gesture. This is a gesture we see more and more, especially about the ethics of big data and about artificial intelligence. Every thoughtful person, myself included, has warned people about the dangers of these apparently new technologies. But what good are these warnings?
Johanna Drucker in Graphesis proposes what to my mind is a much healthier approach to the dangers and opportunities of visualization. She does what humanists do: she asks us to think of visualization as interpretation. If you think of it this way, then it is no more or less dangerous than any other interpretation. And we have the tools to think-through visualization. She shows us how to look at the genealogy of different types of visualization. She shows us how all visualizations are interpretations and therefore need to be read. She frees us to be interpretative with our visualizations. If they are made by the visualizer and are not given by the data as by Moses coming down the mountain, then they are an art that we can play with and through. This is what the 3DH project is about.
Last week I presented a keynote at the Digital Cultures, Big Data and Society conference. (You can see my conference notes at Digital Cultures Big Data And Society.) The talk I gave was titled “Thinking-Through Big Data in the Humanities,” and in it I argued that the humanities have the history, skills and responsibility to engage with the topic of big data:
- First, I outlined how the humanities have a history of dealing with big data. As we all know, ideas have histories, and we in the humanities know how to learn from the genesis of these ideas.
- Second, I illustrated how we can contribute by learning to read the new genres of documents and tools that characterize big data discourse.
- And lastly, I turned to the ethics of big data research, especially as it concerns us as we are tempted by the treasures at hand.
The problem isn’t that poor children don’t have access to computers. It’s that they spend too much time in front of them.
The New York Times has an important Opinion about America’s Real Digital Divide by Naomi S. Riley from Feb. 11, 2018. She argues that TV and video game screen time is bad for children and there is no evidence that computer screen time is helpful. The digital divide is not one of access to screens but one of attitude and education on screen time.
But no one is telling poorer parents about the dangers of screen time. For instance, according to a 2012 Pew survey, just 39 percent of parents with incomes of less than $30,000 a year say they are “very concerned” about this issue, compared with about six in 10 parents in higher-earning households.
On Humanist there was an announcement from the Hagley Museum and Library that they had put up a 1969 Sperry-UNIVAC short film, An Introduction to Digital Computers. The 22-minute short is a dated but interesting introduction to how a digital computer works. The short was sponsored by Sperry-UNIVAC, which had its origins in the Eckert-Mauchly Computer Corporation founded by Eckert and Mauchly of ENIAC fame.
The museum is in Delaware at the site of the E.I. du Pont gunpowder works from 1802. The Hagley library is dedicated to American enterprise and has archival material from Sperry-UNIVAC:
Hagley’s library furthers the study of business and technology in America. The collections include individuals’ papers and companies’ records ranging from eighteenth-century merchants to modern telecommunications and illustrate the impact of the business system on society.
[N]etworks themselves offer ways in which bad actors – and not only the Russian government – can undermine democracy by disseminating fake news and extreme views. “These social platforms are all invented by very liberal people on the west and east coasts,” said Brad Parscale, Mr. Trump’s digital-media director, in an interview last year. “And we figure out how to use it to push conservative values. I don’t think they thought that would ever happen.” Too right.
The Globe and Mail this weekend had an essay by Niall Ferguson on how Social networks are creating a global crisis of democracy. The article is based on Ferguson’s new book The Square and the Tower: Networks and Power from the Freemasons to Facebook. The article points out that manipulation is not just an American problem, and argues that the real problem is our dependence on social networks in the first place.
Having just finished teaching a course on Big Data and Text Analysis where I taught students Python, I can appreciate a well-written tutorial on Python. Python Programming for the Humanities by Folgert Karsdorp is a great tutorial for humanists new to programming that takes the form of a series of Jupyter notebooks that students can download. As the tutorials are notebooks, students who have set up Python on their computers can use them interactively. Karsdorp has done a nice job of weaving in cells where the student has to code and quizzes that reinforce the material, which strikes me as an excellent use of the IPython notebook model.
I learned about this while reading a more advanced set of tutorials from Allen Riddell for Dariah-DE, Text Analysis with Topic Models for the Humanities and Social Sciences. The title doesn’t do this collection of tutorials justice because they include a lot more than just topic models. There are advanced tutorials on all sorts of topics like machine learning and classification. See the index for the range of tutorials.
Text Analysis with Topic Models for the Humanities and Social Sciences (TAToM) consists of a series of tutorials covering basic procedures in quantitative text analysis. The tutorials cover the preparation of a text corpus for analysis and the exploration of a collection of texts using topic models and machine learning.
Stéfan Sinclair and I (mostly Stéfan) have also produced a textbook for teaching programming to humanists called The Art of Literary Text Analysis. These tutorials are also written as Jupyter notebooks so you can download them and play with them.
We are now reimplementing them with our own Voyant-based notebook environment called Spyral. See The Art of Literary Text Analysis with Spyral Notebooks. More on this in another blog entry.