Michael Sinatra invited me to a “show and tell” workshop at the new Université de Montréal campus where they have a long data wall. Sinatra is the Director of CRIHN (Centre de recherche interuniversitaire sur les humanitiés numériques) and kindly invited me to show what I am doing with Stéfan Sinclair and to see what others at CRIHN and in France are doing.
In early June I was at the Congress for the Humanities and Social Sciences. I took conference notes on the Canadian Society for Digital Humanities 2019 event and on the Canadian Game Studies Association conference, 2019. I was involved in a number of papers:
Exploring through Markup: Recovering COCOA. This paper looked at an experimental Voyant tool that allows one to use COCOA markup as a way of exploring a text in different ways. COCOA markup is a simple form of markup that was superseded by XML languages like those developed with the TEI. The paper recovered some of the history of markup and what we may have lost.
- Our team also had two posters, one on “Generative Ethics: Using AI to Generate” that showed a toy that generates statements about artificial intelligence and ethics. The other, “Discovering Digital Methods: An Exploration of Methodica for Humanists” showed what we are doing with Methodi.ca.
JSTOR, and some other publishers of electronic research, have started building text analysis tools into their publishing tools. I came across this at the end of a JSTOR article where there was a link to “Get more results on Text Analyzer” which leads to a beta of the JSTOR labs Text Analyzer environment.
This analyzer environment provides simple an analytical tools for surveying an issue of a journal or article. The emphasis is on extracting keywords and entities so that one can figure out if an article or journal is useful. One can use this to find other similar things.
What intrigues me is this embedding of tools into reading environments which is different from the standard separate data and tools model. I wonder how we could instrument Voyant so that it could be more easily embedded in other environments.
Computer programming once had much better gender balance than it does today. What went wrong?
The New York Times has a nice long article on The Secret History of Women in Coding – The New York Times. We know a lot of the story from books like Campbell-Kelly’s From Airline Reservations to Sonic the Hedgehog: a History of the Software Industry (2003), Chang’s Brotopia (2018), and Rankin’s A People’s History of Computing in the United States (2018).
The history is not the heroic story of personal computing that I was raised on. It is a story of how women were driven out of computing (both the academy and businesses) starting in the 1960s.
A group of us at the U of Alberta are working on archiving the work of Sally Sedelow, one of the forgotten pioneers of humanities computing. Dr. Sedelow got her PhD in English in 1960 and did important early work on text analysis systems.
Paolo showed me a neat demonstration of Word2Vec Vis of Pride and Prejudice. Lynn Cherny trained a Word2Vec model using Jane Austen’s novels and then used that to find close matches for key words. She then show the text of a novel with the words replaced by their match in the language of Austen. It serves as a sort of demonstration of how Word2Vec works.
Ted Underwood in a talk at the Novel Worlds conference talked about a fascinating project, Every Noise at Once. This project has tried to map the genres of music so you can explore these by clicking and listening. You should, in theory, be able to tell the difference between “german techno” and “diva house” by listening. (I’m not musically literate enough to.)
In this codebook we will investigate the macro-structure of philosophical literature. As a base for our investigation I have collected about fifty-thousand reco
Stéfan sent me a link to this interesting post, The structure of recent philosophy (II) · Visualizations. Maximilian Noichl has done a fascinating job using the Web of Science to develop a model of the field of Philosophy since the 1950s. In this post he describes his method and the resulting visualization of clusters (see above). In a later post (version III of the project) he gets a more nuanced visualization that seems more true to the breadth of what people do in philosophy. The version above is heavily weighted to anglo-american analytic philosophy while version III has more history of philosophy and continental philosophy.
Here is the final poster (PDF) for version III.
I can’t help wondering if his snowball approach doesn’t bias the results. What if one used full text of major journals?
Robin Sloan of Mr. Penumbra’s 24-hour Bookstore fame, has been talking about Writing with the machine. He was inspired by presentations like Adrej Karpathy’s blog post on The Unreasonable Effectiveness of Recurrent Neural Networks and Bowman et al’s Generating Sentences from a Continuous Space to try developing a neural net that could generate text. He used as a training corpus a collection of early science-fiction from the Internet Archive and created different text generation tools like the short video of that which you can see above and hear explained in this Eyeo video.
One of the points he emphasizes is that he didn’t do this just for the fun of seeing strange phrases generated, but wants to use it seriously as a writing aide.
I can’t help wondering if this could be used philosophically. Could we generate philosophical or ethical phrases in response to questions?
This year we had busy CSDH and CGSA meetings at Congress 2018 in Regina. My conference notes are here. Some of the papers I was involved in include:
- “Code Notebooks: New Tools for Digital Humanists” was presented by Kynan Ly and made the case for notebook-style programming in the digital humanities.
- “Absorbing DiRT: Tool Discovery in the Digital Age” was presented by Kaitlyn Grant. The paper made the case for tool discovery registries and explained the merger of DiRT and TAPoR.
- “Splendid Isolation: Big Data, Correspondence Analysis and Visualization in France” was presented by me. The paper talked about FRANTEXT and correspondence analysis in France in the 1970s and 1980s. I made the case that the French were doing big data and text mining long before we were in the Anglophone world.
- “TATR: Using Content Analysis to Study Twitter Data” was a poster presented by Kynan Ly, Robert Budac, Jason Bradshaw and Anthony Owino. It showed IPython notebooks for analyzing Twitter data.
- “Climate Change and Academia – Joint Panel with ESAC” was a panel I was on that focused on alternatives to flying for academics.
- “Archiving an Untold History” was presented by Greg Whistance-Smith. He talked about our project to archive John Szczepaniak’s collection of interviews with Japanese game designers.
- “Using Salience to Study Twitter Corpora” was presented by Robert Budac who talked about different algorithms for finding salient words in a Twitter corpus.
- “Political Mobilization in the GG Community” was presented by ZP who talked about a study of a Twitter corpus that looked at the politics of the community.
Also, a PhD student I’m supervising, Sonja Sapach, won the CSDH-SCHN (Canadian Society for Digital Humanities) Ian Lancashire Award for Graduate Student Promise at CSDHSCHN18 at Congress. The Award “recognizes an outstanding presentation at our annual conference of original research in DH by a graduate student.” She won the award for a paper on “Tagging my Tears and Fears: Text-Mining the Autoethnography.” She is completing an interdisciplinary PhD in Sociology and Digital Humanities. Bravo Sonja!
On May 25th I had a chance to attend a gem of a conference organized the Philosophy of Education (POE) committee at George Brown. They organized a conference with different modalities from conversations to formal talks to group work. The topic was Re-Imagining Education in An Automating World (see my conference notes here) and this conference is a seed for a larger one next year.
I gave a talk on Digital Citizenship at the end of the day where I tried to convince people that:
- Data analytics are now a matter of citizenship (we all need to understand how we are being manipulated).
- We therefore need to teach data literacy in the arts and humanities, so that
- Students are prepared to contribute to and critique the ways analytics are used deployed.
- This can be done by integrating data and analytical components in any course using field-appropriate data.