JSTOR, and some other publishers of electronic research, have started building text analysis tools into their publishing platforms. I came across this at the end of a JSTOR article where there was a link to “Get more results on Text Analyzer” which leads to a beta of the JSTOR Labs Text Analyzer environment.
This analyzer environment provides simple analytical tools for surveying an issue of a journal or an article. The emphasis is on extracting keywords and entities so that one can figure out whether an article or journal is useful. One can also use it to find other similar things.
What intrigues me is this embedding of tools into reading environments, which is different from the standard model of separate data and tools. I wonder how we could instrument Voyant so that it could be more easily embedded in other environments.
The history is not the heroic story of personal computing that I was raised on. It is a story of how women were driven out of computing (both the academy and businesses) starting in the 1960s.
A group of us at the U of Alberta are working on archiving the work of Sally Sedelow, one of the forgotten pioneers of humanities computing. Dr. Sedelow got her PhD in English in 1960 and did important early work on text analysis systems.
Paolo showed me a neat demonstration of Word2Vec Vis of Pride and Prejudice. Lynn Cherny trained a Word2Vec model using Jane Austen’s novels and then used that to find close matches for key words. She then shows the text of a novel with the words replaced by their match in the language of Austen. It serves as a sort of demonstration of how Word2Vec works.
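To make the idea of replacing words with their closest match a bit more concrete, here is a minimal sketch in Python. It is not Cherny's code and I don't know the details of her pipeline. The vectors in `toy_vectors` are made-up numbers standing in for a trained Word2Vec model, and the names `austen_vocab`, `closest_match` and `translate` are hypothetical. The sketch only shows the substitution step: each word is swapped for the candidate whose vector has the highest cosine similarity to it.

```python
import math

# Hypothetical toy vectors; a real Word2Vec model would learn these from a corpus.
toy_vectors = {
    "happy": [0.9, 0.1, 0.0],
    "felicity": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.3],
    "carriage": [0.1, 0.8, 0.4],
    "walk": [0.2, 0.1, 0.9],
}

# Assumed set of words that count as "the language of Austen" for this sketch.
austen_vocab = ["felicity", "carriage", "walk"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def closest_match(word, vectors, candidates):
    """Return the candidate most similar to word; unknown words pass through."""
    if word not in vectors:
        return word
    return max(candidates, key=lambda c: cosine(vectors[word], vectors[c]))


def translate(sentence, vectors, candidates):
    """Replace each whitespace-separated word with its closest candidate."""
    return " ".join(closest_match(w, vectors, candidates) for w in sentence.split())


print(translate("the happy car", toy_vectors, austen_vocab))
```

With real embeddings the candidate list would be the model's whole vocabulary and the lookup would be done by the library, but the logic of picking the nearest neighbour is the same.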
Ted Underwood in a talk at the Novel Worlds conference talked about a fascinating project, Every Noise at Once. This project has tried to map the genres of music so you can explore these by clicking and listening. You should, in theory, be able to tell the difference between “german techno” and “diva house” by listening. (I’m not musically literate enough to.)
In this codebook we will investigate the macro-structure of philosophical literature. As a base for our investigation I have collected about fifty-thousand reco
Stéfan sent me a link to this interesting post, The structure of recent philosophy (II) · Visualizations. Maximilian Noichl has done a fascinating job using the Web of Science to develop a model of the field of Philosophy since the 1950s. In this post he describes his method and the resulting visualization of clusters (see above). In a later post (version III of the project) he gets a more nuanced visualization that seems more true to the breadth of what people do in philosophy. The version above is heavily weighted to Anglo-American analytic philosophy while version III has more history of philosophy and continental philosophy.
“Code Notebooks: New Tools for Digital Humanists” was presented by Kynan Ly and made the case for notebook-style programming in the digital humanities.
“Absorbing DiRT: Tool Discovery in the Digital Age” was presented by Kaitlyn Grant. The paper made the case for tool discovery registries and explained the merger of DiRT and TAPoR.
“Splendid Isolation: Big Data, Correspondence Analysis and Visualization in France” was presented by me. The paper talked about FRANTEXT and correspondence analysis in France in the 1970s and 1980s. I made the case that the French were doing big data and text mining long before we were in the Anglophone world.
“TATR: Using Content Analysis to Study Twitter Data” was a poster presented by Kynan Ly, Robert Budac, Jason Bradshaw and Anthony Owino. It showed IPython notebooks for analyzing Twitter data.
“Climate Change and Academia – Joint Panel with ESAC” was a panel I was on that focused on alternatives to flying for academics.
“Archiving an Untold History” was presented by Greg Whistance-Smith. He talked about our project to archive John Szczepaniak’s collection of interviews with Japanese game designers.
“Using Salience to Study Twitter Corpora” was presented by Robert Budac who talked about different algorithms for finding salient words in a Twitter corpus.
“Political Mobilization in the GG Community” was presented by ZP who talked about a study of a Twitter corpus that looked at the politics of the community.
Also, a PhD student I’m supervising, Sonja Sapach, won the CSDH-SCHN (Canadian Society for Digital Humanities) Ian Lancashire Award for Graduate Student Promise at CSDHSCHN18 at Congress. The Award “recognizes an outstanding presentation at our annual conference of original research in DH by a graduate student.” She won the award for a paper on “Tagging my Tears and Fears: Text-Mining the Autoethnography.” She is completing an interdisciplinary PhD in Sociology and Digital Humanities. Bravo Sonja!
A paper that Stéfan Sinclair and I wrote about Peter Luhn and the Keyword-in-Context (KWIC) has just been published by the Fudan Journal of the Humanities and Social Sciences, Too Much Information and the KWIC | SpringerLink. The paper is part of a series that replicates important innovations in text technology, in this case, the development of the KWIC by Peter Luhn at IBM. We use that as a moment to reflect on the datafication of knowledge after WW II, drawing on Lyotard.
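For readers who haven't met a KWIC, here is a minimal concordance-style sketch in Python. It is not a reconstruction of Luhn's system, nor the replication described in the paper. The function name `kwic` and the `width` parameter are my own choices for illustration. It shows the basic idea of lining up every occurrence of a keyword with a window of context on either side.

```python
import re


def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword (whole word, case-insensitive),
    with up to `width` characters of context on each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


sample = ("It is a truth universally acknowledged, that a single man in "
          "possession of a good fortune, must be in want of a wife.")

for line in kwic(sample, "a", width=20):
    print(line)
```

The context here is measured in characters rather than words, which keeps the columns aligned in plain text. A word-based window would be an equally reasonable choice.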
Google has announced some cool text projects. See Google AI experiment has you talking to books. One of them, Talk to Books, lets you ask questions or type statements and get answers that are passages from books. This strikes me as a useful research tool as it allows you to see some (book) references that might be useful for defining an issue. The project is somewhat similar to the Veliza tool that we built into Voyant. Veliza is given a particular text and then uses an Eliza-like algorithm to answer you with passages from the text. Needless to say, Talk to Books is far more sophisticated and is not based simply on word searches. Veliza, on the other hand, can be reprogrammed and you can specify the text to converse with.