Text Analysis – Page 4

JSTOR Text Analyzer

JSTOR, and some other publishers of electronic research, have started building text analysis tools into their publishing tools. I came across this at the end of a JSTOR article where there was a link to “Get more results on Text Analyzer” which leads to a beta of the JSTOR labs Text Analyzer environment.

This analyzer environment provides simple an analytical tools for surveying an issue of a journal or article. The emphasis is on extracting keywords and entities so that one can figure out if an article or journal is useful. One can use this to find other similar things.

What intrigues me is this embedding of tools into reading environments which is different from the standard separate data and tools model. I wonder how we could instrument Voyant so that it could be more easily embedded in other environments.

The Secret History of Women in Coding

Computer programming once had much better gender balance than it does today. What went wrong?

The New York Times has a nice long article on The Secret History of Women in Coding – The New York Times. We know a lot of the story from books like Campbell-Kelly’s From Airline Reservations to Sonic the Hedgehog: a History of the Software Industry (2003), Chang’s Brotopia (2018), and Rankin’s A People’s History of Computing in the United States (2018).

The history is not the heroic story of personal computing that I was raised on. It is a story of how women were driven out of computing (both the academy and businesses) starting in the 1960s.

A group of us at the U of Alberta are working on archiving the work of Sally Sedelow, one of the forgotten pioneers of humanities computing. Dr. Sedelow got her PhD in English in 1960 and did important early work on text analysis systems.

Word2Vec Vis of Pride and Prejudice

Paolo showed me a neat demonstration of Word2Vec Vis of Pride and Prejudice. Lynn Cherny trained a Word2Vec model using Jane Austen’s novels and then used that to find close matches for key words. She then show the text of a novel with the words replaced by their match in the language of Austen. It serves as a sort of demonstration of how Word2Vec works.

Every Noise at Once

Ted Underwood in a talk at the Novel Worlds conference talked about a fascinating project, Every Noise at Once. This project has tried to map the genres of music so you can explore these by clicking and listening. You should, in theory, be able to tell the difference between “german techno” and “diva house” by listening. (I’m not musically literate enough to.)

The structure of recent philosophy (II) · Visualizations

In this codebook we will investigate the macro-structure of philosophical literature. As a base for our investigation I have collected about fifty-thousand reco

Stéfan sent me a link to this interesting post, The structure of recent philosophy (II) · Visualizations. Maximilian Noichl has done a fascinating job using the Web of Science to develop a model of the field of Philosophy since the 1950s. In this post he describes his method and the resulting visualization of clusters (see above). In a later post (version III of the project) he gets a more nuanced visualization that seems more true to the breadth of what people do in philosophy. The version above is heavily weighted to anglo-american analytic philosophy while version III has more history of philosophy and continental philosophy.

Here is the final poster (PDF) for version III.

I can’t help wondering if his snowball approach doesn’t bias the results. What if one used full text of major journals?

Writing with the machine

“…it’s like writing with a deranged but very well-read parrot on your shoulder.”

Robin Sloan of Mr. Penumbra’s 24-hour Bookstore fame, has been talking about Writing with the machine. He was inspired by presentations like Adrej Karpathy’s blog post on The Unreasonable Effectiveness of Recurrent Neural Networks and Bowman et al’s Generating Sentences from a Continuous Space to try developing a neural net that could generate text. He used as a training corpus a collection of early science-fiction from the Internet Archive and created different text generation tools like the short video of that which you can see above and hear explained in this Eyeo video.

One of the points he emphasizes is that he didn’t do this just for the fun of seeing strange phrases generated, but wants to use it seriously as a writing aide.

I can’t help wondering if this could be used philosophically. Could we generate philosophical or ethical phrases in response to questions?

CSDH and CGSA 2018

This year we had busy CSDH and CGSA meetings at Congress 2018 in Regina. My conference notes are here. Some of the papers I was involved in include:

CSDH-SCHN:

“Code Notebooks: New Tools for Digital Humanists” was presented by Kynan Ly and made the case for notebook-style programming in the digital humanities.
“Absorbing DiRT: Tool Discovery in the Digital Age” was presented by Kaitlyn Grant. The paper made the case for tool discovery registries and explained the merger of DiRT and TAPoR.
“Splendid Isolation: Big Data, Correspondence Analysis and Visualization in France” was presented by me. The paper talked about FRANTEXT and correspondence analysis in France in the 1970s and 1980s. I made the case that the French were doing big data and text mining long before we were in the Anglophone world.
“TATR: Using Content Analysis to Study Twitter Data” was a poster presented by Kynan Ly, Robert Budac, Jason Bradshaw and Anthony Owino. It showed IPython notebooks for analyzing Twitter data.
“Climate Change and Academia – Joint Panel with ESAC” was a panel I was on that focused on alternatives to flying for academics.

CGSA:

“Archiving an Untold History” was presented by Greg Whistance-Smith. He talked about our project to archive John Szczepaniak’s collection of interviews with Japanese game designers.
“Using Salience to Study Twitter Corpora” was presented by Robert Budac who talked about different algorithms for finding salient words in a Twitter corpus.
“Political Mobilization in the GG Community” was presented by ZP who talked about a study of a Twitter corpus that looked at the politics of the community.

Also, a PhD student I’m supervising, Sonja Sapach, won the CSDH-SCHN (Canadian Society for Digital Humanities) Ian Lancashire Award for Graduate Student Promise at CSDHSCHN18 at Congress. The Award “recognizes an outstanding presentation at our annual conference of original research in DH by a graduate student.” She won the award for a paper on “Tagging my Tears and Fears: Text-Mining the Autoethnography.” She is completing an interdisciplinary PhD in Sociology and Digital Humanities. Bravo Sonja!

Re-Imagining Education In An Automating World conference at George Brown

On May 25th I had a chance to attend a gem of a conference organized the Philosophy of Education (POE) committee at George Brown. They organized a conference with different modalities from conversations to formal talks to group work. The topic was Re-Imagining Education in An Automating World (see my conference notes here) and this conference is a seed for a larger one next year.

I gave a talk on Digital Citizenship at the end of the day where I tried to convince people that:

Data analytics are now a matter of citizenship (we all need to understand how we are being manipulated).
We therefore need to teach data literacy in the arts and humanities, so that
Students are prepared to contribute to and critique the ways analytics are used deployed.
This can be done by integrating data and analytical components in any course using field-appropriate data.

Too Much Information and the KWIC

A paper that Stéfan Sinclair and wrote about Peter Luhn and the Keyword-in-Context (KWIC) has just been published by the Fudan Journal of the Humanities and Social Sciences, Too Much Information and the KWIC | SpringerLink. The paper is part of a series that replicates important innovations in text technology, in this case, the development of the KWIC by Peter Luhn at IBM. We use that as a moment to reflect on the datafication of knowledge after WW II, drawing on Lyotard.

Google AI experiment has you talking to books

Google has announced some cool text projects. See Google AI experiment has you talking to books. One of them, Talk to Books, lets you ask questions or type statements and get answers that are passages from books. This strikes me as a useful research tool as it allows you to see some (book) references that might be useful for defining an issue. The project is somewhat similar to the Veliza tool that we built into Voyant. Veliza is given a particular text and then uses an Eliza-like algorithm to answer you with passages from the text. Needless to say, Talking to Books is far more sophisticated and is not based simply on word searches. Veliza, on the other hand can be reprogrammed and you can specify the text to converse with.

Continue reading Google AI experiment has you talking to books