Thanks to Sarah I was led to a nice set of custom visualizations by Salahub and Oldford of Secretary Clinton’s email (Source: Wikileaks). The visualizations are discussed in a paper titled Interactive Filter and Display of Hillary Clinton’s Emails: A Cautionary Tale of Metadata. Here is how the article concludes.
Finally, this is a cautionary tale. The collection and storage of metadata from any individual in our society should be of concern to all of us. While it is possible to discern patterns from several sources, it is also far too easy to construct a false narrative, particularly one that fits an already held point of view. As analysts, we fall prey to our cognitive biases. Interactive filter and display of metadata from a large corpus of communications add another tool to an already powerful analytic arsenal. As with any other powerful tool it needs to be used with caution.
Their cautionary tale touches on the value of metadata. After the Snowden revelations, government officials like Dianne Feinstein tried to reassure us that mining metadata shouldn’t be a concern because it isn’t content. Research like this shows just how much can be inferred from metadata alone.
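The point can be made concrete with a few lines of code. Even a handful of metadata fields, with no message bodies at all, reveal a communication network. Here is a toy sketch in Python, using invented sender/recipient records rather than the actual corpus, of the kind of pattern-finding such research does at scale:

```python
from collections import Counter

# Invented email metadata: (sender, recipient, date) -- no content at all.
emails = [
    ("alice", "bob", "2011-03-01"),
    ("bob", "alice", "2011-03-01"),
    ("alice", "bob", "2011-03-02"),
    ("alice", "carol", "2011-03-02"),
]

# Count exchanges per unordered pair of correspondents: who talks to whom,
# and how often, falls straight out of the headers.
pairs = Counter(frozenset((sender, recipient)) for sender, recipient, _ in emails)
for pair, count in pairs.most_common():
    print(sorted(pair), count)
```

The alice–bob pair dominates: frequency of contact alone suggests who an analyst might look at first, or what false narrative they might build.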
On Twitter I came across this terrific syllabus: Calling Bullshit: Syllabus. The syllabus is learned, full of useful links, clear and funny. I wish I could write a syllabus like this. Here is how it describes the lectures:
Each of our lectures will explore one specific facet of bullshit. For each week, a set of required readings are assigned. For some weeks, supplementary readings are also provided for those who wish to delve deeper.
For example, here are some of the learning objectives:
- Recognize said bullshit whenever and wherever you encounter it.
- Figure out for yourself precisely why a particular bit of bullshit is bullshit.
What objective could be more important in the humanities?
Teaching machines to understand – and summarize – text is an article from The Conversation about the use of machine learning in text summarization. The example they give is how machines could summarize software licenses in ways that would make them more meaningful to us. While this seems a potentially useful application, I can’t help wondering why we don’t expect the licensors to summarize their licenses in ways that we can read. Or, barring that, why not make cartoon versions of the agreements like Terms and Conditions.
The issues raised by the use of computers in summarizing texts are many:
- What is proposed would only work in a constrained situation like licenses, where the machine can be trained to classify text using some sort of training set. It is unlikely to surprise you with poetry (not that it is meant to).
- The idea is introduced with the ultimate goal of reducing all the exabytes of data that we have to deal with. This is the “too much information” trope again. The proposed solution doesn’t really deal with the problems that have bedeviled us since we started complaining, because part of the problem is too much information of unknown types. That is not to say that machine learning doesn’t have a place, but it won’t solve the underlying problem (again).
- How would the licensors react if we had tools to digest the text we have to deal with? The licensors will have to think about the legal liability (or advantage) of presenting text we won’t read, but which will be summarized for us. They might choose to be opaque to analytics to force us to read for ourselves.
- Which raises the question of just what is the problem with too much information? Is it the expectation that we will consume it in some useful way? Is it that we have no time left for just thinking? Is it that we are constantly afraid that someone will have said something important already and we missed it?
- A wise colleague asked what it would take for something to change us. Are we open to change when we think of too-much-information as something to be handled? Could machine learning become another wall in the interpretative ghetto we build around us?
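To make the constraint in the first point concrete, here is a minimal sketch of extractive summarization in Python: sentences are scored by the frequency of the words they contain, and the top scorers are kept. This is an illustration of the general technique only, not the system described in the article:

```python
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))

    def score(sentence):
        # Average corpus frequency of the sentence's words.
        tokens = re.findall(r'[a-z]+', sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return [s for s in sentences if s in top]
```

Sentences full of the document’s most repeated vocabulary float to the top; anything lexically unusual (exactly what matters in poetry) sinks. That is why such methods suit formulaic text like licenses.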
Last week I was at the Congress of the Humanities and Social Sciences attending the Canadian Society for Digital Humanities 2017 conference. (See the program here.) It was a brilliant conference organized by the folk at Ryerson. I loved being back in downtown Toronto. The closing keynote by Tracy Fullerton on Finer Fruits: A game as participatory text was fascinating. You can see my conference notes here.
Stéfan Sinclair and I were awarded the Outstanding Achievement Award for our work on Voyant and Hermeneutica. I was also involved in some of the presentations:
- Todd Suomela presented a paper I contributed to on “GamerGate and Digital Humanities: Applying an Ethics of Care to Internet Research”
- I presented a paper with Stéfan Sinclair on “The Beginnings of Content Analysis: From the General Inquirer to Sally Sedelow”.
- Greg Whistance-Smith presented a demo/poster on “Methodi.ca: A Commons for Text Analysis Methods”
- Jinman Zhang presented a demo/poster on our work on “Commenting, Gamification and Analytics in an Online Writing Environment: GWrit (Game of Writing)”
From Humanist I learned about a project called Just Review: All Genders, All Genres, All Reviewed that “conducts research and develops tools and resources for combatting gender bias in book reviews.” For example, they have a Topic Bias Study of the New York Times. Their web site promises more.
On Thursday I presented a paper at Digital Narratives Around the World, a colloquium organized by Astrid Ensslin and Jérémie Pelletier-Gagnon. The colloquium brought together a nice mix of papers about digital story-telling. I kept my conference notes on Philosophi.ca.
The paper I gave discussed the surveillance software Palantir as a story-telling environment. Palantir is designed not to automate intelligence work, but to augment the analyst and provide them with a sandbox where they can try out stories about groups of people.
On Friday I delivered the opening keynote at Colloque ACFAS 2017, « La publication savante en contexte numérique », a conference organized by CRIHN. The keynote, “Hermeneutica: Le dialogue du texte et le jeu de l’interprétation,” presented work Stéfan Sinclair and I have been doing on how to integrate text and tools. The context of the talk was a previous colloquium organized by CRIHN:
After a first ACFAS colloquium of the Centre de Recherche Interuniversitaire sur les Humanités Numériques in 2014 (on the need to analyze the impact of the digital on the humanities), the objective of our 2017 colloquium is to rethink, from a theoretical and practical point of view, scholarly publishing in the digital age.
In the talk I demonstrated a new tool based on Eliza that we call Veliza. Veliza implements Weizenbaum’s Eliza algorithm but adds the ability to pull a random sentence from the text you are analyzing and send that to the machine. The beta version (not the standard one yet) I was using had two other features.
- First, it allows you to ask for things like “the occurrences of father” and it responds with a Voyant panel in the dialogue.
- Second, it allows you to edit the script that controls Veliza so you can make it respond to different keywords.
This talk was actually the first time we had shown either Veliza or Spiral. Both are still in beta, but will be coming soon to the Voyant distribution.
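For those who have not met Eliza, the core of Weizenbaum’s 1966 program is a script of keyword patterns paired with reassembly templates. Here is a minimal sketch in Python; the script below is made up for illustration and is not Veliza’s actual script, though the beta lets you edit the real one in much this way:

```python
import random
import re

# A tiny, made-up Eliza-style script: each keyword pattern maps to
# reassembly templates; captured groups are echoed back into the reply.
script = {
    r'\bI am ([^.?!]*)': ["Why do you say you are {0}?",
                          "How long have you been {0}?"],
    r'\bfather\b':       ["Tell me more about your father.",
                          "Why do you mention your father?"],
}
default_replies = ["Please go on.", "How does that make you feel?"]

def respond(sentence):
    # Try each keyword pattern in turn; fall back to a stock reply.
    for pattern, templates in script.items():
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:
            return random.choice(templates).format(*match.groups())
    return random.choice(default_replies)
```

Feeding the responder a random sentence from the text under analysis, as Veliza does, turns any corpus into one side of the conversation.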
Thanks to Humanist I came across this project that offers bwFLA: Emulation as a Service. This will become increasingly important in the digital humanities and game studies as more and more content-rich projects become unreadable on contemporary machines. Just think of the CD-ROM. How many of us still have a CD drive on our computer? I think I have a USB CD drive somewhere … not sure where it is though. Emulation projects like this and MAME are becoming more and more important to preservation and history.
Also, take a look at their Use Cases.
Compute Canada just published a story about Voyant with the title, High-powered computing: It’s not just for astrophysics anymore.
Researchers in the humanities and social sciences are using digital infrastructure to help advance their research as well, and a Canadian-made tool called Voyant is allowing those who work with texts to do it with ease.
The story points out that Voyant may have more unique users than any other tool on Compute Canada, which is gratifying to read. This doesn’t mean more research is supported by Voyant, or more important research; such comparisons are not really useful. What is more important is that the way humanists use infrastructure is different and is being recognized. Humanists typically aren’t doing “big science.” They don’t need thousands of processors and batch interfaces. They want a more interactive and “always on” type of service. Compute Canada has listened and has been supporting our style/pace of infrastructure. Bravo!
Every year the University of Alberta Libraries organizes a Research Data Management Week to bring faculty, staff, students, and community data specialists together around data management. I was part of a panel session today on the subject. One of the issues we discussed was how to deal with a likely requirement from funding agencies like SSHRC that Research Data Management Plans be submitted with grants. Some thoughts on this:
- Such a requirement will build on the Principles on Digital Data Management
- Researchers will initially need help understanding what a DMP (Data Management Plan) is. The Portage Network DMP Assistant can help, but many will need an introduction to the issues.
- Research universities and libraries will need to develop strategies for supporting projects to meet their new obligations. We will need the infrastructure to match.
- There will be push back from some scholarly associations. Others, like CSDH-SCHN, will welcome this, as we have policies that support the idea.
- There is a cost to properly curating, documenting, and depositing research data. This cost typically comes at the end of projects, when the funds are already spent. We will need to do a better job of budgeting for data management/deposit.
- We need to develop small grants and services for projects to help them go the last mile in curating and depositing their content. At the Kule Institute we developed CRAfT grants in partnership with the UofA Libraries. These grants are meant for prototyping digital archives. Now we need to think about a program to help with the final archiving.