Thanks to Sarah I was led to a nice custom set of visualizations by Salahub and Oldford of Secretary Clinton’s Email (Source: Wikileaks). The visualizations are discussed in a paper titled Interactive Filter and Display of Hillary Clinton’s Emails: A Cautionary Tale of Metadata. Here is how the article concludes.
Finally, this is a cautionary tale. The collection and storage of metadata from any individual in our society should be of concern to all of us. While it is possible to discern patterns from several sources, it is also far too easy to construct a false narrative, particularly one that fits an already held point of view. As analysts, we fall prey to our cognitive biases. Interactive filter and display of metadata from a large corpus of communications add another tool to an already powerful analytic arsenal. As with any other powerful tool it needs to be used with caution.
Their cautionary tale touches on the value of metadata. After the Snowden revelations, government officials like Dianne Feinstein tried to reassure us that mining metadata shouldn’t be a concern because it isn’t content. Research like this shows how much can be inferred from metadata alone.
I’ve been playing with DataCamp’s Python lessons and they are quite good. Python is taught in the context of data analysis rather than the turtle drawing of How to Think Like a Computer Scientist. They have a nice mix of video tutorials and exercises where you get a tripartite screen (see above). You have an explanation and instructions on the left, a short script to fill in on the upper right, and below that an interactive Python shell where you can try stuff.
The Internet Archive’s new software emulator will take you back to 1984.
From Twitter again (channelled from Justin Trudeau) is a story in the Atlantic about the Internet Archive’s early Macintosh emulator, What It’s Like to Use an Original Macintosh in 2017. The emulator comes with a curated set of apps and games, including Dark Castle, which I remember my mother liking. (I was more fond of Déjà Vu.) Here is what MacPaint 2.0 looked like back then.
Each of our lectures will explore one specific facet of bullshit. For each week, a set of required readings are assigned. For some weeks, supplementary readings are also provided for those who wish to delve deeper.
On Twitter I came across this terrific syllabus: Calling Bullshit: Syllabus. The syllabus is learned, full of useful links, clear and funny. I wish I could write a syllabus like this. For example, here are some of the learning objectives:
- Recognize said bullshit whenever and wherever you encounter it.
- Figure out for yourself precisely why a particular bit of bullshit is bullshit.
What could be more important an objective in the humanities?
It’s not robot overlords. It’s economic inequality and a new global order.
Kai-Fu Lee has written a short and smart speculation on the effects of AI, The Real Threat of Artificial Intelligence. To summarize his argument:
- AI is not going to take over the world the way the sci-fi stories have it.
- The real effect will be on work: AI will take over tasks that people are now paid to do, putting them out of work.
- How then will we deal with the unemployed? (This is a question people asked in the 1960s when the first wave of computerization threatened massive unemployment.)
- One solution is “Keynesian policies of increased government spending” paid for by taxing the companies made wealthy by AI. This spending would pay for “service jobs of love” where people act as the “human interface” to all sorts of services.
- Those in the jobs that can’t be automated and that make lots of money might also scale back on their time at work so as to provide more jobs of this sort.
Teaching machines to understand – and summarize – text is an article from The Conversation about the use of machine learning in text summarization. The example they give is how machines could summarize software licenses in ways that would make them more meaningful to us. While this seems a potentially useful application, I can’t help wondering why we don’t expect the licensors to summarize their licenses in ways that we can read. Or, barring that, why not make cartoon versions of the agreements like Terms and Conditions.
The issues raised by the use of computers in summarizing texts are many:
- What is proposed would only work in a constrained situation like licenses where the machine can be trained to classify text following some sort of training set. It is unlikely to surprise you with poetry (not that it is meant to.)
- The idea is introduced with the ultimate goal of reducing all the exabytes of data that we have to deal with. This is the “too much information” trope again. The proposed solution doesn’t really deal with the problems that have beguiled us ever since we started complaining, because part of the problem is too much information of unknown types. That is not to say that machine learning doesn’t have a place, but it won’t solve the underlying problem (again.)
- How would the licensors react if we had tools to digest the text we have to deal with? The licensors will have to think about the legal liability (or advantage) of presenting text we won’t read, but which will be summarized for us. They might choose to be opaque to analytics to force us to read for ourselves.
- Which raises the question of just what is the problem with too much information? Is it the expectation that we will consume it in some useful way? Is it that we have no time left for just thinking? Is it that we are constantly afraid that someone will have said something important already and we missed it?
- A wise colleague asked what it would take for something to change us. Are we open to change when we think of too-much-information as something to be handled? Could machine learning become another wall in the interpretative ghetto we build around us?
Last week I was at the Congress of the Humanities and Social Sciences attending the Canadian Society for Digital Humanities 2017 conference. (See the program here.) It was a brilliant conference organized by the folk at Ryerson. I loved being back in downtown Toronto. The closing keynote by Tracy Fullerton on Finer Fruits: A game as participatory text was fascinating. You can see my conference notes here.
Stéfan Sinclair and I were awarded the Outstanding Achievement Award for our work on Voyant and Hermeneutica. I was also involved in some of the presentations:
- Todd Suomela presented a paper I contributed to on “GamerGate and Digital Humanities: Applying an Ethics of Care to Internet Research”
- I presented a paper with Stéfan Sinclair on “The Beginnings of Content Analysis: From the General Inquirer to Sally Sedelow”.
- Greg Whistance-Smith presented a demo/poster on “Methodi.ca: A Commons for Text Analysis Methods”
- Jinman Zhang presented a demo/poster on our work on “Commenting, Gamification and Analytics in an Online Writing Environment: GWrit (Game of Writing)”
From Humanist I learned about a project called Just Review: All Genders, All Genres, All Reviewed that “conducts reserach and develops tools and resources for combatting gender bias in book reviews.” For example, they have a Topic Bias Study of the New York Times. Their web site promises more.
On Thursday I presented a paper at Digital Narratives Around the World, a colloquium organized by Astrid Ensslin and Jérémie Pelletier-Gagnon. The colloquium brought together a nice mix of papers about digital story-telling. I kept my conference notes on Philosophi.ca.
The paper I gave discussed the surveillance software Palantir as a story-telling environment. Palantir is designed not to automate intelligence work, but to augment the analyst and provide them a sandbox where they can try stories about groups of people.
On Friday I delivered the opening keynote at a conference, Colloque ACFAS 2017 « La publication savante en contexte numérique », organized by CRIHN. The keynote was on “Hermeneutica: Le dialogue du texte et le jeu de l’interprétation,” presenting work Stéfan Sinclair and I have been doing on how to integrate text and tools. The context of the talk was a previous colloquium organized by CRIHN:
After a first colloquium at ACFAS held by the Centre de Recherche Interuniversitaire sur les Humanités Numériques in 2014 (on the need to analyze the impact of the digital on the humanities), the aim of our 2017 colloquium is to rethink scholarly publishing in the digital age from both a theoretical and a practical point of view.
In the talk I demonstrated a new tool based on Eliza that we call Veliza. Veliza implements Weizenbaum’s Eliza algorithm but adds the ability to pull a random sentence from the text you are analyzing and send that to the machine. The beta version (not the standard one yet) I was using had two other features.
- First, it allows you to ask for things like “the occurrences of father” and it responds with a Voyant panel in the dialogue.
- Second, it allows you to edit the script that controls Veliza so you can ask it to respond to different keywords.
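For readers who have not seen an Eliza-style program, here is a minimal sketch in Python of the general idea: keyword rules that produce canned replies, plus a helper that pulls a random sentence from a text and feeds it to the rules. This is not Veliza's code and not Weizenbaum's full algorithm. The names `RULES`, `respond` and `random_sentence`, the patterns, and the sample text are all invented for illustration.

```python
import random
import re

# Hypothetical keyword rules (pattern, reply template). The real Veliza
# script is not shown in this post; these are illustrative only.
RULES = [
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\b(mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(message, rules=RULES):
    """Return the reply for the first rule whose pattern matches the message."""
    for pattern, template in rules:
        match = pattern.search(message)
        if match:
            # Drop trailing punctuation so the reply reads cleanly.
            return template.format(match.group(1).rstrip(".!?"))
    return DEFAULT_REPLY


def random_sentence(text, rng=random):
    """Pick one sentence at random from the text under analysis."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return rng.choice(sentences)


if __name__ == "__main__":
    # Made-up sample text standing in for the corpus being analyzed.
    corpus = "My father was a sailor. I feel lost without the sea."
    line = random_sentence(corpus)
    print("Text: ", line)
    print("Reply:", respond(line))
```

Passing a different `rules` list to `respond` is a rough analogue of editing the script so that the program reacts to different keywords.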
This talk was actually the first time we have shown either Veliza or Spiral. Both are still in beta, but will be coming soon to the Voyant distribution.