Spurious Correlations is a great web site that shows correlations that are spurious like this one between revenue generated by arcades and computer science doctorates. The gathered correlations show how correlation is not causation.
Thanks to Dan for this.
I just discovered that IBM to close Many Eyes. This is a pity. It was great environment that let people upload data and visualize it in different ways. I blogged about it ages ago (in computer ages anyway.) In particular I liked their Word Tree which seems one of the best ways to explore language use.
It seems that some of the programmers moved on and that IBM is now focusing on Watson Analytics.
Godwin’s Bot is a good essay from Misha Lepetic on 3QuarksDaily on artificial intelligence (AI). The essay reflects on the recent Microsoft debacle with @TayandYou, an AI chat bot that was “targeted at 18 to 24 year old in the US.” (About Tay & Privacy) For a New Yorker story on how Microsoft shut it down after Twitter trolls trained it to be offensive see I’ve Seen the Greatest A.I. Minds of My Generation Destroyed By Twitter. Lepetic calls her Godwin’s Bot after Godwin’s Law that asserts that in any online conversation there will eventually be a comparison to Hitler.
What is interesting about the essay is that it then moves to an interview wtih Stephen Wolfram on AI & The Future of Civilization where Wolfram distinguishes between inventing a goal, which is difficult to automate, and (once one can articulate a goal clearly) executing it, which can be automated.
How do we figure out goals for ourselves? How are goals defined? They tend to be defined for a given human by their own personal history, their cultural environment, the history of our civilization. Goals are something that are uniquely human.
Lepetic then asks if Tay had a goal or who had goals for Tay. Microsoft had a goal, and that had to do with “learning” from and about a demographic that uses social media. Lepetic sees it as a “vacuum cleaner for data.” In many ways the trolls did us a favor by misleading it.
Or … TayandYou was troll-bait to train a troll filter.
My question is whether anyone has done a good analysis of how the Tay campaign actually worked?
3quarksdaily, one of my favourite sites to read just posted a very nice essay by Sanjukta Paul on Where Probability Meets Literature and Language: Markov Models for Text Analysis. The essay starts with Markov, who in the 19th century was doing linguistic analysis by hand and goes to authorship attribution by people like Fiona Tweedie (the image above is from a study she co-authored). It also explains markov models on the way.
On the ethos of digital presence: I participated today in a panel launching the Italian version of Paolo Sordi’s book I Am: Remix Your Web Identity. (The Italian title is Bloggo Con WordPress Dunque Sono.) The panel included people like Domenico Fiormonte, Luisa Capelli, Daniela Guardamangna, Raul Mordenti, and, of course, Paolo Sordi.
Continue reading Paolo Sordi: I blog therefore I am
Emil Johansson, a student in Gothenburg, has created a fabulous site called the LOTRProject (or Lord Of The Rings Project. The site provides different types of visualizations about Tolkien’s world (Silmarillion, Hobbit, and LOTR) from maps to family trees to character mentions (see image above).
Continue reading LOTRProject: Visualizing the Lord of the Rings
Is it Pokemon or Big Data ? is a simple game where you are presented with a name and you have to guess if it is a big data company or a Pokemon creature. My thanks to Jane for this.
Lately I’ve been trying Wolfram Mathematica more an more for analytics. I was introduced to Mathematica by Bill Turkel and Ian Graham who have done some impressive stuff with it. Bill Turkel has now created a open access, open content, and open source textbook Digital Research Methods with Mathematica. The text is a Mathematica notebook itself so, if you have Mathematica you can actually use the text to do analytics on the spot.
Wolfram has also posted an interesting blog entry on Literary Analysis and the Wolfram Language: Jumping Down a Reading Rabbit Hole. They show how you can generate word clouds and sentiment analysis graphs easily.
While I am still learning Mathematica, some of the features that make it attractive include:
- It uses a “literate programming” model where you write notebooks meant to be read by humans with embedded code rather than writing code with awkward comments embedded.
- It has a lot of convenient Web, Language, and Visualization functions that let you do things we want to do in the digital humanities.
- You can call on Wolfram Alpha in a notebook to get real world knowledge like capital cities or maps or language information.
On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic.) Here are some general reflections:
- The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (or string.) We need to find ways to also study the novel as material object (thing), as a social object, as a performance (of the reader), and as an economic object in a market place. Then we also have to find ways to connect these.
- So many analytical and mining processes depend on bags of words from dictionaries to topics. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument.
- I was interested in the philosophical discussions around the epistemological in novels and philosophical claims about language and literature.