Text classification tool on the web


Michael pointed me to a story about how Stanford scientists put free text-analysis tool on the web. The tool allows you to pass a text (or a Twitter hashtag) to an existing classifier like the Twitter Sentiment classifier. It then gives you an interactive graph like the one above (which shows tweets about #INKEWhistler14 over time). You can upload your own datasets to analyze and also create your own classifiers. The system saves classifiers for others to try.

I’m impressed at how this tool lets people understand classification and sentiment analysis easily through Twitter classifications. The graph, however, takes a bit of reading – in fact, I’m not sure I understand it. When there are no tweets the bars go stable, and then when there is activity the negative bar seems to go both up and down.

How Netflix Reverse Engineered Hollywood

Alexis C. Madrigal has a fine article in The Atlantic on How Netflix Reverse Engineered Hollywood (Jan. 2, 2014). The article moves from an interesting problem about Netflix’s micro-genres, to text analysis of results of a scrape, to reverse engineering the Netflix algorithm, to creating a genre generator (at the top of the article) and then to an interview with the Netflix VP of Product who was responsible for the tagging system. It is a lovely example of thinking through something and using technology when needed. The text analysis isn’t the point; it is a tool to use in understanding the 76,897 micro-genres uncovered. (Think about it … Netflix has over 70,000 genres of movies and TV shows, some with no actual movies or shows as examples of the micro-genre.)

Madrigal goes on to talk about the procedure Netflix uses to create genres and use them in recommending shows. It turns out to be a combination of content analysis (actual humans watching a movie/show and ranking it in various ways) and automatic methods that combine tags. This combination of human and machine methods is also the process Madrigal describes for his own pursuit of Netflix genres. It is another sense of humanities computing – those procedures that involve both human and algorithmic interventions.
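
To make the idea of automatically combining tags concrete, here is a minimal sketch of a genre generator. It is not Netflix's system or Madrigal's generator; the tag vocabularies, the slot order (adjective, region, genre, subject) and the function name generate_genres are all my own assumptions for illustration.

```python
import itertools

# Hypothetical tag vocabularies (assumed for illustration, not Netflix's tags).
# An empty string means "leave this slot out of the genre name".
ADJECTIVES = ["", "Gritty", "Romantic"]
REGIONS = ["", "British", "Italian"]
GENRES = ["Dramas", "Comedies"]
SUBJECTS = ["", "About Royalty"]


def generate_genres(adjectives, regions, genres, subjects):
    """Build every genre name by combining one choice from each slot."""
    names = []
    for parts in itertools.product(adjectives, regions, genres, subjects):
        # Skip the empty slots and join the rest in slot order.
        names.append(" ".join(part for part in parts if part))
    return names


all_genres = generate_genres(ADJECTIVES, REGIONS, GENRES, SUBJECTS)
print(len(all_genres))
print(all_genres[:5])
```

Even these tiny lists multiply out quickly, which gives a feel for how a modest set of human-assigned tags could yield tens of thousands of micro-genres, including some that no actual movie or show fits.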

The post ends with an anomaly that illustrates the hybridity of procedure. It turns out the most named actor is Raymond Burr of Perry Mason. Netflix has a larger number of altgenres with Raymond Burr than with anyone else. Why would he rank so high in micro-genres? Madrigal offers a theory to explain this, which the VP, Yellin, refutes, but Yellin can’t explain the anomaly either. As Madrigal points out, in Perry Mason shows the mystery is always resolved by the end, but in the case of the mystery of Raymond Burr in genre, there is no revealing bit of evidence that helps us understand how he rose in the ranks.

On the other hand, no one — not even Yellin — is quite sure why there are so many altgenres that feature Raymond Burr and Barbara Hale. It’s inexplicable with human logic. It’s just something that happened.

I tried on a bunch of different names for the Perry Mason thing: ghost, gremlin, not-quite-a-bug. What do you call the something-in-the-code-and-data which led to the existence of these microgenres?

The vexing, remarkable conclusion is that when companies combine human intelligence and machine intelligence, some things happen that we cannot understand.

“Let me get philosophical for a minute. In a human world, life is made interesting by serendipity,” Yellin told me. “The more complexity you add to a machine world, you’re adding serendipity that you couldn’t imagine. Perry Mason is going to happen. These ghosts in the machine are always going to be a by-product of the complexity. And sometimes we call it a bug and sometimes we call it a feature.”

Perhaps this serendipity is what is original in the hybrid procedures involving human practices and algorithms? For some these anomalies are the false positives that disrupt big data’s certainty, for others they are the other insight that emerges from the mixing of human and computer processes. As Madrigal concludes:

Perry Mason episodes were famous for the reveal, the pivotal moment in a trial when Mason would reveal the crucial piece of evidence that makes it all makes sense and wins the day.

Now, reality gets coded into data for the machines, and then decoded back into descriptions for humans. Along the way, humans ability to understand what’s happening gets thinned out. When we go looking for answers and causes, we rarely find that aha! evidence or have the Perry Mason moment. Because it all doesn’t actually make sense.

Netflix may have solved the mystery of what to watch next, but that generated its own smaller mysteries.

And sometimes we call that a bug and sometimes we call it a feature.

Wikileaks – The Spy files

On December 1st, 2011 Wikileaks began releasing The Spy files, a collection of documents from intelligence contractors. These documents include presentations, brochures, catalogs, manuals and so on. There are hundreds of companies selling tools to anyone (country/telecom) who wants to spy on email, messaging and phones. I find fascinating what they show about the types of tools available to monitor communications, especially the interfaces they have designed for operatives. Here are some slides from a presentation by Glimmerglass Networks (click to download entire PDF).

A Short History of the Highrise

The New York Times and the National Film Board (of Canada) have collaborated on a great interactive, A Short History of the Highrise. The interactive plays as a documentary that you can stop at any point to explore details. The director, Katerina Cizek, talks on the About page about their inspiration:

I was inspired by the ways storybooks have been reinvented for digital tablets like the iPad. We used rhymes to zip through history, and animation and interactivity to playfully revisit a stunning photographic collection and reinterpret great feats of engineering.

For the NFB this is part of their larger Highrise many-media project.

Reshaping New York

The New York Times has a fabulous new interactive visualization called Reshaping New York that shows how Bloomberg has changed the city over 12 years. It shows new buildings, the rezoning, the introduction of bike lanes, and the celebration of the waterfront. The visualization is more of a tour that combines a 3D model of the city with images of before and after Bloomberg.

Tropes vs. Women in Video Games

I’ve been meaning to write about sexism in games for a while, but today I came across a YouTube video essay, More than a Damsel in a Dress: A Response by Commander Kite Tales. This is a response to Damsel in Distress: Part 1 – Tropes vs Women in Video Games by Anita Sarkeesian.

But first, a bit of history.

On May 17th, 2012 Anita Sarkeesian launched a Kickstarter campaign to improve the Feminist Frequency video web series of essays on problematic gender representations. The first of the new series came out recently, on March 7, 2013: Damsel in Distress: Part 1 – Tropes vs Women in Video Games. It is well worth watching.

Alas the campaign and Sarkeesian were attacked systematically; see, for a brutal example, the Amateur game invites player to beat up woman. The obscene and hateful attacks have been documented by columnists like Helen Lewis in the New Statesman article, This is what online harassment looks like. What did Sarkeesian do? Lewis puts it succinctly,

She’s somebody with a big online presence through her website, YouTube channel and social media use. All of that has been targeted by people who – and I can’t say this enough – didn’t like her asking for money to make feminist videos.

So why did all these trolls attack Sarkeesian? 4Chan seems to have been one site where they organized, but what bothered them so much about her campaign? Sarkeesian’s interpretation is that they made a game of harassing her. As she puts it, “in their mind they concocted this grand fiction in which they are the heroic players in a massively multiplayer online game…” She goes on to describe how the players of this “gamified misogyny” were mostly grown men who used discussion boards as their home base for coordination and bragging; the setting of the game was the whole internet, and the goal was to silence the evil Sarkeesian to save gaming for men. The trolls would go out, harass her, and come back to their boards to show off what they had done. It was a particularly nasty example of an internet flash crowd organizing to silence a woman. It was also an example of how the internet can amplify behaviour and provide a haven for misogynist communities.

Sarkeesian’s video essay wasn’t even an attack on men or games. It is clearly the work of someone who likes games but is critical of the repeated use of the “damsel in distress” plot device and other sexist crap. The video essay is, however, effective at challenging the uncritical consumption of cliched tropes in games using a medium commonly used in gamer culture (short video essays that show game play and comment on games).

Now, back to More than a Damsel in a Dress: A Response, which argues that Sarkeesian didn’t look at the evidence with an open mind and that the princesses in distress in both the Mario and Zelda series of games should be seen as brave individuals dealing bravely with distress who also represent the peace of their kingdoms. While I find Kite Tales’ argument somewhat sophistical and mostly answered already by Sarkeesian, we should probably welcome responses like those of Kite Tales that don’t attack the messenger, but try to respond to the argument in some fashion; and there are quite a few responses if you care to work through a lot of poor arguments. It would be nice to say that video essayists are modeling how a conversation on these issues should take place rather than hurling abuse, but the medium doesn’t really lend itself to conversation. Instead we have isolated video essays with lots of comments. Not exactly a dialogue, but better than abuse.

While I’m on this issue of damsels in distress like Princess Peach, Ars Technica has a story about how a Dad hacks Donkey Kong for his daughter; Pauline now saves Mario. Alas, it too got abusive comments, the worst of which have been compiled into YouTube Reacts to Donkey Kong: Pauline Edition. The compilation focuses on the sexist and homophobic comments. If you scroll through the comments now you will find that they are mostly supportive of the Dad. The good news seems to be that the sorts of comments Sarkeesian faced are being shamed down or being reflected back.

As for Anita Sarkeesian, her Kickstarter campaign raised much more than she asked for and she now has the funds and attention to do a whole series. I look forward to the next part on Damsel in Distress that promises to look at more contemporary games.

The Expression of Emotions in 20th Century Books

Emilie pointed me to an NPR story on mining mood in 20th century books, Mining Books To Map Emotions Through A Century. This story draws on a very readable article, The Expression of Emotions in 20th Century Books, in PLOS One. The article reports on a study of “mood” or sentiment over time in literature. They used the Google Ngram data. I like how they report first and then discuss methodology at the end.

They mention support from an interesting EU funded project TrendMiner. TrendMiner is developing real-time multi-lingual analysis tools.
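
For readers who want a feel for how mood can be measured over time from word counts, here is a minimal sketch. It is not the method from the PLOS One article; the mood word lists, the toy per-year counts and the function names are all assumptions I made up for illustration, and a real study would work from the Google Ngram data with much larger lexicons.

```python
from collections import Counter

# Toy mood lexicons (assumed for illustration, not the study's word lists).
JOY_WORDS = {"happy", "joy", "delight"}
SADNESS_WORDS = {"sad", "grief", "sorrow"}


def mood_score(word_counts, mood_words):
    """Return the share of all word occurrences that belong to mood_words."""
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    hits = sum(count for word, count in word_counts.items() if word in mood_words)
    return hits / total


def joy_minus_sadness(counts_by_year):
    """Map each year to its joy share minus its sadness share."""
    return {
        year: mood_score(counts, JOY_WORDS) - mood_score(counts, SADNESS_WORDS)
        for year, counts in sorted(counts_by_year.items())
    }


# Made-up word counts per year, standing in for real n-gram frequencies.
counts_by_year = {
    1920: Counter({"happy": 3, "sad": 1, "house": 6}),
    1940: Counter({"happy": 1, "grief": 3, "house": 6}),
}

for year, score in joy_minus_sadness(counts_by_year).items():
    print(year, round(score, 3))
```

With these invented counts the score is positive for 1920 and negative for 1940. Dividing by the total number of words in each year matters because the number of books, and so of words, differs a great deal from year to year.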

Future Hype: Near Futures

I gave a lecture at Kim Solez’s course on the future of medicine and he taped it and put it up on YouTube here:

Geoffrey Rockwell FutureHype LABMP 590 2013 March 7 – YouTube.

This talk came out of a conversation we had at a pub about Ray Kurzweil where I disagreed with Kim about Kurzweil’s predictions. Thinking about Kurzweil I realized how fundamental prediction is. We call it hope. It is easy to make fun of the futurists, but we need to recognize how we always look forward to the near future.

Visual Music

In Dublin I heard DAH student Maura McDonnell present on Visual Music (her blog), which is her PhD research area. Visual Music is one term among many for experiments in light and sound, and her blog is a nice collection of resources on this new media form.

From her blog I learned that there is also a Center for Visual Music that has documentation and an online store.

Maura’s own work can be seen online; see Silk Chroma. The image above is taken from the Vimeo video.