How Netflix Reverse Engineered Hollywood

Alexis C. Madrigal has a fine article in The Atlantic on How Netflix Reverse Engineered Hollywood (Jan. 2, 2014). The article moves from an interesting problem about Netflix’s micro-genres, to text analysis of the results of a scrape, to reverse engineering the Netflix algorithm, to creating a genre generator (at the top of the article), and then to an interview with the Netflix VP of Product who was responsible for the tagging system. It is a lovely example of thinking through something and using technology when needed. The text analysis isn’t the point; it is a tool for understanding the 76,897 micro-genres uncovered. (Think about it … Netflix has over 70,000 genres of movies and TV shows, some with no actual movies or shows as examples of the micro-genre.)

Madrigal goes on to talk about the procedure Netflix uses to create genres and use them in recommending shows. It turns out to be a combination of content analysis (actual humans watching a movie/show and ranking it in various ways) and automatic methods that combine tags. This combination of human and machine methods is also the process Madrigal describes for his own pursuit of Netflix genres. It is another sense of humanities computing – those procedures that involve both human and algorithmic interventions.
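The text-analysis step is, at its simplest, word counting over the scraped genre names. Here is a minimal sketch of that idea in Python. The sample genre names and the stopword list are made up for illustration; they are not rows from Madrigal's scrape, and this is not a reconstruction of his actual workflow.

```python
from collections import Counter

# Hypothetical micro-genre names, invented for this example.
SAMPLE_GENRES = [
    "Romantic Crime Movies from the 1970s",
    "Gritty Crime Dramas",
    "Romantic Dramas Based on Books",
    "Gritty Westerns from the 1970s",
]

# A small, assumed stopword list so function words do not dominate.
STOPWORDS = {"from", "the", "on", "of", "and", "for", "with"}


def word_counts(genres):
    """Count how often each non-stopword appears across genre names."""
    counts = Counter()
    for name in genres:
        for word in name.lower().split():
            if word not in STOPWORDS:
                counts[word] += 1
    return counts


counts = word_counts(SAMPLE_GENRES)
print(counts.most_common(5))
```

Run over a full list of genre names, counts like these would show which adjectives, periods and names recur most often, which is the kind of tally that makes an anomaly like Raymond Burr visible.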

The post ends with an anomaly that illustrates the hybridity of procedure. It turns out the most named actor is Raymond Burr of Perry Mason. Netflix has a larger number of altgenres with Raymond Burr than with anyone else. Why would he rank so high in micro-genres? Madrigal offers a theory to explain this, which Yellin, the Netflix VP, refutes, but Yellin can’t explain the anomaly either. As Madrigal points out, in Perry Mason shows the mystery is always resolved by the end, but in the case of the mystery of Raymond Burr in genre, there is no revealing bit of evidence that helps us understand how he rose in the ranks.

On the other hand, no one — not even Yellin — is quite sure why there are so many altgenres that feature Raymond Burr and Barbara Hale. It’s inexplicable with human logic. It’s just something that happened.

I tried on a bunch of different names for the Perry Mason thing: ghost, gremlin, not-quite-a-bug. What do you call the something-in-the-code-and-data which led to the existence of these microgenres?

The vexing, remarkable conclusion is that when companies combine human intelligence and machine intelligence, some things happen that we cannot understand.

“Let me get philosophical for a minute. In a human world, life is made interesting by serendipity,” Yellin told me. “The more complexity you add to a machine world, you’re adding serendipity that you couldn’t imagine. Perry Mason is going to happen. These ghosts in the machine are always going to be a by-product of the complexity. And sometimes we call it a bug and sometimes we call it a feature.”

Perhaps this serendipity is what is original in the hybrid procedures involving human practices and algorithms? For some these anomalies are the false positives that disrupt big data’s certainty, for others they are the other insight that emerges from the mixing of human and computer processes. As Madrigal concludes:

Perry Mason episodes were famous for the reveal, the pivotal moment in a trial when Mason would reveal the crucial piece of evidence that makes it all makes sense and wins the day.

Now, reality gets coded into data for the machines, and then decoded back into descriptions for humans. Along the way, humans ability to understand what’s happening gets thinned out. When we go looking for answers and causes, we rarely find that aha! evidence or have the Perry Mason moment. Because it all doesn’t actually make sense.

Netflix may have solved the mystery of what to watch next, but that generated its own smaller mysteries.

And sometimes we call that a bug and sometimes we call it a feature.

Wikileaks – The Spy files

On December 1st, 2011 Wikileaks began releasing The Spy files, a collection of documents from intelligence contractors. These documents include presentations, brochures, catalogs, manuals and so on. There are hundreds of companies selling tools to anyone (country/telecom) who wants to spy on email, messaging and phones. I find fascinating what they show about the types of tools available to monitor communications, especially the interfaces they have designed for operatives. Here are some slides from a presentation by Glimmerglass Networks (click to download entire PDF).


Rap Game Riff Raff Textual Analysis

Tyler Trkowski has written a Feature for NOISEY (Music by Vice) on Rap Game Riff Raff Textual Analysis. It is a neat example of text analysis outside the academy. He used Voyant and Many Eyes to analyze Riff Raff’s lyrical canon. (Riff Raff, or Horst Christian Simco, is an eccentric rapper.) What is neat is that they embedded a Voyant word cloud right into their essay along with Word Trees from Many Eyes. Riff Raff apparently “might” like “diamonds” and “versace”.

HedgeChatter – Social Media Stock Sentiment Analysis Dashboard

HedgeChatter – Social Media Stock Sentiment Analysis Dashboard is a site that analyzes social media chatter about stocks and then lets you see how a stock is doing. In the picture above you can see the dashboard for Apple (AAPL). Rolling over it you can see what people are saying over time – what the “Social Sentiment” is for the stock. I’m assuming that with an account one can keep a portfolio and perhaps get alerts when the sentiment drops.

To do this they must have some sort of text analysis running that gives them the sentiment.
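I don't know what HedgeChatter actually runs, but the simplest form of such text analysis is lexicon-based scoring: count the positive and negative words in each post. Here is a minimal sketch in Python. The word lists and the sample posts are invented for illustration, and a real system would surely be more sophisticated.

```python
import re

# Tiny, assumed sentiment lexicons; not taken from any real product.
POSITIVE = {"buy", "bullish", "great", "up", "strong"}
NEGATIVE = {"sell", "bearish", "weak", "down", "drop"}


def sentiment_score(post):
    """Return (positive word count) minus (negative word count)."""
    words = re.findall(r"[a-z']+", post.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return pos - neg


# Hypothetical posts about a stock.
posts = [
    "AAPL looks strong, time to buy",
    "Weak quarter, expect a drop",
    "Holding my shares for now",
]
scores = [sentiment_score(p) for p in posts]
print(scores)  # [2, -2, 0]
```

Averaging such scores per day would give a rough sentiment line of the sort a dashboard could plot over time.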

Building Inspector by NYPL Labs

The New York Public Library has another cool digital project called the Building Inspector. They are crowdsourcing the training and correction of a building recognition tool that is combing through old maps. You see a portion of a map with red dots outlining a building and you click “Yes” (if the outline is correct), “No” (if it is wrong), or “Fix” (if it is close but needs to be fixed).

They also have a neat subtitle to the project, “Kill Time. Make History.”
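Crowdsourced checks like the Yes/No/Fix clicks are typically combined by some form of agreement rule. Here is a minimal sketch in Python of one such rule, a majority vote with a threshold. The minimum vote count and the 75% threshold are my assumptions for illustration, not NYPL's documented settings.

```python
from collections import Counter


def consensus(votes, min_votes=3, threshold=0.75):
    """Return the winning label if enough voters agree, else None.

    min_votes and threshold are hypothetical parameters.
    """
    if len(votes) < min_votes:
        return None  # not enough people have looked at this outline yet
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= threshold:
        return label
    return None  # no clear agreement; keep showing the outline


print(consensus(["yes", "yes", "yes", "fix"]))  # yes
print(consensus(["yes", "no", "fix"]))          # None
```

Outlines that return None would simply stay in the queue until more people have voted on them.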

CBC.ca alberta@noon Monday June 10, 2013

Last week I was interviewed by Judy Aldous on the CBC programme alberta@noon Monday June 10, 2013. We took calls about social media. I was intrigued by the range of reactions from “I don’t need anything other than messaging” to “I use it all the time for my company.” One point I was trying to make is that we all have to now manage our social media presence. There are too many venues to be present in all of them and, as my colleague Julie Rak points out, we are now all celebrities in the sense that we have to worry about how we appear in media. That means we need to educate ourselves to some degree and experiment with developing a voice.

Around the World Symposium on Digital Culture

Tomorrow we are organizing an Around the World Symposium on Digital Culture. This symposium brings together scholars from different countries talking about digital culture for about 17-20 hours as it goes from place to place streaming their talks and discussions. The Symposium is being organized by the Kule Institute for Advanced Study here at the University of Alberta. Visit the site to see the speakers and to tune in.

Please join in using the Twitter hashtag #UofAworld

U. of Virginia Teams Up With ‘Crowdfunding’ Site

Mike linked me to a Chronicle Bottom Line blog story about how U. of Virginia Teams Up With ‘Crowdfunding’ Site to Finance Research. UVa is teaming up with USEED, a company that has built a “fundraising platform [that] taps the power of social networks and the voice of your students to engage alumni and win new donors…” USEED is unlike Kickstarter in that it creates a unique site for each university rather than forcing them to compete on the same site. It is closer to the FutureFunder.ca site for Carleton.

USEED is an example of a company that is experimenting with “social entrepreneurship”, a gray area between for-profit and not-for-profit work. The Chronicle also has a story on the ambiguities of social entrepreneurship. At times it seems like there are a lot of startups that are circling universities trying to figure out how to feed on our antiquated corpse.

Visualizing Collaboration

Ofer showed me an interactive visualization of the collaboration around a Wikipedia article. The visualization shows the edits (deletions/insertions) over time in different ways. It allows one to study distributed collaborations (or lack thereof) around things like a Wikipedia article. The ideas can be applied to visualizing any collaboration for which you have data (as often happens when the collaboration happens through digital tools that record activity).

His hypothesis is that theories about how site-specific teams collaborate don’t apply to distributed teams. Office teams have been studied, but there isn’t a lot of research on how voluntary and distributed teams work.
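The insertion and deletion counts that such a visualization plots can be derived by diffing successive revisions of a text. Here is a minimal sketch in Python using the standard library's difflib. The three revisions are invented, and this is not the method behind the visualization Ofer showed me.

```python
import difflib


def edit_summary(old, new):
    """Return (words inserted, words deleted) going from old to new."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    inserted = deleted = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted += i2 - i1   # words removed from the old revision
        if tag in ("replace", "insert"):
            inserted += j2 - j1  # words added in the new revision
    return inserted, deleted


# Hypothetical revision history of one sentence.
revisions = [
    "The cat sat",
    "The cat sat on the mat",
    "The dog sat on the mat",
]
history = [edit_summary(a, b) for a, b in zip(revisions, revisions[1:])]
print(history)  # [(3, 0), (1, 1)]
```

With an author and timestamp attached to each revision, pairs like these could be plotted over time to show who added and who removed text.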

Tropes vs. Women in Video Games

I’ve been meaning to write about sexism in games for a while, but today I came across a YouTube video essay More than a Damsel in a Dress: A Response by Commander Kite Tales. This is a response to Damsel in Distress: Part 1 – Tropes vs Women in Video Games by Anita Sarkeesian.

But first, a bit of history.

On May 17th, 2012 Anita Sarkeesian launched a Kickstarter campaign to improve the Feminist Frequency video web series of essays on problematic gender representations. The first of the new series, Damsel in Distress: Part 1 – Tropes vs Women in Video Games, came out recently, on March 7, 2013. It is well worth watching.

Alas the campaign and Sarkeesian were attacked systematically; see, for a brutal example, the Amateur game invites player to beat up woman. The obscene and hateful attacks have been documented by columnists like Helen Lewis in the New Statesman article, This is what online harassment looks like. What did Sarkeesian do? Lewis puts it succinctly,

She’s somebody with a big online presence through her website, YouTube channel and social media use. All of that has been targeted by people who – and I can’t say this enough – didn’t like her asking for money to make feminist videos.

So why did all these trolls attack Sarkeesian? 4Chan seems to have been one site where they organized, but what bothered them so much about her campaign? Sarkeesian’s interpretation is that they made a game of harassing her. As she puts it, “in their mind they concocted this grand fiction in which they are the heroic players in a massively multiplayer online game…” She goes on to describe how the players of this “gamified misogyny” were mostly grown men, they used discussion boards as their home base for coordination and bragging, the setting of the game was the whole internet, and the goal was to silence the evil Sarkeesian to save gaming for men. The trolls would go out, harass her, and come back to their boards to show off what they had done. It was a particularly nasty example of an internet flash crowd organizing to silence a woman. It was also an example of how the internet can amplify behaviour and provide a haven for misogynist communities.

Sarkeesian’s video essay wasn’t even an attack on men or games. It is clearly the work of someone who likes games but is critical of the repeated use of the “damsel in distress” plot device and other sexist crap. The video essay is, however, effective at challenging the uncritical consumption of cliched tropes in games using a medium commonly used in gamer culture (short video essays that show game play and comment on games).

Now, back to More than a Damsel in a Dress: A Response, which argues that Sarkeesian didn’t look at the evidence with an open mind and that the princesses in distress in both the Mario and Zelda series of games should be seen as brave individuals dealing bravely with distress who also represent the peace of their kingdoms. While I find Kite Tales’ argument somewhat sophistical and mostly answered already by Sarkeesian, we should probably welcome responses like those of Kite Tales that don’t attack the messenger, but try to respond to the argument in some fashion; and there are quite a few responses if you care to work through a lot of poor arguments. It would be nice to say that video essayists are modeling how a conversation on these issues should take place rather than hurling abuse, but the medium doesn’t really lend itself to conversation. Instead we have isolated video essays with lots of comments. Not exactly a dialogue, but better than abuse.

While I’m on this issue of damsels in distress like Princess Peach, Ars Technica has a story about how a Dad hacks Donkey Kong for his daughter; Pauline now saves Mario. Alas, it too got abusive comments, the worst of which have been compiled into YouTube Reacts to Donkey Kong: Pauline Edition. The compilation focuses on the sexist and homophobic comments. If you scroll through the comments now you will find that they are mostly supportive of the Dad. The good news seems to be that the sorts of comments Sarkeesian faced are being shamed down or being reflected back.

As for Anita Sarkeesian, her Kickstarter campaign raised much more than she asked for and she now has the funds and attention to do a whole series. I look forward to the next part on Damsel in Distress that promises to look at more contemporary games.