Common Crawl

The Common Crawl is a project that has been crawling the web and making an open corpus of web data from the last 7 years available for research. Their crawl corpus is petabytes of data and is available as WARC (Web ARChive) files. For example, their 2013 dataset is 102TB and has around 2 billion web pages. Their collection is not as complete as the Internet Archive's, which goes back much further, but it is available in large datasets for research.
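To give a sense of what working with WARC data involves, here is a minimal sketch in Python that walks the records of an uncompressed WARC stream using only the standard library. It assumes the basic record layout (a `WARC/` version line, header lines, a blank line, then `Content-Length` bytes of content). The function name `iter_warc_records` and the sample record are made up for illustration; real crawl files are usually gzip-compressed and would need to be decompressed first, and a dedicated WARC library would be a better choice for serious work.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        # Skip the blank lines that separate one record from the next.
        while line in (b"\r\n", b"\n"):
            line = stream.readline()
        if not line:
            return  # end of stream
        if not line.startswith(b"WARC/"):
            raise ValueError("expected a WARC version line, got %r" % line)

        # Read "Name: value" header lines until the blank line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The content block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# A hypothetical in-memory record, repeated twice, standing in for a real file.
body = b"<html>hello</html>"
sample = b"".join([
    b"WARC/1.0\r\n",
    b"WARC-Type: resource\r\n",
    b"WARC-Target-URI: http://example.com/\r\n",
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n",
    b"\r\n",
    body,
    b"\r\n\r\n",
])

records = list(iter_warc_records(io.BytesIO(sample * 2)))
for headers, payload in records:
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```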

BuzzFeed on Breitbart courting the alt-right

Screen of emails from Dan Lyons

BuzzFeed News has an article, Here’s How Breitbart and Milo Smuggled Nazi and White Nationalist Ideas Into The Mainstream. The article is based on a cache of internal Breitbart emails and mostly deals with what Milo Yiannopoulos was up to.

From this motley chorus of suburban parents, journalists, tech leaders, and conservative intellectuals, Yiannopoulos’s function within Breitbart and his value to Bannon becomes clear. He was a powerful magnet, able to attract the cultural resentment of an enormously diverse coalition and process it into an urgent narrative about the way liberals imperiled America. It was no wonder Bannon wanted to groom Yiannopoulos for media infamy: The bigger the magnet got, the more ammunition it attracted.

Part of the story also deals with some “liberal” journalists, like Dan Lyons, who apparently were emailing Milo. It just gets more and more sordid.

Many of those who wrote Milo seem to be disgruntled people who feel oppressed by the “political correctness” of their situation, whether in a tech company or entertainment business. They email Milo to vent or pass tips or just get sympathy.

Alice and Bob: the World’s Most Famous Cryptocouple

Alice and Bob is a web site and paper by Quinn DuPont and Alana Cattapan that nicely tells the history of the famous virtual couple used to explain cryptology.

While Alice, Bob, and their extended family were originally used to explain how public key cryptography works, they have since become widely used across other science and engineering domains. Their influence continues to grow outside of academia as well: Alice and Bob are now a part of geek lore, and subject to narratives and visual depictions that combine pedagogy with in-jokes, often reflecting of the sexist and heteronormative environments in which they were born and continue to be used. More than just the world’s most famous cryptographic couple, Alice and Bob have become an archetype of digital exchange, and a lens through which to view broader digital culture.

The web site provides a timeline going back to 1978. The history is then explained more fully in the full paper (PDF). They end by talking about the gendered history of cryptography. They mention other examples where images of women serve as standard test images like the image of Lena from Playboy.

The design of the site nicely shows how a paper can be remediated as an interactive web site. It isn’t that fancy, but you can navigate the timeline and follow links to get a sense of this “couple”.

Vault7 – Wikileaks releases CIA documents

Wikileaks has just released the first part of what purports to be a large collection of CIA documents describing the agency's hacking tools. See Vault7, as they call the whole leak. Numerous news organizations like the New York Times are reporting on this and saying they think the documents might be genuine “on first review”.


Why We Need to Talk About Indigenous Literature in the Digital Humanities

Screenshot from 1991 BBC Horizon documentary

I’ve just come across some important blog essays by David Gaertner. One is Why We Need to Talk About Indigenous Literature in the Digital Humanities, where he argues that colleagues from Indigenous literature are rightly skeptical of the digital humanities because DH hasn’t really taken to heart the concerns of Indigenous communities around the expropriation of data.


FBI Game: What is Violent Extremism?


From Slashdot, a story about an online FBI game/interactive called Countering Violent Extremism | What is Violent Extremism? The subtitle is “Don’t Be A Puppet” and the game is part of a collection of interactive materials that try to teach about extremism in general and encourage some critical distance from it. The game has you play a sheep avoiding pitfalls.


Geofeedia ‘allowed police to track protesters’

From the BBC, a story: US start-up Geofeedia ‘allowed police to track protesters’. Geofeedia is apparently using social media data from Twitter, Facebook and Instagram to monitor activists and protesters for law enforcement. Its access to these social media was changed once the ACLU reported on the surveillance product. The ACLU discovered the agreements with Geofeedia when they requested public records from California law enforcement agencies. Geofeedia was boasting to law enforcement about their access. The ACLU has released some of the documents of interest, including a PDF of a Geofeedia Product Update email discussing “sentiment” analytics (May 18, 2016).

From the Geofeedia web site I was surprised to see that they are offering solutions for education too.

Marking 70 years of eavesdropping in Canada

Bill Robinson has penned a nice essay, Marking 70 years of eavesdropping in Canada. The essay gives the background of Canada’s signals intelligence unit, the Communications Security Establishment (CSE), which just marked its 70th anniversary (on Sept. 1st).

The original unit was the peacetime version of the Joint Discrimination Unit called the CBNRC (Communications Branch of the National Research Council). I can’t help wondering what was meant by “discrimination”?

Unable to read the Soviets’ most secret messages, the UKUSA allies resorted to plain-language (unencrypted) communications and traffic analysis, the study of the external features of messages such as sender, recipient, length, date and time of transmission—what today we call metadata. By compiling, sifting, and fusing a myriad of apparently unimportant facts from the huge volume of low-level Soviet civilian and military communications, it was possible to learn a great deal about the USSR’s armed forces, the Soviet economy, and other developments behind the Iron Curtain without breaking Soviet codes. Plain language and traffic analysis remained key sources of intelligence on the Soviet Bloc for much of the Cold War.

Robinson is particularly interesting on “The birth of metadata collection”, which came about as the Soviets frustrated the allies by developing encryption that couldn’t be broken.
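The idea of traffic analysis described in the quote, learning from external features such as sender, recipient and length without reading any content, can be illustrated with a small Python sketch. Everything here is hypothetical: the log entries, the field names, and the helper functions `busiest_links` and `volume_by_sender` are invented for illustration and are not drawn from Robinson's essay.

```python
from collections import Counter

# Hypothetical metadata-only log: no message contents, just external features.
messages = [
    {"sender": "unit-a", "recipient": "hq", "length": 120},
    {"sender": "unit-a", "recipient": "hq", "length": 95},
    {"sender": "unit-b", "recipient": "hq", "length": 40},
    {"sender": "unit-a", "recipient": "depot", "length": 300},
    {"sender": "unit-a", "recipient": "hq", "length": 110},
]


def busiest_links(log, n=3):
    """Return the n most frequent (sender, recipient) pairs with their counts."""
    counts = Counter((m["sender"], m["recipient"]) for m in log)
    return counts.most_common(n)


def volume_by_sender(log):
    """Return the total message length sent by each sender."""
    totals = Counter()
    for m in log:
        totals[m["sender"]] += m["length"]
    return dict(totals)


print(busiest_links(messages, 1))
print(volume_by_sender(messages))
```

Even this toy tally shows who talks to whom most often and who sends the most traffic, which is the kind of inference the essay attributes to compiling many apparently unimportant facts.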

Robinson is also the author of one of the best blogs on Canadian signals intelligence activities, Lux Ex Umbra. He posts long, thoughtful discussions like this one on Does CSE comply with the law?

They know (on surveillance)

They know is a must-see design project by Christian Gross from the Interface Design Programme at the University of Applied Sciences in Potsdam (FHP), Germany. The idea behind the project, described in the They Know showcase for FHP, is:

I could see in my daily work how difficult it was to inform people about their privacy issues. Nobody seemed to care. My hypothesis was that the whole subject was too complex. There were no examples, no images that could help the audience to understand the process behind the mass surveillance.

The answer is to mock up a design fiction of an NSA surveillance dashboard based on what we know and then a video describing a fictional use of it to track an architecture student from Berlin. It seems to me the video and mock designs nicely bring together a number of things we can infer about the tools they have.

Speak Up & Stay Safe(r): A Guide to Protecting Yourself From Online Harassment

Feminist Frequency has posted an excellent Speak Up & Stay Safe(r): A Guide to Protecting Yourself From Online Harassment. This is a clearly written and thorough discussion of how to protect yourself better from the sorts of harassment Anita Sarkeesian has documented in blog entries like Harassment Through Impersonation: The Creation of a Cyber Mob.

As the title suggests the guide doesn’t guarantee complete protection – all you can do is get better at it. The guide is also clear that it is not for protection against government surveillance. For those worried about government harassment they provide links to other resources like the Workbook on Security.

In her blog entry announcing the guide, Anita Sarkeesian explains the need for this guide and the costs of harassment thus:

Speak Up & Stay Safe(r): A Guide to Protecting Yourself From Online Harassment was made necessary by the failure of social media services to adequately prevent and deal with the hateful targeting of their more marginalized users. As this guide details, forcing individual victims or potential targets to shoulder the costs of digital security amounts to a disproportionate tax of in time, money, and emotional labor. It is a tax that is levied disproportionately against women, people of color, queer and trans people and other oppressed groups for daring to express an opinion in public.

How did we get to this point? What happened to the dreams of internet democracy and open discourse? What does it say about our society that such harassment has become commonplace? What can we do about it?