Bill Robinson has penned a nice essay Marking 70 years of eavesdropping in Canada. The essay gives the background of Canada’s signals intelligence unit, the Communications Security Establishment (CSE) which just marked its 70th anniversary (on Sept. 1st.)
The original unit was the peacetime version of the Joint Discrimination Unit called the CBNRC (Communications Branch of the National Research Council). I can’t help wondering what was meant by “discrimination”?
Unable to read the Soviets’ most secret messages, the UKUSA allies resorted to plain-language (unencrypted) communications and traffic analysis, the study of the external features of messages such as sender, recipient, length, date and time of transmission—what today we call metadata. By compiling, sifting, and fusing a myriad of apparently unimportant facts from the huge volume of low-level Soviet civilian and military communications, it was possible to learn a great deal about the USSR’s armed forces, the Soviet economy, and other developments behind the Iron Curtain without breaking Soviet codes. Plain language and traffic analysis remained key sources of intelligence on the Soviet Bloc for much of the Cold War.
Robinson is particularly interesting on “The birth of metadata collection” as the Soviets frustrated developed encryption that couldn’t be broken.
Robinson is also the author of one of the best blogs on Canadian Signals Intelligence activities Lux Ex Umbra. He posts long thoughtful discussions like this one on Does CSE comply with the law?
At the European Summer University in Digital Humanities 2016 I was luck to be able to attend some sessions on Stylometry run by Maciej Eder. In his historical review he mentioned people like Valla and Mendenhall, but also mentioned a fellow Pole, Wincenty Lutoslawksi whose book The origin and growth of Plato’s logic; with an account of Plato’s style and of the chronology of his writings (1897) is the first to use the term “stylometry”. Lutoslawski develops a Theory of Stylometry and reviewed “500 peculiarities of Plato’s style” as part of his work on Plato’s logic. The nice thing is that the book is available through the Internet Archive.
Eder has a nice page about the work he and ogthers in the Computational Stylistics Group are doing. In the workshop sessions I was able to attend he showed us how to set up and run his “stylo” package (PDF) that provides a simple user interface over R for doing stylometry. He also showed us how to then use Gephi for network visualization.
Information is Beautiful has a great interactive on World’s Biggest Data Breaches & Hacks. The interactive shows how data breaches are getting worse, but it also lets you look at different types of breaches.
ProPublica has a great op-ed about Making Algorithms Accountable. The story starts from a decision from the Wisconsin Supreme Court on computer-generated risk (of recidivism) scores. The scores used in Wisconsin come from Northpointe who provide the scores as a service based on a proprietary alogorithm that seems biased against blacks and not that accurate. The story highlights the lack of any legislation regarding algorithms that can affect our lives.
Update: ProPublica has responded to a Northpointe critique of their findings.
Spurious Correlations is a great web site that shows correlations that are spurious like this one between revenue generated by arcades and computer science doctorates. The gathered correlations show how correlation is not causation.
Thanks to Dan for this.
I just discovered that IBM to close Many Eyes. This is a pity. It was great environment that let people upload data and visualize it in different ways. I blogged about it ages ago (in computer ages anyway.) In particular I liked their Word Tree which seems one of the best ways to explore language use.
It seems that some of the programmers moved on and that IBM is now focusing on Watson Analytics.
Godwin’s Bot is a good essay from Misha Lepetic on 3QuarksDaily on artificial intelligence (AI). The essay reflects on the recent Microsoft debacle with @TayandYou, an AI chat bot that was “targeted at 18 to 24 year old in the US.” (About Tay & Privacy) For a New Yorker story on how Microsoft shut it down after Twitter trolls trained it to be offensive see I’ve Seen the Greatest A.I. Minds of My Generation Destroyed By Twitter. Lepetic calls her Godwin’s Bot after Godwin’s Law that asserts that in any online conversation there will eventually be a comparison to Hitler.
What is interesting about the essay is that it then moves to an interview wtih Stephen Wolfram on AI & The Future of Civilization where Wolfram distinguishes between inventing a goal, which is difficult to automate, and (once one can articulate a goal clearly) executing it, which can be automated.
How do we figure out goals for ourselves? How are goals defined? They tend to be defined for a given human by their own personal history, their cultural environment, the history of our civilization. Goals are something that are uniquely human.
Lepetic then asks if Tay had a goal or who had goals for Tay. Microsoft had a goal, and that had to do with “learning” from and about a demographic that uses social media. Lepetic sees it as a “vacuum cleaner for data.” In many ways the trolls did us a favor by misleading it.
Or … TayandYou was troll-bait to train a troll filter.
My question is whether anyone has done a good analysis of how the Tay campaign actually worked?
3quarksdaily, one of my favourite sites to read just posted a very nice essay by Sanjukta Paul on Where Probability Meets Literature and Language: Markov Models for Text Analysis. The essay starts with Markov, who in the 19th century was doing linguistic analysis by hand and goes to authorship attribution by people like Fiona Tweedie (the image above is from a study she co-authored). It also explains markov models on the way.
On the ethos of digital presence: I participated today in a panel launching the Italian version of Paolo Sordi’s book I Am: Remix Your Web Identity. (The Italian title is Bloggo Con WordPress Dunque Sono.) The panel included people like Domenico Fiormonte, Luisa Capelli, Daniela Guardamangna, Raul Mordenti, and, of course, Paolo Sordi.
Continue reading Paolo Sordi: I blog therefore I am
Emil Johansson, a student in Gothenburg, has created a fabulous site called the LOTRProject (or Lord Of The Rings Project. The site provides different types of visualizations about Tolkien’s world (Silmarillion, Hobbit, and LOTR) from maps to family trees to character mentions (see image above).
Continue reading LOTRProject: Visualizing the Lord of the Rings