I’ve just come across some important blog essays by David Gaertner. One is Why We Need to Talk About Indigenous Literature in the Digital Humanities, where he argues that colleagues in Indigenous literature are rightly skeptical of the digital humanities because DH hasn’t really taken to heart the concerns of Indigenous communities about the expropriation of data.
I just came across a great French project called Transcrire. The Huma-Num very large research facility has built a system for crowdsourcing the transcription of archival materials, giving the humanities real infrastructure for crowdsourcing (or citizen science). Playing around with it, I found the site looks very professional.
Bill Robinson has penned a nice essay, Marking 70 years of eavesdropping in Canada. The essay gives the background of Canada’s signals intelligence unit, the Communications Security Establishment (CSE), which just marked its 70th anniversary (on September 1st).
Unable to read the Soviets’ most secret messages, the UKUSA allies resorted to plain-language (unencrypted) communications and traffic analysis, the study of the external features of messages such as sender, recipient, length, date and time of transmission—what today we call metadata. By compiling, sifting, and fusing a myriad of apparently unimportant facts from the huge volume of low-level Soviet civilian and military communications, it was possible to learn a great deal about the USSR’s armed forces, the Soviet economy, and other developments behind the Iron Curtain without breaking Soviet codes. Plain language and traffic analysis remained key sources of intelligence on the Soviet Bloc for much of the Cold War.
Robinson is particularly interesting on “The birth of metadata collection,” which began when the Soviets, to the frustration of the UKUSA allies, developed encryption that couldn’t be broken.
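The core idea of traffic analysis is easy to illustrate: even without reading a single message, aggregating metadata (sender, recipient, length) reveals who talks to whom and how much. Here is a minimal Python sketch using entirely made-up message records; the names and numbers are hypothetical and stand in for the kind of low-level traffic the quote describes.

```python
from collections import Counter

# Hypothetical message metadata: (sender, recipient, length in characters).
# The contents of the messages are never seen.
records = [
    ("HQ", "Unit7", 120), ("Unit7", "HQ", 80), ("HQ", "Unit7", 200),
    ("HQ", "Depot3", 60), ("Depot3", "HQ", 60),
]

# Count messages per directed sender-recipient link.
link_counts = Counter((s, r) for s, r, _ in records)

# Total volume (message length) per link.
link_volume = Counter()
for s, r, n in records:
    link_volume[(s, r)] += n

# The busiest link stands out without any decryption.
busiest_link, n_messages = link_counts.most_common(1)[0]
print(busiest_link, n_messages)   # ('HQ', 'Unit7') 2
print(link_volume[busiest_link])  # 320
```

Scaled up to the “huge volume of low-level Soviet civilian and military communications” Robinson describes, exactly this kind of compiling and sifting is what let analysts map an adversary’s organization from the outside.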
Eder has a nice page about the work he and others in the Computational Stylistics Group are doing. In the workshop sessions I was able to attend he showed us how to set up and run his “stylo” package (PDF), which provides a simple user interface over R for doing stylometry. He also showed us how to then use Gephi for network visualization.
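Stylo itself is an R package, but the core move in stylometry of this kind can be sketched in a few lines of Python: represent each text by the relative frequencies of its words, then compare texts over a shared most-frequent-word list. This is a simplified, unscaled variant of the Delta-style distances stylo computes (real Burrows’s Delta z-scores the frequencies across the whole corpus first); the texts and word list here are toy examples.

```python
from collections import Counter

def word_freqs(text):
    """Relative frequency of each word (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def delta(freqs_a, freqs_b, mfw):
    """Simplified Delta-style distance: mean absolute difference in
    relative frequency over a shared most-frequent-word (MFW) list."""
    return sum(abs(freqs_a.get(w, 0.0) - freqs_b.get(w, 0.0)) for w in mfw) / len(mfw)

fa = word_freqs("the cat sat on the mat")
fb = word_freqs("the dog ran in the park")
mfw = ["the", "cat", "dog"]

print(delta(fa, fa, mfw))      # 0.0 — a text is at distance zero from itself
print(delta(fa, fb, mfw) > 0)  # True — different texts diverge on the MFW list
```

Stylo automates this over whole corpora, with proper z-scoring, culling, and clustering; pairwise distances like these are also what you would export to Gephi to visualize texts as a network.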
Information is Beautiful has a great interactive on World’s Biggest Data Breaches & Hacks. The interactive shows how data breaches are getting worse, but it also lets you look at different types of breaches.
ProPublica has a great op-ed about Making Algorithms Accountable. The story starts from a decision by the Wisconsin Supreme Court on computer-generated risk (of recidivism) scores. The scores used in Wisconsin come from Northpointe, which provides them as a service based on a proprietary algorithm that appears biased against Black defendants and not particularly accurate. The story highlights the lack of any legislation regarding algorithms that can affect our lives.
Spurious Correlations is a great web site that collects correlations that are clearly spurious, like this one between revenue generated by arcades and computer science doctorates awarded. The gathered correlations show vividly that correlation is not causation.