Fourteen years ago, Statistics Canada stopped publishing unfounded rates, over concerns about the quality of the data. In “Unfounded,” The Globe and Mail has tried to fill the gaps in the data.
The Globe and Mail has been publishing a fabulous data-driven exposé on how the police categorize one out of five sexual assault reports as unfounded. They have a web essay Will police believe you? that summarizes the investigation. There is another article on How The Globe collected and analyzed sexual assault statistics to report on unfounded figures across Canada. While this isn’t big data, it demonstrates the power of data to reveal that there is a problem and to prod police departments to start reviewing their practices.
Wu, who is running for Congress, said in an email that she is “fairly livid” because it appears the FBI didn’t check out many of her reports about death threats. Wu catalogued more than 180 death threats that she said she received because she spoke out against sexism in the game industry and #GamerGate misogyny that eventually morphed into the alt-right movement and carried into the U.S. presidential race.
It sounds like the FBI either couldn’t trace the threats or didn’t think they were serious enough, and eventually closed down the investigation. In the aftermath of the shooting at the Québec City mosque we need to take the threats of trolls more seriously, as Anita Sarkeesian did when she was threatened with a “Montreal Massacre style attack” before speaking at the University of Utah. Yes, only a few act on their threats, but threats piggy-back on the terror to achieve their end. Those making the threats may justify them as just for the lulz, but they do so knowing that some people act on their threats.
On another point, having just given a paper on Palantir, I was intrigued to read that the FBI used it in their investigation. The report says that “A search of social media logins using Palantir’s search around feature revealed a common User ID number for two of the above listed Twitter accounts, profiles [Redacted] … A copy of the Palantir chart created from the Twitter results will be uploaded to the case file under a separate serial.” One wonders how useful connecting two Twitter accounts to one ID is.
Near the end of the report, which is really just a collection of redacted documents, there is a heavily redacted email from one of those harassed where only a couple of lines are left for us to read, including,
We feel like we are sending endless emails into the void with you.
I was struck by the number of sessions of papers on mapping projects. I don’t know if I have ever seen so many geospatial projects. Many of the papers talked about how mapping is a different way of analyzing the data whether it is the location of eateries in Roman Pompeii or German construction projects before 1924.
I gave a paper on “Information Wants to Be Free, Or Does It? Ethics in the Digital Humanities.”
Yesterday I gave a talk at Access 2016. This conference brings together archivists and librarians interested in library technology. I was honoured to give the Dave Binkley Memorial Lecture at the end of the conference. My conference notes are here. My talk was about the ethics of digitization, or more generally datafication.
Information is Beautiful has a great interactive on World’s Biggest Data Breaches & Hacks. The interactive shows how data breaches are getting worse, but it also lets you look at different types of breaches.
ProPublica has a great op-ed about Making Algorithms Accountable. The story starts from a decision from the Wisconsin Supreme Court on computer-generated risk (of recidivism) scores. The scores used in Wisconsin come from Northpointe, which provides the scores as a service based on a proprietary algorithm that seems biased against blacks and not that accurate. The story highlights the lack of any legislation regarding algorithms that can affect our lives.
Spurious Correlations is a great web site that shows correlations that are spurious like this one between revenue generated by arcades and computer science doctorates. The gathered correlations show how correlation is not causation.
I could see in my daily work how difficult it was to inform people about their privacy issues. Nobody seemed to care. My hypothesis was that the whole subject was too complex. There were no examples, no images that could help the audience to understand the process behind the mass surveillance.
The answer is to mock up a design fiction of an NSA surveillance dashboard based on what we know and then a video describing a fictional use of it to track an architecture student from Berlin. It seems to me the video and mock designs nicely bring together a number of things we can infer about the tools they have.
Thanks to a note from Domenico Fiormonte to Humanist I came across the Information Geographies page at the Oxford Internet Institute. The OII has been producing interesting maps that show aspects of the internet. The one pictured above shows the distribution of Geographic Knowledge in Freebase. Given the importance of Freebase to Google’s Knowledge Graph, it is important to understand how its information is biased toward certain locations.
Geographic content in Freebase is largely clustered in certain regions of the world. The United States accounts for over 45% of the overall number of place names in the collection, despite covering about 2% of the Earth, less than 7% of the land surface, and less than 5% of the world population, and about 10% of Internet users. This results in a US density of one Freebase place name for every 1500 people, and far more place names referring to Massachusetts than referring to China.
Domenico Fiormonte’s email to Humanist (Humanist Discussion Group, Vol. 29, No. 824) argues that “It is our responsibility to preserve cultural diversity, and even relatively small players can make a difference by building more inclusive ‘representations’.” He argues that we need to be open about the cultural and linguistic biases of the tools and databases we build.
On the ethos of digital presence: I participated today in a panel launching the Italian version of Paolo Sordi’s book I Am: Remix Your Web Identity. (The Italian title is Bloggo Con WordPress Dunque Sono.) The panel included people like Domenico Fiormonte, Luisa Capelli, Daniela Guardamangna, Raul Mordenti, and, of course, Paolo Sordi.