Big Data – Page 18 – Theoreti.ca

Medical Privacy Under Threat in the Age of Big Data

The Intercept has a good introductory story about Medical Privacy Under Threat in the Age of Big Data. I was surprised how valuable medical information is. Here is a quote:

[h]e found a bundle of 10 Medicare numbers selling for 22 bitcoin, or $4,700 at the time. General medical records sell for several times the amount that a stolen credit card number or a social security number alone does. The detailed level of information in medical records is valuable because it can stand up to even heightened security challenges used to verify identity; in some cases, the information is used to file false claims with insurers or even order drugs or medical equipment. Many of the biggest data breaches of late, from Anthem to the federal Office of Personnel Management, have seized health care records as the prize.

The story mentions Latanya Sweeney, who is the Director of the Data Privacy Lab at Harvard. She did important research on Discrimination in Online Ad Delivery and has a number of important papers on health records, like a recent work on Matching Known Patients to Health Records in Washington State Data that showed how one could de-anonymize Washington State health data that is for sale by searching news databases. We are far more unique than we think we are.

I should add that I came across an interesting blog post by Dr. Sweeney on Tech@FTC arguing for an interdisciplinary field of Technology Science. (Sweeney was the Chief Technologist at the FTC.)

Depositing Archives

We have recently deposited two research archives here at the University of Alberta. One is the John B. Smith Archive. You can download bundles or the complete archive which can be found at http://hdl.handle.net/10402/era.41201. Amy Dyrbye and I worked with John B. Smith to assemble this, document it and deposit it in ERA (the Education and Research Archive).

Another archive that we are building is a collection around Gamergate. The DOI for this is:

doi:10.7939/DVN/10253

For this we are using Dataverse, which allows us to manage the archive and choose which parts to publish.

Given the work that goes into developing and documenting these archives I would argue that they should be considered scholarly work, but that is another matter.

KIAS shrinks carbon footprints “Around The World”

The Office of Sustainability at the University of Alberta has recognized our work at the Kule Institute for Advanced Study to develop models for sustainable research. They have published a nice story about the Around the World conference that we run with the title, KIAS shrinks carbon footprints “Around The World”. The question we need to ask ourselves is whether our academic reward system isn’t encouraging flying to conferences where other means of meeting would work. What would it mean to do sustainable research?

diyMatrix: Bertin’s Manual

I have long been interested in Jacques Bertin, a pioneer in thinking about visualization. His Semiology of Graphics is a classic. I had been thinking it would be great to try or simulate his way of doing cluster analysis with physical matrices, which he called “dominos”. I was therefore pleased to see that someone has recreated his matrices; see DIY Matrix.

Charles Perin, Pierre Dragicevic, and Jean-Daniel Fekete have updated the matrices and fabricated a version for a CHI’15 workshop on Investigating the Challenges of Making Data Physical (PDF).

Update: They also have a web application called Bertifier that allows you to try it virtually. This interactive allows you to choose different ways of decorating the blocks and will then also reorder them. It is fascinating to play with.

Now I have something I want to print on a fabricator.

The size of the World Wide Web

Reading a paper by Lev Manovich I came across a reference to the web site WorldWideWebSize.com which graphs the size of the World Wide Web. The web site searches Google and Bing daily for different words from a corpus and then uses the total results to estimate the size of the web.

When you know, for example, that the word ‘the’ is present in 67,61% of all documents within the corpus, you can extrapolate the total size of the engine’s index by the document count it reports for ‘the’. If Google says that it found ‘the’ in 14.100.000.000 webpages, an estimated size of the Google’s total index would be 23.633.010.000.

In the screen grab above you can see that the estimated size can change dramatically over time. Hard to tell why.
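
To make the extrapolation in the quote concrete, here is a minimal Python sketch. The function name and the plain division of the reported hit count by the word's corpus frequency are my reading of the quoted description, not the site's actual code. Note that dividing the quoted numbers this way gives roughly 20.9 billion rather than the 23.6 billion in the quote, so the site presumably does something beyond what this sketch captures.

```python
def estimate_index_size(reported_hits: int, corpus_fraction: float) -> float:
    """Extrapolate a search engine's total index size from one word's hit count.

    reported_hits: number of pages the engine reports for the word.
    corpus_fraction: share of corpus documents containing the word, in (0, 1].
    """
    if not 0 < corpus_fraction <= 1:
        raise ValueError("corpus_fraction must be in (0, 1]")
    # If the word appears in this fraction of all documents, the total
    # is the reported count scaled up by the inverse of that fraction.
    return reported_hits / corpus_fraction


# Figures from the quote: 'the' appears in 67.61% of corpus documents,
# and Google reports 14,100,000,000 pages containing it.
estimate = estimate_index_size(14_100_000_000, 0.6761)
print(f"{estimate:,.0f}")
```

Because the site queries different words each day, a fuller version would combine several such estimates, but how it does so is not described in the quote.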

Around the World Conference

Last week we held our third Around the World Conference on the subject of “Big Data”. We had some fabulous panels from countries including Ireland, Canada, Israel, Nigeria, Japan, China, Australia, USA, Belgium, Italy, and Brazil.

The Around the World Conference streams speakers and panels from around the world out to everyone on the net. We also edit and archive the video clips. This model allows for a sustainable conversation across continents that doesn’t involve flying people around. It allows a lot of people who wouldn’t usually be included to speak. We also find there are technical hiccups, but that happens in on-site conferences too.

Editorialisation Et Nouvelles Formes De Publication

In the last couple of weeks I’ve been at two interesting conferences and have taken research notes at both.

I gave a keynote on “Big Data and the Humanities” at the Northwestern Research Computation Day (link to my research notes). I gave a lot of examples of projects and visualizations.
At the Éditorialisation Et Nouvelles Formes De Publication (link to my research notes) conference I spoke about “Publishing Tools: A Theatre of Machines”. I showed how text analysis machines have evolved.

TSA’s Secret Behavior Checklist to Spot Terrorists

The Intercept has published the TSA’s behaviour checklist for spotting terrorists as part of two stories. See Exclusive: TSA’s Secret Behavior Checklist to Spot Terrorists. The checklist is part of a SPOT (Screening of Passengers by Observation Techniques) Referral Report that is filled out when someone is “spotted” by the TSA. It includes all sorts of behaviours like “Arrives late for flight …”. The idea of the report is that behaviours are assigned points, and if someone gets more than a certain number of points, the suspect is referred to a Law Enforcement Officer (LEO). The second story from The Intercept is Exclusive: TSA ‘Behavior Detection’ Program Targeting Undocumented Immigrants, Not Terrorists.

Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences

Is it Research or is it Spying? Thinking-Through Ethics in Big Data AI and Other Knowledge Sciences has just been published online. It was written with Bettina Berendt and Marco Büchler and came out of a Dagstuhl retreat where a group of us started talking about ethics and big data. Here is the abstract:

“How to be a knowledge scientist after the Snowden revelations?” is a question we all have to ask as it becomes clear that our work and our students could be involved in the building of an unprecedented surveillance society. In this essay, we argue that this affects all the knowledge sciences such as AI, computational linguistics and the digital humanities. Asking the question calls for dialogue within and across the disciplines. In this article, we will position ourselves with respect to typical stances towards the relationship between (computer) technology and its uses in a surveillance society, and we will look at what we can learn from other fields. We will propose ways of addressing the question in teaching and in research, and conclude with a call to action.

A PDF of our author version is here.

NSA phone record collection does little to prevent terrorist attacks, group says

One of the key issues raised by Snowden is whether all this surveillance works. The Washington Post has a story from a year ago reporting that NSA phone record collection does little to prevent terrorist attacks, group says. This story is based on a report:

Bergen, P., Sterman, D., Schneider, E., and B. Cahall. (2014). Do NSA’s Bulk Surveillance Programs Stop Terrorists? Report from the International Security Program of the New America Foundation.
