Big Data – Page 12 – Theoreti.ca

Show and Tell at CHRIN

Stéphane Pouyllau’s photo of me presenting

Michael Sinatra invited me to a “show and tell” workshop at the new Université de Montréal campus where they have a long data wall. Sinatra is the Director of CRIHN (Centre de recherche interuniversitaire sur les humanitiés numériques) and kindly invited me to show what I am doing with Stéfan Sinclair and to see what others at CRIHN and in France are doing.

Continue reading Show and Tell at CHRIN

Facebook to Pay $550 Million to Settle Facial Recognition Suit

It was another black mark on the privacy record of the social network, which also reported its quarterly earnings.

The New York Times has a story on how Facebook to Pay $550 Million to Settle Facial Recognition Suit (Natasha Singer and Mike Isaac, Jan. 29, 2020.) The Illinois case has to do with Facebook’s face recognition technology that was part of Tag Suggestions that would suggest names for people in photos. Apparently in Illinois it is illegal to harvest biometric data without consent. The Biometric Information Privacy Act (BIPA) passed in 2008 “guards against the unlawful collection and storing of biometric information.” (Wikipedia entry)

BIPA suggests a possible answer to the question of what is unethical about face recognition. While I realize that a law is not ethics (and vice versa) BIPA hints at one of the ways we can try to unpack the ethics of face recognition. The position suggested by BIPA would go something like this:

Face recognition is dependent on biometric data which is extracted from an image or in other form of scan.
To collect and store biometric data one needs the consent of the person whose data is collected.
The data has to be securely stored.
The data has to be destroyed in a timely manner.
If there is consent, secure storage, and timely deletion of the data, then the system/service can be said to not be unethical.

There are a number of interesting points to be made about this position. First, it is not the gathering, storing and providing access to images of people that is at issue. Face recognition is an ethical issue because biometric data about a person is being extracted, stored and used. Thus Google Image Search is not an issue as they are storing data about whole images while FB stores information about the face of individual people (along with associated information.)

This raises issues about the nature of biometric data. What is the line between a portrait (image) and biometric information? Would gathering biographical data about a person become biometric at some point if it contained descriptions of their person?

Second, my reading is that a service like Clearview AI could also be sued if they scrape images of people in Illinois and extract biometric data. This could provide an answer to the question of what is ethically wrong about the Clearview AI service. (See my previous blog entry on this.)

Third, I think there is a missing further condition that should be specified, names that the company gathering the biometric data should identify the purpose for which they are gathering it when seeking consent and limit their use of the data to the identified uses. When they no longer need the data for the identified use, they should destroy it. This is essentially part of the PIPA principle of Limiting Use, Disclosure and Retention. It is assumed that if one is to delete data in a timely fashion there will be some usage criteria that determine timeliness, but that isn’t always the case. Sometimes it is just the passage of time.

Of course, the value of data mining is often in the unanticipated uses of data like biometric data. Unanticipated uses are, by definition, not uses that were identified when seeking consent, unless the consent was so broad as to be meaningless.

No doubt more issues will occur to me.

Avast closes Jumpshot over data privacy backlash, but transparency is the real issue

Avast will shutter its Jumpshot subsidiary just days after an exposé targeted the way it sold user data. But transparency remains the bigger issue.

From Venturbeat (via Slashdot) the news that antivirus company Avast closes Jumpshot over data privacy backlash, but transparency is the real issue (Paul Sawers, Jan. 30, 2020). Avast had been found to have been gathering detailed data about users of its antivirus tools and then selling anonymized data through Jumpshot. The data was of sufficient detail (apparently down to an “all clicks feed”) that it would probably be possible to deanonymize data. So what was the ethical problem here?

As the title of the story advertises the issue was not that Avast was concealing what it was doing, it is more a matter of how transparent it was about what it was doing. The data collection was “opt out” and so you had to go find the setting rather than being asked if you wanted to “opt in.” Jumpstart was apparently fairly open about their business. The information the provided to help you make a decision was not particularly deceptive (see image below), but it is typical of software to downplay the identifiability of data collected.

Some of the issue is around consent. What realistically constitutes consent these days? Does one need to opt-in for there to be meaningful consent? Does one need sufficient information to make a decision, and if so, what would that be?

There are 2,373 squirrels in Central Park. I know because I helped count them

I volunteered for the first squirrel census in the city. Here’s what I learned, in a nutshell.

From Lauren Klein on Twitter I learned about a great New York Times article on There are 2,373 squirrels in Central Park. I know because I helped count them. The article is by Denise Lau (Jan. 8, 2020.) As Klein points out, it is about the messiness of data collection. (Note that she has a book coming out on Data Feminism with Catherine D’Ignazio.)

In 2020, let’s stop AI ethics-washing and actually do something – MIT Technology Review

But talk is just that—it’s not enough. For all the lip service paid to these issues, many organizations’ AI ethics guidelines remain vague and hard to implement.

Thanks to Oliver I came across this call for an end to ethics-washing by artificial intelligence reporter Karen Hao in the MIT Technology Review, In 2020, let’s stop AI ethics-washing and actually do something The call echoes something I’ve been talking about – that we need to move beyond guidelines, lists of principles, and checklists. She nicely talks about some of the initiatives to hold AI accountable that are taking place and what should happen. Read on if you want to see what I think we need.

Continue reading In 2020, let’s stop AI ethics-washing and actually do something – MIT Technology Review

The 100 Worst Ed-Tech Debacles of the Decade

With the end of the year there are some great articles showing up reflecting on debacles of the decade. One of my favorites is The 100 Worst Ed-Tech Debacles of the Decade. Ed-Tech is one of those fields where over and over techies think they know better. Some of the debacles Watters discusses:

3D Printing
The “Flipped Classroom” (Full disclosure: I sat on a committee that funded these.)
Op-Eds to ban laptops
Clickers
Stories about the end of the library
Interactive whiteboards
The K-12 Cyber Incident Map (Check it out here)
IBM Watson
The Year of the MOOC

This collection of 100 terrible ideas in instructional technology should be mandatory reading for all of us who have been keen on ed-tech. (And I am one who has develop ed-tech and oversold it.) Each item is a mini essay with links worth following.

Slaughterbots

On the Humanist discussion list John Keating recommended the short video Slaughterbots that presents a plausible scenario where autonomous drones are used to target dissent using social media data. Watch it! It is well done and presents real issues in a credible short video.

While the short is really about autonomous weapons and the need to ban them, I note that one of ideas included is that dissent could be silenced by using social media to target people. The scenario imagines that university students who shared a dissenting video on social media have their data harvested (including images of their faces) and the drones target them using face recognition. Science fiction, but suggestive of how social media presence can be used for control.

Continue reading Slaughterbots

Linked Infrastructure For Networked Cultural Scholarship Team Meeting 2019

This weekend I was at the Linked Infrastructure For Networked Cultural Scholarship (LINCS) Team Meeting 2019. The meeting/retreat was in Banff at the Banff International Research Station and I kept my research notes at philosophi.ca.

The goal of Lincs is to create a shared linked data store that humanities projects can draw on and contribute to. This would let us link our digital resources in ways that create new intellectual connections and that allow us to reason about linked data.

Schools Are Deploying Massive Digital Surveillance Systems. The Results Are Alarming

“every move you make…, every word you say, every game you play…, I’ll be watching you.” (The Police – Every Breath You Take)

Education Week has an alarming story about how schools are using surveillance, Schools Are Deploying Massive Digital Surveillance Systems. The Results Are Alarming. The story is by Benjamin Harold and dates from May 30, 2019. It talks not only about the deployment of cameras, but the use of companies like Social Sentinel, Securly, and Gaggle that monitor social media or school computers.

Every day, Gaggle monitors the digital content created by nearly 5 million U.S. K-12 students. That includes all their files, messages, and class assignments created and stored using school-issued devices and accounts.

The company’s machine-learning algorithms automatically scan all that information, looking for keywords and other clues that might indicate something bad is about to happen. Human employees at Gaggle review the most serious alerts before deciding whether to notify school district officials responsible for some combination of safety, technology, and student services. Typically, those administrators then decide on a case-by-case basis whether to inform principals or other building-level staff members.

The story provides details that run from the serious to the absurd. It mentions concerns by the ACLU that such surveillance can desensitize children to surveillance and make it normal. The ACLU story makes a connection with laws that forbid federal agencies from studying or sharing data that could make the case for gun control. This creates a situation where the obvious ways to stop gun violence in schools aren’t studied so surveillance companies step in with solutions.

Needless to say, surveillance has its own potential harms beyond desensitization. The ACLU story lists the following potential harms:

Suppression of students’ intellectual freedom, because students will not want to investigate unpopular or verboten subjects if the focus of their research might be revealed.
Suppression of students’ freedom of speech, because students will not feel at ease engaging in private conversations they do not want revealed to the world at large.
Suppression of students’ freedom of association, because surveillance can reveal a students’ social contacts and the groups a student engages with, including groups a student might wish to keep private, like LGBTQ organizations or those promoting locally unpopular political views or candidates.
Undermining students’ expectation of privacy, which occurs when they know their movements, communications, and associations are being watched and scrutinized.
False identification of students as safety threats, which exposes them to a range of physical, emotional, and psychological harms.

As with the massive investment in surveillance for national security and counter terrorism purposes, we need to ask whether the cost of these systems, both financial and other, is worth it. Unfortunately, protecting children, like protecting from terrorism is hard to put a price on which makes it hard to argue against such investments.