MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labelled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.

Another one of those “what were they thinking when they created the dataset” stories, this one from The Register, tells how MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. The MIT Tiny Images dataset was created automatically using scripts that drew on the WordNet database of terms, which itself holds derogatory terms. Nobody thought to check either the terms taken from WordNet or the resulting images scraped from the net. As a result there are not only lots of images for which permission was not secured, but also racist, sexist, and otherwise derogatory labels on the images, which in turn means that if you train an AI on these it will generate racist/sexist results.
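To make the missing check concrete, here is a minimal sketch of how candidate labels could be screened against a blocklist before any images are gathered. This is not how Tiny Images was built or audited. The function name, the placeholder terms, and the idea of a hand-maintained blocklist are all assumptions for illustration.

```python
def filter_labels(candidate_labels, blocklist):
    """Split labels into (kept, rejected) using a case-insensitive blocklist.

    Both arguments are hypothetical inputs: a list of terms drawn from a
    lexical database, and a human-curated list of terms to exclude.
    """
    blocked = {term.casefold() for term in blocklist}
    kept, rejected = [], []
    for label in candidate_labels:
        if label.casefold() in blocked:
            rejected.append(label)
        else:
            kept.append(label)
    return kept, rejected


# Placeholder terms stand in for real vocabulary.
labels = ["bicycle", "Offensive_Term_A", "teapot"]
kept, rejected = filter_labels(labels, ["offensive_term_a"])
```

An exact-match blocklist only catches the terms someone thought to list, so a human review of the rejected and kept lists would still be needed.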

The article also mentions a general problem with academic datasets. Companies like Facebook can afford to hire actors to pose for images and can thus secure permissions to use the images for training. Academic datasets (and some commercial ones like the Clearview AI database) tend to be scraped and therefore will not have the explicit permission of the copyright holders or the people shown. In effect, academics are resorting to mass surveillance to generate training sets. One wonders if we could crowdsource a training set by and for people?

Facebook to Pay $550 Million to Settle Facial Recognition Suit

It was another black mark on the privacy record of the social network, which also reported its quarterly earnings.

The New York Times has a story on how Facebook to Pay $550 Million to Settle Facial Recognition Suit (Natasha Singer and Mike Isaac, Jan. 29, 2020.) The Illinois case has to do with Facebook’s face recognition technology that was part of Tag Suggestions that would suggest names for people in photos. Apparently in Illinois it is illegal to harvest biometric data without consent. The Biometric Information Privacy Act (BIPA) passed in 2008 “guards against the unlawful collection and storing of biometric information.” (Wikipedia entry)

BIPA suggests a possible answer to the question of what is unethical about face recognition. While I realize that a law is not ethics (and vice versa) BIPA hints at one of the ways we can try to unpack the ethics of face recognition. The position suggested by BIPA would go something like this:

  • Face recognition is dependent on biometric data which is extracted from an image or another form of scan.
  • To collect and store biometric data one needs the consent of the person whose data is collected.
  • The data has to be securely stored.
  • The data has to be destroyed in a timely manner.
  • If there is consent, secure storage, and timely deletion of the data, then the system/service can be said to not be unethical.
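The position above can be sketched as a simple predicate. This is only a model of the argument as I have paraphrased it, not of the statute itself. The record fields and the fixed retention window are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class BiometricRecord:
    # All fields are hypothetical; they mirror the bullet points above.
    consent_given: bool
    stored_securely: bool
    collected_on: date
    retention_days: int  # assumed stand-in for "timely" deletion


def satisfies_position(record: BiometricRecord, today: date) -> bool:
    """True if consent, secure storage, and timely deletion all hold."""
    deadline = record.collected_on + timedelta(days=record.retention_days)
    # A record still held after its deadline has not been deleted in time.
    not_overdue = today <= deadline
    return record.consent_given and record.stored_securely and not_overdue


record = BiometricRecord(True, True, date(2020, 1, 1), retention_days=30)
ok_now = satisfies_position(record, date(2020, 1, 15))
ok_later = satisfies_position(record, date(2020, 3, 1))
```

Writing it out this way shows how much the position leaves open: nothing in the predicate says what the data may be used for, or who decides how long "timely" is.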

There are a number of interesting points to be made about this position. First, it is not the gathering, storing and providing access to images of people that is at issue. Face recognition is an ethical issue because biometric data about a person is being extracted, stored and used. Thus Google Image Search is not an issue as they are storing data about whole images while FB stores information about the face of individual people (along with associated information.)

This raises issues about the nature of biometric data. What is the line between a portrait (image) and biometric information? Would gathering biographical data about a person become biometric at some point if it contained descriptions of their person?

Second, my reading is that a service like Clearview AI could also be sued if they scrape images of people in Illinois and extract biometric data. This could provide an answer to the question of what is ethically wrong about the Clearview AI service. (See my previous blog entry on this.)

Third, I think there is a missing further condition that should be specified, namely that the company gathering the biometric data should identify the purpose for which they are gathering it when seeking consent and limit their use of the data to the identified uses. When they no longer need the data for the identified use, they should destroy it. This is essentially part of the PIPA principle of Limiting Use, Disclosure and Retention. It is assumed that if one is to delete data in a timely fashion there will be some usage criteria that determine timeliness, but that isn’t always the case. Sometimes it is just the passage of time.

Of course, the value of data mining is often in the unanticipated uses of data like biometric data. Unanticipated uses are, by definition, not uses that were identified when seeking consent, unless the consent was so broad as to be meaningless.

No doubt more issues will occur to me.

The Secretive Company That Might End Privacy as We Know It

“I’ve come to the conclusion that because information constantly increases, there’s never going to be privacy,” Mr. Scalzo said. “Laws have to determine what’s legal, but you can’t ban technology. Sure, that might lead to a dystopian future or something, but you can’t ban it.”

The New York Times has an important story about Clearview AI, The Secretive Company That Might End Privacy as We Know It. Clearview, which is partly funded by Peter Thiel, scraped a number of social media sites for pictures of people and has developed an AI application that you can upload a picture to and it tries to recognize the person and show you their social media trail. They are then selling the service to police forces.

Needless to say, this is a disturbing use of face recognition for surveillance using our own social media. They are using public images that any one of us could look at, but at a scale no person could handle. They are doing something that would be almost impossible to stop, even with legislation. What’s to stop the intelligence services of another country doing this (and more)? Perhaps privacy is no longer possible.

Watch Andy Warhol “Paint” On A Commodore Computer: Gothamist

Eric Hayot at the Novel Worlds conference showed a slide with an image of Debbie Harry of Blondie painted on the Amiga by Andy Warhol. There is a video of Warhol painting on the Amiga at the premiere of the Commodore Amiga.

This is discussed in a documentary, The Invisible Photograph: Part 2 (Trapped). The documentary also talks about recovering other images from Warhol’s original Amiga, which was preserved by The Andy Warhol Museum.

Technologizer has a nice retrospective on the Amiga, Amiga: 25 Years Later. I remember when it came out in 1985. I had a Mac by then, but was intrigued by the colour Amiga and the video work people were doing with it.

‘Photo Archives Are Sleeping Beauties.’ Pharos Is Their Prince

Pharos is an effort among 14 institutions to create a database that will eventually hold and make accessible 22 million images of artworks.

The New York Times has a story about a collaboration to develop the Pharos consortium photo archive, ‘Photo Archives Are Sleeping Beauties.’ Pharos Is Their Prince. The consortium has a number of interesting initiatives they are implementing in Pharos:

  • They are applying the CIDOC Conceptual Reference Model.

The CIDOC Conceptual Reference Model (CRM) provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation.

  • They have a visual search (which doesn’t seem to find anything at the moment.)
  • They are looking at Research Space (which uses CRM) for a research linked data environment.

Animethon 20

Yesterday I went to the Animethon. This is a convention about Japanese anime, manga, games, cosplay and related culture that takes place every year in Edmonton at the Grant MacEwan City Centre Campus. The three day event attracts thousands (probably around 6000). A significant portion of participants are dressed up for cosplay. You can see my photographs on Flickr in my Animethon 2013 set. The best of the cosplayers I saw was the Hello Kitty samurai knight in the photo above.

It is tempting to compare this Japanese pop culture event in Canada to ones I saw in Japan, but I haven’t seen enough on either side of the Pacific to be sure. What is clear to me is that Japanese pop culture is big here in Edmonton and not just among youth. While there were a lot of kids (some with parents), there were also older fans (like me.) I loved the inventive costumes and there seemed to be almost as many men cosplayers as women. Many took real pride in their costumes.

Some of the panels I went to included one on the Touhou Project and one on ball-jointed dolls (BJD). There was a cosplay contest with some fabulous costumes. I also spent time in the exhibit hall where I picked up a WonderSwan and some games, including a copy of Rez. Now I need a PS2 to play it on!

The ball-jointed doll session was the most interesting as it was a community I didn’t know much about. There is apparently a strong BJD club in Edmonton and they meet to trade and teach each other. Many of the participants had brought their dolls (see my photos) and they seemed to be mostly mature women, though there were some men there too with dolls. I can’t help wondering about the differences between the doll culture in Japan and here. Here it seemed to be a hobby in the tradition of collecting dolls. In Japan there seemed to be a subset of male owners for whom these dolls are more than collectibles, but that may be a projection.