Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labelled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.
Another one of those “what were they thinking when they created the dataset” stories comes from The Register, which reports that MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. The MIT Tiny Images dataset was created automatically by scripts that drew on the WordNet database of terms, which itself held derogatory terms. Nobody thought to check either the terms taken from WordNet or the resulting images scraped from the net. As a result, there are not only many images for which permission was not secured, but also racist, sexist, and otherwise derogatory labels on the images, which in turn means that an AI trained on them will generate racist and sexist results.
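The check that nobody ran could have started with something as simple as screening candidate labels against a reviewed list of offensive terms before any images were collected. Here is a minimal sketch of that idea. The function name, the sample labels, and the placeholder blocklist are all hypothetical, and this is not how the Tiny Images scripts worked.

```python
def split_labels(labels, blocklist):
    """Separate candidate labels into (kept, flagged) using a reviewed blocklist."""
    blocked = {term.lower() for term in blocklist}
    kept, flagged = [], []
    for label in labels:
        # WordNet-style labels often join words with underscores, so split on both.
        words = label.lower().replace("_", " ").split()
        if any(word in blocked for word in words):
            flagged.append(label)  # hold back for human review
        else:
            kept.append(label)
    return kept, flagged


# Placeholder data: "bad" and "slur" stand in for a real, human-reviewed term list.
candidate_labels = ["apple", "bad_word", "bicycle", "Slur_A person"]
placeholder_blocklist = ["bad", "slur"]
kept, flagged = split_labels(candidate_labels, placeholder_blocklist)
```

A word list only catches the obvious cases. The flagged labels, and a sample of the images gathered for the kept ones, would still need people to look at them.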
The article also mentions a general problem with academic datasets. Companies like Facebook can afford to hire actors to pose for images and can thus secure permissions to use the images for training. Academic datasets (and some commercial ones like the Clearview AI database) tend to be scraped and therefore will not have the explicit permission of the copyright holders or people shown. In effect, academics are resorting to mass surveillance to generate training sets. One wonders if we could crowdsource a training set by and for people?
Today I gave a short talk at the Digital Synergies Launch Event. The launch included neat talks by colleagues including:
I showed and talked about Lexigraphi.ca – The Dictionary of Words in the Wild. This is a social site where people can upload pictures of text outside of books and documents and tag the words – text like tattoos, graffiti, store signs and other forms of public textuality.
Lynne Siemens and Ray Siemens gave the final keynote of the On the Benefits of Failure conference. Their talk was titled “Training Ground for Success? Perspectives on Failure in Several Contexts.”
I just got an email announcing the soft launch of the Canadian Social Knowledge Institute (C-SKI). This institute grew out of the Electronic Textual Cultures Lab and the INKE project. Part of C-SKI is an Open Scholarship Policy Observatory, which has a number of partners through INKE.
The Canadian Social Knowledge Institute (C-SKI) actively engages issues related to networked open social scholarship: creating and disseminating research and research technologies in ways that are accessible and significant to a broad audience that includes specialists and active non-specialists. Representing, coordinating, and supporting the work of the Implementing New Knowledge Environments (INKE) Partnership, C-SKI activities include awareness raising, knowledge mobilization, training, public engagement, scholarly communication, and pertinent research and development on local, national, and international levels. Originated in 2015, C-SKI is located in the Electronic Textual Cultures Lab in the Digital Scholarship Centre at UVic.
I just came across a great French project called Transcrire. The Huma-Num Very Large Facility has built a system for the crowdsourcing of transcription of archival materials. It looks like they have built infrastructure for crowdsourcing (or citizen science) in the humanities. Having played around with it, I find it looks very professional.
This week I attended the second Science 2.0 conference held in Hamburg, Germany. (You can see my research notes here.) The conference dealt with issues around open access, open data, citizen science, and network enabled science. I was one of two Canadian digital humanists presenting. Matthew Hiebert from the University of Victoria talked about the social edition and work from the Electronic Textual Cultures Lab and Iter. It should be noted that in Europe the word “science” is more inclusive and can include the humanities. This conference wasn’t just about how open data and crowdsourcing could help the natural sciences – it was about how research across the disciplines could be supported with virtual labs and infrastructure.
I gave a paper on “New Publics for the Humanities” that started by noting that the humanities no longer engage the public. The social contract with the public that supports us has been neglected. I worry that if the university is disaggregated and the humanities unbundled from the other faculties (the way newspapers have been hit by the internet and the unbundling of services) then people will stop paying for the humanities and much of the research we do. We will end up with cheaper, research-poor colleges that provide lots of higher education without the research, or climbing walls. Only in the elite private universities will the humanities survive, and in those they will survive as a marker of class status. You will be able to study ancient languages at elite schools because any degree from an elite school is a good one.
Of course, the humanities will survive outside the university, and may become healthier with the downsizing of the professional (or professorial) humanities, but we run the danger of unthinkingly losing a long tradition of thinking critically and ethically. An irony to be sure – losing thinking traditions through the lack of public reflection on the consequences of disruptive change.
Drawing on Greg Crane, I then argued that citizen research (forms of crowdsourcing) can re-engage the publics we need to support us and reflect with us. Citizen research can provide an alternative way of structuring research in anticipation of defunding of the humanities research function. I illustrated my point by showing a number of examples of humanities crowdsourcing projects from the OED (pre-computer volunteer research) to the Dictionary of Words in the Wild. If I can find the time I will write up the argument to see where it goes.
My talk was followed by a thorough one on citizen science in environmental studies by Professor Aletta Bonn of the Citizens create knowledge project – a German platform for citizen science. We need to learn from people like Dr. Bonn who are studying and experimenting with the deployment of citizen research. One point she made was the importance of citizen co-design. Most projects enlist citizens in repetitive micro-tasks designed by researchers. What if the research project were designed from the beginning with citizens? What would that mean? How would that work?
Mike Bulajewski has written an excellent critique of The Cult of Sharing. He describes the way ideas of community and sharing are being exploited by a new type of cult-like company, such as Airbnb and Uber. Under the guise of sharing and building community these companies are bypassing employment and labor legislation. What’s worse is that they are painting basic labor rights as the outdated way of doing things.
That’s because they’ve adopted a kind of cultural critique of capitalism. For them, the problem with capitalism is not the system itself, but rather depraved contemporary Western culture, which is greedy, individualistic, selfish and acquisitive, and rewards greedy, corrupt, ill-intentioned individuals. The opponents of the so-called culture of greed see the behavior of Black Friday shoppers and Wall Street bankers as equal manifestations of the same general phenomenon, and perhaps believing that we get the leaders we deserve, conclude that the public’s moral flaws makes them in some way responsible for the greed of Wall Street.
The sharing economy is clearly not the kind of economy where wealth and prosperity is shared between rich and poor. On the contrary, it worsens income inequality and concentrates wealth in the hands of those who need it the least. Progressive advocates are well aware of this, but they also see an upside: these startups teach their workers moral lessons about sharing, community, giving and service with a smile.
I’m not sure this is going to be the problem Bulajewski thinks it will be, but he has me worried. I hope that the shine of sharing will wear off and consumers/sharers will begin to treat this as any other industry. I also think the media will soon start reporting the downside of staying on someone’s couch or getting a ride with someone who isn’t licensed. It’s like the internet, which we all thought was a nice sharing community, until it wasn’t.
The Game of Writing (GWrit) project that I am part of just got support through a University of Alberta Blended Learning Award. See the 2014 Selected Courses. This award is going towards creating a flipped version of Writing 101, a service course that is being scaled up to support large sections by Roger Graves and Heather Graves. With the Blended Learning Award support from the Centre for Teaching and Learning and with Faculty of Arts funding we are redeveloping GWrit to be used in large sections of Writing 101. Here is part of the abstract of the proposal:
Research suggests that by creating a rich online environment for students to connect and interact with instructors and peers they can improve as writers. We are currently building a gamified online writing environment, The Game of Writing (GWrit), for Writing Studies 101 (WRS 101) that can support student writers and alumni. WRS 101 is a high demand service course required for many degree programs across the University. We are creating a large class version that blends face-to-face with gamification strategies. In GWrit students will choose and work on assignments or quests that are part of the course. Their progress on these assignments or quests will be shared with peers and instructional staff; in this way all students can see who is working on the same quests, and they can ask for help or advice from them. Informal assessment will be available online from peers in the class; from paid peer tutors; from GTAs; and from alumni. This represents a significant expansion of the informal assessment available in traditional face-to-face courses, where peers and sometimes the instructor give informal feedback. We also intend to invite alumni to post assignments/quests that come from a workplace writing context. Students who complete WRS 101 will continue to have access to GWrit throughout their undergraduate careers and as alumni.
GWrit started as a prototype developed with support from GRAND. The original idea was an open writing environment where folks could challenge each other to compete at writing and where you could get analytics on your writing (number of words written, tasks completed, and visualizations like word clouds). This research prototype is now being completely redeveloped by the Arts Resource Centre as a learning tool that can be used by students of our courses. We are adding commenting features so that students (and later alumni) can provide writing guidance in a structured fashion.
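To give a sense of what the simplest writing analytics involve, here is a minimal sketch that counts words and finds the most frequent content words, which is the kind of data a word cloud is drawn from. It is an illustration only, not GWrit code. The function name, the tiny stopword list, and the sample text are assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def writing_stats(text, top_n=3):
    """Return the total word count and the most frequent non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    content_words = [w for w in words if w not in STOPWORDS]
    return {
        "word_count": len(words),
        "top_words": Counter(content_words).most_common(top_n),
    }


sample = "The game of writing is a game. Writing is the quest."
stats = writing_stats(sample, top_n=2)
```

The frequencies in `top_words` could be passed on to whatever visualization is used to set word sizes.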
From Twitter I learned about this list of cultural non-profit crowdsourcing projects. I’m amazed there are so few, though part of the reason is probably that many projects are not listed or failed early on.
The New York Public Library has another cool digital project called the Building Inspector. They are crowdsourcing the training and correction of a building recognition tool that is combing through old maps. You see a portion of a map with red dots outlining a building and you click “Yes” (if the outline is correct), “No” (if it is wrong), or “Fix” (if it is close, but needs to be fixed).
They also have a neat subtitle to the project, “Kill Time. Make History.”
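Yes/No/Fix clicks like these have to be combined somehow before an outline can be accepted. One common approach is to wait for a minimum number of votes and then require a clear majority. The sketch below assumes that approach. The thresholds and the function name are hypothetical, and I don't know how the NYPL actually aggregates its votes.

```python
from collections import Counter


def consensus(votes, min_votes=3, min_share=0.75):
    """Return the agreed answer for one building outline, or None if undecided."""
    if len(votes) < min_votes:
        return None  # not enough people have looked at this outline yet
    answer, count = Counter(votes).most_common(1)[0]
    # Accept the top answer only if enough of the voters agree on it.
    return answer if count / len(votes) >= min_share else None


agreed = consensus(["yes", "yes", "yes", "no"])
split = consensus(["yes", "no", "fix"])
too_few = consensus(["yes", "yes"])
```

Outlines that come back as `None` would simply stay in the queue to be shown to more volunteers.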