The Lives of Literary Characters

The goal of this project is to generate knowledge about the behaviour of literary characters at large scale and make this data openly available to the public. Characters are the scaffolding of great storytelling. This Zooniverse project will allow us to crowdsource data to train AI models to better understand who characters are and what they do within diverse narrative worlds to answer one very big question: why do human beings tell stories?

Today we are going live on Zooniverse with our Citizen Science (crowdsourcing) project, The Lives of Literary Characters. The goal of the project is to offer micro-tasks that allow volunteers to annotate literary passages, which in turn gives us training data. It will be interesting to see if we get a decent number of volunteers.

Before setting this up we did some serious reading around the ethics of crowdsourcing as we didn’t want to just exploit readers.

 

A Bored Chinese Housewife Spent Years Falsifying Russian History on Wikipedia

She “single-handedly invented a new way to undermine Wikipedia,” says a Wikipedian.

From Vice, a rather funny story about how A Bored Chinese Housewife Spent Years Falsifying Russian History on Wikipedia. User Zhemao wrote hundreds of linked articles in the Chinese version of Wikipedia about fictional events, peoples and places in Russian history. Only recently did someone notice. It shows a vulnerability of such crowdsourced resources: a fabulist can create a network of consistent fictions that, by supporting each other, look true.

GameStop, AMC and the Stock Market’s Wild Ride This Week

GameStop Stock Price from Monday to Friday

Here’s what happened when investors using apps like Robinhood began wagering on a pool of unremarkable stocks.

We’ve all been following the story about GameStop, AMC and the Stock Market’s Wild Ride This Week. The story has a nice David and Goliath side where amateur traders stick it to the big Wall Street bullies, but it is also about the random power of internet-enabled crowds.

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labelled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.

Another one of those “what were they thinking when they created the dataset?” stories, this one from The Register, tells how MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. The MIT Tiny Images dataset was created automatically using scripts that drew on the WordNet database of terms, which itself held derogatory terms. Nobody thought to check either the terms taken from WordNet or the resulting images scoured from the net. As a result there are not only lots of images for which permission was not secured, but also racist, sexist, and otherwise derogatory labels on the images, which in turn means that if you train an AI on these it will generate racist/sexist results.

The article also mentions a general problem with academic datasets. Companies like Facebook can afford to hire actors to pose for images and can thus secure permissions to use the images for training. Academic datasets (and some commercial ones like the Clearview AI database) tend to be scraped and therefore will not have the explicit permission of the copyright holders or people shown. In effect, academics are resorting to mass surveillance to generate training sets. One wonders if we could crowdsource a training set by and for people?

Digital Synergies Launch Event


Today I gave a short talk at the Digital Synergies Launch Event. The launch included neat talks by colleagues including:

I showed and talked about Lexigraphi.ca – The Dictionary of Words in the Wild. This is a social site where people can upload pictures of text outside of books and documents and tag the words – text like tattoos, graffiti, store signs and other forms of public textuality.

Canadian Social Knowledge Institute

I just got an email announcing the soft launch of the Canadian Social Knowledge Institute (C-SKI). This institute grew out of the Electronic Textual Cultures Lab and the INKE project. Part of C-SKI is an Open Scholarship Policy Observatory which has a number of partners through INKE.

The Canadian Social Knowledge Institute (C-SKI) actively engages issues related to networked open social scholarship: creating and disseminating research and research technologies in ways that are accessible and significant to a broad audience that includes specialists and active non-specialists. Representing, coordinating, and supporting the work of the Implementing New Knowledge Environments (INKE) Partnership, C-SKI activities include awareness raising, knowledge mobilization, training, public engagement, scholarly communication, and pertinent research and development on local, national, and international levels. Originated in 2015, C-SKI is located in the Electronic Textual Cultures Lab in the Digital Scholarship Centre at UVic.

Science 2.0 and Citizen Research

This week I attended the second Science 2.0 conference held in Hamburg, Germany. (You can see my research notes here.) The conference dealt with issues around open access, open data, citizen science, and network enabled science. I was one of two Canadian digital humanists presenting. Matthew Hiebert from the University of Victoria talked about the social edition and work from the Electronic Textual Cultures Lab and Iter. It should be noted that in Europe the word “science” is more inclusive and can include the humanities. This conference wasn’t just about how open data and crowdsourcing could help the natural sciences – it was about how research across the disciplines could be supported with virtual labs and infrastructure.

I gave a paper on “New Publics for the Humanities” that started by noting that the humanities no longer engage the public. The social contract with the public that supports us has been neglected. I worry that if the university is disaggregated and the humanities unbundled from the other faculties (the way newspapers have been hit by the internet and the unbundling of services) then people will stop paying for the humanities and much of the research we do. We will end up with cheaper, research-poor colleges that provide lots of higher education without the research, or climbing walls. Only in the elite private universities will the humanities survive, and in those they will survive as a marker of class status. You will be able to study ancient languages at elite schools because any degree from an elite school is a good degree.

Of course, the humanities will survive outside the university, and may become healthier with the downsizing of the professional (or professorial) humanities, but we run the danger of unthinkingly losing a long tradition of thinking critically and ethically. An irony to be sure – losing thinking traditions through the lack of public reflection on the consequences of disruptive change.

Drawing on Greg Crane, I then argued that citizen research (forms of crowdsourcing) can re-engage the publics we need to support us and reflect with us. Citizen research can provide an alternative way of structuring research in anticipation of defunding of the humanities research function. I illustrated my point by showing a number of examples of humanities crowdsourcing projects from the OED (pre-computer volunteer research) to the Dictionary of Words in the Wild. If I can find the time I will write up the argument to see where it goes.

My talk was followed by a thorough one on citizen science in environmental studies by Professor Aletta Bonn of the Citizens create knowledge project – a German platform for citizen science. We need to learn from people like Dr. Bonn who are studying and experimenting with the deployment of citizen research. One point she made was the importance of citizen co-design. Most projects enlist citizens in repetitive micro-tasks designed by researchers. What if the research project were designed from the beginning with citizens? What would that mean? How would that work?

The Cult of Sharing

Mike Bulajewski has written an excellent critique of The Cult of Sharing. He describes the way ideas of community and sharing are being exploited by a new type of cult-like company like Airbnb and Uber. Under the guise of sharing and building community these companies are bypassing employment and labor legislation. What’s worse is that they are painting basic labor rights as the outdated way of doing things.

That’s because they’ve adopted a kind of cultural critique of capitalism. For them, the problem with capitalism is not the system itself, but rather depraved contemporary Western culture, which is greedy, individualistic, selfish and acquisitive, and rewards greedy, corrupt, ill-intentioned individuals. The opponents of the so-called culture of greed see the behavior of Black Friday shoppers and Wall Street bankers as equal manifestations of the same general phenomenon, and perhaps believing that we get the leaders we deserve, conclude that the public’s moral flaws makes them in some way responsible for the greed of Wall Street.

The sharing economy is clearly not the kind of economy where wealth and prosperity is shared between rich and poor. On the contrary, it worsens income inequality and concentrates wealth in the hands of those who need it the least. Progressive advocates are well aware of this, but they also see an upside: these startups teach their workers moral lessons about sharing, community, giving and service with a smile.

I’m not sure this is going to be the problem Bulajewski thinks it will be, but he has me worried. I hope that the shine of sharing will wear off and consumers/sharers will begin to treat this as any other industry. I also think the media will soon start reporting the downside of staying on someone’s couch or getting a ride with someone who isn’t licensed. It’s like the internet, which we all thought was a nice sharing community, until it wasn’t.