Introducing the AI4Society Signature Area

AI4Society will provide institutional leadership in this exciting area of teaching, research, and scholarship.

The Quad has a story, Introducing the AI4Society Signature Area. Artificial Intelligence for Society is a University of Alberta Signature Area that brings together researchers and instructors from both the sciences and the arts. AI4S looks at how AI can be imagined, designed, and tested so that it serves society. I’m lucky to contribute to this Area as the Associate Director, working with the Director, Eleni Stroulia from Computing Science.

Knowledge is a commons – Pour des savoirs en commun

The Canadian Comparative Literature Association (CCLA/ACLC) celebrated its fiftieth anniversary in 2019. The association’s annual conference, which took place from June 2 to 5, 2019 as part of the Congress of the Humanities and Social Sciences of Canada at UBC (Vancouver), provided an opportunity to reflect on the place of comparative literature in our institutions. We organized a joint bilingual roundtable bringing together comparatists and digital humanists who think about and put in place collaborative editorial practices. Our goal was to foster connections between two communities that ask similar questions about the modalities for the creation, dissemination and legitimation of our research. We wanted our discussions to result in a concrete intervention, conceived and written collaboratively and demonstrating what comparative literature promotes. The manifesto you will read, “Knowledge is a commons – Pour des savoirs en commun”, presents the outcome of our collective reflection and hopes to be the point of departure for more collaborative work.

Thanks to a panel on the Journal in the digital age at CSDH-SCHN 2020 I learned about the manifesto, Knowledge is a commons – Pour des savoirs en commun. The manifesto was “written colingually, that is, alternating between English and French without translating each element into both languages. This choice, which might appear surprising, puts into practice one of our core ideas: the promotion of active and fluid multilingualism.” This is important.

The manifesto makes a number of important points, which I summarize here in my own words:

  • We need to make sure that knowledge is truly made public. It should be transparent, open and reversible (read/write).
  • We have to pay attention to the entire knowledge chain, from research to publication, and rebuild it in its entirety so as to promote access and inclusion.
  • The temporalities, spaces, and formats of knowledge-making matter. Our tools and forms, like our thought, should be fluid and plural, since they can structure our thinking.
  • We should value the collectives that support knowledge-making rather than just authoritative individuals and monolithic texts. We should recognize the invisible labourers and those who provide support and infrastructure.
  • We need to develop inclusive circles of conversation that cross boundaries. We need an ethics of open engagement.
  • We should move towards an active and fluid multilingualism (of which the manifesto is an example).
  • Writing is co-writing and re-writing and writing beyond words. Let’s recognize a plurality of writing practices.


“Excellence R Us”: university research and the fetishisation of excellence

Is “excellence” really the most efficient metric for distributing the resources available to the world’s scientists, teachers, and scholars? Does “excellence” live up to the expectations that academic communities place upon it? Is “excellence” excellent? And are we being excellent to each other in using it?

During the panel today on Journals in the digital age: penser de nouveaux modèles de publication en sciences humaines at CSDH-SCHN 2020 someone linked to an essay in Palgrave Communications (2017), “Excellence R Us”: university research and the fetishisation of excellence. The essay does what should have been done some time ago: it questions the excellence of “excellence” as a value for everything in universities. The very overuse of “excellence” has devalued the concept. Surely much of what we do these days is “good enough”, especially as our budgets are cut and cut.

The article has three major parts:

  • Rhetoric of excellence – the first section looks at how there is little consensus about what counts as excellence across disciplines. Within disciplines it is negotiated and can become conservative.
  • Is “excellence” good for research – the second section argues that there is little correlation between forms of excellence review and long-term metrics. They go on to outline some of the unfortunate side-effects of the push for excellence, like how it can distort research and funding by promoting competition rather than collaboration. They also talk about how excellence disincentivizes replication – who wants to bother with replication if it will never count as “excellent”?
  • Alternative narratives – the third section looks at alternative ways of distributing funding. They discuss looking at “soundness” and “capacity” as alternatives to the winner-takes-all logic of excellence.

So much more could and should be addressed on this subject. I have often wondered about the effect of success rates in grant programmes (the percentage of applicants funded). When the success rate gets really low, as it is with many NEH programmes, it almost becomes a waste of time to apply, and superstitions about success abound. SSHRC has healthier success rates that generally ensure that most researchers get funded if they persist and rework their proposals.

Hypercompetition in turn leads to greater (we might even say more shameless …) attempts to perform this “excellence”, driving a circular conservatism and reification of existing power structures while harming rather than improving the qualities of the underlying activity.

Ultimately the “adjunctification” of the university, where few faculty get tenure, also leads to hypercompetition and an impoverished research environment. Getting tenure could end up being the most prestigious (and fundamental) of grants – the grant of a research career.


Google Developers Blog: Text Embedding Models Contain Bias. Here’s Why That Matters.

Human data encodes human biases by default. Being aware of this is a good start, and the conversation around how to handle it is ongoing. At Google, we are actively researching unintended bias analysis and mitigation strategies because we are committed to making products that work well for everyone. In this post, we’ll examine a few text embedding models, suggest some tools for evaluating certain forms of bias, and discuss how these issues matter when building applications.

On the Google Developers Blog there is an interesting post on Text Embedding Models Contain Bias. Here’s Why That Matters. The post describes a technique, the Word Embedding Association Test (WEAT), for comparing different text embedding models. The idea is to see whether groups of words, like gendered words, associate more with positive or negative words. In the image above you can see the sentiment bias for female and male names under different techniques.

While Google is working with WEAT to try to detect and deal with bias in their models, in our case this technique could also be used to identify forms of bias in corpora.
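
To make the idea concrete, here is a minimal sketch of a WEAT-style effect size. This is not Google’s code: the random vectors are placeholders standing in for a real embedding model (word2vec, GloVe, etc.), and the word lists are illustrative rather than the published WEAT lists.

```python
import numpy as np

# Minimal WEAT-style effect size (after Caliskan et al. 2017).
# NOTE: the random vectors are placeholders for a real embedding model;
# the word lists below are illustrative, not the published WEAT lists.
rng = np.random.default_rng(42)
vectors = {}

def embed(word):
    """Look up (or, here, fabricate) an embedding vector for a word."""
    if word not in vectors:
        vectors[word] = rng.normal(size=50)
    return vectors[word]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(word, attr_a, attr_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    sim_a = np.mean([cosine(embed(word), embed(a)) for a in attr_a])
    sim_b = np.mean([cosine(embed(word), embed(b)) for b in attr_b])
    return sim_a - sim_b

def weat_effect_size(targets_x, targets_y, attr_a, attr_b):
    """How differently two target sets (e.g. female vs. male names)
    associate with two attribute sets (e.g. pleasant vs. unpleasant words)."""
    assoc_x = [association(w, attr_a, attr_b) for w in targets_x]
    assoc_y = [association(w, attr_a, attr_b) for w in targets_y]
    pooled_std = np.std(assoc_x + assoc_y, ddof=1)
    return (np.mean(assoc_x) - np.mean(assoc_y)) / pooled_std

female_names = ["amy", "joan", "lisa", "sarah"]
male_names = ["john", "paul", "mike", "kevin"]
pleasant = ["joy", "love", "peace", "wonderful"]
unpleasant = ["agony", "terrible", "awful", "failure"]

# An effect size near zero means little differential association;
# large positive or negative values suggest bias in the embeddings.
print(weat_effect_size(female_names, male_names, pleasant, unpleasant))
```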

What Mutual Aid Can Do During a Pandemic

A radical practice is suddenly getting mainstream attention. Will it change how we help one another?

The most recent New Yorker (to make it to my house) has an important article on What Mutual Aid Can Do During a Pandemic. The article looks at a number of the mutual aid groups popping up to meet local needs, like delivering food to disabled people. It is particularly interesting on the long-term political impact of this sort of local organizing. Well worth thinking about.

DARIAH Virtual Exchange Event

This morning at 7am I was up participating in a DARIAH VX (Virtual Exchange) on the subject of The Scholarly Primitives of Scholarly Meetings. This virtual seminar was set up when DARIAH’s f2f (face-to-face) meeting was postponed. The VX was to my mind a great example of an intentionally designed virtual event. Jennifer Edmond and colleagues put together an event meant to be both about and an example of a virtual seminar.

One feature they used was to have us all split into smaller breakout rooms. I was in one on The Academic Footprint: Sustainable methods for knowledge exchange. I presented on Academic Footprint: Moving Ideas Not People, which discussed our experience with the Around the World Econferences. I shared some of the advice from the Quick Guide I wrote on Organizing a Conference Online:

  • Recognize the status conferred by travel
  • Be explicit about blocking out the time to concentrate on the econference
  • Develop alternatives to informal networking
  • Gather locally or regionally
  • Don’t mimic F2F conferences (change the pace, timing, and presentation format)
  • Be intentional about objectives of conference – don’t try to do everything
  • Budget for management and technology support

For those interested we have a book coming out from Open Book Publishers with the title Right Research that collects essays on sustainable research. We have put up preprints of two of the essays that deal with econferences:

The organizers had the following concept and questions for our breakout group.

Session Concept: Academic travel is an expense not only to the institutions and grant budgets, but also to the environment. There have been moves towards open-access, virtual conferences and near carbon-neutral events. How can academics work towards creating a more sustainable environment for research activities?

Questions: (1) How can academics work towards creating a more sustainable environment for research activities? (2) What are the barriers or limitations to publishing in open-access journals and how can we overcome these? (3) What environmental waste does your research produce? Hundreds of pages of printed drafts? Jet fuel pollution from frequent travel? Electricity from powering huge servers of data?

The breakout discussion went very well. In fact I would have liked more breakout discussion and less introduction, though that was good too.

Another neat feature they had was a short introduction (with a Prezi available) followed by an interview in front of us all. The interview format gave a liveliness to the proceedings.

Lastly, I was impressed by the supporting materials they had to allow the discussion to continue. This included the DARIAH Virtual Exchange Event – Exhibition Space for the Scholarly Primitives of Scholarly Meetings.

All told, Dr. Edmond and her DARIAH colleagues have put together a great exemplar both about and of a virtual seminar. Stay tuned for when they share more.

The Viral Virus

[Figure: relative frequency of the word “test*” over time]

Analyzing the Twitter Conversation Surrounding COVID-19

From Twitter I found out about this excellent visual essay, The Viral Virus, by Kate Appel from May 6, 2020. Appel used Voyant to study highly retweeted tweets from January 20th to April 23rd. She divided the tweets into weeks and then used the distinctive words (tf-idf) tool to tell a story about the changing discussion of Covid-19. As you scroll down you see lists of distinctive words and supporting images. At the end she shows some of the topics gained from topic modelling. It is a remarkably simple but effective use of Voyant.
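
For readers who want to see what the “distinctive words” approach amounts to outside of Voyant, here is a small sketch of the same idea: treat each week’s tweets as one document and let tf-idf surface the terms that set that week apart. The weekly snippets below are made up for illustration; they are not Appel’s data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical weekly buckets of tweet text; in the essay these would be
# the highly retweeted COVID-19 tweets grouped by week.
weeks = {
    "week of Jan 20": "wuhan outbreak pneumonia cases reported china travel",
    "week of Mar 16": "lockdown quarantine stay home flatten the curve distancing",
    "week of Apr 13": "testing tests ventilators masks shortage reopening hospitals",
}

# Treating each week as one "document" lets tf-idf pull out the words that
# distinguish that week from the rest of the period.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(weeks.values()).toarray()
terms = np.array(vectorizer.get_feature_names_out())

for label, row in zip(weeks.keys(), matrix):
    top = terms[row.argsort()[::-1][:3]]  # three most distinctive words
    print(f"{label}: {', '.join(top)}")
```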

COVID-19 contact tracing reveals ethical tradeoffs between public health and privacy

Michael Brown has written a nice article in the U of Alberta folio on COVID-19 contact tracing reveals ethical tradeoffs between public health and privacy. The article quotes me extensively on the subject of the ethics of these new Bluetooth contact tracing tools. In the interview I tried to emphasize the importance of knowledge and consent:

  • Users of these apps should know that they are being traced through them, and
  • Users should consent to their use.

There are a variety of these apps, from the system pioneered by Singapore called TraceTogether to its Alberta cousin ABTraceTogether. There are also a variety of approaches to tracing people, from using credit card records to apps like TraceTogether. The EFF has a good essay on Protecting Civil Rights During a Public Health Crisis that I adapt here to provide guidelines for when one might gather data without knowledge or consent:

  • Medically necessary: There should be a clear and compelling explanation as to how this will save lives.
  • Personal information proportionate to need: The information gathered should fit the need and go no further.
  • Information handled by health informatics specialists: The gathering and processing should be handled by health informatics units, not signals intelligence or security services.
  • Deleted: It should be deleted once it is no longer needed.
  • Not organized by vulnerable demographics: The information should not be binned according to stereotypical or vulnerable demographics unless there is a compelling need. We should be very careful that we don’t use the data to further disadvantage groups.
  • Use reviewed afterwards: There should be a review after the crisis is over.
  • Transparency: Governments should be transparent about what they are gathering and why.
  • Due process: There should be open processes for people to challenge the gathering of their information or to challenge decisions taken as a result of such information.

Locative Gaming in the time of COVID-19

Jessie Marchessault at Concordia has a nice essay on the TAG site on Locative Gaming in the time of COVID-19. I hadn’t thought about how Niantic would respond to Covid-19 by changing their locative games until I saw a small group obviously still playing in a park the other day. As Marchessault points out, the community and Niantic have adapted. Niantic has found ways to make the game playable at home, but they have also done it in a way that increases revenue.

It would be interesting to see if they could include Bluetooth proximity services that might tell you when you are getting too close to other players.

260,000 Words, Full of Self-Praise, From Trump on the Virus

The New York Times has a nice content analysis study of Trump’s Coronavirus briefings, 260,000 Words, Full of Self-Praise, From Trump on the Virus. They tagged the corpus for different types of utterances including:

  • Self-congratulations
  • Exaggerations and falsehoods
  • Displays of empathy or appeals to national unity
  • Blaming others
  • Crediting others

Needless to say they found he spent a fair amount of time congratulating himself.

They then created a neat visualization with colour-coded sections showing where he shows empathy or congratulates himself.

According to the article they looked at 42 briefings and other remarks from March 9 to April 17, 2020, giving them a total of 260,000 words.

I decided to replicate their study with Voyant and gathered 29 Coronavirus Task Force Briefings (and one press conference) from February 29 to April 17. These are all the Task Force Briefings I could find on the White House web site. The corpus has 418,775 words, but those include remarks by people other than Trump, questions, and metadata.

One of the things that struck me is the absence of medical terminology among the high-frequency words. I was also intrigued by the prominence of “going to”. Trump spends a fair amount of time talking about what he and others are going to be doing rather than what has been done. Here you have a Contexts panel from Voyant.
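
For anyone who wants to pull contexts the way Voyant’s Contexts panel does, here is a minimal keyword-in-context sketch. The file name transcript.txt, the phrase, and the window size are illustrative assumptions, not part of the original study.

```python
import re

# Minimal keyword-in-context (KWIC) sketch, similar in spirit to Voyant's
# Contexts panel. "transcript.txt" is a hypothetical file holding the
# briefing transcripts; the phrase and window size are illustrative.
def contexts(text, phrase="going to", window=5):
    tokens = re.findall(r"[\w']+", text.lower())
    target = phrase.lower().split()
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + len(target):i + len(target) + window])
            hits.append(f"{left} [{phrase}] {right}")
    return hits

with open("transcript.txt", encoding="utf-8") as f:
    for line in contexts(f.read())[:10]:  # show the first ten hits
        print(line)
```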