Internet Archive closes the National Emergency Library

Within a few days of the announcement that libraries, schools and colleges across the nation would be closing due to the COVID-19 global pandemic, we launched the temporary National Emergency Library to provide books to support emergency remote teaching, research activities, independent scholarship, and intellectual stimulation during the closures.  […]

According to the Internet Archive blog, the Temporary National Emergency Library will close two weeks early, returning to traditional controlled digital lending. The National Emergency Library (NEL) was open to anyone in the world during a time when physical libraries were closed. It made books the IA had digitized available to read online. It was supposed to close at the end of June, but it is closing early because four commercial publishers decided to sue.

The blog entry points to what HathiTrust is doing as part of its Emergency Temporary Access Service, which lets member libraries (the U of Alberta Library is one) provide access to digital copies of books for which they hold corresponding physical copies. This is only available to “member libraries that have experienced unexpected or involuntary, temporary disruption to normal operations, requiring it to be closed to the public”.

It is a pity the IA's NEL was discontinued; for a moment there it looked like large public-service digital libraries might become normal. Instead, it looks like we will have a mix of commercial e-book services and Controlled Digital Lending (CDL) offered by libraries that have the physical books and the digital resources to organize it. The IA blog entry goes on to note that even CDL is under attack. Here is a story from Plagiarism Today:

Though the National Emergency Library may have been what provoked the lawsuit, the complaint itself is much broader. Ultimately, it targets the entirety of the IA’s digital lending practices, including the scanning of physical books to create digital books to lend.

The IA has long held that its practices are covered under the concept of controlled digital lending (CDL). However, as the complaint notes, the idea has not been codified by a court and is, at best, very controversial. According to the complaint, the practice of scanning a physical book for digital lending, even when the number of copies is controlled, is an infringement.

Obscure Indian cyber firm spied on politicians, investors worldwide

A cache of data reviewed by Reuters provides insight into the operation, detailing tens of thousands of malicious messages designed to trick victims into giving up their passwords that were sent by BellTroX between 2013 and 2020.

It was bound to happen. Reuters has an important story that an Obscure Indian cyber firm spied on politicians, investors worldwide. The firm, BellTroX InfoTech Services, offered hacking services to private investigators and others. While we focus on state-sponsored hacking and misinformation, there is a whole murky world of commercial hacking going on.

The Citizen Lab played a role in uncovering what BellTroX was doing. They have a report here about Dark Basin, a hacking-for-hire outfit, that they link to BellTroX. The report is well worth the read as it details the infrastructure uncovered, the types of attacks, and the consequences.

The growth of a hack-for-hire industry may be fueled by the increasing normalization of other forms of commercialized cyber offensive activity, from digital surveillance to “hacking back,” whether marketed to private individuals, governments or the private sector. Further, the growth of private intelligence firms, and the ubiquity of technology, may also be fueling an increasing demand for the types of services offered by BellTroX. At the same time, the growth of the private investigations industry may be contributing to making such cyber services more widely available and perceived as acceptable.

They conclude that the growth of this industry is a threat to civil society.

What if it became so affordable and normalized that any unscrupulous person could hire hackers to harass an ex-girlfriend or neighbour?

Introducing the AI4Society Signature Area

AI4Society will provide institutional leadership in this exciting area of teaching, research, and scholarship.

The Quad has a story Introducing the AI4Society Signature Area. Artificial Intelligence for Society is a University of Alberta Signature Area that brings together researchers and instructors from both the sciences and the arts. AI4S looks at how AI can be imagined, designed, and tested so that it serves society. I’m lucky to contribute to this Area as the Associate Director, working with the Director, Eleni Stroulia from Computing Science.

Knowledge is a commons – Pour des savoirs en commun

The Canadian Comparative Literature Association (CCLA/ACLC) celebrated in 2019 its fiftieth anniversary. The association’s annual conference, which took place from June 2 to 5, 2019 as part of the Congress of the Humanities and Social Sciences of Canada at UBC (Vancouver), provided an opportunity to reflect on the place of comparative literature in our institutions. We organized a joint bilingual roundtable bringing together comparatists and digital humanists who think and put in place collaborative editorial practices. Our goal was to foster connections between two communities that ask similar questions about the modalities for the creation, dissemination and legitimation of our research. We wanted our discussions to result in a concrete intervention, thought and written collaboratively and demonstrating what comparative literature promotes. The manifesto you will read, “Knowledge is a commons – Pour des savoirs en commun”, presents the outcome of our collective reflexion and hopes to be the point of departure for more collaborative work.

Thanks to a panel on the Journal in the digital age at CSDH-SCHN 2020 I learned about the manifesto, Knowledge is a commons – Pour des savoirs en commun. The manifesto was “written colingually, that is, alternating between English and French without translating each element into both languages. This choice, which might appear surprising, puts into practice one of our core ideas: the promotion of active and fluid multilingualism.” This is important.

The manifesto makes a number of important points which I summarize in my words:

  • We need to make sure that knowledge is truly made public. It should be transparent, open and reversible (read/write).
  • We have to pay attention to the entire knowledge chain, from research to publication, and rebuild it in its entirety so as to promote access and inclusion.
  • The temporalities, spaces, and formats of knowledge making matter. Our tools and forms, like our thought, should be fluid and plural, as they can structure our thinking.
  • We should value the collectives that support knowledge-making rather than just authoritative individuals and monolithic texts. We should recognize the invisible labourers and those who provide support and infrastructure.
  • We need to develop inclusive circles of conversation that cross boundaries. We need an ethics of open engagement.
  • We should move towards an active and fluid multilingualism (of which the manifesto is an example).
  • Writing is co-writing and re-writing and writing beyond words. Let’s recognize a plurality of writing practices.

“Excellence R Us”: university research and the fetishisation of excellence

Is “excellence” really the most efficient metric for distributing the resources available to the world’s scientists, teachers, and scholars? Does “excellence” live up to the expectations that academic communities place upon it? Is “excellence” excellent? And are we being excellent to each other in using it?

During the panel today on Journals in the digital age: penser de nouveaux modèles de publication en sciences humaines at CSDH-SCHN 2020, someone linked to an essay, “Excellence R Us”: university research and the fetishisation of excellence, in Palgrave Communications (2017). The essay does what should have been done some time ago: it questions the excellence of “excellence” as a value for everything in universities. The very overuse of “excellence” has devalued the concept. Surely much of what we do these days is “good enough”, especially as our budgets are cut and cut.

The article has three major parts:

  • Rhetoric of excellence – it looks at how there is little consensus about what excellence means across disciplines. Within disciplines, it is negotiated and can become conservative.
  • Is “excellence” good for research – the second section argues that there is little correlation between forms of excellence review and long-term metrics. They go on to outline some of the unfortunate side-effects of the push for excellence: how it can distort research and funding by promoting competition rather than collaboration. They also talk about how excellence disincentivizes replication – who wants to bother with replication if
  • Alternative narratives – the third section looks at alternative ways of distributing funding. They discuss looking at “soundness” and “capacity” as alternatives to the winner-takes-all of excellence.

So much more could and should be addressed on this subject. I have often wondered about the effect of the success rates in grant programmes (the percentage of applicants funded). When the success rate gets really low, as it is with many NEH programmes, it almost becomes a waste of time to apply, and superstitions about success abound. SSHRC has healthier success rates that generally ensure that most researchers get funded if they persist and rework their proposals.
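The persistence point can be made concrete with a back-of-the-envelope calculation. This sketch assumes, unrealistically, that each application is an independent try at a fixed success rate p. Under that assumption the chance of at least one award in n tries is 1 − (1 − p)^n. The rates used below are hypothetical and are not the actual figures for any NEH or SSHRC programme.

```python
# Illustrative only: assumes each application is an independent draw with a
# fixed success rate p, which real grant competitions do not guarantee.
def chance_of_funding(p: float, attempts: int) -> float:
    """Probability of at least one award in `attempts` tries at rate p."""
    return 1 - (1 - p) ** attempts


# Hypothetical success rates, not figures for any real programme.
for p in (0.10, 0.40):
    for n in (1, 3, 5):
        print(f"rate {p:.0%}, {n} attempt(s): {chance_of_funding(p, n):.0%}")
```

Under these assumptions, a 10% rate gives about a 41% chance after five tries, while a 40% rate gives about 92%. That gap separates a lottery from a programme that rewards persistence.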

Hypercompetition in turn leads to greater (we might even say more shameless …) attempts to perform this “excellence”, driving a circular conservatism and reification of existing power structures while harming rather than improving the qualities of the underlying activity.

Ultimately the “adjunctification” of the university, where few faculty get tenure, also leads to hypercompetition and an impoverished research environment. Getting tenure could end up being the most prestigious (and fundamental) of grants – the grant of a research career.

Google Developers Blog: Text Embedding Models Contain Bias. Here’s Why That Matters.

Human data encodes human biases by default. Being aware of this is a good start, and the conversation around how to handle it is ongoing. At Google, we are actively researching unintended bias analysis and mitigation strategies because we are committed to making products that work well for everyone. In this post, we’ll examine a few text embedding models, suggest some tools for evaluating certain forms of bias, and discuss how these issues matter when building applications.

On the Google Developers Blog there is an interesting post, Text Embedding Models Contain Bias. Here’s Why That Matters. The post talks about a technique for using Word Embedding Association Tests (WEAT) to compare different text embedding algorithms. The idea is to see whether groups of words, like gendered words, associate with positive or negative words. In the image above you can see the sentiment bias for female and male names for different techniques.

While Google is working on WEAT to try to detect and deal with bias, in our case this technique could be used to identify forms of bias in corpora.
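The core WEAT statistic can be sketched in a few lines. This is a minimal version of the commonly used WEAT effect size, not Google's code. Each target word gets an association score: its mean cosine similarity to one attribute set (say, pleasant words) minus its mean similarity to another (unpleasant words). The effect size compares the two target groups and scales by the standard deviation. The tiny 2-d vectors are made up for illustration. The use of the sample standard deviation is an assumption, since implementations vary.

```python
import math
from statistics import mean, stdev

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def association(w: Vector, A: list[Vector], B: list[Vector]) -> float:
    """s(w, A, B): mean similarity to attribute set A minus mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B) -> float:
    """Difference in mean association of targets X and Y, scaled by sample std dev."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical 2-d "embeddings"; real models use hundreds of dimensions.
X = [[0.9, 0.1], [0.8, 0.3]]  # target set 1 (e.g. one group of names)
Y = [[0.2, 0.9], [0.1, 0.7]]  # target set 2 (e.g. another group of names)
A = [[1.0, 0.0], [0.9, 0.2]]  # attribute set: "pleasant" words
B = [[0.0, 1.0], [0.2, 0.9]]  # attribute set: "unpleasant" words
print(round(weat_effect_size(X, Y, A, B), 2))
```

A positive result means the first target group sits closer to the "pleasant" attributes than the second does. Applied to vectors trained on a particular corpus, the same test would probe that corpus rather than a general-purpose model.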

The Viral Virus

Relative frequency of the word “test*” over time

Analyzing the Twitter Conversation Surrounding COVID-19

From Twitter I found out about this excellent visual essay on The Viral Virus by Kate Appel from May 6, 2020. Appel used Voyant to study highly retweeted tweets from January 20th to April 23rd. She divided the tweets into weeks and then used the distinctive words (tf-idf) tool to tell a story about the changing discussion about Covid-19. As you scroll down you see lists of distinctive words and supporting images. At the end she shows some of the topics generated by topic modelling. It is a remarkably simple but effective use of Voyant.
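The idea behind a weekly "distinctive words" list can be sketched with textbook tf-idf, treating each week as one document. This is an illustration, not Voyant's implementation, whose exact weighting may differ. The token lists are invented stand-ins for a week of tweets.

```python
import math
from collections import Counter


def distinctive_words(weeks: dict[str, list[str]], top_k: int = 3) -> dict[str, list[str]]:
    """Rank each week's words by tf-idf, treating each week as one document."""
    counts = {week: Counter(tokens) for week, tokens in weeks.items()}
    n_docs = len(counts)
    # Document frequency: the number of weeks in which each word appears.
    df = Counter(word for c in counts.values() for word in c)
    result = {}
    for week, c in counts.items():
        total = sum(c.values())
        # Term frequency (share of the week's tokens) times inverse document frequency.
        # A word used every week gets idf = log(1) = 0, so it is never "distinctive".
        scores = {w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()}
        # Sort by descending score, breaking ties alphabetically for stable output.
        result[week] = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    return result


# Hypothetical, hand-made token lists standing in for a week of tweets each.
weeks = {
    "week 1": "wuhan outbreak virus china virus".split(),
    "week 2": "virus test testing masks test".split(),
    "week 3": "virus lockdown stayhome lockdown".split(),
}
for week, words in distinctive_words(weeks).items():
    print(week, words)
```

Because "virus" appears every week, it never ranks as distinctive, however often it is used. That is why a tf-idf view can surface how a conversation shifts rather than what stays constant.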

COVID-19 contact tracing reveals ethical tradeoffs between public health and privacy

Michael Brown has written a nice article in the U of Alberta folio, COVID-19 contact tracing reveals ethical tradeoffs between public health and privacy. The article quotes me extensively on the ethics of these new Bluetooth contact tracing tools. In the interview I tried to emphasize the importance of knowledge and consent:

  • Users of these apps should know that they are being traced through them, and
  • Users should consent to their use.

There are a variety of these apps, from the system pioneered by Singapore called TraceTogether to its Alberta cousin ABTraceTogether. There are also a variety of approaches to tracing people, from using credit card records to apps like TraceTogether. The EFF has a good essay on Protecting Civil Rights During a Public Health Crisis that I adapt here to provide guidelines for when one might gather data without knowledge or consent:

  • Medically necessary: There should be a clear and compelling explanation as to how this will save lives.
  • Personal information proportionate to need: The information gathered should fit the need and go no further.
  • Information handled by health informatics specialists: The gathering and processing should be handled by health informatics units, not signals intelligence or security services.
  • Deleted: It should be deleted once it is no longer needed.
  • Not organized by vulnerable demographics: The information should not be binned according to stereotypical or vulnerable demographics unless there is a compelling need. We should be very careful that we don’t use the data to further disadvantage groups.
  • Use reviewed afterwards: There should be a review after the crisis is over.
  • Transparency: Governments should be transparent about what they are gathering and why.
  • Due process: There should be open processes for people to challenge the gathering of their information or to challenge decisions taken as a result of such information.

Robots Welcome to Take Over, as Pandemic Accelerates Automation – The New York Times

But labor and robotics experts say social-distancing directives, which are likely to continue in some form after the crisis subsides, could prompt more industries to accelerate their use of automation. And long-simmering worries about job losses or a broad unease about having machines control vital aspects of daily life could dissipate as society sees the benefits of restructuring workplaces in ways that minimize close human contact.

The New York Times has a story pointing out that Robots Welcome to Take Over, as Pandemic Accelerates Automation. While AI may not be that useful in making crisis decisions, robots (and the AIs that drive them) can take over certain jobs that need doing but are dangerous to humans during a pandemic. Sorting trash is one example given. Cleaning spaces is another.

We can imagine a dystopia where everything can run just fine with social (physical) distancing. Ultimately humans would only do the creative intellectual work, as imagined in E. M. Forster’s The Machine Stops (from 1909!). We would entertain each other with solitary interventions, or at least works that can be made with the artists far apart. Perhaps green-screen technology and animation will even let us act alone and be composited together into virtual crowds.

Digitization in an Emergency: Fair Use/Fair Dealing and How Libraries Are Adapting to the Pandemic

In response to unprecedented exigencies, more systemic solutions may be necessary and fully justifiable under fair use and fair dealing. This includes variants of controlled digital lending (CDL), in which books are scanned and lent in digital form, preserving the same one-to-one scarcity and time limits that would apply to lending their physical copies. Even before the new coronavirus, a growing number of libraries have implemented CDL for select physical collections.

The Association of Research Libraries has a blog entry on Digitization in an Emergency: Fair Use/Fair Dealing and How Libraries Are Adapting to the Pandemic by Ryan Clough (April 1, 2020) with good links. The closing of the physical libraries has accelerated a shift from a hybrid of physical and digital resources to an entirely digital library. Controlled digital lending (where only a limited number of patrons can read a digital asset at a time) seems a sensible way to go.
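The one-copy-one-loan rule in the passage quoted above is simple enough to sketch. This minimal Python model is an illustration under stated assumptions, not how the Internet Archive or any library actually implements CDL. Its rules are that a title can be on loan to at most as many patrons as the library owns physical copies, and that each loan lapses after a fixed period. The 14-day default is hypothetical, and `ControlledTitle` and its methods are invented names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ControlledTitle:
    """Illustrative CDL-style title: digital loans never exceed owned copies."""

    copies_owned: int  # physical copies held back from circulation
    loan_period: timedelta = timedelta(days=14)  # hypothetical loan length
    loans: dict[str, datetime] = field(default_factory=dict)  # patron -> due date

    def _expire(self, now: datetime) -> None:
        # Loans whose due date has passed are treated as returned.
        self.loans = {p: due for p, due in self.loans.items() if due > now}

    def checkout(self, patron: str, now: datetime) -> bool:
        """Lend a copy if one is free; return False when all copies are out."""
        self._expire(now)
        if patron in self.loans or len(self.loans) >= self.copies_owned:
            return False
        self.loans[patron] = now + self.loan_period
        return True

    def return_copy(self, patron: str) -> None:
        """Early return frees the copy immediately."""
        self.loans.pop(patron, None)


start = datetime(2020, 6, 1)
book = ControlledTitle(copies_owned=1)
print(book.checkout("alice", start))                     # True
print(book.checkout("bob", start))                       # False: the only copy is out
print(book.checkout("bob", start + timedelta(days=15)))  # True: alice's loan lapsed
```

The model only captures the scarcity and time limits the ARL passage describes. What made the National Emergency Library different was precisely that it lifted the copy limit.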

To be honest, I am so tired of sitting on my butt that I plan to spend much more time walking to and browsing around the library at the University of Alberta. As much as digital access is a convenience, I’m missing the occasions for getting outside and walking that a library affords. Perhaps we should think of the library as a labyrinth – something deliberately difficult to navigate in order to give you an excuse to walk around.

Perhaps I need a book scanner on a standing desk at home to keep me on my feet.