Addressing the Alarming Systems of Surveillance Built By Library Vendors

The Scholarly Publishing and Academic Resources Coalition (SPARC) is drawing attention to how we need to be Addressing the Alarming Systems of Surveillance Built By Library Vendors. This was triggered by a story in The Intercept reporting that LexisNexis is to provide a giant database of personal information to ICE:

The company’s databases offer an oceanic computerized view of a person’s existence; by consolidating records of where you’ve lived, where you’ve worked, what you’ve purchased, your debts, run-ins with the law, family members, driving history, and thousands of other types of breadcrumbs, even people particularly diligent about their privacy can be identified and tracked through this sort of digital mosaic. LexisNexis has gone even further than merely aggregating all this data: The company claims it holds 283 million distinct individual dossiers of 99.99% accuracy tied to “LexIDs,” unique identification codes that make pulling all the material collected about a person that much easier. For an undocumented immigrant in the United States, the hazard of such a database is clear. (The Intercept)

That LexisNexis has been building databases on people isn’t new. Sarah Brayne has a book about predictive policing, Predict and Surveil, in which, among other things, she describes how the LAPD uses Palantir and how police databases integrated into Palantir are enhanced by commercial databases like those sold by LexisNexis. (An essay excerpted from the book is available here: Enter the Dragnet.)

I suspect environments like Palantir make all sorts of smaller, specialized databases more commercially valuable, which is leading what were once library database providers to expand their business. Before, a database about repossessions might have interested only a specialized community. Now it can be linked to other information and becomes another dimension of the data. In particular, these databases provide information about all the people who aren’t in police databases. They provide the breadcrumbs needed to surveil those not documented elsewhere.
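
To make the linking concrete, here is a minimal sketch with entirely invented data of how a shared identifier turns two specialized databases into one broader dossier. All names, IDs, and values below are hypothetical.

```python
# A minimal sketch with invented data: once disparate records share a
# unique identifier (something like a LexID), merging them into a
# profile is a trivial join. Everything here is hypothetical.
import pandas as pd

repossessions = pd.DataFrame({
    "lex_id": [101, 102],
    "vehicle": ["2009 Civic", "2014 F-150"],
})
addresses = pd.DataFrame({
    "lex_id": [101, 103],
    "address": ["12 Elm St", "99 Oak Ave"],
})

# An outer join keeps everyone who appears in either database,
# turning two specialized datasets into one broader dossier.
profile = repossessions.merge(addresses, on="lex_id", how="outer")
print(profile)
```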

The SPARC call points out that we (academics, university libraries) have been funding these database providers. 

Dollars from library subscriptions, directly or indirectly, now support these systems of surveillance. This should be deeply concerning to the library community and to the millions of faculty and students who use their products each day and further underscores the urgency of privacy protections as library services—and research and education more generally—are now delivered primarily online.

This raises the question of our complicity and whether we could do without some of these companies. At a deeper level it raises questions about the curiosity of the academy. We are dedicated to knowledge as an unalloyed good and are at the heart of a large system of surveillance – surveillance of the past, of literature, of nature, of the cosmos, and of ourselves.

A Digital Project Handbook

A peer-reviewed, open resource filling the gap between platform-specific tutorials and disciplinary discourse in digital humanities.

From a list I am on I learned about Visualizing Objects, Places, and Spaces: A Digital Project Handbook. This is a highly modular textbook that covers many of the basics of project management in the digital humanities. They currently have a call for “case studies (research projects) and assignments that showcase archival, spatial, narrative, dimensional, and/or temporal approaches to digital pedagogy and scholarship.” The handbook is edited by Beth Fischer (Postdoctoral Fellow in Digital Humanities at the Williams College Museum of Art) and Hannah Jacobs (Digital Humanities Specialist, Wired! Lab, Duke University), but parts are authored by all sorts of people.

What I like about it is the way they have split up the modules and organized things by the type of project. They also have deadlines, which seem to mark new iterations of materials and the completion of different parts. This could prove to be a great resource for teaching project management.

An Anecdoted Topography of Chance

Following a rambling conversation with his friend Robert Filliou, Daniel Spoerri one day mapped the objects lying at random on the table in his room, adding a rigorously scientific description of each. These objects subsequently evoked associations, memories and anecdotes from both the original author and his friends …

I recently bought a copy of Spoerri and friends’ artist’s book, An Anecdoted Topography of Chance. The first edition dates from 1966, but it was based on a version that passed as the catalogue for an exhibition by Spoerri in 1962. This 2016 version has a footnote to the title (in the lower right of the cover) that reads,

* Probably definitive re-anecdoted version

The work is essentially a collection of annotations to a map of the dishes and other things that were on the sideboard in Spoerri’s apartment. You start with the map, which looks like an archaeological diagram, and follow anecdotes about the items, which are, in turn, commented on by the other authors. Hypertext before hypertext.

While the work seems to have been driven by the chance items on the small table, there is also an autobiographical element where these items give the authors excuses to tell about their intersecting lives.

I wonder if this would be an example of a work of the art of information.

TEXT-MODE: Tumblr about text art

“A dude”, 1886. Published in the poetry section of the January issue of The Undergraduate, Middlebury’s newspaper.

From Pinterest I came across TEXT-MODE, a great tumblr that describes itself as “A collection of text graphics and related works, stretching back thousands of years.” Note the image above of a visual poem about “A Dude” from 1886. Included are all sorts of examples, from typewriter art to animations to historical emoticons.

Sean Gouglas Remembers Stéfan Sinclair

Sean Gouglas shared these memories of Stéfan Sinclair with me and asked me to post them. They date from when he and Stéfan started the Humanities Computing programme at the University of Alberta, where I am now lucky to teach.

In the summer of 2001, two newly-minted PhDs started planning how they were going to build and then teach a new graduate program in Humanities Computing at the University of Alberta. This was the first such program in North America. To be absolutely honest, Stéfan Sinclair and I really had no idea what we were doing. The next few months were both exhausting and exhilarating. Working with Stéfan was a professional and personal treat, especially considering that he had an almost infinite capacity for hard work. I remember him coding up the first Humanities Computing website in about seven minutes — the first HuCo logo appearing like a rising sun on a dark blue background. It also had an unfortunate typo that neither of us noticed for years. 

It was an inspiration to work with Stéfan. He was kind and patient with students, demanding a lot from them but giving even more back. He promoted the program passionately at every conference, workshop, and seminar. Over the next three years, there was a lot of coffee, a lot of spicy food, a beer or two, some volleyball, some squash, and then he and Stephanie were off to McMaster for their next adventure. 

Our Digital Humanities program has changed a lot since then — new courses, new programs, new faculty, and even a new name. Through that change, the soul of the program remained the same and it was shaped and molded by the vision and hard work of Stéfan Sinclair. 

On the 6th of August, Stéfan died of cancer. The Canadian Society for Digital Humanities has a lovely tribute, which can be found here: https://csdh-schn.org/stefan-sinclair-in-memoriam/. It was written in part by Geoffrey Rockwell, who worked closely with Stéfan for more than two decades. 

Celebrating Stéfan Sinclair: A Dialogue from 2007

Sadly, last Thursday Stéfan Sinclair passed away. A group of us posted an obituary for CSDH-SCHN here: Stéfan Sinclair, In Memoriam. Boy, do I miss him already. While the obituary describes the arc of his career, I’ve been trying to think of how to celebrate the way he loved to play with ideas and code. The obituary tells the what of his life but doesn’t show the how.

You see, Stéfan loved to toy with ideas of text through the development of software toys. The hermeneuti.ca project started with a one-day text analysis vacation/hackathon. We decided to leave all the busy work of being an academic in our offices and spend a day in the TAPoR lab at McMaster, messing around with the analytical equivalent of extreme programming. That included a version of “pair programming” where we alternated: one at the keyboard doing the analysis while the other took notes and directed. We told ourselves we would devote just one day, without interruptions, to this folly and see if together we could take a project from conception to some sort of finished result in a day.

Little did we know we would still be at play right up until a few weeks ago. We failed to finish that day, but we got far enough to know we enjoyed the fooling around enough to do it again and again. Those escapes into what we later called agile hermeneutics, to give it a serious name, eventually led to a monster of a project that reflected back on the play. The project culminated in the jointly authored book Hermeneutica (MIT Press, 2016) and Voyant 2.0, both of which tried not only to think through some of the potential of the play, but also to give others a way of making their own interpretative toys (which we called hermeneutica). But these too are perhaps too serious to commemorate Stéfan’s presence.

Which brings me to the dialogue we wrote and performed on “Reading Tools.” Thanks to Susan I was reminded of this script that we acted out at the University of Illinois, Urbana-Champaign in June of 2007. May it honour how Stéfan would want to be remembered. Imagine him smiling at the front of the room as he starts,

Sinclair: Why do we care so much for the opinions of other humanists? Why do we care so much whether they use computing in the humanities?

Rockwell: Let me tell you an old story. There was once a titan who invented an interpretative technology for his colleagues. No, … he wasn’t chained to a rock to have his liver chewed out daily. … Instead he did the smart thing and brought it to his dean, convinced the technology would free his colleagues from having to interpret texts and let them get back to the real work of thinking.

Sinclair: I imagine his dean told him that in the academy those who develop tools are not the best judges of their inventions and that he had to get his technology reviewed as if it were a book.

Rockwell: Exactly, and the dean said, “And in this instance, you who are the father of a text technology, from a paternal love of your own children have been led to attribute to them a quality which they cannot have; for this discovery of yours will create forgetfulness in the learners’ souls, because they will not study the old ways; they will trust to the external tools and not interpret for themselves. The technology which you have discovered is an aid not to interpretation, but to online publishing.”

Sinclair: Yes, Geoffrey, you can easily tell jokes about the academy, paraphrasing Socrates, but we aren’t outside the city walls of Athens, but in the middle of Urbana at a conference. We have a problem of audience – we are slavishly trying to please the other – that undigitized humanist – why don’t we build just for ourselves? …

Enjoy the full dialogue here: Reading Tools Script (PDF).

OSS advice on how to sabotage organizations or conferences

On Twitter someone posted a link to the 1944 OSS Simple Sabotage Field Manual. This includes simple but brilliant advice on how to sabotage organizations or conferences.

This sounds a lot like what we academics normally do as a matter of principle. I particularly like the advice to “Make ‘speeches.’” I imagine many will see themselves, in their less cooperative moments or their committee meetings, in this list of actions.

The OSS (Office of Strategic Services) was the US wartime intelligence office that eventually became the CIA.

The Useless Web

The Useless Web Button… just press it, and find where it takes you.

Bettina pointed me to The Useless Web site. It sends you to a useless web site. Examples include The Passive Aggressive Password Machine and Always Judge a Book by its Cover, which shows real books with ridiculous titles (go ahead, follow the link and see if you agree).

My question is whether The Useless Web Button is itself one of the sites you could be taken to?

Documenting the Now (and other social media tools/services)

Documenting the Now develops tools and builds community practices that support the ethical collection, use, and preservation of social media content.

I’ve been talking with the folks at MassMine (I’m on their Advisory Board) about tools that can gather information off the web, and I was pointed to the Documenting the Now project, which is based at the University of Maryland and the University of Virginia with support from Mellon. DocNow has developed tools and services around documenting the “now” using social media. DocNow itself is an “appraisal” tool for Twitter archiving. They also have a great catalog of Twitter archives that they and others have gathered, which looks like it would be great for teaching.
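
As a taste of the archiving side, here is a minimal sketch using twarc, the command-line and Python utility developed under the DocNow project. It assumes twarc 1.x and Twitter API credentials; the four credential strings below are placeholders, not real keys.

```python
# A minimal sketch of archiving tweets with DocNow's twarc (1.x).
# The four credential strings are hypothetical placeholders; you would
# substitute keys from a Twitter developer account.
from twarc import Twarc

t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET",
          "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# search() returns a generator of tweets as decoded JSON dictionaries.
for tweet in t.search("#digitalhumanities"):
    print(tweet["id_str"], tweet["full_text"][:80])
```

The same loop could just as easily write each tweet out as a line of JSON, which is the format DocNow’s archives typically use.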

MassMine is at present a command-line tool that can gather different types of social media. They are building a web interface version that will make it easier to use, and they plan to connect it to Voyant so you can analyze results there. I’m looking forward to something easier to use than Python libraries.

Speaking of which, I found TAGS (Twitter Archiving Google Sheet), a plug-in for Google Sheets that can gather smaller amounts of Twitter data. Another accessible tool is Octoparse, which is designed to scrape database-driven web sites. It is commercial, but has a 14-day trial.

One of the impressive features of the Documenting the Now project is that they are thinking about the ethics of scraping. They have a set of Social Labels for people to indicate how their data should be handled.

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs

Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labelled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.

Another one of those “what were they thinking when they created the dataset” stories from The Register tells how MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs. The MIT Tiny Images dataset was created automatically using scripts that drew on the WordNet database of terms, which itself contains derogatory terms. Nobody thought to check either the terms taken from WordNet or the resulting images scoured from the net. As a result there are not only lots of images for which permission was not secured, but also racist, sexist, and otherwise derogatory labels on the images, which in turn means that an AI trained on them can generate racist and sexist results.
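
To see how easily such labels slip through, here is a minimal sketch, assuming NLTK’s WordNet corpus is installed, of how one might audit a label set harvested from all WordNet nouns against a blocklist before scraping any images. The blocklist below is a one-item placeholder standing in for a curated list.

```python
# A minimal sketch of auditing labels harvested from WordNet nouns
# against a blocklist before building an image dataset around them.
# The blocklist is a hypothetical placeholder; a real audit would use
# a curated list and human review of anything flagged.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch the corpus if missing

BLOCKLIST = {"example_slur"}  # placeholder; substitute a curated list

# Harvest every lemma of every noun synset, as a label-scraping
# pipeline might, lowercased for comparison.
labels = {lemma.name().lower()
          for synset in wn.all_synsets(pos="n")
          for lemma in synset.lemmas()}

flagged = labels & BLOCKLIST
print(f"{len(labels)} candidate labels, {len(flagged)} flagged for review")
```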

The article also mentions a general problem with academic datasets. Companies like Facebook can afford to hire actors to pose for images and can thus secure permission to use the images for training. Academic datasets (and some commercial ones like the Clearview AI database) tend to be scraped, and therefore will not have the explicit permission of the copyright holders or the people shown. In effect, academics are resorting to mass surveillance to generate training sets. One wonders if we could crowdsource a training set by and for people?