Big Data – Page 6 – Theoreti.ca

Vol. 31 No. 1 (2022): Ethics in the Age of Smart Systems: Special Issue | The International Review of Information Ethics

The Special Issue of the International Review of Information Ethics has now been published in full as Vol. 31 No. 1 (2022): Ethics in the Age of Smart Systems: Special Issue. In addition to co-editing it, I co-authored an Editorial, On Dialogue and Artificial Intelligence, that deals with the question of whether LaMDA is sentient.

This special issue came out of a series of dialogues that AI4Society organized with our partners. These were followed by a symposium on “Ethics in the Age of Smart Machine.”

Workplace Productivity: Are You Being Tracked?

“We’re in this era of measurement but we don’t know what we should be measuring,” said Ryan Fuller, former vice president for workplace intelligence at Microsoft.

The New York Times has an essay on Workplace Productivity: Are You Being Tracked? The neat thing is that the article tracks your reading of it to give you a taste of the sorts of tracking now being deployed for remote (and on-site) workers. If you pause and don’t scroll, it puts up messages like “Hey are you still there? You’ve been inactive for 32 seconds.”

But Ms. Kraemer, like many of her colleagues, found that WorkSmart upended ideas she had taken for granted: that she would have more freedom in her home than at an office; that her M.B.A. and experience had earned her more say over her time.

What is new is the shift to remote work due to Covid. Many companies are fine with remote work if they can guarantee productivity. The other thing that is changing is the use of tracking for not just manual work, but also for white-collar work.

I’ve noticed that this goes hand in hand with self-tracking. My Apple Watch and iPhone offer a weekly summary of my browsing and also offer to track my physical activity. If I go for a walk, then somewhere close to a kilometer in I am asked whether I want it tracked as exercise.

The questions raised by the authors of the New York Times article include: Are we tracking the right things? What are we losing with all this tracking? What is happening to all this data? Can companies sell the data about employees?

The article, dated Aug. 14, 2022, is by Jodi Kantor and Arya Sundaram. It is produced by Aliza Aufrichtig and Rumsey Taylor.

Zampolli Prize Awarded to Voyant Tools

Spyral Notebook Detail (showing code cell and stacked graphs)

Yesterday I gave the triennial Zampolli Prize lecture that honoured Voyant. The lecture is given at the annual ADHO Digital Humanities conference, which this year is being hosted by the University of Tokyo. The award notice is here: Zampolli Prize Awarded to Voyant Tools. Some of the things I touched on in the talk included:

The genius of Stéfan Sinclair, who passed in August 2020. Voyant was his vision from the time of his dissertation, for which he developed HyperPo.
The global team of people involved in Voyant including many graduate research assistants at the U of Alberta. See the About page of Voyant.
How Voyant built on ideas Stéfan and I developed in Hermeneutica about collaborative research as opposed to the inherited solitary paradigm.
How we have now developed an extension to Voyant called Spyral. Spyral is a notebook programming environment built on JavaScript. It allows you to document your Voyant explorations, save parameters for corpora and tools, preprocess texts, postprocess results, and create new visualizations. It is, in short, a full data analysis and visualization environment built into Voyant so you can easily call up and explore results in Voyant’s already rich tool set.
In the image above you can see a Spyral code cell that outputs two stacked graphs where the same pattern of words is graphed over two different, but synchronized, corpora. You can thus compare the use of the pattern over time between the two datasets.
Replication as a practice for recovering an understanding of innovative technologies now taken for granted like tokenization or the KWIC. I talked about how Stéfan and I have been replicating important text processing technologies as a way of understanding the history of computing and the digital humanities. Spyral was the environment we developed for documenting our replications.
I then backed up and talked about the epistemological questions about knowledge and knowledge things in the digital age that grew out of and then inspired our experiments in replication. These go back to attempts to think through tools as knowledge things that bear knowledge in ways that discourse doesn’t. In this context I talked about the DIKW pyramid (data, information, knowledge, wisdom) that captures current views about the relationships between data and knowledge.
Finally I called for help to maintain and extend Voyant/Spyral. I announced the creation of a consortium to bring us together to sustain Voyant.

It was an honour to be able to give the Zampolli lecture on behalf of all the people who have made Voyant such a useful tool.

Lessons from the Robodebt debacle

How to avoid algorithmic decision-making mistakes: lessons from the Robodebt debacle

The University of Queensland has a research alliance looking at Trust, Ethics and Governance and one of the teams has recently published an interesting summary of How to avoid algorithmic decision-making mistakes: lessons from the Robodebt debacle. This is based on an open paper Algorithmic decision-making and system destructiveness: A case of automatic debt recovery. The web summary article is a good discussion of the Australian 2016 robodebt scandal where an unsupervised algorithm issued nasty debt collection letters to a large number of welfare recipients without adequate testing, accountability, or oversight. It is a classic case of a simplistic and poorly tested algorithm being rushed into service and having dramatic consequences (470,000 incorrectly issued debt notices). There is, as the article points out, also a political angle.

UQ’s experts argue that the government decision-makers responsible for rolling out the program exhibited tunnel vision. They framed welfare non-compliance as a major societal problem and saw welfare recipients as suspects of intentional fraud. Balancing the budget by cracking down on the alleged fraud had been one of the ruling party’s central campaign promises.

As such, there was a strong focus on meeting financial targets with little concern over the main mission of the welfare agency and potentially detrimental effects on individual citizens. This tunnel vision resulted in politicians’ and Centrelink management’s inability or unwillingness to critically evaluate and foresee the program’s impact, despite warnings. And there were warnings.

What I find even more disturbing is a point they make about how the system shifted the responsibility for establishing the existence of the debt from the government agency to the individual. The system essentially made speculative determinations and then issued bills. It was up to the individual to figure out whether they had really been overpaid or there had been a miscalculation. Imagine if the police used predictive algorithms to fine people for possible speeding infractions, and those people then had to prove they were innocent or pay the fine.

One can see the attractiveness of such a “fine first then ask” approach. It reduces government costs by shifting the onerous task of establishing the facts to the citizen. There is a good chance that many who were incorrectly billed will pay anyway as they are intimidated and don’t have the resources to contest the fine.

It should be noted that this was not the case of an AI gone bad. It was, from what I have read, a fairly simple system.

Street View Privacy

How do you feel about people being able to look at your house in Google Street View? Popular Science has an article by David Nield on “How to hide your house on every map app: Stop people from peering at your place” (May 18, 2022).

This raises questions about where privacy starts and a right to look or know stops. Can I not walk down a street and look at the faces of houses? Why then should I not be able to look at the face on Street View and other similar technologies? What about the satellite view? Do people have the right to see into my back yard from above?

This issue is similar to, though less fraught than, that of face databases. What rights do I have to my face? How would those rights connect to laws about Name, Image and Likeness (NIL) (or rights of publicity), which became an issue recently in amateur sports in the US? As for Canada, Rights of Publicity are complex and vary from province to province, but there is generally a recognition that:

People should have the right “to control the commercial use of name, image, likeness and other unequivocal aspects of one’s identity (eg, the distinct sound of someone’s voice).” (See Lexology article)
At the same time there is recognition that NIL can be used to provide legitimate information to the public.

Returning to the blurring of your house facade in Street View, I’m guessing the main reason the companies provide this is for the security of people in sensitive positions or people being stalked.

Health agency tracked Canadians’ trips to liquor store via phones during pandemic

The report reveals PHAC was able to view a detailed snapshot of people’s behaviour, including grocery store visits, gatherings with family and friends, time…

The National Post is reporting on the Public Health Agency of Canada and its use of mobility data, which a group of us wrote about in The Conversation (Canada). The story goes into more detail about how Health agency tracked Canadians’ trips to liquor store via phones during pandemic. The government provided one of the reports commissioned by PHAC from BlueDot to the House of Commons. The Ethics Committee report discussing what happened and making recommendations is here.

Zampolli Prize Awarded to Voyant Tools

I’m immensely proud to write that the Zampolli Prize has been awarded to Voyant Tools. The Zampolli prize is one of the most prestigious in my field. I’m proud to have been part of the team that developed and sustained Voyant. Alas, Stéfan Sinclair, its genius, is not with us to share this.

Giant, free index to world’s research papers released online

Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.

From Ian I learned about a Giant, free index to world’s research papers released online. The General Index, as it is called, makes n-grams of up to 5 words available with pointers to relevant journal articles.

The massive index is available from the Internet Archive here. Here is how it is described.

Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.

The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.

Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.

Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants. We applaud the release of the General Index and look forward to the progress of this worthy endeavor.

There must be some neat uses of this. I wonder if someone like Google might make a diachronic viewer similar to their Google Books Ngram Viewer available?
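To make the idea of an n-gram index concrete, here is a minimal Python sketch of how one might list n-grams, from unigrams to five-grams, with pointers back to the documents they came from. It is only an illustration under my own assumptions: the tokenizer, the function names (tokenize, ngrams, build_index), and the two toy documents are all hypothetical, and this is not how Public Resource built the General Index.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric tokens.

    This is an assumed, deliberately naive tokenizer for illustration.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(documents, max_n=5):
    """Map each n-gram (n = 1..max_n) to the set of document ids containing it.

    `documents` is a dict of hypothetical document ids to their text.
    """
    index = {}
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                index.setdefault(gram, set()).add(doc_id)
    return index


# Toy corpus: two made-up "articles" standing in for journal papers.
documents = {
    "article-1": "Access to knowledge is a human right",
    "article-2": "Knowledge is power",
}
index = build_index(documents)

# Look up which documents contain a given phrase.
print(sorted(index["knowledge is"]))
```

Note that only the n-grams and document pointers are kept, not the running text, which is roughly what makes an index of this kind non-consumptive.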

Jeanna Matthews

Jeanna Matthews from Clarkson University gave a great talk at our AI4Society Ethical Data and AI Salon on “Creating Incentives for Accountability and Iterative Improvement in Automated-Decision Making Systems.” She talked about a case she was involved in regarding DNA matching software for criminal cases, where they were able to actually get the code and show that the software would, under certain circumstances, generate false positives (where people would have their DNA matched to that from a crime scene when it shouldn’t have been).

As the title of her talk suggests, she used the concrete example to make the point that we need to create incentives for companies to test and improve their AIs. In particular she suggested that:

Companies should be encouraged/regulated to invest some of the profit they make from the efficiencies from AI in improving the AI.
A better way to deal with the problems of AIs than weaving humans into the loop would be to set up independent human testers who test the AI and have a mechanism of redress. She pointed out how humans in the loop can get lazy, can be incentivized to agree with the AI, and so on.
We need regulation! No other approach will motivate companies to improve their AIs.

We had an interesting conversation around the question of how one could test the second point. Can we come up with a way of testing which approach is better?

She shared a link to a collection of links to most of the relevant papers and information: Northwestern Panel, March 10 2022.

Replication, Repetition, or Revivification

A short essay I wrote with Stéfan Sinclair on “Recapitulation, Replication, Reanalysis, Repetition, or Revivification” is now up in preprint form. The essay is part of a longer work on “Anatomy of tools: A closer look at ‘textual DH’ methodologies.” The longer work is a set of interventions looking at text tools. These came out of an ADHO SIG-DLS (Digital Literary Studies) workshop that took place in Utrecht in July 2019.

Our intervention at the workshop had the original title “Zombies as Tools: Revivification in Computer Assisted Interpretation” and concentrated on practices of exploring old tools – a sort of revivification or bringing back to life of zombie tools.

The full paper should be published soon by DHQ.