Giant, free index to world’s research papers released online

Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.

From Ian I learned about a Giant, free index to world’s research papers released online. The General Index, as it is called, makes n-grams of up to 5 words available with pointers to relevant journal articles.
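
To make the idea concrete, here is a minimal Python sketch of what an index of n-grams of up to 5 words with pointers back to articles could look like. It is an illustration only and not how Public Resource built the General Index: the function names, the whitespace tokenizer, and the toy article ids and texts are all assumptions made up for this example.

```python
from collections import defaultdict


def ngrams(tokens, max_n=5):
    """Yield every n-gram (as a tuple of words) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_index(articles, max_n=5):
    """Map each n-gram to the set of article ids in which it appears."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        tokens = text.lower().split()
        for gram in ngrams(tokens, max_n):
            index[gram].add(article_id)
    return index


# Hypothetical toy corpus; the ids and texts are invented for illustration.
articles = {
    "a1": "gene expression in plant cells",
    "a2": "protein folding and gene expression",
}
index = build_index(articles)

# Look up which articles contain the bigram "gene expression".
print(sorted(index[("gene", "expression")]))
```

Looking up a phrase is then a dictionary access on a tuple of words, which is roughly the kind of search that a listing of n-grams with article pointers makes possible without releasing the articles themselves.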

The massive index is available from the Internet Archive here. Here is how it is described.

Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.

The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.

Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.

Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants. We applaud the release of the General Index and look forward to the progress of this worthy endeavor.

There must be some neat uses of this. I wonder if someone like Google might make a diachronic viewer similar to their Google Books Ngram Viewer available?

Jeanna Matthews 

Jeanna Matthews from Clarkson University gave a great talk at our AI4Society Ethical Data and AI Salon on “Creating Incentives for Accountability and Iterative Improvement in Automated-Decision Making Systems.” She talked about a case she was involved in regarding DNA matching software for criminal cases, where they were able to actually get the code and show that the software would, under certain circumstances, generate false positives (where people would have their DNA matched to that from a crime scene when it shouldn’t have been).

As the title of her talk suggests, she used the concrete example to make the point that we need to create incentives for companies to test and improve their AIs. In particular she suggested that:

  1. Companies should be encouraged/regulated to invest some of the profit they make from the efficiencies from AI in improving the AI.
  2. A better way to deal with the problems of AIs than weaving humans into the loop would be to set up independent human testers who test the AI, along with a mechanism of redress. She pointed out that humans in the loop can get lazy, can be incentivized to agree with the AI, and so on.
  3. We need regulation! No other approach will motivate companies to improve their AIs.

We had an interesting conversation around the question of how one could test point 2. Can we come up with a way of testing which approach is better?

She shared a link to a collection of links to most of the relevant papers and information: Northwestern Panel, March 10 2022.

Replication, Repetition, or Revivification

A short essay I wrote with Stéfan Sinclair on “Recapitulation, Replication, Reanalysis, Repetition, or Revivification” is now up in preprint form. The essay is part of a longer work on “Anatomy of tools: A closer look at ‘textual DH’ methodologies.” The longer work is a set of interventions looking at text tools. These came out of an ADHO SIG-DLS (Digital Literary Studies) workshop that took place in Utrecht in July 2019.

Our intervention at the workshop had the original title “Zombies as Tools: Revivification in Computer Assisted Interpretation” and concentrated on practices of exploring old tools – a sort of revivification or bringing back to life of zombie tools.

The full paper should be published soon by DHQ.

Ottawa’s use of our location data raises big surveillance and privacy concerns

In order to track the pandemic, the Public Health Agency of Canada has been using location data without explicit and informed consent. Transparency is key to building and maintaining trust.

The Conversation has just published an article on Ottawa’s use of our location data raises big surveillance and privacy concerns. This was written with a number of colleagues who were part of a research retreat (Dagstuhl) on Mobility Data Analysis: from Technical to Ethical.

We are at a moment when ethical principles are really not enough and we need to start talking about best practices in order to develop a culture of ethical use of data.

The Future of Digital Assistants Is Queer

AI assistants continue to reinforce sexist stereotypes, but queering these devices could help reimagine their relationship to gender altogether.

Wired has a nice article on how The Future of Digital Assistants Is Queer. The article looks at the gendering of virtual assistants like Siri and argues that it is not enough to just offer male voices; we need to queer the voices. It mentions the ethical issue of how voice conveys information like whether the VA is a bot or not.

The Proliferation of AI Ethics Principles: What’s Next?

The Montreal AI Ethics Institute has republished a nice article by Ravit Dotan, The Proliferation of AI Ethics Principles: What’s Next? Dotan starts by looking at some of the meta studies and then goes on to argue that we are unlikely to ever come up with a “unique set of core AI principles”, nor should we want to. She points out the lack of diversity in the sets we have. Different types of institutions will need different types of principles. She ends with these questions:

How do we navigate the proliferation of AI ethics principles? What should we use for regulation, for example? Should we seek to create new AI ethics principles which incorporate more perspectives? What if it doesn’t result in a unique set of principles, only increasing the multiplicity of principles? Is it possible to develop approaches for AI ethics governance that don’t rely on general AI ethics principles?

I am personally convinced that a more fruitful way forward is to start trading stories. These stories could take the form of incidents or cases or news or science fiction or even AI generated stories. We need to develop our ethical imagination. Hero Laird made this point in a talk on AI, Ethics and Law that was part of a salon we organize at AI4Society. They quoted from Thomas King’s The Truth About Stories to the effect that,

The truth about stories is that that’s all we are.

What stories do artificial intelligences tell themselves?

Artificial Intelligence Incident Database

I discovered the Artificial Intelligence Incident Database developed by the Partnership on AI. The Database contains reports on things that have gone wrong with AIs like the Australian Centrelink robodebt debacle.

The Incident Database was developed to help educate developers and encourage learning from mistakes. They have posted a paper to arXiv on Preventing Repeated Real World AI Failures by Cataloging Incidents: The AI Incident Database.

Ask Delphi

Ask Delphi is an intriguing AI that you can use to ponder ethical questions. You type in a situation and it will tell you if it is morally acceptable or not. It is apparently built not on Reddit data, but on crowdsourced data, so it shouldn’t be as easy to provoke into giving toxic answers.

In their paper, Delphi: Towards Machine Ethics and Norms they say that they have created a Commonsense Norm Bank, “a collection of 1.7M ethical judgments on diverse real-life situations.” This contributes to Delphi’s sound pronouncements, but it doesn’t seem available for others yet.

AI Weirdness has a nice story on how its author fooled Delphi.

Emojify: Scientists create online games to show risks of AI emotion recognition

Public can try pulling faces to trick the technology, while critics highlight human rights concerns

From the Guardian story, Scientists create online games to show risks of AI emotion recognition, I discovered Emojify, a web site with some games to show how problematic emotion detection is. Researchers are worried by the booming business of emotion detection with artificial intelligence. For example, it is being used in education in China. See the CNN story about how In Hong Kong, this AI reads children’s emotions as they learn.

A Hong Kong company has developed facial expression-reading AI that monitors students’ emotions as they study. With many children currently learning from home, they say the technology could make the virtual classroom even better than the real thing.

With cameras all over, this should worry us. We are not only being identified by face recognition, but now they want to know our inner emotions too. What sort of theory of emotions licenses these systems?

Why people believe Covid conspiracy theories: could folklore hold the answer?

Using Danish witchcraft folklore as a model, the researchers from UCLA and Berkeley analysed thousands of social media posts with an artificial intelligence tool and extracted the key people, things and relationships.

The Guardian has a nice story on Why people believe Covid conspiracy theories: could folklore hold the answer? This reports on research using folklore theory and artificial intelligence to understand conspiracies.

The story maps how Bill Gates connects the coronavirus with 5G for conspiracy fans. They use folklore theory to understand the way conspiracies work.

Folklore isn’t just a model for the AI. Tangherlini, whose specialism is Danish folklore, is interested in how conspiratorial witchcraft folklore took hold in the 16th and 17th centuries and what lessons it has for today.

Whereas in the past, witches were accused of using herbs to create potions that caused miscarriages, today we see stories that Gates is using coronavirus vaccinations to sterilise people. …

The research also hints at a way of breaking through conspiracy theory logic, offering a glimmer of hope as increasing numbers of people get drawn in.

The story then addresses the question of what difference the research might make. What good would a folklore map of a conspiracy theory do? The challenge for research is that more information clearly doesn’t work in a world of information overload.

The paper the story is based on is Conspiracy in the time of corona: automatic detection of emerging Covid-19 conspiracy theories in social media and the news, by Shadi Shahsavari, Pavan Holur, Tianyi Wang, Timothy R Tangherlini and Vwani Roychowdhury.