Archives – Page 2 – Theoreti.ca

Giant, free index to world’s research papers released online

Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.

From Ian I learned about a Giant, free index to world’s research papers released online. The General Index, as it is called, makes ngrams of up to 5 words available with pointers to relevant journal articles.

The massive index is available from the Internet Archive. Here is how it is described:

Public Resource, a registered nonprofit organization based in California, has created a General Index to scientific journals. The General Index consists of a listing of n-grams, from unigrams to five-grams, extracted from 107 million journal articles.

The General Index is non-consumptive, in that the underlying articles are not released, and it is transformative in that the release consists of the extraction of facts that are derived from that underlying corpus. The General Index is available for free download with no restrictions on use. This is an initial release, and the hope is to improve the quality of text extraction, broaden the scope of the underlying corpus, provide more sophisticated metrics associated with terms, and other enhancements.

Access to the full corpus of scholarly journals is an essential facility to the practice of science in our modern world. The General Index is an invaluable utility for researchers who wish to search for articles about plants, chemicals, genes, proteins, materials, geographical locations, and other entities of interest. The General Index allows scholars and students all over the world to perform specialized and customized searches within the scope of their disciplines and research over the full corpus.

Access to knowledge is a human right and the increase and diffusion of knowledge depends on our ability to stand on the shoulders of giants. We applaud the release of the General Index and look forward to the progress of this worthy endeavor.

There must be some neat uses of this. I wonder if someone like Google might make a diachronic viewer similar to their Google Books Ngram Viewer available?
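To make the idea of an n-gram index with pointers back to articles concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not how Public Resource built the General Index: the function names, the whitespace tokenization, and the toy corpus are all my assumptions.

```python
from collections import defaultdict


def ngrams(text, max_n=5):
    """Yield every word n-gram of length 1..max_n found in text."""
    # Assumption: naive lowercasing and whitespace tokenization.
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def build_index(articles, max_n=5):
    """Map each n-gram to the set of article ids that contain it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text, max_n):
            index[gram].add(article_id)
    return index


# Hypothetical toy corpus; the ids and text are invented for illustration.
articles = {
    "paper-1": "protein folding in yeast cells",
    "paper-2": "protein folding under heat stress",
}
index = build_index(articles)
print(sorted(index["protein folding"]))  # ['paper-1', 'paper-2']
```

Looking up a phrase returns the ids of the articles that contain it, but not the articles themselves, which is the sense in which such an index is non-consumptive.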

John Roach, Pioneer of the Personal Computer, Is Dead at 83 – The New York Times

He helped make the home computer ubiquitous by introducing the fully assembled Tandy TRS-80, which was so novel at the time that it became a museum piece.

The New York Times reports that John Roach, Pioneer of the Personal Computer, Is Dead at 83. Roach was the executive who introduced the Tandy TRS-80 in the 1970s, one of the first fully assembled microcomputers. I didn’t realize how dominant the TRS-80 was in the late 1970s. At one point it held 40% of the market. We usually hear about Apple and IBM, but not about the TRS (Tandy Radio Shack).

Tandy later released a laptop or tablet computer that I lusted after, the TRS-80 Model 100. This was a keyboard and a small LCD screen and enough software to type notes or edit text. There was also a modem to send your writing somewhere. I still think this form factor makes sense. You can’t really type on an iPad (unless you get a keyboard for it) and you don’t really need lots of screen for typing notes.

The Lost Digital Poems (and Erotica) of William H. Dickey

In 1987, William H. Dickey, a San Francisco poet who had won the prestigious Yale Younger Poets Award to launch his career and published nearly a dozen well-received books and chapbooks since, was …

Matthew Kirschenbaum has written a great essay on recovering early digital poetry, The Lost Digital Poems (and Erotica) of William H. Dickey ‹ Literary Hub. Dickey wrote some HyperPoems in HyperCard, so they are now hard to access. Kirschenbaum rescued them and worked with people to add them to the Internet Archive, which has a HyperCard emulator. Here is what Kirschenbaum says:

Dickey’s HyperPoems are artifacts of another time—made new and fresh again with current technology. Anyone with a web browser can read and explore them in their original format with no special software or setup. (They are organized into Volume 1 and Volume 2 at the Internet Archive, in keeping with their original organizational scheme; Volume 2 contains the erotica—NSFW!) But they are also a reminder that writers have treasures tucked away in digital shoeboxes and drawers. Floppy disks, or for that matter USB sticks and Google Docs, now keep the secrets of the creative process.

This essay comes from his work for his new book Bitstreams, which documents this and other recovery projects. I’ve just ordered a copy.

Diggin’ in the Carts: Japanese video game music history

Meet the men and women responsible for creating the most iconic tunes in video game history.

We finished up the Replaying Japan 2021 conference today. The conference was held online using Zoom and Gather Town, where there was a hidden Easter egg with a link to Diggin’ in the Carts: Japanese video game music history, a five-part documentary from Red Bull that is quite good. The five 15-minute episodes are part of the first season. I’m not sure if there will be other seasons, but there is a related radio show with multiple seasons. The documentary episodes nicely feature the composers and experts talking about the Japanese history, along with other musicians commenting on the influence of the early music, which would have been heard over and over in houses with Japanese consoles.

The creator of the show is Nick Dwyer, who is interviewed here about the documentary and associated radio show.

One letter at a time: index typewriters and the alphabetic interface — Contextual Alternate

Drawing on a selection of non-keyboard ‘index’ typewriters, this exhibition explores how input mechanisms and alphabetic arrangements were devised and contested continually in the process of popularising typewriters as personal objects. The display particularly looks at how the letters of the alphabe

Reading Thomas S. Mullaney’s The Chinese Typewriter I’m struck by the variety of different typewriting solutions. As you can see from this exhibit web site, One letter at a time: index typewriters and the alphabetic interface — Contextual Alternate, there were all sorts of alternatives to the QWERTY keyboard early on, and many of them could accommodate more keys so as to support other languages including a non-alphabetic script like Chinese. As Mullaney points out there is a history to the emergence of the typewriter that we assume is normal.

This history of our collapsing technolinguistic imaginary took place across four phases: an initial period of plurality and fluidity in the West in the late 1800s, in which there existed a diverse assortment of machines through which engineers, inventors, and everyday individuals could imagine the very technology of typewriting, as well as its potential expansion to non-English and non-Latin writing systems; second, a period of collapsing possibility around the turn of the century in which a specific typewriter form—the shift-keyboard typewriter—achieved unparalleled dominance, erasing prior alternatives first from the market and then from the imagination; next, a period of rapid globalization from the 1900s onward in which the technolinguistic monoculture of shift-keyboard typewriting achieved global proportions, becoming the technological benchmark against which was measured the “efficiency” and thus modernity of an ever-increasing number of world scripts; and, finally, the machine’s encounter with the one world script that remained frustratingly outside its otherwise universal embrace: Chinese.

Mullaney, Thomas S. The Chinese Typewriter (Kindle Locations 1183-1191). MIT Press. Kindle Edition.

The Best of Voyager, Part 1

The Digital Antiquarian has posted the first part of a multipart essay on The Best of Voyager, Part 1. The Voyager Company was a pioneer in the development and distribution of interactive CD-ROMs in the 1990s. They published a number of classics like Amanda Stories, Beethoven’s Ninth Symphony CD-ROM, and Poetry in Motion. They also published some hybrid laserdisc/software combinations like The National Gallery of Art.

Unlike the multimedia experiments coming out of university labs, these CD-ROMs were designed to be commercial products and did sell. I remember ordering a number for the University of Toronto Computing Services so we could show what multimedia could do. They were some of the first products to show in a compelling way how interactivity could make a difference. Many included interactive audio, like the Beethoven one; others used QuickTime (digital video) for the first time.

All of this was, to some extent, made anachronistic when the web took off and began to incorporate multimedia effectively. Voyager set the scene remediating earlier works (like the short film of A Hard Day’s Night). But CD-ROMs were, in their turn, replaced.

My favourite was The Residents Freak Show. This was a strange 3D-like tour of the music of The Residents that was organized around a freak show motif.

Thanks to Peter for this.

Excavating AI

The training sets of labeled images that are ubiquitous in contemporary computer vision and AI are built on a foundation of unsubstantiated and unstable epistemological and metaphysical assumptions about the nature of images, labels, categorization, and representation. Furthermore, those epistemological and metaphysical assumptions hark back to historical approaches where people were visually assessed and classified as a tool of oppression and race science.

Excavating AI is an important paper by Kate Crawford and Trevor Paglen that looks at “The Politics of Images in Machine Learning Training Sets.” They look at different ways that politics and assumptions can creep into training datasets that are (were) widely used in AI.

There is the overall taxonomy used to annotate (label) the images
There are the individual categories used that could be problematic or irrelevant
There are the images themselves and how they were obtained

They point out how many of the image datasets used for face recognition have been trimmed or have disappeared as they got criticized, but they may still be influential as they were downloaded and are circulating in AI labs. These datasets with their assumptions have also been used to train commercial tools.

I particularly like how the authors discuss their work as an archaeology, perhaps in reference to Foucault (though they don’t mention him).

I would argue that we need an ethics of care and repair to maintain these datasets usefully.

Replaying Japan Journal, Vol. 3

Volume 3 of the Journal of Replaying Japan is out and now available on the Ritsumeikan Research Repository – Replaying Japan, Vol. 3. I have an article with Keiji Amano and Mimi Okabe on “Ethics and Gaming: The Presentation of Ethics and Social Responsibility by the Japanese Game Industry” where we looked at how top Japanese video game companies present their ethics and social responsibilities. I should add that I’m the English Editor and helped put it together.

AI Dungeon and the Ethics of Text Generation

> You talk to yourself.

Hi I’m Rameau. You probably don’t know me, but since you’re reading this I’m going to assume you do indeed exist.

> You [ask why] i’m wondering what consciousness is.

I have no idea. But it’s fun to think about. Is this a person? No, because it’s conscious. Is it alive? No, because it doesn’t respond to external stimuli.

> You [ask] why does a person have to respond to external stimuli.

Because then we can interact with it!!!!! But isn’t consciousness an internal phenomenon? Then how would the person interact with anything else? Oh wait, that’s right. We’re not limited by our perception of reality.

I’ve been trying Latitude’s text generation game AI Dungeon. The game is built on OpenAI’s GPT-3. Above you can see a bit of a conversation I had with it in a cyberpunk world.

Latitude has gotten into trouble with OpenAI because it seems that the game was generating erotic content featuring children. A number of people turned to AI Dungeon precisely because it could be used to explore adult themes, and that would seem to be a good thing, but then some may have gone too far. See the Wired story It Began as an AI-Fueled Dungeon Game. It Got Much Darker. This raises interesting ethical issues about:

Why do so many players use it to generate erotic content?
Who is responsible for the erotic content? OpenAI, Latitude, or the players?
Whether there are ethical reasons to generate erotic content featuring children? Do we forbid people from writing novels like Lolita?
How to prevent inappropriate content without crippling the AI? Are filters enough?

The problem of AIs generating toxic language is nicely shown by this web page on Evaluating Neural Toxic Degeneration in Language Models. The interactives and graphs on the page let you see how toxic language can be generated by many of the popular language generation AIs. The problem seems to lie in the data sets used to train the machines, such as those that include scrapes of Reddit.

This exploratory tool illustrates research reported on in a paper titled RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. You can see a neat visualization of the connected papers here.