The Dictionary of Words in the Wild

The Dictionary of Words in the Wild is an experiment in public textuality that I’m leading. Andrew MacDonald has done the programming and is contributing images (along with others). You can get an account and upload pictures of words or phrases. We also have an application programming interface (API) that you can use to create web applications that call the dictionary. Join, sample, load! We need pictures.
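To give a feel for the kind of application a picture dictionary makes possible, here is a minimal Python sketch that illustrates a phrase word by word. It does not call our actual API. SAMPLE_DICTIONARY, its example.org URLs, and the helper names are hypothetical stand-ins for whatever a real client would fetch.

```python
import re

# Hypothetical stand-in for the dictionary (word -> image URLs).
# These entries and example.org URLs are invented; the real API is not used here.
SAMPLE_DICTIONARY = {
    "stop": ["https://example.org/images/stop-sign.jpg"],
    "exit": ["https://example.org/images/exit-door.jpg"],
    "only": ["https://example.org/images/only-lane.jpg"],
}


def illustrate_phrase(phrase, dictionary):
    """Pair each word of the phrase with its first picture, or None if there is none."""
    words = re.findall(r"[a-z']+", phrase.lower())
    result = []
    for word in words:
        images = dictionary.get(word, [])
        result.append((word, images[0] if images else None))
    return result


def missing_words(phrase, dictionary):
    """List the words in the phrase that nobody has photographed yet."""
    return [word for word, url in illustrate_phrase(phrase, dictionary) if url is None]


if __name__ == "__main__":
    for word, url in illustrate_phrase("Exit only, please", SAMPLE_DICTIONARY):
        print(word, "->", url or "(no picture yet)")
```

The missing_words helper shows one use for such a tool: it points out which words still need pictures.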


James pointed me to a similar experiment, The Visual Dictionary – a visual exploration of words in the real world. It focuses on single words and has a ranking/rating system. It doesn’t, however, have an API like ours. I wonder how we could interoperate. Can such dictionaries become a movement?

Meditation on Electronic Tools

TAPoR Try It

A tool would have a handle with grooves to hold tight. It’s easy to swing into place.

List Words Results

It would have an inhuman steel end. An end unlike my soft flesh. Perhaps the nail dead at the end of the digit.

Tool Broker

Googlizer Results

A tool scratches out its world. A tool outreaches, extends the hand in sight, and where it doesn’t fit (so often), it scrapes a groove. It claws what it can afford.

Visual Collocator

And when it’s finished there’s a pop, a clunk, a ping, and a burr to be swept away. When it’s left, the palm is open to stroke the surface of the craft. A satisfaction puts the tool away.

Error Message

So few parts of the world fit this tool, other than my hand. Perhaps they are not made for work but for the stroking, the holding, and the gripping turn.

Workbench

Which is why I need so many of them, within reach, laid out in frames, carried in bags, on belts, and ready-at-hand and unforeseen.

Analyze Text

Then, I’ll pause in the workshop and not do anything at all. I’ll hold these tools in my mind which is not how to use them.

Images all from the TAPoR portal and TAPoRware.

Ask E.T.: Sparklines: theory and practice

Deficit Sparkline (Sparkline of US deficit over time)

Sparklines: theory and practice is a thread in Edward Tufte’s Ask E.T. forum (which is a great place to follow discussions on design issues). The thread starts with images of some pages from Tufte’s new book, Beautiful Evidence (2006), on sparklines, which are defined as “intense, simple, word-sized graphics”. The sparkline at the beginning of this entry is from the Sparkline PHP Graphing Library. Another source is the Bissantz sparkline tools. Thanks to Shawn for this link.
So how could sparklines be woven into text analysis environments? Small distribution graphs could be included alongside word lists or KWIC displays in tools like the TAPoRware tools.
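As a rough illustration of that idea, here is a Python sketch that draws a text-only sparkline from Unicode block characters, showing how often a word occurs across equal slices of a text. This is not how the Sparkline PHP Graphing Library or the Bissantz tools work (they produce images); the function names and the default of ten slices are assumptions for the example.

```python
import re

# Eight Unicode block characters, from lowest to highest.
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map a non-empty list of numbers to a word-sized string of bar characters."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return BARS[0] * len(values)
    # Multiply before dividing so integer inputs land exactly on a bar index.
    return "".join(BARS[int((v - lo) * (len(BARS) - 1) / span)] for v in values)


def distribution(text, word, segments=10):
    """Count occurrences of `word` in each of `segments` equal slices of the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = [0] * segments
    for i, token in enumerate(tokens):
        if token == word:
            counts[i * segments // len(tokens)] += 1
    return counts


if __name__ == "__main__":
    text = "to be or not to be"
    print("be", sparkline(distribution(text, "be", segments=4)))
```

In a word list, each entry could carry its own sparkline, so the reader sees at a glance whether a word clusters in one part of the text or is spread throughout.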

Download Pertinence Summarizer – Text Mining Solutions

Selection from a “Connivence Map” of World Politics

PertinenceMining.com is a French company that has a number of neat text processing products built on their KENiA or “Knowledge Extraction and Notification Architecture.” One of their products is Connivences.info, which produces maps of “actors” in the news with weighted lines indicating their relationships.
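We don’t know how KENiA builds its maps, but one common way to get weighted relationship lines is sentence-level co-occurrence counting. The Python sketch below assumes the list of actors is already known and counts how often each pair is named in the same sentence; each count could serve as the weight of a line on such a map.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, actors):
    """Count how often each pair of actors is named in the same sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    edges = Counter()
    for sentence in sentences:
        # Plain substring matching: a simplification, not real entity recognition.
        present = sorted({actor for actor in actors if actor in sentence})
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    news = "France met Germany. Germany and Italy argued. France, Germany and Italy signed."
    for (a, b), weight in cooccurrence_edges(news, ["France", "Germany", "Italy"]).items():
        print(f"{a} -- {b}: {weight}")
```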
Another interesting tool is their Google + Pertinence Summarizer, which enhances the results from Google with a “Summarize” button. The button splits the linked page into sentences and tries to rank their pertinence to the document, so you can choose to see only the most pertinent ones. The interface took me a while to figure out, and I’m not sure it works.
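The Pertinence ranking method isn’t described on their site, so what follows is a swapped-in, generic technique: a frequency-based extractive summarizer in Python. It splits a page into sentences, scores each by how common its content words are across the whole document, and keeps the top k in their original order. The stopword list and the scoring rule are assumptions.

```python
import re
from collections import Counter

# A tiny, assumed stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "on", "for", "with", "as", "was", "are", "be", "at"}


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in the order they appear."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging, rather than summing, keeps long sentences from winning just because they are long.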

DadaDodo: Exterminate All Rational Thought

DadaDodo is a text generator, or “travesty generator”, like Dissociated Press. The code is available, and unlike programs that randomly cut up text, “it scans bodies of text, and builds a probability tree expressing how frequently word B tends to occur after word A, and various other statistics; then it generates sentences based on those probabilities.” DadaDodo is described by its creator Jamie Zawinski thus:

DadaDodo is a program that analyses texts for word probabilities, and then generates random sentences based on that. Sometimes these sentences are nonsense; but sometimes they cut right through to the heart of the matter, and reveal hidden meanings.

Zawinski’s page has a “cut up” look, with downloadable code and interesting links, many of which are, alas, no longer active. The effects of DadaDodo are hard to interpret without knowing which corpus it starts with. I am tempted to create a TAPoRware version so that it can be used on existing web pages.
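The “how frequently word B tends to occur after word A” statistic can be sketched as a first-order Markov chain. The Python below is a simplified illustration, not DadaDodo’s code (it ignores the “various other statistics”). It records, for each word, the words that follow it, keeping duplicates so that frequent followers are proportionally more likely, and then walks the chain choosing followers at random.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it (repeats preserve frequency)."""
    words = text.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain


def generate(chain, start, length=12, rng=random):
    """Walk the chain from `start`, stopping early at a word with no followers."""
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    chain = build_chain("the cat sat on the mat and the dog sat on the cat")
    print(generate(chain, "the", length=10))
```

Running it on a familiar corpus helps with the interpretation problem: you can recognize which phrases were lifted and which were spliced together.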

Communications From Elsewhere

Communications From Elsewhere is a journal (not blog!) by Josh Larios with some interesting text generators, including a Postmodernism Generator, which randomly generates “completely meaningless” essays using a modified version of The Dada Engine written by Andrew C. Bulhak.

For more on The Dada Engine, see the technical report from Monash University, On the simulation of postmodernism and mental debility using recursive transition networks. The Abstract reads:

Recursive transition networks are an abstraction related to context-free grammars and finite-state automata. It is possible to generate random, meaningless and yet realistic-looking text in genres defined using recursive transition networks, often with quite amusing results. One genre in which this has been accomplished is that of academic papers on postmodernism.
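Since recursive transition networks are closely related to context-free grammars, a toy generator can be sketched as a grammar expanded recursively with random choices. The grammar and vocabulary below are invented for illustration and are far simpler than the Dada Engine’s own rule language. The max_depth cutoff is an assumption added to keep the self-recursive NOUN_PHRASE rule from running forever.

```python
import random

# Toy grammar: uppercase keys are nonterminals, everything else is a word.
GRAMMAR = {
    "SENTENCE": [["The", "THEORY", "of", "AUTHOR", "VERB", "NOUN_PHRASE", "."]],
    "THEORY": [["discourse"], ["narrative"], ["paradigm"]],
    "AUTHOR": [["Derrida"], ["Lyotard"], ["Baudrillard"]],
    "VERB": [["deconstructs"], ["problematizes"], ["subverts"]],
    "NOUN_PHRASE": [["the", "NOUN"], ["the", "ADJ", "NOUN_PHRASE"]],  # recursive rule
    "ADJ": [["dialectic"], ["textual"], ["neocultural"]],
    "NOUN": [["sublime"], ["context"], ["reality"]],
}


def expand(symbol, rng, depth=0, max_depth=6):
    """Recursively expand a symbol into a list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal word
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the cutoff, drop productions that refer back to this symbol.
        options = [p for p in options if symbol not in p] or options
    words = []
    for part in rng.choice(options):
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words


def generate_sentence(rng=random):
    return " ".join(expand("SENTENCE", rng)).replace(" .", ".")


if __name__ == "__main__":
    print(generate_sentence())
```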

Josh has collected and connected different “Text Generators” to his journal, including an Adolescent Poetry Corner and a Time Cube screed generator. (For an explanation of Gene Ray’s Time Cube theory, see DmitryBrant.com » On Time Cube. The Time Cube site is another story.)

BBC: Text analysis of texting to catch criminals

The BBC have a story, Texting study to catch criminals, about researchers in the School of Psychology at Leicester University. They are studying the individual styles of text messages in order to help with forensic investigations. The BBC had previously reported, in Text messages examined in Danielle case, how text messages helped in the prosecution of the uncle of 15-year-old Danielle Jones; he seems to have sent text messages from Danielle’s phone in order to throw investigators off the scent.

The Leicester University researchers have a web page that welcomes people willing to submit samples. I wonder how useful anonymously submitted messages will be. I imagine that, if they get enough messages, they will have a control sample against which to compare particular suspect messages.
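The BBC story doesn’t say what methods the Leicester researchers use. One simple and widely used stylometric approach, sketched here in Python as an assumption, compares the character-bigram profile of a disputed message with profiles built from known samples using cosine similarity. The sample messages and author labels are made up.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count overlapping character n-grams, ignoring case."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(p, q):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items() if gram in q)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def closest_author(disputed, samples, n=2):
    """Return the author whose pooled samples best match the disputed message, plus all scores."""
    target = char_ngrams(disputed, n)
    scores = {author: cosine(target, char_ngrams(" ".join(texts), n))
              for author, texts in samples.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    samples = {
        "A": ["c u l8r m8", "gr8 c u 2moro m8"],
        "B": ["See you later, my friend.", "Talk to you tomorrow, friend."],
    }
    print(closest_author("c u 2nite m8", samples))
```

A large pool of anonymous submissions would play the role of a background population here: it shows how common a given habit (like “m8”) is in texting generally, and so how much a match should count.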

RDUES: WebCorp: The Web as Corpus

The Research and Development Unit for English Studies (RDUES) at UCE Birmingham has a tool, WebCorp: The Web as Corpus, which searches Google for a term, then goes to the top 199 documents Google identifies and searches them. It takes a while and works like our Googlizer, but produces more verbose results: a concordance organized by document, with links to a full word frequency list for each document. The advanced search form has some interesting features, including the ability to point it at other search engines.
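The concordance-by-document output can be sketched in Python. This assumes the documents have already been fetched (the search-engine step is left out) and that a simple regular-expression tokenizer is good enough; kwic and concordance_by_document are hypothetical names, not WebCorp’s.

```python
import re
from collections import Counter


def kwic(text, term, width=20):
    """Keyword-in-context lines: `width` characters either side of each hit."""
    lines = []
    pattern = r"\b" + re.escape(term) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines


def concordance_by_document(documents, term, width=20):
    """For each document containing the term, its KWIC lines and a word frequency list."""
    report = {}
    for name, text in documents.items():
        hits = kwic(text, term, width)
        if hits:
            report[name] = {
                "lines": hits,
                "frequencies": Counter(re.findall(r"\w+", text.lower())),
            }
    return report


if __name__ == "__main__":
    docs = {"doc1": "The web as corpus. The web is big.", "doc2": "Nothing here."}
    for name, entry in concordance_by_document(docs, "web", width=10).items():
        print(name)
        for line in entry["lines"]:
            print("  " + line)
```

Keeping the frequency list per document, rather than pooled, mirrors the WebCorp layout, where each document’s concordance links to its own word list.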

UCE Birmingham is a strange place from the web. UCE stands for “University of Central England”, and you have to go deep, to the At A Glance: History Of UCE Birmingham page, to find this out. (Apparently there’s no point explaining it to outsiders anywhere more prominent on the site.) The university seems to have been formed in 1992 out of the little colleges, polytechnics and schools in the area.