As always, someone else has already implemented any good idea. WebCorp: The Web as Corpus is an aggregator like the TAPoRware Googlizer that we are developing. We do more post-processing; theirs has other strengths. What can we learn from this tool? (Thanks to Ian Lancashire for this.)
Category: Text Analysis
News Bursts
Daypop Top News Bursts is a site that lists clusters of news stories around word bursts. They don't give the algorithm, but it seems to do something like what Google News does: it provides clusters of stories that have similar subjects and that show a "heightened useage of certain words…"
Can this be used on a single text? Could you treat a text's paragraphs as if each paragraph were a story in time? Sentences could then be pulled that best show the heightened usage of words. Something like that…
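Daypop does not publish its algorithm, so what follows is only a minimal Python sketch of the paragraph-as-time idea. It assumes a simple ratio score (a word's relative frequency in one paragraph divided by its relative frequency in the whole text) rather than a formal burst model such as Kleinberg's; the function names and the `min_count` and `top_words` parameters are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def burst_scores(paragraphs, min_count=2):
    """Score each word in each paragraph by local / overall relative frequency.

    A score above 1 means the word is used more heavily in that paragraph
    than in the text as a whole. Words seen fewer than min_count times in
    a paragraph are skipped for that paragraph.
    """
    counts = [Counter(tokenize(p)) for p in paragraphs]
    total = Counter()
    for c in counts:
        total.update(c)
    total_n = sum(total.values())
    scores = []
    for c in counts:
        n = sum(c.values())
        scores.append({
            w: (k / n) / (total[w] / total_n)
            for w, k in c.items() if k >= min_count
        })
    return scores


def burst_sentences(paragraphs, top_words=3):
    """For each paragraph, return its top burst words and the sentence
    containing the most of them (the first such sentence on a tie)."""
    results = []
    for para, scores in zip(paragraphs, burst_scores(paragraphs)):
        bursty = sorted(scores, key=scores.get, reverse=True)[:top_words]
        sentences = re.split(r"(?<=[.!?])\s+", para.strip())
        best = max(sentences, key=lambda s: len(set(bursty) & set(tokenize(s))))
        results.append((bursty, best))
    return results


paragraphs = [
    "The whale rose. The whale dove deep. Ships waited.",
    "The storm came at night. The storm broke the mast. Men prayed.",
]
for words, sentence in burst_sentences(paragraphs, top_words=1):
    print(words, sentence)
# ['whale'] The whale rose.
# ['storm'] The storm came at night.
```

On a real text one would probably drop stopwords first, since in short paragraphs a word like "the" can otherwise rank as bursty.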
Speed Reader
Rob of isagen and I were talking about different types of visualization and sonification of texts. One idea is a sonification of a text in which keywords are whispered from different directions, so that a text would be processed into a short sonic summary. Rob has built scrollers that show the news scrolling by in a window as a way of letting the user keep an eye on a (changing) text. I came across this Speed Reader, which does something similar.
What if one ran a process that summarized a text and the summary (a list of frequency-sorted words) was then played back through such a reader?
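One way to try this, sketched in Python under obvious simplifying assumptions: the "summary" is just the most frequent non-stopwords, and the "reader" flashes each word in place on a terminal at a fixed words-per-minute rate. The stopword list and the `summarize` and `speed_read` functions are hypothetical and are not part of the Speed Reader mentioned above.

```python
import re
import sys
import time
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def summarize(text, n=10):
    """Return up to n content words, most frequent first (ties keep text order)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(n)]


def speed_read(words, wpm=300, out=sys.stdout):
    """Flash words one at a time in the same spot, like a speed reader."""
    delay = 60.0 / wpm
    for w in words:
        out.write("\r" + w.center(20))  # "\r" returns to the start of the line
        out.flush()
        time.sleep(delay)
    out.write("\n")
```

For example, `speed_read(summarize(text), wpm=200)` would flash the ten most frequent content words of `text`, most frequent first. Whispered keywords from different directions would need an audio back end instead of `out.write`, but the summary step would stay the same.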
Gender Guessing
The Gender Genie will try to guess the gender of an author based on 500 words of text. It is a form of playful text analysis based on an algorithm developed by Koppel and Argamon.
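The post doesn't describe the algorithm itself. Assuming it works by comparing weighted counts of "female" and "male" feature words, a Python sketch of that kind of scoring looks like this; the word lists and weights below are placeholders invented for illustration, not Koppel and Argamon's published features.

```python
import re
from collections import Counter

# Placeholder weights for illustration only. These are NOT the feature
# words or weights published by Koppel and Argamon.
FEMALE_WEIGHTS = {"with": 5, "if": 4, "not": 3, "she": 6, "her": 6}
MALE_WEIGHTS = {"around": 4, "what": 3, "the": 1, "a": 2, "these": 4}


def guess_gender(text, min_words=500):
    """Compare weighted keyword counts; return (guess, female_score, male_score)."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) < min_words:
        raise ValueError(f"need at least {min_words} words, got {len(words)}")
    counts = Counter(words)
    female = sum(weight * counts[w] for w, weight in FEMALE_WEIGHTS.items())
    male = sum(weight * counts[w] for w, weight in MALE_WEIGHTS.items())
    if female > male:
        guess = "female"
    elif male > female:
        guess = "male"
    else:
        guess = "undecided"
    return guess, female, male
```

The 500-word minimum mirrors the sample size mentioned above; with fewer words the counts are too sparse for the comparison to mean much.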
Googlism
Googlism is a term coined by a site that uses Google to gather information about people, places, and events. You enter a word and it returns selected phrases that describe it. This strikes me as an example of smart text analysis and aggregation.
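A minimal sketch of the extraction half of this, assuming (as Googlism's results suggest) that the useful phrases are ones of the form "X is ..." and that the search-result snippets have already been fetched by other means. No Google API is called here, and `googlisms` is a hypothetical name.

```python
import re


def googlisms(term, snippets, max_words=12):
    """Collect distinct 'TERM is ...' phrases from text snippets.

    Each phrase runs from 'is' to the next sentence-ending punctuation mark,
    truncated to max_words words. Duplicates are dropped case-insensitively.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\s+is\s+([^.!?]+)", re.IGNORECASE)
    found, seen = [], set()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            words = match.group(1).split()[:max_words]
            phrase = f"{term} is " + " ".join(words)
            if phrase.lower() not in seen:
                seen.add(phrase.lower())
                found.append(phrase)
    return found


snippets = [
    "Montreal is a city in Quebec. It is cold.",
    "Some say montreal is a city in Quebec!",
    "Montreal is home to hockey fans.",
]
print(googlisms("Montreal", snippets))
# ['Montreal is a city in Quebec', 'Montreal is home to hockey fans']
```

The "aggregation" part is the deduplication: many pages repeat the same description, and collapsing them leaves a short list of distinct characterizations.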
Possible Tools
The following are possible tools for the TAPoR Tools prototype. The idea is to create tools that help summarize texts, such as TEI-tagged texts, in visual or literal ways.
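As one example of a "literal" summary, here is a Python sketch that counts the element types in a TEI document and the speeches per speaker (`<sp who="...">`), with a crude text bar chart standing in for a visual view. It assumes well-formed XML and ignores namespaces so that it works on both namespaced TEI P5 and older un-namespaced TEI; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.tei-c.org/ns/1.0}'."""
    return tag.rsplit("}", 1)[-1]


def summarize_tei(xml_text):
    """Count element types and speeches per speaker in a TEI document."""
    root = ET.fromstring(xml_text)
    elements = Counter()
    speakers = Counter()
    for el in root.iter():
        name = local_name(el.tag)
        elements[name] += 1
        if name == "sp":
            speakers[el.get("who", "(unknown)")] += 1
    return elements, speakers


def text_bars(counts, width=40):
    """Render a Counter as a horizontal bar chart of '#', largest first."""
    if not counts:
        return []
    top = max(counts.values())
    return [f"{name:<12} {'#' * round(width * n / top)}"
            for name, n in counts.most_common()]
```

For a play, `print("\n".join(text_bars(summarize_tei(xml)[1])))` would give a rough picture of who talks most.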
Rebecca V.1
Version 1 of Rebecca can be played without specialized software; you only need access to a server with the appropriate programming tools. The system is set up so that the players have read, write, and execute access to a directory. The start text (start.text) is placed in a subdirectory called "startText". When players make a move, they must create a new subdirectory named with their initials and the number of the move. In that subdirectory they place their code (move.program) and the resulting output.text. The game's root directory holds a shared HTML file, game.html, that keeps track of the moves and links to them.
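The bookkeeping described above is easy to script. The following Python sketch sets up the start directory and records a move; the directory naming (`GR03` for player GR's third move), the HTML layout of game.html, and the function names are assumptions, since the rules don't fix them.

```python
from pathlib import Path
import html

# Hypothetical layout for game.html: new moves are appended inside the <ol>.
PAGE = "<html><body><h1>Rebecca</h1>\n<ol>\n</ol>\n</body></html>\n"


def start_game(game_root, start_text):
    """Create startText/start.text and an empty game.html in the game root."""
    root = Path(game_root)
    (root / "startText").mkdir(parents=True, exist_ok=True)
    (root / "startText" / "start.text").write_text(start_text, encoding="utf-8")
    (root / "game.html").write_text(PAGE, encoding="utf-8")


def make_move(game_root, initials, move_number, program_source, output_text):
    """Record one move in its own subdirectory and link it from game.html."""
    root = Path(game_root)
    move_dir = root / f"{initials}{move_number:02d}"  # assumed naming, e.g. GR03
    move_dir.mkdir()  # raises FileExistsError rather than overwrite a move
    (move_dir / "move.program").write_text(program_source, encoding="utf-8")
    (move_dir / "output.text").write_text(output_text, encoding="utf-8")

    game = root / "game.html"
    page = game.read_text(encoding="utf-8")
    link = (f"<li>{html.escape(initials)} move {move_number}: "
            f'<a href="{move_dir.name}/move.program">code</a>, '
            f'<a href="{move_dir.name}/output.text">output</a></li>\n')
    game.write_text(page.replace("</ol>", link + "</ol>", 1), encoding="utf-8")
    return move_dir
```

This does no locking, so two players saving moves at the same moment could overwrite each other's update to game.html; with turn-based play that is unlikely to matter.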
Richard Powers Web Site
Richard Powers: American Novelist is a web site about the novelist. Powers was a programmer, among other things, and that shows in novels and short works like Galatea 2.2 and Literary Devices.
Text Toy Language 2
The grammar of TBC (Text by Code) should have:
– Primitives: Paths, Glyphs, Characters, Words, Strings, and Elements
– Methods: To get text, parse it into primitives, manipulate it, place it, render it, and make it move.
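To make the list concrete, here is a small Python sketch of what TBC primitives and methods might look like. It is a guess, not a specification: characters are plain one-letter strings, a Glyph is a character with a grid position, a String is simply the list of Words held by an Element, and rendering goes to a text grid rather than a screen. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Glyph:
    """A character plus the grid position where it will be drawn."""
    char: str
    x: int = 0
    y: int = 0


@dataclass
class Word:
    glyphs: list

    @property
    def text(self):
        return "".join(g.char for g in self.glyphs)


@dataclass
class Path:
    points: list  # (x, y) cells that glyphs are laid along, in order


@dataclass
class Element:
    """A named container whose words make up its String."""
    name: str
    words: list = field(default_factory=list)


def make_word(s):
    return Word([Glyph(c) for c in s])


def parse(text, name="p"):
    """Get text and parse it into an Element of Words made of Glyphs."""
    return Element(name, [make_word(w) for w in text.split()])


def manipulate(element, fn):
    """Apply a string transformation (e.g. str.upper) to every word."""
    element.words = [make_word(fn(w.text)) for w in element.words]


def place(element, path):
    """Lay glyphs along the path, leaving one empty cell between words."""
    needed = sum(len(w.glyphs) for w in element.words) + max(len(element.words) - 1, 0)
    if len(path.points) < needed:
        raise ValueError(f"path has {len(path.points)} points, needs {needed}")
    cells = iter(path.points)
    for i, word in enumerate(element.words):
        if i:
            next(cells)  # skip one cell as the gap between words
        for g in word.glyphs:
            g.x, g.y = next(cells)


def move(element, dx, dy):
    """Shift every glyph; calling this repeatedly animates the text."""
    for word in element.words:
        for g in word.glyphs:
            g.x += dx
            g.y += dy


def render(element, width, height):
    """Draw the element onto a width x height character grid."""
    grid = [[" "] * width for _ in range(height)]
    for word in element.words:
        for g in word.glyphs:
            if 0 <= g.x < width and 0 <= g.y < height:
                grid[g.y][g.x] = g.char
    return "\n".join("".join(row).rstrip() for row in grid)


e = parse("hi yo")
manipulate(e, str.upper)
place(e, Path([(i, i) for i in range(5)]))  # a diagonal path
print(render(e, 5, 5))
```

Keeping placement (`place`) separate from drawing (`render`) means the same parsed text can be laid along a line, a diagonal, or a curve without reparsing it.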