Solr: Open search server

Solr is a “search server” based on Lucene that offers “Advanced, Configurable Text Analysis” and XML handling.

Text fields are typically indexed by breaking the field into words and applying various transformations such as lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally applied to any queries in order to match what is indexed. (Tutorial)
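To make the idea in that passage concrete, here is a toy sketch in Python. It is not Solr's or Lucene's actual analyzer chain; the function names (`analyze`, `build_index`, `search`) and the naive plural rule are assumptions made for illustration. The point it shows is that the same transformations run at index time and at query time, so that “FIELDS” can match “Fields”.

```python
import re


def analyze(text):
    """Toy analysis chain: split into words, lowercase, strip a trailing plural 's'."""
    terms = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        term = token.lower()
        # Naive plural removal (an assumption); real stemmers handle far more cases.
        if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
            term = term[:-1]
        terms.append(term)
    return terms


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every analyzed query term."""
    results = None
    for term in analyze(query):
        ids = set(index.get(term, set()))
        results = ids if results is None else results & ids
    return results or set()


docs = {1: "Searching Text Fields", 2: "A field guide to servers"}
index = build_index(docs)
print(search(index, "FIELDS"))
print(search(index, "Servers"))
```

Because the query passes through `analyze` just as the documents did, differences of case and simple plurals no longer prevent a match.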

IDC White Paper: The Digital Universe

In an earlier blog I mentioned the IDC report, The Digital Universe, about the explosion of digital information. It was commissioned by EMC Corporation and is available free on their site, here. They also have a page on related information which includes a link to “Are You an Informationist?” and “The Inforati Files”.

The PDF of the IDC White Paper includes some interesting points:

  • Between 2006 and 2010, the information added annually to the digital universe will increase more than six fold from 161 exabytes to 988 exabytes.
  • Three major analog to digital conversions are powering this growth – film to digital image capture, analog to digital voice, and analog to digital TV.
  • Images, captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe. They are replicated over the Internet, on
    private organizational networks, by PCs and servers, in data centers, in digital TV broadcasts, and on digital projection movie screens. building automation and security migrates to IP networks, surveillance goes digital, and RFID and sensor networks
    proliferate.

Is it time to rewrite “The Work of Art in the Age of Mechanical Reproduction” to think about “The Image in the Age of Networked Distribution”?

Scribd: Put your docs online

Scribd.com lets you post documents and then, like other social network sites, lets others comment on them or embed them elsewhere. Most of the documents seem to be silly, like a Girlfriend Application, but the system has some neat features. It presents the documents as PDFs with a custom viewer. It has a text-to-speech synthesizer that reads the document aloud and shows basic statistics about the document. It seems heavily influenced by Flickr.

Swivel: When Sharks Attack!

Swivel is a simple site where people can upload data sets and then graph them against each other. You get graphs from the intriguing, like When Sharks Attack! vs. NASDAQ Adjusted Close, to the serious, as in Monthly Iraqi Civilian Deaths vs. Coalition Military Deaths (seen in picture).

With Swivel you can explore other people’s data, graph different data sets, comment on graphs, and blog your results. It is a clean idea to get people experimenting with data. I wonder how we could provide something like this for texts?

Thanks to Sean for this.

Society for Textual Scholarship Presentation

Last Thursday I gave a paper on “The Text of Tools” at the Society for Textual Scholarship annual conference in New York. I was part of a session on Digital Textuality with Steven E. Jones and Matthew Kirschenbaum. Steven gave a fascinating paper on “The Meaning of Video Games: A Textual Studies Approach” which looked at games as texts whose history of production and criticism can be studied, just as textual scholars study manuscripts and editions. He is proposing an alternative to the ludology vs. narrativity approaches to games – one that looks at their material production and reception.

Matt Kirschenbaum presented a paper titled “Shall These Bits Live?” (see the trip report with the same title) that looked at preservation and access to games. He talked about his experience studying the Michael Joyce archives at the Harry Ransom Humanities Research Center. He made the argument that what we should be preserving are the conditions of playing games, not necessarily the game code (the ROMs), or the machines. He pointed to projects like MANS (Media Art Notation System) – an attempt to document a game the way a score documents the conditions for recreating a performance. This reminds me of HyTime, the now defunct attempt to develop an SGML standard for hypermedia.

In my paper, “The Text of Tools”, I presented a tour through the textuality of TAPoR that tried to show the ways texts are tools and tools are texts, so that interpretation is always an analysis of what went before that produces a new text/tool.

Update. Matt has sent me a clarification regarding preserving the game code or machines:

I’d actually make a sharp distinction between preserving the code and the machines. The former is always necessary (though never sufficient); the latter is always desirable (at least in my view, though others at the Berkeley meeting would differ), but not always feasible and is expendable more often than we might think. I realize I may not have been as clear as I needed to be in my remarks, but the essential point was that the built materiality of a Turing computer is precisely that it is a machine engineered to render its own artifactual dimension irrelevant. We do no favors to materiality of computation by ignoring this (which is what one of the questioners seemed to want).

tiddlyspot

I blogged before about TiddlyWiki, the amazing self-contained (HTML, CSS and JavaScript) wiki in a web page. I’ve now come across tiddlyspot, where you can create a server-based TiddlyWiki that can be private or public.

I’m convinced that between services like tiddlyspot, Ning.com, Blogger.com and Flickr.com you can create a robust distributed web presence without needing an ISP. Push your content out into the world.

Wikipedia: Book sources

The Wikipedia has a cool book source lookup tool that I just noticed. If you have a book with the ISBN of “9780304349616” you can create a link like this, The Cassell guide to punctuation which goes to “http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=9780304349616”. This opens a page where you can find the book in most accessible card catalogues like Toronto Public Library. The system lets Wikipedia references be followed to local libraries where you could get the book. I should get into the habit of tagging references online this way.
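As a rough sketch of how such links could be generated, the Python below builds the URL quoted above from an ISBN. The helper names (`is_valid_isbn13`, `book_sources_url`) are made up for this example, and the ISBN-13 check-digit test is an extra I have added to catch typos; it is not something the Wikipedia page requires.

```python
from urllib.parse import urlencode


def is_valid_isbn13(isbn):
    """Check the ISBN-13 check digit: weights alternate 1, 3 and the sum must end in 0."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(isbn))
    return total % 10 == 0


def book_sources_url(isbn):
    """Build a Special:Booksources link in the form quoted in the post."""
    cleaned = isbn.replace("-", "").replace(" ", "")
    if not is_valid_isbn13(cleaned):
        raise ValueError("not a valid ISBN-13: " + isbn)
    # safe=":" keeps the colon in "Special:Booksources" unescaped, as in the original URL.
    query = urlencode({"title": "Special:Booksources", "isbn": cleaned}, safe=":")
    return "http://en.wikipedia.org/w/index.php?" + query


print(book_sources_url("978-0-304-34961-6"))
```

Hyphens and spaces are stripped first, so an ISBN copied straight from a book's copyright page should work as well as the bare digits.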

TagCrowd

TagCrowd is a tool that lets you generate a word cloud from text typed in or uploaded. It has a nice clean interface. Unlike our TAPoRware Word Cloud tool, the results are HTML, so they can be easily integrated into a web page like this:

created at TagCrowd.com


They have a long list of blacklists of words. I wonder where they came from. Thanks to Paola for this.
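For the curious, an HTML word cloud of this general kind can be sketched in a few lines of Python. This is not TagCrowd's code or its markup: the function name `word_cloud_html`, the tiny stop-word list, and the pixel font sizes are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from html import escape

# A tiny illustrative blacklist; the lists TagCrowd uses are far longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def word_cloud_html(text, max_words=10, min_size=10, max_size=30):
    """Return a paragraph of <span> elements sized by word frequency."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    counts = Counter(words).most_common(max_words)
    if not counts:
        return "<p></p>"
    top, low = counts[0][1], counts[-1][1]
    spans = []
    for word, count in sorted(counts):  # alphabetical order, as clouds usually are
        if top == low:
            size = max_size
        else:
            # Scale linearly between min_size and max_size by frequency.
            size = min_size + (max_size - min_size) * (count - low) // (top - low)
        spans.append(f'<span style="font-size:{size}px">{escape(word)}</span>')
    return "<p>" + " ".join(spans) + "</p>"


print(word_cloud_html("text tools and tools are texts tools"))
```

Since the output is ordinary HTML, it can be pasted into a blog post or page template without any image handling.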

Robotic age poses ethical dilemma

The BBC has a story about roboethics, Robotic age poses ethical dilemma, triggered by a South Korean initiative to develop a Robot Ethics Charter as part of a focus on robotics as a growth area.

In the past, robots were considered just a useful tool in the manufacturing industry. But it is gradually embedded in human life by cleaning homes, protecting them from thieves and providing education. Nowadays robots are also used to rescue people at accident spots such as fires.

This year, various robots are to be introduced: a robot that teaches English and sings songs to children, a robot that guides people at the post office and a robot designed to save people at disaster areas. (Korea.net, Robots, cars, batteries hold key to future growth)

Poking around I found this Painter Robot from Yahoh. (Sounds like Yahoo to me.) The BBC story also mentions the Roboethics.org – Official Roboethics website which has issued a Roboethics Roadmap.

Roboethics is the ethics applied to Robotics, guiding the design, construction and use of the robots.
In this site you may find: birth and history of Roboethics; all the information concerning the development of the concept of a human-centered Roboethics; the events which have marked the update of the original proposal; the international projects on Roboethics; the EURON Roboethics Roadmap; the activity of the IEEE-RAS Technical Committee on Roboethics.

Thanks to Daryl for this link.

Hairy Messaging

Hairy Mail is the most unusual messaging environment I’ve encountered. You write a message and it spreads Sodium Hydroxide (found in hair removal and cigarettes) over a hairy back in the shape of your message. If you press OK it removes the hair from the back.

So, what’s the point? Well, it’s part of a site, thetruth.com, which promotes an anti-smoking message. The point is that Sodium Hydroxide is found in cigarettes, which can’t be good. The Hairy-Mail Flash toy sends your message as an e-mail.