ForensicXP for forensic document analysis

ForensicXP is a device for forensic document imaging. It combines 3D imaging with chemical analysis to do hyperspectral imaging and processing. This can be used to recover “obliterated” writing, to figure out the sequence in which lines were drawn (what lines/words were drawn first), and to detect additions and substitutions. Obviously it also helps identify the chemistry (ink) used.

Thanks to John for this.

Kirschenbaum: Hamlet.doc?

Matt Kirschenbaum has published an article in The Chronicle of Higher Education titled Hamlet.doc? Literature in a Digital Age (from the issue of August 17, 2007). The article teases us with the question of what we scholars could learn about the writing of Hamlet if Shakespeare had left us his hard drive. Kirschenbaum has nicely described and theorized the digital archival work humanists will need to learn to do in his forthcoming book from MIT Press, Mechanisms. Here is the conclusion of the Chronicle article,

Literary scholars are going to need to play a role in decisions about what kind of data survive and in what form, much as bibliographers and editors have long been advocates in traditional library settings, where they have opposed policies that tamper with bindings, dust jackets, and other important kinds of material evidence. To this end, the Electronic Literature Organization, based at the Maryland Institute for Technology in the Humanities, is beginning work on a preservation standard known as X-Lit, where the “X-” prefix serves to mark a tripartite relationship among electronic literature’s risk of extinction or obsolescence, the experimental or extreme nature of the material, and the family of Extensible Markup Language technologies that are the technical underpinning of the project. While our focus is on avant-garde literary productions, such literature has essentially been a test bed for a future in which an increasing proportion of documents will be born digital and will take fuller advantage of networked, digital environments. We may no longer have the equivalent of Shakespeare’s hard drive, but we do know that we wish we did, and it is therefore not too late – or too early – to begin taking steps to make sure we save the born-digital records of the literature of today.

Blacklight: Faceted searching at UVA

Blacklight is a neat project at the University of Virginia that Bethany Nowviskie pointed me to. They have indexed some 3.7 million records from their library online catalogue and set up a faceted search and browse tool.

What is faceted searching and browsing? Traditionally, search environments like those for finding items in a library have you fill in fields. In Blacklight you can search with words, but you can also add constraints by clicking on categories within the metadata. So, if I search for “gone with the wind” in Blacklight it shows that there are 158 results. On the right it shows how those results are distributed over different categories. It shows me that 41 of these are “BOOK” in the category “format”. If I click on “BOOK” it adds that constraint and updates the categories I can use further. Blacklight makes good use of inline graphics (pie charts) so you can see at a glance what percentage of the remaining results fall in each category.

This faceted browsing is a nice example of a rich-prospect view on data where you can see and navigate by a “prospect” of the whole.

Blacklight came out of work on Collex. It is built on Flare, which harnesses Solr through Ruby on Rails. As I understand it, Blacklight is also interesting as an open-source, experimental alternative to very expensive faceted browsing tools. It is a “love letter to the Library” from a humanities computing project and its programmer.
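To make the mechanics concrete, here is a minimal sketch (in Python, against a generic Solr install) of the kind of faceted query a tool like Blacklight issues behind the scenes. The endpoint, the “format” field name, and the query are my own illustrative guesses, not Blacklight’s actual configuration.

    from urllib.parse import urlencode
    from urllib.request import urlopen
    import json

    params = {
        "q": "gone with the wind",   # the keyword search
        "fq": 'format:"BOOK"',       # a constraint added by clicking a facet value
        "facet": "true",             # ask Solr to return facet counts
        "facet.field": "format",     # count the remaining results per format
        "wt": "json",                # JSON response
    }
    url = "http://localhost:8983/solr/select?" + urlencode(params)
    with urlopen(url) as response:
        data = json.load(response)

    # Solr returns facet counts as flattened value/count pairs,
    # e.g. ["BOOK", 41, "VIDEO", 12, ...]
    print(data["facet_counts"]["facet_fields"]["format"])

The point is that the facet counts come back with every query, which is what lets the interface redraw those pie charts each time you add or remove a constraint.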

Where is the Semantic Web?

Where is the Semantic Web? In the face of Web 2.0 hype, the semantic web meme seems to be struggling. Tim Berners-Lee, in the slides from a 2003 talk, says there is “no such thing” as a killer-app for the semantic web, that “its the integration, stupid!” (slide 7 of 35). The problem is that mashups are giving users usable integration now. The difference is that mashups are usually built around one large content portal like Flickr, off which little satellite tools feed. The semantic web was a much more democratic idea of integration.
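To make that contrast concrete, here is a toy sketch (Python with the rdflib library) of what democratic integration is supposed to look like: two bits of RDF published independently about the same book combine into one queryable graph simply because they share a URI. The data and vocabulary here are made up for illustration.

    from rdflib import Graph

    library_record = """
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    <urn:isbn:9780304349616> dc:title "The Cassell Guide to Punctuation" .
    """

    bookseller_record = """
    @prefix ex: <http://example.org/terms/> .
    <urn:isbn:9780304349616> ex:inStock "true" .
    """

    g = Graph()
    g.parse(data=library_record, format="turtle")     # one publisher's data
    g.parse(data=bookseller_record, format="turtle")  # another's, made independently

    # Because both sources describe the same URI, the triples combine
    # with no glue code written for either site.
    for subject, predicate, obj in g:
        print(subject, predicate, obj)

No single portal sits in the middle here, which is exactly the integration story the semantic web promised and the mashups deliver only around sites like Flickr.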

Google’s Peter Norvig is quoted in Google exec challenges Berners-Lee saying that there are three problems with the semantic web:

  • Incompetence: users don’t know how to use HTML in a standard way let alone RDF.
  • Competition: companies that are in a leadership position don’t like to use open standards that could benefit others, they like to control the standards to their advantage.
  • Trust: too many people try to trick systems to change the visibility of their pages (selling Viagra).

In a 2006 Guardian report, Spread the word, and join it up, SA Mathieson quotes Berners-Lee to the effect that they (the semantic web folk) haven't shown useful stuff. The web of TBL was a case of less is more (compared to SGML and other hypertext systems); the semantic web may lose out to all the creative mashups that are less standardized and more useful.

Society for Textual Scholarship Presentation

Last Thursday I gave a paper on “The Text of Tools” at the Society for Textual Scholarship annual conference in New York. I was part of a session on Digital Textuality with Steven E. Jones and Matthew Kirschenbaum. Steven gave a fascinating paper on “The Meaning of Video Games: A Textual Studies Approach”, which looked at games as texts whose history of production and criticism can be studied, just as textual scholars study manuscripts and editions. He is proposing an alternative to the ludology vs. narratology approaches to games – one that looks at their material production and reception.

Matt Kirschenbaum presented a paper titled “Shall These Bits Live?” (see the trip report with the same title) that looked at preservation of and access to games. He talked about his experience studying the Michael Joyce archives at the Harry Ransom Humanities Research Center. He made the argument that what we should be preserving are the conditions of playing games, not necessarily the game code (the ROMs) or the machines. He pointed to projects like MANS (Media Art Notation System) – an attempt to document a game the way a score documents the conditions for recreating a performance. This reminds me of HyTime, the now defunct attempt to develop an SGML standard for hypermedia.

In my paper, “The Text of Tools”, I presented a tour through the textuality of TAPoR that tried to show the ways texts are tools and tools are texts, so that interpretation is always an analysis of what went before that produces a new text/tool.

Update. Matt has sent me a clarification regarding preserving the game code or machines,

I’d actually make a sharp distinction between preserving the code and the machines. The former is always necessary (though never sufficient); the latter is always desirable (at least in my view, though others at the Berkeley meeting would differ), but not always feasible and is expendable more often than we might think. I realize I may not have been as clear as I needed to be in my remarks, but the essential point was that the built materiality of a Turing computer is precisely that it is a machine engineered to render its own artifactual dimension irrelevant. We do no favors to materiality of computation by ignoring this (which is what one of the questioners seemed to want).

Wikipedia: Book sources

The Wikipedia has a cool book source lookup tool that I just noticed. If you have a book with the ISBN “9780304349616” you can create a link like this, The Cassell guide to punctuation, which goes to “http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=9780304349616”. This opens a page where you can find the book in most accessible card catalogues, like the Toronto Public Library’s. The system lets Wikipedia references be followed to local libraries where you could get the book. I should get into the habit of tagging references online this way.
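Generating such a link is trivial. Here is a tiny Python sketch that builds the lookup URL described above from an ISBN; the booksources_link helper is just my own illustration, not part of any Wikipedia tool.

    def booksources_link(isbn: str) -> str:
        """Build a Wikipedia book-sources lookup URL for an ISBN."""
        isbn = isbn.replace("-", "")  # accept hyphenated or plain ISBNs
        return ("http://en.wikipedia.org/w/index.php"
                "?title=Special:Booksources&isbn=" + isbn)

    print(booksources_link("978-0-304-34961-6"))
    # -> http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=9780304349616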

Hairy Messaging

Hairy Mail is the most unusual messaging environment I’ve encountered. You write a message and it spreads Sodium Hydroxide (found in hair removal products and in cigarettes) over a hairy back in the shape of your message. If you press OK it removes the hair from the back.

So, what’s the point? Well, it’s part of thetruth.com, a site which promotes an anti-smoking message. The point is that Sodium Hydroxide is found in cigarettes, which can’t be good. The Hairy-Mail Flash toy sends your message as an e-mail.

Time to learn your exabytes: Tech researchers calculate wide world of data

161 exabytes of information was generated last year according to a CBC.ca story, Time to learn your exabytes: Tech researchers calculate wide world of data by Brian Bergstein (March 5, 2007). That is way up from the estimate in How Much Information? 2003 that I blogged before. The study quotes John F. Gantz of IDC, but I can’t find the paper on the IDC site.

Wired News also has a version of the story, but again they link to the general IDC site.

Thanks to Matt and Mike for this.

The Exchange Online

Robert Townsend has a Review of the ACLS Cyberinfrastructure Report in The Exchange Online of the Association of American University Presses. He is critical of the report, arguing,

To make its case, the commission simply ignores skeptics who ask whether the rush to mass digitization could hurt reading and scholarship, and whether there might be other casualties on this road to progress. This offers a rather narrow view of the “grand challenges” facing the humanities and social sciences, and limits the array of problems that might be remedied by a developed cyberinfrastructure. This seems part of a larger rhetorical strategy in the report, however, which positions potential problems and the costs of digitization as external to its vision of technological progress—limiting them to social, political, or financial failures that can be assigned to publishers and “conservative” academics.

He rightly points to the ongoing costs of maintaining digital projects: “Like Jacob Marley’s chains, link-by-link we forge these digital burdens that we can never seem to lay down.” (Great image.) He is worried about the place of non-profit publishers, who might get left behind if there is massive investment in cyberinfrastructure that goes to the universities, who then cut out the publishers. I’m tempted to say that this is an old refrain, but that doesn’t make the issue go away. Frankly, I doubt cyberinfrastructure investment will endanger quality publishers, but it may change their relationship with the academy. More importantly, I think the Report (see previous blog entry) was making the case for investment in humanities and arts cyberinfrastructure so we can do our research, including research around digital publications.