TextAnalyst is text mining and analysis software from Megaputer. Without buying it, it is hard to tell what it does. They do have what sounds like a neat plug-in for IE that analyzes the web page you are looking at (see the screenshot with this post). The plug-in, TextAnalyst for Microsoft Internet Explorer, summarizes web pages, provides a semantic network, and allows natural language querying.
Where is the Semantic Web?
Where is the Semantic Web? In the face of Web 2.0 hype, the semantic web meme seems to be struggling. Tim Berners-Lee, in the slides from a 2003 talk, says there is “no such thing” as a killer app for the semantic web, that “it’s the integration, stupid!” (slide 7 of 35). The problem is that mashups are giving users usable integration now. The difference is that mashups are usually built around one large content portal, like Flickr, that small satellite tools then feed off. The semantic web was a much more democratic idea of integration.
Google’s Peter Norvig is quoted in “Google exec challenges Berners-Lee” as saying that there are three problems with the semantic web:
- Incompetence: users don’t know how to use HTML in a standard way let alone RDF.
- Competition: companies that are in a leadership position don’t like to use open standards that could benefit others, they like to control the standards to their advantage.
- Trust: too many people try to trick systems to change the visibility of their pages (to sell Viagra, for example).
In a 2006 Guardian report, “Spread the word, and join it up”, SA Mathieson quotes Berners-Lee to the effect that they (the semantic web folk) haven’t shown useful stuff. The web of TBL was a case of less is more (compared to SGML and other hypertext systems); the semantic web may lose out to all the creative mashups that are less standardized and more useful.
Solr: Open search server
Solr is a “search server” based on Lucene that offers “Advanced, Configurable Text Analysis” and XML handling. 
Text fields are typically indexed by breaking the field into words and applying various transformations such as lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally applied to any queries in order to match what is indexed. (Tutorial)
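To make the tutorial's description concrete, here is a minimal sketch of the idea in Python. It is not Solr or Lucene code: the function names, the toy index, and the crude plural rule are all my own assumptions. The point it illustrates is that the same analysis chain runs over both the documents and the query.

```python
import re

# Toy analysis chain: tokenize, lowercase, strip simple plurals.
# This is NOT Solr's or Lucene's analyzer code; the function names
# and the crude plural rule are assumptions made for illustration.

def tokenize(text):
    """Split text into word tokens on non-alphanumeric characters."""
    return re.findall(r"[A-Za-z0-9]+", text)

def strip_plural(token):
    """Very rough plural removal: drop a trailing 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def analyze(text):
    """Apply the same chain to documents and to queries."""
    return [strip_plural(tok.lower()) for tok in tokenize(text)]

def build_index(docs):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    results = [index.get(term, set()) for term in terms]
    return set.intersection(*results)

docs = {
    1: "Search Servers index text fields.",
    2: "A server breaks each field into words.",
}
index = build_index(docs)
print(search(index, "SERVER fields"))
```

Because the query “SERVER fields” is lowercased and stripped of plurals just like the indexed text, it matches both sample documents even though neither contains those exact strings. A real stemmer would be far more careful than the trailing-s rule used here.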
IDC White Paper: The Digital Universe
In an earlier blog I mentioned the IDC report, The Digital Universe, about the explosion of digital information. It was commissioned by EMC Corporation and is available free on their site, here. They also have a page on related information which includes a link to “Are You an Informationist?” and “The Inforati Files”. 
The PDF of the IDC White Paper includes some interesting points:
- Between 2006 and 2010, the information added annually to the digital universe will increase more than six fold from 161 exabytes to 988 exabytes.
- Three major analog to digital conversions are powering this growth – film to digital image capture, analog to digital voice, and analog to digital TV.
- Images, captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe. They are replicated over the Internet, on private organizational networks, by PCs and servers, in data centers, in digital TV broadcasts, and on digital projection movie screens. [...] building automation and security migrates to IP networks, surveillance goes digital, and RFID and sensor networks proliferate.
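The “more than six fold” figure in the first point can be checked with a few lines of Python. The two exabyte values come from the white paper; the implied yearly rate assumes steady compounding over the four years, which is my simplification and not something IDC states.

```python
# Figures quoted from the white paper (exabytes added per year).
added_2006 = 161
added_2010 = 988
years = 2010 - 2006

growth_factor = added_2010 / added_2006

# Constant yearly rate that would produce the same overall growth;
# the assumption of steady compounding is mine, not IDC's.
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.2f}x")
print(f"implied annual growth: {annual_rate:.0%}")
```

The ratio works out to a little over 6.1, and under the steady-compounding assumption that is roughly 57 percent growth each year.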
Is it time to rewrite “The Work of Art in the Age of Mechanical Reproduction” to think about “The Image in the Age of Networked Distribution”?
Scribd: Put your docs online
Scribd.com lets you post documents and then, like other social network sites, lets others comment on them or embed them elsewhere. Most of the documents seem to be silly, like a Girlfriend Application, but the system has some neat features. It presents the documents as PDFs with a custom viewer. It has a text-to-speech synthesizer that reads the document out and basic statistics about the document. It seems heavily influenced by Flickr.
Swivel: When Sharks Attack!
Swivel is a simple site where people can upload data sets and then graph them against each other. The graphs range from the intriguing, like When Sharks Attack! vs. NASDAQ Adjusted Close, to the serious, like Monthly Iraqi Civilian Deaths vs. Coalition Military Deaths (seen in the picture).
With Swivel you can explore other people’s data, graph different data sets, comment on graphs, and blog your results. It is a clean idea to get people experimenting with data. I wonder how we could provide something like this for texts?
Thanks to Sean for this.
Society for Textual Scholarship Presentation
Last Thursday I gave a paper on “The Text of Tools” at the Society for Textual Scholarship annual conference in New York. I was part of a session on Digital Textuality with Steven E. Jones and Matthew Kirschenbaum. Steven gave a fascinating paper on “The Meaning of Video Games: A Textual Studies Approach” which looked at games as texts whose history of production and criticism can be studied, just as textual scholars study manuscripts and editions. He is proposing an alternative to the ludology vs. narrativity approaches to games – one that looks at their material production and reception.
Matt Kirschenbaum presented a paper titled “Shall These Bits Live?” (see the trip report with the same title) that looked at preservation and access to games. He talked about his experience studying the Michael Joyce archives at the Harry Ransom Humanities Research Center. He made the argument that what we should be preserving are the conditions of playing games, not necessarily the game code (the ROMs), or the machines. He pointed to projects like MANS (Media Art Notation System), an attempt to document a game the way a score documents the conditions for recreating a performance. This reminds me of HyTime, the now defunct attempt to develop an SGML standard for hypermedia.
In my paper, “The Text of Tools” I presented a tour through the textuality of TAPoR that tried to show the ways texts are tools and tools are texts so that interpretation is always an analysis of what went before that produces a new text/tool.
Update. Matt has sent me a clarification regarding preserving the game code or machines:
I’d actually make a sharp distinction between preserving the code and the machines. The former is always necessary (though never sufficient); the latter is always desirable (at least in my view, though others at the Berkeley meeting would differ), but not always feasible and is expendable more often than we might think. I realize I may not have been as clear as I needed to be in my remarks, but the essential point was that the built materiality of a Turing computer is precisely that it is a machine engineered to render its own artifactual dimension irrelevant. We do no favors to materiality of computation by ignoring this (which is what one of the questioners seemed to want).
tiddlyspot
I blogged before about TiddlyWiki, the amazing self-contained (HTML, CSS and JavaScript) wiki in a web page. I’ve now come across tiddlyspot, where you can create a server-based TiddlyWiki that can be private or public.
I’m convinced that between services like tiddlyspot, Ning.com, Blogger.com and Flickr.com you can create a robust distributed web presence without needing an ISP. Push your content out into the world.
Wikipedia: Book sources
Wikipedia has a cool book source lookup tool that I just noticed. If you have a book with the ISBN “9780304349616”, you can create a link like this, The Cassell guide to punctuation, which goes to “http://en.wikipedia.org/w/index.php?title=Special:Booksources&isbn=9780304349616”. This opens a page where you can find the book in most accessible card catalogues, like the Toronto Public Library. The system lets Wikipedia references be followed to local libraries where you could get the book. I should get into the habit of tagging references online this way.
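If I wanted to generate such links automatically, a small helper along these lines might do. This is only a sketch: the base address and the `title` and `isbn` parameters are copied from the link above, while the function names and the choice to accept only 13-digit ISBNs are my own assumptions.

```python
from urllib.parse import urlencode

# Base address copied from the link in the post.
BASE_URL = "http://en.wikipedia.org/w/index.php"

def is_valid_isbn13(isbn):
    """Check the standard ISBN-13 check digit (weights 1 and 3 alternate)."""
    if len(isbn) != 13 or not isbn.isdecimal():
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(isbn))
    return total % 10 == 0

def book_sources_url(isbn):
    """Build a Book sources link; hyphens and spaces in the ISBN are dropped."""
    digits = isbn.replace("-", "").replace(" ", "")
    if not is_valid_isbn13(digits):
        raise ValueError(f"not a valid ISBN-13: {isbn!r}")
    # safe=":" keeps the colon in "Special:Booksources" unescaped,
    # so the result matches the form of the link quoted in the post.
    query = urlencode({"title": "Special:Booksources", "isbn": digits}, safe=":")
    return f"{BASE_URL}?{query}"

print(book_sources_url("978-0-304-34961-6"))
```

The check digit test catches most typing mistakes before a dead link gets published. Older 10-digit ISBNs would need their own check, which this sketch leaves out.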
TagCrowd
TagCrowd is a tool that lets you generate a word cloud from text typed in or uploaded. It has a nice clean interface. Unlike our TAPoRware Word Cloud tool, the results are HTML, so they can be easily integrated into a web page.
They have a long list of word blacklists. I wonder where they came from. Thanks to Paola for this.
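For the curious, here is a rough sketch of how an HTML word cloud with a blacklist could be produced. It is not TagCrowd's method, which I have not seen; the stop list, the function names, and the linear font scaling are all assumptions for illustration.

```python
import html
import re
from collections import Counter

# Tiny illustrative stop list; TagCrowd's real lists are not reproduced here.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "to"}

def word_counts(text, stop_words=STOP_WORDS, top_n=10):
    """Count lowercase words, skipping stop words; keep the top_n."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)

def cloud_html(counts, min_size=12, max_size=36):
    """Render (word, count) pairs as alphabetized spans scaled by count."""
    if not counts:
        return '<div class="cloud"></div>'
    highest = max(c for _, c in counts)
    lowest = min(c for _, c in counts)
    spread = (highest - lowest) or 1  # avoid dividing by zero
    spans = []
    for word, count in sorted(counts):
        size = min_size + (count - lowest) * (max_size - min_size) // spread
        spans.append(
            f'<span style="font-size:{size}px" title="{count}">'
            f"{html.escape(word)}</span>"
        )
    return '<div class="cloud">' + " ".join(spans) + "</div>"

sample = "The text of tools: tools are texts and texts are tools."
print(cloud_html(word_counts(sample)))
```

The output is a single `div` of sized `span` elements that can be pasted into any page. Note that my toy stop list lets “are” through, which shows how much the blacklist shapes what the cloud ends up saying.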