Solr: Open search server

Solr is a “search server” based on Lucene that offers “Advanced, Configurable Text Analysis” and XML handling.

Text fields are typically indexed by breaking the field into words and applying various transformations such as lowercasing, removing plurals, or stemming to increase relevancy. The same text transformations are normally applied to any queries in order to match what is indexed. (Tutorial)
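To make the quoted idea concrete, here is a minimal sketch in Python of running the same analysis over both indexed text and queries. It is not Solr's or Lucene's analyzer: the function names (`analyze`, `strip_plural`, `build_index`, `search`) are my own, and the plural-stripping rule is a deliberately crude stand-in for real stemming.

```python
import re


def strip_plural(token):
    """Crude plural removal; an assumption for illustration, not a real stemmer."""
    if len(token) > 3 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text):
    """Break text into words, lowercase them, and strip simple plurals."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [strip_plural(t) for t in tokens]


def build_index(docs):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Analyze the query the same way as the documents, then intersect matches."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))
```

Because the query passes through the same `analyze` function as the documents, a query for "server library" can match a document that says "Servers and Libraries", which is the point the tutorial is making.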

State of the Union Visualization

Brad Borevitz of onetwothree.net has developed another visualization of language in presidential State of the Union Addresses at State of the Union. He calls it a “data toy”, and it combines a number of different graphs. One nice feature is that if you click on one Address and then another, the word cloud for the first appears in red behind the second so that you can compare them.

I have blogged about other such visualization toys that use the State of the Union Addresses, like the State of the Union Parsing Tool and the SOTU Rich Prospect Browsing of the New York Times.

Thanks to Nick for this.

CNW Group: Mediavantage

The CNW Group, which has its Canadian base in Toronto, has a new service called MEDIAVANTAGE that has many of the features of a multimedia news crawling, managing, and visualizing service. From the Flash intro it looks like users define keywords to track. Mediavantage then shows you results from different sources. It can send alerts and graph the history of results.
The interesting part is that they track TV news and provide text summaries that look like they are taken from the closed captioning. Subsets of results can be shared by e-mail and PDF. This is a news mining tool for business that offers a model for what Web Mining for Research might look like.

Thanks to Terry for this.

Web Mining for Research

What’s Web Mining for Research is a white paper I wrote on the TADA wiki trying to define an emerging research practice that draws on the web as evidence of human behaviour. I’m not happy with the phrase, but it is hard to know what to call it. Text mining refers to mining large text databases, not the web. Web mining means all sorts of things. What stands out for me as important is that we have in the Web a massive body of evidence for philosophical and cultural analysis, something we haven’t had before. While a change in evidence may seem trivial, the resulting change in research practices is not.

State of the Union Parsing Tool

Yet another George W. Bush State of the Union visualization tool can be seen at State of the Union Parsing Tool. I commented earlier on the New York Times State of the Union in Words. It seems that Bush’s State of the Union addresses are becoming the standard text for visualizations.

This one on Style.org colorizes the lines that contain the found words. You can set the size of the words (and therefore of the text representation).

Centre National de Ressources Textuelles et Lexicales

The Centre National de Ressources Textuelles et Lexicales (CNRTL) is a centre attached to the Analyse et Traitement Informatique de la Langue Française (ATILF / CNRS) lab at Nancy Université. They have a portal for lexical, morphological and etymological work in French. It takes a French word and will give you synonyms, dictionary entry, etymology and so on. Neat. I wonder if it can be mashed into a TAPoR tool?

This is thanks to Jean-Guy.