NITLE: National Institute for Technology and Liberal Education: Semantic Indexing

The National Institute for Technology and Liberal Education, or NITLE (pronounced “nightly”?), has a free semantic indexing tool written in Perl that you can download. Their page also has useful starting links on semantic analysis. The project was/is funded by Mellon.
In particular, I recommend the introduction to latent semantic indexing they have put up, Patterns in Unstructured Data: Discovery, Aggregation, and Visualization, by Yu, Cuadrado, Ceglowski, and Payne.

A quote about semantic indexing:

Semantic indexing is our name for a family of techniques for searching and organizing large data collections. The goal of semantic indexing is to find patterns in unstructured data (documents without descriptors such as keywords or special tags) and use those patterns to offer more effective search and categorization services.

Semantic indexing techniques are language-agnostic, so data collections don’t have to be in English, or even in any human language at all. For example, we have had good preliminary results in protein structure prediction using algorithms adapted from a text search engine.
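
To make the idea of "finding patterns in unstructured data" a bit more concrete, here is a minimal sketch of latent semantic indexing in Python. It is not NITLE's Perl tool and does not reflect how that tool is implemented. It is a toy under stated assumptions: the function names (term_document_matrix, top_eigenpairs, lsi_document_vectors, cosine) and the sample documents are all made up for illustration, terms are raw word counts with no weighting, and the truncated SVD is approximated by power iteration on the document-by-document matrix, so that it runs with the standard library alone.

```python
import math

def term_document_matrix(docs):
    """Build a raw term-count matrix: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.lower().split():
            matrix[index[w]][j] += 1.0
    return vocab, matrix

def top_eigenpairs(g, k, iterations=500):
    """Top-k eigenpairs of a symmetric matrix via power iteration with deflation.

    Toy-sized only: assumes the fixed start vector is not orthogonal to the
    eigenvectors being sought, which holds for the sample data below.
    """
    n = len(g)
    g = [row[:] for row in g]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iterations):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs

def lsi_document_vectors(docs, k=2):
    """Place each document in a k-dimensional latent space (rank-k SVD of the term matrix)."""
    _, a = term_document_matrix(docs)
    n = len(docs)
    # Gram matrix A^T A; its eigenvalues are the squared singular values of A.
    gram = [[sum(row[p] * row[q] for row in a) for q in range(n)] for p in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]

def cosine(x, y):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

# Hypothetical sample collection: three documents about cars, two about proteins.
docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "protein structure folding",
    "protein folding prediction",
]
vectors = lsi_document_vectors(docs, k=2)
print("car repair vs. car dealer:", round(cosine(vectors[0], vectors[2]), 3))
print("car repair vs. protein:   ", round(cosine(vectors[0], vectors[3]), 3))
```

In this toy collection the first and third documents share only one word, yet they end up pointing in nearly the same direction in the latent space, while the car and protein documents stay nearly orthogonal. Nothing in the code knows what a word means or which language it is in, which is the sense in which such techniques are language-agnostic. A real tool would use a proper SVD routine and term weighting rather than this hand-rolled approximation.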