Text Technology and TAPoR – Page 10

Expose Your Texts

How can tools like TAPoRware interact with texts elsewhere?

They will only work if e-text projects expose their texts to analytical tools elsewhere. Many e-text projects either let you browse HTML (which is sometimes generated from XML) or bundle a customized search environment with the texts. Setting aside issues of intellectual property, the problem with bundling is that it forces the researcher to use the bundled tool for analysis. While it is useful for new users of a resource to be able to use customized search tools to familiarize themselves with a corpus, the bundled tool can become a limitation if it does not support original analysis. Research involves asking new questions, which can involve using new methods. Bundled tool/text combinations where the text is not exposed in an open and documented fashion limit our capacity to use new methods and ask new questions. Therefore: Expose your texts.
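To make the point concrete, here is a minimal sketch of what an outside tool can do once a text is exposed at a plain URL. It is not how TAPoRware works internally; the function names are mine, and the URL in the usage comment is hypothetical. The idea is only that any fetchable text can be handed to an analysis the project never anticipated, in this case a simple word-frequency list.

```python
import re
from collections import Counter
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Download the raw text at `url` (assumes the project exposes it openly)."""
    with urlopen(url) as response:
        return response.read().decode(encoding, errors="replace")


def strip_markup(text):
    """Crudely replace anything that looks like an XML/HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def word_frequencies(text, top=10):
    """Return the `top` most frequent words as (word, count) pairs."""
    # [^\W\d_]+ matches runs of letters only (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", strip_markup(text).lower())
    return Counter(words).most_common(top)


# Usage, with a hypothetical URL standing in for an exposed text:
#   text = fetch_text("http://example.org/texts/some-text.xml")
#   print(word_frequencies(text, top=20))
```

A bundled search environment would have to offer this analysis itself; with an exposed text, the researcher can swap in whatever method the question calls for.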

Here is a PowerPoint outline of a talk I gave on the subject: Download file.

TAPoRware and Hyperlistes

Can text analysis tools like TAPoRware be adapted for special collections?

Here is an experiment with TAPoRware and the Hyperlistes project: Tapor XML Tools Demo. We created a special interface in HTML so that TAPoRware tools could operate on Hyperlistes texts without the user having to fill in the URLs. We also created a special backend shim programme that calls the others. In theory we want to be able to do this without touching TAPoRware: all the work should happen in the HTML so that it can be done elsewhere.
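As a rough sketch of the "all the work happens in the HTML" idea, a collection could generate links that hand a text's URL to a tool as a query parameter, so the user never types it. The tool address and the `url` parameter name below are assumptions for illustration, not TAPoRware's actual interface.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical tool endpoint; a real tool's address and parameters will differ.
TOOL_BASE = "https://example.org/tools/list-words"


def tool_link(text_url, tool_base=TOOL_BASE, **options):
    """Build a tool URL with the text's address already filled in."""
    params = {"url": text_url}  # "url" is an assumed parameter name
    params.update(options)      # any extra tool options, also assumed
    return tool_base + "?" + urlencode(params)


def anchor(title, text_url, tool_base=TOOL_BASE):
    """Return an HTML link that runs the tool on one text of the collection."""
    href = escape(tool_link(text_url, tool_base), quote=True)
    return '<a href="{}">{}</a>'.format(href, escape(title))
```

Because the links are ordinary HTML, any collection could produce them on its own pages without changes to the tool itself.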

Clumps of Communities

Can we find patterns in large networked groups?

Roland Piquepaille has a column in his Technology Trends blog on Detecting Patterns in Complex Social Networks which talks about new ways of “uncovering patterns in complicated networks”. (Note: blog now discontinued.)

Whenever I have done significant visualizations it has become clear that we need some type of multivariate analysis (MVA) to help simplify. The simweb project John Bradley and I did used Correspondence Analysis to reduce the n-dimensional space to the few dimensions that captured most of the variation and which could therefore be graphed and understood. Hmmm… I wonder what statistical techniques (and I assume they are forms of MVA) are used?

textz: the anti gnutenberg

How can we share texts?

textz.com is a refreshing, ASCII-simple project for sharing texts. They seem to be against everything, including encoded texts and copyright. textz is the antidote to complex and expensive e-books and XML texts. See textz.com – we are the & in copy & paste. Authors like Douglas Adams, Neal Stephenson, and Slavoj Zizek have works posted.

Their interface is interesting, but doesn’t seem to work reliably. There is an annoying scrolling news feed at the bottom (a cute hack, but it doesn’t seem to add to the site unless one considers the feed more textz).

Could we hook this into TAPoR to give analysis features? Probably.

Open Source Content Management

What are the advantages and disadvantages of open source content systems?

“Open-source content management systems” is a clear summary of the pros and cons. Most people forget to mention the cons, but this article does.

Information Studies

The Faculty of Information Studies at the University of Toronto is going through an exciting planning process. See the PDF Chartreuse Paper at WebBoard – Guest User Page.

In the paper the dean, Brian Cantwell Smith, raises questions about what the subject of information studies is (we all study information). He argues for an issues-oriented, interdisciplinary centre that looks at documentary practices and performances.

What is exciting about the process is that it is open (I can look at it) and that openness is also one of the issues (as in Open Source as an issue).

Perhaps what we need is a clear philosophy of open source research as a practice.
