High Performance Visualization

Screen shot of the visualization

I’m working with the folks at our local HPC consortium, SHARCNET, on imagining how we could visualize texts with high-resolution displays, 3D displays, and cluster computing. The project, temporarily called The Big See, has generated an interesting beta version. You can see a video of the process running and images from the final visualization at Version Beta 2.

One of the unanticipated insights from this project is that the process of building the 3D model, which I will call the *animation*, is as interesting as the final visual model. From the very first version you could see the text flowing up and the high-frequency words jostling each other for position. Words would start high and then slide clockwise around, and collocations would build up as the animation ran. We don’t have the animation right yet, but I think we are on to something. You can see Version B2 as an MP4 animation here.
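To give a sense of what drives that kind of animation, here is a minimal sketch (in Python, and emphatically not the actual Big See code) of the running word frequencies behind it: as each word of the text arrives, the counts update and the ranking the words “jostle” for is recomputed.

```python
# A minimal sketch (not the Big See code) of running word frequencies:
# each incoming token updates the counts, and the top-N ranking that the
# words "jostle" for is recomputed frame by frame.
from collections import Counter
import re

def frequency_frames(text, top_n=10):
    """Yield the top-N ranking after each word, one 'frame' per token."""
    counts = Counter()
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] += 1
        yield counts.most_common(top_n)

sample = "the quick brown fox jumps over the lazy dog the fox"
for i, frame in enumerate(frequency_frames(sample, top_n=3)):
    print(i, frame)
```

A real version would also track which words co-occur so that collocations accumulate over time, but the frame-by-frame logic is the same.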

Now we will start playing with the parameters – colours, transparency, and weight of lines.

Next Steps for E-Science and the Textual Humanities

D-Lib Magazine has a report on next steps for high performance computing (or as they call it in the UK, “e-science”) and the humanities, Next Steps for E-Science, the Textual Humanities and VREs. The report summarizes four presentations on what is next. Some quotes and reactions,

The crucial point they made was that digital libraries are far more than simple digital surrogates of existing conventional libraries. They are, or at least have the potential to be, complex Virtual Research Environments (VREs), but researchers and e-infrastructure providers in the humanities lack the resources to realize this full potential.

I would call this the cyberinfrastructure step, but I’m not sure it will be libraries that lead. Nor am I sure about the “virtual” in research environments. Space matters and real space is so much more high-bandwidth than the virtual. In fact, subsequent papers made something like this point about the shape of the environment to come.

Loretta Auvil from the NCSA is summarized to the effect that the Software Environment for the Advancement of Scholarly Research (SEASR) is,

API-driven approach enables analyses run by text mining tools, such as NoraVis (http://www.noraproject.org/description.php) and Featurelens (http://www.cs.umd.edu/hcil/textvis/featurelens/) to be published to web services. This is critical: a VRE that is based on digital library infrastructure will have to include not just text, but software tools that allow users to analyse, retrieve (elements of) and search those texts in ever more sophisticated ways. This requires formal, documented and sharable workflows, and mirrors needs identified in the hard science communities, which are being met by initiatives such as the myExperiment project (http://www.myexperiment.org). A key priority of this project is to implement formal, yet sharable, workflows across different research domains.

While I agree, of course, on the need for tools, I’m not sure it follows that this “requires” us to be able to share workflows. Our data from TAPoR is that it is the simple environment, TAPoRware, that is being used most, not the portal, though simple tools may be a way into VREs. I’m guessing that the idea of workflows is more of a hypothesis about what will enable the rapid development of domain-specific research utilities (where a utility does a task of the domain, while a tool does something more primitive). Workflows could turn out to be perceived as domain-specific composite tools rather than flows, just as most “primitive” tools have some flow within them. What may happen is that libraries and centres hire programmers to develop workflows for particular teams, in consultation with researchers, for specific resources, and this is the promise of SEASR. When it crosses the Rubicon of reality it will give support units a powerful way to rapidly deploy sophisticated research environments. But if it is programmers who do this, will they want a flow-model application development environment, or will they default back to something familiar like Java? (What is the research on the success of visual programming environments?)
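To make the tool/utility distinction concrete, here is a hedged sketch of a composite “utility” assembled from primitive tools; the function names are illustrative and are not SEASR or TAPoR components.

```python
# An illustrative composite "utility" (high_frequency_words) assembled
# from primitive tools (tokenize, count, drop_stopwords). Not SEASR or
# TAPoR code; just a sketch of a workflow frozen into a single tool.
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def count(tokens):
    return Counter(tokens)

def drop_stopwords(counts, stopwords):
    return Counter({w: n for w, n in counts.items() if w not in stopwords})

def high_frequency_words(text, stopwords=frozenset({"the", "a", "of", "and", "to"})):
    """The domain 'utility': a fixed workflow over the primitive tools above."""
    return drop_stopwords(count(tokenize(text)), stopwords).most_common(10)

print(high_frequency_words("The people of the country think the people know."))
```

Whether researchers would rather wire such chains together in a visual flow editor or have a programmer write them like this is exactly the open question.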

Bontcheva is reported as presenting the General Architecture for Text Engineering (GATE).

A key theme of the workshop was the well documented need researchers have to be able to annotate the texts upon which they are working: this is crucial to the research process. The Semantic Annotation Factory Environment (SAFE) by GATE will help annotators, language engineers and curators to deal with the (often tedious) work of SA, as it adds information extraction tools and other means to the annotation environment that make at least parts of the annotation process work automatically. This is known as a ‘factory’, as it will not completely substitute the manual annotation process, but rather complement it with the work of robots that help with the information extraction.

The alternative to the tool model of what humanists need is the annotation environment. John Bradley has been pursuing a version of this with Pliny. It is premised on the view that humanists want to closely mark up, annotate, and manipulate smaller collections of texts as they read. Tools have a place, but within a reading environment. GATE is doing something a little different – they are trying to semi-automate linguistic annotation, but their tools could be used in a more exploratory environment.
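As a toy illustration of the “factory” idea (not GATE or SAFE itself), the sketch below has simple rules propose annotations automatically so that a human annotator only has to review them rather than mark everything up by hand. The rules and labels are made up for the example.

```python
# A toy illustration (not GATE/SAFE): rule-based annotation suggestions
# that a human annotator would accept or reject. Rules are illustrative.
import re

RULES = {
    "DATE": re.compile(r"\b\d{1,2} (January|February|March|April|May|June|"
                       r"July|August|September|October|November|December) \d{4}\b"),
    "PERSON_TITLE": re.compile(r"\b(Dr|Prof|Mr|Ms)\. [A-Z][a-z]+\b"),
}

def suggest_annotations(text):
    """Return (label, start, end, matched_text) tuples for human review."""
    suggestions = []
    for label, pattern in RULES.items():
        for m in pattern.finditer(text):
            suggestions.append((label, m.start(), m.end(), m.group(0)))
    return sorted(suggestions, key=lambda s: s[1])

print(suggest_annotations("Prof. Bradley lectured on 12 March 2007 about Pliny."))
```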

What I like about this report is that we see three complementary and achievable visions of the next steps in digital humanities:

  • The development of cyberinfrastructure building on the library, but also digital humanities centres.
  • The development of application development frameworks that can create domain-specific interfaces for research that takes advantage of large-scale resources.
  • The development of reading and annotation tools that work with and enhance electronic texts.

I think there is a fourth agenda item we need to consider, which is how we will enable reflection on and preservation of the work of the last 40 years. Willard McCarty has asked how we will write the history of humanities computing, and I don’t think he means a list of people and dates. I think he means how we will develop from a start-up and unreflective culture to one that tries to understand itself in change. That means we need to start documenting and preserving what Julia Flanders has called the craft projects of the first generations, which prepared the way for these large-scale visions.

Toy Chest (Online or Downloadable Tools for Building Projects)

Alan Liu and others have set up a Knowledge Base for the Department of English at UCSB, which includes a neat Toy Chest (Online or Downloadable Tools for Building Projects) for students. The idea is to collect free or very cheap tools that students can use, and they have done a nice job documenting them.

The idea of a departmental knowledge base is also a good one. I assume the idea is that this can be an informal place for the public knowledge that faculty, staff, and students gather.

netzspannung.org | Archive | Archive Interfaces

Image of Semantic Map

netzspannung.org is a German new media group with an archive of “media art, projects from IT research, and lectures on media theory as well as on aesthetics and art history.” They have a number of interfaces to this archive; for an explanation see Archive Interfaces. The most interesting is the Java Semantic Map (see picture above).

netzspannung.org is an Internet platform for artistic production, media projects, and intermedia research. As an interface between media art, media technology and society, it functions as an information pool for artists, designers, computer scientists and cultural scientists. Headed by Monika Fleischmann and Wolfgang Strauss, at the MARS Exploratory Media Lab, interdisciplinary teams of architects, artists, designers, computer scientists, art and media scientists are developing and producing tools and interfaces, artistic projects and events at the interface between art and research. All developments and productions are realised in the context of national and international projects.

See The Semantic Map Interface for more on their Java Web Start archive browser.


OpenSocial – Google Code

OpenSocial image

Two days ago, on the day of All Hallows (All Saints), Google announced OpenSocial, a collection of APIs for embedded social applications. Actually, much of the online documentation, like the first OpenSocial API Blog entry, didn’t go up until early in the morning on November 2nd, after the Campfire talk. On November 1st they had their rather hokey Campfire One in one of the open spaces in the Googleplex. A sort of Halloween for older boys.

Image from YouTube

Screen from YouTube video. Note the campfire monitors.

OpenSocial is, however, important to tool development in the humanities. It provides an open model for the type of energetic development we saw in the summer after the Facebook Platform was launched. If it proves rich enough, it will provide a way for digital libraries and online e-text sites to open their interfaces to research tools developed in the community. It could allow us tool developers to create tools that researchers can easily add to their sites – tools that are social and can draw on remote sources of data to mash up with the local text. This could enable an open mashup of information that is at the heart of research. It also gives libraries a way to let in tools like the TAPoR Toolbar. For that matter we might see creative tools coming from our students as they fiddle with the technology in ways we can’t imagine.
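The gadgets themselves are JavaScript, but the underlying mashup idea is simple enough to sketch in a few lines of Python (this is not the OpenSocial API, and the URL is hypothetical): fetch a remote resource and fold it into the analysis of a local text.

```python
# Not the OpenSocial API; just the mashup idea it enables: pull a remote
# resource and combine it with analysis of a local text. The URL below
# is hypothetical.
import re
from collections import Counter
from urllib.request import urlopen

def remote_word_list(url="https://example.org/political-terms.txt"):
    """Fetch a newline-separated word list from a remote (hypothetical) source."""
    with urlopen(url) as response:
        lines = response.read().decode("utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def mashup(local_text, remote_terms):
    """Count how often the remote terms appear in the local text."""
    counts = Counter(re.findall(r"[a-z']+", local_text.lower()))
    return {term: counts[term] for term in remote_terms if counts[term]}
```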

The key difference between OpenSocial and the Facebook Platform is that the latter, as brilliant as it is, is limited to social applications for Facebook. OpenSocial can be used by any host container or social app builder. Some of the other host sites that have committed to using it are Ning and Slide. Speaking of Ning, Marc Andreessen has the best explanations of the significance of both the Facebook Platform phenomenon and the potential of OpenSocial on his blog, blog.pmarca.com (have a gander at the other stuff on Ning and OpenSocial too).

Republican Debate: Analyzing the Details – The New York Times

Screen image

The New York Times has created another neat text visualization, this time for the Republican Debate. The visualization has two panels. One shows the video, a transcript, and sections; you can jump around the video using the transcript or the section outline. The other is a “Transcript Analyzer” where you can see a rich prospect of the debate divided by speeches and search for words. What is missing is some sort of overview of what the high-frequency words are and how they collocate.

So I have created a public text for analysis in TAPoR, and here are some results. Below is a list of high-frequency words generated using the List Words tool (a rough sketch of this kind of counting follows the lists). Some interesting words:

People (76), Think (66), Know (48), Giuliani (42), Clinton (33), Reagan (13), Democrats (16), Republicans (11)

Health (45), Government (35), Security (35), Country (25), Policy (16), Military (15), School (15)

Marriage (23), Insurance (23), Conservative (23), Private (22), Let (21), Gay (12)

Iraq (13), Iran (12), Turkey (7), Canada (2), Darn (2), Europe (5)

Immigrants (5), Citizens (2)

Man (7), Mean (7), Woman (4), Congressman (25)

Answer (10), Problem (10), Solution (5), War (12)
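For readers who want to reproduce this sort of thing outside TAPoR, here is a rough Python equivalent (not the List Words tool) of the counting above, plus the collocation overview the Times interface lacks; the transcript filename is hypothetical.

```python
# A rough equivalent (not TAPoR's List Words tool) of the counts above,
# plus simple collocation: words appearing within a few tokens of a target.
import re
from collections import Counter

def word_counts(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))

def collocates(text, target, window=5, top_n=10):
    """Words occurring within `window` tokens of `target`, by frequency."""
    tokens = re.findall(r"[a-z']+", text.lower())
    near = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            near.update(neighbours)
    return near.most_common(top_n)

text = open("republican_debate_transcript.txt").read()  # hypothetical local copy
print(word_counts(text).most_common(20))
print(collocates(text, "health"))
```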


Plagiarism and The Ecstasy of Influence

Jonathan Lethem had a wonderful essay, The Ecstasy of Influence: A Plagiarism, in the February 2007 Harper’s. The twist to the essay, which discusses the copying of words, gift economies, and public commons, was that it was mostly plagiarized – a collage text – something I didn’t realize until I got to the end. The essay challenges our ideas of academic integrity and plagiarism.

In my experience plagiarism has been getting worse with the Internet. There are now web sites like Customessay.org where you can buy customized essays for as little as $12.95 a page. Do the math – a five-page paper comes to about $65, probably less than the textbook, and it won’t get detected by services like Turnitin.

These essay writing companies actually offer to check that the essay you are buying isn’t plagiarized. Here is what Customessay.org says about their Cheat Guru software:

Custom Essay is using the specialized Plagiarism Detection software to prevent instances of plagiarism. Furthermore, we have developed the special client module and made this software accessible to our customers. Many companies claim to utilize the tools of such kind, few of them do and none of them offer their Plagiarism Detection software to their customers. We are sure about the quality of our work and provide our customers with effective tools for its objective assessment. Download and install our Cheat Guru and test the quality of the products you receive from us or elsewhere.

Newspapers have been running stories on plagiarism, like JS Online: Internet cheating clicks with students, connecting it to ideas from a book by David Callahan, The Cheating Culture (see the archived copy of the Education page that was on his site).

There is a certain amount of research on plagiarism on the web. A place to start is The Plagiarism Resource Site or the University of Maryland University College’s Center for Intellectual Property page on Plagiarism.

I personally find it easy to catch students who crib from the web by using Google. When I notice a shift in the professionalism of the writing, I take a sequence of five or so words and Google the phrase in quotation marks. Google will show me the web page the sequence came from. The trick is finding a sequence short enough not to be affected by paraphrasing while long and unique enough to find the web site the student used. This Salon article, “The Web’s plagiarism police” by Andy Dehnart, talks about services and tools that do similar things.
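Here is a sketch of that by-hand method: pull candidate five-word phrases out of a suspect passage so they can be pasted into a search engine in quotation marks. It is purely illustrative (it does not call Google), and the sample passage is invented.

```python
# A sketch of the search-phrase method described above: extract quoted
# five-word sequences from a suspect passage, skipping very common
# openers, so they can be pasted into a search engine. Illustrative only.
import re

def candidate_phrases(text, n=5, max_phrases=10):
    """Return quoted n-word sequences suitable for an exact-phrase search."""
    common = {"the", "a", "an", "in", "of", "and", "to", "it", "is", "that"}
    tokens = re.findall(r"[A-Za-z']+", text)
    phrases = []
    for i in range(0, len(tokens) - n + 1, n):
        window = tokens[i:i + n]
        if window[0].lower() not in common:
            phrases.append('"' + " ".join(window) + '"')
        if len(phrases) >= max_phrases:
            break
    return phrases

suspect = ("Hermeneutics designates the reflexive practice of interpretation "
           "whereby textual meaning is constituted through the historically "
           "situated encounter of reader and work.")
print(candidate_phrases(suspect))
```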

Perhaps the greatest use of these plagiarism-catching tools is that they might show us how anything we write is woven out of the words of others. It’s possible they could be adapted to show us the web of connections radiating out from anything written.

Note: This entry was edited in Feb. 2018 to fix broken links. Thanks to Alisa from Plagiarism Check for alerting me to the broken links.

Kirschenbaum: Hamlet.doc?

Matt Kirschenbaum has published an article in The Chronicle of Higher Education titled, Hamlet.doc? Literature in a Digital Age (from the issue of August 17, 2007). The article teases us with the question of what we scholars could learn about the writing of Hamlet if Shakespeare had left us his hard drive. Kirschenbaum has nicely described and theorized the digital archival work humanists will need to learn to do in his forthcoming book from MIT Press, Mechanisms. Here is the conclusion of the Chronicle article,

Literary scholars are going to need to play a role in decisions about what kind of data survive and in what form, much as bibliographers and editors have long been advocates in traditional library settings, where they have opposed policies that tamper with bindings, dust jackets, and other important kinds of material evidence. To this end, the Electronic Literature Organization, based at the Maryland Institute for Technology in the Humanities, is beginning work on a preservation standard known as X-Lit, where the “X-” prefix serves to mark a tripartite relationship among electronic literature’s risk of extinction or obsolescence, the experimental or extreme nature of the material, and the family of Extensible Markup Language technologies that are the technical underpinning of the project. While our focus is on avant-garde literary productions, such literature has essentially been a test bed for a future in which an increasing proportion of documents will be born digital and will take fuller advantage of networked, digital environments. We may no longer have the equivalent of Shakespeare’s hard drive, but we do know that we wish we did, and it is therefore not too late – or too early – to begin taking steps to make sure we save the born-digital records of the literature of today.