Harvard and Open Access

Peter Suber in Open Access News has reproduced the text of the motion that the Faculty of Arts and Sciences at Harvard passed requiring faculty to deposit a copy of their articles with the university.

The Faculty of Arts and Sciences of Harvard University is committed to disseminating the fruits of its research and scholarship as widely as possible. In keeping with that commitment, the Faculty adopts the following policy: Each Faculty member grants to the President and Fellows of Harvard College permission to make available his or her scholarly articles and to exercise the copyright in those articles.

According to another post by Peter Suber, Harvard is the first North American university to adopt an open access policy. He calls it a “permission mandate” (granting permission to the university to make research open) rather than a “deposit mandate.” It has the virtue that the university takes responsibility for maintaining the access, not the faculty member.

More on this can be found here (another Suber post) and here (Chronicle of Higher Ed.).

Tech and the Humanities: The MLA at Chicago

Right after Christmas I was involved in two events at the MLA. I organized and presided over a session on Open Digital Communities, which was nicely written up by the Chronicle of Higher Education in Tech and the Humanities: A Report from the Front Lines – Chronicle.com.

I also participated in a newish format for the MLA, what is now being called a digital roundtable. This one was on Textual Visualization and was organized by Maureen Jameson; I showed visualization tools available through TAPoRware and the TAPoR portal.

Republican Debate: Analyzing the Details – The New York Times

The New York Times has created another neat text visualization, this time for the Republican Debate. The visualization has two panels. One shows the video, a transcript, and sections. You can jump the video using the transcript or section outline. The other is a “Transcript Analyzer” where you can see a rich prospect of the debate divided by speeches and you can search for words. What is missing is some sort of overview of what the high frequency words are and how they collocate.

So, I have created a public text for analysis in TAPoR and here are some results. Here is a list of high-frequency words generated using the List Words tool. Some interesting words:

People (76), Think (66), Know (48), Giuliani (42), Clinton (33), Reagan (13), Democrats (16), Republicans (11)

Health (45), Government (35), Security (35), Country (25), Policy (16), Military (15), School (15)

Marriage (23), Insurance (23), Conservative (23), Private (22), Let (21), Gay (12)

Iraq (13), Iran (12), Turkey (7), Canada (2), Darn (2), Europe (5)

Immigrants (5), Citizens (2)

Man (7), Mean (7), Woman (4), Congressman (25)

Answer (10), Problem (10), Solution (5), War (12)
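For anyone who wants to try this kind of counting outside TAPoR, here is a minimal sketch in Python of a word list plus the collocation overview I said was missing. It is not how the List Words tool is implemented; the stop-word list, the function names, the window size, and the sample sentences are all my own assumptions for illustration, and the sample is not taken from the debate transcript.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "that", "we", "i", "it", "for", "on", "our"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def list_words(text, stop_words=STOP_WORDS):
    """Count how often each non-stop word occurs."""
    return Counter(t for t in tokenize(text) if t not in stop_words)


def collocates(text, keyword, window=3, stop_words=STOP_WORDS):
    """Count non-stop words appearing within `window` tokens of `keyword`."""
    tokens = tokenize(text)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        lo = max(0, i - window)
        # Tokens just before and just after the keyword occurrence.
        neighbours = tokens[lo:i] + tokens[i + 1:i + 1 + window]
        counts.update(n for n in neighbours
                      if n not in stop_words and n != keyword)
    return counts


# Made-up sample text (an assumption), standing in for a real transcript.
SAMPLE = (
    "We need health insurance that people can afford. "
    "Private health insurance is the answer, and people know it."
)

if __name__ == "__main__":
    print(list_words(SAMPLE).most_common(5))
    print(collocates(SAMPLE, "health").most_common(5))
```

Run against a full transcript, the first call gives a frequency list like the one above, and the second shows which words cluster around a keyword such as "health".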

Kirschenbaum: Hamlet.doc?

Matt Kirschenbaum has published an article in The Chronicle of Higher Education titled Hamlet.doc? Literature in a Digital Age (from the issue of August 17, 2007). The article teases us with the question of what we scholars could learn about the writing of Hamlet if Shakespeare had left us his hard drive. Kirschenbaum has nicely described and theorized the digital archival work humanists will need to learn to do in his forthcoming book from MIT Press, Mechanisms. Here is the conclusion of the Chronicle article:

Literary scholars are going to need to play a role in decisions about what kind of data survive and in what form, much as bibliographers and editors have long been advocates in traditional library settings, where they have opposed policies that tamper with bindings, dust jackets, and other important kinds of material evidence. To this end, the Electronic Literature Organization, based at the Maryland Institute for Technology in the Humanities, is beginning work on a preservation standard known as X-Lit, where the “X-” prefix serves to mark a tripartite relationship among electronic literature’s risk of extinction or obsolescence, the experimental or extreme nature of the material, and the family of Extensible Markup Language technologies that are the technical underpinning of the project. While our focus is on avant-garde literary productions, such literature has essentially been a test bed for a future in which an increasing proportion of documents will be born digital and will take fuller advantage of networked, digital environments. We may no longer have the equivalent of Shakespeare’s hard drive, but we do know that we wish we did, and it is therefore not too late – or too early – to begin taking steps to make sure we save the born-digital records of the literature of today.

Long Bets Now

Have you ever wanted to go on record with a prediction? Would you like to put money (that goes to charity) on your prediction? The Long Bets Foundation lets you do just that. It is a (partial) spin-off of The Long Now Foundation where you can register and make long-term predictions (up to thousands of years, I believe). The money bet and challenged goes to charity; all you get if you are right is credit and the choice of charity. An example prediction in the text analysis arena is:

Gregory W. Webster predicts: “That by 2020 a wearable device will be available that will use voice recognition capability and high-volume storage to monitor and index conversations you have or conversations which occur in your vicinity for later searching as supplemental memory.” (Prediction 16)

Some of the other predictions of interest to humanists are: 177 about print on demand, 179 about reading on digital devices, and 295 about a second renaissance.

Long Bets has some interesting people making predictions and bets (a prediction becomes a bet when formally challenged), including Ray Kurzweil challenging Mitch Kapor’s prediction that “By 2029 no computer – or ‘machine intelligence’ – will have passed the Turing Test.” (Bet 1)

Just to make life interesting, there is a prediction (137) that “The Long Bets Foundation will no longer exist in 2104.” 63% of the voters seem to agree!

IDC White Paper: The Digital Universe

In an earlier blog I mentioned the IDC report, The Digital Universe, about the explosion of digital information. It was commissioned by EMC Corporation and is available free on their site, here. They also have a page on related information which includes a link to “Are You an Informationist?” and “The Inforati Files”.

The PDF of the IDC White Paper includes some interesting points:

  • Between 2006 and 2010, the information added annually to the digital universe will increase more than six fold from 161 exabytes to 988 exabytes.
  • Three major analog to digital conversions are powering this growth – film to digital image capture, analog to digital voice, and analog to digital TV.
  • Images, captured by more than 1 billion devices in the world, from digital cameras and camera phones to medical scanners and security cameras, comprise the largest component of the digital universe. They are replicated over the Internet, on private organizational networks, by PCs and servers, in data centers, in digital TV broadcasts, and on digital projection movie screens.
  • […] building automation and security migrates to IP networks, surveillance goes digital, and RFID and sensor networks […]
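The “more than six fold” figure in the first point is easy to check, and it also implies a yearly growth rate. Here is a small sketch of that arithmetic in Python. The two exabyte figures come from the report as quoted above; the function names are mine, and treating the growth as smooth compounding over four years is my simplifying assumption, not something taken from IDC's model.

```python
# Figures quoted from the first bullet of the IDC white paper
# (exabytes of information added per year).
EXABYTES_2006 = 161
EXABYTES_2010 = 988


def growth_factor(start, end):
    """Return how many times larger `end` is than `start`."""
    return end / start


def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`.

    Assumes smooth compounding, which is a simplification for illustration.
    """
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    factor = growth_factor(EXABYTES_2006, EXABYTES_2010)
    rate = compound_annual_growth_rate(
        EXABYTES_2006, EXABYTES_2010, years=2010 - 2006
    )
    print(f"Growth factor 2006-2010: {factor:.2f}x")
    print(f"Implied annual growth: {rate:.1%}")
```

The factor comes out a little above six, which matches the report's wording, and the implied yearly rate shows how steep that curve is when spread over four years.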

Is it time to rewrite “The Work of Art in the Age of Mechanical Reproduction” to think about “The Image in the Age of Networked Distribution”?

The Exchange Online

Robert Townsend has a Review of the ACLS Cyberinfrastructure Report in The Exchange Online of the Association of American University Presses. He is critical of the report, arguing:

To make its case, the commission simply ignores skeptics who ask whether the rush to mass digitization could hurt reading and scholarship, and whether there might be other casualties on this road to progress. This offers a rather narrow view of the “grand challenges” facing the humanities and social sciences, and limits the array of problems that might be remedied by a developed cyberinfrastructure. This seems part of a larger rhetorical strategy in the report, however, which positions potential problems and the costs of digitization as external to its vision of technological progress—limiting them to social, political, or financial failures that can be assigned to publishers and “conservative” academics.

He rightly points to the ongoing costs of maintaining digital projects, “Like Jacob Marley’s chains, link-by-link we forge these digital burdens that we can never seem to lay down.” (Great image.) He is worried about the place of non-profit publishers who might get left behind if there is massive investment in cyberinfrastructure that goes to the universities who then cut out the publishers. I’m tempted to say that this is an old refrain, but that doesn’t make the issue go away. Frankly I doubt cyberinfrastructure investment will endanger quality publishers, but it may change their relationship with the academy. More importantly I think the Report (see previous blog entry) was making the case for investment in humanities and arts cyberinfrastructure so we can do our research, including research around digital publications.


Shawn recently introduced me to TiddlyWiki, which Stéfan Sinclair has also blogged. It is a web page (with over 5000 lines of code) that acts as a wiki if you have write privileges to the file. It is an extremely smart and simple tool that I don’t really think of as a wiki since it really is more like a web page application for private and local use. You can use it to keep notes on your local computer just by saving an empty page.

I have the feeling there is a principle to technologies like TiddlyWiki – simple objects that are both application and data, documents that carry the smarts needed so you don’t need a separate application (well, actually you do need a browser). Reminds me of the document-centric view of OpenDoc that Apple tried unsuccessfully to promote. What other TiddlyWiki-like doc/apps can we imagine:

  • A Curriculum Vitae that one can add items to and reorganize in different views.
  • A Bibliography that lets you maintain references and then export them.
  • An Analyze Me TiddlyWiki that lets you paste in data (or text) to study and then lets you run analysis on it to get results that become part of the document.

Forking the Wikipedia

Larry Sanger forks the Wikipedia reports on an initiative by one of the founders of the Wikipedia to create an alternative by taking the content and setting up an editorial system with more control by expert editors. The alternative would be called the Citizendium.

The Wikipedia is an important example of a social knowledge network that has stirred up a lot of controversy this year. There is a literature now about the Wikipedia and its discontents. See, for example, the Request for Comments (RFC) by Alan Liu about student use of the Wikipedia. He sees 2006 as a threshold year when students started using the Wikipedia like never before.

Is it a sign of maturity when web phenomena like the Wikipedia don’t just get reported with that “gee whiz, isn’t this neat” tone, but are being really debated?