Yesterday I gave a talk at the Orion conference Powering Research and Innovation: A National Summit, on a cyberinfrastructure panel, titled “Cyberinfrastructure in the Humanities: Back to Supercomputing.” Alas, Michael Macy from Cornell, who was supposed to also talk, didn’t make it. (It is always more interesting to hear others than yourself.) I was asked to try to summarize humanities needs and perspectives on cyberinfrastructure for research, which I did by pointing people to the ACLS Commission on Cyberinfrastructure report “Our Cultural Commonwealth.”
One of the points worth making over and over is that we now have a pretty good idea what researchers in the humanities need as a base level of infrastructure (labs, servers, and support). The interesting question is how our needs are evolving, and I think that is what the Bamboo project is trying to document. Another way to put it is that research computing support units need strategies for handling the evolution of cyberinfrastructure. They need ways of knowing what infrastructure should be treated like a utility (and therefore be free, always on, and funded by the institution) and what infrastructure should be funded through competitions, requests, or not at all. We would all love to have everything provided before we even think of it, but institutions can’t afford expensive infrastructure no one needs. My hope for Bamboo is that it will develop a baseline of what researchers can be shown to need (and use) and then develop strategies for consensually evolving that baseline in ways that help support units. High Performance Computing access is a case in point: it is very expensive, and what is available is usually structured for science research. How can we explore HPC in the humanities, and how would we know when it is time to provide general access?
Information Overload and Clay Shirky
Peter sent me to Clay Shirky’s It’s Not Information Overload. It’s Filter Failure talk at the Web 2.0 Expo in New York, which starts with a chart from an IDC White Paper showing the growth of digital information. His title summarizes his position on the issue of information overload, but along the way he made the point that we have been complaining about overload for a while. To paraphrase Shirky, “if the problem doesn’t go away it is a fact.” Shirky jokes that the issue comes up over and over because “it makes us feel better” about not getting anything done.
I, like others, have used the overload meme to start talks and am now wondering about it. Recently I was researching a talk for CaSTA 2008 that started from this issue of excess information, and I found that Vannevar Bush had used overload as the problem driving his 1945 essay “As We May Think”:
There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.
Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. (Vannevar Bush, As We May Think)
If Shirky is right that this is a fact, not a problem, and that we default to invoking it to leverage our own ideas as solutions, then we have to look again at the perception of overload. Some of the questions we might ask are:
- What is the history of the perception of overload?
- Is it something that can be solved, or is it like a philosophical problem that we keep returning to as a ground for discussion in informatics?
- Have structural changes in how information is produced and consumed affected our perception, as Shirky claims? (He talks about Facebook being a structural change that our balancing filtering mechanisms haven’t caught up with.)
- One common response in the academy is to call for less publishing (usually a call for more quality and less pressure on researchers to crank out books to get tenure). Why doesn’t anyone listen (and stop writing)?
- What role do academics play in the long term selection and filtering that shapes the record down to a canon?
NiCHE: The Programming Historian
NiCHE (Network in Canadian History & Environment) has a useful wiki called The Programming Historian by William Turkel and Alan MacEachern. The wiki is a “tutorial-style introduction to programming for practicing historians,” but it could also be used by textual scholars who want to be able to program their own tools. It takes you through learning and using Python for text processing tasks like word frequencies and keyword-in-context (KWIC) displays. It reminds me of Susan Hockey’s book Snobol Programming for the Humanities (Oxford: Oxford University Press, 1985), which I loved at the time, even if I couldn’t find a Snobol interpreter for the Mac.
We need more such books and wikis.
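To give a flavour of the kind of exercise the wiki works through, here is a minimal sketch of my own (not taken from the wiki) that counts word frequencies and prints a simple KWIC display; the filename is a placeholder for any plain-text file you have at hand:

```python
# A sketch in the spirit of The Programming Historian's exercises:
# word frequencies and a simple keyword-in-context (KWIC) display.
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens, n=10):
    """Return the n most common words with their counts."""
    return Counter(tokens).most_common(n)


def kwic(tokens, keyword, width=5):
    """Yield each occurrence of keyword with `width` words of context."""
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield f"{left:>40} | {keyword} | {right}"


if __name__ == "__main__":
    # "sample.txt" is a placeholder, not a file from the wiki.
    with open("sample.txt", encoding="utf-8") as f:
        tokens = tokenize(f.read())
    for word, count in word_frequencies(tokens):
        print(f"{word}\t{count}")
    for line in kwic(tokens, "time"):
        print(line)
```

A dozen lines like these are enough to start asking real questions of a text, which is exactly the point the wiki makes.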
Conference Report: Tools For Data-Driven Scholarship
I just got back from the Tools For Data-Driven Scholarship meeting organized by MITH and the Center for History and New Media. This meeting was funded by the NEH, NSF, and the IMLS and brought together tool developers, content providers (like museums and public libraries), and funders (NEH, JISC, Mellon, NSF, and IMLS). The goal was to imagine initiative(s) that could advance humanities tool development and connect tools better with audiences. I have written a Conference Report with my notes on the meeting. One of the interesting questions asked by a funder was “What do the developers really want?” It was unclear that developers really wanted some of the proposed solutions, like a directory of tools or a code repository. Three things the breakout group I was in came up with were:
- Recognition, credit, and rewards for tool development – mechanisms to get academic credit for tool development. This could take the form of tool reviews, competitions, prizes, or simply citation when a tool is used. In other words, we want attention.
- Long-term funding so that tool development can be sustained. A lot of tool development takes place in grants that run out before the tool can really be tested and promoted to the community. In other words, we want funding to continue tool development without constantly writing grants.
- Documented methods, recipes, and training that bring tools together in the context of humanities research practices. We want people with outreach and writing skills to weave stories about how tools are used and so help introduce them to others. In other words, we want others to do the marketing of our tools.
A bunch of us sitting around after the meeting waiting for a plane had the usual debriefing about such meetings. What do they achieve, even if they don’t lead to initiatives? From my perspective these meetings are useful in unexpected ways:
- You meet unexpected people and hear about tools that you didn’t know about. The social dimension is important to meetings organized by others that bring together people from different walks of life. I, for example, finally met William Turkel of Digital History Hacks.
- Reports are generated that can be used to argue for support without quoting yourself. There should be a report from this meeting.
- Ideas for initiatives are generated that can get started in unexpected ways. Questions emerge that you hadn’t thought of. For example, the question of audience (both for tools and for initiatives) came up over and over.
Fortune of the Day – Fortune Hunting
Lisa Young, with the support of the Brown University Scholarly Technology Group (STG), has developed Fortune of the Day – Fortune Hunting, an interactive art site based on a collection of scanned fortune cookie slips she created. It has elements of a public textuality site like the Dictionary, though focused entirely on fortunes. The interface is simple and elegant. I believe it was recently exhibited for the first time. The project uses the TAPoRware Visual Collocator for one of its interfaces.
University Affairs: MLA changes course on web citations
University Affairs has a story by Tim Johnson on the latest MLA Style Manual, titled “MLA changes course on web citations,” in which I am quoted about the new MLA recommendation that URLs aren’t needed in citations (because they aren’t reliable). I had a long discussion with Tim; being interviewed after the journalist has talked to other people is a strange way to learn about a subject. In retrospect it would have been more useful to point out the emerging alternatives to URLs, some of which are designed to be more stable. Some that I know of:
- TinyURL and similar projects let you get a short (“tiny”) URL that redirects to the full location. A list of such tools is at http://daverohrer.com/15-tinyurl-alternatives-shorten-your-urls/
- The Digital Object Identifier (DOI®) System allows unique identifiers to be allocated and then provides a resolution system that points to one or more locations. To quote from their Overview, a DOI “is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.” (A sketch of how resolution works follows this list.)
- The Wayback Machine grabs copies of web pages at regular intervals, if allowed. You can thus see changes in a document over time.
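The resolution step in the DOI system is just an HTTP redirect from the public doi.org resolver, so it is easy to see where a DOI currently points. Here is a minimal sketch of my own (not from the DOI documentation); the DOI used is the DOI Handbook’s own identifier, chosen only as a convenient example:

```python
# A sketch of DOI resolution: ask the public doi.org resolver where a
# DOI currently points by following its HTTP redirect.
import urllib.request


def resolve_doi(doi):
    """Return the URL the doi.org resolver currently redirects to."""
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    with urllib.request.urlopen(request) as response:
        # urlopen follows redirects, so response.url is the final location.
        return response.url


if __name__ == "__main__":
    # 10.1000/182 is the DOI of the DOI Handbook itself.
    print(resolve_doi("10.1000/182"))
```

The point of the indirection is that the publisher can update the target location behind the name, so the citation itself never has to change.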
In short, no clear standard has emerged yet, but we have alternatives that could provide us with a stable system.
I should add that the point of a citation is not what is in it, but whether it lets you easily find the referenced work so that the research can be recapitulated.
CaSTA 2008: New Directions in Text Analysis
I am at the CaSTA 2008 New Directions in Text Analysis conference at the University of Saskatchewan in Saskatoon. The opening keynote by Meg Twycross was a thorough and excellent tour through manuscript digitization and forensic analysis techniques.
My notes are in a conference report (being written as the conference happens).
Today is Open Access Day
Today, October 14th, 2008, is Open Access Day. Thanks to Erika, I discovered that the University of Alberta library is promoting it.
The Canadian libraries supporting OAD are listed on the Open Access Day 2008 wiki. I love the U of Calgary comment, “We’re considering options but will definitely mark the day.” The U of Alberta, by contrast, has a number of initiatives, including an Open Access blog and a We Support Open Access (PDF) poster.
Of particular interest is the SPARC Author’s Addendum, a form for authors to fill out to assert their copyright when signing agreements with publishers. It basically adds an addendum to whatever agreement you are signing asserting that you retain copyright and the right to reproduce the article for non-commercial purposes. It is a nice little “tool”. Now we need one like it for graduate students when they are signing the Theses Canada license. What would it assert?
University Libraries in Google Project to Offer Backup Digital Library – Chronicle.com
From Bethany I discovered this story in the Chronicle of Higher Education about the HathiTrust, titled University Libraries in Google Project to Offer Backup Digital Library (Jeffrey R. Young, Oct. 13, 2008). “Hathi” is the Hindi word for elephant, suggesting memory and size. Here is a quote from the HathiTrust site:
As a digital repository for the nation’s great research libraries, HathiTrust (pronounced hah-TEE) brings together the immense collections of partner institutions.
HathiTrust was conceived as a collaboration of the thirteen universities of the Committee on Institutional Cooperation and the University of California system to establish a repository for these universities to archive and share their digitized collections. Partnership is open to all who share this grand vision.
The repository, among other things, will pool the volumes digitized by Google in collaboration with the universities so that there is a backup should Google lose interest. Large-scale search is being studied now, and they expect to have a preview version available in November.
A Companion to Digital Literary Studies
A Companion to Digital Literary Studies, edited by Ray Siemens and Susan Schreibman, is available online in full text. This is a tremendous resource with too many excellent contributions to list individually. Chapters range from Reading on the Screen by Christian Vandendorpe to Algorithmic Criticism by Stephen Ramsay.
There is a good Annotated Overview of Selected Electronic Resources by Tanya Clement and Gretchen Gueguen with links to projects like TAPoR.