Blacklight: Faceted searching at UVA

Blacklight is a neat project that Bethany Nowviskie pointed me to at the University of Virginia. They have indexed some 3.7 million records from their library's online catalogue and set up a faceted search and browse tool.

What is faceted searching and browsing? Traditional search environments, like those for finding items in a library, have you fill in fields. In Blacklight you can search with words, but you can also add constraints by clicking on categories within the metadata. So, if I search for “gone with the wind” in Blacklight it shows that there are 158 results. On the right it shows how those results are distributed over different categories. It shows me that 41 of these are “BOOK” in the category “format”. If I click on “BOOK” it adds that constraint and updates the categories I can use to narrow further. Blacklight makes good use of inline graphics (pie charts) so you can see at a glance what percentage of the remaining results fall in each category.
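To make the mechanics concrete, here is a toy Python sketch of my own (not Blacklight's Rails/Solr code) showing how a keyword query, facet counts and click-to-constrain filtering fit together; the records and field names are invented for illustration:

```python
# Minimal sketch of faceted search: each record carries metadata fields,
# a keyword search narrows the result set, and facet counts summarize how
# the remaining records are distributed over one metadata category.
from collections import Counter

records = [
    {"title": "Gone with the Wind", "format": "BOOK", "language": "English"},
    {"title": "Gone with the Wind", "format": "VIDEO", "language": "English"},
    {"title": "Gone with the Wind (score)", "format": "MUSICAL SCORE", "language": "English"},
]

def search(query, constraints=None):
    """Keyword search plus any facet constraints the user has clicked."""
    constraints = constraints or {}
    hits = [r for r in records if query.lower() in r["title"].lower()]
    for field, value in constraints.items():
        hits = [r for r in hits if r[field] == value]
    return hits

def facet_counts(hits, field):
    """Distribution of the current results over one metadata field."""
    return Counter(r[field] for r in hits)

hits = search("gone with the wind")
print(len(hits), facet_counts(hits, "format"))
# Clicking "BOOK" simply adds a constraint and recomputes the facets:
print(search("gone with the wind", {"format": "BOOK"}))
```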

This faceted browsing is a nice example of a rich-prospect view on data where you can see and navigate by a “prospect” of the whole.

Blacklight came out of work on Collex. It is built on Flare, which harnesses Solr through Ruby on Rails. As I understand it, Blacklight is also interesting as an open-source, experimental alternative to very expensive faceted-browsing tools. It is a “love letter to the Library” from a humanities computing project and its programmer.

YEP: PDF Browser

Yep is the best new software I’ve come across in a while. Yep is to PDFs on your Mac as iPhoto is to images and iTunes is to music – a well-designed tool for managing large collections of PDFs. Yep can automatically load PDFs from your hard drive, search across them, and let you assign tags with which to organize them. It also lets you move them around (something I wish iPhoto did) and export them to other viewers, e-mail and print.

Thanks to Shawn for pointing me to this.

Facing Facebook

Today’s Globe and Mail had a story, “No more Facebook for city employees” (Jeff Gray, May 10, 2007), about how (Toronto) city employees will not be able to use Facebook at work despite the fact that “there is no evidence of rampant abuse”. Toronto seems to be following the province of Ontario, which is reported to have banned Facebook as it “does not add value to a workplace environment and civil servants should not be wasting office time visiting the site”, according to Premier Dalton McGuinty. Do we have evidence that it is not useful to civil servants? Could there be uses of social networking? At McMaster a number of librarians have Facebook accounts that they are using to be more accessible to students, on the principle that they should be where their audience is. (Coming soon: a WOW librarian.)

Issues of time wasting hide what to my mind is the more serious issue. Two of my students did a multimedia project, Facing Facebook, that deals with privacy issues. There is a Flash opinion piece that Alex pointed me to that similarly asks, Does what happens in the Facebook stay in the Facebook? These deal with the large-scale corporate privacy issues. My previous post, Facebook Ethics, looks at the local ethical issues.

We need to avoid being spooked by a new use of technology just because it takes off. (Toronto is apparently the largest community on Facebook.) But, we also have to be vigilant.

McMaster Youth Media Study

A colleague of mine, Phillip Savage, supervised an interesting student research project into the attitudes of McMaster youth towards broadcast and youth media. A group of upper-year students in his Communication Studies courses surveyed students in a first-year class and prepared a report titled The McMaster Youth Media Study (PDF). What is impressive is that one of the students, Christina Oreskovich, presented this at the “CBC New Media Panel” to the House of Commons Standing Committee on Canadian Heritage (today, May 10th, 2007).

Here is their “composite sketch of the typical student” which nicely captures the results:

She is 18 years old in her first year of a liberal arts program, her parents were immigrants and she speaks English and another home language (mostly with her grandparents now). She has a cell phone with a built-in camera and is toying with the idea of perhaps in the summer upgrading to a phone with a built-in MP3 player. She got a laptop computer when she started at Mac in the fall and has broadband access at home, and in certain locations on campus. She downloads music for free from the Internet from a range of sites and although she has an iPod she rarely pays for iTunes. She regularly downloads complete TV programs off the web to watch on the laptop but rarely whole movies. Almost every day she catches one or two items from YouTube (usually sent as attachments to electronic messages from friends). She occasionally uses MySpace for social networking. On a daily basis she keeps up to date with over 100 friends from school, home and work on Facebook. Only occasionally does she look at blogs; and she doesn’t keep one herself – though some of her friends do.

She still watches TV – usually at least once a day. Her favourite channel is City-TV but she also catches CTV, CBC-TV and CH. She regularly listens to radio (very rarely CBC Radio), although she figures she gets most of her music from other sources. She’s heard of people getting satellite radio but since she doesn’t have a car she very rarely experiences it – it’s more something her Dad is into. She will read magazines quite regularly, at least once or twice a week.

She doesn’t feel she has the time or interest to follow most news closely, yet. When she does she is as likely to use traditional mass media (TV, newspaper or radio) as internet sources. She prefers TV, radio and newspapers for national and international news, and the internet for arts and entertainment stories.

She feels strongly that the Internet allows her to keep in touch with a wide range of friends, but worries a bit that maybe she is spending less face-to-face time with close friends and family. She is also a bit concerned about whether time on the internet is making her a little less productive in her school work, although she thinks that it really helps her understand quickly about what’s going on in the world and exposes her to a wide range of points of view (more so than traditional mass media). She gets worried at times about her own privacy on the Internet, especially when she spends so much time on Facebook. She’s not really sure if she can find more Canadian information on the Internet versus traditional media. (pages 3 and 4)

I knew Facebook was popular, but didn’t expect it to be this popular. I’m guessing that it is becoming an “always on” utility for many students that aggregates, summarizes and nicely shows what’s happening to their friends. I suspect interfaces like Facebook Mobile will become more and more popular as iPhone type cell phones become affordable.

BookCrossing – The World’s Biggest Free Book Club – Catch and Release Used Books

BookCrossing is a project that a librarian colleague, Barbara, suggested to me as an example of new media and books intersecting. The idea is that people release books into the “wild” with a BCID label and number. Then others who find the book can log on and write in the journal of the book. Users can then watch how books travel around, being caught, read and released. Neat idea – would our library do this on campus? What if we took books being deacquisitioned and released them in departmental lounges or the student centre?

Joan Lippincott: Digital Learning Spaces

Henry Jenkins, the Director of the Comparative Media Studies Program at the Massachusetts Institute of Technology, has written a whitepaper, Confronting the Challenges of Participatory Culture: Media Education for the 21st Century, for the MacArthur Foundation that talks about the challenges of dealing with students who are (or want to be) participating in creating culture. (See my previous entry for a link to a presentation by Jenkins.) Joan Lippincott from the Coalition for Networked Information gave a talk today about how we should think about the library, learning, and space for these NetGen students who are used to the participatory culture of the web. To summarize her discussion of the differences between us and the Net Generation, based partly on Jenkins:

  • We tend to do things in serial (first this task and then that) while the NetGen multitask.
  • We (especially in the Humanities) value privacy and solitary work while the NetGen like to work in teams.
  • We tend to value linear text while they value hyperlinked visual multimedia.
  • We value critical thinking while they value creative production.

Joan goes on to argue that to reach the Net Generation, libraries need to rethink their services and spaces. She showed images of new spaces and discussed some of what she has written about in Linking the Information Commons to Learning, which is part of a book from EDUCAUSE, Learning Spaces. Two things struck me:

  • Lack of Books. In most of the pictures shown of information commons there were no books! This certainly isn’t true when you look at the workstations of most students or faculty in their own spaces where books, papers, and computers are “mashed” together. Why then are information commons being set up apart from the books and periodicals? One wonders why libraries are building spaces that look more like what computing services should set up. Is it politics – libraries are doing what campus computing services failed to do? Joan, rightly I think, answered that these spaces are/should be set up in collaboration with people with technical skill (from computing) and that the idea is to connect students to content whether digital or print. Books should be there too or be at hand.
  • Lack of Faculty Coordination. While these spaces are popular with students (see Henning’s Final Report on a survey of learning commons), the deeper problem is integration into the curriculum. Individual faculty may take advantage of the changing services and spaces of the library, but I haven’t seen the deep coordination that sees courses across the curriculum changed. Faculty assume the library is a service unit that supports their teaching by having books on reserve. We don’t think of the library as a living space where students are talking through our assignments, collaborating and getting help with their essays. We don’t coordinate changes in how we teach with changes in space and service, but stumble upon new services and weave them into our courses if we have the time (and it does take time to change how you teach.)

So here are a couple of ideas:

  • Curated Distributions. We should think along the lines suggested in A world in three aisles, Gideon Lewis-Kraus’ fascinating discussion of the Prelingers’ personal curated library where materials are arranged in associative clusters based on a curatorial practice designed to encourage pursuing topics that cross traditional shelf distribution. Why not invite faculty to curate small collections of books to be distributed among the workstations of a commons where users can serendipitously come across them, wonder why they are there, and browse not just sites, but thematic collections of books?
  • Discovery Centres. Another approach would be to work with chairs and deans to identify key courses or sets of courses and then build spaces with faculty input that are designed for studying for those courses. The spaces would have a mix of meeting spaces optimized for tutorials in the course(s), groupwork spaces for the types of groups formed in the courses, print materials (like books and magazines) needed for the course, and electronic finding aids for online materials related to the course. These topical spaces would be centres for students in these courses to access relevant information, browse related materials, meet other students, and get help. A library could obviously only afford a limited number of these, which is why the idea would be to target stressful first and second year courses where chairs identify the need and opportunity for discovery centres.

Desk Set (1957)

Desk Set (1957) is a Katharine Hepburn and Spencer Tracy movie about automation in which Tracy, an engineer, is brought in to automate the research department run by Bunny Watson (Hepburn). There is a moment of interest to digital humanists when Tracy is showing off EMERAC:

Boss: Well there she is, EMERAC, the modern miracle …

Richard Sumner (Tracy): The purpose of this machine, of course, is to free the worker…

Bunny Watson (Hepburn): You can say that again…

Sumner: …to free the worker from the routine and repetitive tasks and liberate his time for more important work.

For example, you see all those books there … and the ones up there? Well, every fact in them has been fed into Emmy. What do you have there?

Operator: This is Hamlet.

Boss: That’s Hamlet?

Operator: Yes, the entire text.

Sumner: In code, of course… Now these little cards create electronic impulses which are accepted and retained by the machine so that in the future, if anyone calls up and wants a quotation from Hamlet, the research worker types it into the machine here, Emmy goes to work, and the answer comes out here.

Boss: And it never makes a mistake.

Sumner: Well … Now that’s not entirely accurate. Emmy can make a mistake.

Bunny: Ha ha…

Sumner: But only if the human element makes the mistake first.

Boss: Tell me Bunny, has EMERAC been helping you any?

Bunny: Well frankly it hasn’t started to give yet. For the past two weeks we’ve been feeding it information. But I think you could safely say that it will provide more leisure for more people.

There is an image of EMERAC on Flickr.

bastwood.com: Aphex Face and transcoding

bastwood.com has a good page on images found in sound, starting with the demon face in Aphex Twin’s “Windowlicker”. It turns out the demon is really an inverted version of the Twin himself. The site goes on to discuss how to find such images and how to create sound from images using common software.
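As a rough illustration of the trick (my own sketch, not the exact method bastwood.com documents), you can treat an image as a spectrogram: rows become frequencies, columns become time slices, and each column is synthesized as a sum of sinusoids weighted by pixel brightness. The sample rate, frequency range and the stand-in diagonal-line “image” below are arbitrary choices:

```python
# Hide a picture in a spectrogram: image rows map to frequencies, image
# columns map to time, and each column is rendered as a sum of sinusoids
# whose amplitudes are the pixel brightnesses in that column.
import numpy as np
import wave

sample_rate = 8000
col_duration = 0.05                  # seconds of audio per image column
freqs = np.linspace(400, 3600, 64)   # one frequency band per image row

# Stand-in "image": a bright diagonal line (in practice, load a grayscale image).
image = np.zeros((64, 40))
np.fill_diagonal(image, 1.0)

samples_per_col = int(sample_rate * col_duration)
t = np.arange(samples_per_col) / sample_rate
audio = []
for col in image.T:                  # sweep left to right through the image
    frame = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(col, freqs))
    audio.append(frame)
audio = np.concatenate(audio)
audio /= np.abs(audio).max() + 1e-9  # normalize to [-1, 1]

with wave.open("hidden_image.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(sample_rate)
    wav.writeframes((audio * 32767).astype(np.int16).tobytes())
# Opening hidden_image.wav in any spectrogram viewer should reveal the diagonal.
```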

There are some uses for image/video sonification tools other than putting surprises in tunes. See The vOICe synthetic vision software site, which sells systems for the visually impaired.

Thanks to Alex for the link.

TAPoRware Word Cloud

We’ve been playing with ways to make text analysis tools that don’t need parameters, like word clouds, run automatically when a page loads. See the TAPoRware Word Cloud documentation. Here is an example.
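Behind a tool like this, the parameter-free part is just counting; here is a rough Python sketch (my own, not the TAPoRware code) of how word frequencies might be turned into the font-size weights a page script could render on load. The stop-word list and size range are invented for illustration:

```python
# Compute word-cloud weights: tokenize, drop stop words, count, and scale
# the counts into a font-size range for display.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

def word_cloud_weights(text, max_words=50, min_size=10, max_size=48):
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    top = counts.most_common(max_words)
    if not top:
        return {}
    biggest = top[0][1]
    # Map each frequency onto the font-size range.
    return {w: min_size + (max_size - min_size) * c / biggest for w, c in top}

if __name__ == "__main__":
    sample = "Text analysis tools like word clouds show word frequencies at a glance."
    print(word_cloud_weights(sample))
```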


An alternate beginning to humanities computing

Reading Andrew Booth’s Mechanical Resolution of Linguistic Problems (1958) I came across some interesting passages about the beginnings of text computing that suggest an alternative to the canonical Roberto Busa story of origin. Booth (the primary author) starts the book with a “Historical Introduction” in which he alludes to Busa’s project as part of a list of linguistic problems that run parallel to the problems of machine translation:

In parallel with these (machine translation) problems are various others, sometimes of a higher, sometimes of a lower degree of sophistry. There is, for example, the problem of the analysis of the frequency of occurrence of words in a given text. … Another problem of the same generic type is that of constructing concordances for given texts, that is, lists, usually in alphabetic order, of the words in these texts, each word being accompanied by a set of page and line references to the place of its occurrence. … The interest at Birkbeck College in this field was chiefly engendered by some earlier research work on the Dialogues of Plato … Parallel work in this field has been carried out by the I.B.M. Corporation, and it appears that some of this work is now being put to practical use in the preparation of a concordance for the works of Thomas Aquinas.
A more involved application of the same sort is to the stylistic analysis of a work by purely mechanical means. (p. 5-6)
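The concordance task Booth describes is easy to sketch in modern terms: an alphabetical list of the words in a text, each accompanied by references to where it occurs. Here is a minimal illustration of my own (line references stand in for the page-and-line references Booth mentions):

```python
# Build a simple concordance: an alphabetical list of words, each with the
# line numbers on which it occurs in the text.
import re
from collections import defaultdict

def concordance(text):
    refs = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[a-z']+", line.lower()):
            refs[word].append(line_no)
    return dict(sorted(refs.items()))   # alphabetic order, as Booth notes

if __name__ == "__main__":
    sample = "To be, or not to be, that is the question:\nWhether 'tis nobler in the mind to suffer"
    for word, lines in concordance(sample).items():
        print(f"{word}: {lines}")
```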

In Mechanical Resolution he continues with a discussion of how to use computers to count words and to generate concordances. He has a chapter on the problem of Plato’s dialogues, which seems to have been a set problem at that time, and, of course, there are chapters on dictionaries and machine translation. He describes some experiments he did starting in the late 1940s that suggest that Searle’s Chinese Room Argument of 1980 might have been based on real human simulations.

Although no machine was available at this time (1948), the ideas of Booth and Richens were extensively tested by the construction of limited dictionaries of the type envisaged. These were used by a human untutored in the languages concerned, who applied only those rules which could eventually be performed by a machine. The results of these early ‘translations’ were extremely odd, … (p. 2)

Did others run such simulations of computing with “untutored” humans in the early years when they didn’t have access to real systems? See also the PDF of Richens and Booth, Some Methods of Mechanized Translation.

As for Andrew D. Booth, he ended up in Canada working on French/English translation for the Hansard, the bilingual transcript of parliamentary debates. (Note that Bill Winder has also been working on these, but using them as source texts for bilingual collocations.) Andrew and Kathleen Booth wrote a contribution on The Origins of MT (PDF) that describes his early encounters with pioneers of computing around the possibilities of machine translation starting in 1946.

We date realistic possibilities starting with two meetings held in 1946. The first was between Warren Weaver, Director of the Natural Sciences Division of the Rockefeller Foundation, and Norbert Wiener. The second was between Weaver and A.D. Booth in that same year. The Weaver-Wiener discussion centered on the extensive code-breaking activities carried out during World War II. The argument ran as follows: decryption is simply the conversion of one set of “words”–the code–into a second set, the message. The discussion between Weaver and A.D. Booth on June 20, 1946, in New York identified the fact that the code-breaking process in no way resembled language translation because it was known a priori that the decrypting process must result in a unique output. (p. 25)

Booth seems to have successfully raised funds from the Nuffield Foundation for a computer at Birkbeck College at the University of London that was used by L. Brandwood for work on Plato, among others. In 1962 he and his wife migrated to Saskatchewan to work on bilingual translation, and then in 1972 to Lakehead in Ontario, where they “continued with emphasis on the construction of a large dictionary and the use of statistical techniques in linguistic analysis”. They retired to British Columbia in 1978, as most sensible Canadians do.

In short, Andrew Booth seems to have been involved in the design of early computers in order to get systems that could do machine translation and that led him to support a variety of text processing projects including stylistic analysis and concording. His work has been picked up as important to the history of machine translation, but not for the history of humanities computing. Why is that?
In a 1960 paper on The future of automatic digital computers he concludes,

My feeling on all questions of input-output is, however, the less the better. The ideal use of a machine is not to produce masses of paper with which to encourage Parkinsonian administrators and to stifle human inventiveness, but to make all decisions on the basis of its own internal operations. Thus computers of the future will communicate directly with each other and human beings will only be called on to make those judgements in which aesthetic considerations are involved. (p. 360)