Paolo Sordi: I blog therefore I am

On the ethos of digital presence: today I participated in a panel launching the Italian version of Paolo Sordi’s book I Am: Remix Your Web Identity. (The Italian title is Bloggo Con WordPress Dunque Sono.) The panel included Domenico Fiormonte, Luisa Capelli, Daniela Guardamagna, Raul Mordenti, and, of course, Paolo Sordi.


LOTRProject: Visualizing the Lord of the Rings

[Image: LOTRProject character mentions chart]

Emil Johansson, a student in Gothenburg, has created a fabulous site called the LOTRProject (or Lord of the Rings Project). The site provides different types of visualizations about Tolkien’s world (the Silmarillion, the Hobbit, and LOTR), from maps to family trees to character mentions (see image above).


Literary Analysis and the Wolfram Language


Lately I’ve been using Wolfram Mathematica more and more for analytics. I was introduced to Mathematica by Bill Turkel and Ian Graham, who have done some impressive work with it. Bill Turkel has now created an open-access, open-content, and open-source textbook, Digital Research Methods with Mathematica. The text is itself a Mathematica notebook, so if you have Mathematica you can use the text to do analytics on the spot.

Wolfram has also posted an interesting blog entry on Literary Analysis and the Wolfram Language: Jumping Down a Reading Rabbit Hole. They show how you can generate word clouds and sentiment analysis graphs easily.

While I am still learning Mathematica, some of the features that make it attractive include:

  • It uses a “literate programming” model where you write notebooks meant to be read by humans, with code embedded in them, rather than code with awkward comments embedded in it.
  • It has a lot of convenient Web, Language, and Visualization functions that let you do the things we want to do in the digital humanities.
  • You can call on Wolfram Alpha in a notebook to get real-world knowledge like capital cities, maps, or language information.

Text Mining The Novel 2015


On Thursday and Friday (Oct. 22nd and 23rd) I was at the 2nd workshop for the Text Mining the Novel project. My conference notes are here: Text Mining The Novel 2015. We had a number of great papers on the issue of genre (this year’s topic). Here are some general reflections:

  • The obvious weakness of text mining is that it operates on the novel as text, specifically digital text (strings). We also need to find ways to study the novel as a material object (a thing), as a social object, as a performance (of the reader), and as an economic object in a marketplace. Then we have to find ways to connect these.
  • So many analytical and mining processes, from dictionaries to topics, depend on bags of words. Is this a problem or a limitation? Can we try to abstract characters, plot, or argument?
  • I was interested in the philosophical discussions about epistemology in novels and the philosophical claims made about language and literature.
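For readers unfamiliar with the term, a “bag of words” reduces a text to counts of its words and throws away their order, which is why characters, plot, and argument are hard to recover from it. Here is a minimal Python sketch of the idea, assuming a naive tokenizer that lowercases and keeps runs of letters. The function name `bag_of_words` is hypothetical and is not taken from any of the workshop papers.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Reduce a text to word counts, discarding word order (naive tokenizer)."""
    # Lowercase, then keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two sentences with opposite "plots" produce the identical bag.
a = bag_of_words("The dog bit the man.")
b = bag_of_words("The man bit the dog.")
print(a == b)            # True
print(a.most_common(1))  # [('the', 2)]
```

Dictionary methods and topic models build on representations like this one, so whatever the bag discards (who did what to whom) is invisible to them unless it is added back in some other way.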


Data Management Plan Recommendation

Today I deposited a Data Management Plan Recommendation for Social Science and Humanities Funding Agencies (http://hdl.handle.net/10402/era.42201) in our institutional repository, ERA. This report/recommendation was written by Sonja Sapach with help from me and Catherine Middleton. We recommended that:

Agencies that fund social science and humanities (SSH) research should move towards requiring a Data Management Plan (DMP) as part of their application processes in cases where research data will be gathered, generated, or curated. In developing policies, funding agencies should consult the community on the values of stewardship and research that would be strengthened by requiring DMPs. Funding agencies should also gather examples and data about reuse of archived data in the social sciences and humanities and encourage due diligence among researchers to make themselves aware of reusable data.

On the surface the recommendation seems rather bland. SSHRC has required the deposit of the research data it funds for decades. The problem, however, is that few of us pay attention, because it is one more thing to do and because it means sharing hard-won data that you may want to continue milking for research. What we lack is a culture that thinks of depositing research data as a scholarly contribution, the way we think of translating and editing important cultural texts. We need a culture of stewardship, as a TC3+ (tri-council) document put it. See Capitalizing on Big Data: Toward a Policy Framework for Advancing Digital Scholarship in Canada (PDF).

Given the potential resistance of colleagues, it is important that we understand the arguments for requiring data management planning, and laying out those arguments is one of the things we do in this report. Another issue is how a funder can effectively require, at the proposal stage, something (like a Data Management Plan) that shows how the researchers are thinking through the issue. To that end we document the approaches of other funding bodies. The point is that this is not actually that new, and some research communities are further ahead.

At the end of the day, what we really need is a recognition that depositing data so that it can be used by other researchers is a form of scholarship. Such scholarship can be assessed like any other scholarship. What data has been deposited, and what is its quality? How is the data deposited? How is it documented? Can it have an impact?

You can find this document also at Catherine Middleton’s web site and Sonja Sapach’s web site.


Medical Privacy Under Threat in the Age of Big Data

The Intercept has a good introductory story about Medical Privacy Under Threat in the Age of Big Data. I was surprised by how valuable medical information is. Here is a quote:

[h]e found a bundle of 10 Medicare numbers selling for 22 bitcoin, or $4,700 at the time. General medical records sell for several times the amount that a stolen credit card number or a social security number alone does. The detailed level of information in medical records is valuable because it can stand up to even heightened security challenges used to verify identity; in some cases, the information is used to file false claims with insurers or even order drugs or medical equipment. Many of the biggest data breaches of late, from Anthem to the federal Office of Personnel Management, have seized health care records as the prize.

The story mentions Latanya Sweeney, who is the Director of the Data Privacy Lab at Harvard. She did important research on Discrimination in Online Ad Delivery and has a number of important papers on health records, like a recent work on Matching Known Patients to Health Records in Washington State Data, which showed how one could de-anonymize Washington State health data that is for sale by searching news databases. We are far more unique than we think we are.
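Why we are “more unique than we think” can be illustrated by counting how many records share each combination of ordinary attributes (so-called quasi-identifiers). The Python sketch below uses made-up toy records; the fields, values, and helper names are hypothetical and are not drawn from the Washington State data or from Sweeney’s actual method. A record whose combination occurs only once can, in principle, be linked to an outside source, such as a news story, that mentions the same attributes.

```python
from collections import Counter

# Hypothetical, made-up "anonymized" records: no names, only quasi-identifiers.
records = [
    {"zip": "98101", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"zip": "98101", "age": 34, "sex": "F", "diagnosis": "fracture"},
    {"zip": "98101", "age": 61, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "98052", "age": 27, "sex": "F", "diagnosis": "migraine"},
]

# Assumed quasi-identifier fields for this toy example.
QUASI_IDENTIFIERS = ("zip", "age", "sex")


def key(record):
    """Project a record onto its quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_records(rows):
    """Return rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(key(r) for r in rows)
    return [r for r in rows if counts[key(r)] == 1]


def link(rows, outside_fact):
    """Match attributes from an outside source (e.g. a news item) to records."""
    return [r for r in rows if key(r) == key(outside_fact)]


print(len(unique_records(records)))  # 2
# A hypothetical news item: a 61-year-old man from ZIP 98101 was hospitalized.
print(link(records, {"zip": "98101", "age": 61, "sex": "M"}))
```

In this toy table, half the people are singled out by just three unremarkable attributes, and one outside fact is enough to attach a diagnosis to a specific person. Real data sets carry many more attributes, which makes unique combinations more common, not less.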

I should add that I came across an interesting blog post by Dr Sweeney on Tech@FTC arguing for an interdisciplinary field of Technology Science. (Sweeney was the Chief Technologist at the FTC.)

Depositing Archives

We have recently deposited two research archives here at the University of Alberta. One is the John B. Smith Archive. You can download bundles or the complete archive, which can be found at http://hdl.handle.net/10402/era.41201. Amy Dyrbye and I worked with John B. Smith to assemble, document, and deposit it in ERA (the Education and Research Archive).

Another archive that we are building is a collection around Gamergate. The DOI for this is:

doi:10.7939/DVN/10253

For this we are using Dataverse, which allows us to manage the archive and choose which parts to publish.

Given the work that goes into developing and documenting these archives I would argue that they should be considered scholarly work, but that is another matter.

KIAS shrinks carbon footprints “Around The World”

The Office of Sustainability at the University of Alberta has recognized our work at the Kule Institute for Advanced Study to develop models for sustainable research. They have published a nice story about the Around the World conference that we run, titled KIAS shrinks carbon footprints “Around The World”. The question we need to ask ourselves is whether our academic reward system encourages flying to conferences when other ways of meeting would work. What would it mean to do sustainable research?