Michael Jordan on the Delusions of Big Data

October 27th, 2014

IEEE Spectrum has an interview with Michael Jordan that touches on the Delusions of Big Data and Other Huge Engineering Efforts. He is worried about the white noise or false positives. If a dataset is big enough you can always find something to correlate with what you want. That doesn’t mean it is causal or informatively correlated. He predicts a “big-data winter” after the bubble of excitement pops.

After a bubble, when people invested and a lot of companies overpromised without providing serious analysis, it will bust. And soon, in a two- to five-year span, people will say, “The whole big-data thing came and went. It died. It was wrong.” I am predicting that.
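Jordan's worry about false positives can be seen in a toy simulation. This is a minimal sketch under my own assumptions, not anything from the interview: the target and the candidate variables are pure random noise, and the sample size (20 observations), the number of candidates (5,000) and the function names are all invented for illustration. The more candidates you search, the stronger the best "correlation" you turn up, even though nothing real is there.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(target, candidates):
    """Largest absolute correlation between the target and any candidate."""
    return max(abs(pearson(target, c)) for c in candidates)


# Hypothetical sizes chosen for illustration only.
N_OBSERVATIONS = 20
N_CANDIDATES = 5000

rng = random.Random(2014)  # fixed seed so the run is repeatable

# Everything here is independent noise: no variable "causes" any other.
target = [rng.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
candidates = [
    [rng.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
    for _ in range(N_CANDIDATES)
]

few = strongest_chance_correlation(target, candidates[:10])
many = strongest_chance_correlation(target, candidates)

print(f"best |r| among 10 noise variables:   {few:.2f}")
print(f"best |r| among {N_CANDIDATES} noise variables: {many:.2f}")
```

Because the ten-variable search is a subset of the larger one, the second number can never be smaller than the first. With thousands of candidates it is typically large enough to look like a finding.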

Traces – Augmented reality gifts

October 19th, 2014

From a New Scientist article I learned about Traces. Traces lets you leave a bundle of information (like a song and some greetings) for someone at a particular GPS location (and at a particular time). You can thus use it to leave gifts for other people to find. It strikes me as a neat use of augmented reality. I can imagine all sorts of uses for it beyond gifts:

  • One could use it to leave information about a place.
  • It could be used by artists to leave AR works as imagined by William Gibson in Spook Country.
  • One could create alternate reality games with it.

Alas, it is not available in the Canadian App Store.

Donkey Kong (Arcade) – The Cutting Room Floor

October 19th, 2014


From Slashdot I learned about The Cutting Room Floor, a wiki “dedicated to unearthing and researching unused and cut content from video games.” For example, they have information about Donkey Kong (Arcade) that includes unused music, unused graphics, hidden text (see above), and regional differences. Yet another example of how the fan community is doing the history of videogames in innovative ways.

Trans-Atlantic Platform

October 16th, 2014

The Trans-Atlantic Platform: Social Sciences and Humanities is a collaboration among social science and humanities funders in different countries. In their About Us page they describe the purpose of this collaborative platform thus:

This Trans-Atlantic Platform will enhance the ability of funders, research organizations and researchers to engage in transnational dialogue and collaboration. It will identify common challenges and promote a culture of digital scholarship in social science and humanities research. It will facilitate the formation of networks within the social sciences and humanities and help connect them with other disciplines. It will also heighten awareness of the crucial role the social sciences and humanities play in addressing 21st century challenges.

The T-AP is co-chaired by the (then) President of SSHRC and the Netherlands social sciences funding agency. It likewise seems to be co-administered by SSHRC and NWO Social Sciences. Funding from the European Commission 7th Framework Programme helped launch the T-AP.

What is interesting is who is in T-AP. The German DFG and the American NEH/NSF are down as “associated partners”. Brazilian, Canadian, Finnish, French, Mexican, Dutch, Portuguese, and UK funding organizations are “key partners.” (See Partners page.)

I also have questions about T-AP:

  • Does this mean we will see more programmes like Digging into Data that can fund teams across countries? Wouldn’t it be great if a project could include the right people rather than the right people in Canada?
  • Or, will we see thematic collaborations like the call on Sustainable Urban Development?
  • Will they try to harmonize research data policies?

Adobe is Spying on Users, Collecting Data on Their eBook Libraries

October 12th, 2014


Nate Hoffelder on The Digital Reader blog has broken a story about how Adobe is Spying on Users, Collecting Data on Their eBook Libraries. He and Ars Technica report that Adobe’s Digital Editions 4 sends data home about what you read and how far (what page) you get to. The data is sent in plain text.

Hoffelder used a tool called Wireshark to look at what was being sent out from his computer.

Sensitive Words: Hong Kong Protests

October 11th, 2014

On Thursday I heard a great talk by Ashley Esarey on “Understanding Chinese Information Control and State Preferences for Stability Maintenance.” He has been studying a dataset of over 4,000 censorship directives issued by the Chinese state to website administrators to do things like stop mentioning Obama’s inauguration in headlines or to delete all references to certain issues. I hadn’t realized how hierarchical and human the Chinese control of the internet was. Directives came from all levels and seem to also have been ignored.

In his talk Esarey mentioned how the China Digital Times has been tracking various internet censorship issues in China. At that site I found some fascinating stories and lists of words censored. See:

Exclusive: Hundreds Of Devices Hidden Inside New York City Phone Booths

October 7th, 2014

From The Intercept I followed a link to a Buzzfeed Exclusive: Hundreds Of Devices Hidden Inside New York City Phone Booths. Buzzfeed found that the company that manages the advertising surrounding New York phone booths had installed beacons that could interact with apps on smartphones as they passed by. The beacons are made by Gimbal, which claims to have “the world’s largest deployment of industry-leading Bluetooth Smart beacons…” The Buzzfeed article describes what information can be gathered by these beacons:

Gimbal has advertised its “Profile” service. For consumers who opt in, the service “passively develops a profile of mobile usage and other behaviors” that allow the company to make educated guesses about their demographics “age, gender, income, ethnicity, education, presence of children”, interests “sports, cooking, politics, technology, news, investing, etc”, and the “top 20 locations where [the] user spends time home, work, gym, beach, etc..”

The image above is from Buzzfeed who got it from Gimbal and it illustrates how Gimbal is collecting data about “sightings” that can be aggregated and mined both by Gimbal and by 3rd parties who pay for the service. Apple is however responsible for an important underlying technology, iBeacon. If you want the larger picture on beacons and the hype around them see the BEEKn site (which is about “beacons, brands and culture on the Internet of Things”) or read about Apple’s iBeacon technology. I am not impressed with the use cases described. They are mostly about advertisers telling us (without our permission) about things on sale. They can be used for location specific (very specific) information like the Tulpenland (tulip garden) app but outdoors you can do this with geolocation. A better use would be indoors for museums where GPS doesn’t work as Prophets Kitchen is doing for the Rubens House Antwerp Museum though the implementation shown looks really lame (multiple choice questions about Rubens!). The killer app for beacons has yet to appear, though mobile payments may be it.

What is interesting is that the Intercept article indicates that users don’t appreciate being told they are being watched. It seems that we only mind being spied on when we are personally told that we are being spied on, but that may be an unwarranted inference. We may come to accept a level of tracking as the price we pay for cell phones that are always on.

In the meantime New York has apparently ordered the beacons removed, though they are reportedly installed in other cities. Of course there are also Canadian installations.


We Have Never Been Digital

September 29th, 2014

Historian of technology Thomas Haigh has written a nice reflection on the intersection of computing and the humanities, We Have Never Been Digital (PDF) (Communications of the ACM, 57:9, Sept 2014, 24-28). He gives a nice tour of the history of the idea that computers are revolutionary starting with Berkeley’s 1949 Giant Brains: Or Machines That Think. He talks about the shift to the “digital” locating it in the launch of Wired, Stewart Brand and Negroponte’s Being Digital. He rightly points out that the digital is not evenly distributed and that it has a material and analogue basis. Just as Latour argued that we have never been (entirely) modern, Haigh points out that we have never been and never will be entirely digital.

This leads to a critique of the “dated neologism” digital humanities. In a cute move he asks what makes humanists digital. Is it using email or building a web page? He rightly points out that the definition has been changing as the technology does, though I’m not sure that is a problem. The digital humanities should change – that is what makes disciplines vital. He also feels we get the mix of computing and the humanities wrong; that we should be using humanities methods to understand technology, not the other way around.

There is a sense in which historians of information technology work at the intersection of computing and the humanities. Certainly we have attempted, with rather less success, to interest humanists in computing as an area of study. Yet our aim is, in a sense, the opposite of the digital humanists: we seek to apply the tools and methods of the humanities to the subject of computing…

On this I think he is right – that we should be doing both the study of computing through the lens of the humanities and experimenting with the uses of computing in the humanities. I would go further and suggest that one way to understand computing is to try it on that which you know and that is the distinctive contribution of the digital humanities. We don’t just “yack” about it, we try to “hack” it. We think-through technology in a way that should complement the philosophy and history of technology. Haigh should welcome the digital humanities or imagine what we could be rather than dismiss the field because we haven’t committed to only humanistic methods, however limited.

Haigh concludes with a “suspicion” I have been hearing since the 1990s – that the digital humanities will disappear (like all trends) leaving only real historians and other humanists using the tools appropriate to the original fields. He may be right, but as a historian he should ask why certain disciplines thrive and others don’t. I suspect that science and technology studies could suffer the same fate – the historians, sociologists, and philosophers could go back to their homes and stop identifying with the interdisciplinary field. For that matter, what essential claim does any discipline have? Could history fade away because all of us do it, or statistics disappear because statistical techniques are used in other disciplines? Who needs math when everyone does it?

The use of computing in the other humanities is exactly why the digital humanities is thriving – we provide a trading zone for new methods and a place where they can be worked out across the concerns of other disciplines. Does each discipline have to work out how texts should be encoded for interchange and analysis or do we share enough to do it together under a rubric like computing in the humanities? As for changing methods – the methods definitive of the digital humanities that are discussed and traded will change as they get absorbed into other disciplines so … no, there isn’t a particular technology that is definitive of DH and that’s what other disciplines want – a collegial discipline from which to draw experimental methods. Why is it that the digital humanities are expected to be coherent, stable and definable in a way no other humanities discipline is?

Here I have to say that Matt Kirschenbaum has done us an unintentional disfavor by discussing the tactical use of “digital humanities” in English departments. He has led others to believe that there is something essentially mercenary or instrumental to the field that dirties it compared to the pure and uneconomical pursuit of truth to be found in science and technology studies, for example. The truth is that no discipline has ever been pure or entirely corrupt. STS has itself been the site of positioning at every university I’ve been at. It sounds from Haigh that STS has suffered the same trials of not being taken seriously by the big departments that humanities computing worried about for decades.  Perhaps STS could partner with DH to develop a richer trading zone for ideas and techniques.

I should add that many of us are in DH not for tactical reasons, but because it is a better home to the thinking-through we believe is important than the disciplines we came from. I was visiting the University of Virginia in 2001-2 and participated in the NEH funded meetings to develop the MA in Digital Humanities. My memory is that when we discussed names for the programme it was to make the field accessible. We were choosing among imperfect names, none of which could ever communicate the possibilities we hoped for. At the end it was a choice as to what would best communicate to potential students what they could study.


September 25th, 2014

The folks behind the Google Ngram Viewer have developed a new tool called Bookworm. It has a number of corpora (the example above is from bills from beta.congress.gov). It lets you describe more complex queries and you can upload your own data.

Bookworm is hosted by the Cultural Observatory at Harvard directed by Erez Lieberman Aiden and Jean-Baptiste Michel who were behind the Ngram Viewer. They have recently published a book, Uncharted, where they talk about different cultural trends they studied using the Ngram Viewer. The book is accessible though a bit light.

The Material in Digital Books

September 19th, 2014

Elika Ortega in a talk at Experimental Interfaces for Reading 2.0 mentioned two web sites that gather interesting material traces in digital books. One is The Art of Google Books that gathers interesting scans in Google Books (like the image above).

The other is the site Book Traces where people upload interesting examples of marginal marks. Here is their call for examples:

Readers wrote in their books, and left notes, pictures, letters, flowers, locks of hair, and other things between their pages. We need your help identifying them because many are in danger of being discarded as libraries go digital. Books printed between 1820 and 1923 are at particular risk.  Help us prove the value of maintaining rich print collections in our libraries.

Book Traces also has a Tumblr blog.

Why are these traces important? One reason is that they help us understand what readers were doing and thinking while reading.