High Performance Computing

Facial Recognition: What Happens When We’re Tracked Everywhere We Go?

When a secretive start-up scraped the internet to build a facial-recognition tool, it tested a legal and ethical limit — and blew the future of privacy in America wide open.

The New York Times has an in-depth story about Clearview AI titled, Facial Recognition: What Happens When We’re Tracked Everywhere We Go? The story tracks the various lawsuits attempting to stop Clearview and suggests that Clearview may well win. The company is gambling that scraping the web’s faces for its application, even if it violated terms of service, may be protected as free speech.

The story talks about the dangers of face recognition and how many of the algorithms can’t recognize people of colour as accurately, which leads to more false positives where police end up arresting the wrong person. A broader worry is that this could unleash tracking at another scale.

There’s also a broader reason that critics fear a court decision favoring Clearview: It could let companies track us as pervasively in the real world as they already do online.

The arguments in favour of Clearview include the challenge that they are essentially doing to images what Google does to text searches. Another argument is that stopping face recognition enterprises would stifle innovation.

The story then moves on to talk about the founding of Clearview and the political connections of the founders (Thiel invested in Clearview too). Finally it talks about how widely available face recognition could affect our lives. The story quotes Alvaro Bedoya who started a privacy centre,

“When we interact with people on the street, there’s a certain level of respect accorded to strangers,” Bedoya told me. “That’s partly because we don’t know if people are powerful or influential or we could get in trouble for treating them poorly. I don’t know what happens in a world where you see someone in the street and immediately know where they work, where they went to school, if they have a criminal record, what their credit score is. I don’t know how society changes, but I don’t think it changes for the better.”

It is interesting to think about how face recognition and other technologies may change how we deal with strangers. Too much knowledge could be alienating.

The story closes by describing how Clearview AI helped identify some of the Capitol rioters. Of course it wasn’t just Clearview, but also citizen investigators who named and shamed people based on the photos released.

Carpenter: The Gathering Cloud

The Cloud is an airily deceptive name connoting a floating world far removed from the physical realities of data.

The Gathering Cloud by J. R. Carpenter is a great interactive work that uses Luke Howard’s Essay on the Modification of Clouds from 1803 to meditate on the digital cloud. The work “is a hybrid print- and web-based work by J. R. Carpenter commissioned by NEoN Digital Arts Festival 2016.”


What Ever Happened to Project Bamboo?

What Ever Happened to Project Bamboo? by Quinn Dombrowski is one of the few honest discussions about the end of a project. I’ve been meaning to blog this essay which follows on her important conference paper at DH 2013 in Nebraska (see my conference report here which comments on her paper.) The issue of how projects fail or end rarely gets dealt with and Dombrowski deserves credit for having the courage to document the end of a project that promised so much.

I blog about this now as I just finished a day-long meeting of the Leadership Council for Digital Infrastructure where we discussed a submission to Industry Canada that calls for coordinated digital research infrastructure. While the situation is different, we need to learn from projects like Bamboo when we imagine massive investment in research infrastructure. We all know it is important, but doing it right is not as easy as it sounds.

Which brings me back to failure. There are three types of failure:

The simple type we are happy to talk about where you ran an experiment based on a hypothesis and didn’t get positive results. This type is based on a simplistic model of the scientific process which pretends to value negative results as much as positive ones. We all know the reality is not that simple and, for that matter, that the science model doesn’t really apply to the humanities.
The messy type where you don’t know why you failed or what exactly failed. This is the type where you promised something in a research or infrastructure proposal and didn’t deliver. This type is harder to report because it reflects badly on you. It is an admission that you were confused or oversold your project.
The third and bitter type is the project that succeeds on its own terms, but is surpassed by the disciplines. It is when you find your research isn’t current any longer and no one is interested in your results. It is when you find yourself ideologically stranded doing something that someone important has declared critically flawed. It is a failure of assumptions, or theory, or positioning and no one wants to hear about this failure, they just want to avoid it.

When people like Willard McCarty and John Unsworth call for a discussion of failure in the digital humanities they describe the first type, but often mean the second. The idea is to describe a form of failure reporting similar to negative results – or to encourage people to describe their failure as simply negative results. What we need, however, is honest description of the second and third types of failure, because those are expensive. To pretend some expensive project that slowly disappeared in misunderstanding was simply an experiment is missing what was at stake. This is doubly true of infrastructure because infrastructure is not supposed to be experimental. No one pays for roads and their maintenance as an experiment to see if people will take the road. You should be sure the road is needed before building.

Instead, I think we need to value research into infrastructure as something independent of the project owners. We need to do in Canada what the NSF did – bring together research on the history and theory of infrastructure.

Editorialisation Et Nouvelles Formes De Publication

In the last couple of weeks I’ve been at two interesting conferences and took research notes.

I gave a keynote on “Big Data and the Humanities” at the Northwestern Research Computation Day (link to my research notes). I gave a lot of examples of projects and visualizations.
At the Éditorialisation Et Nouvelles Formes De Publication (link to my research notes) conference I spoke about “Publishing Tools: A Theatre of Machines”. I showed how text analysis machines have evolved.

The computer program billed as unbeatable at poker

The Toronto Star has a nice story, The computer program billed as unbeatable at poker, about a poker-playing program called Cepheus that was developed by the Computer Poker Research Group here at the University of Alberta. Michael Bowling is quoted to the effect that,

No matter what you do, no matter how strong a player you are, even if you look at our strategy in every detail . . . there is no way you are going to be able to have any realistic edge on us.

On average we are playing perfectly. And that’s kind of the average that really matters.

You can play Cepheus at their web site. You can read their paper “Heads-up limit hold’em poker is solved”, just published in Science here (DOI: 10.1126/science.1259433).

Conference Report: Digital Infrastructure Summit 2014

I have just finished participating in and writing up a conference report on the Digital Infrastructure Summit 2014 in Ottawa. This summit brought some 140 people together from across Canada and across the stakeholders to discuss how to develop leading digital infrastructure in Canada. This was organized by the Digital Infrastructure Leadership Council. For this Summit the Council (working with Janet Halliwell and colleagues) developed a fabulous set of reference materials that paint a picture of the state of digital infrastructure in Canada.

You can see my longer conference report for details, but here are some of the highlights:

Infrastructure has been redefined, largely because of SSHRC’s leadership, as big and long data. This redefinition from infrastructure as tubes to a focus on research data for new knowledge has all sorts of interesting effects. It brings libraries in, among other things.
Chad Gaffield (President of SSHRC) made the point that there is a paradigm shift taking place across many disciplines as we deal with the digital in research. As we create more and more research evidence in digital form it is vital that we build the infrastructure that can preserve and make useful this evidence over the long term.
We have a peculiarly Canadian problem that most of the stakeholders are more than willing to contribute to any coalition, but no one is jumping in to lead. Everyone is too polite. No one wants a new body, but no existing body seems to want to take the lead.
There is a lot of infrastructure already in place, but it is often not bundled as services that researchers understand. Much could be made of the infrastructure in place if there were a training layer and a “concierge” layer that connects to researchers.

NSA slides explain the PRISM data-collection program

The Washington Post has been publishing NSA slides that explain the PRISM data-collection program. These slides not only explain aspects of PRISM, but also allow us to see how the rhetoric of text analysis unfolds. How do people present PRISM to others? Note the “You Should Use Both” – the imperative in the voice.

Hacker Measures the Internet Illegally with Carna Botnet

Spiegel Online has an interesting story about how a Hacker Measures the Internet Illegally with Carna Botnet. The anonymous hacker(s) exploited unprotected devices online to create a botnet, which they then used to take a census of the devices online.

So what were the actual results of the Internet census? How many IP addresses were there in 2012? “That depends on how you count,” the hacker writes. Some 450 million were “in use and reachable” during his scans. Then there were the firewalled IPs and those with reverse DNS records (which means there are domain names associated with them). In total, this equalled some 1.3 billion IP addresses in use.
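To put those two counts in perspective, here is a minimal sketch that expresses them as a share of the address space. It assumes the census covered IPv4 (the quoted passage does not say so explicitly) and uses the full 32-bit space of 2**32 addresses as the denominator, ignoring reserved ranges. The counts are the rounded figures from the quote; the function and constant names are my own.

```python
# Rough arithmetic on the census figures quoted above.
# Assumption: the census counted IPv4 addresses, so the denominator is 2**32.
IPV4_ADDRESS_SPACE = 2 ** 32  # 4,294,967,296 possible addresses

REACHABLE = 450_000_000        # "in use and reachable" during the scans
IN_USE_TOTAL = 1_300_000_000   # reachable + firewalled + reverse-DNS addresses


def share_of_ipv4(count: int) -> float:
    """Return count as a fraction of the full 32-bit IPv4 address space."""
    return count / IPV4_ADDRESS_SPACE


if __name__ == "__main__":
    for label, count in [("reachable", REACHABLE), ("in use (total)", IN_USE_TOTAL)]:
        print(f"{label}: {count:,} addresses, {share_of_ipv4(count):.1%} of IPv4 space")
```

On these rounded figures, roughly a tenth of the space answered the scans and a bit under a third was in use by the broader count, which is why “how you count” matters so much.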

Using Zotero and TAPOR on the Old Bailey Proceedings

The Digging Into Data program commissioned CLIR (Council on Library and Information Resources) to study and report on the first round of the programme. The report includes case studies on the 8 initial projects including one on our Criminal Intent project that is titled Using Zotero and TAPOR on the Old Bailey Proceedings: Data Mining with Criminal Intent (DMCI). More interesting are some of the reflections on big data and research in the humanities that the authors make:

1. One Culture. As the title hints, one of the conclusions is that in digital research the lines between disciplines and sectors have been blurred to the point where it is more accurate to say there is one culture of e-research. This is obviously a play on C. P. Snow’s Two Cultures. The two cultures of the sciences and the humanities, which have been alienated from each other for a century or two, are now coming back together around big data.

Rather than working in silos bounded by disciplinary methods, participants in this project have created a single culture of e-research that encompasses what have been called the e-sciences as well as the digital humanities: not a choice between the scientific and humanistic visions of the world, but a coherent amalgam of people and organizations embracing both. (p. 1)

2. Collaborate. A clear message of the report is that to do this sort of e-research people need to learn to collaborate and by that they don’t just mean learning to get along. They mean deliberate collaboration that is managed. I know our team had to consciously develop patterns of collaboration to get things done across 3 countries and many more universities. It also means collaborating across disciplines and this is where the “one culture” of the report is aspirational – something the report both announces and encourages. Without saying so, the report also serves as a warning that we could end up with a different polarization just as the separation of scientific and humanistic culture is healed. We could end up with polarization between those who work on big data (of any sort) using computational techniques and those who work with theory and criticism in the small. We could find humanists and scientists who use statistical and empirical methods in one culture while humanists and scientists who use theory and modelling gather as a different culture. One culture always spawns two and so on.

3. Expand Concepts. The recommendations push the idea that all sorts of people/stakeholders need to expand their ideas about research. We need to expand our ideas about what constitutes research evidence, what constitutes research activity, what constitutes research deliverables and who should be doing research in what configurations. The humanities and other interpretative fields should stop thinking of research as a process that turns the reading of books and articles into the writing of more books and articles. The new scale of data calls for a new scale of concepts and a new scale of organization.

It is interesting how this report follows the creation of the Digging Into Data program. It is a validation of the act of creating the programme and creating it as it was. The funding agencies, led by Brett Bobley, ran a consultation and then gambled on a programme designed to encourage and foreground certain types of research. By and large their design had the effect they wanted. To some extent CLIR reports that research is becoming what Digging encouraged us to think it should be. Digging took seriously Greg Crane’s question, “what can you do with a million books”, but they abstracted it to “what can you do with gigabytes of data?” and created incentives (funding) to get us to come up with compelling examples, which in turn legitimize the program’s hypothesis that this is important.

In other words we should acknowledge and respect the politics of granting. Digging set out to create the conditions where a certain type of research thrived and got attention. The first round of the programme was, for this reason, widely advertised, heavily promoted, and now carefully studied and reported on. All the teams had to participate in a small conference in Washington that got significant press coverage. Digging is an example of how granting councils can be creative and change the research culture.

The Digging into Data Challenge presents us with a new paradigm: a digital ecology of data, algorithms, metadata, analytical and visualization tools, and new forms of scholarly expression that result from this research. The implications of these projects and their digital milieu for the economics and management of higher education, as well as for the practices of research, teaching, and learning, are profound, not only for researchers engaged in computationally intensive work but also for college and university administrations, scholarly societies, funding agencies, research libraries, academic publishers, and students. (p. 2)

The word “presents” can mean many things here. The new paradigm is both a creation of the programme and a result of changes in the research environment. The very presentation of research is changed by the scale of data. Visualizations replace quotations as the favored way into the data. And, of course, granting councils commission reports that re-present a heady mix of new paradigms and case studies.

Digital Infrastructure Summit 2012

A couple of weeks ago I gave a talk at Digital Infrastructure Summit 2012 which was hosted by the Canadian University Council of Chief Information Officers (CUCCIO). This short conference was very different from any other I’ve been at. CUCCIO, by its nature, is a group of people (university CIOs) who are used to doing things. They seemed committed to defining a common research infrastructure for Canadian universities and trying to prototype it. It seemed all the right people were there to start moving in the same direction.

For this talk I prepared a set of questions for auditing whether a university has good support for digital research in the humanities. See Check IT Out!. The idea is that anyone from a researcher to an administrator can use these questions to check out the IT support for humanists.

My conference notes are here.