Common Crawl

The Common Crawl is a project that has been crawling the web and making an open corpus of web data from the last 7 years available for research. Their crawl corpus is petabytes of data and is available as WARC (Web ARChive) files. For example, their 2013 dataset is 102TB and has around 2 billion web pages. Their collection is not as complete as that of the Internet Archive, which goes back much further, but it is available in large datasets for research.

BuzzFeed on Breitbart courting the alt-right

Screen of emails from Dan Lyons

BuzzFeed News has an article titled Here’s How Breitbart and Milo Smuggled Nazi and White Nationalist Ideas Into The Mainstream. The article is based on a cache of internal Breitbart emails and mostly deals with what Milo Yiannopoulos was up to.

From this motley chorus of suburban parents, journalists, tech leaders, and conservative intellectuals, Yiannopoulos’s function within Breitbart and his value to Bannon becomes clear. He was a powerful magnet, able to attract the cultural resentment of an enormously diverse coalition and process it into an urgent narrative about the way liberals imperiled America. It was no wonder Bannon wanted to groom Yiannopoulos for media infamy: The bigger the magnet got, the more ammunition it attracted.

Part of the story also deals with some “liberal” journalists, like Dan Lyons, who apparently were emailing Milo. It just gets more and more sordid.

Many of those who wrote Milo seem to be disgruntled people who feel oppressed by the “political correctness” of their situation, whether in a tech company or entertainment business. They email Milo to vent or pass tips or just get sympathy.

Canadian Social Knowledge Institute

I just got an email announcing the soft launch of the Canadian Social Knowledge Institute (C-SKI). This institute grew out of the Electronic Textual Cultures Lab and the INKE project. Part of C-SKI is an Open Scholarship Policy Observatory which has a number of partners through INKE.

The Canadian Social Knowledge Institute (C-SKI) actively engages issues related to networked open social scholarship: creating and disseminating research and research technologies in ways that are accessible and significant to a broad audience that includes specialists and active non-specialists. Representing, coordinating, and supporting the work of the Implementing New Knowledge Environments (INKE) Partnership, C-SKI activities include awareness raising, knowledge mobilization, training, public engagement, scholarly communication, and pertinent research and development on local, national, and international levels. Originated in 2015, C-SKI is located in the Electronic Textual Cultures Lab in the Digital Scholarship Centre at UVic.

Geofeedia ‘allowed police to track protesters’

The BBC has a story, US start-up Geofeedia ‘allowed police to track protesters’. Geofeedia is apparently using social media data from Twitter, Facebook and Instagram to monitor activists and protesters for law enforcement. The company's access to these platforms was changed once the ACLU reported on the surveillance product. The ACLU discovered the agreements with Geofeedia when they requested public records from California law enforcement agencies. Geofeedia was boasting to law enforcement about their access. The ACLU has released some of the documents of interest including a PDF of a Geofeedia Product Update email discussing “sentiment” analytics (May 18, 2016).

From the Geofeedia web site I was surprised to see that they are offering solutions for education too.

Godwin’s Bot: Recent stories on AI

Godwin’s Bot is a good essay from Misha Lepetic on 3QuarksDaily on artificial intelligence (AI). The essay reflects on the recent Microsoft debacle with @TayandYou, an AI chat bot that was “targeted at 18 to 24 year old in the US.” (About Tay & Privacy) For a New Yorker story on how Microsoft shut it down after Twitter trolls trained it to be offensive see I’ve Seen the Greatest A.I. Minds of My Generation Destroyed By Twitter. Lepetic calls Tay Godwin’s Bot after Godwin’s Law, which asserts that as an online conversation goes on, a comparison to Hitler becomes more and more likely.

What is interesting about the essay is that it then moves to an interview with Stephen Wolfram on AI & The Future of Civilization where Wolfram distinguishes between inventing a goal, which is difficult to automate, and (once one can articulate a goal clearly) executing it, which can be automated.

How do we figure out goals for ourselves? How are goals defined? They tend to be defined for a given human by their own personal history, their cultural environment, the history of our civilization. Goals are something that are uniquely human.

Lepetic then asks if Tay had a goal or who had goals for Tay. Microsoft had a goal, and that had to do with “learning” from and about a demographic that uses social media. Lepetic sees it as a “vacuum cleaner for data.” In many ways the trolls did us a favor by misleading it.

Or … TayandYou was troll-bait to train a troll filter.

My question is whether anyone has done a good analysis of how the Tay campaign actually worked.

Building Research Capacity Across the Humanities

On Monday I gave a talk at the German Institute for International Educational Research (DIPF) on:

Building Research Capacity Across the Humanities and Social Sciences: Social Innovation, Community Engagement and Citizen Science

The talk began with the sorry state of public support for the humanities. We frequently read how students shouldn’t major in the humanities because there are no jobs and we worry about dropping enrolments. The social contract between our publics (whose taxes pay for public universities) and the humanities seems broken or forgotten. We need to imagine how to re-engage the local and international communities interested in what we do. To that end I proposed three things:

  • We need to know ourselves better so we can better present our work to the community. It is difficult in a university like the University of Alberta to know what research and teaching is happening in the social sciences and humanities. We are spread out over 10 different faculties and don’t maintain any sort of shared research presence.
  • We need to learn to listen to the research needs of the local community and to collaborate with the community researchers who are working on these problems. How many people in the university know what the mayor’s priorities are? Who bothers to connect the research needs of the local community to the incredible capacity of our university? How do we collaborate with and support the applied researchers who typically do the work identified by major stakeholders like the city? Institutes like the Kule Institute can help document the research agendas of major community stakeholders and then connect university and community researchers to address them.
  • We need to learn to connect through the internet to communities of interest. Everything we study is of interest to amateurs if we bother to involve them. Crowdsourcing or “citizen science” techniques can bring amateurs into research in a way that engages them and enriches our projects.

In all three of these areas I described projects that are trying to better connect humanities research with our publics. In particular I showed various crowdsourcing projects in the humanities ending with the work we are now doing through the Text Mining the Novel project to imagine ways to crowdsource the tagging of social networks in literature.

One point that resonated with the audience at DIPF was around the types of relationships we need to develop with our publics. I argued that we have to learn to co-create research projects rather than “trickle down” results. We need to develop questions, methods and answers together with community researchers rather than think that we do the “real” research and then trickle results down to the community. This means learning new and humble ways of doing research.

#GamerGate on Hashtagify.me

hashtags data by hashtagify.me

Hashtagify.me is a neat site that tracks hashtags on Twitter. For example, here is what they have on #GamerGate. They show the other hashtags that your hashtag connects to (like #NotYourShield) and you can get a trend line.
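I don’t know how Hashtagify.me actually computes its connections, but a simple way to approximate “related hashtags” is to count which other hashtags appear in the same tweets as the one you care about. Here is a minimal Python sketch under that assumption; the function names and the sample tweets are made up for illustration and have nothing to do with their service.

```python
import re
from collections import Counter

def hashtags(tweet):
    """Return the set of hashtags in a tweet, lowercased so case variants match."""
    return {tag.lower() for tag in re.findall(r"#\w+", tweet)}

def related_hashtags(tweets, target):
    """Count the hashtags that co-occur with `target` in the same tweet.

    This is an assumed co-occurrence measure, not Hashtagify.me's method.
    """
    target = target.lower()
    counts = Counter()
    for tweet in tweets:
        tags = hashtags(tweet)
        if target in tags:
            # Count every other hashtag once per tweet.
            counts.update(tags - {target})
    return counts

# Made-up sample tweets, for illustration only.
sample = [
    "Reading about #GamerGate and #NotYourShield",
    "More on #gamergate #NotYourShield #ethics",
    "Unrelated post about #cats",
]
related = related_hashtags(sample, "#GamerGate")
```

Sorting `related` by count would give a ranked list of connected hashtags for a given collection of tweets.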

hashtags data by hashtagify.me

The trend makes it look like #GamerGate is going down, but I don’t trust their projection.

All of this is free. They also have a Pro account, but I haven’t tried that.

Thanks to Brett for this.

The Cult of Sharing

Mike Bulajewski has written an excellent critique, The Cult of Sharing. He describes the way ideas of community and sharing are being exploited by a new type of cult-like company, such as Airbnb and Uber. Under the guise of sharing and building community these companies are bypassing employment and labor legislation. What’s worse is that they are painting basic labor rights as the outdated way of doing things.

That’s because they’ve adopted a kind of cultural critique of capitalism. For them, the problem with capitalism is not the system itself, but rather depraved contemporary Western culture, which is greedy, individualistic, selfish and acquisitive, and rewards greedy, corrupt, ill-intentioned individuals. The opponents of the so-called culture of greed see the behavior of Black Friday shoppers and Wall Street bankers as equal manifestations of the same general phenomenon, and perhaps believing that we get the leaders we deserve, conclude that the public’s moral flaws makes them in some way responsible for the greed of Wall Street.

The sharing economy is clearly not the kind of economy where wealth and prosperity is shared between rich and poor. On the contrary, it worsens income inequality and concentrates wealth in the hands of those who need it the least. Progressive advocates are well aware of this, but they also see an upside: these startups teach their workers moral lessons about sharing, community, giving and service with a smile.

I’m not sure this is going to be the problem Bulajewski thinks it will be, but he has me worried. I hope that the shine of sharing will wear off and consumers/sharers will begin to treat this as any other industry. I also think the media will soon start reporting the downside of staying on someone’s couch or getting a ride with someone who isn’t licensed. It’s like the internet, which we all thought was a nice sharing community, until it wasn’t.

Blended Learning Award for GWrit

The Game of Writing (GWrit) project that I am part of just got support through a University of Alberta Blended Learning Award. See the 2014 Selected Courses. This award is going towards creating a flipped version of Writing 101, a service course that is being scaled up to support large sections by Roger Graves and Heather Graves. With the Blended Learning Award support from the Centre for Teaching and Learning and with Faculty of Arts funding we are redeveloping GWrit to be used in large sections of Writing 101. Here is part of the abstract of the proposal:

Research suggests that by creating a rich online environment for students to connect and interact with instructors and peers they can improve as writers. We are currently building a gamified online writing environment, The Game of Writing (GWrit), for Writing Studies 101 (WRS 101) that can support student writers and alumni. WRS 101 is a high demand service course required for many degree programs across the University. We are creating a large class version that blends face-to-face with gamification strategies. In GWrit students will choose and work on assignments or quests that are part of the course. Their progress on these assignments or quests will be shared with peers and instructional staff; in this way all students can see who is working on the same quests, and they can ask for help or advice from them. Informal assessment will be available online from peers in the class; from paid peer tutors; from GTAs; and from alumni. This represents a significant expansion of the informal assessment available in traditional face-to-face courses, where peers and sometimes the instructor give informal feedback. We also intend to invite alumni to post assignments/quests that come from a workplace writing context. Students who complete WRS 101 will continue to have access to GWrit throughout their undergraduate careers and as alumni.

GWrit started as a prototype developed with support from GRAND. The original idea was an open writing environment where folks could challenge each other to compete at writing and where you could get analytics on your writing (number of words written, tasks completed, and visualizations like word clouds). This research prototype is now being completely redeveloped by the Arts Resource Centre as a learning tool that can be used by students in our courses. We are adding commenting features so that students (and later alumni) can provide writing guidance in a structured fashion.
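To give a sense of the kind of writing analytics described above, here is a minimal Python sketch that counts the words in a draft and tallies the most frequent content words, which is the sort of data a word cloud is drawn from. This is not GWrit’s code; the function names and the tiny stopword list are assumptions made purely for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def writing_stats(text, top_n=5):
    """Return the total word count and the most frequent non-stopword words."""
    words = tokenize(text)
    content_words = [w for w in words if w not in STOPWORDS]
    frequencies = Counter(content_words)
    return {
        "word_count": len(words),
        "top_words": frequencies.most_common(top_n),
    }

# Made-up draft text, for illustration only.
stats = writing_stats("The cat and the hat. The cat sat.", top_n=1)
```

The `top_words` pairs could be handed to any word cloud or charting tool, with the counts setting the size of each word.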