The size of the World Wide Web

[Screen grab: graph from WorldWideWebSize.com of the estimated size of the web]

Reading a paper by Lev Manovich, I came across a reference to the web site WorldWideWebSize.com, which graphs the size of the World Wide Web. The web site searches Google and Bing daily for different words from a corpus and then uses the total results to estimate the size of the web. The site explains the method this way:

When you know, for example, that the word ‘the’ is present in 67,61% of all documents within the corpus, you can extrapolate the total size of the engine’s index by the document count it reports for ‘the’. If Google says that it found ‘the’ in 14.100.000.000 webpages, an estimated size of the Google’s total index would be 23.633.010.000.
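The extrapolation described above can be sketched in a few lines of Python. This is only a minimal sketch of the single-word case, assuming the estimate is the reported hit count divided by the word's document frequency in the corpus; the function name `estimate_index_size` is made up for illustration, and the site's actual calculation may differ.

```python
def estimate_index_size(reported_hits: int, corpus_fraction: float) -> int:
    """Extrapolate a total index size from one word's reported hit count.

    Assumption: the word occurs in the same share of indexed pages as it
    does in the reference corpus, so total = hits / share.
    """
    if not 0 < corpus_fraction <= 1:
        raise ValueError("corpus_fraction must be in the range (0, 1]")
    return round(reported_hits / corpus_fraction)


# Figures from the example above: 'the' is in 67.61% of corpus documents
# and the engine reports 14,100,000,000 pages containing it.
estimate = estimate_index_size(14_100_000_000, 0.6761)
print(f"{estimate:,}")
```

Under that assumption the example works out to roughly 20.85 billion pages, which is not the figure quoted above, so the site presumably does something more than this simple division.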

In the screen grab above you can see that the estimated size can change dramatically over time. It is hard to tell why.

Ted Hewitt speaks at University of Alberta

Ted Hewitt spoke today on “The Perils and Prospects of Digital Scholarship in the 21st Century Canada: Tri-Agency Research Data Initiative” at our Research Data Management week. Some of the things he talked about follow.

Canada is not leading on data stewardship. We need to catch up so that we can take advantage of what the world has to offer and we need to offer what Canada has to the world. Data management capacity is increasingly linked to Canada’s international competitiveness.

We used to do a literature review when starting a project. Now we also look for data sets that we can use so we aren’t re-searching to create useful data.


Around the World Conference


Last week we held our third Around the World Conference on the subject of “Big Data”. We had some fabulous panels from countries including Ireland, Canada, Israel, Nigeria, Japan, China, Australia, USA, Belgium, Italy, and Brazil.

The Around the World Conference streams speakers and panels from around the world out to everyone on the net. We also edit and archive the video clips. This model allows for a sustainable conversation across continents that doesn’t involve flying people around. It allows a lot of people who wouldn’t usually be included to speak. We do find there are technical hiccups, but that happens in on-site conferences too.

Editorialisation Et Nouvelles Formes De Publication

In the last couple of weeks I’ve been at two interesting conferences and took research notes.

  1. I gave a keynote on “Big Data and the Humanities” at the Northwestern Research Computation Day (link to my research notes). I gave a lot of examples of projects and visualizations.
  2. At the Éditorialisation Et Nouvelles Formes De Publication (link to my research notes) conference I spoke about “Publishing Tools: A Theatre of Machines”. I showed how text analysis machines have evolved.

From Airline Reservations to Sonic the Hedgehog

An important book for anyone doing the history of computing is From Airline Reservations to Sonic the Hedgehog by Martin Campbell-Kelly. This book more or less invents the field of software history by outlining the important phases, sectors and sources. Other histories have focused on individual companies, heroes, or periods; Campbell-Kelly tries to survey the history (at least up to 1995) and define what needs to be considered and what we don’t know. In particular he tries to correct the consumer view that the history of software is about Microsoft. To that end he spends a lot of time on mainframe software and on services like IBM CICS (Customer Information Control System) that allow ATMs and other systems to reliably communicate transactions.

Martin Campbell-Kelly in the first chapter outlines three phases to the history of software that also correspond to sectors of the industry:

  1. From the mid-1950s, Software Contracting
  2. From the mid-1960s, Corporate Software Products
  3. From the late 1970s, Packaged Mass-Market Software Products

You can read an interesting exchange about the book here that reviews the book, criticizes it and gives Campbell-Kelly a chance to respond.

Bibliographic reference: Campbell-Kelly, M. (2003). From Airline Reservations to Sonic the Hedgehog: a History of the Software Industry. Cambridge, MA, MIT Press.

Gone Home: A Story Exploration Video Game

Just finished a gem of a game called Gone Home: A Story Exploration Video Game. The game is simple. You are the older daughter returned to an empty home after a year in Europe. You wander around the house finding notes and other clues as to where your family is. In the process you uncover the stories of your parents, your sister and a dead uncle. The ending had me in tears – proof for me that a game can evoke emotions.

The empty and mysterious mood reminds me of other games that use that mood like Dear Esther and even Myst.

TSA’s Secret Behavior Checklist to Spot Terrorists

The Intercept has published the TSA’s behaviour checklist for spotting terrorists as part of two stories. See Exclusive: TSA’s Secret Behavior Checklist to Spot Terrorists. The checklist is part of a SPOT (Screening of Passengers by Observation Techniques) Referral Report that is filled out when someone is “spotted” by the TSA. The report lists all sorts of behaviours like “Arrives late for flight …”. The idea is that behaviours are assigned points, and if someone gets more than a certain number of points they are referred to a Law Enforcement Officer (LEO). The second story from The Intercept is Exclusive: TSA ‘Behavior Detection’ Program Targeting Undocumented Immigrants, Not Terrorists.

Nintendo asking for ad revenue for gaming on YouTube

CBC and others are reporting on a new Nintendo Creators Program where Nintendo will take a percentage of the ad revenue associated with a YouTube channel or video with playthroughs (Let’s Play) of their games. See YouTube gaming stars blindsided by Nintendo’s ad revenue grab or Nintendo’s New Deal with Youtubers Is A Jungle Of Rights.

The Nintendo Creators Program presents this in their Guide as an opportunity for YouTubers to make money off Nintendo’s copyrighted materials,

In the past, advertising proceeds that could be received for videos that included Nintendo-copyrighted content (such as gameplay videos) went to Nintendo, according to YouTube rules. Now, through this service, Nintendo will send you a share of these advertising proceeds for any YouTube videos or channels containing Nintendo-copyrighted content that you register.

This program is only for “copyrighted content related to game titles specified by Nintendo”. This is probably because Nintendo has to be careful to not be seen as making money off playthroughs of other publishers’ games.

This new policy/program raises interesting issues around:

  • Fair use. Is a screen shot or a whole series of them that make up a playthrough covered by “fair use”? My read is that the publishers think not.
  • Publicity from Playthroughs. YouTubers like PewDiePie who post Let’s Play videos (and make money off their popular channels) argue that these videos provide free exposure and publicity.
  • New Economic Models for Gaming. Is Nintendo exploring new economic models tied to their copyright? Nintendo has been suffering, so it makes sense that they would try to find ways to monetize their significant portfolio of popular game franchises and characters.

Mina S. Rees and Early Computers

Reading Thomas P. Hughes’ book Rescuing Prometheus, I came across a reference to Dr Mina S. Rees who, in different senior roles at the Office of Naval Research in the late 1940s and early 1950s, played a role in promoting early computing research. This led me to her 1950 Science article The Federal Computing Machine Program (December 1950, Vol. 112, No. 2921, pp. 731-736), a terrific survey of the state of computing at the time that is a pleasure to read and nicely captures the balance/promise of analogue and electronic machines. I was particularly struck by the wry humour of the overview. For example, in the opening she talks about what she will not talk about in her overview, and jokes that,

For an adequate discourse on the military applications of automatically sequenced electronic computers, I direct you to recent Steve Canyon comic strips in which a wonderful electronic brain that could see and shoot down planes at great distances was saved from the totalitarian forces of evil. (p. 731)

The Steve Canyon comic in question is a “Mechanical Brain” story her audience would have recognized. (See this review of Milton Caniff’s Steve Canyon 1950 compilation.) Interestingly (perhaps because she had read Jay Forrester’s reports about air defense), Whirlwind, one of the computers she mentions, went on to be developed into the SAGE system, which was designed to semi-automatically “see and shoot down planes at great distances”.

Rees’ humour, humility and prescience can also be seen in her concession that visual displays and interfaces are important to certain problems,

As one who has suspected from the beginning that all oscilloscope displays were manipulated by a little man standing in hiding near by, I am happy now to concede that in several of the problems we are now attacking the introduction of visual display equipment has substantial merit. (p. 732)

She recognized the value of a “broad point of view” that looked at computing as more than efficient number crunching. This article reminds us of how computing was understood differently in the 1940s and 1950s and thereby helps us reacquire a broad point of view on computing with some humour.

For a memorial biography of Dr Rees, see here (PDF).