Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails

According to William Binney, a whistleblower from the U.S. National Security Agency (NSA), the NSA probably has most of our email. See the video Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails. The question, then, is what they are doing with it. He mentions that they can take the email and “put it into forms of graphing, which is building relationships or social networks for everybody, and then you watch it over time, you can build up knowledge about everyone in the country” (see the transcript on the page). In other words, they could be (and perhaps are) building a large social graph that they can use in various ways.

In the transcript of the longer video Binney talks about various programs developed to filter out all the information:

Well, it was called Thin Thread. I mean, Thin Thread was our—a test program that we set up to do that. By the way, I viewed it as we never had enough data, OK? We never got enough. It was never enough for us to work at, because I looked at velocity, variety and volume as all positive things. Volume meant you got more about your target. Velocity meant you got it faster. Variety meant you got more aspects. These were all positive things. All we had to do was to devise a way to use and utilize all of those inputs and be able to make sense of them, which is what we did.

Binney goes on to talk about the program code-named Stellar Wind, which Bush authorized and was then forced to change after a revolt of some sort in the Justice Department in 2004. Stories tell of senior Bush advisors trying to get Ashcroft to sign authorization papers for the program while he was in the hospital. As for Stellar Wind, it seems to be mostly about metadata (the date, to, and from fields of emails), which you could use to build a diachronic social graph of the kind Binney was talking about. Strictly speaking, this would be social network analysis rather than text analysis, but they might have supplemented the system with some keyword capabilities. Another story, from Time, points out the problem with such analysis: it generates too many vague false positives. “Leads from the Stellar Wind program were so vague and voluminous that field agents called them ‘Pizza Hut cases’ — ostensibly suspicious calls that turned out to be takeout food orders.”
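To make the idea of a diachronic social graph concrete, here is a minimal Python sketch. It is only my illustration of the general technique, not a description of how Stellar Wind or any NSA system works. The sample records, the names in them, and the function graphs_by_month are all hypothetical; the only assumption taken from the discussion above is that each message contributes just a date, a sender, and a recipient.

```python
from collections import defaultdict
from datetime import date

# Hypothetical metadata records: (date sent, from, to). No message bodies.
messages = [
    (date(2004, 1, 5), "alice", "bob"),
    (date(2004, 1, 9), "bob", "carol"),
    (date(2004, 2, 2), "alice", "bob"),
    (date(2004, 2, 3), "alice", "carol"),
]

def graphs_by_month(records):
    """Build one weighted, directed graph per (year, month).

    Each graph maps a (sender, recipient) edge to the number of
    messages sent along that edge during the period.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for sent, sender, recipient in records:
        graphs[(sent.year, sent.month)][(sender, recipient)] += 1
    return graphs

graphs = graphs_by_month(messages)

# Walking the periods in order shows how the network changes over time.
for period in sorted(graphs):
    print(period, dict(graphs[period]))
```

Comparing the graph for one period with the next is what makes the analysis diachronic: here a new edge from alice to carol appears in the second month.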

Either way, these hints give us a tantalizing view into how text and network analysis are being experimented with. Are there any useful research applications?

Leximancer

Susan pointed me to Leximancer, a commercial text analysis tool that creates mind maps of your information. I’m struck by how compelling people find mind maps.

Leximancer enables you to navigate the complexity of text in a uniquely automated fashion. Our software identifies ‘Concepts’ within the text – not merely keywords but focused clusters of related, defining terms as conceptualised by the Author. Not according to a predefined dictionary or thesaurus.

The Concepts are presented in a compelling, interactive display so that you can clearly visualise and interrogate their inter-connectedness and co-occurrence – which is as important as the Concepts themselves – right down to the original text that spawned them.

 

@MentionMachine: Who’s up, who’s down on Twitter?

Reading the Washington Post, I was annoyed by a panel at the bottom of my screen for their @MentionMachine, which follows the presidential candidates under the banner Who’s up, who’s down on Twitter? It tracks Twitter mentions using the Twitter API and media mentions using Trove. This is real-time social media text analysis. The Washington Post blog page on @MentionMachine argues that “Twitter was the real-time warning system” that could tell us which candidates were trending up or down. I wonder if that is reliably true or only true in selective cases.

Stephen Wolfram Blog : The Personal Analytics of My Life

Thanks to Bethany on Twitter, I came across this great post by Stephen Wolfram on The Personal Analytics of My Life. Wolfram is not the first person to use computers to track his activities and then understand himself. Microsoft Research has a project, MyLifeBits, that is “an attempt to fulfill Vannevar Bush’s vision of an automated store of the documents, pictures (including those taken automatically), and sounds an individual has experienced in his lifetime, to be accessed with speed and ease.” The project is following and digitizing the life of Gordon Bell, and they have released a book, Your Life, Uploaded. We could even go back to the ancient Greek aphorism “Know thyself” that motivated Socrates and which, in its Latin form (temet nosce), shows up over the door of the Oracle in the Matrix.


Voyant at Georgia Tech

Today I Skyped into a class by Lauren Klein on Digital Humanities at Georgia Tech. The students all had to use Voyant for an assignment and they had a great set of questions to ask me. See Questions for Professor Rockwell.

Klein also had her students post short essays on using Voyant on Sherlock Holmes under the category Sherlock Holmes Text Analysis. You can see the range of reactions from frustration with the tool, to “so what”, to students who find the “surfing and stumbling” creative. I’m impressed at how Professor Klein has put together a reasonable exercise in text analysis for undergrads.

In the spirit of Voyant, here is a word cloud of the student assignments on the course blog:


WordSeer

Stéfan pointed me to Berkeley’s WordSeer, a text analysis tool “that includes visualizations and works on the grammatical structure of text.” You can watch the video with Aditi Muralidharan talking about the project. She sees the problem with traditional search as the way keyword-based reading models texts as a bag of words; what it can’t do is model text as sentences. In other words, she wants to leverage natural language processing to enhance search so you can see how “God” is described or what she/he has done. There are also some visualization tools, like a heat map and a word tree.
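A tiny Python sketch shows what is lost in a bag-of-words model. This is my own toy illustration, not WordSeer’s code; the two example sentences and the helper bag_of_words are made up for the purpose.

```python
from collections import Counter

def bag_of_words(sentence):
    """Reduce a sentence to word counts, discarding word order."""
    return Counter(sentence.lower().split())

a = "God created man"
b = "Man created God"

# The two sentences contain exactly the same words...
same_bag = bag_of_words(a) == bag_of_words(b)

# ...but who did what to whom depends on the order.
same_order = a.lower().split() == b.lower().split()

print(same_bag, same_order)  # prints: True False
```

A keyword search sees the two sentences as identical; telling them apart takes some model of sentence structure, which is where grammatical analysis comes in.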

There is a nice YouTube video demoing how to use WordSeer to explore “beautiful” in Shakespeare.

Antconc – Concordance tool on PC/Mac

Screen shot of Antconc

Thanks to John, I learned about a gem of a concordance tool for the Mac, PC and Linux called Antconc. It runs on your computer and you can download the tool from the author’s site, Laurence Anthony’s Software. If it is stable, it could be a great tool to introduce students to text analysis. Judging from the screenshots, it has some nice features for finding n-grams and can handle a set of texts.
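For students who want to see what a concordancer does under the hood, here is a rough Python sketch of a keyword-in-context (KWIC) listing and an n-gram count. It is a simplified illustration of the two techniques, not how Antconc itself is implemented; the functions tokenize, kwic, and ngrams and the sample sentence are my own.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def ngrams(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

text = "To be, or not to be, that is the question."
tokens = tokenize(text)

# Print each occurrence of "be" with two words of context on either side.
for left, word, right in kwic(tokens, "be", width=2):
    print(f"{left:>10} [{word}] {right}")

# The most frequent bigram in the sample sentence.
print(ngrams(tokens, 2).most_common(1))
```

To handle a set of texts, you could tokenize each file separately and merge the resulting counters.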

Analysis of 250,000 hacker conversations

 

From Slashdot, a story about the text analysis of 250,000 hacker conversations. A security company, Imperva, has been analyzing hacker forums to understand trends, how people learn about hacking, and which strategies are popular.

In the Imperva report, Hacker Intelligence Initiative, Monthly Trends Report #5 (PDF), they describe their methodology as “content analysis” (their quotation marks), but it mostly involves searching for threads and reading them. The report has great examples of the types of discussions.

This is a good example of how simple text analysis can aid understanding in industry.