Emilie pointed me to an NPR story on mining mood in 20th-century books, Mining Books To Map Emotions Through A Century. The story draws on a very readable article in PLOS One, The Expression of Emotions in 20th Century Books. The article reports on a study of “mood” or sentiment over time in literature. They used the Google Ngram data. I like how they report their results first and then discuss methodology at the end.
They mention support from an interesting EU-funded project, TrendMiner, which is developing real-time multi-lingual analysis tools.
Update: John has sent me some thoughts on problems with their research. He sees three issues:
- Lag. There is a gap of time between writing and publishing that doesn’t seem to be accounted for (or perhaps it is in that publishers/editors are acting as cultural mood filters).
- They don’t seem to account for the relative volume of different types of work. They make a big deal about using such a wide variety of books, but without some sense of how this composition changes over time, certain claims (especially that we are writing about less emotional things) are questionable. If 2000 is full of automotive guides and 1900 is full of novels, then the claim seems to be much weaker.
- Distribution of particular works is also important. If many people choose to buy the same few books, the language in those books should also be telling, perhaps even more so.
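To make John’s second point concrete, here is a minimal Python sketch with entirely made-up numbers. It is not the method used in the article. The genre names, the per-genre rates, and the corpus shares are all hypothetical, chosen only to show how a shift in what gets published could lower the overall share of emotion words even if no genre changes at all.

```python
def aggregate_rate(mix, rates):
    """Corpus-wide emotion-word rate: per-genre rates weighted by each genre's share."""
    return sum(mix[genre] * rates[genre] for genre in mix)


# Hypothetical emotion words per 1,000 words, held constant across the century.
rates = {"novels": 40.0, "manuals": 5.0}

# Hypothetical share of the corpus (by word count) taken up by each genre.
mix_1900 = {"novels": 0.8, "manuals": 0.2}
mix_2000 = {"novels": 0.3, "manuals": 0.7}

rate_1900 = aggregate_rate(mix_1900, rates)
rate_2000 = aggregate_rate(mix_2000, rates)

print(f"1900: {rate_1900:.1f} emotion words per 1,000")
print(f"2000: {rate_2000:.1f} emotion words per 1,000")
```

With these invented figures the corpus-wide rate falls by more than half between 1900 and 2000, although novels and manuals are each exactly as emotional as before. Only the mix changed, which is why the composition of the corpus needs to be known before reading much into the trend.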