Dataclysm

Saturday, December 19th, 2015 | Books

Christian Rudder is a founder and head of data trends at the dating site OkCupid. For years he ran the blog OkTrends, which looked at what data you could mine from the site. This book continues that work and brings in other data sets, mostly to talk about human sexuality.

The full title is Dataclysm: Who We Are (When We Think No One’s Looking).

The anonymised data of OkCupid in aggregate provide some surprising facts, and some expected ones. Take gender differences, for example. Women rate men of a similar age to themselves as the most attractive. Up until 30, women rate men a year or two older than them as the most attractive; after 30, they find men a year or two younger than them most attractive. A drop-off starts at 40. That is a good innings though. Compare this to the way men rate women. They rate 21-year-olds the most attractive and it goes downhill from there.

He looks at the use of English on Twitter. Many people suppose the internet is degrading the quality of language used. Not so. The average length of a word used on Twitter is actually longer than in professional publications, both today and historically. It turns out that when you limit people to 140 characters, they write concisely using a wide lexicon.

He quotes Steve Jobs: “people don’t know what they want until you show it to them”. This always reminds me of the Henry Ford quote “if I had asked my customers what they wanted, they would have said a faster horse”. Whether Ford actually said that is unknown, but it makes a good point. When asking for feedback, you really need to find out what people think the problem is, rather than asking them what they think the solution is. In Ford’s case the problem was finding a faster way to get from A to B, and in Jobs’s case an easier way to play and listen to music.

Back on OkCupid, it turns out that everyone is a racist. Rudder breaks the data down into how four groups (white, Asian, Latino and black) rate each other’s photos. It turns out that people generally rate their own race as the most attractive, but the real drop-off is for black women, whom every other group consistently rates lower. This varies geographically, however. There is a big gap in the US, for example, but almost no gap in the UK.

He also looks at the differences between the heterosexual and LGBT communities. Is sexuality a spectrum, for example? Only 19% of self-identifying bisexuals regularly message both males and females. This could imply a number of things. It could be that there is a spectrum and many bisexuals fall at either end of it. It could also be that some gay people identify as bisexual for cultural or social reasons, especially given that it correlates with their state’s tolerance of homosexuality. The answer is probably a number of different factors.

Rudder also mentions that Justine Sacco, the woman who made the “hope I don’t get aids” tweet, worked for OkCupid’s parent company. Sacco was discussed in Jon Ronson’s book So You’ve Been Publicly Shamed. The hashtag #HasJustineLandedYet is a classic example of how quickly things can travel around the world these days.

In summary, it’s not too clear what Dataclysm was actually about. It seemed to be mostly “here is some interesting data about people”. In that respect, it was genuinely interesting. It also had a lot of crossover with A Billion Wicked Thoughts in using anonymous internet data, a source that has only existed for the last few decades, to reveal fascinating insights into human thoughts and behaviour.