Rumble on Youtube: Psy vs Justin Bieber

On Saturday, November 24th, 2012, history was written: the most viewed Youtube video up to then, Canadian Justin Bieber’s Baby with 805.914.820 views, was surpassed by South Korean Psy’s Gangnam Style with 833.499.683 views.
It’s interesting to look at the viewer statistics of both videos and at the way Youtube presents them. First, consider the shape of Bieber’s viewer stats over time. In the early stages after the video was released we see a steep incline, which then levels off to a horizontal line. Psy’s graph, by contrast, shows that the number of viewers is still increasing. There are no signs yet that a maximum has been reached.
At first, after the new record was set, I expected a competition between Bieber fans and Psy fans over the new target: views!!! Then again, there is no visible sign of Bieber’s curve rising again. This means that Bieber’s video increasingly lags behind Psy’s, probably to the extent that it will never be able to overtake Psy. If Psy’s video reaches more than a billion views, it’ll be the record holder for a long time. Or will it? Bieber’s record lasted less than three years, while Psy broke that record in record time. It took Psy only 134 days (an average of 6.220.147 views per day). Compare this to a measly average of 804.306 daily views over 1002 days for Bieber’s Baby. So now we wait for the next video to break Psy’s record. It’ll come within the next five years. Mark my words!!! 🙂
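The daily averages above are simply total views divided by the number of days online. As a quick check, here is a small Python sketch that uses only the view and day counts quoted in this post (the function name is my own, chosen for illustration):

```python
def average_daily_views(total_views: int, days: int) -> int:
    """Average views per day, rounded to the nearest whole view."""
    return round(total_views / days)


# Figures as quoted in this post: total views and number of days online.
psy = average_daily_views(833_499_683, 134)       # Gangnam Style
bieber = average_daily_views(805_914_820, 1002)   # Baby

print(f"Psy:    {psy:,} views/day")     # Psy:    6,220,147 views/day
print(f"Bieber: {bieber:,} views/day")  # Bieber: 804,306 views/day
print(f"Ratio:  {psy / bieber:.1f}x")   # Ratio:  7.7x
```

Rounded to whole views this reproduces the 6.220.147 and 804.306 daily views mentioned above, which puts Psy’s average pace at roughly 7.7 times Bieber’s.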

Interestingly, Youtube doesn’t seem to be that interested in Justin Bieber’s video. First of all, Youtube is very slow in updating Bieber’s viewer stats: whereas Psy’s stats are updated daily, Bieber’s stats lag behind by about a week on average. Also, Bieber’s vertical axis needs to be updated, because the video has clearly surpassed the 800.000.000 mark.

Psy’s Youtube stats

Justin Bieber’s Youtube stats

A further notable difference is the steep climb in the number of viewers of the Bieber video right after its release. Compare this to the slow start of Psy’s video. A possible explanation is the date these videos were posted: Bieber’s video was posted in February, one of the coldest months in the northern hemisphere, and Psy’s video in mid-July, the hottest period in the northern hemisphere. That got me thinking that in the coldest months people stay indoors and have Youtube readily available, whereas in the summer people are often outdoors or on vacation, which limits their Youtube access. Cautionary note: this analysis is based on visual inspection of the graphs; it’d be better to use the actual longitudinal data. On July 28, 2012, Robbie Williams linked to Psy’s video on his website, probably aiding its quick dissemination through the Web, particularly the English-speaking parts of it.

Justin Bieber’s Youtube interaction stats

Psy’s Youtube interaction stats

Below, the interaction stats of Bieber and Psy are compared directly. Again, Psy has the most views, but Bieber has the most reactions. Psy has the most “Thumbs Up”, whereas Bieber has the most “Thumbs Down”.

Viewer stats compared

As for the ratios between the different stats, we see that the audience of Bieber’s video is more responsive than Psy’s audience. The rates for “Thumbs Up” and “Thumbs Down” show the same pattern as the absolute numbers, because the numbers of views for Psy and Bieber are at a similar level. Still, the small fraction of viewers reacting to these videos, reaching only 1.1 percent, shows that social media are not always that interactive. This percentage is probably somewhat understated: it would be higher if multiple views by the same person were taken into account, because there are fewer unique viewers than views. At the same time, a single person can post multiple responses to the video, which pushes the percentage up. This shows that using off-the-rack stats comes with limitations.
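To make the two opposing biases concrete, here is a minimal Python sketch of how such rates are computed. The `VideoStats` class, the `rates` function and the `views_per_viewer` parameter are hypothetical names of my own, and the example counts are invented round numbers, not the actual Psy or Bieber figures:

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    views: int
    likes: int      # "Thumbs Up"
    dislikes: int   # "Thumbs Down"
    comments: int   # responses posted under the video


def rates(stats: VideoStats, views_per_viewer: float = 1.0) -> dict[str, float]:
    """Each interaction count as a percentage of the estimated unique viewers.

    views_per_viewer is a hypothetical assumption: 1.0 treats every view as a
    different person (what off-the-rack stats implicitly do); larger values
    shrink the denominator and therefore raise every rate.
    """
    viewers = stats.views / views_per_viewer
    return {
        "likes": 100 * stats.likes / viewers,
        "dislikes": 100 * stats.dislikes / viewers,
        "comments": 100 * stats.comments / viewers,
    }


# Invented round numbers, for illustration only.
example = VideoStats(views=1_000_000, likes=8_000, dislikes=1_000, comments=2_000)
print(rates(example))                        # every view counted as a distinct viewer
print(rates(example, views_per_viewer=2.0))  # assume each viewer watched twice on average
```

Doubling `views_per_viewer` doubles every rate, which is the understatement described above. The opposite bias, one person posting several responses, cannot be corrected this way: it would need per-user data that the off-the-rack stats don’t provide.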
Rates between stats compared

Finishing this blog post on the 12th of December and checking the latest numbers on the Psy video, I wouldn’t be surprised if it reaches a billion views before the end of the year. Only some 67 million views to go!
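A back-of-the-envelope check of that expectation, sketched in Python under one loud assumption: that the video keeps gaining views at its average pace over the first 134 days. The pace in December may well be lower or higher, and “some 67 million” is itself a round figure:

```python
import math
from datetime import date, timedelta

remaining_views = 67_000_000  # "some 67 million views to go" on 12 December 2012
avg_daily_views = 6_220_147   # Psy's average over the first 134 days

# Assumption: views keep coming in at that historical average pace.
days_needed = math.ceil(remaining_views / avg_daily_views)
eta = date(2012, 12, 12) + timedelta(days=days_needed)
print(days_needed, eta.isoformat())  # 11 2012-12-23
```

At that pace the billion would be reached well before the end of the year.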

OK, to make this blog post complete, here are both videos:

To free data or not to …

In some countries downloaders and uploaders are regarded as criminals: in the US you can expect litigation by record companies and artists when you share your music. In France your Internet connection can be cut off when you’ve been downloading music and/or videos. This shows that the Internet isn’t a safe haven for people who want to share information freely.

This is the complete opposite of what Tim Berners-Lee, the founding father of the World Wide Web, thinks it should be. Berners-Lee is an advocate of free data: organizations, in particular governments, should open up their databases online, creating a level playing field for all. This also allows members of the general public to contribute to datasets. Moreover, people with the necessary expertise can analyse these data and share the results in numerical or visual form. This can be risky because, just like numerical information, visualizations can be deceiving. The saying “there are lies, damn lies and statistics” should be “there are lies, damn lies, statistics and visualizations”.

A few examples of such initiatives are those of the US government, the UK government and the Guardian. The sharing of data is, or should be, common practice in scientific circles. In the Netherlands, DANS archives scientific data and grants access (though not to everyone). In academia, unfortunately, it doesn’t always pay to share your data, for an important reason. Increasingly, academics are told by university management to publish in ISI-ranked journals. That’s OK. But archiving data, which is important for secondary analyses and for enabling others to check your work, takes a lot of time and is not rewarded by management. The time it takes to prepare the data and the accompanying report for archiving could also be spent on new research articles and data collection. So, given the choice between time-consuming, unrewarded archiving and writing new manuscripts, academics will often choose the latter. Unfortunately, this can only change when universities reward archiving the same way as journal publications. I feel this is unlikely to happen in the foreseeable future.

Below are two videos where Tim Berners-Lee explains the idea and shows some examples.

Looking into the future of newspapers

Last week there was a lot of discussion about Ross Dawson, who predicted when the newspaper will die in several countries. Piet Bakker devoted a blog post to it, as did Trouw (there are probably many more sources, so let me know in the comments). It’s important to know when newspapers will become extinct, so that we know when to clear our agendas to attend the funeral, or at least by when we should have found a new job.

All kidding aside, we love predictions (especially weather forecasts) and we need them. However, predicting is one of the most difficult things to do. All predictions are uncertain, but some are more uncertain than others. Without additional data, predicting the end of the newspaper is like looking into a crystal ball, humming some unclear sounds while burning incense: rubbish in, rubbish out. Then again, if one actually had additional data, things would brighten up. What data, for instance? Well, first of all the hard data the publishers already have, including data to which we researchers have little unconditional access. For instance: (1) detailed subscription data (paid, non-paid, discounts), (2) financial data (costs, revenues), (3) marketing strategies, (4) changes in population composition, (5) competitors’ actions and developments, and (6) autonomous developments such as general economic growth and the level of unemployment, but also technological developments. Some other suggestions have been made in comments on Piet Bakker’s and Ross Dawson’s blogs, such as the writing off of printing presses and the reading culture of each country. All of these are more or less relevant and should be taken into account.

The predictions made by Piet Bakker (i.e. time series analysis) are, in my opinion, equally uncertain (see my posted comment), especially because he used very few observations (six years) to predict far into the future (40 years). The further ahead the prediction, the more uncertain it is. Not only that, any bias in the short-term estimate is likely to grow the further ahead one predicts, thereby biasing the final prediction. Also, I suspect there will be an accelerated decrease in subscriptions at the end, because newspaper publishers (or the banks) will see that the end is near and will pull the plug before they lose even more capital.
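How fast uncertainty grows with the horizon can be illustrated with the simplest possible extrapolation: a straight line fitted by ordinary least squares to six yearly observations. To be clear, this is a swapped-in technique, not Piet Bakker’s actual model, and the subscription numbers below are invented. The function name and parameters are my own:

```python
import math


def linear_trend_forecast(y: list[float], horizon: int, t_crit: float) -> tuple[float, float]:
    """Fit y = a + b*t by ordinary least squares (t = 0, 1, ..., n-1) and
    extrapolate `horizon` steps past the last observation.

    Returns (point forecast, half-width of the prediction interval), where
    t_crit is the Student t critical value for n - 2 degrees of freedom.
    """
    n = len(y)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    a = y_mean - b * t_mean
    # Residual standard error with n - 2 degrees of freedom.
    s = math.sqrt(sum((yi - (a + b * ti)) ** 2 for ti, yi in zip(t, y)) / (n - 2))
    t0 = (n - 1) + horizon
    # The (t0 - t_mean)**2 term makes the interval grow with the horizon.
    se = s * math.sqrt(1 + 1 / n + (t0 - t_mean) ** 2 / sxx)
    return a + b * t0, t_crit * se


# Hypothetical subscription counts (thousands) for six consecutive years,
# invented for illustration; NOT Piet Bakker's data.
subs = [1000, 985, 972, 955, 946, 930]
T_CRIT_95_4DF = 2.776  # two-sided 95% t value for 6 - 2 = 4 degrees of freedom

for h in (1, 10, 40):
    point, half_width = linear_trend_forecast(subs, h, T_CRIT_95_4DF)
    print(f"{h:>2} years ahead: {point:6.0f} +/- {half_width:5.0f}")
```

The interval widens with every year added to the horizon, and any error in the estimated slope `b` is multiplied by the number of years ahead, which is exactly how a small bias in a short-term estimate turns into a large bias 40 years out.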

There are alternatives to the moving averages approach, though. The first is to use multiple time series, where the dependent series is the number of subscriptions and the others (some of which are listed above) are the independent variables. Alternatively, a system dynamics model could be developed that takes feedback loops into account: every action may have a positive effect on the dependent variable (in this case the number of subscriptions), but that effect can in turn influence its causes in the future. Such complex problems are called messy (cf. Vennix, 1996) because of the large number of unknowns. There is also a more qualitative version called group model building, which is used when hard data do not exist. In this approach, experts from different backgrounds are seated around the table to map out the problem at hand. Then different scenarios are explored to foresee possible futures. Subsequently, the aim is to develop a business strategy that is robust across many possible futures. These approaches have been used by, for instance, Shell and the Dutch government. I’m not sure whether it’s too late for publishers to undertake such an exploration, but I hope they’ve learned that it’s best to prepare for the worst-case scenario while you still can. For the publishers, that would have been sometime in the eighties of the last century. Then again, as with quitting smoking, it’s never too late.
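To give an idea of what a feedback loop in a system dynamics model looks like, here is a toy stock-and-flow sketch in Python. It is not a calibrated model of any publisher: the single stock (subscribers), the price-setting rule and every parameter value are hypothetical assumptions of mine. The loop is: fewer subscribers means a higher price to cover fixed costs, which means more cancellations, which means even fewer subscribers:

```python
def simulate(years: int = 30, dt: float = 0.25) -> list[tuple[float, float]]:
    """Toy stock-and-flow model with one reinforcing feedback loop.

    Stock: subscribers S, integrated with Euler steps of size dt (years).
    All parameter values below are hypothetical, chosen for illustration.
    """
    s = 1_000_000.0             # initial subscribers
    base_churn = 0.03           # yearly cancellations unrelated to price
    inflow_rate = 0.01          # yearly new subscribers as a fraction of S
    price_sensitivity = 0.05    # extra yearly churn per unit of price increase
    fixed_costs = 1_000_000.0   # cost base; price starts at 1.0

    history = []
    t = 0.0
    while t <= years:
        # Feedback: the price is set so that revenue covers the fixed costs.
        price = fixed_costs / s
        churn = s * (base_churn + price_sensitivity * (price - 1.0))
        inflow = s * inflow_rate
        history.append((t, s))
        s = max(s + dt * (inflow - churn), 1.0)  # keep the stock positive
        t += dt
    return history


for t, s in simulate()[::20]:  # print every fifth year
    print(f"year {t:4.1f}: {s:10.0f} subscribers")
```

Because the price rises as the stock shrinks, the yearly loss of subscribers grows over time instead of staying constant: the kind of accelerating decline a single moving average cannot produce on its own.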

On a final yet important note, and this may be depressing for the publishers, the readers of physical newspapers may be a dying breed. It’s in large part a generational issue. People who grew up with a newspaper at home (i.e. their parents had one) were more likely to have a newspaper subscription themselves. That held until the Internet took off in the late nineties of the last century: youngsters who grew up in the age of the Internet are less likely to subscribe. A similar thing happened with television: young people who grew up with only public broadcasting continued watching public broadcasting, whereas youngsters who also grew up with commercial television didn’t. So it seems inevitable that newspapers as we know them will disappear. If they survive, it won’t be in their current form. In the technical sense, no medium has ever really disappeared; its function merely changed. This is likely to happen to the newspaper as well. What is more important is how journalism will evolve in the coming decades. But that’s an entirely different discussion.
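The generational argument can be made concrete with a toy cohort-replacement model in Python. Everything in it is a hypothetical assumption rather than an estimate: equal-sized birth cohorts, adulthood from 20 to 85, a lifelong subscription propensity fixed by birth cohort, and 1980 as the first birth year that “grew up with the Internet”:

```python
def subscription_share(year: int,
                       pre_internet_rate: float = 0.6,
                       internet_rate: float = 0.15,
                       cutoff_birth_year: int = 1980,
                       adult_age: int = 20,
                       max_age: int = 85) -> float:
    """Share of adults subscribing in a given calendar year.

    Hypothetical assumptions: equal-sized birth cohorts, adults aged
    adult_age..max_age, and a subscription propensity that is fixed for life
    by whether a cohort was born before cutoff_birth_year.
    """
    birth_years = range(year - max_age, year - adult_age + 1)
    rates = [pre_internet_rate if b < cutoff_birth_year else internet_rate
             for b in birth_years]
    return sum(rates) / len(rates)


for y in (2000, 2012, 2030, 2050, 2070):
    print(y, f"{subscription_share(y):.0%}")
```

Even though no individual changes behaviour in this sketch, the aggregate share keeps falling as older cohorts are replaced by younger ones, until only Internet-generation cohorts remain and the share settles at `internet_rate`.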