Activity Prediction in Social Media
Nearly two years ago I published, together with Vicenç Gómez and Vicente López, an article entitled Description and Prediction of Slashdot Activity, where (among other things) we proposed a model for predicting the number of comments on a news post on Slashdot. We found that although posts receive comments for approximately two weeks, an accurate estimate of the number of comments is already possible a few minutes after the publication of a news story.
A newer preprint takes up the same kind of problem and tries to predict the number of views of popular YouTube videos and the number of votes on promoted Digg stories (stories which appear on the front page). The authors compare our approach with two prediction algorithms based on linear extrapolations of the number of diggs or views, minimizing either the absolute (LN) or the relative (CS) squared error. See the resulting mean error curves (± standard deviation) in the following figure extracted from the preprint.
As one would expect, each of the two new approaches performs best for the specific error measure it optimizes, while our approach (GP), which seems to perform better for YouTube than for Digg, falls in between the two. Which one is then the better predictor? It seems that it depends on how you would like to measure the error, and on the dataset as well.
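To see why a fit always wins on the error measure it was optimized for, here is a minimal sketch in Python. It is not the preprint's actual LN or CS algorithm: I simply assume a one-parameter extrapolation, final ≈ a · early, and fit the scale factor a in two ways. The function names and the counts are made up for illustration.

```python
def fit_absolute(early, final):
    """Scale factor a that minimizes sum((a*early - final)**2)."""
    return sum(x * y for x, y in zip(early, final)) / sum(x * x for x in early)


def fit_relative(early, final):
    """Scale factor a that minimizes sum(((a*early - final) / final)**2)."""
    ratios = [x / y for x, y in zip(early, final)]
    return sum(ratios) / sum(r * r for r in ratios)


def mean_abs_sq_error(a, early, final):
    """Mean of the squared absolute errors of the extrapolation a*early."""
    return sum((a * x - y) ** 2 for x, y in zip(early, final)) / len(final)


def mean_rel_sq_error(a, early, final):
    """Mean of the squared relative errors of the extrapolation a*early."""
    return sum(((a * x - y) / y) ** 2 for x, y in zip(early, final)) / len(final)


# Hypothetical data: counts after a short observation window and final counts.
early = [10.0, 40.0, 200.0, 15.0]
final = [120.0, 300.0, 900.0, 60.0]

a_abs = fit_absolute(early, final)
a_rel = fit_relative(early, final)

for name, a in (("absolute fit", a_abs), ("relative fit", a_rel)):
    print(name, "a =", round(a, 3),
          "abs. sq. error =", round(mean_abs_sq_error(a, early, final), 1),
          "rel. sq. error =", round(mean_rel_sq_error(a, early, final), 4))
```

Each scale factor is the exact least-squares minimizer of its own objective, so neither can be beaten on its home measure by the other. Which of them is "better" is then purely a question of which error you care about.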
Interestingly, the authors claim that an accuracy of 10% is reached within 2 hours on Digg (10 days on YouTube). However, one should not forget that the error measure used is the average of the squared relative errors. A value of 0.1 therefore corresponds to a root-mean-square relative error of sqrt(0.1)≈0.32, or ~30% accuracy. A similar accuracy was also found in our study for the expected number of Slashdot comments when considering only the most popular posts and likewise using only 2 hours of data.
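To make the conversion concrete, here is a small sketch that computes the mean squared relative error, its square root, and the mean absolute relative error. The function names and the numbers are invented for illustration and are not taken from either study.

```python
import math


def mean_squared_relative_error(predicted, actual):
    """Average of ((prediction - truth) / truth)**2 over all items."""
    return sum(((p - a) / a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_relative_error(predicted, actual):
    """Average of |prediction - truth| / truth over all items."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


# A mean squared relative error of 0.1 is a root-mean-square error of about 0.32.
rms_for_reported_value = math.sqrt(0.1)
print("sqrt(0.1) =", round(rms_for_reported_value, 3))

# Hypothetical predictions and true final counts.
predicted = [130.0, 80.0, 450.0, 20.0]
actual = [100.0, 100.0, 400.0, 30.0]

msre = mean_squared_relative_error(predicted, actual)
rmsre = math.sqrt(msre)
mare = mean_absolute_relative_error(predicted, actual)
print("mean squared relative error:", round(msre, 4))
print("root-mean-square relative error:", round(rmsre, 4))
print("mean absolute relative error:", round(mare, 4))
```

The mean absolute relative error can never exceed the root-mean-square value, so the square root of the reported number is the one to compare against an intuitive "percent off" figure.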
One of the major problems in activity prediction is the non-constant activity cycles on websites, which cause different initial responses to the same story depending on whether it is published during hours of low or high activity. More on this subject in my next blog entry.