Abstract:
This monograph describes author's large experimental work
on one machine learning task
- prediction of movie ratings in the Netflix Prize dataset.
The main objective of the experiments was to obtain maximally accurate
prediction, as evaluated by hold-out RMSE,
but also important was the perspective of applying the developed
methods in recommender systems.
The publication has two goals:
summarizing the understanding of the subject due to the published work
of many people on the same task,
and presenting some novel insights.
Reaching a good understanding of one task and one dataset gives hope
to generalize on other prediction tasks, as similar challenges
recur in analyses of any datasets.
The idea of collaborative filtering is to make use of relations between
tasks (users in our data), and between task attributes (items in our data).
Collaborative filtering methods are used in recommender systems
to calculate personalized recommendations, or in other words,
to identify items preferred by a particular user.
To realize that goal, a good intermediate task is prediction
of user ratings, and the most accurate models for this task are based
on dimensionality reduction, describing
each item by a small number of variables,
which can be seen as automatically learned analogues of movie genres,
and a small number of variables describes each user's taste.
One the most accurate models, regularized SVD, was analyzed more closely,
and the assumptions of that model, such as the single-variable output,
combining hidden variables by multiplication, and using Gaussian priors,
were critically examined.
In addition, an interpretation of the learned features by naming
new movie genres has been proposed.
To learn the parameters in the developed models the best predictive
accuracy was obtained by using different degrees of approximation
of the Bayesian approach, from MCMC and Variational Bayes,
to neural-networks-like simplifications.
When identifying the model, that is, while approaching the unknown
probabilistic model that generated the data, good engineering practice
was maintaining a blend of an ensemble of many accurate, but varied
methods.
Blends of large ensembles also gave the best reached accuracy,
indicating that, despite the large combined effort of many people,
the process of model identification for the analyzed data remained
largely unfinished, which is probably an unavoidable situation
in an analysis of real-life datasets.
The work is complemented by giving heuristics adapting rating prediction
to generate lists of recommendations, heuristics for cold-start
situations, and descriptions of two SVD-based recommender systems.