Predicting movie ratings and recommender systems - a monograph
by A. Paterek (PDF, 195 pages)
A 195-page monograph by a top-1% Netflix Prize contestant. Learn about the famous machine learning competition. Improve your machine learning skills. Learn how to build recommender systems.
What's inside:
- introduction to predictive modeling,
- a comprehensive summary of the Netflix Prize,
- detailed description of my top-50 Netflix Prize solution predicting movie ratings,
- summary of methods published by others: RMSEs from different papers collected and grouped in one place,
- detailed analysis of matrix factorizations / regularized SVD,
- how to interpret the factorization results - new, most informative movie genres
(see how I use it here and here),
- how to adapt the algorithms developed for the Netflix Prize to calculate good quality personalized recommendations,
- dealing with the cold-start problem: simple content-based augmentation,
- description of two rating-based recommender systems I built (see one of them in action),
- commentary on every topic: novel insights and know-how from more than 9 years of practicing and analysing predictive modeling.
Must-have for:
- people interested in a comprehensive summary of the developments
around the Netflix Prize contest,
- for people developing rating-based recommender systems:
the publication can potentially save you hundreds of hours of work
and may give you a technical edge over the competition.
Can be useful for:
- people interested in machine learning and prediction:
the Netflix Prize task is a rare case of a prediction task
that has been analysed thoroughly,
- for people interested in deep learning, to see how to train one hidden layer well,
- for people competing in prediction contests, to better understand
how to obtain maximally accurate predictions in a time-efficient way,
- for software developers trying to write their own recommender system
or wanting to understand the know-how behind recommender systems,
- for students and practitioners of physics and other natural sciences,
to better understand how to make the best use of gathered data,
how to properly account for the different kinds of uncertainty
that are always present in inference, no matter how much data is gathered,
and how to perform model identification
in a time-efficient way by maintaining ensembles of methods,
- for applied mathematicians who want to see the surprising
but necessary complexity behind a simply formulated, real-life
prediction task,
- for traders, risk specialists, gamblers and bookmakers who need
very accurate predictions, down to the last percent of achievable accuracy,
- for data analysts, to learn tricks from another experienced data analyst,
to develop simpler and more accurate methods with less effort,
and to better master the data analysis process: choosing the right task,
gathering the right data, identifying the underlying probabilistic model,
and finding the best methods for solving the task,
- for everyone who was taught the maximum likelihood method and other
methods of classical statistics, to learn about the more accurate
approximate Bayesian approaches,
- for anyone planning a career in one of the top 5 professions
(according to this study):
software engineer, mathematician, actuary, statistician, computer systems
analyst, to see what practical, modern data analysis is about,
a subject rarely taught properly at universities,
- for people interested in film theory and genre theory,
to see how the automatically learned movie genres relate to the traditional
movie genre taxonomy.