Experimenting Machine Learning with JavaScript applied to Movie Ratings
- Objective
- Predict the rating of movies, based on previous ratings
- Method
- k-nearest neighbors (KNN) regression
- (k-NN is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. In k-NN regression, the output is the average of the values of its k nearest neighbors. More on Wikipedia)
- Training Data
- 380 rated movies (or you can upload your IDMb movie ratings csv)
- Features
- IMDb Rating; Runtime (mins); Year; Num. Votes
- Unlabeled Data
- a list of 100 films (or you can upload your IDMb watchlist csv)
1. Training Data
Select the training dataset:
This requires to have rated movies on IMDb.com- Go to imdb.com & log in
- Open the drop-down menu on your login name (top-right corner)
- Click "Your Ratings"
- Go to the bottom of this page
- Click "Export this list"
453 movie ratings successfully loaded!
2. Features
Select the features that could predict a movie rating:
3 features successfully selected!
3. Unlabeled Data
Select unrated movies to predict the rating:
This requires to have an IMDb.com account and rated movies- Go to imdb.com & log in
- Click "Watchlist"
- Go to the bottom of this page
- Click "Export this list"
xxx unrated movies successfully loaded! (xxx were already in the training set)
4. K
Select K (= number of nearest neighbours to look for):
K successfully selected as xxx!
5. Run the Algorithm Once
Title | IMDB Rating | Runtime | Year | Num. Votes | Prediction |
---|
| | | | | |
The xxx nearest neighbors:
Title | IMDB Rating | Runtime | Year | Num. Votes | You rated | Distance |
---|
- Distance formula = Euclidian distance
- $ \sqrt{(distance_1^2+distance_2^2+...+distance_k^2)} $
- Normalization formula = Feature Scaling
- $ distance_k = \frac{property_k(training) - property_k(unseen)}{range_k} $
Show all predictions
6. All Predictions
Title | Predicted rating |
---|
Title | Predicted rating |
Title | Predicted rating |