I'm using a CF algorithm (SVD) on a real-world data set, and I've run into a data sparsity problem: only about 0.01% of the entries in the user/item rating matrix are filled. I split the data into train/test sets 80/20 and found that only a few of the users and items in the test set also appear in the training set, so I can use only a few of the test ratings to calculate RMSE. Could you give me some advice on how to fix this?
Best How To :
For recommender systems, one usually splits each user's history into train and test, so that users who appear in the test set also have ratings in the train set. In more detail:
- For each user, write out the items they interacted with.
- Preferably, order them by increasing time to avoid the "time-traveling" issue: a user can revisit items they already know, so you don't want to train on later interactions and test on earlier ones.
- Then, as usual, use the first (1-k) fraction of each user's ordered history as the train set and the remaining k as the test set.
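
The steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the function name `per_user_temporal_split`, the `(user, item, rating, timestamp)` tuple layout, and the rule of always keeping at least one rating per user in train are assumptions made for the example.

```python
from collections import defaultdict


def per_user_temporal_split(ratings, test_fraction=0.2):
    """Split (user, item, rating, timestamp) tuples per user, oldest first.

    Hypothetical helper: the name and the tuple layout are assumptions.
    """
    # Group every interaction under its user.
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for rows in by_user.values():
        # Order the user's history by increasing timestamp.
        rows.sort(key=lambda r: r[3])
        # Hold out the most recent interactions, but always keep at
        # least one interaction per user in train (an assumption).
        n_test = min(int(len(rows) * test_fraction), len(rows) - 1)
        split = len(rows) - n_test
        train.extend(rows[:split])
        test.extend(rows[split:])
    return train, test


# Toy data, made up purely for illustration.
ratings = [
    ("u1", "a", 5.0, 1), ("u1", "b", 3.0, 2), ("u1", "c", 4.0, 3),
    ("u1", "d", 2.0, 4), ("u1", "e", 1.0, 5),
    ("u2", "a", 4.0, 10), ("u2", "c", 5.0, 11),
    ("u3", "b", 2.0, 7),
]
train, test = per_user_temporal_split(ratings, test_fraction=0.4)
```

With this kind of split, every user in the test set is guaranteed to appear in the train set. Items are not covered by that guarantee: a test item that no one rated in the train portion (like "d" and "e" in the toy data) would still have to be filtered out or handled separately before computing RMSE.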