Hi there :)
I was wondering how to handle the test set when using matrix factorization.
To find the word vector (the vector representing a word), we perform matrix factorization on X (obtaining W and Z) and take the d-th row of W as the word vector of word d.
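To make the setup concrete, here is a minimal sketch of what I mean. It assumes X has one row per word and uses a truncated SVD as the factorization, which is only one possible choice and may not be the one used in the course; the matrix values, the rank `k`, and the helper name `factorize` are made up for illustration.

```python
import numpy as np

def factorize(X, k):
    """Rank-k factorization X ~ W @ Z via truncated SVD (one possible choice)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]  # shape (D, k): one row per word
    Z = Vt[:k, :]         # shape (k, N): one column per context
    return W, Z

# Toy matrix (made-up counts): 4 words (rows) x 5 contexts (columns).
X = np.array([
    [3.0, 0.0, 1.0, 0.0, 2.0],
    [2.0, 0.0, 1.0, 0.0, 3.0],
    [0.0, 4.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 1.0, 2.0, 0.0],
])

W, Z = factorize(X, k=2)

d = 2               # index of the word we are interested in
word_vector = W[d]  # the d-th row of W is the vector for word d
```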
However, if we include the test data in X, the training set and the test set both shape the same word vectors, so the train set influences the word vectors used for the test set (potentially leading to overfitting).
Hence, should we instead exclude the test data from X and compute a separate matrix factorization for the test data only? This seems a bit dodgy.
This also relates to Problem 16 in the 2018 exam:
« Logistic regression used for text classification is faster at test time when using word vectors as opposed to bag-of-word representation of the input. »
One easy way to test the quality of the matrix factorization is to remove some ratings from the training set and compare the removed ratings with the predicted ones. One drawback of matrix factorization for recommendation is that you cannot add new users or new movies at test time, so we cannot really have a train-test split in the classic sense.
Edit: you mentioned using matrix factorization for another purpose, but the same reasoning applies.
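A rough sketch of that hold-out check, under several assumptions: the ratings are synthetic and low-rank, the factorization is fitted with alternating least squares on the observed entries only (just one way to do it), and names such as `fit_masked_mf`, `held_out`, and the regularization value are hypothetical. Note that every user and every movie still appears in the matrix; only individual entries are hidden.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank "ratings" (made-up data): 30 users x 20 movies, rank 3.
n_users, n_movies, k = 30, 20, 3
R = rng.normal(size=(n_users, k)) @ rng.normal(size=(k, n_movies))

# Hide roughly 20% of the entries; the model never sees them while fitting.
held_out = rng.random(R.shape) < 0.2
observed = ~held_out

def fit_masked_mf(R, observed, k, reg=0.1, n_iters=50, seed=0):
    """Fit R ~ W @ Z on observed entries with alternating least squares."""
    init = np.random.default_rng(seed)
    W = init.normal(size=(R.shape[0], k))
    Z = init.normal(size=(k, R.shape[1]))
    I = np.eye(k)
    for _ in range(n_iters):
        # Update each user's row of W with Z fixed (ridge regression).
        for i in range(R.shape[0]):
            m = observed[i]
            Zi = Z[:, m]
            W[i] = np.linalg.solve(Zi @ Zi.T + reg * I, Zi @ R[i, m])
        # Update each movie's column of Z with W fixed.
        for j in range(R.shape[1]):
            m = observed[:, j]
            Wj = W[m]
            Z[:, j] = np.linalg.solve(Wj.T @ Wj + reg * I, Wj.T @ R[m, j])
    return W, Z

W, Z = fit_masked_mf(R, observed, k)
pred = W @ Z

# Compare predictions with the removed ratings.
rmse_held_out = np.sqrt(np.mean((pred[held_out] - R[held_out]) ** 2))

# Baseline for reference: always predict the mean observed rating.
baseline = R[observed].mean()
rmse_baseline = np.sqrt(np.mean((baseline - R[held_out]) ** 2))
```

If `rmse_held_out` is clearly below `rmse_baseline`, the factorization is recovering the hidden entries better than a constant guess.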