## self-supervised vs unsupervised vs supervised

I have a question regarding text representation learning and what is considered to be supervised or unsupervised. Why is predicting the next word unsupervised but predicting 'sentiment' of a tweet supervised?

Similarly, for matrix factorization, it seems to me that we use the ratings as 'labels' for training, no? Wouldn't that make it a supervised method?

Thanks for the help

Hi,

The basic answer is that 'the next word' comes for free from a text corpus. Your data is a sequence of words, so 'the next word' is not something a human labeled. For 'sentiment', on the other hand, someone had to sit down and annotate each tweet, so it is a case of supervised learning. I am not sure about your last point: if you are using the user ratings, then it should be considered supervised, but maybe I am missing something.
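To make the "comes for free" point concrete, here is a minimal sketch of how next-word training pairs can be read straight off raw text. The corpus string, the `next_word_pairs` helper, and the example tweets are all made up for illustration; they are not from the lecture material.

```python
# Raw corpus: no human annotation involved.
corpus = "the cat sat on the mat"

def next_word_pairs(text, context_size=2):
    """Build (context, target) training pairs directly from raw text."""
    tokens = text.split()
    pairs = []
    for i in range(context_size, len(tokens)):
        context = tuple(tokens[i - context_size:i])
        pairs.append((context, tokens[i]))  # the target is just the next token
    return pairs

pairs = next_word_pairs(corpus)
for context, target in pairs:
    print(context, "->", target)

# Sentiment, by contrast, cannot be derived from the text itself:
# each label below had to be supplied by a person (made-up examples).
labeled_tweets = [
    ("I love this course", "positive"),
    ("The homework was painful", "negative"),
]
```

Every target in `pairs` is already present in the corpus, whereas the second element of each entry in `labeled_tweets` exists only because someone wrote it down.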

Best,

Semih

Hi,

Thanks for the response. I'll watch the last two video lectures again, but I recall matrix factorization being called unsupervised.

Thanks for the help

Hello again. I revisited the lectures, and matrix factorization is in fact introduced as unsupervised in lecture 10 (page 3).

Matrix factorization is indeed an unsupervised algorithm. Given a matrix, it factorizes it into two parts, and it does not need, for instance, example factorizations as supervision. It is no different from k-means in this regard.

However, if you use it for a recommender system, you fill the matrix with labels coming from the users. I would say that in this case you are using an unsupervised algorithm on supervised data. Whether to call this supervised or unsupervised is context-dependent, but I would stick to the lecture notes to be safe.
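As a rough illustration of both views, here is a small sketch that factorizes a partially observed ratings matrix with plain stochastic gradient descent. The ratings, the rank, the learning rate, and the names `W` and `Z` are assumptions made for this example and may not match the notation in the lecture notes. Note that the algorithm is never shown a "correct" factorization; it only tries to reconstruct the entries it is given, and those entries happen to be user-provided ratings.

```python
import random

# Hypothetical ratings: (user, item, rating). Unobserved pairs are simply absent.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 5.0),
]
num_users, num_items, rank = 3, 3, 2

# Small random initialization of the two factors (seeded for reproducibility).
rng = random.Random(0)
W = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_users)]
Z = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(num_items)]

def predict(u, i):
    """Reconstructed entry (u, i): inner product of user and item factors."""
    return sum(W[u][k] * Z[i][k] for k in range(rank))

def mse():
    """Mean squared reconstruction error over the observed entries only."""
    return sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)

initial_error = mse()
learning_rate = 0.02
for _ in range(2000):
    for u, i, r in ratings:
        err = r - predict(u, i)
        for k in range(rank):
            w_uk, z_ik = W[u][k], Z[i][k]  # keep old values for a simultaneous update
            W[u][k] += learning_rate * err * z_ik
            Z[i][k] += learning_rate * err * w_uk
final_error = mse()

print(f"MSE on observed entries: {initial_error:.3f} -> {final_error:.3f}")
print(f"Predicted rating for unobserved (user 0, item 2): {predict(0, 2):.2f}")
```

The objective is pure reconstruction of the given matrix, which is why the algorithm itself is usually filed under unsupervised learning; the "supervised" flavor only comes from where the matrix entries originate.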
