Optimizing over the space of rank 1 matrices is equivalent to the space of matrices with nuclear norm <= 1

Good afternoon,

I have a few questions regarding homework 8 part 2, where we want to find the projection on \(X\). I am not sure to understand clearly why it is equivalent to minimize on matrices with nuclear norm \(\leq\) 1. I must say, I never met the nuclear norm before, so maybe I missed a key property here.
The part where we take the SVD of the matrix C confuses me a bit. When we rewrite \(C = \sum_i \sigma_i u_i v_i^\top\), I first thought we did so because \(C\) is a convex combination of the elements of \(X\), but we also have that \(\sum \sigma_i \leq 1\). In case we have a convex combination, I believe the sum of "convex coefficients" should be exactly 1, so the reason is probably different. Then I wondered if the fact that \(\sum \sigma_i \leq 1\) (and not exactly 1) is related to the fact that the rank of matrices in \(X\) is 1. I remember some property relating the trace and the eigenvalues for symmetric matrices exists, so maybe is it similar for singular values ?

Thanks for your help

Top comment

First for any rank 1 matrix M which belongs to A, we know that -M also belongs to A, then \( 0=\frac{1}{2} M + \frac{1}{2} (-M)\) is inside space X=conv(A).

For exercise 2, "optimization domain X is the unit ball of trace norm (or nuclear norm)" is given so you can directly convert \(C\in X\) to \( ||C||_*\le1\). This is because the nuclear norm is defined as the sum of singular values. Consider the form \(C=\sum_i\sigma_iu_iv_i^\top\). If \( ||C||_* =\sum_i\sigma_i= 1\), then \(C\in \mathcal{A} \subset X \); if \( ||C||_*=\sum_i\sigma_i < 1\), then \(C= \sum_i\sigma_iu_iv_i^\top + \frac{1-\sum_i\sigma_i}{2} M + \frac{1-\sum_i\sigma_i}{2} (-M) \) shows \(C\in X\). We can also prove that for all \(C\in X\) indicates \(||C||_*\le1\) in a similar way.

For the second question, we considered the case when the input matrix S is already in the set X. So \( \sum\sigma_i \le 1\) is used instead of \( \sum\sigma_i = 1\). Note that in the section 3.5 of the lecture notes excluded this trivial case from Fact 3.6.

Ok, I see now, I forgot about the line stating the fact in the exercice... Thank you explaining me the equivalence, now I understand better

Page 1 of 1

Add comment

Post as Anonymous Dont send out notification