Exam 2016 MCQ 5&6


After reading the solutions to these two questions, I still cannot understand how they work. Specifically, why can \(x^TWx\) be written as \(\sum_{i,j}{W_{i,j}x_ix_j}\) rather than in another form such as \(\sum_{i,j}{x_iW_{i,j}x_j}\)?

Further, why is the answer to Q5, whose entries are \(x_ix_j\), equal to \(xx^T\) instead of \(x^Tx\)?

For Q6, why is the other term of the derivative with respect to \(x_i\) equal to \(x^TW\), and why is the final answer \((W+W^T)x\)? A more detailed explanation would be great.

Thanks in advance!


Top comment


  • Note that since \(x_i\), \(W_{i,j}\) and \(x_j\) are numbers in \(\mathbb{R}\), you can reorder the factors inside the sum without changing the result. Thus \(\sum_{i,j}{W_{i,j}x_ix_j}\) and \(\sum_{i,j}{x_i W_{i,j}x_j}\) are exactly equal, and you can use either form.

  • Assume that \(x \in \mathbb{R}^n, W \in \mathbb{R}^{n\times n} \). Then

    (i) We take the gradient w.r.t. all entries of \(W\). There are \(n \times n\) entries, so the answer should be a matrix of size \( n \times n \).

    (ii) As the solution states, the gradient w.r.t. an individual entry \(W_{i,j} \) is equal to \(x_i x_j\), so the gradient matrix should consist of these elements.

    (iii) The matrix \(x x^\top \in \mathbb{R}^{n\times n}\) has the right shape, and its \((i,j)\) entry is exactly \(x_i x_j\), so \(x x^\top\) is equal to the gradient.

    (iv) \(x^\top x\) is a number equal to \(\sum_i x_i^2 \), so it does not have the right shape and is not a correct answer.
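
A quick numerical check of points (i) to (iv), as a minimal sketch in plain Python. The helper names (`quad_form`, `outer`, `grad_wrt_W`) and the example values of `x` and `W` are made up for illustration and are not from the exam. It estimates the derivative of \(x^\top W x\) w.r.t. each entry of \(W\) by finite differences and compares it with \(x x^\top\).

```python
def quad_form(W, x):
    """Return x^T W x, computed as the double sum of W[i][j] * x[i] * x[j]."""
    n = len(x)
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def outer(x):
    """Return the n x n matrix x x^T, whose (i, j) entry is x[i] * x[j]."""
    return [[xi * xj for xj in x] for xi in x]


def grad_wrt_W(W, x, eps=1e-6):
    """Central finite-difference estimate of d(x^T W x) / dW[i][j] for every entry."""
    n = len(x)
    grad = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            original = W[i][j]
            W[i][j] = original + eps
            plus = quad_form(W, x)
            W[i][j] = original - eps
            minus = quad_form(W, x)
            W[i][j] = original  # restore the entry before moving on
            grad[i][j] = (plus - minus) / (2 * eps)
    return grad


# Arbitrary example values (assumed for illustration only).
x = [1.0, -2.0, 3.0]
W = [[0.5, 1.0, -1.0],
     [2.0, 0.0, 4.0],
     [-3.0, 1.5, 2.5]]

numeric = grad_wrt_W(W, x)   # n x n matrix of estimated partial derivatives
analytic = outer(x)          # x x^T, also n x n
max_err = max(abs(numeric[i][j] - analytic[i][j])
              for i in range(3) for j in range(3))

inner = sum(xi * xi for xi in x)  # x^T x: a single number, so the wrong shape

print("max |numeric - x x^T| =", max_err)
print("x^T x =", inner)
```

The finite-difference matrix agrees with \(x x^\top\) up to rounding error, while \(x^\top x\) collapses to one scalar and so cannot hold \(n \times n\) partial derivatives.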

Q6: Similarly, the derivative w.r.t. every element of \(x\) should have \(n\) elements, so it should have the form of a vector in \(\mathbb{R}^n \).
Next, we calculate every entry of this gradient, i.e. the derivative w.r.t. each individual \(x_i\).

Only the terms with \(k = i\) or \(j = i\) depend on \(x_i\), so

\(\nabla_{x_i} \sum_{k,j}{W_{k,j}x_k x_j} = \nabla_{x_i} \sum_{j \neq i}{W_{i,j}x_i x_j} + \nabla_{x_i} \sum_{k \neq i}{W_{k,i}x_k x_i} + \nabla_{x_i} W_{i,i}x_i x_i \)

\(= \sum_{j \neq i}{W_{i,j} x_j} + \sum_{k \neq i}{W_{k,i}x_k} + 2 W_{i,i} x_i \)

\(= \sum_{j}{W_{i,j} x_j} + \sum_{k}{W_{k,i}x_k} = (W x)_i + (W^\top x)_i \)

Stacking these entries over all \(i\) gives the gradient \(W x + W^\top x = (W + W^\top)x\).

Note that there is a typo in the solution: the second term should be \(W^\top x\) instead of \(x^\top W\).
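
The Q6 result can be sanity-checked numerically in the same spirit. This is a minimal sketch in plain Python; the helper names (`quad_form`, `grad_wrt_x`, `analytic_grad`) and the example values are assumptions made for illustration. `W` is deliberately chosen non-symmetric, so that \(W x\) and \(W^\top x\) differ and both terms are really needed.

```python
def quad_form(W, x):
    """Return x^T W x, computed as the double sum of W[k][j] * x[k] * x[j]."""
    n = len(x)
    return sum(W[k][j] * x[k] * x[j] for k in range(n) for j in range(n))


def grad_wrt_x(W, x, eps=1e-6):
    """Central finite-difference estimate of d(x^T W x) / dx[i] for every i."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += eps
        x_minus = list(x)
        x_minus[i] -= eps
        grad.append((quad_form(W, x_plus) - quad_form(W, x_minus)) / (2 * eps))
    return grad


def analytic_grad(W, x):
    """Return (W + W^T) x; entry i is sum_j (W[i][j] + W[j][i]) * x[j]."""
    n = len(x)
    return [sum((W[i][j] + W[j][i]) * x[j] for j in range(n)) for i in range(n)]


# Arbitrary non-symmetric example (assumed for illustration only).
x = [1.0, -2.0, 3.0]
W = [[0.5, 1.0, -1.0],
     [2.0, 0.0, 4.0],
     [-3.0, 1.5, 2.5]]

numeric = grad_wrt_x(W, x)
analytic = analytic_grad(W, x)
max_err = max(abs(a - b) for a, b in zip(numeric, analytic))

print("finite differences:", numeric)
print("(W + W^T) x:       ", analytic)
print("max abs difference:", max_err)
```

For these values \((W + W^\top)x = (-17,\ 19.5,\ 0)\), and the finite-difference estimate matches it up to rounding error.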

In this document, equation 47: https://atmos.washington.edu/~dennis/MatrixCalculus.pdf

the solution is written as \(x^T(A^T+A)\). I don't think this is equivalent to \((A+A^T)x\), but both seem to have the correct dimensions, so how do we distinguish between them in this case? And how do we determine which of them is the correct form?
