Poisson exercise - Mock Exam 2014
Hi! In the solution to the Poisson regression exercise, at point (b) for example, the final expression is \(X^T(y - \exp(X^T \beta))\), but I can't see why. Considering the dimensions (if I understood correctly, \(X\) is an \(N \times (D+1)\) matrix and \(\beta\) is a \((D+1)\)-dimensional vector), I think it should be \(X^T(y - \exp(X \beta))\), i.e. the \(X\) inside the exponential shouldn't be transposed. Am I missing something?
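To make the dimension argument concrete, here is a minimal NumPy sketch. It assumes the Poisson log-likelihood \(\sum_n y_n x_n^T \beta - \exp(x_n^T \beta)\) up to constants; the sizes and random data are made up for illustration and are not taken from the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 2                        # hypothetical sizes, chosen so that N != D + 1
X = rng.normal(size=(N, D + 1))    # design matrix, shape (N, D+1)
beta = rng.normal(size=D + 1)      # parameter vector, shape (D+1,)
y = rng.poisson(lam=1.0, size=N)   # made-up count targets, shape (N,)

# Matrix form: X @ beta has shape (N,), so X.T @ (...) has shape (D+1,)
grad = X.T @ (y - np.exp(X @ beta))

# Per-sample form: sum over n of (y_n - exp(x_n^T beta)) * x_n
grad_sum = sum((y[n] - np.exp(X[n] @ beta)) * X[n] for n in range(N))

# X.T @ beta is not even defined here: (D+1, N) times (D+1,) does not match
try:
    X.T @ beta
    transposed_ok = True
except ValueError:
    transposed_ok = False
```

Under these assumptions the two gradient forms agree, and the transposed product fails on shape grounds. The per-sample expression \(x_n^T \beta\) is fine, which may be where the extra transpose in the matrix form comes from.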
Related to the same exercise, I don't understand why the computational complexity should be \(O(D^3 + ND^2)\). I would have written \(O(D^3 + ND^2 + N^2 D)\): to compute the Hessian you have to compute \(X^T \operatorname{diag}(\exp(X \beta)) X\), so I would also count an \(O(N^2 D)\) term for the product \(\operatorname{diag}(\exp(X \beta)) X\). Thanks for your answer!
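One way to look at the \(O(N^2 D)\) term is the sketch below, again with made-up sizes and data. Multiplying by a diagonal matrix only rescales the rows of \(X\), so the \(N \times N\) matrix never has to be formed: the scaling costs \(O(ND)\) and the remaining product costs \(O(ND^2)\).

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 200, 3                          # hypothetical sizes
X = rng.normal(size=(N, D + 1))        # design matrix, shape (N, D+1)
beta = 0.1 * rng.normal(size=D + 1)    # parameter vector, shape (D+1,)

w = np.exp(X @ beta)                   # diagonal entries, shape (N,), O(ND)

# Naive version: builds the full N x N diagonal matrix, O(N^2 D) for the product
H_dense = X.T @ np.diag(w) @ X

# Row-scaling version: w[:, None] * X is O(ND), then X.T @ (...) is O(N D^2)
H_scaled = X.T @ (w[:, None] * X)
```

Both versions give the same \((D+1) \times (D+1)\) matrix, so the \(N^2 D\) term is only a property of the naive implementation, not of the computation itself.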
Also related to this exercise, I am a bit confused by the notation. Shouldn't it be \(y\) instead of \(y_n\) in equation 19?
I don't understand why we have the subscript.
I'm also struggling to understand why we used an indicator function for multi-class classification but not in this case. Why did we immediately assume that \(k\) was \(y_n\)?