### Classifying with the kernel K

Hello! I can't understand the passages associated with the topic "Classifying with the kernel K" in the lecture 7b notes.
We start by saying that
y = φ(x)^T w
and, applying the representer theorem, we arrive at the expression
y = φ(x)^T φ(X)^T α
but I would have written
y = φ(x)^T X^T α

Is that right? Are the two expressions equivalent? Because, if they are, I can't see why.

I think that when you do feature augmentation, you always have to use φ(X), the augmented data matrix, and not X, even for prediction. It wouldn't make sense to augment a data point and then use the matrix X to predict; in most cases the dimensions wouldn't even match.
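The dimension mismatch is easy to see numerically. Here is a minimal sketch (the data and the feature map are made up for illustration):

```python
import numpy as np

# n = 3 points in d = 2 dimensions (made-up data)
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
alpha = np.array([0.3, -0.7, 0.1])
x = np.array([1.0, -0.5])

# A hypothetical feature map augmenting 2D inputs to 3D
def phi(u):
    return np.array([u[0], u[1], u[0] * u[1]])

print(phi(x).shape)         # (3,) -- the augmented query point
print((X.T @ alpha).shape)  # (2,) -- X^T alpha lives in the ORIGINAL space
# phi(x) @ (X.T @ alpha) would raise ValueError: the shapes don't align
```

So mixing φ(x) with the un-augmented X^T α is not even a well-formed product once φ changes the dimensionality.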

I agree, I would say the same. If you learn the parameters in the augmented space, and the kernel is non-linear, you cannot relate them to the original vectors through a linear relationship (i.e., a matrix multiplication).

@Anonymous said:
Can you explain a bit more why you would have written φ(x)^T X^T α?

I was just applying the result of the representer theorem, stating it as w = X^T α.
Hence, φ(x)^T w = φ(x)^T X^T α.
I see now that the theorem applies in the space where w lives: since w is learned in the augmented space, it says w = φ(X)^T α, which gives the prediction φ(x)^T φ(X)^T α. What the others said completely makes sense. Thanks everyone!
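To make the resolution concrete, here is a small sketch checking that the feature-space prediction φ(x)^T φ(X)^T α equals the kernel prediction Σ_i α_i k(x, x_i). The data, α, and the choice of φ (a degree-2 polynomial map, whose kernel is k(u, v) = (u·v)^2) are assumptions for illustration:

```python
import numpy as np

# Toy data: n = 4 points in d = 2 dimensions (made-up numbers)
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0],
              [-2.0, 1.5]])
alpha = np.array([0.3, -0.7, 0.1, 0.5])
x = np.array([1.0, -0.5])

# Explicit degree-2 polynomial feature map:
# phi(u) = (u1^2, sqrt(2) u1 u2, u2^2), so phi(u).phi(v) = (u.v)^2
def phi(u):
    return np.array([u[0] ** 2, np.sqrt(2) * u[0] * u[1], u[1] ** 2])

Phi_X = np.apply_along_axis(phi, 1, X)  # rows are phi(x_i), shape (4, 3)

# Representer theorem in FEATURE space: w = phi(X)^T alpha
pred_feature = phi(x) @ Phi_X.T @ alpha

# Same prediction via the kernel, never forming phi explicitly
pred_kernel = sum(a_i * (x @ x_i) ** 2 for a_i, x_i in zip(alpha, X))

print(np.isclose(pred_feature, pred_kernel))  # prints True
```

This is the point of the kernel trick: the product φ(x)^T φ(X)^T collapses to the kernel evaluations k(x, x_i), so φ never needs to be computed explicitly.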
