Exercise 3 formula


Is the formula right? From what I can see, $-\frac{1}{2}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu)$ results in shapes (2,1) (2,2) (1,2), which cannot be matrix-multiplied. If we switch the terms, the expression becomes (1,2) (2,2) (2,1), which can be multiplied.

17 Sep '20 ·

anonymous

@razvan, I guess in this course all vectors are columns.

1

17 Sep '20 ·

anonymous

In mathematics, a point (a vector of features) is usually written as a column vector. As the previous poster said, in this course the vector of features is represented as a row vector (although much other math-related literature represents features as column vectors). In Problem Set 1, task A, the matrix dimensions are (n,d): there are n points, hence d features, so each row is a feature vector :))

17 Sep '20 ·

anonymous

I understand the math behind it. In this case d was set to 2 and n to 200, if I remember correctly. Every row is a vector containing 2 elements: 1 row and 2 columns, giving the shape I stated before, (1, 2). Transposing it gives shape (2, 1), which cannot be multiplied by a (2, 2) matrix, the shape of sigma.

17 Sep '20 ·

anonymous

I think the confusion here arises from the following. This formula for the probability density is specified for a single sample x (previously also called a datapoint or point), which is represented as a single-column vector of dimension d, i.e. of shape (d,1) = (2,1):

$$p(\mathbf x) = \frac{1}{\sqrt{(2\pi)^d|\mathbf\Sigma|}}e^{\left(-\frac 1 2 (\mathbf x-\mathbf\mu)^{T}\mathbf\Sigma^{-1}(\mathbf x-\mathbf\mu)\right)}$$

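Therefore, after subtracting the mean of shape (d,1) = (2,1), the transpose is needed: it turns the difference into shape (1,d) = (1,2), which can then be multiplied on its right by the inverse covariance matrix of shape (d,d) = (2,2). The full product has shapes (1,2) (2,2) (2,1) and yields a scalar.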
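As a rough check of the shapes in this formula, here is a minimal NumPy sketch for a single sample. The values of `mu`, `sigma` and `x` are made up for illustration and are not taken from the exercise.

```python
import numpy as np

# Assumed example values (not from the exercise): d = 2
d = 2
mu = np.array([[1.0], [2.0]])           # mean, shape (2, 1)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # covariance, shape (2, 2)
x = np.array([[0.5], [1.5]])            # one sample as a column vector, shape (2, 1)

diff = x - mu                           # shape (2, 1)
sigma_inv = np.linalg.inv(sigma)        # shape (2, 2)

# (1, 2) @ (2, 2) @ (2, 1) -> (1, 1)
quad = diff.T @ sigma_inv @ diff

# Normalisation constant sqrt((2*pi)^d * |Sigma|)
norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
p = np.exp(-0.5 * quad.item()) / norm

print(diff.T.shape, sigma_inv.shape, diff.shape, quad.shape)
print(p)
```

With x as a column vector, the product as written in the formula goes through without switching any terms.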

Note that in NumPy a single vector of dimension d can have shape (d,1), which is an explicit column vector, or shape (d,), which is a 1-D array that is neither a row nor a column.
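The two shapes behave differently under transposition and `@`. A small sketch, using an identity matrix as a stand-in for the inverse covariance (an assumption for illustration only):

```python
import numpy as np

sigma_inv = np.eye(2)             # placeholder (2, 2) matrix, assumed for illustration

col = np.array([[1.0], [2.0]])    # shape (2, 1): explicit column vector
flat = np.array([1.0, 2.0])       # shape (2,): 1-D array

print(col.T.shape)                # transposing a (2, 1) array gives (1, 2)
print(flat.T.shape)               # transposing a 1-D array leaves the shape unchanged

q_col = col.T @ sigma_inv @ col   # 2-D result of shape (1, 1)
q_flat = flat @ sigma_inv @ flat  # 0-d scalar; no transpose needed for 1-D arrays

print(q_col.shape, np.ndim(q_flat))
```

Both give the same number, but only the (d,1) version mirrors the transpose in the written formula.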

Note that, when processing many of these d-dimensional samples at once, it is common to batch the n samples together in a single matrix of shape (n,d), with the batch dimension n in the position of the row index. This can give the impression that the sample vectors have suddenly become single-row vectors with many columns, but when you select a single sample from the matrix, you get back a plain d-dimensional vector:

import numpy as np
a = np.zeros((200,2))
print(a[0].shape) # extracting the first sample

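This prints (2,), a 1-D array with d = 2 entries. NumPy does not treat a 1-D array as a row or a column, so the selected sample is no longer a row of the matrix; it can be used as the vector x from the formula, or reshaped to (2,1) for an explicit column vector.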
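For the batched case, one possible way to evaluate the density for all n samples at once is sketched below. The random data, the use of `np.einsum`, and estimating `mu` and `sigma` from the data are my own assumptions, not something prescribed by the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 2
X = rng.normal(size=(n, d))          # assumed random data, one sample per row

mu = X.mean(axis=0)                  # shape (d,)
sigma = np.cov(X, rowvar=False)      # shape (d, d)
sigma_inv = np.linalg.inv(sigma)

diff = X - mu                        # shape (n, d), mu is broadcast over rows

# quad[k] = diff[k]^T Sigma^{-1} diff[k], computed for every row k
quad = np.einsum("ni,ij,nj->n", diff, sigma_inv, diff)   # shape (n,)

norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(sigma))
p = np.exp(-0.5 * quad) / norm       # shape (n,)

# Cross-check the first sample against the column-vector form of the formula
x0 = X[0].reshape(d, 1)
m = mu.reshape(d, 1)
q0 = ((x0 - m).T @ sigma_inv @ (x0 - m)).item()
print(np.isclose(q0, quad[0]))
```

The cross-check at the end compares the batched result for the first row with the single-sample expression, so the row layout of X and the column-vector formula can be seen to agree.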

17 Sep '20 · 1 ·

anonymous
