Friday, November 8, 2019

Dimensionality Reduction

* It is hard to visualize anything higher than 3D.
* In 2D, the probability that a randomly picked point lies very close to the border is low, whereas in high-dimensional space almost every point lies close to a border.
* In 2D, two random points are close to each other on average, whereas in high-dimensional space they are far apart, so training instances become sparse.
* It is called the Curse of Dimensionality for a reason (a quick simulation below illustrates both effects).
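A minimal Monte Carlo sketch in Python (NumPy assumed; the sample sizes and the 0.001 border threshold are illustrative choices, not from these notes) of the two effects above: in high dimensions almost every random point sits near a border of the unit hypercube, and random points are far apart on average.

import numpy as np

rng = np.random.default_rng(42)

for n_dims in (2, 3, 1_000, 10_000):
    a = rng.random((2_000, n_dims))  # random points in the unit hypercube
    b = rng.random((2_000, n_dims))
    near_border = (np.minimum(a, 1 - a).min(axis=1) < 0.001).mean()
    avg_dist = np.linalg.norm(a - b, axis=1).mean()
    print(f"{n_dims:>6}D: near a border {near_border:6.1%}, average distance {avg_dist:6.2f}")

In 2D roughly 0.4% of points lie within 0.001 of a border and the average distance is about 0.52; in 10,000D virtually every point is near a border and the average distance is about 40.8.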

Projection
* Projects the training instances from the high-dimensional space onto a lower-dimensional subspace.

Manifold Learning
* Models the manifold on which the training instances lie.
* Based on the manifold hypothesis: most real-world high-dimensional datasets lie close to a much lower-dimensional manifold.
* Think of the Swiss roll: a 2D plane rolled up in 3D space.

PCA
* Principal Component Analysis
* Identifies the hyperplane that lies closest to the data, then projects the data onto that hyperplane.
* Chooses the axis that preserves the maximum amount of variance.
* Principal components are these axes: the first axis accounts for the largest amount of variance in the training set, the second axis is orthogonal to the first and accounts for the largest remaining variance, and so on.
* In scenarios with multiple features (high-dimensional data), PCA finds as many principal components as there are dimensions, one axis per feature dimension.
* An easy way to compute them is the Singular Value Decomposition (SVD): X = U . S . V_T. The columns of V (rows of V_T) are the principal components.
* The SVD must be applied to data centred on the origin (subtract the mean first; scikit-learn's PCA does this automatically).
* The projected matrix is X . W_d, where W_d is the matrix made of the first d columns of V; choose d, the number of dimensions to project to. In terms of dimensions, (m x n) . (n x d) = (m x d) => d dimensions. See the NumPy sketch after this list.
* explained_variance_ratio_ gives the proportion of the dataset's variance that lies along the axis of each principal component.
** From these values we can see which axes have no significant impact.
** It can be used to determine the number of dimensions to keep; normally we keep enough components to preserve 95% of the variance (see the scikit-learn example after this list).
* We can decompress the data back to n dimensions by applying X_projected . W_d_T.
* Some information is lost, depending on how much variance was dropped. The mean squared distance between the original data and the reconstructed data is called the reconstruction error.
* Randomized PCA: faster; uses a stochastic approach to approximate the first d principal components.
* Incremental PCA: does not need all the training data to be loaded into memory; the data can be fed in mini-batches.
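A minimal NumPy sketch of PCA via SVD as described above, using a small random matrix as a stand-in dataset (the data, d = 2, and the variable names are illustrative assumptions): centre the data, take the SVD, project onto the first d principal components, then reconstruct and measure the reconstruction error.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))         # stand-in dataset with m=200, n=5

X_centered = X - X.mean(axis=0)       # SVD-based PCA needs data centred on the origin
U, s, Vt = np.linalg.svd(X_centered)  # rows of Vt (columns of V) are the principal components

d = 2
W_d = Vt[:d].T                        # (n x d): first d principal components
X_proj = X_centered @ W_d             # (m x n) . (n x d) = (m x d)

X_recovered = X_proj @ W_d.T          # decompress back to n dimensions
reconstruction_error = np.mean(np.sum((X_centered - X_recovered) ** 2, axis=1))
print(reconstruction_error)           # mean squared distance, original vs reconstructed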
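A scikit-learn sketch of the same ideas (assuming scikit-learn is installed; the toy random data and component counts are illustrative): explained_variance_ratio_, choosing enough components for 95% variance, randomized PCA, and Incremental PCA fed in mini-batches.

import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))              # stand-in dataset

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(pca.explained_variance_ratio_)         # proportion of variance along each axis

pca_95 = PCA(n_components=0.95)              # keep enough components for 95% variance
X_reduced = pca_95.fit_transform(X)
print(pca_95.n_components_)

rnd_pca = PCA(n_components=2, svd_solver="randomized")  # stochastic approximation
X_2d_rnd = rnd_pca.fit_transform(X)

inc_pca = IncrementalPCA(n_components=2)     # never needs the full dataset in memory at once
for batch in np.array_split(X, 10):
    inc_pca.partial_fit(batch)
X_2d_inc = inc_pca.transform(X)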

LLE
* Locally Linear Embedding
* Relies on each instance's closest neighbours.
* Looks for a low-dimensional representation in which these local neighbour relationships are preserved.
* First step: for each instance x, find weights w such that the distance between x and sigma(w . x_neighbours), the weighted sum of its neighbours, is minimal.
* Second step: keeping the weights fixed, find the projected z such that the distance between z and sigma(w . z_neighbours) is minimal. See the sketch after this list.
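A minimal scikit-learn sketch of LLE unrolling a Swiss roll (the sample size, neighbour count, and random seed are illustrative choices): the model learns the neighbour weights and then finds a 2D embedding that preserves those local relationships.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)  # 3D Swiss roll

lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_unrolled = lle.fit_transform(X)     # (1000, 2) flat representation of the roll
print(X_unrolled.shape)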
