Dimensionality Reduction Techniques
Dimensionality reduction maps high-dimensional data to fewer dimensions while preserving as much of its structure as possible, such as variance or neighborhood relationships.
PCA
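PCA (Principal Component Analysis) finds orthogonal directions of maximum variance and projects the data onto them. In scikit-learn, PCA(n_components=k) keeps the k components with the largest variance.
After fitting, explained_variance_ratio_ gives the fraction of total variance captured by each component. transform(X) projects data onto the components, and inverse_transform maps the projection back to the original space, recovering an approximation of the original data.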
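As a minimal sketch of the PCA workflow described above, the following uses scikit-learn on synthetic data. The data set, its shape, and the variable names (latent, mixing, X_reduced, X_approx) are assumptions made for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 5 features, built from 2 latent directions
# plus a small amount of noise, so 2 components should capture most variance.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)             # fit, then project onto 2 components
X_approx = pca.inverse_transform(X_reduced)  # map back to the 5-D feature space

# Fraction of total variance captured by each kept component
print(pca.explained_variance_ratio_)

# Mean squared reconstruction error; small when the kept components
# capture most of the variance
reconstruction_error = np.mean((X - X_approx) ** 2)
print(reconstruction_error)
```

Calling fit_transform is equivalent to fit followed by transform on the same data. To project new samples, call pca.transform on them after fitting.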
t-SNE
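t-SNE (t-distributed Stochastic Neighbor Embedding) provides non-linear dimensionality reduction, mainly for visualization. Perplexity, roughly the effective number of neighbors considered for each point, controls the balance between local and global structure.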
A typical setup is tsne = TSNE(n_components=2, perplexity=30). Results are stochastic; set random_state for reproducible embeddings.
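The t-SNE settings above could be used as in the sketch below, which relies on scikit-learn's TSNE. The two-cluster synthetic data and the choice of random_state=0 are assumptions for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical data: two well-separated Gaussian clusters in 10 dimensions
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 10)),
    rng.normal(loc=8.0, scale=1.0, size=(50, 10)),
])

# perplexity must be smaller than the number of samples (100 here)
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)  # 2-D coordinates, one row per sample

print(X_embedded.shape)
```

scikit-learn's TSNE exposes fit_transform but no separate transform method, so it cannot project unseen samples into an existing embedding. This is one reason it is used mainly for visualization.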
UMAP
UMAP is typically faster than t-SNE and tends to preserve global structure better. With the umap-learn package, UMAP(n_components=2) produces a 2D embedding.
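A rough sketch of the UMAP call follows, assuming the third-party umap-learn package is installed (it is not part of scikit-learn). The import guard, the synthetic data, and the values n_neighbors=15, min_dist=0.1, and random_state=42 are illustrative assumptions, not recommendations from the text.

```python
import numpy as np

try:
    import umap  # provided by the third-party umap-learn package
except ImportError:
    umap = None  # the sketch is skipped when umap-learn is not installed

rng = np.random.default_rng(0)

# Hypothetical data: two Gaussian clusters in 10 dimensions
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(100, 10)),
    rng.normal(loc=8.0, scale=1.0, size=(100, 10)),
])

embedding = None
if umap is not None:
    # n_neighbors sets the size of the local neighborhood considered;
    # min_dist controls how tightly points may be packed in the embedding.
    reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
    embedding = reducer.fit_transform(X)
    print(embedding.shape)
```

Unlike scikit-learn's TSNE, a fitted UMAP reducer also provides a transform method for new samples.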
Key Takeaways
- PCA is linear and interpretable
- t-SNE is for visualization, not downstream tasks
- UMAP balances local and global structure