How does PCA work? What are its assumptions?
Understanding Principal Component Analysis
Principal Component Analysis (PCA) is the "Marie Kondo" of data—it helps you keep the information that "sparks joy" (variance) and discard the clutter (noise/redundancy).
How PCA Works (Step-by-Step)
1. Standardization
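PCA is sensitive to scale. If one feature is "Annual Income" (values in the thousands) and another is "Age" (values in the tens), income will dominate the variance and therefore the components. We standardize each feature: subtract its mean (centering, which PCA always requires) and divide by its standard deviation so every feature has unit variance.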
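A minimal NumPy sketch of this step, using a small hypothetical income/age table (the numbers are made up for illustration). It uses the population standard deviation (`ddof=0`), which is also what scikit-learn's `StandardScaler` uses:

```python
import numpy as np

# Hypothetical toy data: column 0 = annual income (dollars), column 1 = age (years)
X = np.array([
    [42_000.0, 25.0],
    [58_000.0, 32.0],
    [61_000.0, 47.0],
    [75_000.0, 51.0],
    [39_000.0, 38.0],
])

def standardize(X):
    """Center each column at 0 and scale it to unit (population) variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0 (population standard deviation)
    return (X - mean) / std, mean, std

Z, mean, std = standardize(X)
print("raw variances:", X.var(axis=0))        # income variance dwarfs age variance
print("standardized means:", Z.mean(axis=0).round(12))
print("standardized stds:", Z.std(axis=0))
```

Before scaling, income's variance is roughly a million times larger than age's, so it would fully determine PC1. After scaling, both features start on equal footing.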
2. Covariance Matrix Computation
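We calculate the covariance matrix, a d-by-d matrix (for d features) whose entry (i, j) measures how features i and j vary together. This exposes redundancy: if two variables move in near lockstep (are highly correlated), most of the information in one is already contained in the other. On standardized data, this matrix is essentially the correlation matrix.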
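Here is a sketch of the covariance step on hypothetical data: `height_cm` and `height_in` are two measurements of the same quantity (the "lockstep" case), and `unrelated` is independent noise. It computes the matrix both by hand, as Zᵀ Z / (n − 1) for centered data, and with `np.cov`:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: two redundant height measurements plus one unrelated feature
height_cm = rng.normal(170, 10, n)
height_in = height_cm / 2.54 + rng.normal(0, 0.1, n)
unrelated = rng.normal(0, 1, n)
X = np.column_stack([height_cm, height_in, unrelated])

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize first (step 1)

# Covariance of centered data: C = Z^T Z / (n - 1)
C_manual = Z.T @ Z / (n - 1)
C_numpy = np.cov(Z, rowvar=False)          # rowvar=False: columns are the variables
print(np.round(C_numpy, 3))
```

The near-1 entry linking the two height columns is the redundancy PCA will exploit; the entries for `unrelated` stay close to 0.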
3. Eigendecomposition
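We compute the eigenvectors of the covariance matrix (the directions of the new axes) and their eigenvalues (the variance of the data along each direction, i.e., how important that direction is). Sorting by eigenvalue, largest first, gives the principal components: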
- PC1: The direction of maximum variance.
- PC2: Perpendicular (orthogonal) to PC1, capturing the largest remaining variance. Each later component is orthogonal to all earlier ones.
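The eigendecomposition can be sketched with `np.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix and returns eigenvalues in ascending order, so we re-sort them. The data is hypothetical: two correlated features plus one independent feature.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
t = rng.normal(size=n)
# Hypothetical data: features 0 and 1 share the signal t; feature 2 is independent
X = np.column_stack([t + 0.3 * rng.normal(size=n),
                     t + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(Z, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(C)       # columns of eigvecs are unit eigenvectors
order = np.argsort(eigvals)[::-1]          # re-sort descending so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("eigenvalues (variance per PC):", eigvals.round(3))
# Note: the sign of each eigenvector is arbitrary; v and -v describe the same axis.
print("PC1 direction:", eigvecs[:, 0].round(3))
```

PC1 points roughly along the diagonal of the two correlated features, and the variance of the data projected onto PC1 equals the first eigenvalue.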
4. Feature Vector & Projection
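We decide how many components (k) to keep, usually by looking for the "elbow" in a scree plot or by keeping enough components to reach a target share of the total variance (for example, 95%). We then stack the k selected eigenvectors as the columns of a matrix (the feature vector) and multiply the standardized data by it, projecting every sample onto the new, lower-dimensional space.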
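Putting the steps together, here is a minimal end-to-end sketch. The function name `pca_fit_transform` and the 95% `var_threshold` rule are illustrative choices, not a standard API; the data is generated from two hypothetical hidden factors, so two components should be enough.

```python
import numpy as np

def pca_fit_transform(X, var_threshold=0.95):
    """Hypothetical helper: standardize, eigendecompose, keep PCs until var_threshold is reached."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    C = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    explained_ratio = eigvals / eigvals.sum()      # the values a scree plot displays
    cumulative = np.cumsum(explained_ratio)
    k = int(np.searchsorted(cumulative, var_threshold)) + 1  # smallest k reaching the threshold
    k = min(k, len(eigvals))                       # guard against floating-point round-off

    W = eigvecs[:, :k]                             # feature vector: d x k kept eigenvectors
    scores = Z @ W                                 # project standardized data: n x k
    return scores, explained_ratio, k

rng = np.random.default_rng(1)
n = 300
latent = rng.normal(size=(n, 2))                   # two hidden factors (hypothetical)
mixing = np.array([[1.0, 0.5, -1.0,  2.0,  0.8],
                   [0.3, 1.0,  0.7, -0.5, -1.2]])
X = latent @ mixing + 0.05 * rng.normal(size=(n, 5))  # five observed, redundant features

scores, ratio, k = pca_fit_transform(X)
print("explained variance ratio:", ratio.round(3))
print("components kept:", k, "-> projected shape:", scores.shape)
```

Five correlated features collapse to two components that together keep nearly all of the variance.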
The Core Assumptions
Linearity
PCA assumes the relationships between variables are linear: each component is a linear combination of the original features. If your data lies along curves or "spiral" patterns, standard PCA will fail to capture that structure; nonlinear methods such as Kernel PCA are a better fit.
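A small illustration of this failure, using hypothetical points on a unit circle. One angle fully describes each point, but because x and y are related nonlinearly, PCA splits the variance evenly and cannot reduce the data to one dimension. Both features already share a scale, so only centering is applied.

```python
import numpy as np

# Hypothetical 1-D structure: every point is determined by a single angle theta
theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])

C = np.cov(X - X.mean(axis=0), rowvar=False)
eigvals = np.linalg.eigvalsh(C)[::-1]      # eigenvalues, largest first
ratio = eigvals / eigvals.sum()
print("explained variance ratio:", ratio.round(3))
```

Each component explains about half of the variance, so linear PCA sees no redundancy at all. A kernel method (scikit-learn provides `KernelPCA`) can pick up this kind of curved structure.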
Variance = Importance
PCA assumes that the directions with the highest variance contain the most information. It treats low-variance directions as "noise" to be discarded, which backfires when a small-variance direction carries the signal you care about.
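A sketch of how this assumption can backfire, on hypothetical labeled data: `x` has large variance but is unrelated to the class, while `y` has small variance but separates the two classes. The features are assumed to share units, so only centering is applied. The `separation` score is an ad hoc measure defined here for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
labels = np.repeat([0, 1], n // 2)                             # hypothetical class labels
x = rng.normal(0, 10, n)                                       # high variance, uninformative
y = np.where(labels == 1, 1.0, -1.0) + rng.normal(0, 0.1, n)   # low variance, informative
X = np.column_stack([x, y])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pcs = eigvecs[:, ::-1]                                         # columns ordered PC1, PC2
scores = Xc @ pcs

def separation(s):
    """Gap between the two class means divided by the overall spread of the scores."""
    return abs(s[labels == 1].mean() - s[labels == 0].mean()) / s.std()

print("PC1 separation:", round(separation(scores[:, 0]), 3))
print("PC2 separation:", round(separation(scores[:, 1]), 3))
```

PC1 aligns with the noisy `x` axis and barely separates the classes, while the discarded PC2 holds nearly all of the class information.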
Sensitivity to Outliers
Extreme values can massively skew the mean and covariance, leading to principal components that don't represent the bulk of the data accurately.
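To see this sensitivity, the sketch below (hypothetical data) builds two features that rise together along the (1, 1) diagonal, then adds a single extreme point on the opposite diagonal. The helper `first_pc` is defined here for illustration.

```python
import numpy as np

def first_pc(X):
    """Return the unit-length direction of maximum variance (PC1) of the centered data."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return eigvecs[:, np.argmax(eigvals)]

rng = np.random.default_rng(3)
t = rng.normal(size=200)
# Hypothetical data in shared units: both features follow t, so the bulk lies along (1, 1)
X = np.column_stack([t + 0.1 * rng.normal(size=200), t + 0.1 * rng.normal(size=200)])
X_out = np.vstack([X, [50.0, -50.0]])       # add one extreme point

diag = np.array([1.0, 1.0]) / np.sqrt(2)
anti = np.array([1.0, -1.0]) / np.sqrt(2)
pc_clean, pc_dirty = first_pc(X), first_pc(X_out)
# abs() because an eigenvector's sign is arbitrary
print("clean PC1 alignment with (1, 1):", abs(pc_clean @ diag).round(3))
print("with outlier, PC1 alignment with (1, -1):", abs(pc_dirty @ anti).round(3))
```

One point out of 201 is enough to rotate PC1 by 90 degrees, away from the direction that describes the other 200.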
Orthogonality
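By design, PCA constrains the principal components to be orthogonal (perpendicular) to each other, so the projected features (component scores) are uncorrelated. This makes PCA useful for removing multicollinearity, for example before fitting a linear regression.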
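A quick check of this property on hypothetical, strongly multicollinear features: the original columns are highly correlated, but the scores on the principal components have (numerically) zero correlation with one another.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
a = rng.normal(size=n)
# Hypothetical multicollinear features: the second and third nearly duplicate the first
X = np.column_stack([a,
                     0.9 * a + 0.2 * rng.normal(size=n),
                     -a + 0.3 * rng.normal(size=n)])
Z = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
scores = Z @ eigvecs                        # keep all PCs; order does not matter here

print("feature correlations:\n", np.corrcoef(Z, rowvar=False).round(3))
print("PC score correlations:\n", np.corrcoef(scores, rowvar=False).round(3))
```

The score correlation matrix is the identity up to rounding, because the covariance of the scores is the diagonal matrix of eigenvalues.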