Machine Learning · Intermediate

🔭 Dimensionality Reduction

Keeping the important information

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

The art of smart simplification. When datasets have hundreds of columns, dimensionality reduction keeps the essence while trimming the clutter.

How this lesson fits

Here’s where the magic shows up: we stop hand-writing every rule and let data teach the model. Think of it as coaching instead of scripting.

The big question

How can a machine spot patterns from examples the way a student learns from practice problems?

  • Tell the difference between predicting numbers and discovering patterns
  • Interpret simple models and talk through their outputs in plain English
  • Compare the strengths and tradeoffs of common ML methods

Why You Should Care

More columns aren’t always better. Cleaner, lower-dimensional views can make patterns easier to see and models easier to train.

Where this is used today

  • Visualizing high-dimensional data (t-SNE, UMAP)
  • Compressing images/video
  • Preprocessing for other ML models

Think of it like this

Like casting a shadow of a 3D object. You lose some depth, but from the right angle, the important shape remains.

Easy mistake to make

Dimensionality reduction isn’t random column deletion. It’s a careful mathematical compression that preserves structure.

By the end, you should be able to say:

  • Explain the curse of dimensionality in plain language
  • Describe PCA as finding the directions of greatest variation
  • Connect lower-dimensional views to visualization and compression

Think about this first

If you had to summarize a student with only two numbers, which would you choose to keep the most useful story?

Words we will keep using

dimension · feature · variance · principal component · projection

Why We Shrink the Number of Features

Imagine taking a photo of a 3D statue. The photo is 2D, but if you pick the right angle, you can still recognize the shape. Dimensionality reduction is the art of finding that perfect angle—simplifying the data without destroying the meaning.

  • PCA: Squashes the data flat, keeping the widest (most varied) view.
  • t-SNE: Keeps neighbors together. Great for visualizing clusters.
  • Autoencoders: Neural networks that learn to zip and unzip data.
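To make the first two concrete, here is a minimal sketch that projects the same toy cloud with PCA and t-SNE. It assumes scikit-learn is available; the dataset and parameters are made up for illustration.

```python
# Hypothetical sketch: projecting one toy cloud two ways.
# Assumes scikit-learn; data and parameters are invented for illustration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))            # 100 points, 10 features

X_pca = PCA(n_components=2).fit_transform(X)    # linear: keeps overall spread
X_tsne = TSNE(n_components=2, perplexity=15,
              random_state=0).fit_transform(X)  # nonlinear: keeps neighborhoods

print(X_pca.shape, X_tsne.shape)
```

Both calls return 2-column arrays you can scatter-plot, but they answer different questions: PCA preserves global variance, t-SNE preserves local neighborhoods.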

PCA — Principal Component Analysis

PCA asks a very practical question: if I had to redraw this dataset using fewer axes, which new directions would keep the most useful information? The first principal component follows the strongest spread in the data, the second follows the next strongest spread, and so on.

$$\text{PC}_1 = \arg\max_{\|v\|=1} \operatorname{Var}(Xv)$$

Left: rotating 3D view. Right: PCA projection to 2D (always same orientation).
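You can check that objective numerically: the variance of the data projected onto PC1 should be at least the variance along any other unit direction. A NumPy-only sketch on made-up 2D data:

```python
# Numerical check of the PC1 objective on made-up 2D data (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2)) @ np.array([[3.0, 0.0],
                                          [1.0, 1.0]])  # stretched cloud
Xc = X - X.mean(axis=0)                                  # center first

# PC1 is the top eigenvector of the covariance matrix.
vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = vecs[:, -1]                # eigh sorts ascending: last column is largest

# No random unit direction should beat PC1's projected variance.
var_pc1 = np.var(Xc @ pc1)
for _ in range(200):
    v = rng.normal(size=2)
    v /= np.linalg.norm(v)
    assert np.var(Xc @ v) <= var_pc1 + 1e-9
print(round(var_pc1, 2))
```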

What PCA is trying to do:

  1. Shift the data so the cloud is centered around the origin
  2. Measure which features tend to vary together using the covariance matrix
  3. Find the directions where the data spreads out the most
  4. Project the data onto the top directions you want to keep
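The four steps above can be sketched directly in NumPy. The toy dataset is made up: three features, where the third is nearly a copy of the first, so two directions carry almost all the spread.

```python
# From-scratch sketch of the four PCA steps, on invented toy data (NumPy only).
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=500)  # feature 3 ≈ feature 1

# 1. Shift the cloud so it is centered at the origin.
Xc = X - X.mean(axis=0)

# 2. Covariance matrix: which features vary together.
C = np.cov(Xc, rowvar=False)

# 3. Directions of greatest spread = eigenvectors with the largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]                  # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project onto the top-2 directions.
X2 = Xc @ eigvecs[:, :2]

print(X2.shape)                                    # (500, 2)
print((eigvals / eigvals.sum()).round(3))          # share of variance per component
```

Because feature 3 nearly duplicates feature 1, the third eigenvalue is tiny: almost nothing is lost by keeping only two components.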

Scree plot — the bars show how much variation each principal component explains, and the line shows how quickly those pieces add up.
Why people use it: to visualize embeddings, remove noise, speed up later models, and summarize messy datasets in a cleaner way.
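Reading a scree plot is just arithmetic on the eigenvalues. A short sketch with hypothetical eigenvalues, showing how many components you would keep to cover 90% of the variance:

```python
# Scree-plot arithmetic on hypothetical eigenvalues (NumPy only).
import numpy as np

eigvals = np.array([4.0, 2.0, 1.0, 0.5, 0.3, 0.2])  # invented for illustration
ratios = eigvals / eigvals.sum()        # bar heights in a scree plot
cumulative = np.cumsum(ratios)          # the rising line
k = int(np.searchsorted(cumulative, 0.90)) + 1  # components for 90% of variance

print(ratios.round(3))
print(cumulative.round(3))
print(k)  # → 4
```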