🔵 Clustering & K-Means
Finding groups without labels
Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.
Pause and experiment as you go.
Before We Begin
What we are learning today
Finding order in chaos. Without labels, clustering groups similar points to reveal hidden structure we might otherwise miss.
How this lesson fits
Here’s where the magic shows up: we stop hand-writing every rule and let data teach the model. Think of it as coaching instead of scripting.
The big question
How can a machine spot patterns from examples the way a student learns from practice problems?
Why You Should Care
Not all ML starts with answer keys. Sometimes the goal is simply to reveal structure and patterns.
Where this is used today
- ✓ Customer segmentation for marketing
- ✓ Image compression (color quantization)
- ✓ Grouping search results by topic
Think of it like this
Like sorting a box of LEGO bricks without instructions. You could cluster by color or size—several answers can make sense.
Easy mistake to make
K-means doesn’t discover a single “true” answer. Different choices of K can produce different but valid groupings.
By the end, you should be able to:
- Explain what makes clustering unsupervised
- Describe the two repeating steps of K-means
- Interpret the elbow method as a way to choose K
Think about this first
If you sorted a box of mixed items without labels, what clues would you use to form groups?
Words we will keep using
Clustering: Finding Hidden Groups
Clustering is like sorting a bucket of mixed LEGOs when you've lost the instruction manual. You don't know what the groups are supposed to be, so you organize them by what looks similar—color, size, or shape.
K-Means Algorithm
- Guess: Drop K center points (centroids) randomly on the map.
- Assign: Every data point joins the team of the closest centroid.
- Update: Each team finds its new center of gravity and moves the centroid there.
- Repeat until nothing moves anymore.
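The loop above can be sketched in a few lines of NumPy. This is a minimal illustration on made-up toy data (three rough blobs), not the lesson's interactive demo:

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy data: 90 points drawn around three blob centers.
points = rng.normal(loc=[[0, 0]] * 30 + [[5, 5]] * 30 + [[0, 5]] * 30, scale=0.8)

def k_means(points, k, n_iters=20):
    # Guess: pick K random data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign: every point joins the team of the closest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to its team's center of gravity
        # (keep a centroid in place if its team is empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Repeat until nothing moves anymore.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

centroids, labels = k_means(points, k=3)
```

Note the stopping rule: the algorithm halts as soon as an update leaves every centroid where it was, which is exactly the "nothing moves anymore" condition above.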
Step-by-Step K-Means
Press start and watch the two repeating moves: assign points, then move centroids.

Choosing K — The Elbow Method
How many clusters should you use? The "Elbow Method" is a rule of thumb: keep adding clusters until the improvement slows down. It's like eating pizza—the first slice is amazing, the fifth one is just okay.
In the chart, the red dot marks the elbow at K=3: adding more clusters beyond this point gives diminishing returns.
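One way to see this in code: run K-means for several values of K and record the inertia (the within-cluster sum of squared distances), which is the quantity the elbow plot is drawn from. This sketch reuses a bare-bones NumPy K-means on assumed toy blobs, not the lesson's data:

```python
import numpy as np

rng = np.random.default_rng(1)
# Three well-separated blobs, so the elbow should appear near K=3.
data = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ([0, 0], [4, 4], [0, 4])])

def inertia(points, k, n_iters=20):
    """Run a simple K-means, then return the within-cluster sum of squares."""
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        d = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    d = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
    return (d.min(axis=1) ** 2).sum()

# Inertia always shrinks as K grows; the "elbow" is where the drop flattens.
inertias = [inertia(data, k) for k in range(1, 7)]
```

Plotting `inertias` against K reproduces the elbow curve: a steep drop from K=1 to K=3, then only small improvements after.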
The elbow method isn't the only option. Other ways to choose K:
- Silhouette score: asks whether points are close to their own cluster and far from other clusters
- Gap statistic: compares your clustering result to what random, structureless data would look like
- Domain knowledge: sometimes you already know how many groups make sense
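As a sketch of the first alternative: scikit-learn ships a ready-made silhouette score, which lands in [-1, 1] with higher meaning tighter, better-separated clusters. This assumes `scikit-learn` is installed; the toy blobs are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Assumed toy data: three well-separated blobs.
data = np.vstack([rng.normal(c, 0.5, size=(40, 2)) for c in ([0, 0], [5, 5], [0, 5])])

scores = {}
for k in range(2, 6):  # silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)

# Pick the K with the highest silhouette score.
best_k = max(scores, key=scores.get)
```

Unlike the elbow plot, this gives a single number to maximize, so no eyeballing of a curve is needed.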