Machine Learning · Beginner

🌳Decision Trees & Random Forests

Learning by asking better questions

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

It’s “20 Questions” for data. The tree keeps splitting into smaller groups until it’s confident. A random forest mixes many trees so one bad question doesn’t derail the decision.

How this lesson fits

Here’s where the magic shows up: we stop hand-writing every rule and let data teach the model. Think of it as coaching instead of scripting.

The big question

How can a machine spot patterns from examples the way a student learns from practice problems?

  • Tell the difference between predicting numbers and discovering patterns
  • Interpret simple models and talk through their outputs in plain English
  • Compare the strengths and tradeoffs of common ML methods

Why You Should Care

Decision trees are visual and explainable, bridging everyday reasoning with formal ML. That makes them great for classroom demos.

Where this is used today

  • ✓Loan approval systems (bank rules)
  • ✓Medical triage charts
  • ✓Customer support chatbots

Think of it like this

Think of a school nurse diagnosing a student: “Do you have a fever?” “Is it high?” Each answer narrows the possibilities.

Easy mistake to make

A deeper tree isn’t automatically smarter. It can memorize the training data instead of learning a reliable pattern.
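To see this concretely, here is a tiny sketch with made-up numbers: a "model" that memorizes every training example (a caricature of a very deep tree) versus a one-rule model (a single split). The data points, the noisy label at x=9, and the threshold 5 are all illustrative assumptions.

```python
# Toy data: (x, label). The true rule is label = (x > 5), but one
# training label (x=9) is flipped -- noise the memorizer will copy.
train = [(1, 0), (2, 0), (3, 0), (7, 1), (8, 1), (9, 0)]
test  = [(4, 0), (6, 1), (9, 1), (10, 1)]

memorized = dict(train)

def memorizer(x):
    """'Very deep tree': one branch per training point, default 0 otherwise."""
    return memorized.get(x, 0)

def one_rule(x):
    """Shallow tree: a single split at x > 5."""
    return int(x > 5)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train), accuracy(memorizer, test))  # 1.0 0.25
print(accuracy(one_rule, train), accuracy(one_rule, test))    # ~0.83 1.0
```

The memorizer is perfect on the training data, noise included, and poor on unseen points; the shallow rule accepts one training mistake and generalizes better.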

By the end, you should be able to say:

  • Explain how a tree chooses a split
  • Interpret leaves, branches, and impurity
  • Explain why combining trees can reduce overfitting

Think about this first

If you had to decide on a loan, what first question would you ask, and why does it matter?

Words we will keep using

split · node · leaf · impurity · random forest

How Decision Trees Work

A decision tree is just a game of "20 Questions." The computer learns which questions to ask to split the data into clean groups. It is one of the few AI models you can print out and read like a manual.

Gini Impurity: A fancy name for "messiness." The goal is to make groups that are pure (all Yes or all No).
Splitting: The tree tries many possible questions and keeps the one that best separates the classes.
Overfitting: If you ask too many questions, you memorize the specific examples instead of learning the general rule.
Gini impurity is written as G = 1 − Σₖ pₖ², where pₖ is the fraction of the group belonging to class k. You do not need to calculate it by hand right now. Just remember: the smaller it is, the “cleaner” the node is.
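The formula is short enough to write as a few lines of Python. A minimal sketch:

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

print(gini(["yes", "yes", "yes", "yes"]))  # 0.0 -> perfectly pure node
print(gini(["yes", "no", "yes", "no"]))    # 0.5 -> maximally mixed (two classes)
```

A pure node scores 0; a 50/50 split of two classes scores 0.5, the worst case with two classes.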

Loan Approval Tree — Walk-through

Move the sliders and follow the highlighted path. You can literally watch the model reason its way to a decision.

(Interactive widget: the tree splits on Age first. Applicants with Age < 30 and those with Age ≥ 30 are routed to different follow-up questions, ending in a decision such as ✅ Approve.)
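Under the hood, a tree like this is just nested if/else statements. Here is a sketch of what it might look like in code; the follow-up features and thresholds (income, credit_score, 40,000, 650) are illustrative assumptions, not the widget's actual rules.

```python
def approve_loan(age, income, credit_score):
    # Hypothetical thresholds, for illustration only.
    if age < 30:
        # Younger applicants: the (assumed) follow-up question is income.
        return "Approve" if income > 40_000 else "Deny"
    else:
        # Older applicants: the (assumed) follow-up question is credit score.
        return "Approve" if credit_score > 650 else "Deny"

print(approve_loan(25, 50_000, 600))  # -> Approve (young, income above cutoff)
```

Following a path from the root question down to a leaf is exactly what the highlighted path in the widget shows.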

Random Forests

A single tree can be shaky—change one data point, and the whole structure might flip. A Random Forest solves this by training hundreds of different trees and letting them vote.

for i in 1..N:
  sample = bootstrap(data)            # draw a random sample of the data, with replacement
  features = random_subset(features)  # each tree only gets to ask about some features
  tree_i = DecisionTree(sample, features)
predict = majority_vote(tree_1...tree_N)
Why it helps: A single deep tree can change a lot if the training data changes a little. Averaging many trees makes the final model more stable.
Feature importance: Forests also give a rough sense of which features matter most, which is useful when you want an interpretable summary.
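The pseudocode above can be turned into a tiny working forest. This sketch uses one-question "stumps" instead of full trees, on a made-up 1-D dataset where the true rule is x > 5; the dataset, stump design, and forest size of 25 are all illustrative assumptions.

```python
import random
from collections import Counter

random.seed(42)

def bootstrap(data):
    """Sample len(data) points with replacement."""
    return [random.choice(data) for _ in data]

def train_stump(sample):
    """One-question 'tree': pick the threshold with the fewest mistakes."""
    best = None
    for t in sorted({x for x, _ in sample}):
        # This stump predicts True when x >= t; count its errors on the sample.
        errors = sum((x >= t) != y for x, y in sample)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

def majority_vote(thresholds, x):
    """Each stump votes; the most common answer wins."""
    votes = Counter(x >= t for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data: the label is True exactly when x > 5.
data = [(x, x > 5) for x in range(11)]
forest = [train_stump(bootstrap(data)) for _ in range(25)]

print(majority_vote(forest, 2))  # points well below the cutoff vote False
print(majority_vote(forest, 8))  # points well above the cutoff vote True
```

Individual stumps land on slightly different thresholds because each sees a different bootstrap sample, but the vote washes out those wobbles, which is the stability argument in miniature.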