Neural Networks · Intermediate

Training & Backpropagation

How a network learns from mistakes

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

30 min · Explore at your own pace

Before We Begin

What we are learning today

This is the “learning” in deep learning. When the network errs, we send the feedback backward, nudging every neuron that played a part.

How this lesson fits

Inspired by the brain, powered by math. Here we’ll treat neural nets like a story of information flowing through layers, changing just enough each time to become something meaningful.

The big question

How do stacks of numbers and weights turn raw input into a confident prediction?

  • Trace information through a neural network in clear, simple language
  • Explain why activations and gradients matter for learning
  • Connect specialized architectures to images and perception tasks

Why You Should Care

Backpropagation turns "learning" from a mysterious buzzword into a concrete sequence of weight adjustments. Without it, training modern networks would be painfully inefficient.

Where this is used today

  • Training every modern AI model
  • Optimizing supply chains
  • Financial model tuning

Think of it like this

Like a coach reviewing game tape: “You missed here—adjust this next time.” Each player (neuron) learns a small lesson.

Easy mistake to make

Backpropagation does not mean the network understands its mistakes like a person. It is a mathematical way to measure how each weight contributed to the error.

By the end, you should be able to say:

  • Explain backpropagation as sending error information backward through the network
  • Connect gradients to how weights change during training
  • Describe why small updates repeated many times can produce learning
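The last point above is worth seeing in miniature: one weight, one loss, and many tiny nudges against the gradient. This is a minimal sketch; the quadratic loss L(w) = (w − 3)² and the learning rate are illustrative assumptions, not values from the lesson.

```python
# Minimal gradient descent on a one-parameter loss L(w) = (w - 3)^2.
# Each update is tiny, but repeated many times the weight converges
# to the minimum at w = 3.
w = 0.0
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (w - 3)         # dL/dw for L = (w - 3)^2
    w -= learning_rate * gradient  # small step against the gradient

print(round(w, 4))  # 3.0
```

No single step does much; the learning comes entirely from repetition.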

Think about this first

If you miss a basketball shot, what kind of feedback helps you improve on the next try?

Words we will keep using

loss · gradient · learning rate · update · backpropagation

Backpropagation: How the Network Learns from Mistakes

Backpropagation sounds intimidating, but it's really just a "blame game." When the network makes a mistake, we trace the error backward through the connections to find out which weights were responsible. Then we nudge them to do better next time.

\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}

The chain rule is just a way of tracing influence. The output depends on the hidden units, the hidden units depend on the weights, so the error can be followed all the way back to each parameter.
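The chain rule can be checked numerically: compute ∂L/∂w₁ as the product of the three factors above, then nudge w₁ by a tiny amount and watch how the loss actually moves. This sketch assumes sigmoid activations and the loss L = ½(y − ŷ)²; the specific values of w₁, the output weight, the input, and the label are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A single path of a 2-layer network: x -> h1 -> y_hat.
# All four numbers below are assumed for the sketch.
w1, w_out, x, y = 0.5, 0.8, 1.0, 1.0

def forward(w1):
    h1 = sigmoid(w1 * x)
    y_hat = sigmoid(w_out * h1)
    loss = 0.5 * (y - y_hat) ** 2
    return h1, y_hat, loss

h1, y_hat, loss = forward(w1)

# Chain rule: dL/dw1 = dL/dy_hat * dy_hat/dh1 * dh1/dw1
dL_dyhat = -(y - y_hat)                  # from L = 1/2 (y - y_hat)^2
dyhat_dh1 = y_hat * (1 - y_hat) * w_out  # sigmoid' through the output weight
dh1_dw1 = h1 * (1 - h1) * x              # sigmoid' through the input
grad = dL_dyhat * dyhat_dh1 * dh1_dw1

# Finite-difference check: nudge w1 both ways and see how the loss moves.
eps = 1e-6
numeric = (forward(w1 + eps)[2] - forward(w1 - eps)[2]) / (2 * eps)
print(abs(grad - numeric) < 1e-8)  # True: the two estimates agree
```

The agreement between the analytic product and the finite-difference estimate is the whole point of the chain rule: each factor traces one link of influence, and multiplying them recovers the true sensitivity of the loss to w₁.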

The 5 Moves to Watch

Step 1: Forward Pass
Step 2: Compute Loss
Step 3: Output Gradient
Step 4: Backprop to Hidden
Step 5: Weight Update
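Assuming sigmoid activations throughout and the half-squared-error loss L = ½(y − ŷ)² (plus assumed inputs, label, and learning rate, which the lesson does not pin down), the five moves can be sketched as one small training loop on the 2-layer network from the live demo below:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 2-layer network: 2 inputs -> 2 hidden units -> 1 output.
# Initial weights mirror the live demo; x1, x2, y, and lr are assumptions.
x1, x2, y = 0.5, 0.8, 1.0
w1, w2, w3, w4 = 0.5, -0.2, 0.3, 0.7
b1, b2 = 0.1, -0.1
w_out1, w_out2, b_out = 0.8, 0.6, 0.2
lr = 0.5  # learning rate (assumed)

for step in range(100):
    # Step 1: forward pass
    h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    h2 = sigmoid(w3 * x1 + w4 * x2 + b2)
    y_hat = sigmoid(w_out1 * h1 + w_out2 * h2 + b_out)

    # Step 2: compute loss
    loss = 0.5 * (y - y_hat) ** 2

    # Step 3: gradient at the output (dL/dz_out)
    d_out = -(y - y_hat) * y_hat * (1 - y_hat)

    # Step 4: backprop to the hidden layer (dL/dz_h1, dL/dz_h2)
    d_h1 = d_out * w_out1 * h1 * (1 - h1)
    d_h2 = d_out * w_out2 * h2 * (1 - h2)

    # Step 5: weight update (gradient descent)
    w_out1 -= lr * d_out * h1
    w_out2 -= lr * d_out * h2
    b_out -= lr * d_out
    w1 -= lr * d_h1 * x1
    w2 -= lr * d_h1 * x2
    b1 -= lr * d_h1
    w3 -= lr * d_h2 * x1
    w4 -= lr * d_h2 * x2
    b2 -= lr * d_h2

print(round(loss, 6))  # well below the epoch-0 loss of ~0.0348
```

Each pass through the loop is one full cycle of the five moves; the loss shrinks a little every time, which is exactly what the loss curve in the demo visualizes.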

Live 2-Layer Network — Watch Weights Update

Node colour: green = high, red = low. Blue edges = positive weight, red = negative. Dashed node = true label y.

Loss curve: press 100 Steps in the interactive demo to watch the loss fall as the network learns.

Network state (epoch 0)
h₁ = 0.5474, h₂ = 0.6479
ŷ = 0.7363, y = 1.0000
L = 0.034778

Current weight values

w1 (x1→h1): 0.500
w2 (x2→h1): -0.200
w3 (x1→h2): 0.300
w4 (x2→h2): 0.700
b1 (bias h1): 0.100
b2 (bias h2): -0.100
wOut1 (h1→ŷ): 0.800
wOut2 (h2→ŷ): 0.600
bOut: 0.200
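As a sanity check, the epoch-0 numbers above can be reproduced in a few lines, assuming sigmoid activations, the loss L = ½(y − ŷ)², and inputs x₁ = 0.5, x₂ = 0.8 (the inputs are an assumption that happens to match the displayed activations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2, y = 0.5, 0.8, 1.0  # inputs assumed; label from the demo

# Weights and biases from the table above.
h1 = sigmoid(0.5 * x1 + -0.2 * x2 + 0.1)    # w1, w2, b1
h2 = sigmoid(0.3 * x1 + 0.7 * x2 + -0.1)    # w3, w4, b2
y_hat = sigmoid(0.8 * h1 + 0.6 * h2 + 0.2)  # wOut1, wOut2, bOut
loss = 0.5 * (y - y_hat) ** 2

print(round(h1, 4), round(h2, 4), round(y_hat, 4), round(loss, 6))
# 0.5474 0.6479 0.7363 0.034778
```

Working the forward pass out by hand like this is a good way to convince yourself there is nothing hidden in the demo: every displayed number is just weights, inputs, and sigmoids.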

Vanishing & Exploding Gradients

In very deep networks, the gradient can shrink until learning becomes painfully slow, or grow until training becomes unstable. That is why modern architectures use tools like ReLU, normalization, and residual connections to keep learning healthy.
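A quick way to see the vanishing half of this problem (a sketch, not tied to the demo network): the sigmoid's derivative σ′(z) = σ(z)(1 − σ(z)) is at most 0.25, and backprop multiplies one such factor per layer, so the gradient shrinks geometrically with depth.

```python
# The sigmoid derivative peaks at 0.25 (when its output is 0.5).
# Backprop through a stack of sigmoid layers multiplies one factor
# per layer, so even in the best case the gradient collapses fast.
grad = 1.0
for layer in range(20):
    grad *= 0.25  # best case for sigmoid; real factors are usually smaller

print(grad)  # about 9.1e-13 after only 20 layers
```

ReLU (whose derivative is 1 for positive inputs) and residual connections (which add an identity path for the gradient) are popular precisely because they avoid this relentless multiplication by small numbers.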