Training & Backpropagation
How a network learns from mistakes
Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.
Pause and experiment as you go.
Before We Begin
What we are learning today
This is the “learning” in deep learning. When the network errs, we send that error signal backward, nudging every neuron that played a part.
How this lesson fits
Inspired by the brain, powered by math. Here we’ll treat neural nets like a story of information flowing through layers, changing just enough each time to become something meaningful.
The big question
How do stacks of numbers and weights turn raw input into a confident prediction?
Why You Should Care
Backpropagation turns “learning” from a mysterious term into a clear sequence of weight adjustments. Without it, training modern networks would be painfully inefficient.
Where this is used today
- ✓ Training every modern AI model
- ✓ Optimizing supply chains
- ✓ Financial model tuning
Think of it like this
Like a coach reviewing game tape: “You missed here—adjust this next time.” Each player (neuron) learns a small lesson.
Easy mistake to make
Backpropagation does not mean the network understands its mistakes like a person. It is a mathematical way to measure how much each weight contributed to the error.
By the end, you should be able to:
- Explain backpropagation as sending error information backward through the network
- Connect gradients to how weights change during training
- Describe why small updates repeated many times can produce learning
Think about this first
If you miss a basketball shot, what kind of feedback helps you improve on the next try?
Words we will keep using
Backpropagation: How the Network Learns from Mistakes
Backpropagation sounds intimidating, but it's really just a "blame game." When the network makes a mistake, we trace the error backward through the connections to find out which weights were responsible. Then we nudge them to do better next time.
The chain rule is just a way of tracing influence. The output depends on the hidden units, the hidden units depend on the weights, so the error can be followed all the way back to each parameter.
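That tracing can be written out directly. Here is a minimal sketch for a single path (the input, target, and weight values are arbitrary assumptions for illustration): one input flows through one sigmoid hidden unit to an output, and the chain rule multiplies the local slopes along the path to assign blame to each weight. A finite-difference check at the end confirms the traced gradient matches what actually happens when we wiggle the weight.

```python
import math

# Illustrative values (assumptions, not from the lesson's widget)
x, t = 1.0, 0.5          # input and true target
w1, w2 = 0.8, -0.4       # hidden weight and output weight

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward pass
h = sigmoid(w1 * x)      # hidden activation
y = w2 * h               # prediction
loss = (y - t) ** 2      # squared error

# Backward pass: follow the chain of dependencies back to each weight.
dloss_dy = 2 * (y - t)           # loss depends on the output...
dy_dw2 = h                       # ...the output depends on w2 directly...
dy_dh = w2                       # ...and on the hidden unit...
dh_dw1 = h * (1 - h) * x         # ...which depends on w1 (sigmoid's slope).

grad_w2 = dloss_dy * dy_dw2      # chain rule: multiply slopes along the path
grad_w1 = dloss_dy * dy_dh * dh_dw1

# Sanity check: nudge w1 a tiny bit and measure the loss change directly.
eps = 1e-6
loss_bumped = (w2 * sigmoid((w1 + eps) * x) - t) ** 2
numeric_grad_w1 = (loss_bumped - loss) / eps
print(grad_w1, numeric_grad_w1)  # the two estimates should nearly match
```

The multiplication of local derivatives is the whole trick: no single step needs to know about the full network, only about its immediate inputs and outputs.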
The 5 Moves to Watch
Live 2-Layer Network — Watch Weights Update
Node colour: green = high, red = low. Blue edges = positive weight, red = negative. Dashed node = true label y.
Loss curve
Click 100 Steps to watch it learn!
Current weight values
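The loop behind a demo like this can be sketched in a few lines. This is a hedged guess at the mechanics, not the widget's actual code: the architecture (one input, one sigmoid hidden unit, one sigmoid output), the learning rate, and the initial weights are all assumptions. Each of the 100 steps is a forward pass, a backward pass via the chain rule, and a small nudge of each weight opposite its gradient.

```python
import math
import random

random.seed(0)               # reproducible illustrative run
x, y_true = 1.0, 1.0         # single training example (assumed)
w1 = random.uniform(-1, 1)   # hidden weight
w2 = random.uniform(-1, 1)   # output weight
lr = 0.5                     # learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

losses = []
for step in range(100):      # the "100 Steps" button
    # Forward pass
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    loss = (y - y_true) ** 2
    losses.append(loss)

    # Backward pass: chain rule through both sigmoids
    dy = 2 * (y - y_true) * y * (1 - y)
    grad_w2 = dy * h
    grad_w1 = dy * w2 * h * (1 - h) * x

    # Small update opposite the gradient
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

No single step changes much, but the loss curve falls because a hundred small, consistently-directed nudges compound.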
Vanishing & Exploding Gradients
In very deep networks, the gradient can shrink until learning becomes painfully slow, or grow until training becomes unstable. That is why modern architectures use tools like ReLU, normalization, and residual connections to keep learning healthy.
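A back-of-the-envelope calculation shows why depth causes this (the depth of 20 is an arbitrary assumption). The backward pass multiplies one derivative factor per layer; a sigmoid's slope never exceeds 0.25, so even in the best case the product shrinks exponentially with depth, while ReLU's slope of 1 on active units leaves the product intact.

```python
# Per-layer derivative factors
sigmoid_max_slope = 0.25   # sigmoid'(z) is at most 0.25 (at z = 0)
relu_slope = 1.0           # ReLU'(z) = 1 for any active (positive) unit

depth = 20                 # an illustrative network depth

# The backward pass multiplies one factor per layer:
sigmoid_gradient = sigmoid_max_slope ** depth   # best case, still tiny
relu_gradient = relu_slope ** depth             # unchanged

print(f"sigmoid chain after {depth} layers: {sigmoid_gradient:.3e}")
print(f"relu chain after {depth} layers:    {relu_gradient:.1f}")
```

The same multiplication run with factors above 1 grows just as fast, which is the exploding-gradient side of the problem; normalization and residual connections both work by keeping these per-layer factors near 1.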