Neural Networks · Intermediate

Training & Backpropagation

How a network learns from mistakes

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

30 min · Explore at your own pace

Before We Begin

What we are learning today

This is the “learning” in deep learning. When the network errs, we send the feedback backward, nudging every neuron that played a part.

How this lesson fits

Inspired by the brain, powered by math. Here we’ll treat neural nets like a story of information flowing through layers, changing just enough each time to become something meaningful.

The big question

How do stacks of numbers and weights turn raw input into a confident prediction?

  • Trace information through a neural network in clear, simple language
  • Explain why activations and gradients matter for learning
  • Connect specialized architectures to images and perception tasks

Why You Should Care

Backpropagation turns "learning" from a mysterious buzzword into a concrete sequence of weight adjustments. Without it, training modern networks would be painfully inefficient.

Where this is used today

  • Training every modern AI model
  • Optimizing supply chains
  • Financial model tuning

Think of it like this

Like a coach reviewing game tape: “You missed here—adjust this next time.” Each player (neuron) learns a small lesson.

Easy mistake to make

Backpropagation does not mean the network understands its mistakes like a person. It is a mathematical way to measure how each weight contributed to the error.

By the end, you should be able to say:

  • Explain backpropagation as sending error information backward through the network
  • Connect gradients to how weights change during training
  • Describe why small updates repeated many times can produce learning
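The last point above is worth seeing in miniature: one weight, one loss, and many tiny nudges against the gradient. This is a minimal sketch; the quadratic loss L(w) = (w − 3)² and the learning rate are illustrative assumptions, not values from the lesson.

```python
# Minimal gradient descent on a one-parameter loss L(w) = (w - 3)^2.
# Each update is tiny, but repeated many times the weight converges
# to the minimum at w = 3.
w = 0.0
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (w - 3)         # dL/dw for L = (w - 3)^2
    w -= learning_rate * gradient  # small step against the gradient

print(round(w, 4))  # 3.0
```

No single step does much; the learning comes entirely from repetition.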

Think about this first

If you miss a basketball shot, what kind of feedback helps you improve on the next try?

Words we will keep using

loss · gradient · learning rate · update · backpropagation

Backpropagation: How the Network Learns from Mistakes

Backpropagation sounds intimidating, but it's really just a "blame game." When the network makes a mistake, we trace the error backward through the connections to find out which weights were responsible. Then we nudge them to do better next time.

\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_1}

The chain rule is just a way of tracing influence. The output depends on the hidden units, the hidden units depend on the weights, so the error can be followed all the way back to each parameter.
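The chain rule can be checked numerically: compute ∂L/∂w₁ as the product of the three factors above, then nudge w₁ by a tiny amount and watch how the loss actually moves. This sketch assumes sigmoid activations and the loss L = ½(y − ŷ)²; the specific values of w₁, the output weight, the input, and the label are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# A single path of a 2-layer network: x -> h1 -> y_hat.
# All four numbers below are assumed for the sketch.
w1, w_out, x, y = 0.5, 0.8, 1.0, 1.0

def forward(w1):
    h1 = sigmoid(w1 * x)
    y_hat = sigmoid(w_out * h1)
    loss = 0.5 * (y - y_hat) ** 2
    return h1, y_hat, loss

h1, y_hat, loss = forward(w1)

# Chain rule: dL/dw1 = dL/dy_hat * dy_hat/dh1 * dh1/dw1
dL_dyhat = -(y - y_hat)                  # from L = 1/2 (y - y_hat)^2
dyhat_dh1 = y_hat * (1 - y_hat) * w_out  # sigmoid' through the output weight
dh1_dw1 = h1 * (1 - h1) * x              # sigmoid' through the input
grad = dL_dyhat * dyhat_dh1 * dh1_dw1

# Finite-difference check: nudge w1 both ways and see how the loss moves.
eps = 1e-6
numeric = (forward(w1 + eps)[2] - forward(w1 - eps)[2]) / (2 * eps)
print(abs(grad - numeric) < 1e-8)  # True: the two estimates agree
```

The agreement between the analytic product and the finite-difference estimate is the whole point of the chain rule: each factor traces one link of influence, and multiplying them recovers the true sensitivity of the loss to w₁.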

The 5 Moves to Watch

Step 1: Forward Pass
Step 2: Compute Loss
Step 3: Output Gradient
Step 4: Backprop to Hidden
Step 5: Weight Update
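Assuming sigmoid activations throughout and the half-squared-error loss L = ½(y − ŷ)² (plus assumed inputs, label, and learning rate, which the lesson does not pin down), the five moves can be sketched as one small training loop on the 2-layer network from the live demo below:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny 2-layer network: 2 inputs -> 2 hidden units -> 1 output.
# Initial weights mirror the live demo; x1, x2, y, and lr are assumptions.
x1, x2, y = 0.5, 0.8, 1.0
w1, w2, w3, w4 = 0.5, -0.2, 0.3, 0.7
b1, b2 = 0.1, -0.1
w_out1, w_out2, b_out = 0.8, 0.6, 0.2
lr = 0.5  # learning rate (assumed)

for step in range(100):
    # Step 1: forward pass
    h1 = sigmoid(w1 * x1 + w2 * x2 + b1)
    h2 = sigmoid(w3 * x1 + w4 * x2 + b2)
    y_hat = sigmoid(w_out1 * h1 + w_out2 * h2 + b_out)

    # Step 2: compute loss
    loss = 0.5 * (y - y_hat) ** 2

    # Step 3: gradient at the output (dL/dz_out)
    d_out = -(y - y_hat) * y_hat * (1 - y_hat)

    # Step 4: backprop to the hidden layer (dL/dz_h1, dL/dz_h2)
    d_h1 = d_out * w_out1 * h1 * (1 - h1)
    d_h2 = d_out * w_out2 * h2 * (1 - h2)

    # Step 5: weight update (gradient descent)
    w_out1 -= lr * d_out * h1
    w_out2 -= lr * d_out * h2
    b_out -= lr * d_out
    w1 -= lr * d_h1 * x1
    w2 -= lr * d_h1 * x2
    b1 -= lr * d_h1
    w3 -= lr * d_h2 * x1
    w4 -= lr * d_h2 * x2
    b2 -= lr * d_h2

print(round(loss, 6))  # well below the epoch-0 loss of ~0.0348
```

Each pass through the loop is one full cycle of the five moves; the loss shrinks a little every time, which is exactly what the loss curve in the demo visualizes.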

Live 2-Layer Network — Watch Weights Update

Node colour: green = high, red = low. Blue edges = positive weight, red = negative. Dashed node = true label y.

Loss curve: press 100 Steps in the interactive demo to watch the loss fall as the network learns.

Network state (epoch 0)
h₁ = 0.5474, h₂ = 0.6479
ŷ = 0.7363, y = 1.0000
L = 0.034778

Current weight values

w1 (x1→h1): 0.500
w2 (x2→h1): -0.200
w3 (x1→h2): 0.300
w4 (x2→h2): 0.700
b1 (bias h1): 0.100
b2 (bias h2): -0.100
wOut1 (h1→ŷ): 0.800
wOut2 (h2→ŷ): 0.600
bOut: 0.200
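As a sanity check, the epoch-0 numbers above can be reproduced in a few lines, assuming sigmoid activations, the loss L = ½(y − ŷ)², and inputs x₁ = 0.5, x₂ = 0.8 (the inputs are an assumption that happens to match the displayed activations):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2, y = 0.5, 0.8, 1.0  # inputs assumed; label from the demo

# Weights and biases from the table above.
h1 = sigmoid(0.5 * x1 + -0.2 * x2 + 0.1)    # w1, w2, b1
h2 = sigmoid(0.3 * x1 + 0.7 * x2 + -0.1)    # w3, w4, b2
y_hat = sigmoid(0.8 * h1 + 0.6 * h2 + 0.2)  # wOut1, wOut2, bOut
loss = 0.5 * (y - y_hat) ** 2

print(round(h1, 4), round(h2, 4), round(y_hat, 4), round(loss, 6))
# 0.5474 0.6479 0.7363 0.034778
```

Working the forward pass out by hand like this is a good way to convince yourself there is nothing hidden in the demo: every displayed number is just weights, inputs, and sigmoids.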

Vanishing & Exploding Gradients

In very deep networks, the gradient can shrink until learning becomes painfully slow, or grow until training becomes unstable. That is why modern architectures use tools like ReLU, normalization, and residual connections to keep learning healthy.
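A quick way to see the vanishing half of this problem (a sketch, not tied to the demo network): the sigmoid's derivative σ′(z) = σ(z)(1 − σ(z)) is at most 0.25, and backprop multiplies one such factor per layer, so the gradient shrinks geometrically with depth.

```python
# The sigmoid derivative peaks at 0.25 (when its output is 0.5).
# Backprop through a stack of sigmoid layers multiplies one factor
# per layer, so even in the best case the gradient collapses fast.
grad = 1.0
for layer in range(20):
    grad *= 0.25  # best case for sigmoid; real factors are usually smaller

print(grad)  # about 9.1e-13 after only 20 layers
```

ReLU (whose derivative is 1 for positive inputs) and residual connections (which add an identity path for the gradient) are popular precisely because they avoid this relentless multiplication by small numbers.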