Neural Networks · Intermediate

🧠Feedforward Neural Networks

From neurons to layered predictions

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

35 min · Explore at your own pace

Before We Begin

What we are learning today

A digital bucket brigade. Each layer mixes inputs, applies a rule, and passes the result on. Stack enough layers and you capture patterns a straight line could never handle.

How this lesson fits

Inspired by the brain, powered by math. Here we’ll treat neural nets like a story of information flowing through layers, changing just enough each time to become something meaningful.

The big question

How do stacks of numbers and weights turn raw input into a confident prediction?

  • Trace information through a neural network in clear, simple language
  • Explain why activations and gradients matter for learning
  • Connect specialized architectures to images and perception tasks

Why You Should Care

Neural nets sit at the heart of modern AI. Understanding the forward flow makes training, vision, and language models feel approachable.

Where this is used today

  • Simple digit recognition (MNIST)
  • Approximating complex functions
  • Control systems in simple robots

Think of it like this

Like a rumor passing through a crowd. It changes slightly at each person, and by the end, it might reveal a clearer story.

Easy mistake to make

Taking the brain metaphor literally. Neural nets are inspired by brains but remain simplified math machines, not realistic brain simulations.

By the end, you should be able to:

  • Identify inputs, hidden layers, weights, and outputs
  • Explain why activation functions make networks more expressive
  • Trace a simple forward pass through the network

Think about this first

Why might stacking several simple calculations beat one single straight-line rule?

Words we will keep using

neuron · layer · weight · bias · activation

Feedforward Neural Networks

A feedforward neural network is a bucket brigade of information. Each layer takes the data, mixes it up, transforms it, and hands it to the next layer. If you understand this forward flow, you understand the skeleton of deep learning.

h^{(l)} = \sigma\!\left(W^{(l)}\, h^{(l-1)} + b^{(l)}\right)

Why non-linearity?
If you don't add a non-linear activation, the whole network collapses into a single straight-line rule. No matter how deep you make it, it can't learn curves.

Why layers help
Extra layers let the network build up complexity step by step—finding edges, then shapes, then objects.

What gets learned
The network doesn't change the math. It changes the weights—tuning the connections until the output looks right.
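The layer rule h = σ(Wh + b) can be sketched directly in code. This is a minimal illustration with made-up weights—the 2 → 3 → 1 shape and every number here are hypothetical, not taken from the lesson's widget:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers, activation):
    """One forward pass: every layer computes activation(W @ h + b)."""
    h = x
    for W, b in layers:
        h = activation(W @ h + b)
    return h

# Hypothetical 2 -> 3 -> 1 network with made-up weights
layers = [
    (np.array([[0.5, -0.2],
               [0.1,  0.8],
               [-0.3, 0.4]]), np.array([0.0, 0.1, -0.1])),
    (np.array([[1.0, -0.5, 0.2]]), np.array([0.05])),
]

y = forward(np.array([0.8, -0.3]), layers, relu)
print(y)  # approximately [0.51]
```

Notice that the loop body never changes—only the weight matrices and biases differ from layer to layer, which is exactly what training adjusts.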

Activation Functions

relu

sigmoid

tanh

gelu

linear

ReLU: max(0, x). Used in: ResNets, most modern CNNs

Different activations change how flexible the network can be. Modern language models often use GELU because it behaves smoothly and trains well at scale.
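As a rough sketch, the activations above can be compared numerically. The GELU here uses the common tanh approximation, and the sample points are arbitrary:

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gelu(x):
    # tanh approximation of GELU, used by many implementations
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.3f}  sigmoid={sigmoid(x):+.3f}  "
          f"tanh={math.tanh(x):+.3f}  gelu={gelu(x):+.3f}")
```

Printing a few points makes the difference concrete: ReLU cuts negatives to exactly zero, while GELU lets small negative values through smoothly.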

Interactive Forward Pass

Node colour: Green = active (firing), Red = inactive (suppressed). Values shown inside.

Input values

Architecture

2 → 4 → 3 → 1 — Activation: relu

Output: 0.495
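A forward pass with the widget's 2 → 4 → 3 → 1 shape might look like the sketch below. The widget's actual weights aren't shown, so random weights stand in, and the printed output will not match 0.495:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Same 2 -> 4 -> 3 -> 1 shape as the widget, but with random
# placeholder weights, so the output differs from the widget's 0.495.
sizes = [2, 4, 3, 1]
layers = [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]

h = np.array([0.8, -0.3])        # input values
for W, b in layers[:-1]:
    h = relu(W @ h + b)          # hidden layers use ReLU
W, b = layers[-1]
out = W @ h + b                  # linear output layer
print(out)
```

Neurons whose pre-activation is negative come out of the ReLU as exactly zero—those are the "red/inactive" nodes in the diagram.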

Decision Boundary

The decision boundary is the line where the network changes its mind. On one side, it says "Yes"; on the other, "No". This is the best place to see why non-linearity matters—try switching to Linear and see how the boundary gets stuck as a straight line.

Linear (no activation)
The model can only draw straight boundaries, no matter how many layers you stack.
ReLU / GELU / Tanh
These activations bend the model away from a straight line, which is why the network can handle richer patterns.
Key insight
Depth alone is not enough. You need depth and non-linearity together.

Layer Computation Trace

This table shows the first hidden layer in slow motion. Each neuron multiplies the inputs by weights, adds them up, adds a bias, and then sends the result through the activation function.

z_j = \sum_i w_{ji}\, x_i + b_j \qquad h_j = \sigma(z_j)

Neuron | w1·x1       | w2·x2       | + bias | = z
h1     | 0.30·0.80   | -0.90·-0.30 |  0.14  |  0.645
h2     | -0.68·0.80  | -0.25·-0.30 | -0.01  | -0.477
h3     | -0.80·0.80  |  0.71·-0.30 | -0.15  | -0.998

The shading shows how much each piece contributes. This is the arithmetic hidden inside the network diagram above.
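Using the rounded weights shown in the table, the whole trace can be reproduced in a few lines; because the displayed weights are rounded, the computed z values land within rounding distance of the table's figures:

```python
import numpy as np

x = np.array([0.80, -0.30])          # input values from the widget
W = np.array([[0.30, -0.90],         # rounded weights from the table
              [-0.68, -0.25],
              [-0.80,  0.71]])
b = np.array([0.14, -0.01, -0.15])   # biases from the table

z = W @ x + b            # weighted sum plus bias, one entry per neuron
h = np.maximum(0.0, z)   # ReLU: negative z's are suppressed

for name, zj, hj in zip(("h1", "h2", "h3"), z, h):
    print(f"{name}: z = {zj:+.3f}, h = {hj:+.3f}")
```

Only h1 survives the ReLU; h2 and h3 come out as zero, matching the red (suppressed) nodes in the diagram.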