Machine Learning · Beginner

📈Regression & Classification

Predicting numbers and choosing categories

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

30 min · Explore at your own pace

Before We Begin

What we are learning today

Two pillars of prediction: “How much?” (regression) and “What is it?” (classification). One predicts a number, the other chooses a label.

How this lesson fits

Here’s where the magic shows up: we stop hand-writing every rule and let data teach the model. Think of it as coaching instead of scripting.

The big question

How can a machine spot patterns from examples the way a student learns from practice problems?

  • Tell the difference between predicting numbers and discovering patterns
  • Interpret simple models and talk through their outputs in plain English
  • Compare the strengths and tradeoffs of common ML methods

Why You Should Care

These are your first real models. Nail these and neural networks will feel far less mysterious.

Where this is used today

  • Predicting house prices (Regression)
  • Diagnosing benign vs malignant tumors (Classification)
  • Forecasting stock trends (Regression)

Think of it like this

Regression is estimating the price of a house from its size. Classification is deciding if the object is a house, a boat, or a tree.

Easy mistake to make

Logistic regression is actually a *classification* model. The name is misleading; it estimates class probabilities.

By the end, you should be able to say:

  • Tell the difference between regression and classification
  • Interpret a fitted line and a decision boundary
  • Relate model outputs to common evaluation metrics

Think about this first

Which is regression and which is classification: predicting an exact exam score or predicting pass/fail?

Words we will keep using

regression, classification, decision boundary, error, probability

Linear Regression

Linear regression asks: "What is the number?" (e.g., price, temperature). It tries to draw a straight line that passes as close as possible to all your data points.

\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2
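As a quick sketch of that definition (the data points and the two candidate lines here are made-up values), you can compute MSE for any line y = m·x + b directly:

```python
# Hypothetical sketch: measuring how well a line y = m*x + b fits some points.
def mse(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b over the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]   # roughly y = 2x, with a little noise

print(mse(2.0, 0.0, xs, ys))  # a line close to the data -> small error
print(mse(0.0, 5.0, xs, ys))  # a flat guess -> much larger error
```

The "best" line is simply the (m, b) pair that makes this number as small as possible.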

To find the best line, the computer plays a game of "hot or cold." It nudges the line slightly, checks if the error gets smaller, and repeats. This process is called gradient descent.

m \leftarrow m - \alpha \frac{\partial \text{MSE}}{\partial m}, \quad b \leftarrow b - \alpha \frac{\partial \text{MSE}}{\partial b}
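The update rule above can be sketched as a short loop. This is an illustrative implementation on made-up data; the learning rate alpha and the step count are arbitrary choices, not tuned values:

```python
# Sketch of gradient descent for y = m*x + b on MSE loss (illustrative data).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]
m, b, alpha, n = 0.0, 0.0, 0.01, len(xs)

for step in range(2000):
    # Partial derivatives of MSE with respect to m and b.
    dm = (2 / n) * sum((m * x + b - y) * x for x, y in zip(xs, ys))
    db = (2 / n) * sum((m * x + b - y) for x, y in zip(xs, ys))
    m -= alpha * dm   # nudge the slope downhill
    b -= alpha * db   # nudge the intercept downhill

print(f"m = {m:.2f}, b = {b:.2f}")  # the slope should settle near 2
```

Each pass is one round of "hot or cold": measure which direction reduces the error, then take a small step that way.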

Gradient Descent on MSE Loss


Try a large learning rate (α ≈ 0.04) and watch the loss. Too large → oscillation; too small → slow convergence.

Logistic Regression (Classification)

Logistic regression asks: "Yes or No?" (e.g., Spam or Not Spam). Instead of a raw number, it gives you a probability between 0% and 100%.

\hat{p} = \sigma(w_1 x_1 + w_2 x_2 + b) = \frac{1}{1+e^{-(w_1 x_1 + w_2 x_2 + b)}}
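In code, that formula is just a weighted sum pushed through the sigmoid. A minimal sketch, where the weights w1, w2 and bias b are made-up values chosen purely for illustration:

```python
import math

# Hypothetical sketch: logistic regression turns a weighted sum into a probability.
def predict_proba(x1, x2, w1=1.5, w2=-2.0, b=0.5):
    z = w1 * x1 + w2 * x2 + b      # linear score, same form as linear regression
    return 1 / (1 + math.exp(-z))  # sigmoid squashes it into (0, 1)

p = predict_proba(2.0, 1.0)
label = "positive" if p >= 0.5 else "negative"
print(f"p = {p:.2f} -> {label}")
```

Note the two-stage design: the model always produces a probability first, and the yes/no label only appears once you compare that probability to a threshold (here 0.5).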

Logistic Regression — Decision Boundary

Drag the sliders and watch the decision boundary move. That boundary is the place where the model is exactly undecided, with \hat{p} = 0.5.

  • Blue points belong to one class, red points to the other.
  • The background color shows what the model currently believes.
  • The live score changes as soon as your boundary moves.

Notice the limitation: logistic regression can only draw a straight dividing line. If the pattern is curved, we need a more flexible model.

Model Evaluation Metrics

Accuracy is a trap. If 99% of emails are safe, a model that says "Safe" every time is 99% accurate but 100% useless at catching spam. We need better scoreboards.

The four cells

TP (True Positive) — correctly predicted positive

FP (False Positive) — predicted positive, actually negative (Type I error)

FN (False Negative) — predicted negative, actually positive (Type II error)

TN (True Negative) — correctly predicted negative

Accuracy = (TP+TN) / N. Fine when the classes are balanced, but risky when one class is rare.

Precision = TP / (TP+FP). When you say “positive,” how often are you right?

Recall (TPR) = TP / (TP+FN). Of the real positives, how many did you actually catch?

F1 combines precision and recall into one score when both matter.

ROC-AUC measures ranking quality across many thresholds, not just one fixed cutoff.
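The definitions above are small enough to compute by hand. As a sketch, here they are on the confusion-matrix counts used in the demo below (TP = 25, FP = 17, FN = 15, TN = 23):

```python
# Sketch: the four core metrics from raw confusion-matrix counts.
TP, FP, FN, TN = 25, 17, 15, 23
N = TP + FP + FN + TN

accuracy  = (TP + TN) / N              # all correct calls over everything
precision = TP / (TP + FP)             # when we say "positive", how often right?
recall    = TP / (TP + FN)             # of the real positives, how many caught?
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Notice that accuracy, precision, and recall all land around 60% here, but they would diverge sharply if one class were rare.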

Drag threshold — watch the orange dot move along the curve


Confusion Matrix

             Pred +     Pred −
Actual +     TP = 25    FN = 15
Actual −     FP = 17    TN = 23

Live metrics at t = 0.50

  • Accuracy: 60%
  • Precision: 60%
  • Recall: 63%
  • F1 Score: 61%

When classes are imbalanced

If one class is rare, accuracy can hide failure. In those cases, precision, recall, F1, and PR-AUC usually tell a more honest story.

Threshold trade-off

If you lower the threshold, the model says “positive” more often. That usually helps recall but hurts precision. You are trading one kind of mistake against another.
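You can watch that trade-off happen by sweeping a threshold over model scores. A small sketch, where the scores and true labels are a made-up toy example:

```python
# Sketch: how moving the threshold trades precision against recall.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]  # model confidences
labels = [1,    1,    0,    1,    1,    0,    0,    0]     # true classes

def precision_recall(threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

for t in (0.8, 0.5, 0.2):
    prec, rec = precision_recall(t)
    print(f"t={t}: precision={prec:.2f} recall={rec:.2f}")
```

A strict threshold (0.8) is precise but misses positives; a loose one (0.2) catches every positive but lets more false alarms through.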

Beyond binary classification

Different tasks need different scoreboards. There is no single metric that is best for every problem.

Regression Evaluation Metrics

When the output is a number, the question becomes: how far off were we? That is why regression uses error-based metrics instead of a confusion matrix.

\text{MAE} = \frac{1}{n}\sum|y_i - \hat{y}_i|

Mean Absolute Error — robust to outliers, interpretable in original units

\text{MSE} = \frac{1}{n}\sum(y_i - \hat{y}_i)^2

Mean Squared Error — penalises large errors heavily; used as training loss

\text{RMSE} = \sqrt{\text{MSE}}

Root MSE — same units as target, more interpretable than MSE

R^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}}

R² (coefficient of determination) — proportion of variance explained. 1.0 = perfect, 0 = no better than predicting the mean
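All four metrics fall out of a few sums. A sketch on a tiny made-up dataset, where every prediction is off by exactly 0.5:

```python
import math

# Sketch: MAE, MSE, RMSE, and R² on a small illustrative dataset.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]
n = len(y_true)

mae  = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
mse  = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
rmse = math.sqrt(mse)

mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # error of mean-only model
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae} MSE={mse} RMSE={rmse} R²={r2}")
```

Here MAE and RMSE are both 0.5 because every error has the same size; when a few errors are much larger than the rest, RMSE grows faster than MAE, which is exactly the "penalises large errors" behaviour noted above.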

Linear vs Logistic — Key Differences

Linear Regression: use this when the answer should be a number, such as a house price, a height, or a temperature.
Logistic Regression: use this when the answer should be a class or label, usually by predicting a probability first.