Foundations · Beginner

đŸŽČProbability & Distributions

How AI talks about uncertainty

Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.

25 min · Explore at your own pace

Before We Begin

What we are learning today

The world is messy and unpredictable, and that’s okay. AI lives in the land of “maybes,” and probability is its language for talking about how confident we are.

How this lesson fits

Welcome to the bedrock of AI. Think of this module as the class warm-up where we learn how computers follow rules, deal with uncertainty, and search for answers—exactly the skills we’ll lean on all year.

The big question

How can something as ordinary as metal and silicon learn to follow rules, handle uncertainty, and still find its way through a messy world?

  • Trace a computation step by step and explain the “why” out loud
  • Reason about chance with simple, friendly distributions
  • Describe how search algorithms pick a smart path forward

Why You Should Care

Probability is the safety net for every prediction we’ll make—from grades to weather to model confidence. It explains why a model can be useful even when it’s not perfect.

Where this is used today

  • ✓Weather forecasting (70% chance of rain)
  • ✓Medical diagnosis (accuracy of test vs probability of disease)
  • ✓Spam filtering (Naive Bayes models)

Think of it like this

Think of checking the sky before school. You don’t know if it will rain, but the clouds give you a “score” that helps you decide whether to pack an umbrella.

Easy mistake to make

Probability doesn’t predict a single outcome with certainty. It describes patterns across many possibilities or repeated trials.

By the end, you should be able to say:

  • Explain probability as a number between 0 and 1
  • Compare Bernoulli, binomial, and normal distributions
  • Use Bayes’ theorem as a way to update beliefs with new evidence

Think about this first

Why is “70% chance of rain” more informative than just “it will rain” or “it won’t”? How would you plan your day differently?

Words we will keep using

probability · distribution · mean · variance · evidence

The Language of Uncertainty

Life is random. Models almost never know the future for sure. Instead of saying "It will rain," they say "There is a 92% chance of rain." Probability is the tool we use to measure that uncertainty.

  • Event: The thing we are watching. A coin landing on heads, or an email being spam.
  • Distribution: The shape of luck. It shows every possible outcome and how likely it is.
  • Law of Large Numbers: Luck is wild in the short run but predictable in the long run.

Part 1: The Coin Flip (Bernoulli)

đŸȘ™ Coin Flip Simulator


This is the simplest random experiment in the world. Flip a coin. One trial, two choices: Success or Failure. In math, we call this a Bernoulli trial.

P(X=1) = p, \quad P(X=0) = 1-p

Mean: E[X] = p   Variance: Var(X) = p(1−p). Don't worry about the formulas yet. Just see that even a random coin flip has exact rules governing it.

Click Flip ×100. See how the bars jump around? Now keep clicking. The more you flip, the closer you get to 50/50. That is the Law of Large Numbers in action.
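If you'd like to see the Law of Large Numbers without clicking, here is a minimal sketch in Python. The helper name `flip_many` is made up for this example; it just simulates Bernoulli trials with the standard-library `random` module.

```python
import random

def flip_many(n, p=0.5, seed=0):
    """Simulate n Bernoulli trials and return the observed fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# With more flips, the observed frequency settles down toward p = 0.5.
for n in (10, 100, 10_000):
    print(n, flip_many(n))
```

Try changing the seed: the short runs jump around a lot, but the 10,000-flip run barely moves. That is the Law of Large Numbers in code.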

Part 2: The Bell Curve (Normal / Gaussian)

🔔 Normal (Bell Curve)

The Bell Curve (Normal distribution) is everywhere. Height, shoe size, test scores—whenever you add up lots of little random factors, you get this shape.

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
  • ÎŒ moves the center left or right
  • σ controls whether the curve is tight or spread out
  • About 68% of values fall within ±1σ of the mean
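You can check the 68% rule yourself by sampling. This sketch uses illustrative values (ÎŒ = 100, σ = 15, roughly test-score-like numbers; they are not from the lesson) and counts how many samples land within one standard deviation of the mean.

```python
import random

rng = random.Random(1)
mu, sigma = 100.0, 15.0  # illustrative center and spread

# Draw many samples from a normal distribution.
samples = [rng.gauss(mu, sigma) for _ in range(100_000)]

# Fraction of samples within ±1σ of the mean.
within_1_sigma = sum(1 for x in samples if abs(x - mu) <= sigma) / len(samples)
print(f"within ±1σ: {within_1_sigma:.3f}")  # should land near 0.68
```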

Part 3: Counting Successes (Binomial)

📊 Binomial Distribution

Mean = np  |  Std = √(np(1−p))   (e.g. 5.00 and 1.58 for n = 10, p = 0.5)

Now repeat that simple yes/no experiment n times. Instead of asking what happens once, we ask: how many successes do we get in total? That count follows a binomial distribution.

P(K=k) = \binom{n}{k} p^k (1-p)^{n-k}

Set p = 0.5 and make n bigger. You will see the bars begin to look more and more like a bell curve.

Real uses: How many emails get opened, how many basketball shots go in, or how many patients respond to a treatment.
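The formula above is easy to compute directly. Here is a small sketch using the standard-library `math.comb`; the helper name `binomial_pmf` is invented for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(K = k): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Sanity checks: probabilities sum to 1, and the mean is np = 5.
total = sum(pmf)
mean = sum(k * q for k, q in zip(range(n + 1), pmf))
print(total, mean)
```

Printing `pmf` for larger n (try n = 50) shows the values bunching into the bell shape the slider demonstrates.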

Part 4: Bayes' Theorem — Updating Beliefs

🔄 Bayes Theorem Calculator

P(H|E) = P(E|H)·P(H) / [P(E|H)·P(H) + P(E|ŹH)·P(ŹH)]

Bayes' Theorem is the math of changing your mind. It tells you exactly how to update your beliefs when you see new evidence.

P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}
  • P(H) is your starting belief
  • P(E|H) asks how likely the evidence would be if the hypothesis were true
  • P(E) is the overall chance of seeing that evidence
  • P(H|E) is your new belief after taking the evidence into account
Classic example: A disease is rare, but the test is fairly good. Even then, a positive result may still mean the disease is unlikely, because false positives add up. P(disease | +) ≈ 4.3% in this example.
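The lesson doesn't state which numbers the calculator uses, but the update itself is a one-liner. This sketch uses illustrative values (0.5% prevalence, 90% sensitivity, 10% false-positive rate), one combination that reproduces the ≈ 4.3% figure; the name `bayes_update` is made up for this example.

```python
def bayes_update(prior, sensitivity, false_positive_rate):
    """P(H | positive test) via Bayes' theorem.

    prior               = P(H), belief before the test
    sensitivity         = P(E | H), chance the test is positive if sick
    false_positive_rate = P(E | not H), chance of a positive if healthy
    """
    # P(E): total probability of a positive result, sick or not.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Illustrative numbers: rare disease, fairly good test.
posterior = bayes_update(prior=0.005, sensitivity=0.90, false_positive_rate=0.10)
print(f"{posterior:.1%}")  # ≈ 4.3%
```

Notice that the false positives (10% of the healthy 99.5%) swamp the true positives (90% of the sick 0.5%), which is why the posterior stays low even after a positive test.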

This idea shows up everywhere in AI, from spam filters to medical decision systems.

Key Takeaways

  • Probability helps you talk about uncertainty instead of pretending every answer is exact.
  • A distribution describes the full range of outcomes, not just one guess.
  • The bell curve appears naturally when many small factors combine.
  • These ideas are basic tools for later topics such as HMMs, classifiers, and neural network outputs.