📝 Embeddings & Word2Vec
How words become meaningful vectors
Take your time with this one. The interactive parts are here to help you test the idea, not rush through it.
Pause and experiment as you go.
Before We Begin
What we are learning today
Words become numbers. In this space, “King” - “Man” + “Woman” lands near “Queen,” and synonyms become neighbors.
How this lesson fits
These lessons power the language revolution. We turn words into math and teach models to track context and meaning as they read.
The big question
How can a model capture word meaning, hold onto context, and generate fluent language one token at a time?
Why You Should Care
Embeddings are the bridge between language and geometry. They make modern language tech possible.
Where this is used today
- ✓ Semantic search (finding "dog" when you search "puppy")
- ✓ Recommendation systems
- ✓ Language translation alignment
Think of it like this
It’s a map of meaning. Synonyms share a neighborhood; opposites live across town. Distance signals relatedness.
Easy mistake to make
Embeddings reflect patterns in training text, biases included—they’re not perfect dictionaries.
By the end, you should be able to:
- Explain why words must be converted into numbers
- Describe what it means for similar words to be close in vector space
- Summarize the idea behind skip-gram training
Think about this first
If “king” and “queen” are related, how might a computer discover that from text alone?
Words we will keep using
Vector Space Semantics
Imagine if words were places on a map. "King" and "Queen" would live next door. "Apple" and "Banana" would be down the street. This is what embeddings do: they turn meaning into geometry.
king - man + woman ≈ queen
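As a toy illustration, you can check this analogy with made-up low-dimensional vectors (the numbers below are invented for the example; real Word2Vec embeddings have hundreds of dimensions):

```python
import numpy as np

# Toy 3-D vectors, invented for illustration only
vecs = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.9, 0.0]),
    "woman": np.array([0.5, 0.1, 0.0]),
    "queen": np.array([0.9, 0.0, 0.1]),
    "apple": np.array([0.0, 0.2, 0.9]),
}

def nearest(v, exclude=()):
    """Return the word whose vector is most cosine-similar to v."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], v))

target = vecs["king"] - vecs["man"] + vecs["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> queen
```

With real embeddings the arithmetic is the same; the query words themselves are usually excluded from the answer, just as here.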
In the explorer below, you are looking at Word2Vec embeddings that were originally far larger. We squash them down to 3D so you can move around them and notice that language begins to form neighborhoods.
[Interactive explorer: Word2Vec 10K vocabulary, 3D PCA projection]
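The "squashing down to 3D" is a PCA projection: keep only the three directions along which the embeddings vary most. A minimal sketch using NumPy's SVD (the random matrix here stands in for real 300-dimensional Word2Vec vectors):

```python
import numpy as np

def pca_3d(X):
    """Project rows of X (n_words x n_dims) onto the top 3 principal components."""
    Xc = X - X.mean(axis=0)                          # center each dimension
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:3].T                             # coordinates in top-3 basis

rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(100, 300))        # stand-in for real vectors
coords = pca_3d(fake_embeddings)
print(coords.shape)  # (100, 3)
```

The projection throws information away, so clusters in 3D are suggestive, not exact; neighbors in the full space occasionally land far apart on screen.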
How it works: Skip-gram
The training is surprisingly simple: Pick a word, and ask the model to guess its neighbors. Do this billions of times. Words that appear in similar contexts will naturally drift closer together in vector space.
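The "pick a word, guess its neighbors" step boils down to generating (center, context) pairs from a sliding window. A sketch of that pair extraction:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs: for each word, every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "king", "rules"], window=1))
# [('the', 'king'), ('king', 'the'), ('king', 'rules'), ('rules', 'king')]
```

Each pair becomes one training example: given the center word, the model is rewarded for assigning high probability to the context word.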
Training Objective
The formal goal says: given the center word $w_t$, make the nearby context words $w_{t+j}$ as predictable as possible, averaged over a corpus of $T$ tokens with window size $c$:

$$\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p\left(w_{t+j} \mid w_t\right)$$

Where $p(w_O \mid w_I)$ is defined by the softmax of the dot product:

$$p\left(w_O \mid w_I\right) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_w}^{\top} v_{w_I}\right)}$$
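The softmax turns the dot products between the center word's vector and every context vector into a probability distribution over the vocabulary. A minimal sketch with a tiny made-up vocabulary:

```python
import numpy as np

def softmax_probs(center_vec, all_context_vecs):
    """p(context | center) for every word: softmax over dot products."""
    scores = all_context_vecs @ center_vec  # one score per vocabulary word
    scores = scores - scores.max()          # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()                  # sums to 1 over the vocabulary

rng = np.random.default_rng(1)
V = rng.normal(size=(5, 4))      # 5-word toy vocabulary, 4-d context vectors
center = rng.normal(size=4)      # one center word's vector
probs = softmax_probs(center, V)
print(probs.sum())  # 1.0 (up to float rounding)
```

In practice the sum in the denominator runs over the entire vocabulary, which is expensive; real Word2Vec implementations approximate it with negative sampling or hierarchical softmax.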
Cosine Similarity
Cosine similarity measures the angle between two word vectors: +1 when the arrows point the same way (related), near 0 when they are roughly perpendicular (unrelated), and -1 when they point in opposite directions. It's that simple.
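The measure itself is a one-liner, the dot product of the two vectors divided by the product of their lengths:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta): +1 same direction, 0 orthogonal, -1 opposite."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 2.0]), np.array([2.0, 4.0])))  # 1.0 (parallel)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0 (orthogonal)
```

Note that cosine similarity ignores vector length entirely, which is why it, rather than raw Euclidean distance, is the standard relatedness measure for embeddings.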
Try searching for “good” in the explorer above and inspect the nearest neighbors. That is where the abstract idea suddenly starts to feel real.