"What Are Embeddings? How AI Turns Meaning Into Numbers It Can Compare"

When you search your notes for "morning routine," a keyword search finds only notes that contain those exact words. But what about the note titled "How I start my day" — or "First-hour habits" — or "AM ritual"? Same idea, different words. A keyword search misses all of them. An embedding is a list of numbers that captures the meaning of a piece of text, so that pieces with similar meanings end up close together in a mathematical space — and can be found even when the words don't match.

That one idea powers semantic search, AI question-answering over documents, and a growing number of features in AI-first tools. Let's build up to it from scratch.

Start with what a computer can't do

Computers are very good at comparing identical things — the word "cat" either matches or it doesn't. What they're bad at, by default, is comparing ideas. "Cat," "feline," and "kitten" mean related things to you, but to a character-by-character comparison they're completely unrelated strings.

The problem isn't storage. It's representation. A word stored as the characters c-a-t tells the computer nothing about its relationship to anything else. For a computer to reason about meaning, you need to encode meaning in a form it can operate on.
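To see how little the characters themselves reveal, here is a minimal sketch using Python's standard difflib module. The helper name spelling_similarity is made up for this illustration; it scores two strings purely on shared runs of characters.

```python
from difflib import SequenceMatcher


def spelling_similarity(a: str, b: str) -> float:
    """Return a 0-to-1 score based only on shared runs of characters."""
    return SequenceMatcher(None, a, b).ratio()


# Exact matching: related words simply fail to match.
print("cat" == "feline")  # False

# Character-level similarity rewards spelling, not meaning.
for other in ["car", "kitten", "feline"]:
    score = spelling_similarity("cat", other)
    print(f"cat vs {other}: {score:.2f}")
```

By spelling alone, "cat" looks closer to "car" than to "kitten," and it shares nothing at all with "feline." The strings carry no information about what the words mean.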

Numbers are a form a computer can operate on.

The key insight: meaning as location

Imagine placing every word in a giant space, not a physical room but a mathematical one with hundreds of dimensions. Each word gets a location in that space: a list of coordinates, one number per dimension. Words used in similar contexts, such as "cat," "dog," "kitten," and "puppy," end up clustered near each other. Words used in very different contexts end up far apart.

That list of coordinates is called an embedding (or sometimes a "vector representation"). The name refers to the fact that meaning has been embedded into a geometric space. Two pieces of text with similar meaning will have similar coordinate lists, and measuring the distance between two coordinate lists takes only a little arithmetic, which is exactly the kind of work a computer does quickly.

Recap: An embedding is a list of numbers (coordinates) representing a piece of text's meaning. Similar meanings → similar coordinates → small distance apart.
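As a rough illustration of "small distance apart," the sketch below compares three made-up, three-dimensional coordinate lists. The numbers are invented for this example (real embeddings have hundreds or thousands of dimensions and come from a trained model), and cosine similarity and Euclidean distance are two common ways to compare vectors, not necessarily the ones a given tool uses.

```python
import math


def cosine_similarity(a, b):
    """Closer to 1.0 means the two vectors point in nearly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical toy embeddings, hand-written for illustration only.
cat = [0.9, 0.8, 0.1]
kitten = [0.8, 0.9, 0.2]
invoice = [0.1, 0.2, 0.9]

print("cat vs kitten :", cosine_similarity(cat, kitten), euclidean_distance(cat, kitten))
print("cat vs invoice:", cosine_similarity(cat, invoice), euclidean_distance(cat, invoice))
```

With these toy values, "cat" and "kitten" score a higher similarity and a smaller distance than "cat" and "invoice."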

A worked example

Suppose you have three notes:

"Review the quarterly budget before the board meeting."
"Check the financials — board presentation is Thursday."
"Morning run: 5km, felt great."

An embedding model turns each note into a list of, say, 1,536 numbers. You don't have to read those numbers — just know that notes 1 and 2 will produce lists that are close to each other (both are about reviewing financial documents ahead of a meeting), and note 3 will produce a list that's far from both (running has nothing to do with budgets or board meetings).

When you ask, "What do I need to prepare for the board meeting?", the same embedding model turns your question into a coordinate list of its own, and the search system finds the notes whose coordinates are closest to it. Notes 1 and 2 surface. Note 3 doesn't.

No tags. No keywords. Just geometry.
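The worked example can be sketched in a few lines. Everything numeric here is an assumption: the four-dimensional vectors are hand-written stand-ins for what an embedding model would return, and rank_notes is a hypothetical helper name. Only the ranking logic is the point.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real model output (illustration only).
notes = {
    "Review the quarterly budget before the board meeting.": [0.9, 0.8, 0.0, 0.1],
    "Check the financials; board presentation is Thursday.": [0.8, 0.9, 0.0, 0.2],
    "Morning run: 5km, felt great.": [0.0, 0.0, 0.9, 0.6],
}

query_text = "What do I need to prepare for the board meeting?"
query_vector = [0.5, 0.9, 0.0, 0.1]  # also invented for this sketch


def rank_notes(query_vec, note_vectors):
    """Return (score, text) pairs, most similar note first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for text, vec in note_vectors.items()
    ]
    return sorted(scored, reverse=True)


ranked = rank_notes(query_vector, notes)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

With these toy vectors, the two board-meeting notes rank at the top and the running note falls to the bottom, even though the code never looks at the words themselves.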

How embeddings get built

Embeddings aren't hand-designed. They come from training a model on enormous amounts of text and letting the model learn, through billions of adjustments, which words and sentences tend to appear in similar contexts. Words that appear near the same neighbors end up near each other in the embedding space.

The result is a model that has encoded a rough geometry of meaning. You feed in any piece of text and it returns that text's position in the meaning-space. The model doesn't "understand" text the way you do — but it has learned a map of how language tends to cluster, and that map is good enough to be practically useful.
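Real embedding models learn dense vectors by training on huge amounts of text, which is far beyond a short example. The "same neighbors, similar position" idea can still be sketched with simple counting. The toy below is a deliberate simplification and not how production models are built: it records which words share a sentence with each word in a tiny invented corpus, then compares those count lists.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus; note that "cat" and "kitten" never share a sentence.
corpus = [
    "the cat chased the mouse",
    "the kitten chased the mouse",
    "the cat drank milk",
    "the kitten drank milk",
    "the budget needs review",
    "the invoice needs review",
]
STOPWORDS = {"the"}  # ignore filler words that appear everywhere


def build_context_counts(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOPWORDS]
        for i, word in enumerate(words):
            for j, neighbor in enumerate(words):
                if i != j:
                    counts[word][neighbor] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


contexts = build_context_counts(corpus)
print("cat vs kitten:", cosine_similarity(contexts["cat"], contexts["kitten"]))
print("cat vs budget:", cosine_similarity(contexts["cat"], contexts["budget"]))
```

In this corpus "cat" and "kitten" keep the same company (chased, mouse, drank, milk), so their count vectors match, while "cat" and "budget" share no neighbors. Trained models capture the same kind of signal, but as learned dense coordinates instead of raw counts.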

Why this matters for how you work

Embeddings show up wherever you want to find things by meaning rather than exact words:

Semantic search: finding notes, documents, or messages that match the idea of your query, even when the words are different.
Ask-your-notes: answering a question from your own documents. The AI first uses embeddings to find the relevant passages, then reads only those passages to form an answer. The embedding step is what makes this fast; without it, the model would have to read everything.
Suggested connections: surfacing notes that are conceptually related to the one you're currently reading, even if they share no keywords.
Duplicate detection: finding notes that say the same thing in different words, so you can consolidate instead of accumulate.

In JustJot.ai, the semantic search feature uses exactly this: your notes are stored as embeddings, and when you search, your query is turned into an embedding and compared against them. That's how it finds "morning ritual" when you type "AM routine."
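Duplicate detection, from the list above, comes down to flagging pairs of notes whose embeddings are nearly identical. The sketch below is one hedged way to do that, not a description of how any particular product implements it: the vectors are invented, find_near_duplicates is a hypothetical helper, and the 0.95 threshold is an arbitrary assumption that would need tuning against real data.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(note_vectors, threshold=0.95):
    """Return (title_a, title_b, score) for every pair at or above the threshold."""
    pairs = []
    for (title_a, vec_a), (title_b, vec_b) in combinations(note_vectors.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((title_a, title_b, score))
    return pairs


# Made-up embeddings for three note titles (illustration only).
note_vectors = {
    "Morning routine": [0.9, 0.1, 0.3],
    "How I start my day": [0.85, 0.15, 0.35],
    "Quarterly budget": [0.1, 0.9, 0.2],
}

duplicates = find_near_duplicates(note_vectors)
for title_a, title_b, score in duplicates:
    print(f"{title_a!r} and {title_b!r} look like duplicates ({score:.2f})")
```

Comparing every pair is fine for a handful of notes but grows quadratically; larger collections typically rely on a vector index that finds nearest neighbors without checking every pair.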

Try this

The next time you search for something in a notes app, try two versions of the query: one using the exact words you think you used ("budget review"), and one using a related phrase you wouldn't have tagged it with ("preparing for a meeting"). If you're using a tool with semantic search, the second query should still find the right note. If it does, you've just seen embeddings at work.

Once you know that "meaning as location" is the mechanism, a lot of what AI tools do starts to make geometric sense.