LLMs contain a LOT of parameters. But what’s a parameter?


When a model is trained, each word in its vocabulary is assigned a numerical value that captures the meaning of that word in relation to all the other words, based on how the word appears in countless examples across the model’s training data.

Each word gets replaced by a kind of code?

Yeah. But there’s a bit more to it. The numerical value—the embedding—that represents each word is in fact a list of numbers, with each number in the list representing a different facet of meaning that the model has extracted from its training data. The length of this list of numbers is another thing that LLM designers can specify before an LLM is trained. A common size is 4,096.

Every word inside an LLM is represented by a list of 4,096 numbers?  

Yup, that’s an embedding. And each of those numbers is tweaked during training. An LLM with embeddings that are 4,096 numbers long is said to have 4,096 dimensions.
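To make that concrete, here is a minimal sketch of an embedding table: every word maps to a list of numbers, and every one of those numbers counts as a parameter. The four-word vocabulary, the random starting values, and the 50,000-word vocabulary size are assumptions made up for illustration; only the 4,096 figure comes from the text above.

```python
import random

# Hypothetical toy vocabulary; real models have tens of thousands of entries.
VOCAB = ["table", "chair", "astronaut", "moon"]
EMBEDDING_DIM = 4096  # the example embedding length mentioned in the text


def make_embedding_table(vocab, dim, seed=0):
    """Give every word a list of `dim` small random numbers (untrained)."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for word in vocab}


def count_embedding_parameters(vocab_size, dim):
    """Each number in each word's list is one value that training can tweak."""
    return vocab_size * dim


table = make_embedding_table(VOCAB, EMBEDDING_DIM)
print(len(table["chair"]))                                     # 4096
print(count_embedding_parameters(len(VOCAB), EMBEDDING_DIM))   # 16384
# Assumed vocabulary size of 50,000, purely for scale:
print(count_embedding_parameters(50_000, EMBEDDING_DIM))       # 204800000
```

Before training, the numbers are meaningless noise. Training is what nudges them until they capture something about how each word is used.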

Why 4,096?

It might look like a strange number. But LLMs (like anything that runs on a computer chip) work best with powers of two—2, 4, 8, 16, 32, 64, and so on. LLM engineers have found that 4,096 is a power of two that hits a sweet spot between capability and efficiency. Models with fewer dimensions are less capable; models with more dimensions are too expensive or slow to train and run. 

Using more numbers allows the LLM to capture very fine-grained information about how a word is used in many different contexts, what subtle connotations it might have, how it relates to other words, and so on.
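The power-of-two claim is easy to check for yourself. The helper below is a hypothetical name, and the bit trick it uses is a standard one rather than anything specific to LLMs.

```python
def is_power_of_two(n):
    """A positive integer is a power of two if exactly one bit is set."""
    return n > 0 and (n & (n - 1)) == 0


print(is_power_of_two(4096))   # True
print(4096 == 2 ** 12)         # True
print(is_power_of_two(4000))   # False
```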

In February 2025, OpenAI released GPT-4.5, the firm’s largest LLM yet (some estimates have put its parameter count at more than 10 trillion). Nick Ryder, a research scientist at OpenAI who worked on the model, told me at the time that bigger models can work with extra information, like emotional cues, such as when a speaker’s words signal hostility: “All of these subtle patterns that come through a human conversation—those are the bits that these larger and larger models will pick up on.”

The upshot is that all the words inside an LLM get encoded into a high-dimensional space. Picture thousands of words floating in the air around you. Words that are closer together have similar meanings. For example, “table” and “chair” will be closer to each other than they are to “astronaut,” which is close to “moon” and “Musk.” Way off in the distance you can see “prestidigitation.” It’s a little like that, but instead of being related to each other across three dimensions, the words inside an LLM are related across 4,096 dimensions.
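One way to put a number on "closer together" is cosine similarity, which measures how closely two lists of numbers point in the same direction. The sketch below assumes that measure (the text does not name one) and uses invented three-number embeddings in place of real 4,096-number ones, so the values themselves mean nothing beyond the illustration.

```python
import math


def cosine_similarity(a, b):
    """Return a value near 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy embeddings, chosen by hand so related words sit near each other.
toy = {
    "table": [0.9, 0.1, 0.0],
    "chair": [0.8, 0.2, 0.1],
    "astronaut": [0.0, 0.2, 0.9],
    "moon": [0.1, 0.1, 0.8],
}

furniture = cosine_similarity(toy["table"], toy["chair"])
space = cosine_similarity(toy["astronaut"], toy["moon"])
unrelated = cosine_similarity(toy["table"], toy["astronaut"])

print(furniture > unrelated)  # True
print(space > unrelated)      # True
```

The same function works unchanged on lists of 4,096 numbers. Only the picture in your head gets harder to draw.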

Yikes.

It’s dizzying stuff. In effect, an LLM compresses the entire internet into a single monumental mathematical structure that encodes an unfathomable amount of interconnected information. It’s both why LLMs can do astonishing things and why they’re impossible to fully understand.    


