The Anatomy of a Prompt: A Journey from Python to Silicon - Part 1 of 5
What happens when you type "Explain quantum physics like I am 5 years old" and press enter?
The Mystery
Chapter 0: The rabbit hole
In December 2025, I finally bit the bullet and built myself a computer. It had been many years since I last embarked on that journey. I guess I was waiting for Steam to drop their Linux box, and that video pushed me into the world of Bazzite OS. I have not used Windows in more than a decade, and that was the biggest reason I had not built a desktop for myself: I did not want to run Windows, and Linux had its limitations. With Bazzite, however, there is a new beginning. The best of all worlds: I can play Steam games, edit photos, and run my ML workloads, all without being tied to Windows.
I went with a full AMD build: AMD processor, AMD GPU with 20 GB of VRAM, 32 GB of memory, the works (maybe worth an entire post of its own). It took me weeks to decide on the spec, and at least 20 hours to put everything together.
Then I tried to run Llama 70B. It didn't fit. Okay, fair enough, 70 billion parameters is a lot. So I tried some smaller models: some worked, some crashed, some ran slowly, and some refused to start.
Meanwhile, I booted up Frostpunk 2 at full resolution, everything dialled up, and the game ran beautifully.
This made no sense to me. I am not a game developer, but I know that games use matrices. Every 3D object we see is being transformed by matrix multiplications that power rotation, scaling, and projection. Every lighting calculation is linear algebra. When it rains or snows, my GPU is doing billions of calculations per frame, sixty times per second.
So if my GPU could render a cold, rainy city with industrial pipelines and robots, but choke on a language model, something did not sit well with me.
This question sent me down a rabbit hole. What I found and learnt was not just a technical explanation; it was an entire hidden economy. A story about physics, economics, and the invisible human mastery that makes AI possible.
This is what I learned.
Chapter 1: From Text to Electricity
You type a Prompt.
I sit at my computer and type into a chat box: “Explain quantum physics like I’m five.” I hit enter.
The experience is seamless. It feels like magic, a split-second pause, then the words start streaming.
The question I started asking was: what happens to this prompt if we shrink ourselves to the size of an electron? What is actually happening at the hardware level? How does this all work? At the software level it's easier: you understand the building blocks of a transformer network, you write it up in PyTorch, you call model.train(), and voila, the model starts training on the GPU.
Let us trace what actually happens to our prompt.
The Disguise, Tokenization
Our journey begins in Python. We call model.generate(); however, a computer does not actually understand the “text” we just sent its way. It only understands numbers.
First, a Tokenizer chops our sentence into integers.
"Explain quantum physics like I'm five"
↓
[1042, 9921, 7823, 1426, 564, 2901]
Each word or word-piece gets assigned an ID from a vocabulary of tokens.
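The lookup can be sketched in a few lines of Python. Real LLMs use subword algorithms such as BPE (via libraries like tiktoken or Hugging Face tokenizers); the vocabulary and IDs below are invented purely for illustration.

```python
# Toy tokenizer sketch. The vocabulary and IDs are made up; real
# tokenizers learn a subword vocabulary of tens of thousands of entries.
VOCAB = {"Explain": 1042, "quantum": 9921, "physics": 7823,
         "like": 1426, "I'm": 564, "five": 2901}

def tokenize(text):
    """Map each whitespace-separated word to its vocabulary ID."""
    return [VOCAB[word] for word in text.split()]

print(tokenize("Explain quantum physics like I'm five"))
# → [1042, 9921, 7823, 1426, 564, 2901]
```

A real tokenizer also handles words outside its vocabulary by splitting them into smaller pieces, which is why "word-piece" is the more accurate term.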
The Meaning, Embeddings
The token is nothing but an ID card. It does not actually convey the meaning of the word.
To understand the meaning of the word, the model looks up its Embedding: a list of floating-point numbers (a vector, as it is famously called) that represents the meaning of the word in a multi-dimensional space.
1042 ("Explain") → [0.12, -0.98, 0.05, 0.77, -0.33, ...]
This vector can have thousands of dimensions. Words with similar meanings end up near each other in this space. As we all famously know, King and Queen are close, whereas King and Banana are far apart.
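"Near" here usually means cosine similarity. A toy sketch, with invented four-dimensional vectors (real models use hundreds or thousands of dimensions), chosen only so that "king" and "queen" point in similar directions while "banana" points elsewhere:

```python
import math

# Invented 4-dimensional embeddings, purely for illustration.
EMBEDDINGS = {
    "king":   [0.90, 0.80, 0.10, 0.00],
    "queen":  [0.85, 0.75, 0.20, 0.05],
    "banana": [0.00, 0.10, 0.90, 0.80],
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))   # close to 1.0
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["banana"]))  # close to 0.0
```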
The words we entered as a prompt have now become mathematical objects in a multi-dimensional space.
Sounds almost like a science fiction episode. It only gets better from here.
The Digestion, Bits and Bytes
The interesting thing is that GPUs cannot read decimal or floating-point numbers like 0.12. They only speak binary.
This is where the IEEE 754 floating-point standard comes in. The number 0.12 becomes a string of 16 zeros and ones (in FP16 format).
0.12 → 0 01011 1110101110
↑ ↑ ↑
sign exponent mantissa
Bit 1 (Sign): Is it positive or negative?
Bits 2-6 (Exponent): How big is the number?
Bits 7-16 (Mantissa): What’s the precise value?
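We can inspect this encoding ourselves: Python's standard struct module supports IEEE 754 half precision via the 'e' format character, so a few lines expose the exact bit pattern.

```python
import struct

def fp16_bits(x):
    """Encode x as IEEE 754 half precision and return its 16 bits,
    grouped as sign (1 bit), exponent (5 bits), mantissa (10 bits)."""
    # '>e' packs a big-endian half-precision float into 2 bytes;
    # '>H' reinterprets those bytes as an unsigned 16-bit integer.
    (raw,) = struct.unpack(">H", struct.pack(">e", x))
    bits = format(raw, "016b")
    return f"{bits[0]} {bits[1:6]} {bits[6:]}"

print(fp16_bits(0.12))  # → 0 01011 1110101110
```

Decoding it back: the biased exponent 01011 is 11, minus the FP16 bias of 15 gives 2⁻⁴, and 1.1110101110₂ × 2⁻⁴ ≈ 0.1199951, the closest FP16 value to 0.12.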
It's almost nostalgic for me to look at this conversion, because I remember our professor talking about sign, exponent, and mantissa in our undergraduate class as we were programming 8085 and 8086 microprocessors. It makes so much sense why certain standards were established. Okay, let's continue the journey.
We have successfully gone from a simple sentence of words to a stream of millions of bits. As Jared from Silicon Valley would say, it's magical.
The Spark, Electricity Becomes Math
This is our deepest layer. We are now working with bits and bytes, but how does silicon do arithmetic?
Inside the GPU, a binary 1 is a high-voltage pulse and a 0 is a low one.
When the GPU multiplies matrices, it is routing voltage through a maze of transistors and logic gates. The electricity that comes out the other side represents the answer.
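To make this concrete, here is the classic building block of binary arithmetic, a one-bit full adder, sketched in Python. Each one-line function stands in for a physical circuit of transistors; chaining full adders gives a ripple-carry adder, a simplified stand-in for the far more parallel adders real silicon uses.

```python
def full_adder(a, b, carry_in):
    """Add two bits plus a carry; return (sum_bit, carry_out).
    Built only from the gates silicon implements in voltage."""
    s = a ^ b ^ carry_in                        # two XOR gates
    carry_out = (a & b) | (carry_in & (a ^ b))  # AND and OR gates
    return s, carry_out

def add_bits(x, y):
    """Ripple-carry addition of two bit lists, least-significant bit first."""
    result, carry = [], 0
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)  # final carry-out becomes the top bit
    return result

# 6 (binary 110) + 3 (binary 011), written least-significant bit first:
print(add_bits([0, 1, 1], [1, 1, 0]))  # → [1, 0, 0, 1], i.e. 1001 = 9
```

Every multiplication the GPU does ultimately decomposes into cascades of gates like these, just etched into silicon and driven by voltage instead of interpreted by Python.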
I mean, how insane is that? Who would think a human mind could conjure up something like this?
Our words have now become voltage.
Next Week: The Attention Tax
We’ve turned words into voltage. Millions of bits now sit in GPU memory, ready to be processed.
But the GPU can’t just “think” about them. It has to do something very specific: multiply every token against every other token. For a 100K context, that’s 10 billion operations. Per layer. Times 80 layers.
This is called the Attention Mechanism. It’s both the genius and the curse of modern AI.
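The arithmetic is worth sanity-checking. The 100K context and 80 layers are the figures quoted above; this counts pairwise token interactions rather than raw FLOPs (real implementations also multiply by the head dimension and the number of heads).

```python
# Back-of-envelope count of pairwise token interactions in attention.
context_len = 100_000   # tokens in the context window
layers = 80             # transformer layers, as quoted above

pairs_per_layer = context_len ** 2      # every token vs. every other token
total = pairs_per_layer * layers

print(f"{pairs_per_layer:,} interactions per layer")  # 10,000,000,000
print(f"{total:,} across all layers")                 # 800,000,000,000
```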
Part 2 drops next week. Subscribe so you don’t miss it.












