ChatGPT answers questions about: How LLMs work, part 1

A lot happens inside these systems; an approximate, forgiving understanding is appropriate when discussing AI.


LLM Basics: Session → Prompt → Tokens → Parameters

A conversation-style explanation of how large language models work.

Purpose of this document

This page intentionally preserves the prompt and response format of the interaction. It is not meant to read like a formal white paper. Instead it demonstrates how a question posed to an AI assistant produces a structured explanation.

Understanding the interaction itself is part of understanding the technology.

Conversation Transcript

User Prompt

Hope all is well with you today.

Today I want to dig a little deeper into AI in very general terms.

First I want to get my terms, my vocabulary for discussing how AI works in the LLM context. My understanding is limited but I have impressions of things happening at different levels.

A session is opened and there might or might not be persistence of past interactions. Maybe a tokenization of the user?

The user fills in a prompt to get a response. This seems like the place where things can go off track if the prompt lacks detail.

That prompt is converted into tokens. I wondered if this is like a lossy compression system.

Then the tokens are used to select patterns from training data (parameters?). Are there statistics attached to parameters that guide selection? Are there pointers between parameters?

So check my use of terms and expand on my thoughts.

Assistant Response

1. Session / Context

In most LLM systems there are several levels of memory:

  • Conversation context — the messages in the current session.
  • Stored user information — preferences or notes that may be inserted into the context.
  • Model parameters — statistical knowledge learned during training.

The model itself does not remember past conversations. Each reply is generated by reading the current context provided to it.
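The three memory levels above can be sketched in code. This is an illustrative sketch, not any real chat API; all names (`build_context`, the message dicts) are invented for the example. The point is that everything the model "knows" about you must be assembled into one context before each reply.

```python
# Illustrative sketch of context assembly (hypothetical names, not a real API).
def build_context(stored_user_info, conversation, new_prompt):
    """Combine the memory levels into a single list of messages."""
    context = []
    if stored_user_info:                       # optional persisted notes
        context.append({"role": "system", "content": stored_user_info})
    context.extend(conversation)               # messages from this session
    context.append({"role": "user", "content": new_prompt})
    return context

history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello!"}]
ctx = build_context("User prefers short answers.",
                    history, "What is a token?")
# The model reads only this assembled context; it has no other memory.
```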

2. Prompt

Your observation is correct: prompts strongly influence results. If the prompt is vague, the model must guess between many plausible responses. Clear prompts reduce ambiguity.

3. Tokenization

Tokenization is not lossy compression. It is a reversible encoding: text is mapped to entries in a fixed vocabulary, and decoding the token IDs recovers the original text exactly.

Prompt text
   ↓
Tokens
   ↓
Token IDs

Tokens may represent words or pieces of words depending on the tokenizer.
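A toy sketch of that mapping follows. Real tokenizers (byte-pair encoding and similar) build their vocabularies from data and handle arbitrary bytes; the tiny hand-written vocabulary here only illustrates the text → tokens → integer IDs flow and the word-piece idea.

```python
# Toy tokenizer sketch; VOCAB and the "##ize" suffix rule are invented
# for illustration (real tokenizers learn such pieces from data).
VOCAB = {"token": 0, "tokens": 1, "are": 2, "fun": 3, "##ize": 4}

def tokenize(text):
    """Split on whitespace; split known stems from an 'ize' suffix."""
    tokens = []
    for word in text.lower().split():
        if word in VOCAB:
            tokens.append(word)
        elif word.endswith("ize") and word[:-3] in VOCAB:
            tokens.append(word[:-3])   # known stem
            tokens.append("##ize")     # word-piece suffix
        else:
            tokens.append(word)        # unknown; real tokenizers fall back to bytes
    return tokens

def encode(tokens):
    return [VOCAB.get(t, -1) for t in tokens]   # -1 stands in for <unk>

toks = tokenize("tokens are fun")
ids = encode(toks)
# Decoding the IDs reproduces the tokens exactly: nothing is lost,
# which is why tokenization is not lossy compression.
```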

4. Parameters

Parameters are the learned weights of the neural network. They store statistical relationships between words and concepts, not literal stored sentences.
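A tiny illustration of what "statistical relationships, not stored sentences" means. The weight values below are made up; in a real model there are billions of such numbers, learned during training rather than written by hand.

```python
# Hypothetical weight matrix: W[i][j] is how strongly word i predicts
# word j. The numbers are invented for illustration only.
WORDS = ["cat", "dog", "meow", "bark"]
W = [
    [0.0, 0.1, 0.8, 0.1],   # cat  -> meow is strongest
    [0.1, 0.0, 0.1, 0.8],   # dog  -> bark is strongest
    [0.5, 0.2, 0.0, 0.3],   # meow
    [0.2, 0.5, 0.3, 0.0],   # bark
]

def most_associated(word):
    row = W[WORDS.index(word)]
    return WORDS[row.index(max(row))]

# most_associated("cat") yields "meow": the relationship lives in the
# numbers; no sentence like "cats meow" is stored anywhere.
```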

5. Generation Process

The model generates output one token at a time.

context
  → probability distribution for next token
  → sample token
  → append token
  → repeat

Controls like temperature or top-p determine how deterministic the selection is.
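The loop above can be sketched with a toy bigram table standing in for the transformer. The table and its scores are invented; the sampling mechanics, including how temperature rescales scores before they become probabilities, are the real idea.

```python
import math
import random

# Toy stand-in for a model: scores for the next token given the previous one.
BIGRAM = {
    "the": {"cat": 2.0, "dog": 1.5, "end": 0.1},
    "cat": {"sat": 2.5, "ran": 1.0, "end": 0.5},
    "dog": {"ran": 2.0, "sat": 0.8, "end": 0.5},
    "sat": {"end": 3.0},
    "ran": {"end": 3.0},
}

def next_token(prev, temperature=1.0):
    scores = BIGRAM[prev]
    # Softmax with temperature: low temperature -> near-deterministic argmax.
    exps = {t: math.exp(s / temperature) for t, s in scores.items()}
    total = sum(exps.values())
    tokens = list(exps)
    weights = [exps[t] / total for t in tokens]
    return random.choices(tokens, weights=weights)[0]

def generate(start="the", temperature=1.0, max_len=10):
    out = [start]
    while out[-1] != "end" and len(out) < max_len:
        out.append(next_token(out[-1], temperature))
    return out
```

With a very low temperature the highest-scoring continuation is chosen almost every time; higher temperatures spread probability across the alternatives.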

6. Attention

Transformers use an attention mechanism allowing each token to reference other tokens in the context. This is how the model maintains coherence and resolves references.
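A minimal scaled dot-product attention in plain Python shows the core operation: each token's query is compared against every token's key, and the resulting weights mix the values. This is a bare sketch of the mechanism, omitting the learned projection matrices and multiple heads a real transformer uses.

```python
import math

def softmax(xs):
    m = max(xs)                              # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """queries/keys/values: lists of equal-length vectors, one per token."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted mix of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

When a query closely matches one key, nearly all the weight goes to that token's value, which is how a token can "look back" at a specific earlier token to resolve a reference.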

7. Why It Appears to Understand

Because the training data contains reasoning patterns, the model can reproduce reasoning-like language structures. This gives the appearance of understanding even though the mechanism is statistical prediction.

Summary Pipeline

Session begins
→ context assembled
→ prompt added
→ tokenization
→ transformer processing
→ next-token prediction
→ response built token by token

Key takeaway

Human communication relies heavily on shared assumptions about culture, context, and experience.

A language model does not share that background unless it is explicitly included in the context. When context is incomplete, the model fills the gaps using general statistical patterns learned during training.
