Frequently asked questions

What does LLM stand for?

LLM stands for large language model. It is a type of artificial intelligence trained on huge amounts of text to read, write, summarize, translate, and answer questions in natural language.

What is an example of an LLM?

The most widely used LLMs include GPT (from OpenAI, powering ChatGPT), Claude (from Anthropic), Gemini (from Google), Llama (from Meta), Mistral, Command (from Cohere), Grok (from xAI), and DeepSeek. ChatGPT itself is not an LLM — it's an application built on a GPT-family LLM.

Is ChatGPT an LLM?

ChatGPT is an application that uses an LLM, specifically a model in OpenAI's GPT family. The terms are often used interchangeably in casual conversation, but technically the LLM is the underlying model and ChatGPT is the product built on top of it.

What is the difference between an LLM and AI?

Artificial intelligence is the broad field of building software that performs tasks usually associated with human intelligence. A large language model is one specific kind of AI, focused on language. Other AI systems classify images, predict outcomes, recommend products, or control robotics — they aren't LLMs.

What is the difference between an LLM and an AI agent?

An LLM is a language model that generates text. An AI agent is an application that uses one or more LLMs along with tools, memory, and grounding to complete tasks for a person — like resolving a customer issue end-to-end. The LLM is the engine; the AI agent is the system built around it.

What is the difference between an LLM and a small language model (SLM)?

A large language model has billions to hundreds of billions of parameters and is trained for general-purpose use. A small language model has fewer parameters (often under 10 billion) and is usually purpose-built or distilled for a specific task. SLMs run faster and cheaper than LLMs, and for narrow tasks they can match or exceed a general-purpose model.

How do LLMs work in plain English?

An LLM is trained by reading enormous amounts of text and learning which words tend to follow which other words in which contexts. When you give it a prompt, it predicts the next likely word, then the next, and so on — building a response one piece at a time. It is not retrieving an answer from a database; it is generating one based on patterns it learned during training.

Do LLMs understand language?

Not in the way a person does. An LLM does not have intent, beliefs, or comprehension. It has a statistical model of how language works, learned from training data. The output can look like understanding because the patterns are deep enough to produce coherent reasoning, but the underlying mechanism is prediction, not comprehension.

Why do LLMs hallucinate?

An LLM generates text by predicting the next token, not by checking facts against a source. When the model is asked something its training data didn't cover well, or asked to be more confident than the underlying patterns support, it can produce text that sounds correct and isn't. Grounding the model in a verified knowledge source — through retrieval-augmented generation — is the most reliable way to reduce hallucinations.

Are LLMs the same as generative AI?

LLMs are a subset of generative AI. Generative AI is the broader category — it includes models that generate text, images, audio, video, and code. LLMs are the text-and-language slice of that category. Most generative AI you encounter day-to-day is LLM-powered, but image and video models like Midjourney, Stable Diffusion, and Sora are generative AI without being LLMs.

What is an LLM? Definition, how they work, and examples

LLM stands for large language model. A large language model is a type of artificial intelligence trained on enormous amounts of text that learns the statistical patterns of language well enough to read, write, summarize, translate, and answer questions in natural language. The best-known LLM families include GPT, Claude, Gemini, Llama, Mistral, and DeepSeek, which power products such as ChatGPT, Claude, Gemini, and Copilot. In each case the model is the engine, not the application.

The word "large" refers to two things: the size of the model (often hundreds of billions of parameters) and the size of the training dataset (typically trillions of words pulled from books, articles, websites, code repositories, and other text). Stanford's Human-Centered AI Institute and Wikipedia both define LLMs in roughly these terms.

This page covers what an LLM is, how it works, what the major LLMs are today, what LLMs can and cannot do, how they differ from related terms like AI agents, agentic AI, and small language models, and how grounding turns a general-purpose LLM into a system that resolves real customer conversations.

LLM in one sentence

A large language model is software that learns the patterns of human language from huge volumes of text and uses those patterns to read, write, and respond in natural language.

What "LLM" stands for and where the name comes from

LLM is the acronym for large language model. The name is descriptive rather than technical:

Large — the model has many parameters (the numbers it tunes during training) and was trained on a very large dataset.
Language — the model works with human language, primarily text. Multimodal LLMs also handle images, audio, and video, but the language layer is the core.
Model — a model in machine learning is a mathematical function trained to map inputs to outputs.

A "small language model" (SLM) is the same kind of system at a smaller scale — fewer parameters, narrower training data, usually purpose-built for a single task. More on that comparison below.

How LLMs work

Almost every modern LLM, regardless of vendor, follows roughly the same five-part pattern.

1. The transformer architecture

Almost all current LLMs are built on the transformer architecture, introduced by Google researchers in 2017. The transformer processes text in chunks called tokens (roughly, pieces of words) and uses a mechanism called attention to weigh how each token relates to every other token in the input. Attention is what lets an LLM keep track of context across long passages.
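To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is a simplification under stated assumptions: the token vectors are made-up numbers, and the same vectors are reused as queries, keys, and values, whereas a real transformer derives each from learned projections and runs many attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors using those weights.
        mixed = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-dimensional token vectors (illustrative numbers only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` shows how strongly one token attends to every token in the input, itself included. In this toy input the first token gives the second token the least weight, because their vectors have nothing in common.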

2. Training on a huge corpus

The model is trained on trillions of tokens of text. During training, the model is repeatedly given a sequence of tokens with the next one hidden, and asked to predict the missing token. The gap between its prediction and the actual token is used to adjust the model's parameters. After billions of these predictions, the model has learned an extraordinarily rich statistical map of how language works: grammar, facts, reasoning patterns, style, and the relationships between concepts.
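The training setup can be sketched in a few lines. This is an illustration, not how any particular vendor trains: it assumes whitespace-separated words as tokens (real tokenizers use sub-word pieces), and the two probability tables are invented to show how the loss rewards a good prediction. The loss shown is cross-entropy, a standard choice for next-token prediction.

```python
import math

def next_token_examples(tokens):
    """Split a token sequence into (context, hidden next token) pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def cross_entropy(predicted_probs, target):
    """Loss for one prediction: small when the true token got high probability."""
    return -math.log(predicted_probs[target])

tokens = ["the", "order", "has", "shipped"]
examples = next_token_examples(tokens)

# Two hypothetical predictions for the token that follows "the order has".
confident_right = {"shipped": 0.9, "arrived": 0.1}
confident_wrong = {"shipped": 0.1, "arrived": 0.9}
low_loss = cross_entropy(confident_right, "shipped")
high_loss = cross_entropy(confident_wrong, "shipped")
```

Training nudges the parameters in whatever direction lowers this loss, averaged over an enormous number of such examples.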

3. Token prediction at inference

When a person enters a prompt, the LLM does not retrieve a stored answer. It generates the response one token at a time: at each step it assigns a probability to every possible next token, given everything in the prompt and everything it has generated so far, and then picks one, usually with some randomness. That sampling is why two responses to the same prompt can differ. And because the model is choosing plausible tokens rather than checking facts, it sometimes produces fluent text that is factually wrong.
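One way to picture why the same prompt can yield different responses is temperature sampling, a common decoding method. The sketch below assumes a tiny invented vocabulary with made-up scores; a real model scores tens of thousands of tokens at every step, and vendors differ in the decoding settings they expose.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, rng=random):
    """Pick one token from a dict of token -> raw score."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always take the top score.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())
    # Unnormalized softmax weights; random.choices normalizes them.
    weights = [math.exp(s - peak) for s in scaled.values()]
    return rng.choices(list(scaled.keys()), weights=weights, k=1)[0]

# Made-up scores for the token after "Your order has".
logits = {"shipped": 2.0, "arrived": 1.5, "exploded": -1.0}

greedy = sample_next_token(logits, temperature=0)
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = {sample_next_token(logits, temperature=1.0, rng=rng) for _ in range(200)}
```

With temperature at zero the choice is always the top-scoring token. With temperature above zero, lower-scoring tokens are sometimes chosen, so repeated runs can diverge.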

4. Alignment and fine-tuning

A base LLM is broadly capable but not always safe, helpful, or on-brand. To make it usable, model builders apply techniques like reinforcement learning from human feedback (RLHF) and supervised fine-tuning on curated examples. Alignment shapes the model's behavior — when to refuse, how to stay polite, how to follow instructions. Fine-tuning specializes the model for a particular domain, voice, or task.

5. Grounding and retrieval

A model only knows what was in its training data, which is frozen at a point in time. To answer current or company-specific questions accurately, an LLM needs to be grounded — connected to a source of truth like a knowledge base, a customer record, or a product catalog. Retrieval-augmented generation (RAG) is the most common grounding pattern: retrieve relevant facts from a connected source, include them in the prompt, then generate the answer. Grounding is what separates a research toy from a production system.
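The retrieve-then-generate pattern can be sketched as follows. Everything here is an assumption for illustration: the knowledge-base entries are invented, `retrieve` and `build_grounded_prompt` are hypothetical helper names, and word overlap stands in for the embedding-based search that production systems typically use. The final call to a model is left out.

```python
def retrieve(question, documents, top_k=2):
    """Rank documents by shared words with the question (a crude stand-in for vector search)."""
    question_words = set(question.lower().split())
    scored = []
    for doc in documents:
        overlap = len(question_words & set(doc.lower().split()))
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only documents that share at least one word with the question.
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]

def build_grounded_prompt(question, documents):
    """Place the retrieved facts in the prompt ahead of the question."""
    facts = retrieve(question, documents)
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer using only the facts below. If they do not cover the question, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )

# Invented policy snippets, for illustration only.
knowledge_base = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3 to 5 business days.",
    "Gift cards never expire.",
]
prompt = build_grounded_prompt("How long does standard shipping take?", knowledge_base)
```

The resulting `prompt` is what would be sent to the model, so the answer is generated from current facts rather than from whatever the training data happened to contain.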

Examples of LLMs

The LLMs most people encounter today belong to a handful of model families:

Model family	Maker	First public release
GPT (powers ChatGPT)	OpenAI	2018; ChatGPT launched November 2022
Claude	Anthropic	2023
Gemini (chat product formerly named Bard)	Google	2023
Llama	Meta	2023, open-weights
Mistral	Mistral AI	2023, open-weights
Command	Cohere	2022
Grok	xAI	2023
DeepSeek	DeepSeek	2023, open-weights

ChatGPT is the household name, but ChatGPT is an application built on a GPT-family LLM. The same distinction applies across the table: the LLM is the engine; the chatbot, assistant, or copilot is the product built on top of it.

What LLMs can do

The capabilities most often used in production:

Read and summarize. Compress a long document, transcript, or email thread into the key points.
Write. Produce drafts of emails, marketing copy, reports, code, and conversational replies in a specified voice.
Translate. Convert text between languages, often with quality close to dedicated translation models.
Answer questions. Respond to natural-language questions, with accuracy improving sharply when the model is grounded in a source.
Classify and tag. Identify the topic, sentiment, intent, or category of a piece of text.
Reason in steps. Work through multi-step problems, especially when prompted to show the steps.
Generate structured output. Produce JSON, tables, or other formatted data that downstream systems can read.

One model can do all of the above. That generality is the reason LLMs became the default substrate for new AI products in 2023 and after.
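Structured output is only useful downstream if the application checks it before acting on it. Here is a minimal sketch of that check, assuming the model was asked to reply with a JSON object; the field names (`intent`, `sentiment`, `order_id`) and the sample replies are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"intent": str, "sentiment": str, "order_id": str}

def parse_model_output(raw_text):
    """Parse and check a JSON reply; return (data, error)."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None, f"missing or mistyped field: {field}"
    return data, None

good, good_error = parse_model_output(
    '{"intent": "refund", "sentiment": "negative", "order_id": "A123"}'
)
bad, bad_error = parse_model_output("Sure! Here is the JSON you asked for.")
```

When the check fails, a typical response is to retry the request or hand the conversation to a person rather than pass bad data along.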

What LLMs cannot do (without help)

Equally important — and less often covered in vendor explainers — are the structural limits of an LLM on its own:

No real-time knowledge. A base LLM only knows what was in its training data. It doesn't know today's news, today's prices, or today's order status without a retrieval layer.
No persistent memory across sessions. A model doesn't remember a person between conversations unless a memory layer is added on top.
No native action-taking. An LLM generates text. Refunding an order, updating an account, or sending a confirmation email requires tool use and integrations layered on top — the move from LLM to AI agent.
No guaranteed accuracy. Token prediction can produce fluent, confident-sounding output that is factually wrong. This is the hallucination problem, and grounding is the most reliable mitigation.
No native math or logic engine. Modern models are better at arithmetic and logic than earlier generations, but reliable math still benefits from tool use (a calculator, a code interpreter) rather than pure generation.

The takeaway: LLMs are powerful but partial. A production system that resolves real customer issues is an LLM plus grounding plus tools plus guardrails — not the model alone.
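The "LLM plus tools plus guardrails" idea can be sketched as a small dispatcher. This is an assumption-laden illustration: the JSON request shape, the `refund_order` function, and the tool names are hypothetical, and a real deployment would add authentication, limits, and logging. The model only produces text; the application decides whether and how to act on it.

```python
import json

def refund_order(order_id, amount):
    """Hypothetical tool; a real one would call an order system."""
    return {"order_id": order_id, "refunded": amount, "status": "ok"}

# Only tools registered here can ever be executed.
TOOLS = {"refund_order": refund_order}

def run_tool_call(model_output):
    """Execute a tool request that the model emitted as JSON text."""
    request = json.loads(model_output)
    name = request["tool"]
    if name not in TOOLS:
        # Guardrail: refuse anything the application did not register.
        return {"status": "error", "reason": f"unknown tool: {name}"}
    return TOOLS[name](**request["arguments"])

result = run_tool_call(
    '{"tool": "refund_order", "arguments": {"order_id": "A123", "amount": 25.0}}'
)
blocked = run_tool_call('{"tool": "delete_account", "arguments": {}}')
```

The second call is rejected because `delete_account` was never registered, which is the kind of boundary the guardrail layer exists to enforce.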

LLM vs SLM vs foundation model

These three terms overlap in coverage. The clean distinction:

Concept	What it is	Typical scale	Where it's used
Foundation model	A model trained on broad data that can be adapted to many tasks. The umbrella category.	Varies	Any modality — text, image, audio, code, multimodal.
Large language model (LLM)	A foundation model specialized in text, with billions to hundreds of billions of parameters or more.	7B–1T+ parameters	General-purpose assistants, copilots, AI agents.
Small language model (SLM)	A purpose-built or distilled language model, usually under 10 billion parameters.	Often <10B parameters	On-device assistants, specific tasks, cost-sensitive deployments.

SLMs are getting more attention because they run faster and cheaper than LLMs, and for narrow tasks they can match or exceed a general-purpose LLM. Most production AI stacks now mix LLMs and SLMs — the LLM handles open-ended reasoning, an SLM handles classification or routing, and a router picks which to use per request.
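A router of the kind described above can be very simple. The sketch below is one possible heuristic, not a recommended design: the task labels, the word-count threshold, and the tier names `small-model` and `large-model` are all placeholders.

```python
# Hypothetical labels for tasks narrow enough for a small model.
NARROW_TASKS = {"classify", "route", "tag"}

def pick_model(task, prompt, max_slm_words=200):
    """Return which model tier should handle a request."""
    if task in NARROW_TASKS and len(prompt.split()) <= max_slm_words:
        return "small-model"
    # Open-ended or long requests fall through to the larger model.
    return "large-model"

classification_tier = pick_model("classify", "Where is my order?")
drafting_tier = pick_model("draft_reply", "The customer wants to return a jacket bought last month.")
```

In practice the routing rule is often learned or tuned from traffic, but the shape is the same: a cheap decision up front that sends each request to the least expensive model able to handle it.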

LLM vs AI agent vs agentic AI

This is the comparison most often muddled in customer service marketing. The clean version:

Term	What it is	Example
LLM	The language engine. Predicts text.	The model that drafts a reply.
AI agent	An application that uses one or more LLMs plus tools, grounding, and memory to complete tasks.	A customer service AI that reads the inquiry, looks up the order, and drafts a response.
Agentic AI	A broader architecture in which AI systems pursue multi-step goals, take action across tools, and coordinate without step-by-step human direction.	An AI that not only drafts the reply but processes the refund, updates the order, and emails the confirmation.

The LLM is the engine. The AI agent is the car. Agentic AI is the road network the car can drive on. A glossary entry on AI agents and agentic AI covers each in depth.

Where LLMs are used in the real world

The applications most commonly built on LLMs today:

Customer service. Drafting agent replies, summarizing long conversations, translating in real time, generating help-center content, and powering self-service answers.
Marketing and content. First drafts of blog posts, ad copy, product descriptions, email campaigns, and social posts — often with brand-voice fine-tuning.
Software development. Code completion, code review suggestions, test generation, and documentation.
Sales. Personalized outbound, call summaries, proposal drafts, prospect research.
Knowledge management. Search across internal docs, contracts, and policies in natural language.
Operations. Querying data in plain English, generating executive summaries, translating analyst output for non-technical audiences.
Design and creative. Concept generation, copy variations, multimodal brainstorming with image and video models.

The common pattern: an LLM is the substrate. The product is the LLM plus the grounding, the tools, the guardrails, and the workflow.

How LLMs are evaluated

LLMs are not measured the way traditional software is. The common axes:

Capability benchmarks. Standardized tests across reasoning (MMLU), coding (HumanEval), math (GSM8K), and instruction-following. Used by model builders to publish leaderboards.
Cost per million tokens. The pricing unit. Models are priced from cents to dollars per million tokens, with sharp variation by speed and capability tier.
Latency. How fast the first token returns and how fast subsequent tokens stream. Critical for live conversation.
Safety and alignment. Refusal behavior, bias measurements, hallucination rates on grounded and ungrounded tasks.
Production accuracy. The only metric that matters for a deployed system: did the model produce the right answer for the user's actual question? Benchmark performance and production accuracy are correlated but not the same.

The right model for a customer service deployment is rarely the highest-ranked model on a public leaderboard. It's the model whose price, speed, and grounded accuracy fit the use case.
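Per-million-token pricing is easier to reason about with a worked calculation. The numbers below are illustrative only, not any vendor's actual prices, and the sketch assumes input and output tokens are priced separately, which is a common but not universal arrangement.

```python
def request_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Dollar cost of one request, given prices quoted per million tokens."""
    return (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )

# Assumed: 1,500 prompt tokens and 300 reply tokens per conversation,
# at $1.00 per million input tokens and $4.00 per million output tokens.
per_conversation = request_cost(1_500, 300, 1.00, 4.00)

# Assumed volume: 100,000 conversations a month.
per_month = per_conversation * 100_000
```

Under these assumptions a single conversation costs a fraction of a cent, and the monthly total comes to roughly $270. Grounding adds retrieved text to every prompt, so input tokens usually account for much of the bill.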

LLMs in customer service: the grounded vs. ungrounded distinction

This is the section where Gladly's position is on the record.

A general-purpose LLM is impressive in a demo. In a production customer service deployment, an ungrounded LLM is a liability: it doesn't know the customer's order history, it doesn't know the brand's voice, and it doesn't know whether the policy quoted in its training data still applies today. The answer it gives may be fluent and wrong.

An LLM grounded in a customer's full conversation history, order data, brand voice, and current policies is a different system. It resolves the conversation. It speaks in the brand's voice. It refunds the right order, applies the right loyalty points, and escalates only when it should.

Across brands using Gladly AI, the difference between general-purpose LLM use and grounded LLM use is the difference between novelty and outcomes:

KÜHL runs a 59% AI resolution rate with a 120% lift in revenue per conversation.
Breeze Airways has AI enhancing 71% of conversations while maintaining high CSAT.
Smith Optics hits a 67% AI resolution rate on product-help conversations — the kind most generic LLM deployments struggle with because they require both knowledge and judgment.

The model is the engine. The grounding is why the engine takes the customer somewhere worth going.
