What is a service level agreement (SLA)?
A service level agreement (SLA) is a contract between a service provider and a customer that defines the service to be delivered, how performance against that service will be measured, and what happens if the provider does not meet the committed levels. It is the document that turns a vague promise — "we'll keep your systems running" or "we'll answer your support requests quickly" — into a measurable, enforceable obligation.
SLAs are most commonly associated with technology vendors — cloud providers, telecoms, hosting companies, SaaS platforms — but the same instrument is used between internal teams inside a company and between brands and the customers they serve. The acronym hides three different uses, and a lot of confusion about SLAs comes from people meaning different things when they say "our SLA."
This page covers what an SLA is, where the instrument came from, the three flavors the term hides, the standard structure of an SLA document, the metrics that typically live inside one, how SLAs relate to SLOs, OLAs, and KPIs, what penalties look like when an SLA is missed, and what an SLA looks like specifically in a customer service context.
SLA in one sentence
An SLA is the contract that says what good service looks like, how it will be measured, and what happens if it falls short.
What an SLA actually defines
An SLA does three things at once.
It scopes the service. It names what is being delivered — uptime for a hosted application, response time for a support team, throughput for a data pipeline — and it draws the boundary around what is not covered. The exclusions matter. Scheduled maintenance, force majeure, customer-side equipment, and out-of-scope work are typically called out so both sides agree on what counts.
It sets the performance level. It states the numerical commitment. 99.9% uptime per month. First response within 4 business hours. Resolution within 24 hours for a Priority 1 issue. These numbers are the heart of the agreement — every other section of the SLA exists to define, measure, and enforce them.
It sets the consequences. It describes what happens if the provider misses the commitment — service credits, refunds, the right to escalate, the right to terminate. An SLA without consequences is a statement of intent, not a contract.
The word "SLA" gets used loosely for any one of these three pieces in isolation. A team can say "our SLA is 4 hours" when they mean the performance target; another team can say "we breached the SLA" when they mean the consequences clause. Both are slang. A full SLA is all three.
The three flavors of SLA
The same acronym covers three different instruments. Most disagreements about SLAs trace back to people meaning different things by the word.
Vendor SLAs. The classic case. A vendor — cloud provider, hosting company, telecom, SaaS platform — promises a customer a level of service, usually expressed as availability or uptime, and offers service credits or refunds if the commitment is missed. Amazon Web Services publishes more than 300 separate vendor SLAs, one per service. The Amazon S3 SLA is a representative example: monthly uptime commitments, a service-credit schedule, and clear exclusion language.
Internal SLAs. An agreement between two teams inside the same company. The development team commits to a deployment cadence for the platform team. The IT help desk commits to a resolution time for the rest of the business. No money changes hands, but the agreement creates the same kind of measurable, enforceable expectation — and the same kind of friction when it's missed.
Customer-facing SLAs. A brand commits to its customers — often without a literal contract — that support requests will be answered within a certain window. A 24-hour email response. A 5-minute chat response. These commitments may live on a help-center page, inside terms of service, or simply inside a customer support team's operating manual. They are SLAs in the operational sense even when no contract has been signed, and they are the kind of SLA most customer support leaders are talking about when they use the term.
A single company can have all three types in play simultaneously. The platform team has an internal SLA with engineering. Engineering has a vendor SLA with their cloud provider. The customer support team has a customer-facing SLA on chat response time. All three live under the same acronym.
Where SLAs came from
The instrument is older than the commercial internet. As Wikipedia notes in its SLA entry, SLAs have been used since the late 1980s by fixed-line telecom operators to formalize service guarantees to enterprise customers, covering uptime on a leased line, mean time to repair on a network failure, and response time on a fault report. The format the telecom industry developed in that decade is recognizable in any modern SLA: a performance commitment, a measurement methodology, and a remedy.
The framework moved onto the internet in the late 1990s as ISPs, hosting companies, and managed-service providers adapted the telecom template to network services. The U.S. Telecommunications Act of 1996 provided some of the regulatory scaffolding for those agreements, though it did not require SLAs by name. In the early 2000s, IBM published one of the first formal specifications for monitoring SLA compliance on web services, the Web Service Level Agreement (WSLA) language, an attempt to make SLAs machine-readable for what was then a brand-new world of programmatic service consumption.
Cloud computing made the SLA ubiquitous. When a company's most critical infrastructure runs on a provider it does not own, the SLA becomes the only enforceable description of what the customer is actually paying for. Every major cloud platform, including AWS, Azure, Google Cloud, and IBM Cloud, now publishes SLAs for individual products. The same logic extended into SaaS: almost every enterprise B2B product of any size publishes some form of uptime and support SLA.
The customer service version of the SLA arrived later but has the same DNA. Once contact centers and helpdesks started using ticketing systems with measurable response-time data, the same "commitment + measurement + consequence" structure mapped onto support work — and "the SLA" became a unit of internal performance management for customer experience teams.
Anatomy of an SLA
Every SLA varies, but most contracts contain a recognizable set of sections. The list below combines the components specified by Verma's foundational 2004 IEEE paper with the standard structure used by modern vendors, as documented on IBM Think and AWS.
Overview. The opening section. The parties involved, the effective date, the duration of the agreement, and a one-paragraph summary of the services being committed to.
Description of services. The detailed scope. What is being provided, with enough specificity that both sides can recognize a deliverable when they see one — operating hours, deliverable types, included channels, supported regions, supported integrations.
Stakeholders and contacts. Who is on the hook for what. Named contacts on each side, escalation paths, after-hours coverage, and the chain of responsibility for reporting and resolving issues.
Performance tracking and reporting. The measurement methodology. What metrics will be tracked, how they will be measured, who collects the data, how often the data is shared, and what reports each side gets. This is the section that most often becomes the focus of disputes — if the methodology is sloppy, the rest of the contract is unenforceable.
Service-level objectives. The numerical targets. Each commitment is usually written as an SLO inside the SLA — "99.9% availability per calendar month," "first response within four business hours," "P1 resolution within 24 hours." The SLO is the specific, measurable threshold that the SLA is making a promise about.
Exclusions. What does not count. Force majeure, scheduled maintenance windows, customer-side equipment failures, third-party outages, beta features. The exclusions section is where most provider risk is parked.
Security. The security posture the provider commits to maintaining — controls, certifications, data-handling, NDAs.
Penalties and remedies. The consequences clause. What the customer gets if the provider misses the commitment — typically service credits, sometimes refunds, sometimes the right to escalate or terminate. Earn-back provisions allowing the provider to win credits back through sustained performance are common in mature vendor SLAs.
Indemnification. A legal-risk-shifting clause. If a breach causes the customer a third-party loss, who pays. Often omitted from standardized SLA templates and added with legal counsel.
Review and adjustment. A scheduled cadence for revisiting the SLA — quarterly, semi-annually, annually — and a process for adjusting metrics or targets as the relationship evolves.
Termination. The conditions under which either party can end the agreement before its scheduled expiration, the notice period, and the obligations on the way out.
Signatures. Authorized signatories on each side, binding the parties for the duration of the agreement.
The longer and more critical the service, the more these sections matter. A consumer cloud-storage SLA might fit on one page. An enterprise data-center SLA can run to fifty pages of appendices.
Types of SLAs
IBM Think groups SLAs into three structural types that most modern contracts fall into. The taxonomy is useful when negotiating because it tells you what kind of contract you are actually being offered.
Customer-based SLA. A single agreement with a single customer that covers every service the customer uses. The format is common in large enterprise relationships where one master agreement contains every commitment the vendor is making across the customer's footprint.
Service-based SLA. A single agreement that applies identically to every customer who uses a given service. AWS S3, Microsoft 365, and Salesforce Sales Cloud all use this model. The vendor publishes a single SLA and every customer of that product is governed by the same commitments.
Multilevel SLA. A single agreement that nests different commitments at different levels — by customer segment, by service tier, by region. The most common pattern is by paid tier: free, professional, enterprise plans each get a different uptime commitment and different support response times.
Wikipedia's SLA article describes a further breakdown used in older service-level-management (SLM) frameworks, in which a multilevel SLA is split into a corporate level (one SLA covering generic issues across the whole organization), a customer level, and a service level. The three-type taxonomy above is the version most working vendors and customers use today.
Common SLA metrics
The metrics inside an SLA depend on what is being delivered, but a small set of measurements show up across almost every contract.
Availability and uptime. The percentage of time the service is operational. Usually expressed monthly. The most-quoted metric in any vendor SLA.
Response time. The acceptable time between an issue being reported and the provider responding to it. Critical in support SLAs; less critical in infrastructure SLAs (where the system is monitored and failures are detected automatically).
Resolution time. The acceptable time between an issue being reported and the provider fully resolving it. Often broken out by priority — P1 issues might have a 4-hour resolution SLA, P4 issues a 5-day SLA.
Mean time to recovery (MTTR). The average time to restore a service after a failure. Useful when individual outage times vary widely.
Mean time between failures (MTBF). The average operational time between failures. Used in infrastructure contracts where reliability matters as much as recovery.
Error rates. The percentage of requests that fail. Common in API, processing, and data-pipeline SLAs.
First-call resolution rate. The percentage of customer support issues resolved in a single contact. A defining metric for customer-service SLAs.
Abandonment rate. The percentage of customers who give up before the provider responds. Critical in call-center and chat SLAs.
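As a rough illustration of how a few of these metrics are derived from raw data, the sketch below computes availability, MTTR, and MTBF from a list of outage records. The `Outage` type, the `summarize` function, and the sample figures are hypothetical, and a real SLA would also define which outages are excluded before anything is counted.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One service interruption (hypothetical record shape)."""
    start_min: float  # minutes since the start of the measurement period
    end_min: float    # minutes since the start of the measurement period


def summarize(outages: list[Outage], period_min: float) -> dict[str, float]:
    """Return availability (%), MTTR and MTBF (minutes) for one period."""
    downtime = sum(o.end_min - o.start_min for o in outages)
    uptime = period_min - downtime
    count = len(outages)
    return {
        "availability_pct": 100.0 * uptime / period_min,
        # MTTR: average time to restore service per failure.
        "mttr_min": downtime / count if count else 0.0,
        # MTBF: average operational time per failure.
        "mtbf_min": uptime / count if count else float("inf"),
    }


# Illustrative data: a 30-day month with two outages (20 and 40 minutes).
month_min = 30 * 24 * 60  # 43,200 minutes
report = summarize([Outage(1_000, 1_020), Outage(20_000, 20_040)], month_min)
print(report)
```

With these sample numbers, an hour of total downtime spread across two incidents gives an MTTR of 30 minutes, which shows why MTTR and availability are usually reported side by side: the same availability figure can hide one long outage or many short ones.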
The uptime translation grid
Vendor SLAs are usually written in availability percentages, and the percentages map to very different real-world downtime budgets. The six most-cited tiers:
| Uptime commitment | Common name | Maximum monthly downtime | Maximum annual downtime |
|---|---|---|---|
| 99% | "Two nines" | 7 hours 18 minutes | 3 days 15 hours |
| 99.5% | "Two-and-a-half nines" | 3 hours 39 minutes | 1 day 19 hours |
| 99.9% | "Three nines" | 43 minutes 49 seconds | 8 hours 45 minutes |
| 99.95% | "Three-and-a-half nines" | 21 minutes 54 seconds | 4 hours 22 minutes |
| 99.99% | "Four nines" | 4 minutes 23 seconds | 52 minutes 35 seconds |
| 99.999% | "Five nines" | 26 seconds | 5 minutes 15 seconds |
Five nines is the gold-standard telecom commitment. Three nines is the typical SaaS commitment. Two nines is what consumer-grade services usually publish. The cost of moving from three nines to four nines is generally an order of magnitude more than the cost of moving from two to three — every additional nine carries a steeper engineering and operating bill.
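The downtime figures for each tier follow from simple arithmetic. Here is a minimal sketch, assuming an average month of 730.5 hours and a year of 8,766 hours (365.25 days), which is the convention that appears to reproduce the published figures. A real SLA defines its own measurement period, the function names are illustrative, and rounding may differ from the published figures by a second or so.

```python
HOURS_PER_YEAR = 365.25 * 24           # 8,766 hours (assumed convention)
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730.5 hours


def downtime_budget_seconds(uptime_pct: float, period_hours: float) -> float:
    """Maximum downtime, in seconds, allowed by an uptime commitment."""
    return (1 - uptime_pct / 100) * period_hours * 3600


def fmt(seconds: float) -> str:
    """Format a duration as days/hours/minutes/seconds, omitting zero units."""
    total = round(seconds)
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    return " ".join(f"{n}{unit}" for n, unit in parts if n) or "0s"


for pct in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999):
    monthly = fmt(downtime_budget_seconds(pct, HOURS_PER_MONTH))
    yearly = fmt(downtime_budget_seconds(pct, HOURS_PER_YEAR))
    print(f"{pct}% -> {monthly} per month, {yearly} per year")
```

Each additional nine divides the budget by ten, which is the arithmetic behind the steep cost curve: the time available to detect, diagnose, and fix a failure shrinks by the same factor.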
SLA vs SLO vs OLA vs KPI
Four acronyms get used interchangeably and they should not be. Each describes a different layer in the same service-quality stack.
| Term | What it is | Who agrees to it | Example |
|---|---|---|---|
| SLA (Service Level Agreement) | The contract between the provider and the customer. The promise. | Provider ↔ customer | "We commit to 99.9% uptime per calendar month." |
| SLO (Service Level Objective) | The internal target the provider sets to make sure it can hit the SLA. Usually tighter than the SLA. | Provider ↔ itself | "We engineer to 99.95% so we have headroom before we breach the 99.9% SLA." |
| OLA (Operational Level Agreement) | The internal agreement between teams inside the provider that makes the SLO achievable. | Provider's teams ↔ each other | "Networking commits to under 30 seconds of recovery time on any single-link failure, which is what gives platform engineering the headroom to hit 99.95%." |
| KPI (Key Performance Indicator) | The metric used to track performance against the SLA, SLO, and OLA. The number on the dashboard. | Anyone | "Monthly availability, currently 99.973%, measured by external synthetic monitoring." |
The shortest way to keep them straight: the SLA is the promise to the customer, the SLO is the internal target backing it, the OLA is the agreement between internal teams that makes the SLO possible, and the KPI is the metric used to track all three.
Most well-run service organizations operate all four. Skipping any one of them usually shows up as friction somewhere — missed SLAs because there was no SLO buffer, missed SLOs because no OLA existed between the teams that had to coordinate, or missed everything because the KPIs were not measuring what the contract actually committed to.
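One way to picture the relationship between the KPI, the SLO, and the SLA is as a simple status check. The sketch below is illustrative only: the default thresholds come from the examples given for each term, and the function name and status labels are assumptions rather than any standard.

```python
def availability_status(measured_pct: float,
                        slo_pct: float = 99.95,
                        sla_pct: float = 99.9) -> str:
    """Classify a measured availability KPI against the SLO and the SLA."""
    if slo_pct < sla_pct:
        raise ValueError("the internal SLO should be at least as strict as the SLA")
    if measured_pct >= slo_pct:
        return "healthy"       # meeting the internal target
    if measured_pct >= sla_pct:
        return "slo_missed"    # headroom is being consumed; SLA still intact
    return "sla_breached"      # the customer-facing promise was missed


print(availability_status(99.973))  # the example KPI value
```

The middle state is the point of keeping an SLO tighter than the SLA: it gives the provider a warning band in which to react before any customer-facing commitment is broken.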
Penalties when an SLA is missed
The consequences side of an SLA is where the contract acquires teeth. The four most common remedies, drawn from the AWS penalties section and standard vendor practice:
Service credits. A percentage of the customer's bill is credited back when the provider misses a commitment. The most common remedy in cloud and SaaS SLAs. Often capped at a percentage of monthly fees so the provider cannot lose unlimited money on a single bad month.
Earn-backs. A provision that lets the provider recover service credits by sustained over-performance — for instance, four consecutive months of meeting the SLA might earn back a credit issued during a prior breach. Common in long-term enterprise contracts.
Financial penalties. Direct cash payments above and beyond service credits, used in larger enterprise deals where service-credit caps would not be meaningful. Typically negotiated, not standard.
License extensions or termination rights. In severe or repeated breach scenarios, the customer may have the right to extend the license at no additional cost (a softer remedy) or to terminate the contract early without penalty (a harder one). The termination right is what makes the SLA enforceable in commercial reality — without it, service credits often cap the provider's exposure at a level too low to drive behavior change.
Penalty design matters more than penalty size. A small service credit that triggers reliably tends to drive more provider behavior change than a large financial penalty that is hard to invoke.
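To make the mechanics of a capped service-credit schedule concrete, here is a minimal sketch. The tiers, percentages, and cap are invented for illustration and are not taken from any vendor's actual SLA; real schedules are defined in the contract itself.

```python
# Hypothetical schedule: (availability floor %, credit as % of monthly fee).
# Tiers are checked from best to worst availability.
CREDIT_TIERS = [
    (99.9, 0.0),   # commitment met: no credit
    (99.0, 10.0),  # below 99.9% but at least 99.0%
    (95.0, 25.0),  # below 99.0% but at least 95.0%
]
WORST_CASE_CREDIT_PCT = 50.0  # applies below every floor (hypothetical)
CREDIT_CAP_PCT = 30.0         # hypothetical cap on credits in a single month


def service_credit(monthly_fee: float, availability_pct: float) -> float:
    """Return the credit owed for one month under the hypothetical schedule."""
    credit_pct = WORST_CASE_CREDIT_PCT
    for floor, pct in CREDIT_TIERS:
        if availability_pct >= floor:
            credit_pct = pct
            break
    # The cap limits the provider's exposure in any single month.
    credit_pct = min(credit_pct, CREDIT_CAP_PCT)
    return monthly_fee * credit_pct / 100


for measured in (99.95, 99.5, 97.0, 90.0):
    print(measured, service_credit(1_000.0, measured))
```

In this made-up schedule the cap is what keeps a catastrophic month from costing the provider more than 30% of the fee, which is the kind of limit that makes termination rights matter in practice.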
SLAs in customer service
The customer-service version of an SLA is the same instrument with a different content set. The "service" being committed to is response time and resolution time on customer contacts. The "performance" is measured against ticket data. The "consequences" are usually internal — escalations, executive reviews, retention exposure — rather than financial.
Customer-service SLAs are usually set per channel because the customer expectation varies by channel. A customer who emails expects a slower response than a customer on chat. A customer on the phone expects an answer in under a minute. Starting-point targets often look something like this:
| Channel | Common SLA target |
|---|---|
| Voice | 1 minute to first response |
| Chat | 5 minutes to first response |
| SMS | 10 minutes to first response |
| Voicemail | 10 minutes to first response |
| Email | 24 hours to first response |
These are starting points, not rules. The right SLA target depends on customer expectation, brand promise, volume, staffing, and the cost of missing the target. A premium brand may aim for 30 seconds on voice. A high-volume consumer ecommerce brand may set a 1-hour SLA on email during business hours.
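Per-channel targets like these translate naturally into a lookup plus a comparison. The sketch below uses the starting-point targets listed for each channel; the dictionary name, function names, and sample tickets are hypothetical, and it ignores business-hours calendars, which a real implementation would need to handle.

```python
from datetime import datetime, timedelta

# Starting-point first-response targets per channel (illustrative values).
FIRST_RESPONSE_TARGETS = {
    "voice": timedelta(minutes=1),
    "chat": timedelta(minutes=5),
    "sms": timedelta(minutes=10),
    "voicemail": timedelta(minutes=10),
    "email": timedelta(hours=24),
}


def met_first_response(channel: str, created: datetime, first_reply: datetime) -> bool:
    """True if the first reply arrived within the channel's target."""
    return first_reply - created <= FIRST_RESPONSE_TARGETS[channel]


def attainment_pct(tickets: list[tuple[str, datetime, datetime]]) -> float:
    """Share of tickets (in %) that met their first-response target."""
    if not tickets:
        return 100.0  # design choice: no tickets means nothing was missed
    met = sum(met_first_response(*t) for t in tickets)
    return 100.0 * met / len(tickets)


t0 = datetime(2024, 1, 8, 9, 0)  # arbitrary sample timestamp
tickets = [
    ("chat", t0, t0 + timedelta(minutes=3)),    # within 5 minutes
    ("chat", t0, t0 + timedelta(minutes=7)),    # missed
    ("email", t0, t0 + timedelta(hours=20)),    # within 24 hours
    ("voice", t0, t0 + timedelta(seconds=45)),  # within 1 minute
]
print(attainment_pct(tickets))
```

Reporting attainment as a percentage rather than an average response time keeps a handful of very slow replies from being masked by many fast ones.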
For the operational depth on setting, measuring, and tuning customer-service SLAs — how to benchmark them, what to track, how to use staffing to defend them — our comprehensive guide to customer service SLAs covers the implementation side. The page you are reading is the broader definition.
What are the strengths of SLAs?
SLAs have been the dominant service-commitment instrument for nearly four decades because the strengths are practical.
They make expectations concrete. "Good service" is a meaningless phrase in a contract. "99.9% uptime per calendar month, measured by external synthetic monitoring" is enforceable. SLAs are the instrument that turns one into the other.
They give both sides a forcing function. The provider has to engineer to a number. The customer has to track to a number. The shared accountability is more valuable than the number itself.
They create a paper trail. When something goes wrong, the SLA provides a frame for the conversation that follows — what was committed, what happened, what is owed. Disputes that would otherwise be open-ended become bounded.
They are widely understood. Almost every B2B technology buyer knows what an SLA is. The format is portable across vendors, industries, and contracts.
They scale. A single service-based SLA can govern thousands of customer relationships. Once written, it does not need to be renegotiated per customer.
What are the limitations of SLAs?
SLAs are useful but they are not a full description of service quality.
They reward what is measured, not what matters. An SLA committed to "first response within 4 hours" can be hit by a one-line auto-acknowledgement that does nothing to solve the customer's problem. The metric is met. The customer is not served.
They can become a ceiling. When a team optimizes against an SLA, the minimum tends to become the goal, and work that exceeds the SLA is treated as wasted effort. The risk is that the SLA defines the minimum acceptable level of service and then collapses into the only level of service.
They are usually backward-looking. An SLA breach is observable in retrospect, by which point the customer has already had the bad experience. Service credits help, but they do not undo the outage.
The exclusions section often eats the commitment. A 99.99% SLA with a generous force majeure clause and a wide-open scheduled-maintenance window can mean very little in practice. The headline number does not tell the full story until the exclusions are read.
They do not replace operational quality. A well-run service organization rarely thinks about its SLAs because the underlying engineering and operations keep it comfortably ahead of the commitments. A poorly run organization thinks about its SLAs constantly. The SLA is a backstop, not a strategy.
The practical takeaway: SLAs are necessary but not sufficient. They define the floor of acceptable service. Great customer experiences require more than simply meeting an SLA.