When your AI system goes down at 2 AM and the client calls demanding to know when it will be fixed, the answer should not come from a panicked on-call engineer's best guess. It should come from a clearly defined service level agreement that specifies response times, resolution targets, and escalation procedures that both parties agreed to before the crisis.
Service level management for AI systems is more complex than traditional software SLAs. AI systems have unique failure modes—accuracy degradation, model drift, hallucinations, and performance changes that do not trigger traditional monitoring alerts. An AI system can be technically "up" while producing increasingly unreliable outputs.
What Makes AI SLAs Different
Availability Is Not Enough
Traditional SLAs focus on availability—is the system up or down? For AI systems, availability is necessary but insufficient. An AI chatbot that is online 99.9% of the time but gives wrong answers 30% of the time is failing its purpose despite meeting availability targets.
AI SLAs must address three dimensions:
Availability: Is the system operational and responsive?
Accuracy: Is the system producing correct outputs within defined parameters?
Performance: Is the system responding within acceptable time limits?
Accuracy Degrades Gradually
Traditional systems either work or they do not. AI systems degrade gradually—accuracy drops incrementally as data distributions shift, prompts become less effective, or underlying models change. This gradual degradation requires continuous monitoring, not just uptime checks.
External Dependencies
AI systems depend on external services (model providers, cloud infrastructure, data sources) that are outside your direct control. Your SLA must account for failures in these dependencies without exposing your agency to unlimited liability.
Defining SLA Metrics
Availability Metrics
System uptime: The percentage of time the AI system is operational and responsive to requests.
- Target: 99.5% for standard managed services, 99.9% for premium
- Measurement: Automated health checks every 60 seconds
- Exclusions: Scheduled maintenance windows (define these explicitly), force majeure events, client-caused outages
API response rate: The percentage of API calls that receive a response (regardless of content quality).
- Target: 99.5% of requests receive a response
- Measurement: Server-side response logging
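As a sketch of how the availability figure might be computed from those 60-second health checks, with scheduled maintenance slots excluded per the SLA exclusions above (the function name and data shapes here are illustrative, not a prescribed implementation):

```python
def availability_pct(checks, maintenance_slots=frozenset()):
    """Compute availability from periodic health checks.

    checks: list of (slot_index, is_up) tuples, one per 60-second check.
    maintenance_slots: slot indices excluded from the calculation
    (scheduled maintenance windows, per the SLA exclusions).
    """
    counted = [(slot, up) for slot, up in checks if slot not in maintenance_slots]
    if not counted:
        return 100.0  # nothing measurable in the window
    up = sum(1 for _, ok in counted if ok)
    return 100.0 * up / len(counted)
```

Two failed checks out of a day's 1,440 yields roughly 99.86%; the same two failures inside a declared maintenance window yield 100%, which is exactly the behavior the exclusions clause is meant to produce.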
Accuracy Metrics
Output accuracy: The percentage of AI outputs that meet defined quality criteria.
- Target: Varies by use case (85-95% typical for production AI systems)
- Measurement: Automated evaluation against test sets run daily or weekly
- Important: Define what "accurate" means precisely. For document extraction, accuracy might mean "correct value for the specified field." For chatbots, accuracy might mean "response addresses the user's question based on the knowledge base."
Hallucination rate: The percentage of outputs that contain fabricated or unsupported information.
- Target: Below 2-5% depending on the use case and risk tolerance
- Measurement: Regular sampling and human evaluation
Confidence threshold compliance: The percentage of outputs that meet minimum confidence thresholds before being delivered to end users.
- Target: 100% (the system should never deliver low-confidence outputs without flagging them)
- Measurement: Automated confidence score logging
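One way to enforce and measure the confidence-threshold commitment is to gate every output and log the result. This is a minimal sketch; the 0.8 threshold and the log format are assumptions, not values from the SLA text:

```python
def gate(output, confidence, threshold=0.8):
    """Deliver a low-confidence output only with a flag -- never
    silently. The 0.8 threshold is illustrative; the real value
    comes from the client's SLA."""
    return {"output": output, "flagged": confidence < threshold}

def threshold_compliance(delivered, threshold=0.8):
    """Confidence-threshold compliance: share of delivered outputs
    that either met the threshold or were flagged (target: 100%).

    delivered: list of (confidence, flagged) pairs from the logs.
    """
    if not delivered:
        return 100.0
    ok = sum(1 for conf, flagged in delivered if conf >= threshold or flagged)
    return 100.0 * ok / len(delivered)
```

Anything below 100% compliance means a low-confidence output reached an end user unflagged, which should itself open an incident.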
Performance Metrics
Response time: The time from request submission to response delivery.
- Target: P95 under 5 seconds for interactive applications, P95 under 30 seconds for batch processing
- Measurement: Server-side timing logs
Throughput: The number of requests the system can process per unit time.
- Target: Defined based on the client's volume requirements (e.g., 100 documents per hour)
- Measurement: Throughput monitoring with alerting below threshold
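Since the response-time targets above are expressed as P95 values, the monitoring code needs an agreed percentile definition. A simple sketch using the nearest-rank method (the function names are illustrative):

```python
import math

def p95_ms(latencies_ms):
    """P95 latency via the nearest-rank method: the smallest sample
    such that at least 95% of samples are <= it."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def meets_interactive_target(latencies_ms, target_ms=5000):
    """Check the interactive target above (P95 under 5 seconds)."""
    return p95_ms(latencies_ms) <= target_ms
```

Note that a P95 target tolerates occasional outliers by design: one 9-second response among a hundred fast ones does not breach it, while a sustained slowdown does.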
SLA Structure
Service Tiers
Offer SLA tiers that correspond to your managed service pricing:
Standard SLA (included with standard managed services):
- 99.5% availability
- Response time: P95 under 10 seconds
- Accuracy monitoring: weekly evaluation
- Support hours: business hours (9 AM - 6 PM client timezone)
- Incident response: 4-hour response, next business day resolution target
- Monthly performance report
Premium SLA (included with premium managed services):
- 99.9% availability
- Response time: P95 under 5 seconds
- Accuracy monitoring: daily evaluation
- Support hours: extended (7 AM - 10 PM client timezone)
- Incident response: 1-hour response, 4-hour resolution target for critical issues
- Weekly performance report
Enterprise SLA (custom for large enterprise clients):
- 99.95% availability
- Response time: P95 under 3 seconds
- Accuracy monitoring: continuous
- Support hours: 24/7
- Incident response: 30-minute response, 2-hour resolution target for critical issues
- Real-time monitoring dashboard access
- Dedicated support contact
Incident Severity Levels
Define severity levels that determine response and resolution targets:
Critical (Severity 1): System is completely down or producing outputs that could cause harm. All users affected.
- Standard: 4-hour response, 24-hour resolution
- Premium: 1-hour response, 4-hour resolution
- Enterprise: 30-minute response, 2-hour resolution
Major (Severity 2): Significant degradation in accuracy or performance. Many users affected.
- Standard: 8-hour response, 48-hour resolution
- Premium: 2-hour response, 8-hour resolution
- Enterprise: 1-hour response, 4-hour resolution
Minor (Severity 3): Limited impact on accuracy or performance. Few users affected.
- Standard: Next business day response, 5-day resolution
- Premium: 4-hour response, 24-hour resolution
- Enterprise: 2-hour response, 8-hour resolution
Informational (Severity 4): No immediate impact. Cosmetic issues, feature requests, or minor optimization opportunities.
- Standard: 5-day response
- Premium: 2-day response
- Enterprise: 1-day response
Service Credits
When SLA commitments are not met, provide service credits as compensation:
Availability credits:
- Below 99.5% but at least 99.0%: 5% credit on monthly fee
- Below 99.0% but at least 98.0%: 10% credit
- Below 98.0%: 25% credit
Response time credits:
- P95 response time between 1.5x and 2x target: 5% credit
- P95 response time between 2x and 3x target: 10% credit
- P95 response time above 3x target: 15% credit
Maximum monthly credit: Cap total credits at 25-30% of the monthly fee. This protects your revenue while providing meaningful accountability.
Credit process: Client must request credits within 30 days of the SLA breach. Credits are applied to the next invoice, not paid as refunds.
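The credit schedule above is mechanical enough to compute automatically, which supports the proactive-credit practice described later. A sketch mirroring the tiers as written (the 25% cap and the standard-tier 99.5% baseline are taken from the text; the function shape is an assumption):

```python
def monthly_credit_pct(availability, p95_ratio, cap=25.0):
    """Service credit as a percentage of the monthly fee.

    availability: achieved availability percentage for the month.
    p95_ratio: achieved P95 response time divided by the target.
    cap: maximum total credit (the text suggests 25-30%).
    """
    credit = 0.0
    # Availability tiers (standard SLA, 99.5% target)
    if availability < 98.0:
        credit += 25.0
    elif availability < 99.0:
        credit += 10.0
    elif availability < 99.5:
        credit += 5.0
    # Response-time tiers
    if p95_ratio >= 3.0:
        credit += 15.0
    elif p95_ratio >= 2.0:
        credit += 10.0
    elif p95_ratio >= 1.5:
        credit += 5.0
    return min(credit, cap)
```

A bad month that breaches both availability and response time hits the cap rather than stacking without limit, which is the point of capping exposure.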
Monitoring and Reporting
What to Monitor
Real-time monitoring:
- System health (up/down, response time)
- Error rates (API errors, processing failures)
- Queue depth (for batch processing systems)
- AI provider health (upstream service status)
Daily monitoring:
- Accuracy metrics against evaluation sets
- Output quality sampling
- Cost per request (to detect usage anomalies)
- User feedback signals (if available)
Weekly monitoring:
- Accuracy trend analysis (is accuracy improving, stable, or declining?)
- Performance trend analysis
- Capacity utilization
- Incident count and resolution metrics
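The weekly accuracy trend question ("improving, stable, or declining?") can be answered with a least-squares slope over recent evaluation scores. A sketch, assuming weekly percentage scores; the slope threshold (points per week) is an illustrative choice, not a value from the text:

```python
def accuracy_trend(weekly_scores, decline_threshold=-0.5):
    """Classify the accuracy trend from weekly evaluation scores.

    weekly_scores: accuracy percentages, oldest first.
    decline_threshold: slope (points/week) below which the trend
    counts as declining -- an illustrative default.
    """
    n = len(weekly_scores)
    if n < 2:
        return "insufficient data"
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(weekly_scores) / n
    # Least-squares slope of score vs. week index
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weekly_scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= decline_threshold:
        return "declining"
    if slope >= -decline_threshold:
        return "improving"
    return "stable"
```

A "declining" result here is an early-warning signal worth raising before the accuracy target is actually breached.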
Reporting
Monthly performance report (for all SLA tiers):
- Availability achieved vs. target
- Accuracy metrics vs. target
- Response time performance vs. target
- Incident summary (count, severity, resolution time)
- Optimization actions taken
- Recommendations for improvement
Weekly performance report (for premium and enterprise tiers):
- Same content as monthly, at weekly granularity
- Trend analysis with early warning indicators
- Upcoming planned maintenance or changes
Alerting
Configure alerts for:
- System downtime or health check failures (immediate alert)
- Accuracy dropping below threshold (alert within 1 hour)
- Response time exceeding target (alert within 15 minutes)
- Error rate exceeding threshold (alert within 15 minutes)
- AI provider outage or degradation (immediate alert)
Define the alerting chain:
- On-call engineer receives first alert
- If not acknowledged within 15 minutes, escalate to engineering manager
- If not acknowledged within 30 minutes, escalate to agency leadership
- If severity is critical and client is on enterprise SLA, notify client simultaneously
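The escalation chain above can be sketched as a small function, which is also a useful way to unit-test the policy before an incident exercises it (the function and return values are illustrative; the 15/30-minute timings and the enterprise client notification come from the chain above):

```python
def escalation_targets(minutes_unacknowledged, severity, enterprise_client=False):
    """Return the parties notified so far for an unacknowledged alert,
    per the escalation chain: on-call engineer first, engineering
    manager at 15 minutes, agency leadership at 30."""
    targets = ["on-call engineer"]
    if minutes_unacknowledged >= 15:
        targets.append("engineering manager")
    if minutes_unacknowledged >= 30:
        targets.append("agency leadership")
    if severity == 1 and enterprise_client:
        # Critical issue on an enterprise SLA: client is notified too.
        targets.append("client")
    return targets
```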
Managing SLA Commitments
Setting Realistic Targets
Base SLA targets on demonstrated performance, not aspirational goals:
- Monitor system performance for 30-60 days before committing to SLA targets
- Set targets at 1-2% below demonstrated performance to allow for normal variation
- Never commit to targets your monitoring cannot measure
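Deriving a target from demonstrated performance is simple arithmetic once the 30-60 day baseline exists. A sketch, assuming daily scores as percentages; the 1.5-point default margin sits inside the 1-2 point range the guidance suggests:

```python
import statistics

def committed_target(measured, margin_points=1.5):
    """Derive an SLA target from a 30-60 day measurement window:
    demonstrated (mean) performance minus headroom for normal
    variation (the guidance above suggests 1-2 points)."""
    if len(measured) < 30:
        raise ValueError("need at least 30 days of measurements")
    demonstrated = statistics.mean(measured)
    return round(demonstrated - margin_points, 1)
```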
Handling SLA Breaches
When an SLA is breached:
- Acknowledge immediately: Notify the client within the committed response time. Do not wait until you have a solution—acknowledge the issue.
- Communicate regularly: Provide updates at defined intervals (every 30 minutes for critical, every 2 hours for major) until resolution.
- Resolve and document: Fix the issue. Document the root cause, the timeline, and the corrective actions.
- Post-incident review: Within 48 hours, provide the client with a post-incident report covering root cause, impact, resolution, and preventive measures.
- Process credits: If the breach triggers service credits, calculate and communicate them proactively rather than waiting for the client to request them.
External Dependency Management
Your SLA depends on services you do not control. Manage this risk:
- Monitor AI provider status pages and subscribe to incident notifications
- Define in the SLA that provider outages beyond your control are excluded from availability calculations (but still trigger your response procedures)
- Maintain contingency procedures for common provider failure scenarios
- Document the provider's SLA and share it with the client for transparency
Common SLA Mistakes
- Committing to targets you cannot measure: If you promise 95% accuracy but do not have an automated evaluation system, you cannot verify compliance. Only commit to what you can monitor.
- Ignoring AI-specific metrics: An SLA that only covers availability misses the accuracy degradation that is the most common AI system failure mode.
- Unlimited liability: SLA breaches should trigger service credits, not unlimited financial liability. Cap your exposure explicitly in the agreement.
- No exclusions: Scheduled maintenance, client-caused issues, and force majeure events should be excluded from SLA calculations. Without exclusions, routine maintenance becomes an SLA breach.
- Setting targets too aggressively: Committing to 99.99% availability when your infrastructure supports 99.9% sets you up for frequent breaches and credit payouts.
- Not reviewing SLAs regularly: Review SLA performance quarterly with the client. Adjust targets if system capabilities change significantly.
Service level management is the operational foundation of trust in managed AI services. Clear, measurable SLAs give clients confidence that their AI systems will perform reliably, and give your agency a framework for delivering that reliability consistently.