Every AI vendor offers a free trial. Most of them look great. Dashboards are polished, demo scripts are tight, and resolution numbers are high. But trials are curated experiences — and the questions that matter most are rarely on the agenda.
Before you sign anything, here are six areas where the gap between trial performance and production reality tends to be widest.
1. How they define "resolution"
Resolution rate is the headline metric in almost every AI pitch. It's also one of the most inconsistently defined numbers in the industry.
Ask for the contractual definition in writing. "Resolved" should mean the customer's issue was actually solved, not that they stopped replying, timed out, or clicked away. Some vendors measure resolution only on a curated subset of AI conversations, not your full traffic: if the AI attempts 60% of conversations and resolves 90% of those, the real rate across all traffic is 54%, not 90%. That's not your resolution rate. That's their best-case sample.
Request sample invoices from the trial period. How metrics translate to line items is rarely shown upfront. And ask specifically about partial resolutions — if the AI handles step one and a team member closes the loop, who gets credit?
The vendor that welcomes this question has thought it through. The one that deflects probably hasn't.
2. Whether the AI will go off-script
Hallucination is the risk most vendors hope you won't probe. If a vendor can't clearly explain how they measure hallucinations, or won't share that data, that's your answer.
During the trial, test with out-of-scope questions. Does the AI make something up, or does it redirect cleanly? More importantly: what guardrails do you control? Can you constrain topics, set prohibited responses, and enforce brand tone — or is the model a black box you're just trusting?
The most important test here isn't one the vendor will run for you. Bring your actual edge cases — the conversations your team dreads, the angry customers, the unusual requests. That's where the cracks show.
3. What "setup time" actually means
A go-live estimate without a detailed plan behind it is just a sales line. Get week-by-week milestones in writing. Ask who does the work: is it self-service, vendor-led, or does it require a third-party systems integrator? Hidden implementation costs add up fast and rarely appear in the initial proposal.
Clarify what "live" actually means. Is it a limited beta on one channel, or full production on your real traffic? Those are very different things, and vendors don't always volunteer the distinction.
Ask for peer references from companies your size. Implementation complexity scales with volume and organizational complexity — the logo on a vendor's website doesn't tell you what the rollout actually looked like.
4. How they handle your edge cases
Standard demos are built to succeed. Multi-intent conversations — where a customer asks three things at once, changes their mind midway, or escalates emotionally — are where AI systems fail silently.
Bring your top 10 escalation scenarios to the trial. If the vendor won't let you test them, that's the answer. Pay attention to how the AI handles distressed or angry customers — empathy and tone matter as much as accuracy. A technically correct response delivered poorly can make a bad situation worse.
Insist on metrics from your data, not their benchmark averages. Your volume, complexity, and customer base are what matter.
5. Where your data goes
Data practices are the part of an AI contract most teams review too late. Ask directly: is your conversation data being used to train their models today? What happens if that policy changes?
Review retention and deletion policies: how long are conversations stored, and who controls the purge? Don't take compliance on faith; ask for the SOC 2 report and documentation of GDPR and CCPA compliance. And ask about model update disclosure: will you be notified before a change that could alter behavior in production?
This isn't just a legal question. It's an operational one. A model update that shifts tone or changes how certain intents are handled can affect your customer experience overnight.
6. What happens when it fails
No AI system succeeds 100% of the time. The question isn't whether it will fail; it's what happens when it does.
Map the failure path before you go live. When confidence is low, does the AI auto-escalate, loop, or dead-end the customer? Test handoff quality, not just handoff speed — does the team member receive full conversation context, or does the customer have to start over?
Ask about failure reporting: can you see where and why the AI handed off, and can you use that data to improve? And ask what happens if the AI layer goes down entirely. Does the platform stay up and route to your team, or does your support operation go dark?
Free trials are designed to impress. These six questions are designed to find out what they're not showing you. The right vendor will welcome every one of them — and put their resolution definition in writing before the trial begins.
See how Gladly answers every one of these questions
Get a personalized demo and bring your edge cases. We'll show you exactly how Gladly performs on your traffic, not ours.