You've sat through the demo. The agent found the late load, drafted the perfect customer email, and the sales engineer smiled. Demos are easy — the vendor picked the load, staged the data, and rehearsed the recovery. What you're actually buying is the behavior on load four thousand, at 2 a.m., when the tracking feed is lying and the carrier's dispatcher isn't answering. These questions are how you evaluate that, from across the desk.
The nine questions
1. Which actions can it take without a person, exactly? Demand the list, in writing. A real vendor hands you action classes with gates: reads are free, internal drafts are free, and anything external (tenders, customer messages, rate changes) routes through approval. A vendor who answers "it's configurable!" without a default safety posture is telling you they haven't thought about it.
2. Show me the audit trail for the demo you just ran. Every action you just watched should have produced a record: trigger, evidence, proposal, approver, outcome. If they have to "follow up" on this one, the trail doesn't exist. In freight, the audit trail is the difference between governed AI and a chatbot with API keys.
3. What happens when it's not sure? The right answer is a specific escalation path: low confidence or missing context becomes a flagged exception for a human, with the reasoning attached. The wrong answer is any version of "it figures it out." Uncertainty handling is where freight AI either earns trust or torches a shipper relationship.
4. What happened the last time it was wrong? Real systems have been wrong. A vendor with operational history tells you a specific story: what the agent missed, how the miss surfaced, and what changed afterward. A vendor who claims it hasn't happened is either lying or hasn't run on real freight.
5. How do we start small, and what does expansion look like? You want a scoped start (one exception type, one lane, read-and-draft only) with a measurable gate for expanding: the share of drafts approved unchanged. "Full rollout in week one" isn't confidence; it's a vendor who needs the logo more than the outcome.
6. Who does the integration and tuning work, your team or mine? If the work lands on your own team, price in the months of mapping, threshold tuning, and babysitting before value shows up. A managed motion, where the vendor owns onboarding, sandbox replay, shadow mode, and monitoring, moves that burden where it belongs.
7. What does shadow mode look like before we go live? Serious vendors run their system against your live data, read-only, and let you compare its recommendations with what your team actually did, before anything touches production. If there's no shadow-mode step, you are the shadow mode.
8. What data leaves my building, and where does it go? Ask about tenant isolation, credential handling, whether your data is used for model training, and whether your shippers' data can leak into another customer's instance. Procurement will ask; better that you ask first.
9. What happens when we want out? Ask about data export, audit-history portability, and what stops working on day one after cancellation. A vendor confident in its product makes leaving easy to describe.
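If it helps to picture what good answers to questions 1, 2, and 5 look like as data, here is a minimal Python sketch. The action-class names, outcome labels, and field types are hypothetical assumptions for illustration, not any vendor's actual schema. It shows a default-deny approval gate, an audit record carrying the five fields from question 2, and the "approved unchanged" expansion metric from question 5.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical action classes that need no approval. Anything else
# (tenders, customer messages, rate changes, or any class nobody listed)
# is routed to a human: a default-deny posture, assumed here.
FREE_ACTIONS = {"read", "internal_draft"}


def needs_approval(action_class: str) -> bool:
    """Return True unless the action class is explicitly free."""
    return action_class not in FREE_ACTIONS


@dataclass
class AuditRecord:
    trigger: str             # what started it, e.g. a late pickup
    evidence: list[str]      # data the agent relied on
    proposal: str            # what the agent wanted to do
    action_class: str        # hypothetical label, e.g. "tender"
    approver: Optional[str]  # None when no approval was required
    outcome: str             # hypothetical label, e.g. "approved_unchanged"


def approved_unchanged_share(records: list[AuditRecord]) -> float:
    """Share of approval-gated drafts a human approved without edits."""
    gated = [r for r in records if needs_approval(r.action_class)]
    if not gated:
        return 0.0
    unchanged = sum(1 for r in gated if r.outcome == "approved_unchanged")
    return unchanged / len(gated)


# Usage with made-up records: two gated drafts, one approved unchanged.
records = [
    AuditRecord("late pickup", ["ELD ping"], "email shipper new ETA",
                "customer_message", "dispatcher_a", "approved_unchanged"),
    AuditRecord("late pickup", ["ELD ping"], "retender to backup carrier",
                "tender", "dispatcher_b", "edited"),
    AuditRecord("tracking gap", ["carrier portal"], "internal note",
                "internal_draft", None, "logged"),
]
print(approved_unchanged_share(records))  # 0.5
```

The default-deny choice matters for question 1: a class the vendor forgot to list still lands in front of a person instead of going out the door.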
Red flags that should end the meeting
- ROI numbers quoted before they've seen a single one of your loads.
- "Fully autonomous" as a selling point rather than a roadmap caveat — in freight, autonomy without a permission ladder is a liability with a UI.
- No straight answer on which actions are gated by approval, or gates presented as a feature you can simply switch off on day one.
- The demo can't deviate from the script — ask them to click a different load and watch what happens.
- Pricing pressure to commit before a pilot on your own freight.
We built the Haulbase Agent to survive this exact interrogation, because we'd ask the same things: it starts in read-and-draft mode, external commitments route through approval packets, onboarding runs through sandbox replay and shadow mode before production, and every recommendation writes an audit record. Bring this list to our demo too — that's what it's for.