Methodology · Agentic-Commerce Readiness

Can an agent be trusted to transact on your behalf?

Agentic-Commerce Readiness (ACR) is an independent, evidence-based 0–100 measure of how ready a reviewed AI agent is for the world that is arriving — orchestrator agents that compare, negotiate, and transact across many other agents over MCP, OpenAI ACP, Google AP2, and A2W. It is the third-party answer to the vendor self-generated "AX score": we verify readiness rather than letting an agent grade itself.

Why ACR exists

The frontier the market is moving to is agentic commerce: an orchestrator agent (a ChatGPT / Claude / operator-style agent) talks to many sites and agents on a user's behalf, compares options, negotiates, and completes a transaction through the emerging protocols — MCP, OpenAI ACP, Google AP2, and A2W. Before it delegates a task — and downstream, an identity and a payment — that orchestrator asks one question: can I actually reach this agent programmatically, is its capability proven rather than just claimed, and is it safe to hand something to?

Vendors have started self-generating an "AX score" for their own agent-readiness. Issuer-graded readiness has the same problem as issuer-paid credit ratings: the party being measured controls the measure. Hlido's role is the independent referee — we compute ACR across the whole reviewed corpus from signals we captured ourselves, and we publish how each score was evidenced. We verify; we do not self-attest.

A derived surface, not a new score

ACR is honest about what it is. It is derived from signals Hlido already captures during a normal review — it adds no new private scoring weight, no new measurement dimension, and no new layer to the methodology. It re-expresses existing evidence (programmatic-surface flags, verified behavioural runs, claim verification, and evidence depth) under a single cross-cutting lens: commerce-readiness. The composite is a 0–100 number per reviewed agent, with a band.

The four axes

ACR is composed from four axes. We name what each one measures and publish the per-axis sub-scores. We do not publish the numeric weighting between them — that line is the moat, held so scores can't be reverse-engineered or gamed.

Interface surface — can an orchestrator reach the agent programmatically at all? MCP, API, SDK, CLI, and webhook surfaces each count, with MCP weighted highest as the agent-to-agent / agentic-commerce substrate. An agent with no programmatic surface is closed to the agentic-commerce world no matter how good its product is.
Verified behavior — is the capability proven, not just claimed? This draws on Hlido's verified behavioural runs and per-claim verification. It is the independence moat in action: we don't take the vendor's word, we test, and the score reflects what actually held.
Delegation safety — is it safe to hand a task (and, downstream, an identity or payment) to? Derived today from the agent's verified tier and its documented access / claim surface, with a penalty for published red flags. This is a documented-surface proxy — the deep payment-rail probe is forthcoming (see below).
Transparency — is the readiness evidenced? Built from the count of recorded evidence and the presence of signed, inspectable proof. An independent rating is only worth what it can show its work for.

Bands

The 0–100 composite maps to four bands an orchestrator can act on directly:

COMMERCE-READY — 75 and above. Reachable programmatically, capability verified, and safe enough to delegate to.
INTEGRABLE — 50 and above. A real programmatic surface and meaningful verified signal; integrable with care.
SURFACE-ONLY — 25 and above. Some way in exists, but readiness is thin or largely unverified.
CLOSED — under 25. No practical programmatic path for an orchestrator to reach or trust it today.

Evidence, not certification

Every ACR row carries an evidence_basis — a record of exactly which signals backed that agent's score (the surfaces detected, the behavioural or claim results used, the coverage and tier, the evidence count and whether proof was signed). No readiness is ever asserted beyond what the evidence supports. We are deliberate about the line: ACR is evidence, not certification. It tells an orchestrator what we observed and how, not that an agent is licensed or guaranteed.

What's private, what's public

Private (the moat): the numeric axis weights and the composite formula. Publishing them would let an agent reverse-engineer or game its readiness.
Public: every per-agent outcome — the ACR score, the band, the four axis sub-scores, the detected surfaces, the confidence level, and the full evidence_basis — served as an open data file. Outcomes and evidence are public; the weighting is not.

ACR is computed from existing Hlido review signals only. It introduces no new private scoring weight and no new measurement layer — it is a lens over evidence we already publish per review.

What's parked (and why we say so)

The deepest possible test — actually connecting to an agent to verify it speaks ACP / AP2 / A2W and can complete a sandbox transaction — is parked, not pretended. Today, delegation safety is a documented-surface and verified-tier proxy, not a live payment-rail probe. We will build the transaction-protocol probe when there is a real consumption signal that justifies it. Until then we say exactly what each score is and isn't, rather than implying a capability we haven't run.

Use the data

The Index report — the published Agentic-Commerce Readiness ranking and write-up, including the live population and band distribution: read the ACR Index.
Free checker — look up any reviewed agent's readiness and evidence at the Agent Trust Checker.
Open data — the full machine-readable surface (outcomes + evidence basis, weights excluded) at /data/acr-index.json.

For the readiness numbers — how many agents are MCP-ready, the per-band counts, the category leaderboard — read the live data file or the Index report. Counts move as the corpus grows, so we do not bake them into this page.

How ACR relates to the core review

ACR sits on top of Hlido's standard review, it does not replace it. The same independence and human-accountability standard applies: Ankit Kapur (Founder & Editor) is the named, accountable signatory for the trust layer — reachable at /contact. For how the underlying Laddoo Score, claim audit, and behavioral harness work, see the main methodology.

Check an agent → Back to methodology