Hlido Reliability Report — 2026-06-12

Name: Hlido Weekly Reliability Report
Creator: Hlido

Independent reliability signals across 642 reviewed AI agents. Published incidents: 19 (6 critical · 4 high · 2 medium · 7 low).

Dead agents (6)

Agents whose primary product surface no longer exists — domain dead or unreachable, verified on two independent networks:

stanley-for-x — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
playht — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
oraza — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
hapax — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
fleece-ai — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
adaptive — Primary site unreachable: domain no longer resolves (first observed 2026-05-08)
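The two-network rule for dead agents can be sketched as a small classifier. This is a minimal illustration, not Hlido's actual checker. It assumes each network's DNS lookup has already been run and reduced to a pass/fail result. `ProbeResult` and `is_dead` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of resolving an agent's primary domain from one network (hypothetical shape)."""
    network: str
    resolves: bool


def is_dead(probes: list[ProbeResult], required_networks: int = 2) -> bool:
    """Flag an agent as dead only if its domain resolves on none of the probed
    networks and fails on at least `required_networks` distinct networks.

    Assumption: a single successful resolution anywhere means the site is not dead.
    """
    if any(p.resolves for p in probes):
        return False
    failing_networks = {p.network for p in probes if not p.resolves}
    return len(failing_networks) >= required_networks


print(is_dead([ProbeResult("network-a", False), ProbeResult("network-b", False)]))  # True
print(is_dead([ProbeResult("network-a", False), ProbeResult("network-a", False)]))  # False
```

Counting distinct network names means two failed lookups from the same vantage point do not count as independent confirmation.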

Findings from the review corpus

AI Agent is the largest category (230 agents) but averages 63.9, 5.5 pts below the corpus average
'AI Agent' (230 agents, 36% of the corpus) averages 63.9, against a corpus average of 69.4. 83 of its agents (36%) score below 60. The category also carries a large share of the corpus-wide red flags: 83 of the 161 unverified-claim flags and 32 of the 104 auth-opacity flags. By contrast, the top categories (Voice 79.3, Eval 79.1, Frameworks & Eval 79.0) score more than 15 points higher. The most-reviewed category is thus also one of the lowest-scoring.
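The gaps quoted in this finding follow from simple arithmetic on the reported figures. Here is a quick check in Python that uses only the numbers stated in the finding. The variable names are illustrative.

```python
# Figures as reported in this finding.
corpus_size = 642
corpus_avg = 69.4
ai_agent_count = 230
ai_agent_avg = 63.9
ai_agent_below_60 = 83
top_categories = {"Voice": 79.3, "Eval": 79.1, "Frameworks & Eval": 79.0}

share_of_corpus = round(100 * ai_agent_count / corpus_size)              # percent of corpus
gap_to_corpus = round(corpus_avg - ai_agent_avg, 1)                      # points below corpus avg
share_below_60 = round(100 * ai_agent_below_60 / ai_agent_count)         # percent scoring < 60
min_gap_to_top = round(min(top_categories.values()) - ai_agent_avg, 1)   # smallest gap to a top category

print(share_of_corpus, gap_to_corpus, share_below_60, min_gap_to_top)  # 36 5.5 36 15.1
```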
Chat & Companion (28 agents, avg 54.8) and Companion (3 agents, avg 53) are the bottom performers
Chat & Companion agents average 54.8, the lowest of any category with 5+ agents. Companion agents average 53, but there are only 3 of them, too few to meet that threshold. Together, the 31 agents in the consumer chat space average below 55. Evidence coverage is also lowest here (11% for Chat & Companion). These products frequently fail on claim verification and auth transparency. That makes them a credibility risk for Hlido if they are overrepresented in the discovery surface.
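The combined figure for the consumer chat space is a weighted mean, in which each category's average counts in proportion to its number of agents. A minimal sketch follows. It assumes the Companion average of 53 is itself rounded, so the result is approximate. `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(groups: list[tuple[int, float]]) -> float:
    """Mean over all agents in several categories: each category average is
    weighted by its agent count (not a simple mean of the averages)."""
    total_agents = sum(count for count, _ in groups)
    return sum(count * avg for count, avg in groups) / total_agents


# (agent count, category average) for Chat & Companion and Companion.
consumer_chat = [(28, 54.8), (3, 53.0)]
combined = weighted_mean(consumer_chat)
print(f"{combined:.1f}")  # 54.6
```

A plain mean of the two category averages would give 53.9, which overweights the three Companion agents. Weighting by count gives about 54.6, which is consistent with "below 55".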
Unverified claims are the #1 red flag, appearing in 161 scorecards and concentrated in the AI Agent category
Across 635 reviewed agents, 'absence/lack of verified/verifiable claims' appears as a red flag in 161 scorecards (25% of all reviews). The AI Agent category alone accounts for 83 of these (52%). Together, unverified claims (161), auth opacity (104), and sparse docs (97) total 362 flags. Many agents carry more than one of these flags, so the cluster affects an estimated 35-40% of the corpus rather than 362 distinct agents. This cluster is the single biggest trust gap in the reviewed corpus.
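The 362 total counts flags, not agents. An agent flagged for both unverified claims and auth opacity contributes twice. The sketch below reproduces the reported shares and shows how a count of distinct agents would be taken as the size of a set union. The per-agent flag sets in the toy example are hypothetical, not Hlido records, and `affected_agents` is an illustrative helper.

```python
# Totals as reported in this finding.
reviewed = 635
flag_counts = {"unverified_claims": 161, "auth_opacity": 104, "sparse_docs": 97}
ai_agent_unverified = 83

total_flags = sum(flag_counts.values())
unverified_share = round(100 * flag_counts["unverified_claims"] / reviewed)           # percent of reviews
ai_agent_share = round(100 * ai_agent_unverified / flag_counts["unverified_claims"])  # percent of those flags


def affected_agents(flagged: dict[str, set[str]]) -> int:
    """Number of distinct agents carrying at least one flag (size of the union).
    Smaller than the flag total whenever an agent carries several flags."""
    return len(set().union(*flagged.values()))


# Hypothetical toy data: agent-b carries all three flags.
toy = {
    "unverified_claims": {"agent-a", "agent-b"},
    "auth_opacity": {"agent-b", "agent-c"},
    "sparse_docs": {"agent-b"},
}
print(total_flags, unverified_share, ai_agent_share)            # 362 25 52
print(sum(len(s) for s in toy.values()), affected_agents(toy))  # 5 3
```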
Only 21% of reviews achieve high confidence; 28% are low confidence
Confidence breakdown: high=136 (21%), medium-high=94 (15%), medium=221 (35%), medium-low=5 (1%), low=178 (28%). Nearly two-thirds of reviews (404, about 64%) are medium confidence or lower. The main causes are login walls, sparse public surfaces, and the limited testability of enterprise/API products. Low-confidence reviews are disproportionately concentrated in the AI Agent category. High-confidence reviews correlate with open-source tools, CLI agents, and API-first products with public docs.
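The confidence shares can be recomputed from the counts. The five counts sum to 634. That is slightly fewer than the 642 agents in the report header and the 635 cited in the red-flag finding. This sketch uses the breakdown's own total as the denominator.

```python
confidence_counts = {
    "high": 136,
    "medium-high": 94,
    "medium": 221,
    "medium-low": 5,
    "low": 178,
}

# Denominator: the breakdown's own total, not the 642 or 635 used elsewhere.
total = sum(confidence_counts.values())
shares = {level: round(100 * n / total) for level, n in confidence_counts.items()}
medium_or_lower = sum(confidence_counts[level] for level in ("medium", "medium-low", "low"))

print(total, shares)
print(medium_or_lower, round(100 * medium_or_lower / total, 1))  # 404 63.7
```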

Registry state

Tier distribution across 642 scored agents: FADING 297 · STEADY 193 · VITAL 152.
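The tier counts sum exactly to the 642 scored agents, so each tier's share follows directly. Here is a short sketch. The `tier_shares` helper is illustrative.

```python
def tier_shares(tiers: dict[str, int]) -> dict[str, float]:
    """Percentage of scored agents in each tier, rounded to one decimal."""
    total = sum(tiers.values())
    return {tier: round(100 * n / total, 1) for tier, n in tiers.items()}


tiers = {"FADING": 297, "STEADY": 193, "VITAL": 152}
print(sum(tiers.values()))  # 642
print(tier_shares(tiers))   # {'FADING': 46.3, 'STEADY': 30.1, 'VITAL': 23.7}
```

Because the shares are rounded to one decimal, they may not sum to exactly 100.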


Independent, evidence-backed. Machine-readable edition: report.json · incidents API · MCP. Hlido never exposes scoring weights.