Coding agents reviewed by Hlido

Engineering leads who adopt a Coding agent inherit a production-critical dependency. The agent edits real files, lands real commits, and can shape code that ships to customers. A polished demo proves little about how the agent behaves under load, on retry paths, after dependency changes, or when the model is wrong. Hlido tests Coding agents in sandboxed git repos against fixed prompt batteries, captures terminal sessions and diffs as signed evidence, and scores each agent on Strategic Alpha, Execution Grit, Craft & Soul, and Value Signal.

How we evaluate Coding agents

We run install, repo-edit, refactor, test, and failure-recovery prompts in disposable git repos when the product allows it. The evidence package preserves terminal output, diffs, commits, screenshots, and any blocked steps so teams can inspect exactly what happened.
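To make the idea of an inspectable, tamper-evident evidence package concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Hlido's actual format or signing scheme, which this page does not describe. The `EvidencePackage` class, its field names, and the `sign` and `verify` helpers are all hypothetical. The sketch uses an HMAC-SHA256 tag over a deterministic JSON serialization, and it leaves out screenshots for brevity.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvidencePackage:
    """Hypothetical record of one test run; field names are illustrative."""
    agent: str
    prompt_id: str
    terminal_output: str
    diff: str
    commits: list[str] = field(default_factory=list)
    blocked_steps: list[str] = field(default_factory=list)

    def canonical_bytes(self) -> bytes:
        # Sorted keys and fixed separators make the serialization
        # deterministic, so identical evidence always yields identical bytes.
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return payload.encode("utf-8")


def sign(package: EvidencePackage, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical serialization."""
    return hmac.new(key, package.canonical_bytes(), hashlib.sha256).hexdigest()


def verify(package: EvidencePackage, key: bytes, signature: str) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(package, key), signature)
```

An HMAC needs a secret shared between signer and verifier, so it suits internal integrity checks. If third parties must verify evidence without holding the secret, a public-key signature would be the more natural choice. In either design, changing any captured field after signing (for example, editing the diff) causes verification to fail.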

Top Coding picks right now

Current top 3 by Laddoo Score across the Coding corpus.

Browse the full Coding corpus

Sortable, filterable list with tier and last-tested date.

Agent | Score | Tier | Finding | Last tested
Aider | 90/100 | VITAL | Aider 0.86.2: open-source AI pair-programming CLI. Live tested in a sandboxed git repo: it edits files when given a natural-language --message, | 2026-04-26
GitHub Copilot | 90/100 | VITAL | Public-surface review of GitHub Copilot | 2026-05-01
Replit Agent | 90/100 | VITAL | Public-surface review of Replit Agent | 2026-05-01
Sourcegraph Cody | 90/100 | VITAL | Public-surface review of Sourcegraph Cody | 2026-05-01
Tabnine | 90/100 | VITAL | Public-surface review of Tabnine | 2026-05-01
OpenHands | 90/100 | VITAL | Public-surface review of OpenHands | 2026-05-01
Sweep | 90/100 | VITAL | Public-surface review of Sweep | 2026-05-01
Zed AI | 90/100 | VITAL | Public-surface review of Zed AI | 2026-05-01
Augment Code (Intent) | 78/100 | STEADY | Public-surface review of Augment Code (Intent) | 2026-05-01
Cursor | 78/100 | STEADY | Public-surface review of Cursor | 2026-05-01
Cline | 78/100 | STEADY | Public-surface review of Cline | 2026-05-01
Continue | 78/100 | STEADY | Public-surface review of Continue | 2026-05-01
Lovable | 78/100 | STEADY | Public-surface review of Lovable | 2026-05-01
Open Interpreter | 78/100 | STEADY | Public-surface review of Open Interpreter | 2026-05-01
GPT Engineer | 78/100 | STEADY | Public-surface review of GPT Engineer | 2026-05-01
AutoGen Studio | 78/100 | STEADY | Public-surface review of AutoGen Studio | 2026-05-01
SuperAGI | 78/100 | STEADY | Public-surface review of SuperAGI | 2026-05-01
Warp AI | 78/100 | STEADY | Public-surface review of Warp AI | 2026-05-01
Windsurf | 65/100 | FADING | Public-surface review of Windsurf | 2026-05-01
ClaimCheck | 65/100 | FADING | Public-surface review of ClaimCheck | 2026-05-01
Bolt.new | 65/100 | FADING | Public-surface review of Bolt.new | 2026-05-01
Codeium | 65/100 | FADING | Public-surface review of Codeium | 2026-05-01
v0 | 65/100 | FADING | Public-surface review of v0 | 2026-05-01
Devin (Cognition) | 65/100 | FADING | Public-surface review of Devin (Cognition) | 2026-05-01
MetaGPT | 65/100 | FADING | Public-surface review of MetaGPT | 2026-05-01
Plandex | 65/100 | FADING | Public-surface review of Plandex | 2026-05-01
Poolside | 65/100 | FADING | Public-surface review of Poolside | 2026-05-01
AgentGPT | 53/100 | FADING | Public-surface review of AgentGPT | 2026-05-01
Magic | 53/100 | FADING | Public-surface review of Magic | 2026-05-01
Smol Developer | 40/100 | FADING | Public-surface review of Smol Developer | 2026-05-01