Frameworks & Eval · Reviewed 2026-06-16

SkillClaw

STEADY · 70/100

Research-grade collective skill evolution for AI agents: 1,900+ GitHub stars and an arXiv paper make this more than a weekend project, but it is a research tool, not a production one.


SkillClaw addresses a real problem that most agent frameworks ignore: agents learn nothing from their interactions. Every session starts fresh, repeating the same mistakes and rediscovering the same solutions. SkillClaw's answer is collective skill evolution: agent skills improve with every interaction, and that learning is shared across agents, sessions, and devices. The approach is genuinely novel and is backed by a published arXiv paper (2604.08377). The 1,901 GitHub stars for a research-lab project suggest the ML community takes it seriously, and compatibility with Hermes, OpenClaw, QwenPaw, IronClaw, PicoClaw, and ZeroClaw shows ecosystem investment beyond a single-paper demo.

The caution: the gap between research results, which are often measured on controlled benchmarks, and production reliability, which depends on real user interactions, edge cases, and adversarial inputs, is large. SkillClaw's value proposition is specifically that skills evolve from 'real interactions', which means early users are essentially contributing training signal, with all the quality variance that implies.

For teams that want continual skill improvement and are willing to work within a research-grade framework, SkillClaw is the most thoughtful solution in this space.

Why STEADY

STEADY (70) because the arXiv publication gives it more credibility than typical open-source projects, the 1,901 stars signal real interest from the ML community, and collective learning across agents and devices is a genuinely differentiated capability. Not VITAL because it is a research tool with open questions about production reliability, and how well skill evolution works in uncontrolled environments cannot be verified from the public surface.

What it does well

What it fails at

Best for

  • ML researchers building on agent skill learning foundations
  • Developers already using Hermes or OpenClaw agents who want continual improvement
  • Projects where the same task types recur at scale and improvement from iteration is valuable
  • Research teams that want a published-method foundation rather than proprietary black-box learning

Not recommended for

  • Production systems where skill quality variance is unacceptable
  • One-off or low-volume agent tasks (collective learning needs volume to show value)
  • Teams wanting a standalone agent framework — SkillClaw is a plugin layer, not a full framework
  • Security-sensitive deployments without vetted skill provenance controls

Compared to

Agent relevance

CLI SDK

Skill plugin for Hermes, OpenClaw, and compatible agents, installed via npx skills add. Agents call SkillClaw's skill-store endpoints to retrieve learned skills, while collective evolution happens server-side. There is no standalone API for consumption by external agents.
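
To make the retrieve-locally, evolve-centrally split concrete, here is a minimal sketch of a shared skill store. It is an illustration under stated assumptions, not SkillClaw's API or the arXiv paper's method: the names SkillVariant, SharedSkillStore, retrieve, and report_outcome are hypothetical, and the "evolution" is reduced to ranking skill variants by a Laplace-smoothed success rate pooled from every agent's reports.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SkillVariant:
    """Hypothetical: one version of a skill (e.g. a procedure) for a task type."""
    skill_id: str
    task_type: str
    instructions: str
    successes: int = 0
    attempts: int = 0

    def score(self) -> float:
        # Laplace-smoothed success rate (s + 1) / (n + 2); untried variants start at 0.5.
        return (self.successes + 1) / (self.attempts + 2)


class SharedSkillStore:
    """Hypothetical server-side store shared by many agents (not SkillClaw's real API)."""

    def __init__(self) -> None:
        self._by_task: Dict[str, List[SkillVariant]] = defaultdict(list)
        self._by_id: Dict[str, SkillVariant] = {}

    def add_variant(self, variant: SkillVariant) -> None:
        self._by_task[variant.task_type].append(variant)
        self._by_id[variant.skill_id] = variant

    def retrieve(self, task_type: str) -> Optional[SkillVariant]:
        # An agent calls this before a task and gets the best-scoring variant so far.
        # Ties go to the variant added first; unknown task types return None.
        candidates = self._by_task.get(task_type, [])
        return max(candidates, key=lambda v: v.score(), default=None)

    def report_outcome(self, skill_id: str, success: bool) -> None:
        # Any agent, in any session or on any device, feeds its result into the shared counts.
        variant = self._by_id[skill_id]
        variant.attempts += 1
        if success:
            variant.successes += 1


if __name__ == "__main__":
    store = SharedSkillStore()
    store.add_variant(SkillVariant("csv-v1", "parse_csv", "Split each line on commas."))
    store.add_variant(SkillVariant("csv-v2", "parse_csv", "Use a CSV parser that handles quotes."))

    # Two different agents report results for the two variants.
    store.report_outcome("csv-v1", success=False)  # agent A
    store.report_outcome("csv-v2", success=True)   # agent B

    best = store.retrieve("parse_csv")
    print(best.skill_id if best else None)  # csv-v2
```

The smoothing keeps a single report from pinning a variant at 0% or 100%, but it also shows why the review flags quality variance and skill provenance: in a pooled store like this, every unvetted or adversarial outcome report shifts what all agents receive next.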

Agent-friendly score: 6/10

Evidence

Public-surface checklist


Verdict by Hlido Editor · Method: public-surface-tier-2+editorial-narrative-v2 · Methodology version 2026.06 · Next review due 2026-09-16