What I stole from Hermes (and ported to OpenClaw in an afternoon).

May 19, 2026 · 10 min read

Contents

What Hermes is
1. Usage tracking from gateway.log
2. Trajectory compression
3. MCP server exposing OpenClaw conversations
4. Error classification in the claude-cli shim
What I deferred
What I rejected outright
What OpenClaw does better than Hermes
The meta-lesson
What changed

I keep doing this. Two days ago I ported five patterns from gbrain into OpenClaw. Today Nous Research's Hermes Agent was on my reading list, so I ran the same exercise: spend an afternoon, find what they've solved that I haven't, port what maps to actual pain.

This is becoming a habit. I think it's the right one. Every other agent harness I've looked at is somewhere on the same convergent architecture (gateway, per-agent workspace, hooks, cron, memory), so the question stops being "should I switch frameworks" and becomes "what's one good idea I haven't had yet."

Hermes had several.

#What Hermes is

Built by Nous Research. Python. MIT-licensed. The pitch is the same as OpenClaw: an autonomous agent that lives on your server, remembers what it learns, runs on Telegram, Discord, Slack, WhatsApp, Signal, Email, CLI, all from a single process. Designed for solo operators who want one stack across many channels.

It's a real codebase. cli.py, run_agent.py, a gateway/platforms/ directory with 40-odd adapters, a providers/ directory abstracting 30-odd model backends, a tools/skills_hub.py with cosign verification and quarantine on skill installs. Not a weekend project. The convergence with OpenClaw is striking. Two solo-ish builders, different languages, same architecture:

Hermes (Python)                     OpenClaw (TypeScript)
─────────────────                   ─────────────────────
gateway/platforms/* (40+)           gateways: telegram + whatsapp
providers/* (30+)                   claude-cli + anthropic-api fallback
agent/conversation_loop.py          @openclaw/agents (dist bundle)
skills/ + skills_hub.py             skills.entries (43, 1 enabled)
plugins/ + acp_adapter/             plugins (4 enabled / 89 disabled)
cron/scheduler.py                   LaunchAgents + cron jobs.json
gateway/platform_registry.py        hooks (14, hand-fitted)
hermes_state.py + FTS5              memory/*.sqlite (FTS5 tables exist,
                                      not queried at inject time)
mcp_serve.py                        — (now: openclaw-mcp-server.py)
trajectory_compressor.py            — (now: session_compress.py)
batch_runner.py                     —
tools/tirith_security.py            hooks/prompt-injection-scan

The left column is where Hermes is ahead. The right column is what shipped today plus what was always going to need to ship.

I read about 80% of the source. Pulled 78 specific findings. Filtered those to 10 patterns worth porting. Shipped four. Deferred two. Rejected the rest.

#1. Usage tracking from gateway.log

Hermes tracks per-turn cost via agent/conversation_loop.py importing usage_pricing.estimate_usage_cost. Every turn gets normalised, costed, and exposed via a /usage slash command. The operator sees the bill before it lands.

I wrote about the $300 version of this a while back. The fix then was a daily sanity probe. The fix now is per-turn aggregation.

My version is constrained by what the claude-cli backend exposes. The log gives me durationMs and rawLines. No token counts (Claude Max is flat-rate, and the CLI doesn't surface them). So the tracker is duration-and-density, not dollars:

{
  "date": "2026-05-18",
  "turns": 30,
  "by_model": {"haiku-4-5": 1, "opus-4-7": 29},
  "duration_p50_ms": 8441,
  "duration_p95_ms": 76279,
  "raw_lines_p95": 143
}

The number I actually want is "did anything spike today." Detection has two gates: today's turn count is compared against the 7-day median and must clear both a 2x ratio and a 50-turn absolute delta. Spikes trigger a DM via Happy. Otherwise it's silent.

LaunchAgent at 23:00 SGT. JSONL is idempotent on date, so running it twice the same day overwrites instead of duplicating.
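A minimal sketch of how the two gates and the idempotent write could fit together. It is illustrative rather than the shipped script: the function names, the `>=` comparisons, and the choice to stay silent when there is no baseline are all assumptions.

```python
import json
import statistics
from pathlib import Path

RATIO_GATE = 2.0   # today must be at least 2x the 7-day median
DELTA_GATE = 50    # and at least 50 turns above it


def is_spike(today_turns: int, prior_turns: list[int]) -> bool:
    """Two-gate check: the ratio AND the absolute delta must both clear."""
    if not prior_turns:
        return False  # assumption: no baseline yet means no alert
    median = statistics.median(prior_turns[-7:])
    clears_ratio = today_turns >= RATIO_GATE * median
    clears_delta = (today_turns - median) >= DELTA_GATE
    return clears_ratio and clears_delta


def upsert_daily_record(path: Path, record: dict) -> None:
    """Rewrite the JSONL so each date appears exactly once."""
    rows = []
    if path.exists():
        for line in path.read_text().splitlines():
            if line.strip():
                rows.append(json.loads(line))
    # Drop any earlier row for the same date, then append the fresh one.
    rows = [r for r in rows if r.get("date") != record["date"]]
    rows.append(record)
    rows.sort(key=lambda r: r["date"])
    path.write_text("".join(json.dumps(r) + "\n" for r in rows))
```

The absolute gate keeps a quiet week from alerting on a jump from 5 turns to 12, and the ratio gate keeps a busy week from alerting on ordinary noise.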

#2. Trajectory compression

Hermes's trajectory_compressor.py is the move I'd been quietly avoiding. The Lawrence agent has a 3.97 MB trajectory file. That file gets read into context on every gateway boot. It's been growing since April. There's no mechanism to bound it.

The Hermes pattern: protect the first turn (system prompt, opening user message). Protect the last 4 turns (most recent context). LLM-summarise everything in between. Store the result as a new file with a parent_session_id pointer to the original. Never delete the original.

3.97 MB lawrence/47ac4df0.trajectory.jsonl
   │
   ├─ turn 1   (opening)            ─────→ kept
   ├─ turn 2  ┐
   ├─ turn 3  │
   ├─ ...     ├── 48 turns ──→ claude -p ──→ 1 summary row
   ├─ turn 49 ┘                                │
   │                                           ▼
   ├─ turn 50  ─────→ kept              {type: compressed_middle,
   ├─ turn 51  ─────→ kept                parent_session_id: 47ac4df0,
   ├─ turn 52  ─────→ kept                original_turn_count: 48,
   └─ turn 53  ─────→ kept                summary: "..."}

result: 53 → 6 turns, 3.97 MB → 467 KB (88% saved)
        original file preserved alongside .compressed.jsonl

That last rule matters to me. There's a feedback memory called feedback_no_delete_sessions saying never delete OpenClaw session transcripts, because deleting them resets agent memory. The Hermes pattern respects that. The compression sits alongside the original; the daemon can choose which to read at boot.

My version runs as a nightly LaunchAgent at 03:00 SGT. Walks ~/.openclaw/agents/*/sessions/*.trajectory.jsonl, finds anything ≥50 KB, compresses via claude --dangerously-skip-permissions -p. The summariser is a thin wrapper so tests can stub it.

Seventeen candidates queued across lawrence, main, clawrence, ambassador, and vibby. The cron picks them up tonight.
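The keep-first, keep-last-four, summarise-the-middle rule can be sketched in a few lines. This is a sketch under assumptions, not Hermes's code or the shipped session_compress.py: `compress_turns` and its signature are hypothetical, and the summariser is passed in as a plain callable so a test can stub it. The field names on the summary row come from the diagram above.

```python
from typing import Callable

KEEP_HEAD = 1  # the opening turn
KEEP_TAIL = 4  # the most recent turns


def compress_turns(
    turns: list[dict],
    summarise: Callable[[list[dict]], str],
    parent_session_id: str,
) -> list[dict]:
    """Return a new list: head, one summary row, tail. The input is not mutated."""
    if len(turns) <= KEEP_HEAD + KEEP_TAIL:
        return list(turns)  # no middle to compress
    head = turns[:KEEP_HEAD]
    middle = turns[KEEP_HEAD:-KEEP_TAIL]
    tail = turns[-KEEP_TAIL:]
    summary_row = {
        "type": "compressed_middle",
        "parent_session_id": parent_session_id,
        "original_turn_count": len(middle),
        "summary": summarise(middle),
    }
    return head + [summary_row] + tail
```

With a stub such as `lambda middle: "stub"`, a 53-turn input comes back as 6 rows with `original_turn_count` set to 48, matching the diagram. Writing the result to a separate `.compressed.jsonl` and leaving the source file alone is the caller's job.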

#3. MCP server exposing OpenClaw conversations

This is the one I'm most happy about. Hermes has mcp_serve.py that exposes the agent's own conversations as MCP tools. Read this carefully: agents typically call MCP servers. Hermes is an MCP server. Any MCP client (Claude Code, Cursor, this very session) can list conversations, tail messages, read agent memory, ask about health.

before                              after
──────                              ─────

  Claude Code session                 Claude Code session
       │                                   │
       │ ssh yash@mac-mini                 │ openclaw_recent_messages(agent='main')
       │ grep -r "..." logs                │
       │ cat trajectory.jsonl              ▼
       │ hope file isn't 4 MB        ~/.openclaw/scripts/
       ▼                             openclaw-mcp-server.py
   noise + context switch                  │
                                            ▼
                                     {agents, turns, memory, health}

I'd been thinking about agent inspection as "ssh in and tail the log." Hermes inverted it. The agent is the source of truth, so expose it via the protocol everyone already speaks.

I hand-rolled mine over JSON-RPC stdio. No fastmcp dep (the Python SDK isn't installed on the Mac Mini and I'm not adding pip deps to a load-bearing daemon). About 130 lines:

openclaw_agent_list           agents + workspace status + sessions KB
openclaw_agent_memory_read    MEMORY.md for a given agent
openclaw_recent_messages      last N turns from latest trajectory
openclaw_health_status        daemon status + gateway.log mtime
openclaw_usage_today          last 7 days of usage records

All read-only. Allow-listed agent names (no path traversal). Truncates outputs over 24 KB. Logs server-side errors to a file because anything on stdout corrupts the JSON-RPC stream.
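A rough sketch of those guardrails over line-delimited JSON-RPC. Everything here is illustrative rather than the actual server: the allow-list contents, the `handle` and `serve` names, the `[truncated]` marker, and the injected `read_memory` callable are assumptions. The `tools/call` request and `content` result shapes follow my reading of the MCP tool-call convention, and the `initialize` and `tools/list` handling a real server needs is left out.

```python
import json
from typing import Callable, TextIO

# Hypothetical allow-list; names not in this set are rejected outright.
ALLOWED_AGENTS = frozenset({"main", "lawrence", "clawrence", "ambassador", "vibby"})
MAX_OUTPUT_BYTES = 24 * 1024


def truncate(text: str, limit: int = MAX_OUTPUT_BYTES) -> str:
    """Cap output at `limit` UTF-8 bytes without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    return raw[:limit].decode("utf-8", errors="ignore") + "\n[truncated]"


def _error(req_id, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle(request: dict, read_memory: Callable[[str], str]) -> dict:
    """Turn one JSON-RPC request into one JSON-RPC response."""
    req_id = request.get("id")
    if request.get("method") != "tools/call":
        return _error(req_id, -32601, "method not found")
    params = request.get("params") or {}
    if params.get("name") != "openclaw_agent_memory_read":
        return _error(req_id, -32602, "unknown tool")
    agent = (params.get("arguments") or {}).get("agent")
    # Exact-match allow-list: "../../etc/passwd" never reaches the filesystem.
    if not isinstance(agent, str) or agent not in ALLOWED_AGENTS:
        return _error(req_id, -32602, "unknown agent")
    text = truncate(read_memory(agent))
    return {"jsonrpc": "2.0", "id": req_id, "result": {"content": [{"type": "text", "text": text}]}}


def serve(stdin: TextIO, stdout: TextIO, log: TextIO, read_memory: Callable[[str], str]) -> None:
    """Read one request per line; only valid responses ever touch stdout."""
    for line in stdin:
        if not line.strip():
            continue
        try:
            response = handle(json.loads(line), read_memory)
        except Exception as exc:  # errors go to the log, never to stdout
            log.write(f"error: {exc!r}\n")
            continue
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()
```

Passing the streams in as arguments keeps the stdout discipline testable: a test can feed `io.StringIO` objects and check that a malformed line produces a log entry and no output.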

Registered with claude mcp add openclaw python3 /Users/yash/.openclaw/scripts/openclaw-mcp-server.py. Within ten seconds of registering, I could ask my Claude Code session "what's in Claudia's MEMORY.md right now" and get a fresh answer. Previously that was a context switch.

I think this changes how I'll work with OpenClaw. Less ssh-and-grep, more conversation.

#4. Error classification in the claude-cli shim

The current setup: when claude-cli flakes (OAuth refresh race, network blip, Anthropic 429), OpenClaw blindly falls back to the Anthropic API. The fallback works. It also costs money the Max subscription is meant to cover. And once you fall over, you stay there until the daemon restarts.

Hermes classifies the error before deciding what to do. agent/conversation_loop.py imports error_classifier.FailoverReason, anthropic_adapter._is_oauth_token, nous_rate_guard. On error, it asks: is this auth, rate, context, network, or server? Pick the right strategy. Restore primary after the fallback succeeds.

My port is a 100-line Python classifier in ~/.openclaw/scripts/lib/shim_classify.py and a 30-line patch to claude-shim.sh. The shim now tees stderr, runs the classifier in a background subshell so the gateway doesn't wait on disk, and writes the verdict to ~/.openclaw/state/shim-last-error.json:

{
  "ts": "2026-05-18T03:17:38Z",
  "exit_code": 1,
  "category": "auth",
  "should_retry_primary": false,
  "should_fail_over": true
}

Categories: ok, auth, rate, context, network, server, canceled, unknown. Decision matrix: rate, server, and network retry the primary after backoff. auth and unknown fail over to the API for this turn only. context doesn't get retried at all (failover doesn't fix a 200K-token prompt).
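The decision matrix reduces to two set lookups. A minimal sketch, assuming hypothetical names (`Verdict`, `decide`, `state_record`) and two behaviours the post doesn't specify: ok and canceled neither retry nor fail over, and an unrecognised category is treated as unknown. Mapping stderr text to a category is deliberately left out, since that depends on real claude-cli output.

```python
from dataclasses import asdict, dataclass

CATEGORIES = {"ok", "auth", "rate", "context", "network", "server", "canceled", "unknown"}
RETRY_PRIMARY = {"rate", "server", "network"}  # transient: back off, try claude-cli again
FAIL_OVER = {"auth", "unknown"}                # use the API for this turn only


@dataclass(frozen=True)
class Verdict:
    category: str
    should_retry_primary: bool
    should_fail_over: bool


def decide(category: str) -> Verdict:
    """Map an error category to a retry-versus-failover decision."""
    if category not in CATEGORIES:
        category = "unknown"  # assumption: anything unrecognised is 'unknown'
    return Verdict(
        category=category,
        should_retry_primary=category in RETRY_PRIMARY,
        should_fail_over=category in FAIL_OVER,
    )


def state_record(category: str, exit_code: int, ts: str) -> dict:
    """Build a record shaped like shim-last-error.json."""
    return {"ts": ts, "exit_code": exit_code, **asdict(decide(category))}
```

For example, `state_record("auth", 1, "2026-05-18T03:17:38Z")` yields the same fields as the JSON above, and `decide("context")` returns both flags false.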

The shim has an off-switch (OPENCLAW_SHIM_CLASSIFY=0) that reverts to the original exec-through behaviour. The daemon doesn't read the state file yet. The classifier and the consumer ship separately because changing daemon behaviour is the bigger blast radius.

#What I deferred

Two patterns I genuinely want but didn't ship. Both for the same reason.

FTS5 memory injection. Hermes uses SQLite FTS5 to retrieve relevant prior turns at context-assembly time. OpenClaw already has FTS5 tables (chunks_fts in ~/.openclaw/memory/*.sqlite) but the daemon never queries them. The infrastructure is 80% there. The remaining 20% is a patch into the @openclaw/* npm dist.

Jittered backoff. Six agents retrying in lockstep create a thundering herd. Hermes's retry_utils.jittered_backoff adds random.uniform(0, base) to each retry delay so concurrent retries de-correlate. Also a dist patch.

Both sit on the request hot path. Patching the daemon mid-day on a production system serving real customers is the exact thing I wrote about wanting to avoid two weeks ago. So they go into ~/openclaw/plans/ as design notes and wait for an isolated test workspace.

That test workspace is the next thing.
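For reference, the jittered backoff idea is small enough to sketch here. Only the `random.uniform(0, base)` term comes from the description above; the exponential growth, the cap, and the injectable `rng` parameter are assumptions added so the example is complete and testable, and this is not Hermes's implementation.

```python
import random


def jittered_backoff(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Capped exponential delay plus uniform jitter in [0, base]."""
    delay = min(cap, base * (2 ** attempt))  # assumption: exponential with a cap
    return delay + rng.uniform(0, base)      # the jitter term de-correlates callers


if __name__ == "__main__":
    # Six agents on the same attempt number land on different delays.
    rng = random.Random()
    print([round(jittered_backoff(2, rng=rng), 3) for _ in range(6)])
```

Without the jitter term, six agents that fail at the same moment would all retry at exactly 4 seconds on attempt 2. With it, their retries spread across a one-second window.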

#What I rejected outright

A few patterns I deliberately skipped:

Multi-provider routing. Hermes supports 30-odd providers (Anthropic, OpenAI, OpenRouter, NovitaAI, NVIDIA NIM, MiMo, GLM, Kimi). I have a feedback rule saying every agent should be Claude Max primary, API only as fallback. Multi-provider is a feature for someone who doesn't have a flat-rate sub.

40 platform adapters. I have Telegram and WhatsApp. SMEs in Singapore live on WhatsApp. Adding Signal, Email, Matrix, Discord, Feishu, DingTalk, QQBot is feature creep dressed up as flexibility. The gateway/platforms/ directory in Hermes is impressive engineering for a use case I don't have.

Skills Hub with GitHub provenance. Hermes treats skills as portable bundles distributed via agentskills.io, with cosign verification and quarantine. The unit is right. The distribution overhead isn't, for me. My agents are bespoke. There's no community to download skills from.

Tirith pre-exec scanning. A separate binary that scans content for homograph attacks, pipe-to-interpreter, terminal injection before tool execution. I already have a prompt-injection-scan hook. Tirith would be parallel infrastructure for marginal additional coverage.

#What OpenClaw does better than Hermes

Worth naming, because it's tempting to assume the bigger open-source project wins:

Config-audit log with auto-revert. Every write to openclaw.json is logged with hash and process info to config-audit.jsonl. Invalid edits silently revert to .last-good. Hermes has no equivalent. The number of times this has saved me from a typo at midnight is high.

Drift detector. Daily cron that reads trajectory files, computes reply / no_reply / refusal rates per agent, alerts on ±30% deviation from the 7-day baseline. 57 pinned tests. Hermes has cost tracking but no behavioural drift detection.

claude-cli backend on Claude Max. Routes through my flat-rate subscription instead of paying per-token. Hermes is API-key only. The savings compound month over month.

Hooks tightly fitted to real customer flows. ambassador-inbound-gate qualifies leads before they reach Ambassador. whatsapp-customer-data-guard prevents PII reaching the model. bootstrap-extra-files injects Claudia's wa-training corpus. Hermes's plugin registry is more general; my 14 hooks are more useful, because they were each born of a specific incident I'd otherwise repeat.
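To make the drift detector concrete, here is a minimal sketch of the rate comparison. It is not the shipped detector. It assumes the ±30% is a relative deviation from the baseline rate, and that an outcome with a zero baseline is skipped rather than flagged; `outcome_rates` and `drifted` are hypothetical names.

```python
from collections import Counter

OUTCOMES = ("reply", "no_reply", "refusal")
THRESHOLD = 0.30  # assumption: relative deviation from the 7-day baseline rate


def outcome_rates(outcomes: list[str]) -> dict[str, float]:
    """Share of each outcome among the turns that have a known outcome."""
    counts = Counter(outcomes)
    total = sum(counts[o] for o in OUTCOMES)
    if total == 0:
        return {o: 0.0 for o in OUTCOMES}
    return {o: counts[o] / total for o in OUTCOMES}


def drifted(today: dict[str, float], baseline: dict[str, float],
            threshold: float = THRESHOLD) -> list[str]:
    """Return the outcomes whose rate moved more than `threshold` from baseline."""
    flagged = []
    for outcome in OUTCOMES:
        base = baseline.get(outcome, 0.0)
        if base == 0.0:
            continue  # assumption: no relative deviation without a baseline
        if abs(today.get(outcome, 0.0) - base) / base > threshold:
            flagged.append(outcome)
    return flagged
```

With a baseline of 80% reply, 15% no_reply, 5% refusal and a day at 50% reply, 15% no_reply, 35% refusal, this flags reply and refusal and leaves no_reply alone.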

#The meta-lesson

Two solo-built agent harnesses, converging on the same architecture, optimising for different things. Hermes optimises for generality (40 platforms, 30 providers, MIT licence for anyone). OpenClaw optimises for fit (my agents, my channels, my customers).

Both are valid. But generality compounds slower in production than fit does, in my experience. The agent that knows about exactly your customers and your WhatsApp groups and your invoice flow is the agent that gets used. The agent that supports 40 platforms might be the framework someone else builds the fitted one on top of.

If I were Nous, I'd be optimising for generality. I'm not. I'm optimising for fit.

#What changed

After about four hours of work:

Before                                            After
──────                                            ─────
No per-turn usage record                          Daily usage JSONL with spike alerts
Trajectory files grow forever                     Nightly compression with 88% savings on the biggest one
Agent state lives on the Mac Mini disk            Agent state exposed via MCP, queryable from any client
Blind fallback to API on every claude-cli flake   Classified errors, smart retry-versus-failover decision
1702 tests                                        1808 tests

Plus two design notes in ~/openclaw/plans/ for the hot-path patches that need an isolated environment first.

The next frontier, I think, is two ideas Hermes only partially has:

Agents as MCP servers, not just MCP clients. Once your agent's state is queryable over the same protocol your dev tools speak, the boundary between "what the agent does" and "what you can ask about it" collapses. I've been live with this for three hours and I can already feel the leverage.

Skills as portable units. Hermes's agentskills.io standard is the right shape but the wrong abstraction for my world. I want skills to be tight, signed, versioned, locally-cached bundles I can drop into any of my agents. The pattern's there. The right unit for a one-person operation is still being worked out.

If you're running your own agent harness, Hermes is worth reading even if you don't use any of it directly. The trajectory compressor is the single highest-leverage idea, and it's about 200 lines of Python. Most of the rest is the bits you'd think to build eventually anyway.

The full diff of my changes is in yash-gadodia/openclaw (private, but happy to share if you ping me). The Telegram report from the patch run is in the Voltade Team thread, where Clawrence posted it on my behalf about an hour after the last test went green.