I had Claude audit my Claude Code use

Contents
  1. I ship a lot. I verify too little.
  2. I interrupt Claude a lot. That's the workflow.
  3. I'm Bash-heavy. That means I'm leaving leverage on the table.
  4. What I think I missed

There's a small tool floating around that scans your Claude Code session logs and tells you what you're actually doing with the thing. I ran it on Friday. 82 sessions, 12 weeks, 4,888 Bash calls. The output is a long HTML report broken into seven sections: usage, patterns, friction, features, wins, work, horizon.

The wins section is fine. The friction section was uncomfortable.

Three things came back that I didn't want to hear.

# I ship a lot. I verify too little.

Across the period, my deploys ended in "done" significantly more often than they ended in "verified". Claude pushed the change, the build went green, both of us called it a day. The site was sometimes still broken.

The Bon Pet formulas page is the worst case. Five version bumps in a row (v3.10.1 → v3.10.5) because Claude's curl to verify the change hit a stale Cloudflare shard while my browser was already showing me the broken state. We were both running checks. They disagreed. Both of us trusted our own.

The fix was unsubtle. I added a section to CLAUDE.md that says: when you push, do not say done until you (1) curl the live URL with a cache-bust query param, (2) grep for the specific change, (3) paste the matched line back as proof. If you can't verify in 60 seconds, say so. Don't claim success on push alone.

It's a five-line rule. It catches a class of bug that wasted hours.
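
The rule itself lives as prose in CLAUDE.md, but as a script it would be roughly this. A minimal sketch, assuming the change is visible in the page source; the URL and the expected string are placeholders, not the real page:

```bash
#!/usr/bin/env bash
# Sketch of the post-push ritual. URL and EXPECTED are placeholders:
# substitute the live page and the string the deploy should have changed.
set -euo pipefail

URL="https://example.com/formulas"
EXPECTED="v3.10.5"

# (1) Fetch the live URL with a cache-bust query param so a stale
#     CDN shard can't hand back yesterday's page.
page=$(curl -fsS "${URL}?cb=$(date +%s)")

# (2) Grep for the specific change and (3) echo the matched line as proof.
if match=$(grep -m1 -F "$EXPECTED" <<<"$page"); then
  echo "verified: $match"
else
  echo "NOT VERIFIED: '$EXPECTED' not found at $URL" >&2
  exit 1
fi
```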

The deeper thing this surfaced: I'd been trusting Claude's "deployed successfully" output the same way I'd trust a git push exit code. I shouldn't. A green build is a build, not a deploy. A successful deploy still isn't necessarily a working site.

# I interrupt Claude a lot. That's the workflow.

The report counted 14 user-rejected actions and 39 wrong-approach events in 82 sessions. I let Claude start, watch the first move, course-correct fast. It's not a complaint. It's how I work.

Some of the misreads are funny in retrospect. Claude misread "packs" as product-packs when I meant pax (people you're feeding). It treated henlo, the configured prefix on my statusline, as my unsubmitted message. It tried to fix pre-existing em-dash test failures during a deploy when I'd just typed just push.

The pattern in all three: I gave a short directive in domain language and Claude went looking for a smarter problem to solve. The smarter problem was the wrong problem.

What's helped: front-loading a one-line restatement before complex tasks (the goal in one sentence, the files I expect to touch, what done looks like, then wait for confirmation). What's helped more: telling Claude that when a directive is clear, just execute. just push means just push. clean checkout means clean checkout. Save the analysis for after.

I added that to CLAUDE.md too. Felt obvious in hindsight. Wasn't, until I read the friction list.

# I'm Bash-heavy. That means I'm leaving leverage on the table.

4,888 Bash calls. 1,298 Edits. Roughly four shell calls for every file edit.

That ratio is not a flex. It's a tell. It says I'm using Claude tactically. I run shell calls, watch the output, type the next thing. There are bigger gestures available (skills, hooks, parallel agents) and I'm not reaching for them.

The clearest counter-example is one session where I dispatched 10 parallel research agents on a competitor question. That was the highest-leverage half-day I've had with this tool. I haven't done it since. I keep going back to single-thread, type-into-the-prompt, watch-the-Bash-call-fly-past mode.

The report's suggestion was specific: codify repeating workflows as skills, drop hooks at lifecycle events, lean on parallel agents for anything that spans more than one system. I'm taking the first one immediately. A /deploy-verify skill that runs the verification ritual end to end. The hook is next, a PostToolUse nudge when buggy patterns show up in TypeScript or Python edits. Parallel agents I'll force myself to use the next time a question spans n8n + Shopify + Telegram, rather than typing into one chat and hoping.
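
For the hook, here's roughly what I have in mind. A sketch, assuming the documented hook interface (Claude Code hands the tool event to the hook as JSON on stdin, and a PostToolUse hook that exits 2 gets its stderr shown to Claude); the "buggy patterns" here are stand-ins, not a real lint:

```bash
#!/usr/bin/env bash
# Sketch of a PostToolUse hook command. The tool event arrives as
# JSON on stdin; jq pulls out the path of the file that was edited.
set -euo pipefail

file=$(jq -r '.tool_input.file_path // empty')

# Only nudge on TypeScript and Python edits.
[[ "$file" == *.ts || "$file" == *.py ]] || exit 0

# Exit code 2 feeds stderr back to Claude as the nudge; a clean file
# exits 0 and stays silent. The grep patterns are placeholders.
if grep -nE 'console\.log\(|print\(' "$file" >&2; then
  echo "possible leftover debug output in $file, double-check before done" >&2
  exit 2
fi
```

Registered under PostToolUse with an Edit matcher in settings, that's a dozen lines of guardrail for the exact failure mode the report kept flagging.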

# What I think I missed

Reading 12 weeks of your own transcripts is the audit you don't think you need. The thing about Claude Code is that the per-session experience is fine. You ask, it acts, you ship, you move on. The pathologies only show up in aggregate: the same friction class, fifteen times across two months, until you sit down and count.

The closest analogue I can think of is reading your own commit history with fresh eyes. Most people don't. Most of the actual signal is there.

If you use this tool a lot, run the audit. The output will be more boring and more useful than you expect.