Blog

AI products in production: agents, evals, and the decisions behind them.
The system behind 230K interactions a day
The two numbers on my CV, unpacked: what counts as an interaction, which layer the 99.65% is measured at, and the eval, guardrail, and cost subsystems that keep both honest.
Creating a tenant is an INSERT
Voltade's third swing at the same problem. Studio asked SMEs to build, Vobase had us building for them, and Volty bets they'll configure. What changed each time, and why.
How I keep production agents on the rails
Production agent safety is not about jailbreaks. It is about an agent confidently doing the wrong thing on a customer's WhatsApp for six hours before anyone notices. Here are the failure modes that actually happen, the guardrails that work, and how I prove the guardrails are working.
How I Evaluate AI Agents (and Why Most Teams Get It Wrong)
Most agent evals measure the wrong things. After running two agents in production for six months, here's the framework I actually use, with real metrics, LLM-as-judge calibration data, and the $300 lesson that started it all.
The 24-hour window versus the approval queue
Volty agents draft, humans approve. WhatsApp gives you 24 hours to reply. What happens when the approval is slower than the window, and the keep-alive sweep we built for the collision.
Why Studio didn't work
We built a no-code agent builder so SMEs could build their own agents. Almost nobody did. Why Studio failed, and what we built instead.
How I pick what to ship between Envoy and Studio
The mechanic I actually use to prioritise across two 0-to-1 AI products as the only PM at Voltade. The Studio pivot, the Thursday rule, and the call I got wrong.
Cloudflare Tunnels and Tailscale, and when each one wins
I have both installed on my Mac Mini. Here's why Cloudflare Tunnels does most of the work and where Tailscale still earns its keep.
Sharing Claude Code config across two Macs (and letting them diverge)
How I sync my global Claude Code config between two machines using iCloud symlinks, and why I let machine-specific settings drift instead of forcing a single source of truth.
What an applied AI lab in Singapore should build first
Eighteen months building applied AI for SEA SMEs at Voltade. Here's what I'd build first if I were starting Singapore's new applied AI lab.
What I stole from Hermes (and ported to OpenClaw in an afternoon).
Nous Research open-sourced Hermes Agent, a harness in the same shape as OpenClaw. I read it for an afternoon and ported four patterns: a usage tracker, trajectory compression, a stdio MCP server, and an error classifier in the claude-cli shim. Two more I deferred. Two I rejected outright.
What I stole from gbrain (and ported to OpenClaw in one afternoon).
Garry Tan's gbrain has a few design moves that solve real OpenClaw pain. I ported five of them: a durable job queue, a config doctor, storage tiering, a hot-memory recall CLI, and a compiled-truth memory structure. Here's what worked, what didn't, and what almost leaked to a customer channel.
How Vobase agents learn (and why SME owners notice)
A walk through the self-learning loop inside Vobase: wake events, staff signals, change proposals, applied skills. The technical mechanism that turns adaptive software from a slide into a thing SME owners actually trust.
Seven agents on a Mac Mini: four months of breaking my OpenClaw harness
I run seven personal AI agents on a Mac Mini in my flat. They book my gym classes, manage my inbox, handle outreach for two academies, and watch Voltade customer groups. The harness took four months to stabilise. The two things that fixed it were scope discipline and deterministic flows. Here is what each agent does, what broke, and the patterns that finally stuck.
An agent is staff, not magic
Notes on writing a model behaviour spec for an agent that works for someone. What it does, what it refuses, who it serves when those conflict, and why most system prompts are a lazy substitute for a job description.
Cost is a product feature
Notes on when to use Haiku, when Opus is worth it, and why model choice is a product decision wearing engineering clothes.
I had Claude audit my Claude Code use
82 sessions, 12 weeks, 4,888 Bash calls. I pointed an audit tool at my own Claude Code transcripts. Three things came back that I didn't want to hear.
The 20% That Is the Business
Templates handle the common 80%. The remaining 20% is every customer's actual business. Here's what that 20% looked like for one bakery, one day, five bugs.
Adaptive Software
Fixed SaaS loses to models. Blank-canvas AI builders lose to the cold start problem. What wins is 80%-ready software that the business shapes by talking to it.
claude-init: Make Any Repo AI-Native in One Command
Every time you open Claude Code on a new repo, you start from zero. claude-init fixes that by analysing your codebase and generating a complete .claude/ configuration.
I Gave Claude $100 and Told It to Trade Crypto
Building an autonomous crypto trading bot with Claude Code, $100 in USDC, and a lot of guardrails. From first trades to losing everything, then rebuilding the strategy from scratch based on actual research.
Teaching Claude How to Write Like Me
I built a Claude Code skill that writes blog posts in my voice from any session. The hard part wasn't the workflow, it was the voice.
Envoy CRM: What I Learnt Getting 20 SMEs to Actually Use a CRM
How we built Envoy from zero to 20+ paying SME customers in Singapore. Dogfooding, grant-driven GTM, killing features, and the product decisions behind it all.
How This Site Was Built With Claude Code
This entire website was built, redesigned, and maintained by Claude Code. Here's every prompt, tool, and workflow behind it.
WIMAUT: Because Your Agents Won't Tell You They're Burning $300
I built an agent observability dashboard after an OpenClaw cron job silently burned $300. Here's the problem and what WIMAUT does.
Self-Hosting an LLM on a Mac Mini
How I set up a local LLM on a base-model Mac Mini, put it behind a public URL with Cloudflare Tunnels, and locked it down with Cloudflare Access. No port forwarding required.
Clawrence and Claudia: Building AI Agents That Actually Do Things
I built two OpenClaw agents for Voltade. One manages our customers. The other runs our internal ops. Here's what I learned about deployment, models, and what agents are actually good for.
The Death of SaaS, and What Comes After
Software's marginal cost is approaching zero. SaaS as a business model is dying. The winners will be AI-native services firms, not tool makers.
When to automate: a Telegram-to-Notion bot in a day
A predictable workflow that interrupts focus is worth automating. How I evaluated three approaches and shipped an internal tool in a day.
My Ninja Van Internship Experience
This post lives on Medium. A look back at my internship at Ninja Van, working on the Tooling & Automation team.
