I'm Yash.
Agents in production, not in demos.
I put AI agents on real customers' WhatsApp lines, and I'm accountable when they get it wrong. Founding PM at Voltade in Singapore: Envoy, Vobase, and now Volty. Most of my job is the gap between a demo and an agent you'd trust with your business: evals, guardrails, failure taxonomies. I wrote code for five years before this.
Featured Projects
Conversational CRM for SMEs. WhatsApp-first inbox where a per-tenant agent triages, drafts, and replies; humans approve. 100+ active SME deployments, 230K+ AI interactions/day.
Managed multi-tenant agent platform, in build now. Each tenant gets an agent-native WhatsApp helpdesk: inbox, event-driven agent harness, self-improving knowledge drive, automation. One pooled deploy; creating a tenant is an INSERT, not an orchestration.
App framework for AI coding agents. Bun + TypeScript + Drizzle with auth, Postgres, jobs, and an agent runtime baked in, so Claude Code gets working code on the first try. 100+ agents live across WhatsApp, Telegram, and Web.
Make any repo AI-native in one command. Detects stack, scaffolds CLAUDE.md + agents + hooks + skills tailored to the codebase.
Start here
How we built Envoy from zero to 20+ paying SME customers in Singapore. Dogfooding, grant-driven GTM, killing features, and the product decisions behind it all.
Most agent evals measure the wrong things. After running two agents in production for six months, here's the framework I actually use, with real metrics, LLM-as-judge calibration data, and the $300 lesson that started it all.
Production agent safety is not about jailbreaks. It is about an agent confidently doing the wrong thing on a customer's WhatsApp for six hours before anyone notices. Here are the failure modes that actually happen, the guardrails that work, and how I prove the guardrails are working.