I'm Yash.
Sell the work, not the tool. Evaluate the agent, not the feature.
I ship AI products where success is measured in agent reliability, not feature count. Founding PM at Voltade, where I drive 0-to-1 agent platforms (Studio, Envoy) and the evaluation frameworks behind them. Five years of engineering before that.
Track Record
- Voltade: Defined strategy for the agent platform and conversational CRM. Established evaluation frameworks for production agent reliability across 230K+ daily AI interactions.
- Ninja Van: Owned platform products in the CTO's office. Notification infrastructure, parcel booking, and voice calling across 6 markets.
- AfterClass: Co-founded a student platform that's still active today, with 50K+ users.
Featured Projects
Open-source app framework for AI coding agents. Built a Bun + TypeScript stack with sensible defaults (auth, Postgres, jobs, an agent runtime) so that coding agents like Claude Code generate working code on the first try.
Voltade Studio
No-code AI agent builder. Defined the product vision and shipped a platform where non-technical teams describe what they need in plain English and get working agents deployed across WhatsApp, Telegram, and Web.
Voltade Envoy
Conversational CRM for SMEs. WhatsApp-first inbox where a per-tenant agent triages, drafts, and replies; humans approve. 100+ active SME deployments, 230K+ AI interactions/day, 99.65% success rate.
Enterprise AI for strategic intelligence. Drove product direction and engineering for an NLP/RAG platform serving Fortune 500 clients, where analysts explore millions of documents through topic clustering, network analysis, and theme visualisation.
Start here
Most agent evals measure the wrong things. After six months of running two agents in production, here's the framework I actually use, with real metrics, LLM-as-judge calibration data, and the $300 lesson that started it all.
Notes on when to use Haiku, when Opus is worth it, and why model choice is a product decision wearing engineering clothes.