Evidence

Every claim on my CV, mapped to the write-up that substantiates it.

Hiring screens ask for demonstrated capability. This page is the map: each claim I make, and the published work behind it. Every number comes from one production system I am accountable for, not a benchmark.

Builds the product, not just the roadmap

The proof-of-concepts that close deals, the personal agent fleet, the apps, this site: built hands-on with Claude Code, then run in production.

Runs agents at production scale

230K+ AI interactions a day across 100+ SME deployments, at 99.65% task success. These are real customers on real WhatsApp lines, not benchmarks.

Evaluates what ships

Three-layer evaluation: deterministic checks on every wake, calibrated LLM-as-judge at scale, weekly human review as the source of truth.
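As a rough illustration of how three such layers could compose, here is a minimal Python sketch. Every name in it (`evaluate`, `judge_agreement`, the 0.8 threshold, the judge callable) is an assumption made for this example, not the production framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    layer: str    # which layer produced the decision
    passed: bool


def evaluate(
    transcript: str,
    checks: list[Callable[[str], bool]],
    judge: Callable[[str], float],
    judge_threshold: float = 0.8,        # hypothetical cut-off
    human_label: Optional[bool] = None,
) -> Verdict:
    """Combine the three layers into one verdict (illustrative only)."""
    # Human review is the source of truth: a label, when present, wins.
    if human_label is not None:
        return Verdict("human", human_label)
    # Deterministic checks are cheap, so they run on every transcript.
    if not all(check(transcript) for check in checks):
        return Verdict("deterministic", False)
    # The LLM judge only scores what the deterministic layer let through.
    return Verdict("judge", judge(transcript) >= judge_threshold)


def judge_agreement(pairs: list[tuple[bool, bool]]) -> float:
    """Share of (judge_verdict, human_label) pairs that agree.

    One simple way to check calibration against the weekly human review.
    """
    if not pairs:
        return 0.0
    return sum(j == h for j, h in pairs) / len(pairs)
```

In this sketch the judge is any callable that returns a score between 0 and 1, so a stub can stand in for a real model call during testing.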

Owns deployment safety

Model-layer safety is the lab’s job. Deployment-layer safety, pointing a safe model at a real business without getting burned, is mine.

Understands the workflow nobody documented

The hard part of a deployment is rarely the model. It is the approval queue that is slower than the messaging window, and the trust the owner has not extended yet.

Measures success after handoff, not at the demo

A deployment counts when the business runs on it without me. The measures are adoption, whether self-learning is in place, and reliability once I am gone.

Treats cost as a product feature

Model tiering, cost attribution per interaction, and the observability to catch a runaway cron before the bill does.
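A minimal sketch of what per-interaction cost attribution with a budget alarm could look like, in Python. The tier names, prices, source labels, and budgets are all placeholder assumptions for illustration, not real figures or the production setup.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; placeholders, not real rates.
PRICE_PER_1K = {"small": 0.0005, "large": 0.01}


def pick_tier(needs_reasoning: bool) -> str:
    """Model tiering: send only the hard cases to the expensive model."""
    return "large" if needs_reasoning else "small"


def interaction_cost(tier: str, tokens: int) -> float:
    """Cost of one interaction at the given tier."""
    return PRICE_PER_1K[tier] * tokens / 1000


def spend_by_source(events: list[tuple[str, str, int]]) -> dict[str, float]:
    """Attribute spend to whatever triggered each (source, tier, tokens) event."""
    totals: dict[str, float] = defaultdict(float)
    for source, tier, tokens in events:
        totals[source] += interaction_cost(tier, tokens)
    return dict(totals)


def runaway_sources(totals: dict[str, float], budgets: dict[str, float]) -> list[str]:
    """Sources over budget; a source with no budget is flagged on any spend."""
    return sorted(s for s, spent in totals.items() if spent > budgets.get(s, 0.0))
```

Attributing spend to the triggering source, rather than only to the model, is what lets a scheduled job that loops show up as a single line item over its budget.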

Kills what does not work

Product judgement includes the post-mortem. One product retired in public, and the architecture lesson that came out of it.

If you are screening me and have ninety seconds: read the failure museum, then the eval framework. Or run curl yashgadodia.com/cv.