WIMAUT: Because Your Agents Won't Tell You They're Burning $300

Contents
  1. The vibe coding problem
  2. What WIMAUT does
  3. The $300 lesson, technically
  4. What comes next
  5. Where this goes

A few weeks ago I noticed my Anthropic bill was $300 higher than expected. Took me a while to figure out why. Turns out one of my OpenClaw agents had a cron job running every hour, making API calls, burning tokens. I had no idea it was happening. No alert, no dashboard, no way to know until the bill arrived.

That's the thing about agents. They're useful precisely because they run without you. But that means they also fail without you, waste money without you, and loop endlessly without you. You're flying blind.

That experience is why I built WIMAUT.

#The vibe coding problem

Everyone's running agents now. Claude Code sessions in tmux. Codex tasks in the background. CI pipelines with AI steps. OpenClaw agents on crons. The tooling has gotten good enough that spinning up an agent is trivial. Watching what it does after that? Nobody's solved that.

This is especially bad for non-technical users. The whole promise of vibe coding is that you describe what you want and agents build it. But if you don't understand what's happening underneath, you have no way to know if the agent is making progress, going in circles, or quietly burning through your API budget. You're trusting vibes, literally.

I kept seeing the same pattern at Voltade. Someone kicks off a task, walks away, comes back an hour later. "Is it done?" "I don't know, let me check." Then you look at the logs and realise the agent hit an error on iteration three and spent the remaining 57 minutes retrying the same failing approach.

#What WIMAUT does

WIMAUT stands for "What Is My Agent Up To." It's an observability dashboard for AI agents. I built it at the Codex hackathon in Singapore in 2026, entirely on Codex during the event.

The dashboard gives you a live view of your running agents. Think of it like a process monitor, but for AI tasks instead of system processes. You can see:

  • Active task runs with real-time status (running, completed, failed, stuck)
  • Token usage per agent, per task, per time window
  • Cost tracking with alerts when spend crosses thresholds
  • Execution logs so you can see what the agent is actually doing at each step
  • Failure patterns surfaced automatically, not buried in logs
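
The status tracking above can be sketched with a minimal task-run record. This is my illustration of the idea, not WIMAUT's actual schema; the field names and the "stuck" heuristic (no heartbeat within a timeout) are assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class RunStatus(Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    STUCK = "stuck"

@dataclass
class TaskRun:
    agent: str
    task: str
    status: RunStatus = RunStatus.RUNNING
    tokens_used: int = 0
    started_at: float = field(default_factory=time.time)
    last_heartbeat: float = field(default_factory=time.time)

    def is_stuck(self, timeout_s: float = 300) -> bool:
        # A running task that hasn't reported progress within the
        # timeout window gets flagged as stuck on the dashboard.
        return (self.status is RunStatus.RUNNING
                and time.time() - self.last_heartbeat > timeout_s)
```

The useful part is that "stuck" becomes a first-class state you can render and alert on, rather than something you infer by staring at a terminal.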

The visual style is inspired by pixel-agents. Each agent gets a visual representation on the dashboard, and you can see at a glance which ones are active, idle, or stuck.

It's internal-only right now. I use it to monitor our OpenClaw agents (Clawrence and Claudia) and any Claude Code sessions running on the Mac Minis.

#The $300 lesson, technically

Let me break down what actually happened with that runaway cron, because the failure mode is instructive.

Clawrence, my internal ops agent, has several cron jobs. The GeBIZ scraper, the daily Claude features update, the team summary. I'd set up a new cron for competitor analysis. It was supposed to run daily. I accidentally configured it to run hourly.

Each run made several API calls to Claude, including a long system prompt with context about our competitors, our product, and what to look for. Each run consumed roughly 50K tokens. At hourly intervals, that's 1.2M tokens per day. Over a week, it added up fast.
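
The arithmetic is worth spelling out, because the damage is just a multiplier:

```python
tokens_per_run = 50_000

# Intended schedule: daily. Actual schedule: hourly.
intended_daily = tokens_per_run * 1
actual_daily = tokens_per_run * 24

print(actual_daily)                    # 1200000 -> 1.2M tokens/day
print(actual_daily // intended_daily)  # 24x the intended burn
```

One wrong word in a cron expression and every downstream number scales by 24.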

The problem wasn't the cron configuration. That's just a typo. The problem was that I had no visibility into cumulative token spend per agent. No way to set a budget. No alert when an agent exceeded expected usage. The agent was doing exactly what I told it to do. I just didn't realise how often.

With WIMAUT, that would have been a five-minute catch. The cost dashboard would show a spike. The token usage graph would show an hourly pattern that doesn't match the expected daily schedule. I'd get an alert, fix the cron, move on.
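
The catch itself is not sophisticated. A rolling per-agent budget check is enough; this is a hypothetical helper in the spirit of what WIMAUT does, not its actual implementation:

```python
from collections import defaultdict

def check_budgets(usage_events, daily_budgets):
    """usage_events: (agent, tokens) pairs from the last 24 hours.
    Returns agents whose rolling daily usage exceeds their budget."""
    totals = defaultdict(int)
    for agent, tokens in usage_events:
        totals[agent] += tokens
    return {agent: total for agent, total in totals.items()
            if total > daily_budgets.get(agent, float("inf"))}

# The misconfigured cron: 24 hourly runs of ~50K tokens each.
events = [("clawrence", 50_000)] * 24
alerts = check_budgets(events, {"clawrence": 100_000})
# clawrence is at 1.2M tokens against a 100K daily budget -> alert fires
```

The hard part was never the check; it was having the per-agent usage stream to run it against.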

#What comes next

Observability tells you what happened. Evaluation tells you whether it was good. They're different problems, and the second one is harder.

An agent can complete a task, return valid output, and still be subtly wrong. Claudia can send a technically correct response that misses the customer's tone. Clawrence can flag a tender that keyword-matches but has nothing to do with what we actually do. Traditional monitoring wouldn't catch either; the API returned 200.

That's the gap I want WIMAUT to fill next. Not just "is the agent running" but "is it doing a good job." I'm working on layering automated quality checks on top of the monitoring, but it's still early.

#Where this goes

Right now WIMAUT is reactive. You see problems after they happen. The next step is making it act: kill a runaway agent when it blows its cost budget, enforce quality gates before outputs reach customers, manage the whole fleet from one place instead of separate terminal sessions.
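
The kill-on-budget step could be as simple as a hard cap wired to whatever stops the run. A sketch, where the `kill_fn` hook (cancelling the cron, terminating the session) is a hypothetical stand-in:

```python
class CostGate:
    """Hard budget cap: once an agent's cumulative spend for a task
    exceeds the budget, invoke the kill hook and stop accepting runs."""

    def __init__(self, budget_tokens: int, kill_fn):
        self.budget = budget_tokens
        self.spent = 0
        self.kill_fn = kill_fn

    def record(self, tokens: int) -> bool:
        self.spent += tokens
        if self.spent > self.budget:
            self.kill_fn()
            return False  # over budget: run was killed
        return True

killed = []
gate = CostGate(budget_tokens=200_000, kill_fn=lambda: killed.append(True))
for _ in range(24):            # the hourly runs, ~50K tokens each
    if not gate.record(50_000):
        break
# The runaway stops on the fifth run instead of burning all 24.
```

Against the $300 cron, a gate like this would have capped the damage at the budget instead of at the bill.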

During the hackathon, Codex got stuck in a loop on a UI component and I had no way to see it was stuck except watching the terminal. Which is exactly the problem WIMAUT is supposed to solve. We're building tools to monitor agents using agents that need monitoring. Turtles all the way down.