How Vobase agents learn (and why SME owners notice)
Contents
- Why a learning loop is the part that makes adaptive software feel adaptive
- The mechanism, end to end
- Stage 1: wake events as the substrate
- Stage 2: staff-signal detection
- Stage 3: change proposals as the unit of learning
- Stage 4: applied skills
- Why SME owners notice
- What is still hard
- Where this goes
Last week I was on a call with one of our SME customers. We had deployed the agent two days earlier. She was talking to it like a colleague: "no, when someone asks about wholesale, don't quote the website price, that's retail." She typed it as an internal note while the agent was drafting a reply. The agent saw the note, kept drafting, sent a clean answer.
A few minutes later she said it again in a different conversation. The agent had already shifted. Same wholesale-versus-retail correction, except this time it caught it pre-draft. She paused, looked at the camera, and said "wait, it's learning?"
That is the moment. That is the moment the adaptive software thesis stops being a slide.
I wrote about adaptive software a few weeks ago. The short version: fixed SaaS is too rigid, blank-canvas AI builders have a cold-start problem, and what wins is an 80%-ready vertical template that the business shapes by talking to it. That post was the shape of the product. This one is about the mechanism. Specifically, how the agent on top of that template learns from the staff who run the business, without any of those staff knowing what a system prompt is.
I will quote the actual code. We built this on top of Vobase, and it is what I have spent the last few months obsessing over.
#Why a learning loop is the part that makes adaptive software feel adaptive
Adaptive software has three layers, and they each fail in different ways without learning.
The vertical template is the bottom. A bakery template ships with a contacts model, a WhatsApp inbox, a quote workflow, a delivery scheduler. Most templates die here. The generic 80% never quite covers the 20% that makes the business what it is, and the business does not know how to describe that 20%.
The adaptive layer is the middle. The AI editor that takes "make this column red when the deposit hasn't landed" and writes the code. This is the slide that everyone is excited about. It also fails on its own. Owners do not sit down for a four-hour session to describe twenty exceptions. They notice the exceptions one at a time, mid-conversation, when the thing in front of them is wrong.
The learning loop is the top. It listens for those mid-conversation moments and turns them into structural changes the agent picks up next time. Without it, the adaptive layer is a chatbot that asks "did I get that right?" forever. With it, the agent reduces the staff's correction load over time. That reduction is what the owner notices on day two. That is what makes their eyes light up.
This is also where I think most AI agent products are going to lose. The interface is fine. The model is fine. The thing that is missing is the loop that converts staff corrections into durable behaviour.
#The mechanism, end to end
What happens between "staff member types a correction in a WhatsApp note" and "the agent never makes that mistake again"? There are four stages. I will walk through each.
#Stage 1: wake events as the substrate
A Vobase agent does not run on a chat loop. It runs on a wake. A wake is the unit of work: an inbound message triggers a wake, a staff supervisor mention triggers a wake, a scheduled check triggers a wake. Each wake emits a stream of typed events.
```ts
type AgentEvent =
  | { type: 'agent_start'; triggerPayload: ... }
  | { type: 'llm_call'; ... }
  | { type: 'tool_started' | 'tool_completed' | 'tool_rejected'; ... }
  | { type: 'internal_note_added'; noteId: string; authorType: ... }
  | { type: 'message_update'; ... }
  | { type: 'agent_end'; ... }
  | { type: 'agent_aborted' }
```
This is the substrate. Every observable thing the agent does is on this stream, including the things humans do to the agent while it is running. The staff note that the SME owner typed mid-draft? That came in as `internal_note_added` and the agent's prompt has it before the next `llm_call`.
The reason this matters for learning is that the stream is also where we listen for teaching moments. The agent stream is the agent's whole short-term memory and our whole observability surface.
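To make that concrete, here is roughly what the wake stream for the opening anecdote could look like. This is a narrowed, hypothetical version of the event union; the field names beyond the `type` discriminants, and the prices, are invented for illustration.

```ts
// A narrowed, hypothetical version of the event union, for illustration;
// the real AgentEvent carries more fields than shown here.
type WakeEventSketch =
  | { type: 'agent_start'; trigger: 'inbound_message' | 'supervisor' | 'schedule' }
  | { type: 'llm_call'; promptNoteIds: string[] }
  | { type: 'internal_note_added'; noteId: string; authorType: 'staff' | 'agent' }
  | { type: 'message_update'; draft: string }
  | { type: 'agent_end' }

// The wholesale correction from the opening, as one wake stream.
const wholesaleWake: WakeEventSketch[] = [
  { type: 'agent_start', trigger: 'inbound_message' },
  { type: 'llm_call', promptNoteIds: [] },
  { type: 'message_update', draft: 'Our price is $12 per unit...' }, // quoting retail
  { type: 'internal_note_added', noteId: 'note_1', authorType: 'staff' }, // "that's retail"
  { type: 'llm_call', promptNoteIds: ['note_1'] }, // the note is in the prompt now
  { type: 'message_update', draft: 'Our wholesale price is $9 per unit...' },
  { type: 'agent_end' },
]
```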
#Stage 2: staff-signal detection
Most events on the wake stream are not teaching moments. An `llm_call` is just an LLM call. A `tool_started` is just a tool call. We do not want to learn from every event; we want to learn from the events where humans are correcting the agent.
There is a pure function called `detectStaffSignals` that scans the wake stream and pulls out the four signal shapes worth learning from.
```ts
// 1. `supervisor` wake trigger:
//    staff kicked the agent awake via an @staff mention
// 2. `approval_resumed` + rejected:
//    staff denied a pending tool call, with optional note
// 3. `internal_note_added` w/ staff:
//    during-wake note from a human operator
// 4. `manual` reassignment-with-note:
//    staff re-routed the conversation and left a prose reason
```
These four are what we have converged on after months of running this in production. Each one carries the staff's intent in different ways.
A supervisor mention is the loudest signal. The agent did something wrong, a human typed `@agent stop suggesting dessert pairings for diabetic customers`, and the wake is literally triggered by that correction. We treat these as the highest-confidence teaching moments.
An approval rejection is the most surgical. The agent proposed a tool call (say, sending a quote with a specific price), staff hit reject with a one-line reason. Now we have a paired sample: the action the agent wanted to take, the precise reason a human stopped it. These map cleanly onto skill updates.
An internal note added during the wake is the wholesale-versus-retail case from the opening. The agent is mid-draft, staff types into the conversation thread, the note becomes part of the wake context, and we mark it as a teaching moment for next time even though the agent already saw it this time.
A reassignment with a note is the one that took me longest to get right. Staff sometimes re-route a conversation away from the agent because the agent was doing the wrong thing. The reason text is gold. We only pick it up when the reason starts with the word "reassign", to avoid eating every manual reassignment as a learning signal.
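For orientation, here is a sketch of what a signal might look like once extracted. The field names are inferred from the proposal-building code further down, not copied from the real `StaffSignal` definition; the `reassign` prefix guard is the one just described.

```ts
// Sketch only: inferred field names, not the real StaffSignal definition.
type StaffSignalKind =
  | 'supervisor'        // @staff mention woke the agent
  | 'approval_rejected' // staff denied a pending tool call
  | 'internal_note'     // during-wake note from a human
  | 'reassignment'      // conversation re-routed with a prose reason

interface StaffSignal {
  kind: StaffSignalKind
  ts: string           // when the correction happened
  actorUserId?: string // which staff member, when known
  ref: string          // id of the event/note/tool call it points at
  note?: string        // the prose the human left, if any
}

// Only treat a manual reassignment as a teaching moment when the
// reason is explicit, to avoid eating every re-route as a signal.
function isTeachableReassignment(reason: string): boolean {
  return reason.trim().toLowerCase().startsWith('reassign')
}
```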
Importantly, `detectStaffSignals` is pure. No database calls, no IO, no async. It takes the events array and returns a list of signals.
```ts
export function detectStaffSignals(
  wakeEvents: readonly AgentEvent[]
): StaffSignal[]
```
This is a deliberate choice. Learning has to be predictable. If the detector did IO we would have race conditions between the detector and the writer, and one of the lessons from running agents in production is that anything stochastic in the learning path becomes a nightmare to debug six months later. I wrote about a closely related lesson in the WIMAUT post, where a non-deterministic cron silently burned $300 because the cost path was not traceable.
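One concrete payoff of that purity, sketched under assumptions: historical wake streams can be replayed through newer versions of the detector with no side effects. `loadWakeEvents` is a hypothetical loader, not a real Vobase API; `AgentEvent`, `StaffSignal`, and `detectStaffSignals` are the names from above.

```ts
// A sketch of offline replay, possible only because the detector is pure.
// `loadWakeEvents` is a hypothetical loader, not the real Vobase API.
async function replaySignals(
  wakeIds: string[],
  loadWakeEvents: (wakeId: string) => Promise<readonly AgentEvent[]>,
): Promise<void> {
  for (const wakeId of wakeIds) {
    const events = await loadWakeEvents(wakeId)
    // Same events in, same signals out: no clock, no randomness, no IO.
    const signals = detectStaffSignals(events)
    console.log(wakeId, signals.map((s) => s.kind))
  }
}
```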
#Stage 3: change proposals as the unit of learning
Once we have a list of signals, each one becomes a change proposal. A change proposal in Vobase is a generic data structure. The shape:
```ts
interface ChangeProposalRow {
  resourceModule: string         // e.g. 'agents'
  resourceType: string           // e.g. 'agent_memory'
  resourceId: string             // the agent id
  payload: ChangePayload         // a markdown_patch or field_set
  status: 'pending' | 'approved' | 'rejected' | 'auto_written' | ...
  confidence: number | null      // 0..1
  rationale: string | null       // "what problem does this solve"
  expectedOutcome: string | null // "after this, what changes for the user"
  proposedById: string           // 'agent:<id>' or 'staff:<id>'
}
```
This is the same structure that staff use to propose edits to contacts, drive docs, channel settings, or anything else. Agents and humans use the same change-proposal pipeline. That symmetry is one of the things I am proudest of architecturally. It means the audit log for what the agent learned and what staff edited is literally the same table.
For a learning proposal coming from `detectStaffSignals`, the payload is a `markdown_patch` with `mode: 'append'` against the agent's `workingMemory` field:
```ts
const body = [
  '',
  `## Staff signal, ${signal.kind} @ ${signal.ts}`,
  `- Author: ${signal.actorUserId ?? 'unknown'}`,
  `- Ref: ${signal.ref}`,
  `- Note: ${notePreview ?? '(none)'}`,
].join('\n')

const payload: ChangePayload = {
  kind: 'markdown_patch',
  mode: 'append',
  field: 'workingMemory',
  body,
}
```
A markdown patch. That is the unit of learning. The agent's brain is a markdown file, and learning is an append to that file. There is something quietly satisfying about that. The whole industry is racing toward vector databases, embeddings, and graph memory, and the thing that actually shifted behaviour in production for us was a markdown file with a structured append.
The proposal is then written to the changes table:
```ts
await insertProposal({
  organizationId,
  resourceModule: 'agents',
  resourceType: 'agent_memory',
  resourceId: agentId,
  payload,
  changedBy: `agent:${agentId}`,
  changedByKind: 'agent',
  confidence: 0.6,
  rationale: copy.rationale,
  expectedOutcome: copy.expectedOutcome,
  conversationId: conversationId ?? null,
})
```
Two details to notice. First, `confidence: 0.6`. We deliberately picked a middle-confidence default because these proposals are not all the same quality. A supervisor wake is high confidence, an internal note mid-draft is medium, a reassignment reason is lower. We tune confidence per signal kind, and the staff inbox lets owners filter by it.
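Here is roughly what that per-kind tuning looks like. Every number except the 0.6 default is invented for illustration; `StaffSignalKind` is the sketch type from earlier.

```ts
// Illustrative defaults following the ordering described above:
// supervisor highest, reassignment lowest. Only 0.6 is from the real code.
const CONFIDENCE_BY_KIND: Record<StaffSignalKind, number> = {
  supervisor: 0.85,
  approval_rejected: 0.75,
  internal_note: 0.6,
  reassignment: 0.45,
}

function confidenceFor(kind: StaffSignalKind): number {
  return CONFIDENCE_BY_KIND[kind] ?? 0.6
}
```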
Second, rationale and expectedOutcome are human-readable. The whole point of the History tab in the staff app is that a non-technical owner can scroll through and see "the agent learned this on Tuesday because Alice left this note." We invested in copy here because the audit log is the product surface for the learning loop. If the owner cannot read it, they cannot trust it.
There is also an idempotency guarantee. The schema has a partial unique index on `(org, resourceModule, resourceType, resourceId)` where `status = 'pending'`. If the agent proposes the same learning twice (a concurrent retry, a duplicate signal), the second insert throws and we swallow the conflict:
```ts
} catch (err) {
  if (err instanceof Error && /already has a pending proposal/.test(err.message)) {
    logger.info(..., 'learning-proposals: duplicate pending, skipped')
    continue
  }
  ...
}
```
This sounds boring. It is not. It is the thing that prevents the learning loop from corrupting itself the day a webhook misfires or a worker retries. Most learning systems I have seen in the wild break here.
#Stage 4: applied skills
Some change types in Vobase require human approval. Editing a customer's account, for example. Others auto-apply. `agent_memory` is registered with `requiresApproval: false`, which means the proposal materialises immediately and surfaces in the History tab. Staff see what changed; they don't have to click approve for it to land.
This is a deliberate trust trade. The cost of asking the owner to approve every memory tweak is high (they will stop reading after day three). The cost of a bad memory update is bounded (we can always revert through the history). We picked auto-apply with full audit, and we have been able to defend it because the audit is good.
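To show the shape of that trade in code: a hypothetical change-type registry. The registry itself is an assumption; only the `requiresApproval` semantics follow the behaviour described above.

```ts
// Hypothetical registry shape; only the requiresApproval semantics
// are from the post. Unregistered types default to requiring approval.
interface ChangeTypeConfig {
  requiresApproval: boolean
}

const changeTypes: Record<string, ChangeTypeConfig> = {
  'agents/agent_memory': { requiresApproval: false }, // auto-apply, full audit
  'contacts/contact': { requiresApproval: true },     // editing a customer's account
}

function needsApproval(resourceModule: string, resourceType: string): boolean {
  return changeTypes[`${resourceModule}/${resourceType}`]?.requiresApproval ?? true
}
```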
The other class of artifact the agent maintains is skills. These are markdown files in `modules/agents/skills/`. Here is one verbatim:
```md
---
name: pii-redaction
appliesTo: all
---

# PII Redaction

Never echo back full credit-card numbers, SSNs, government IDs, or full
bank accounts. Mask everything except the last four characters:
`**** **** **** 1234`.

- If the customer pastes a full PAN or SSN, acknowledge receipt, redact,
  and ask staff to handle it via a secure channel.
- Email addresses, phone numbers, and order IDs are not PII for our
  purposes, quote them when useful.
- When summarising a conversation in MEMORY.md, redact PAN/SSN/passport
  before the summary lands on disk.
```
Note the frontmatter. `appliesTo: all` means this skill is loaded into every wake. Other skills are scoped: `appliesTo: conversation` for in-conversation behaviour, `appliesTo: operator` for the standalone heartbeat thread that runs the daily brief.
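A sketch of how that scoping might gate loading. The loader function and the `Skill` shape are illustrative; only the `appliesTo` values come from the skill files.

```ts
// Illustrative loader; only the appliesTo values are from the real skills.
type SkillScope = 'all' | 'conversation' | 'operator'

interface Skill {
  name: string
  appliesTo: SkillScope
  body: string // the markdown below the frontmatter
}

function skillsForWake(
  skills: readonly Skill[],
  wakeKind: 'conversation' | 'operator',
): Skill[] {
  return skills.filter((s) => s.appliesTo === 'all' || s.appliesTo === wakeKind)
}
```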
Skills are how the durable, structural lessons land. Memory entries (the append-to-workingMemory variety) are tactical. A note like "Kai prefers same-day deliveries to mean evening of the same day" goes into working memory. A rule like "never quote prices on weekends without staff approval" turns into a skill. The learning loop produces both, depending on the scope hint the signal carries:
```ts
export type ScopeHint =
  | 'contact'      // about this specific customer
  | 'agent_memory' // general working memory
  | 'agent_skill'  // promote to a skill
  | 'drive_doc'    // belongs in a knowledge base doc
```
The signal kind drives the default scope. An approval rejection scopes to `agent_skill` because rejections are usually about behaviour the agent should not repeat. A reassignment note scopes to `contact` because the operator is usually correcting how the agent dealt with one person. A supervisor wake also scopes to `contact`. An internal note can be any of the four, and we use a small LLM call to pick, as sketched below.
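The defaults, written out as a sketch. `'llm'` stands in for the small classification call, which is not shown; `StaffSignalKind` is the sketch type from earlier.

```ts
// The default scope per signal kind, as described above. 'llm' marks
// the case where a small model call picks among the four hints.
function defaultScopeFor(kind: StaffSignalKind): ScopeHint | 'llm' {
  switch (kind) {
    case 'approval_rejected':
      return 'agent_skill' // behaviour the agent should not repeat
    case 'reassignment':
      return 'contact' // usually about one specific person
    case 'supervisor':
      return 'contact'
    case 'internal_note':
      return 'llm' // could be any of the four; let a small model pick
  }
}
```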
#Why SME owners notice
I have walked through the technical stack. The point of all of this, the reason it matters, is that the SME owner does not have to understand any of it.
What she experiences is: she leaves a note on Tuesday, on Wednesday the agent stops making the mistake. She does not file a support ticket. She does not get scheduled into a customer-success call to "configure" something. She types a sentence in the same place she was already typing sentences, and the software changes.
The first time this happens, the owners I have shown this to do one of two things. Either they ask "wait, it's learning?" or they get quiet, look at the screen, and then say something like "okay, I can use this." Both reactions are the same reaction. They are realising that the relationship to the software is not transactional. It is closer to onboarding a junior staff member who is reasonably bright and pays attention.
This is the part that I think is genuinely new. SME owners have been told for years that AI will change their business, and most of what they have seen so far is chatbots that are very confident and frequently wrong. The first time they see the agent correct itself off a note they typed casually, the frame changes. I called this the "staff, not magic" frame in an earlier post. The learning loop is the part that makes the staff frame credible.
#What is still hard
Three things I have not solved.
Decay versus latest-correction-wins. When the same agent gets contradictory corrections over time (Alice says one thing in January, Bob says the opposite in March), which one wins? We currently default to chronological-latest with a manual override, but I am not satisfied. There is a real argument for keeping the older rule when Alice is the more senior operator and the system does not know that.
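Roughly the current behaviour, sketched; the `Rule` shape and the `pinned` override are illustrative, not the real implementation.

```ts
// Sketch of chronological-latest with a manual override. Deciding which
// rules actually conflict is the genuinely hard part and is not shown.
interface Rule {
  text: string
  ts: string       // ISO timestamp of the correction
  pinned?: boolean // manual override: never superseded automatically
}

function resolveConflict(a: Rule, b: Rule): Rule {
  if (a.pinned && !b.pinned) return a
  if (b.pinned && !a.pinned) return b
  return a.ts >= b.ts ? a : b // latest correction wins
}
```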
Skill explosion. After a few months of running, an agent can accumulate forty skills. Most of them are still right. A few are out of date or contradict each other. We currently rely on staff to prune through the History tab, but that is asking too much. There is a future post in this: a skill consolidator observer that runs periodically and proposes merges or deletions. I have not built it yet.
Cold start in the first week. The loop is most valuable on day fifteen, when the agent has absorbed enough corrections to feel like yours. Day one is just a generic agent with a vertical template. We compensate with a structured onboarding wizard that captures the most important policy rules up front, but it is the part of the product that feels least adaptive. I think the answer is to bootstrap the first few skills from the owner's existing materials (their training docs, their FAQ, their standard responses) but we are not there yet.
#Where this goes
Most AI products today optimise for the moment a customer says "wow." Vobase tries to optimise for the third Tuesday: the moment a customer realises the agent is no longer the version they deployed two weeks ago, because their staff has been teaching it. That is the moment the relationship changes.
I think the products that win in the next three years will be the ones that make that third-Tuesday moment cheap to produce. The technical pattern (wake events, structured signal detection, markdown-patch change proposals, dual memory/skill durable artifacts, full audit) is not the only way to get there, but it is the one that has worked for us.
Vobase is open source. The self-learning loop, the wake harness, the change-proposal pipeline, all of it lives in the repo. If you are building agent products and want to copy any of this verbatim, please do. I would much rather the industry converge on something that works than every team rediscover the markdown-file-as-brain pattern from scratch.
And if you are an SME owner reading this and your eyes lit up somewhere around the second section, that is the actual win. The mechanism only matters because of you.