The Agentic Infrastructure Stack: What We're Building and Why
An autonomous coding agent needs more than a chat completion endpoint. It needs an isolated workspace it can fail safely inside, durable memory across sessions, policy-enforced data access, orchestration with crash-safe state, payment rails for the tools it calls, and a deployment story. We're building each piece as a separate MIT-licensed project — agentvfs, memorg, ormai, brat, mcp-pay, fastagentic — and this post walks through how they compose into a reference architecture, with citations to each repo.
The discourse around “agents” has finally caught up with how unfinished the underlying stack is. The libraries are mature enough to demo. The runtime is not. If you’ve tried to put an autonomous coding agent in front of a real production codebase, or let a research agent run for ninety minutes against a real budget, the gaps become obvious in roughly twenty minutes: the agent has nowhere safe to fail, no memory across sessions, no constrained route to your database, no payment rail for the tools it calls, no orchestration layer for parallel runs, and no deployment story.
We’ve spent most of the last year building those pieces as separate MIT-licensed projects. This post is a walkthrough of how they compose, written so you can take the diagram and either use our pieces or build your own equivalents.
The reference shape
A useful agentic stack, in our view, has six layers:
- Workspace runtime — where the agent actually executes file and shell operations.
- Memory — what the agent remembers across turns and across sessions.
- Policy-enforced data access — how the agent reads and writes your application database without going off the rails.
- Orchestration — how multiple agents (or many runs of one agent) coordinate.
- Payment surface — how the agent calls paid tools and gets a 402 back when it can’t.
- Deployment — how the local thing becomes a production API with observability and budgets.
We have a project at each layer. For each one below, we describe the missing primitive and the project we shipped to fill it.
Layer 1: Workspace runtime — agentvfs
The first thing that goes wrong with an autonomous agent is the filesystem. A coding agent that can run git, cargo, npm, rm -rf and arbitrary shell on your host machine is not a tool, it’s a hazard. The naive workaround is a Docker container per run, which is heavy, slow to fork, and hard to introspect.
agentvfs is our answer: a workspace runtime built around a proxy boundary. From the README:
The proxy boundary mediates between your agent and the workspace:
agent → proxy boundary → mounted forked workspace → cli tools. It governs what commands run, creates checkpoints, and reports filesystem deltas back to your agent.
The primitives the README enumerates are the right shape for this layer: isolated vaults instead of the host filesystem, millisecond forks for cheap task workspaces, rollback points for risky operations, a proxy boundary with policy control and change tracking, mount-as-real-directory so git/cargo/npm just work, and structured JSON output designed for agent integration.
The “instant forks” point is the one that matters most architecturally. If forking a workspace takes seconds, your orchestrator can speculatively try three approaches in parallel and discard two. If it takes minutes, parallel exploration is too expensive and the agent serialises. agentvfs installs via cargo, npm or pip, so whatever language your orchestrator is in, you can drive it.
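To make the speculative pattern concrete, here is a minimal sketch of "fork three, keep one". It does not use the agentvfs API: `fork_workspace` is a hypothetical stand-in that copies a directory, which is exactly the slow path a real fork primitive is meant to replace, and `try_approaches` is a name invented for this example.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Optional

# An approach edits its fork in place and returns True on success.
Approach = Callable[[Path], bool]


def fork_workspace(base: Path) -> Path:
    # Hypothetical stand-in for a real fork: a full directory copy.
    target = Path(tempfile.mkdtemp(prefix="fork-")) / "ws"
    shutil.copytree(base, target)
    return target


def try_approaches(base: Path, approaches: List[Approach]) -> Optional[Path]:
    # One fork per approach; each approach runs concurrently on its own copy.
    forks = [fork_workspace(base) for _ in approaches]
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(lambda pair: pair[0](pair[1]), zip(approaches, forks)))

    winner: Optional[Path] = None
    for fork, ok in zip(forks, outcomes):
        if ok and winner is None:
            winner = fork  # keep the first successful fork
        else:
            shutil.rmtree(fork.parent)  # discard failed or surplus forks
    return winner
```

The base workspace is never touched. The cost of this loop is dominated by `fork_workspace`, which is why fork latency decides whether the pattern is affordable.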
Layer 2: Memory — memorg
The second thing that goes wrong is amnesia. The memorg README puts it as bluntly as we have ever phrased it:
LLMs forget. Context windows fill up. Important details get lost. Your chatbot asks the same questions twice.
We are not going to solve the memory problem with longer context windows alone — even if context were free, the agent needs organised recall, not a transcript dump. Memorg is the layer that decides what to store, how to index it, and how to retrieve relevant slices on demand.
The Quick Start composes cleanly — MemorgSystem takes a storage adapter (SQLite by default), a vector store (USearch), and an OpenAI-compatible client; you create a session, start a conversation, and call search_context("…") to pull a relevant slice on each new turn. Adapters are pluggable, and the project also ships a memorg CLI. The pattern we use most: persistent memory keyed by (user_id, agent_id, session_id).
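The keying pattern can be sketched without memorg at all. The example below is an assumption-heavy illustration, not the memorg API: `MemoryKey` and `MemoryStore` are hypothetical names, and a SQLite substring match stands in for vector search.

```python
import sqlite3
from typing import List, NamedTuple


class MemoryKey(NamedTuple):
    user_id: str
    agent_id: str
    session_id: str


class MemoryStore:
    """Hypothetical store; substring matching stands in for vector retrieval."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory ("
            "user_id TEXT, agent_id TEXT, session_id TEXT, content TEXT)"
        )

    def remember(self, key: MemoryKey, content: str) -> None:
        self.db.execute("INSERT INTO memory VALUES (?, ?, ?, ?)", (*key, content))
        self.db.commit()

    def recall(self, key: MemoryKey, query: str, limit: int = 5) -> List[str]:
        # Cross-session recall: scope by user and agent, ignore session_id.
        rows = self.db.execute(
            "SELECT content FROM memory WHERE user_id = ? AND agent_id = ? "
            "AND content LIKE ? ORDER BY rowid DESC LIMIT ?",
            (key.user_id, key.agent_id, f"%{query}%", limit),
        ).fetchall()
        return [row[0] for row in rows]
```

Keeping `session_id` in the key but out of the recall filter is the design choice here: writes stay attributable to a session, while reads span every session for the same user and agent.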
Layer 3: Policy-enforced data access — ormai
The third thing that goes wrong is the database. The moment your agent has a tool that runs SQL, four scenarios show up in the ormai README:
- “What if the agent reads sensitive data?” → Field-level policies hide or mask PII automatically.
- “What if it runs wild queries?” → Query budgets and row limits prevent runaway costs.
- “How do I audit what it did?” → Every operation is logged with full context.
- “What about multi-tenant isolation?” → Tenant scoping is built-in, not bolted on.
OrmAI wraps your existing ORM models (SQLAlchemy, Tortoise, Django, SQLModel, Peewee in Python; Prisma, Drizzle, TypeORM in TypeScript) in a policy-enforced runtime. The agent gets typed tools for querying and writing; you keep control of what it can see and do.
The architectural point — and this is the bit we keep having to make in conversations — is that policy belongs at the ORM layer, not the prompt layer. Prompt-engineering “you must not read the PII columns” is not a control plane. A typed tool that physically cannot return those columns is a control plane.
OrmAI’s design — same models, restricted runtime — fits cleanly with our broader drop-in compatibility principle. You don’t migrate your data layer; you wrap it.
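To illustrate the difference between a prompt-level rule and a control plane, here is a small sketch of a query tool that enforces a field allow-list, tenant scoping and a row cap before any SQL runs. It is not the OrmAI API: `TablePolicy`, `PolicyViolation` and `query_tool` are hypothetical names, and raw `sqlite3` stands in for an ORM.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class TablePolicy:
    table: str
    allowed_fields: FrozenSet[str]  # anything not listed is unreachable
    tenant_column: str
    max_rows: int


class PolicyViolation(Exception):
    pass


def query_tool(
    db: sqlite3.Connection,
    policy: TablePolicy,
    tenant_id: str,
    fields: List[str],
    limit: int,
) -> List[Dict[str, Any]]:
    if not fields:
        raise PolicyViolation("at least one field is required")
    denied = set(fields) - policy.allowed_fields
    if denied:
        raise PolicyViolation(f"fields not permitted: {sorted(denied)}")

    rows_wanted = min(limit, policy.max_rows)  # the agent cannot raise the cap
    columns = ", ".join(fields)  # every name was checked against the allow-list
    cursor = db.execute(
        f"SELECT {columns} FROM {policy.table} "
        f"WHERE {policy.tenant_column} = ? LIMIT ?",
        (tenant_id, rows_wanted),
    )
    return [dict(zip(fields, row)) for row in cursor.fetchall()]
```

The agent only ever supplies `fields` and `limit`. The table name, tenant column and tenant id come from the policy and the caller's session, so no prompt can widen them.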
Layer 4: Orchestration — brat (and grite underneath)
Once you have safe workspaces, memory and data access, you have the parts. The fourth thing that goes wrong is coordination — multiple agents on the same codebase, or many runs of one agent across many tasks, all trying to make decisions about merges and conflicts.
brat is the multi-agent harness. From its README:
Brat is a multi-agent harness that coordinates AI coding tools (Claude Code, Aider, Codex, and more) working in parallel on your codebase. Built on Grite, an append-only event log, Brat ensures that even if agents crash, your coordination state is always recoverable and auditable.
The supported engines table from the README — Claude Code, Aider, OpenCode, plus their integration points — matters because brat is itself drop-in to the agent layer. You don’t write your task descriptions to brat’s API; brat drives whichever engine you already use.
The under-the-hood detail that matters most is that the coordination state is an append-only event log. This is also why grite exists as its own project: an issue tracker that lives in your repo, “built for AI agents, works for humans,” with the same event-log substrate. If your orchestrator’s state is durable on disk and reconstructable from events, the failure modes get a lot simpler — “the agent crashed mid-merge” stops being a corruption risk and starts being a recovery.
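The recovery property is easy to see in miniature. The sketch below is not Grite's format or API; it assumes a hypothetical JSON-lines log with `task` and `status` fields, and shows how state is rebuilt by replay even when a crash leaves a torn final line.

```python
import json
import os
from pathlib import Path
from typing import Dict


def append_event(log: Path, event: dict) -> None:
    # Append only: existing lines are never rewritten.
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to persist before returning


def replay(log: Path) -> Dict[str, str]:
    # Rebuild "task -> latest status" purely from the events on disk.
    state: Dict[str, str] = {}
    if not log.exists():
        return state
    for line in log.read_text(encoding="utf-8").splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            break  # torn final write from a crash; earlier events are intact
        state[event["task"]] = event["status"]
    return state
```

Because state is derived rather than stored, a crash mid-write loses at most the event being written, and the orchestrator resumes from whatever `replay` returns.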
Brat’s demo (./scripts/mayor-demo.sh --with-ui) is a clean way to see the model: a “Mayor” orchestrator analyses a sample Python project, identifies issues by severity, and dispatches a “Convoy” of tasks to engines that fix the bugs in parallel.
Layer 5: Payment surface — mcp-pay
The fifth thing that goes wrong arrives the first time your agent tries to call a paid tool. MCP gave us tool discovery; it gave us nothing about pricing or payment. The mcp-pay README is honest about the landscape:
| Ecosystem | Discovery | Payment |
| --- | --- | --- |
| MCP Registry | Tools, resources, prompts | None |
| x402 Bazaar | After payment | x402 only |
| Tempo MPP | In-network | MPP only |
| mcp-pay | At .well-known | All rails |
mcp-pay extends MCP with three pieces: a JSON manifest (mcp-pay.json) for declaring pricing, a reference Rust server demonstrating payment-gated tools, and an HTTP 402 protocol flow compatible with x402, MPP, Lightning and card payments.
The reference server demonstrates the surface clearly: free endpoints return 200; paid endpoints return HTTP/1.1 402 Payment Required with an X-PAYMENT-REQUIRED header carrying the encoded payment request. The point is not the specific rail. The point is that an agent now has a standardised way to discover pricing, receive a 402, and choose whether to pay — instead of every paid-MCP-tool inventing its own auth dance.
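From the agent's side, the flow reduces to one decision per response. The sketch below models it without a network: `Response` and `decide` are hypothetical names, and both the header encoding (base64 JSON) and the `amount_cents` field are assumptions made for illustration, not the mcp-pay wire format.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


def decode_payment_request(header_value: str) -> dict:
    # Assumption: the header carries base64-encoded JSON.
    return json.loads(base64.b64decode(header_value))


def decide(response: Response, remaining_budget_cents: int) -> str:
    # Free endpoints (and anything that is not a 402) need no payment decision.
    if response.status != 402:
        return "proceed"
    request = decode_payment_request(response.headers["X-PAYMENT-REQUIRED"])
    # "amount_cents" is a hypothetical field name for the quoted price.
    if request["amount_cents"] <= remaining_budget_cents:
        return "pay"
    return "decline"
```

The useful property is that the decision is the same code path for every paid tool, whichever rail settles the payment afterwards.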
Layer 6: Deployment — fastagentic
The last gap is the one between “my agent works on my laptop” and “my agent is a production API.” fastagentic sits there. The README’s framing:
Your agent works locally. Now make it production-ready in minutes, not weeks. FastAgentic wraps any agent framework — PydanticAI, LangGraph, CrewAI, LangChain — and adds everything you need for production.
What you get without rewriting your agent, per the README: checkpointing so a 90-minute research run survives a crash; observability for every step, token and decision; cost control via token budgets, rate limits and circuit breakers; security with OAuth, RBAC and PII detection; and protocol support for MCP and A2A so tools and inter-agent calls work out of the box.
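Two of those guarantees, checkpointing and token budgets, can be sketched in a few lines. This is not the FastAgentic API: `run_with_checkpoints` and `BudgetExceeded` are hypothetical names, and a JSON file stands in for whatever checkpoint store a real deployment would use.

```python
import json
from pathlib import Path
from typing import Callable, List


class BudgetExceeded(Exception):
    pass


def run_with_checkpoints(
    steps: List[Callable[[], int]],  # each step returns the tokens it used
    checkpoint: Path,
    token_budget: int,
) -> int:
    # Resume from the last completed step if a checkpoint exists.
    state = {"next_step": 0, "tokens_used": 0}
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text(encoding="utf-8"))

    for index in range(state["next_step"], len(steps)):
        if state["tokens_used"] >= token_budget:
            raise BudgetExceeded(f"budget of {token_budget} tokens exhausted")
        state["tokens_used"] += steps[index]()
        state["next_step"] = index + 1
        # Persist after every step so a crash repeats at most one step.
        checkpoint.write_text(json.dumps(state), encoding="utf-8")
    return state["tokens_used"]
```

Calling the function again with the same checkpoint path after a crash skips every step that already completed, and the token count carries over so the budget applies to the whole run rather than to each attempt.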
fastagentic is also the layer that picks up mcp-pay on the tool side and policy enforcement via ormai on the data side. Architecturally, it’s the wrapper that exposes the agent as an HTTP service with the cross-cutting concerns wired in.
Putting it together
A reference flow for an autonomous coding agent looks like:
- The user (or upstream orchestrator) sends a task to a fastagentic endpoint.
- fastagentic loads prior session context from memorg.
- It forks a fresh workspace from a clean checkpoint using agentvfs.
- It hands the task to brat, which selects an engine (Claude Code, Aider, OpenCode) and logs the run to the append-only Grite event log.
- The agent calls tools. Database tools route through ormai with policies and tenant scoping enforced. Paid tools route through mcp-pay-compliant servers that return 402 with an X-PAYMENT-REQUIRED header when the agent’s budget isn’t honored.
- On success, the workspace diff is reported as a structured delta and either merged or surfaced for review. On crash, the event log lets the orchestrator recover; the workspace is rolled back to the last checkpoint.
- Memory writes back to memorg; tokens, latencies and tool calls flow to the fastagentic observability surface.
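The control flow of the steps above, including the crash path, can be sketched with each layer reduced to an injected callable. Every name here (`Stack`, `handle_task` and the field names) is hypothetical; none of the six projects is imported, and a real integration would replace each callable with a call into the corresponding layer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stack:
    load_context: Callable[[str], List[str]]            # memory layer
    fork_workspace: Callable[[], str]                   # workspace layer
    run_engine: Callable[[str, str, List[str]], str]    # orchestration layer, returns a diff
    rollback: Callable[[str], None]                     # workspace layer
    save_memory: Callable[[str, str], None]             # memory layer
    log_event: Callable[[dict], None]                   # event log


def handle_task(stack: Stack, session: str, task: str) -> str:
    context = stack.load_context(session)
    workspace = stack.fork_workspace()
    stack.log_event({"task": task, "status": "started"})
    try:
        diff = stack.run_engine(workspace, task, context)
    except Exception as error:
        # Crash path: restore the workspace, record the failure, re-raise.
        stack.rollback(workspace)
        stack.log_event({"task": task, "status": "crashed", "error": str(error)})
        raise
    stack.log_event({"task": task, "status": "done"})
    stack.save_memory(session, f"{task}: {diff}")
    return diff
```

Injecting the layers as plain callables keeps the flow neutral to which implementation sits behind each one, which is the property the stack is meant to have.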
None of this is theoretical for us — we use these primitives ourselves — but it is deliberately minimal. We are not telling you which LLM to use, which framework to write your agent in, or which IDE plugin to ship. The stack underneath should be neutral to all of those.
Open problems
Plenty. Three we think about most:
- Inter-agent identity. Agent A calls Agent B through MCP. Whose policy applies to the database? Whose budget pays for the tool? mcp-pay handles the rail; the identity-propagation pattern across A2A is still under-specified.
- Memory provenance. Memorg can recall what it stored. The harder question is “where did this fact come from?” — the source provenance graph the agent needs to make trustworthy edits.
- Cross-language workspaces. agentvfs gives us a workspace boundary; we still see real friction when one agent in the workspace is Rust and another is Python and a third is shell, and they all want different views of the same Cargo.lock and requirements.txt.
If any of these are your problem, the issues tabs on each repo are the best place to push back on our designs.
What to look at
The shortest path to seeing the stack work: clone brat, run ./scripts/mayor-demo.sh --with-ui, and watch the dashboard at http://localhost:5173. Then start agentvfs in another terminal and point an agent at one of its mounted forks. From there the rest of the layers compose naturally.
The projects page has the full index, including the pieces we didn’t cover here (memorg variants, the testing/build tools, regulus for compliance scaffolding). If you want the executive summary, this one paragraph is it: build the workspace, build the memory, enforce the data access, log the orchestration, declare the prices, wrap the deployment. Everything else is downstream of those six decisions.