The Next Shadow IT Isn’t Software. It’s Agents.

If you’ve been in technology long enough, you remember the rise of Shadow IT.

It never started with rebellion. Nobody woke up and decided to undermine corporate governance. A department needed something. IT couldn’t move fast enough. Someone built a spreadsheet. Someone else signed up for a SaaS platform on a company card. A manager ran an Access database for three years before anyone in the centre knew it existed.

Each decision made sense in isolation. The problem only became visible when you zoomed out. Suddenly nobody knew what systems existed, who owned them, what data they held, or what would happen if they disappeared tomorrow. The organisation hadn’t intentionally designed an architecture. It had accidentally accumulated one.

We are about to do exactly the same thing with AI agents.


A few weeks ago I wrote about a question a colleague asked while watching me work: “How do you know what your agents are going to build?” That post was about the missing specification layer between intent and implementation.

This is the follow-on question. What happens when those agents leave development and enter production? What happens when there are hundreds of them?

I sit on an AI committee at work, so I see the race for agentic capability from the inside. The energy and the pressure are genuine: teams across every function want agents, want them now, and are building them faster than any central function can review. That’s not a complaint. The capability is real and the business cases are solid.

But I’ve noticed something. The teams that will win this race aren’t the ones deploying the most agents. They’re the ones who can keep deploying because they built the operating model before they needed it. Everyone else will hit a wall: a question they can’t answer, an audit they can’t pass, a failure they can’t explain. And at that point the agents stop until someone builds the governance they skipped.


Agent sprawl starts with success

The dangerous thing about agents is that useful ones are easy to justify.

A claims team deploys one to triage work. Customer service deploys one to prepare responses. Engineering builds one to review pull requests. Finance builds one to reconcile reports. Each has a business case. Each saves time. Each makes someone’s life easier.

That is exactly why they spread.

Bad agents die quickly. Useful agents multiply.

By the time anyone notices, the organisation hasn’t deployed an agent. It has deployed an estate.

One agent is a use case. Ten agents are a portfolio. Hundreds of agents scattered across business units, vendor platforms, and local scripts are an estate. Estates do not run on vibes. They need mechanisms.


The question nobody can answer

I run a personal AI infrastructure: Jarvis on a Raspberry Pi, Hermes on a Hetzner server, monitoring agents watching both. Even at that scale I’ve had to make deliberate decisions about agent identity, access scope, and what happens when something breaks. I decommissioned one entire runtime when I couldn’t confidently answer basic questions about what it was doing. Painful call. Right call.

Now multiply that by a department operating in a regulated industry.

A workflow has been running for six months when an auditor asks: which agent prepared this recommendation? What data did it use? What did it ignore? What policy applied, and what did the reviewer actually see before approving the output?

And the room goes quiet.

Not because anything went wrong. Because nobody designed the system to remember. That is the failure mode that keeps enterprise architects awake at night. Not rogue AI. Missing evidence.


The harness tools solve the wrong problem

When teams do think about governance, they reach for observability tools. LangSmith. Langfuse. Tracing integrations inside LangChain. These are genuinely useful. They tell you what happened: which tools the agent called, what the prompt looked like, where it failed, how long it took.

But observability tells you what happened. It does not tell you whether it should have happened.

Those are different questions. Logging that an agent accessed a production database is observability. Preventing that agent from accessing the database unless it has been explicitly granted permission, in that environment, at that stage of its lifecycle, by someone with authority to grant it: that is governance. No harness tool does the second thing.
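To make the distinction concrete, here is a minimal Python sketch of the second thing. Everything in it (`Grant`, `ToolGate`, the approver set) is hypothetical and is not an API of LangSmith, Langfuse, or any other product. The gate refuses a tool call unless an explicit grant matches the agent, tool, environment, and lifecycle stage, and that grant was issued by someone with authority to issue it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Hypothetical explicit permission: this agent, this tool, this environment, this stage."""
    agent_id: str
    tool: str
    environment: str
    lifecycle_stage: str
    granted_by: str


class ToolGate:
    """Deny-by-default check that runs BEFORE a tool call (a trace only records it afterwards)."""

    def __init__(self, grants, approvers):
        self._grants = list(grants)
        self._approvers = set(approvers)  # people with authority to grant access (assumed list)

    def is_allowed(self, agent_id, tool, environment, lifecycle_stage):
        request = (agent_id, tool, environment, lifecycle_stage)
        # Allowed only if a grant matches exactly AND its grantor had authority to grant it.
        return any(
            (g.agent_id, g.tool, g.environment, g.lifecycle_stage) == request
            and g.granted_by in self._approvers
            for g in self._grants
        )

    def call(self, agent_id, tool, environment, lifecycle_stage, fn, *args, **kwargs):
        if not self.is_allowed(agent_id, tool, environment, lifecycle_stage):
            raise PermissionError(
                f"{agent_id} has no grant for {tool} in {environment} at stage {lifecycle_stage}"
            )
        return fn(*args, **kwargs)
```

The design choice worth noting is the default: with no matching grant the answer is no, so an observability tool would have something to log only after the gate has already said yes.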

The result is organisations with excellent visibility into what their agents are doing and no mechanism for controlling whether they should be doing it. The dashboard is green. Whether that means the right things are happening is a different question entirely.


Three surfaces, not one

Most governance conversations stop at agents. That is too narrow.

Agents are the obvious starting point, but an agent without version control is a liability. If the one running today behaves differently from the one running last month because someone changed the system prompt, and you cannot reconstruct what the original was doing, you do not have a production system. You have a guess with a nice interface.

Agents need what software has had for decades: source control, environments, promotion gates, and the ability to roll back. An actual lifecycle: draft, test, staging, production, monitoring, and retirement. Including retirement. An agent with no active owner, no current use case, and persistent access to production systems is a risk sitting quietly in your infrastructure.
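One way to make that lifecycle enforceable rather than aspirational is a small state machine. This is a sketch only: the stage names come from the list above, but the allowed transitions (monitoring as the stage after launch, rollback from production to staging, retirement from any stage, and a named approver required only for staging to production) are assumptions, not a prescription.

```python
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    TEST = "test"
    STAGING = "staging"
    PRODUCTION = "production"
    MONITORING = "monitoring"
    RETIRED = "retired"


# Assumed transition table: forward promotion, rollback, and retirement from anywhere.
TRANSITIONS = {
    Stage.DRAFT: {Stage.TEST, Stage.RETIRED},
    Stage.TEST: {Stage.DRAFT, Stage.STAGING, Stage.RETIRED},
    Stage.STAGING: {Stage.TEST, Stage.PRODUCTION, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.MONITORING, Stage.STAGING, Stage.RETIRED},
    Stage.MONITORING: {Stage.STAGING, Stage.RETIRED},
    Stage.RETIRED: set(),  # retirement is terminal
}

# Promotion gates: transitions that need a named approver.
GATED = {(Stage.STAGING, Stage.PRODUCTION)}


def transition(current: Stage, target: Stage, approver: str = None) -> Stage:
    if target not in TRANSITIONS[current]:
        raise ValueError(f"{current.value} -> {target.value} is not an allowed transition")
    if (current, target) in GATED and not approver:
        raise PermissionError("promotion to production requires a named approver")
    return target
```

Making retirement a real state, rather than an agent simply being forgotten, is what lets an inventory later distinguish "retired" from "abandoned".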

Skills are the discrete capabilities agents call on: the function that searches your knowledge base, the one that classifies an intent, the one that drafts a response. They’re often shared across agents, and that is where it gets interesting. If a shared skill changes, every agent using it changes. If it has a bug, every agent inherits it. Skills need versioning, ownership, and controlled promotion. They are code. Treat them that way.
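Version pinning is the usual software answer to the shared-skill problem. As a sketch (the manifest shape and the `"latest"` convention are assumptions for illustration), each agent manifest records which version of each skill it uses. That makes it possible to answer, before a release, which agents will change immediately and which will stay on their pinned version until an owner upgrades them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AgentManifest:
    # Hypothetical manifest: skill name -> pinned version, or "latest" to float.
    agent_id: str
    owner: str
    skills: Dict[str, str] = field(default_factory=dict)


def release_impact(agents: List[AgentManifest], skill: str) -> Tuple[List[str], List[str]]:
    """Split agents that use `skill` into those a new release changes immediately
    (floating on "latest") and those that stay put until an owner upgrades them."""
    changes_now, stays_pinned = [], []
    for agent in agents:
        pin = agent.skills.get(skill)
        if pin is None:
            continue  # this agent does not use the skill
        (changes_now if pin == "latest" else stays_pinned).append(agent.agent_id)
    return changes_now, stays_pinned
```

Floating pins are convenient, but they are exactly how a bug in one skill becomes a bug in every agent at once.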

Tools are the connections to real systems: databases, APIs, CRMs, payment platforms. A tool is where an agent stops reading and starts acting. Tool access needs to be explicit, scoped, and auditable. Not “the agent can access the claims database.” Which agent? Which environment? Which scope? Granted by whom, reviewed when?

Capability is not permission. Confidence is not clearance. The level of control required scales with how close a tool gets to systems of record. An agent answering from approved documentation is one risk profile. An agent executing transactions is another entirely.


Agents need an SDLC

No engineering team ships code to production without source control, a review process, environment gates, and a way to audit what changed. The code running your business has owners. It has history. It has a path from idea to production that somebody can reconstruct.

Your agents are code. Your skills are code. Your tool integrations are code.

The argument against treating them that way is speed: “We’ll add governance once we’ve proven the value.” That logic holds until something unexpected happens in production and nobody can explain why. At that point governance stops being optional. It becomes the difference between being able to answer a question and not.

Before the next agent goes live, ask five questions: who owns it, what environment does it run in, what can it access, what lifecycle stage is it in, and how was it approved for production? If any of them lacks an answer, the agent might be useful, but it is not ready for production.
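Those five questions can be checked mechanically before anything is promoted. A minimal sketch, assuming a hypothetical `AgentRecord` in which `None` means "nobody knows", as distinct from an empty tuple of access, which means "declared to have no tool access".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AgentRecord:
    # Hypothetical record; None means "unknown", not "none".
    name: str
    owner: Optional[str] = None
    environment: Optional[str] = None
    access: Optional[Tuple[str, ...]] = None   # () means "declared: no tool access"
    lifecycle_stage: Optional[str] = None
    production_approval: Optional[str] = None  # e.g. an approval ticket reference


def unanswered(record: AgentRecord) -> List[str]:
    """Return the readiness questions this record cannot answer."""
    answers = {
        "Who owns it?": record.owner,
        "What environment does it run in?": record.environment,
        "What can it access?": record.access,
        "What lifecycle stage is it in?": record.lifecycle_stage,
        "How was it approved for production?": record.production_approval,
    }
    return [question for question, answer in answers.items() if answer is None]
```

A promotion pipeline could simply refuse to proceed while `unanswered(record)` is non-empty.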


What a real control plane covers

A genuine control plane is not a dashboard. It is the layer that makes basic questions answerable.

What agents exist, who owns each one, where they run, what model they use, what workflow they support. What skills those agents draw on. What tools they can call and in which environments. Who approved each agent for production and when. What changed between versions. Which agents are retired and why.
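Once the inventory exists, some of the most useful questions become simple queries over it. The sketch below is illustrative: the field names and the three rules are assumptions. It flags retired agents that still hold tool grants, retirements with no recorded reason, and active agents with no owner.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class InventoryEntry:
    # Hypothetical inventory record; field names are illustrative.
    agent_id: str
    owner: Optional[str]
    stage: str                      # e.g. "production" or "retired"
    model: str
    workflow: str
    tool_grants: List[str] = field(default_factory=list)
    retired_reason: Optional[str] = None


def risky_entries(inventory: List[InventoryEntry]) -> List[Tuple[str, str]]:
    findings = []
    for e in inventory:
        retired = e.stage == "retired"
        if retired and e.tool_grants:
            findings.append((e.agent_id, "retired but still holds tool grants"))
        if retired and not e.retired_reason:
            findings.append((e.agent_id, "retired with no recorded reason"))
        if not retired and e.owner is None:
            findings.append((e.agent_id, "active agent with no owner"))
    return findings
```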

Beyond inventory, it handles three things that should never blur together: what the agent can see, what it can do, and what it is permitted to decide without human approval. An agent may need broad context to prepare useful work and still have no authority to act on anything without sign-off. Letting those boundaries drift is how workflows accumulate permissions nobody intended to grant.
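Keeping see, do, and decide as three separate permission sets makes the boundary explicit rather than implied. In this hypothetical sketch, an action outside `can_do` is refused, an action inside `can_decide` runs directly, and everything else waits for a human. The constructor rejects any configuration in which the agent could decide something it is not allowed to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    # Hypothetical per-agent authority, split three ways.
    can_see: frozenset     # sources it may read for context
    can_do: frozenset      # actions it may take, with or without sign-off
    can_decide: frozenset  # subset of can_do it may execute without sign-off

    def __post_init__(self):
        if not self.can_decide <= self.can_do:
            raise ValueError("can_decide must be a subset of can_do")


def may_read(auth: Authority, source: str) -> bool:
    return source in auth.can_see


def route_action(auth: Authority, action: str, human_approved: bool = False) -> str:
    if action not in auth.can_do:
        return "denied"
    if action in auth.can_decide or human_approved:
        return "execute"
    return "queue_for_review"
```

Broad `can_see` with narrow `can_decide` is the pattern described above: plenty of context to prepare the work, no authority to act on it alone.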

It also handles human review properly. A human in the loop is not governance if the reviewer sees a polished summary and a green button. Real oversight means seeing the sources, the proposed action, the downstream impact, and having a genuine path to reject or escalate. Without those, human approval is theatre with better UX.
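A review screen can refuse to offer an approve button until the reviewer has been shown what real oversight requires. A sketch, with a hypothetical `ReviewPacket` whose required fields mirror the list above:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ReviewPacket:
    # Hypothetical payload shown to a human reviewer.
    summary: str
    sources: List[str]
    proposed_action: str
    downstream_impact: str
    options: Tuple[str, ...] = ("approve", "reject", "escalate")


def missing_for_review(p: ReviewPacket) -> List[str]:
    """List what the reviewer would not see; an empty list means the packet is reviewable."""
    gaps = []
    if not p.sources:
        gaps.append("sources")
    if not p.proposed_action:
        gaps.append("proposed action")
    if not p.downstream_impact:
        gaps.append("downstream impact")
    for option in ("reject", "escalate"):
        if option not in p.options:
            gaps.append(f"{option} option")
    return gaps
```

Note that the summary is never on the required list: it is the part a reviewer already gets, and on its own it is not enough.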

And it handles traceability: not just logs, but the reconstructible path showing which sources the agent used, which tools it called, what the human approved, and what changed downstream. Autonomy without traceability is operational debt with better marketing.
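Traceability in this sense is closer to an audit ledger than a log. One common technique is hash chaining: each entry includes the hash of the previous one, so any later edit is detectable. A minimal sketch using only the standard library; the `TraceLog` class and the event kinds (`source`, `tool_call`, `human_approval`, `downstream_change`) are assumptions that mirror the sentence above.

```python
import hashlib
import json


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class TraceLog:
    """Hypothetical append-only, hash-chained record of one workflow run."""

    def __init__(self):
        self.events = []

    def append(self, kind: str, **detail) -> None:
        prev = self.events[-1]["hash"] if self.events else "genesis"
        body = {"kind": kind, "detail": detail, "prev": prev}
        self.events.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev = "genesis"
        for e in self.events:
            body = {"kind": e["kind"], "detail": e["detail"], "prev": e["prev"]}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True

    def path(self):
        """The reconstructible path: what happened, in order."""
        return [(e["kind"], e["detail"]) for e in self.events]
```

Hash chaining makes tampering detectable, not impossible; where the log is stored and who can write to it still matter.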


The window is still open

Most organisations are somewhere in the middle of this right now. Agents are live, useful, and multiplying. The governance conversation is either not happening or stalling because someone thinks it will slow things down.

That window will not stay open. As agents move closer to systems of record, the cost of not having a control plane increases. Vendors will mature. Regulators will catch up. And internally, the teams that built governance early will be the ones with room to keep moving. They can expand autonomy because they can verify it is working. They can answer the audit question because they designed for it. They stay in the race.

The ones that didn’t will be retrofitting governance onto a live estate, agent by agent, skill by skill, tool by tool, while trying not to break anything people now depend on. That is a slow, expensive way to fall behind.

We have seen this before. We called it Shadow IT. We spent years cleaning it up.

The estate is already forming. The question is whether you will be able to govern it when it matters.


I run Jarvis and Hermes, a personal AI infrastructure spread across a Raspberry Pi and a Hetzner server, as a way to stay close to how these systems actually behave. Most of what I write here comes from things I’ve had to figure out the hard way.