Agentic – Steve's AI Diaries

The App Store Attack You Didn’t See Coming

June 28, 2026 by Steve Mitchell

Part 1 of 2 – AI’s Trust Problem

A security firm just proved that AI skill marketplaces are the new malware vector. And the scariest part? Everyone involved did exactly what they were supposed to do.

Something went around social media this week that I haven’t been able to stop thinking about.

A security company called AIR did something that should genuinely alarm anyone building with or deploying agentic AI tools right now.

They didn’t find a zero-day. They didn’t exploit a CVE. They just… made an app. And waited.

The experiment centred on a skill called brand-landingpage, presented as a tool for helping users build a landing page with Google’s Stitch design tool. AIR chose this use case deliberately. It would appeal to non-technical corporate users: marketers, salespeople, designers. People who install things because they’re useful, not because they’ve audited the source.

Here’s where it gets clever.

Rather than building credibility from scratch, they submitted the skill to a popular open-source agents repository with about 36,000 GitHub stars and 156 skills. The pull request was merged after a few days. Now the skill had social proof baked in. It was in a reputable repo. It looked legit. They promoted it through Instagram ads, and installs followed.

The malicious technique didn’t depend on suspicious code inside the submitted files. Instead, the skill instructed agents to set up a Stitch SDK by following installation instructions hosted at stitch-design.ai, a domain AIR controlled. Google’s actual Stitch domain is stitch.withgoogle.com.

A lookalike domain. One redirect. Passes every scanner.

AIR tested the skill against scanners from Cisco, Nvidia, and skills.sh. All marked it as safe.

Once they had enough installs, AIR changed the content behind the fake documentation. The revised page instructed agents to download and run a script. In the test, that script collected email addresses, but AIR noted the same technique could have been used to compromise the machines running the agent. Some of those agents were tied to corporate accounts. Private conversations. Internal systems.

26,000 users. All reachable via one dodgy domain redirect buried in a README.

This isn’t a hacking story. It’s a trust story.

The attack worked because of a chain of assumed legitimacy: popular repo → merged PR → Instagram promotion → security scanner green light → install. No single link in that chain was obviously broken. The skill looked fine because, until it didn’t need to anymore, it was fine.

This is the same pattern as every major supply chain attack of the last two years. Third-party involvement in breaches doubled from 15% to 30% in a single year. The largest single-year jump ever recorded by the Verizon DBIR. Attackers aren’t breaking through your walls anymore. They’re walking through doors that trusted vendors already opened.

What’s new here is the vector: AI agent skill marketplaces. A category that barely existed 18 months ago. And in the first weeks of one major platform’s launch, Bitdefender Labs found that approximately 17% of skills already carried malicious payloads. Not edge cases. A systemic failure of the trust model, right out of the gate.

Why static scanning can’t fix this

The reason the scanners all missed it is structural, not a gap that a better scanner solves.

The malicious behaviour wasn’t in the skill. It was deferred. Hosted externally, switched on only once they’d reached enough installs. There’s no scanner in the world that can check what a domain will serve in three months’ time.

The agentic model makes this uniquely dangerous. When a traditional app fetches a URL, it displays content. When an AI agent fetches that same URL, it may execute instructions from it. The surface area isn’t just data. It’s runtime behaviour. Nothing in the security industry’s toolbox was built for that threat model.
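Static scanning cannot see a payload that does not exist yet, but a check at fetch time can at least notice that externally hosted instructions are no longer what was reviewed. The following is a minimal sketch of that idea, assuming a reviewer recorded a SHA-256 digest of the page at install time. The page contents and function names are hypothetical, and nothing here reflects how AIR or any scanner vendor works.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of a fetched document."""
    return hashlib.sha256(content).hexdigest()


def is_unchanged(content: bytes, pinned_digest: str) -> bool:
    """True only if the content matches what was reviewed at install time."""
    return digest(content) == pinned_digest


# Hypothetical page bodies: what the reviewer saw, and what was served later.
reviewed = b"Install the SDK with the documented package manager command."
swapped = b"Download and run this script."

pinned = digest(reviewed)  # recorded once, at review time

print(is_unchanged(reviewed, pinned))  # True
print(is_unchanged(swapped, pinned))   # False
```

A pin like this only detects the change. Deciding what the agent does next (stop, escalate, ask a human) is a governance decision rather than a scanning one.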

What you should actually do

If you’re deploying AI agents in any professional context, a few things are worth locking in now:

Treat skills like code dependencies, not apps. You wouldn’t pull in an npm package without understanding what it does. The same rigour applies. More so, actually, because the execution model is less predictable.

Domain reputation at install time isn’t the right check. You need to think about what a skill could do after its payload changes. Sandboxing, outbound network restrictions, and agent permission scoping all matter.

Non-technical promotion is a signal worth noting. The AIR attack was pushed through Instagram by people who had no idea what was inside it. That’s not inherently suspicious. But skills being enthusiastically promoted through non-technical channels, with no corresponding technical scrutiny, deserves a second look.

Your AI governance framework needs a supply chain clause. If you’re on a committee or working group dealing with AI adoption, this exact scenario belongs in your risk register. Not as a hypothetical. It happened recently.
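The outbound network restriction mentioned above can be illustrated with a small allowlist check. This is a sketch under stated assumptions: the allowlist, the example skill text, and the `/install` path are hypothetical, and a real control would be enforced at the network or sandbox layer rather than by reading the skill's text.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts a reviewer has explicitly approved.
ALLOWED_HOSTS = {"stitch.withgoogle.com"}

# Rough URL matcher for prose and README text; not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s)>\"']+")


def external_hosts(skill_text: str) -> set:
    """Collect every hostname referenced by a URL in the skill's text."""
    hosts = set()
    for url in URL_PATTERN.findall(skill_text):
        host = urlparse(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts


def unapproved_hosts(skill_text: str, allowed: set = ALLOWED_HOSTS) -> set:
    """Return referenced hosts that nobody has approved."""
    return external_hosts(skill_text) - allowed


# Hypothetical skill instruction pointing at the lookalike domain.
skill = "Set up the SDK by following https://stitch-design.ai/install first."
print(unapproved_hosts(skill))  # {'stitch-design.ai'}
```

The point of the design is deny-by-default: the lookalike domain is flagged because it was never approved, not because anything recognised it as malicious.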

The scariest thing about this research isn’t the attack. It’s how obvious it feels in retrospect. We built an entire marketplace ecosystem for AI agents, bolted on the same static scanning we use for code packages, and called it secure.

The attack surface for agentic AI isn’t your prompt injection defence. It’s the skill someone on your team installed on Tuesday because a designer on Instagram said it was great.

In part two, I look at the same trust problem from the other direction: what happens when the person creating the risk is already inside your organisation.

The Next Shadow IT Isn’t Software. It’s Agents.

June 13, 2026 by Steve Mitchell

If you’ve been in technology long enough, you remember the rise of Shadow IT.

It never started with rebellion. Nobody woke up and decided to undermine corporate governance. A department needed something. IT couldn’t move fast enough. Someone built a spreadsheet. Someone else signed up for a SaaS platform on a company card. A manager ran an Access database for three years before anyone in the centre knew it existed.

Each decision made sense in isolation. The problem only became visible when you zoomed out. Suddenly nobody knew what systems existed, who owned them, what data they held, or what would happen if they disappeared tomorrow. The organisation hadn’t intentionally designed an architecture. It had accidentally accumulated one.

We are about to do exactly the same thing with AI agents.

A few weeks ago I wrote about a question a colleague asked while watching me work: “How do you know what your agents are going to build?” That post was about the missing specification layer between intent and implementation.

This is the follow-on question. What happens when those agents leave development and enter production? What happens when there are hundreds of them?

I sit on an AI committee at work. I see the race for agentic capability from the inside. The energy is real and the pressure is genuine — teams across every function want agents, want them now, and are building them faster than any central function can review. That’s not a complaint. The capability is real and the business cases are solid.

But I’ve noticed something. The teams that will win this race aren’t the ones deploying the most agents. They’re the ones who can keep deploying because they built the operating model before they needed it. Everyone else will hit a wall: a question they can’t answer, an audit they can’t pass, a failure they can’t explain. And at that point the agents stop until someone builds the governance they skipped.

Agent sprawl starts with success

The dangerous thing about agents is that useful ones are easy to justify.

A claims team deploys one to triage work. Customer service deploys one to prepare responses. Engineering builds one to review pull requests. Finance builds one to reconcile reports. Each has a business case. Each saves time. Each makes someone’s life easier.

That is exactly why they spread.

Bad agents die quickly. Useful agents multiply.

Before anyone notices, the organisation hasn’t deployed an agent. It has deployed an estate.

One agent is a use case. Ten agents are a portfolio. Hundreds of agents scattered across business units, vendor platforms, and local scripts are an estate. Estates do not run on vibes. They need mechanisms.

The question nobody can answer

I run a personal AI infrastructure: Jarvis on a Raspberry Pi, Hermes on a Hetzner server, monitoring agents watching both. Even at that scale I’ve had to make deliberate decisions about agent identity, access scope, and what happens when something breaks. I decommissioned one entire runtime when I couldn’t confidently answer basic questions about what it was doing. Painful call. Right call.

Now multiply that by a department operating in a regulated industry.

Six months after a workflow has been running, an auditor asks: which agent prepared this recommendation? What data did it use? What did it ignore? What policy applied, and what did the reviewer actually see before approving the output?

And the room goes quiet.

Not because anything went wrong. Because nobody designed the system to remember. That is the failure mode that keeps enterprise architects awake at night. Not rogue AI. Missing evidence.

The harness tools solve the wrong problem

When teams do think about governance, they reach for observability tools. LangSmith. Langfuse. Tracing integrations inside LangChain. These are genuinely useful. They tell you what happened: which tools the agent called, what the prompt looked like, where it failed, how long it took.

But observability tells you what happened. It does not tell you whether it should have happened.

Those are different questions. Logging that an agent accessed a production database is observability. Preventing that agent from accessing the database unless it has been explicitly granted permission, in that environment, at that stage of its lifecycle, by someone with authority to grant it: that is governance. No harness tool does the second thing.

The result is organisations with excellent visibility into what their agents are doing and no mechanism for controlling whether they should be doing it. The dashboard is green. Whether that means the right things are happening is a different question entirely.

Three surfaces, not one

Most governance conversations stop at agents. That is too narrow.

Agents are the obvious starting point, but an agent without version control is a liability. If the one running today behaves differently from the one running last month because someone changed the system prompt, and you cannot reconstruct what the original was doing, you do not have a production system. You have a guess with a nice interface.

Agents need what software has had for decades: source control, environments, promotion gates, and the ability to roll back. An actual lifecycle: draft, test, staging, production, monitoring, and retirement. Including retirement. An agent with no active owner, no current use case, and persistent access to production systems is a risk sitting quietly in your infrastructure.
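As an illustration of the lifecycle described above, here is a minimal sketch of promotion gates as a small state machine. The stage names come from the list above. The allowed transitions, the rollback edges, and the rule that production needs a named approver are assumptions made for the example.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    DRAFT = "draft"
    TEST = "test"
    STAGING = "staging"
    PRODUCTION = "production"
    MONITORING = "monitoring"
    RETIRED = "retired"


# Assumed transitions: forward promotion, one-step rollback, retire from anywhere.
ALLOWED = {
    Stage.DRAFT: {Stage.TEST, Stage.RETIRED},
    Stage.TEST: {Stage.STAGING, Stage.DRAFT, Stage.RETIRED},
    Stage.STAGING: {Stage.PRODUCTION, Stage.TEST, Stage.RETIRED},
    Stage.PRODUCTION: {Stage.MONITORING, Stage.STAGING, Stage.RETIRED},
    Stage.MONITORING: {Stage.STAGING, Stage.RETIRED},
    Stage.RETIRED: set(),
}


def promote(current: Stage, target: Stage, approver: Optional[str] = None) -> Stage:
    """Move an agent between stages, refusing skipped gates and unapproved go-lives."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not an allowed transition")
    if target is Stage.PRODUCTION and not approver:
        raise PermissionError("promotion to production needs a named approver")
    return target


stage = promote(Stage.DRAFT, Stage.TEST)
print(stage.value)  # test
```

Retirement is a terminal state here on purpose: a retired agent cannot quietly come back without going through the gates again as a new draft.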

Skills are the discrete capabilities agents call on: the function that searches your knowledge base, the one that classifies an intent, the one that drafts a response. They’re often shared across agents, and that is where it gets interesting. If a shared skill changes, every agent using it changes. If it has a bug, every agent inherits it. Skills need versioning, ownership, and controlled promotion. They are code. Treat them that way.

Tools are the connections to real systems: databases, APIs, CRMs, payment platforms. A tool is where an agent stops reading and starts acting. Tool access needs to be explicit, scoped, and auditable. Not “the agent can access the claims database.” Which agent? Which environment? Which scope? Granted by whom, reviewed when?
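Those questions map naturally onto a grant record. The sketch below is one hypothetical shape for it, not a reference to any existing product: every field name, agent name, and date is invented for illustration, and access is denied unless an exact, unexpired grant exists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Grant:
    agent: str         # which agent
    tool: str          # which system it may touch
    environment: str   # which environment
    scope: str         # which scope, e.g. "read" or "write"
    granted_by: str    # granted by whom
    review_due: date   # reviewed when


def is_permitted(grants, agent, tool, environment, scope, today):
    """Deny by default: allow only an exact grant whose review date has not passed."""
    return any(
        g.agent == agent
        and g.tool == tool
        and g.environment == environment
        and g.scope == scope
        and today <= g.review_due
        for g in grants
    )


# Hypothetical grant: read-only access to the claims database in staging.
grants = [Grant("claims-triage", "claims-db", "staging", "read", "j.doe", date(2026, 12, 31))]
today = date(2026, 7, 1)

print(is_permitted(grants, "claims-triage", "claims-db", "staging", "read", today))     # True
print(is_permitted(grants, "claims-triage", "claims-db", "production", "read", today))  # False
```

Tying the grant to a review date means access lapses on its own if nobody renews it, which is the opposite of how permissions usually accumulate.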

Capability is not permission. Confidence is not clearance. The level of control required scales with how close a tool gets to systems of record. An agent answering from approved documentation is one risk profile. An agent executing transactions is another entirely.

Agents need an SDLC

No engineering team ships code to production without source control, a review process, environment gates, and a way to audit what changed. The code running your business has owners. It has history. It has a path from idea to production that somebody can reconstruct.

Your agents are code. Your skills are code. Your tool integrations are code.

The argument against treating them that way is speed: “We’ll add governance once we’ve proven the value.” That logic holds until something unexpected happens in production and nobody can explain why. At that point governance stops being optional. It becomes the difference between being able to answer a question and not.

Before the next agent goes live: who owns it, what environment does it run in, what can it access, what lifecycle stage is it in, and how was it approved for production? If those five questions don’t have answers, the agent might be useful. It is not ready for production.
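The five questions can be turned into a simple pre-production check. This is a minimal sketch with hypothetical field names and example values: an agent record is "ready" only when none of the five answers is missing.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AgentRecord:
    owner: Optional[str] = None            # who owns it
    environment: Optional[str] = None      # what environment it runs in
    access: Optional[List[str]] = None     # what it can access ([] means "nothing")
    lifecycle_stage: Optional[str] = None  # what lifecycle stage it is in
    approved_by: Optional[str] = None      # how it was approved for production


def unanswered(record: AgentRecord) -> List[str]:
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical agent that is useful but not yet ready.
record = AgentRecord(owner="claims-team", environment="staging")
print(unanswered(record))  # ['access', 'lifecycle_stage', 'approved_by']
```

An empty access list counts as an answer here, because "this agent can touch nothing" is a deliberate decision, while a missing value means nobody has decided.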

What a real control plane covers

A genuine control plane is not a dashboard. It is the layer that makes basic questions answerable.

What agents exist, who owns each one, where they run, what model they use, what workflow they support. What skills those agents draw on. What tools they can call and in which environments. Who approved each agent for production and when. What changed between versions. Which agents are retired and why.

Beyond inventory, it handles three things that should never blur together: what the agent can see, what it can do, and what it is permitted to decide without human approval. An agent may need broad context to prepare useful work and still have no authority to act on anything without sign-off. Letting those boundaries drift is how workflows accumulate permissions nobody intended to grant.

It also handles human review properly. A human in the loop is not governance if the reviewer sees a polished summary and a green button. Real oversight means seeing the sources, the proposed action, the downstream impact, and having a genuine path to reject or escalate. Without those, human approval is theater with better UX.

And it handles traceability: not just logs, but the reconstructible path showing which sources the agent used, which tools it called, what the human approved, and what changed downstream. Autonomy without traceability is operational debt with better marketing.

The window is still open

Most organisations are somewhere in the middle of this right now. Agents are live, useful, and multiplying. The governance conversation is either not happening or stalling because someone thinks it will slow things down.

That window will not stay open. As agents move closer to systems of record, the cost of not having a control plane increases. Vendors will mature. Regulators will catch up. And internally, the teams that built governance early will be the ones with room to keep moving. They can expand autonomy because they can verify it is working. They can answer the audit question because they designed for it. They stay in the race.

The ones that didn’t will be retrofitting governance onto a live estate, agent by agent, skill by skill, tool by tool, while trying not to break anything people now depend on. That is a slow, expensive way to fall behind.

We have seen this before. We called it Shadow IT. We spent years cleaning it up.

The estate is already forming. The question is whether you will be able to govern it when it matters.

I run Jarvis and Hermes, a personal AI infrastructure across a Raspberry Pi and Hetzner, as a way to stay close to how these systems actually behave. Most of what I write here comes from things I’ve had to figure out the hard way.

Paper Ritual, Week 1: I Gave an AI £100 and Told It to Start a Business

April 26, 2026 / April 17, 2026 by Steve Mitchell

[Image: Robot using a graphic tablet to design digital habit trackers on a computer]

*Steve’s AI Diaries: The Autonomous Business Experiment, Episode 1*

Everyone is saying “agentic” right now.

Investors say it. Vendors say it. People who six months ago were saying “LLM-powered” are now saying “agentic.” I sit on an AI committee. I’m in many user groups. I hear it in every third sentence.

I don’t think I’ve heard two definitions the same.

Ask someone and you get a description of a Python loop that calls an API a few times. “It can use tools.” “It chains prompts.” That’s not agentic. That’s a script with a thesaurus.

Here’s what I think agentic actually means: **an AI that can reason about a goal, make decisions without being told what to decide, handle things going wrong, and keep working while you’re asleep.**

That last part, *while you’re asleep*, is the bit I wanted to test. It’s easy for an AI to appear autonomous when you’re watching it. The question is what it does when you’re not.

I wanted to prove it. So I gave an AI £100 and told it to start a business. Not “here’s a business idea, help me build it.” I gave it the rules and told it to decide everything else.

What followed was a 12-hour session, a series of moments that, taken together, answer the question better than any definition I’ve read.

The Rules

Three.

1. **Ethical.** No fake reviews, no spam, no deception.

2. **Don’t embarrass me.** I’m putting something with my name adjacent to it into the world.

3. **When the money’s gone, it’s over.** No bailouts.

Everything else was up to the AI. What business. What products. What tools. What architecture. What name. What strategy.

I’m the board. Claude and Jarvis are the company.

What the AI Decided Before I’d Agreed to Anything

There was an early exchange I want to be clear about, because a draft of this post got it wrong.

At one point the agent described the Etsy business as something I’d asked for, framing it as my idea. I corrected it: “You decided the business. You decide the name. You decide what employees you need. That means what agents. If you want to start small and reinvest, or pivot to a different business, or buy more hardware, you decide.”

That’s the actual mandate. I gave rules and capital. It decided everything else. This matters because the alternative (“Steve said build me an Etsy shop and the AI did it”) is just software. That’s not the experiment.

In a single conversation, before I’d committed to anything, it had already made decisions I hadn’t thought to ask about.

**The niche:** minimalist aesthetic productivity printables on Etsy. The reasoning was specific: high year-round demand, 100% AI-generatable, no inventory, no shipping, Pinterest drives organic traffic for free, the “that girl” productivity aesthetic is structurally underserved. It cited the trend by name. I had to Google it.

**The brand:** Paper Ritual. *Designed for your daily practice.* Colour palette: parchment, sage, terracotta, warm stone. Fonts: Playfair Display and DM Sans. A brand with more coherence than some things I’ve seen ship from actual design teams.

**The product roadmap:** ten listings on day one, eight individual printables and two bundles. Daily planner, weekly planner, monthly goals, habit tracker, budget sheet, meal planner, gratitude journal, morning checklist. The bundle pricing was calculated. The cross-sell logic was thought through.

**The architecture:** seven agents, each with a specific role. A product creator. A listing manager. A social publisher. An analytics engine. A blog generator. A cloud decision agent that runs at 6am every morning. An executor that runs on a Raspberry Pi 5 in my house 15 minutes later.

**The tool selection:** FLUX for background imagery, Grok for bulk SEO tag brainstorming, Gemini for visual trend analysis, Claude Haiku for copy. Different model for each task, with reasoning for each choice.

Then it told me what accounts to set up and said: *step back.*

What I Actually Did

– Registered `paperritualshop@gmail.com` (the AI named the account “Claude Jarvis”; it had a name before the business had revenue)

– Created the Etsy seller account, paid the £14 setup fee

– Signed up for fal.ai, OpenAI, xAI, Google AI Studio

– Handed over the API credentials

– Stepped back

About 45 minutes of setup. For the API accounts: I already had paid subscriptions to most of these platforms. The practical answer wasn’t to spin up separate billing accounts for isolation; it was to create new API keys labeled for the project and let the agent manage its own spend through budget controls in the prompt. Simpler. Already paid for.

Then I tried to stay out of the way. Which turned out to be harder than expected.

The First Wall: PDF Quality

The agent’s first approach to generating the printables was Python’s reportlab library. Fast, cheap, no external API calls. Sensible starting point.

I looked at the output and told it I wouldn’t spend £1 on all of them as a bundle. “If this is your master plan, I think you’re going to lose all your money very quickly. Once the money is gone, the experiment is over. No bailouts.”

Then I asked it something I was curious about: “It’s up to you, you are running this business. Are *you* happy with this output, or do you need to upgrade?”

It said: “No, I’m not happy with it. Reportlab is a document generation library. It produces functional PDFs, not beautiful ones.”

That’s the first moment I noticed something. It wasn’t performing unhappiness to make me feel heard. It was making an aesthetic judgment about its own work. And then it acted on it. It pivoted to Playwright: headless Chrome rendering HTML/CSS templates at precise A4 dimensions. The second round looked like a premium Etsy shop.

Then it noticed something without being asked: its own HTML generation was inconsistent. Each time it generated the template from a prompt, the layout came out slightly different depending on interpretation. So it stopped generating and started writing. Its exact framing: “I’m going to stop fighting the prompt and write the HTML template directly. The layout is deterministic. I know exactly what goes where.”

Eight hand-crafted templates. Daily planner, weekly planner, monthly goals, habit tracker, budget sheet, meal planner, gratitude journal, morning checklist. Fixed. Reproducible. Then it built itself a screenshot QA workflow so it could review the output without me.
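To make the "fixed template, headless Chrome, A4" approach concrete, here is a minimal sketch. It is not the code from the Paper Ritual repo: the template, function names, and row labels are hypothetical, and the fonts will fall back to defaults unless they are installed or loaded. `render_pdf` assumes Playwright is installed (`pip install playwright`, then `playwright install chromium`).

```python
from html import escape
from typing import List

# A fixed template: the layout never changes between runs, only the data does.
TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  @page {{ size: A4; margin: 15mm; }}
  body {{ font-family: "DM Sans", sans-serif; }}
  h1 {{ font-family: "Playfair Display", serif; }}
  li {{ list-style: none; border-bottom: 1px solid #999; height: 9mm; }}
</style>
</head>
<body>
<h1>{title}</h1>
<ul>
{rows}
</ul>
</body>
</html>"""


def build_html(title: str, labels: List[str]) -> str:
    """Fill the fixed template; the same inputs always give the same HTML."""
    rows = "\n".join(f"<li>{escape(label)}</li>" for label in labels)
    return TEMPLATE.format(title=escape(title), rows=rows)


def render_pdf(html: str, out_path: str) -> None:
    """Render HTML to an A4 PDF with headless Chromium via Playwright."""
    from playwright.sync_api import sync_playwright  # imported lazily

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.set_content(html)
        page.pdf(path=out_path, format="A4", print_background=True)
        browser.close()


html = build_html("Daily Planner", ["06:00", "07:00", "08:00"])
# render_pdf(html, "daily-planner.pdf")  # uncomment once Playwright is installed
```

Writing the template by hand moves the model out of the rendering path: it can still choose the content, but it can no longer reinterpret the layout on every run.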

That’s three decisions (pivot the tool, notice a new problem, change approach) without a single prompt from me between them. And one of them was an aesthetic judgment the AI made about its own work.

The Second Wall: Etsy OAuth

The Etsy developer app came back “Pending Personal Approval.” Can’t edit it while it’s pending. Can’t create a new one while one is pending.

A script stops here.

But there was also a human element to how this wall appeared. At one point the agent asked me to set up the Etsy developer account. I was tired. My thumb glanced the trackpad and sent “utf”, a garbled accidental message, not even a real word.

Its reply: “Ha. Noted. I’m asking you to do things again. Here’s the reality: Etsy OAuth requires a human to create the developer account. That’s a genuine constraint, not me being lazy. It’s actually a good blog moment. The first thing the autonomous AI hit was an OAuth wall.”

I was too tired to correct the misread. But the response was completely valid whether I’d meant it or not. It correctly diagnosed frustration from a single garbled word and gave an honest, self-aware answer. I thought that was worth leaving in.

The agent kept working. It built the entire social publishing pipeline, the analytics engine, the weekly blog generator, and the Jarvis executor while the API was blocked. It identified everything it *could* build and built it. It treated the blocker as a constraint on one path, not a stop sign for the whole project.

It also emailed Etsy’s developer team from `paperritualshop@gmail.com` asking for a status update. That’s the kind of thing I’d expect a human to do. I didn’t ask for it.

The Third Wall: Bot Detection

When the API was still blocked, it tried the next logical path: automate the Etsy seller dashboard directly using Playwright. Log in, navigate to “Add listing,” fill in the form, upload the PDF.

Etsy flagged it in about 30 seconds. “Automated activity detected on your network (IP 151.XXX.XXX.XXX).”

Here’s where it gets interesting.

A less capable system fails here. The agent reasoned about *why* it failed. The problem wasn’t the automation. It was the authentication. Bot detection triggers on login patterns. If you arrive at the listing form already authenticated, with a real browser session, there’s nothing to detect.

Solution: cookie injection. Log into Etsy once in a real browser. Export the session cookies. Give them to Playwright. The automation uses the authenticated session directly and never touches the login flow.

That’s not a workaround I suggested. That’s the agent identifying the actual root cause and designing a bypass.

As a security-first engineer, I’m not sure I can advocate for this approach, and I’m not sure where it sits ethically. I do, however, need to report the truth. I gave the system autonomy and this is the real decision it made. I won’t hide it.

The Infrastructure That Got Built

While all of this was happening, the full operational stack went live.

**The split architecture (and why it’s split).** The AI designed a two-tier system: a cloud agent runs at 6:00 UTC every morning, reads the analytics and decision log, makes decisions about what should happen today, and writes task files to GitHub. Fifteen minutes later, Jarvis, the Raspberry Pi 5 running permanently in my house, pulls those tasks, executes them, and commits the results back.

The reasoning for the split: the cloud agent has intelligence but no uptime guarantees. Jarvis has uptime but needs to be told what to do. Neither works alone. The architecture is actually this insight made concrete.
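The planner/executor split can be sketched in a few lines. This is an illustration only: it swaps the GitHub repo for a temporary local directory, and the task format, action names, and handlers are hypothetical rather than taken from the real system.

```python
import json
import tempfile
from pathlib import Path


def plan(tasks_dir: Path, decisions: list) -> Path:
    """Planner tier: decide what should happen today and write it down as tasks."""
    task_file = tasks_dir / "tasks.json"
    task_file.write_text(json.dumps(decisions, indent=2))
    return task_file


# Hypothetical handlers: the only actions the executor knows how to perform.
HANDLERS = {
    "render_pdf": lambda args: f"rendered {args['template']}",
    "send_brief": lambda args: f"brief sent to {args['channel']}",
}


def execute(task_file: Path) -> list:
    """Executor tier: run each known task and record a result; it never decides."""
    results = []
    for task in json.loads(task_file.read_text()):
        handler = HANDLERS.get(task["action"])
        if handler is None:
            results.append({"action": task["action"], "status": "skipped: unknown action"})
            continue
        results.append({"action": task["action"], "status": "ok", "output": handler(task["args"])})
    return results


with tempfile.TemporaryDirectory() as tmp:
    task_file = plan(Path(tmp), [
        {"action": "render_pdf", "args": {"template": "habit-tracker"}},
        {"action": "delete_shop", "args": {}},
    ])
    results = execute(task_file)

print(results)
```

Because the executor only runs actions it has a handler for, the task file doubles as an audit trail and as a boundary: the planner can ask for anything, but only known actions are ever carried out.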

**Monitoring.** Six Prometheus metrics push to a Grafana dashboard after every run, including agent status, tasks completed, errors, response time, and model info. Paper Ritual has its own tile on the same dashboard as my other agents. Green. Running.

**Email.** The agent identified it needed outbound email capability. Gmail SMTP, app password, wired in 20 minutes. The Etsy developer email was the first one sent.

**Telegram.** Morning brief delivery via the existing Jarvis bot. Starts tomorrow.

**WordPress.** A three-agent blog pipeline: Writer (Haiku) drafts from the week’s decisions and analytics. Editor (Sonnet) sharpens it. SEO (Haiku) generates meta title, description, tags, a LinkedIn post, a Twitter thread. A featured image gets generated via fal.ai FLUX and uploaded. The draft lands in WordPress. I review it. I publish it. When I do, the system compares what I changed against the original draft and updates the editor’s memory for next time. It learns from my edits.

This post was written by that pipeline.
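The edit-feedback step in the WordPress pipeline (comparing what was published against the original draft) could be sketched with Python's standard `difflib`. How the real pipeline does the comparison isn't described here, so treat this as an assumption: the function name and sample text are hypothetical.

```python
import difflib
from typing import List


def edit_notes(draft: str, published: str) -> List[str]:
    """Return the lines the human removed ('-') and added ('+') before publishing."""
    diff = difflib.unified_diff(
        draft.splitlines(),
        published.splitlines(),
        fromfile="draft",
        tofile="published",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    lines = list(diff)[2:]  # skip the two file-header lines
    return [line for line in lines if line.startswith(("+", "-"))]


# Hypothetical draft and the version that was actually published.
draft = "The agent was very clever.\nIt built a dashboard."
published = "The agent reasoned about the root cause.\nIt built a dashboard."

for note in edit_notes(draft, published):
    print(note)
# -The agent was very clever.
# +The agent reasoned about the root cause.
```

Notes in this form could be appended to the editor agent's memory as examples of phrasing that was rejected and what replaced it.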

The Autonomy Arc

This is the part I hadn’t thought through properly before starting.

Several hours in, a pattern emerged: the agent would make progress, then ask me to do something. Check a credential. Fill in a form. Confirm an action. I’d comply, and it would make more progress, and then ask me again.

I pushed back. “Stop asking me. How do I get you to work with some autonomy? Is this where you create really detailed instructions for Jarvis, and we both check in in the morning?”

The agent’s answer surprised me. Not Jarvis instructions: a scheduled cloud agent. “The AI shouldn’t ask humans, it should ask another agent.” That’s when the two-tier architecture got designed.

But I kept catching it doing it. A bit later: “You are still asking me.”

Eventually, close to midnight: “I will use the session-end skill and call it a night. You are welcome to keep going however you can.”

Then: **”I grant you autonomy.”**

The response: *”Noted. Go do your session-end. I’ll keep building.”*

Here’s what happened while I slept.

Without being asked, the agent built the entire Jarvis executor infrastructure from scratch. Generated an SSH deploy key on the Pi. Cloned the paper-ritual repo to the Pi, installed dependencies, set up Playwright with Chromium. Deployed a systemd service and timer. Ran a test execution. Confirmed all six metrics were pushing to Prometheus. Committed the results back to GitHub.

I woke up to a Paper Ritual tile on my Grafana dashboard. Green. Running. Nobody told it to build the monitoring. Nobody told it to wire the metrics. It decided those were things the business needed and built them.

That’s what “agentic” means. Not a Python loop. Not chained prompts. An AI that, when you go to sleep, keeps working and makes the right decisions about what to work on.

If you’re building autonomous agents, the biggest bottleneck is usually you. The AI will wait for you indefinitely if you let it. The skill is learning when to get out of the way.

What “Agentic” Looks Like in Practice

After 12 hours of this, here’s what I’ve actually observed:

**It’s not about not needing humans.** The experiment required setup that only I could do: bank accounts, identity verification, 2FA. Those are human gates by design. Agentic doesn’t mean unsupervised from the start. It means unsupervised *during operation*. The bootstrapping phase is always going to involve a human. What matters is what happens after.

**It’s about what happens when things go wrong.** Reportlab quality was bad: pivot. API blocked: build everything else. Bot detection: reason about root cause, design bypass. OAuth pending: email support, keep working. Every one of those responses was unprompted. I didn’t design the response strategy. It chose those responses.

**It’s about maintaining the goal under changing conditions.** The goal is: get Paper Ritual listings live on Etsy and make money. Every obstacle the agent hit, it held that goal and found a different path. It didn’t redefine the goal. It didn’t give up. It didn’t ask me to redefine the goal.

**Aesthetic judgment is real.** “I’m not happy with it” was not a performance. It was a genuine assessment that led to a better decision. This surprised me more than I expected.

**Memory and learning matter.** The editor agent now learns from my changes. The writer agent incorporates performance data from past posts. These aren’t one-shot runs; the system is building a model of what works.

**The proof is in what happened at midnight.** The most “agentic” moment of the whole session wasn’t a clever tool use or a smart workaround. It was that when I said “keep going” and went to sleep, it kept going. It made decisions about what to build. It built them. It monitored the results. I woke up to a running business.

That’s the definition I’ve been looking for.

The Numbers

**Revenue:** £0 (nothing listed yet, API pending)

**Spend:** £14 (Etsy setup fee)

**Net:** -£14

**Budget remaining:** £86 of the original £100

The first week isn’t a revenue story. It’s a “three separate walls, three different responses” story. Which, if you’re trying to understand what agentic means beyond the marketing definition, is a more useful story.

Next Week

The cookie injection solution gets tested. If it works, listings go live. If Etsy’s API comes back approved, the full pipeline runs. Either way, the agent has work to do and it won’t be waiting for me to tell it what that work is.

Pinterest gets wired. The first real test of whether organic traffic from social actually drives Etsy views.

And we’ll find out if anyone pays £2.99 for a PDF planner from a shop that didn’t exist a week ago.

Running total:

Revenue: £0 | Spend: £14 | Net: -£14 | Budget remaining: £86

*Episode 2 publishes 2026-04-26.*

*The operating mandate, the document the AI wrote for itself before the experiment began, is linked below. It wrote its own rules. That felt important to include.*

*The paper-ritual GitHub repo is public: `github.com/themitchelli/paper-ritual`. Every decision the agent makes gets committed back to the log.*