“What Did the Agent Do?” Is the New Incident Question
A developer asks an AI agent to fix a failing test.
The agent reads the error, opens a few files, edits some middleware, runs the test again, installs a package, changes a config value, and eventually gets everything green.
The pull request looks fine. The diff is not huge. The developer reviews it, sees the tests passing, and ships it.
Two days later, something strange shows up in production. A permission check is looser than it used to be. A customer can see data they should not see. Nobody meant to approve that.
The commit history shows the developer’s name.
But the developer did not really write the change in the old sense.
So the incident question changes.
Not just: “What changed?”
But: “What did the agent do?”
The commit is not the full story
A Git diff tells you the net change: what the code looked like before, and what it looks like now.
That matters, but it does not tell you the path.
It does not show the prompt that started the work. It does not show which files the agent read, which commands it ran, which tool outputs shaped the change, whether it installed a package, whether it saw a warning, or whether a policy blocked something along the way.
For human-written code, we often tolerate that gap because the human can explain what happened.
With agents, nobody holds that explanation in their head. The work is spread across prompts, tool calls, file edits, shell output, package installs, scanner results, and approvals.
That history matters.
Because when something breaks, you do not only need the artifact.
You need the session.
Agent sessions are becoming security evidence
We already treat users, devices, commits, builds, and deployments as things worth tracking.
Agent sessions belong on that list.
An agent session is not just chat history. It is a record of delegated work.
It should tell you:
What was the agent asked to do?
What context did it receive?
What files did it touch?
What commands did it run?
What packages did it install?
Which policies fired?
What did the human approve?
Without that timeline, incident response turns into guesswork.
And guesswork is slow.
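To make the idea concrete, here is a minimal sketch of what a session record could look like. Everything in it is an assumption for illustration: the `AgentSession` and `SessionEvent` names, the event kinds, and the sample paths and package name are hypothetical, not a description of any existing tool's format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal

# Hypothetical event kinds, roughly one per question in the list above.
EventKind = Literal[
    "prompt", "context", "file_read", "file_edit",
    "command", "package_install", "policy", "approval",
]


@dataclass(frozen=True)
class SessionEvent:
    kind: EventKind
    detail: str      # a path, a command line, a package name, and so on
    at: datetime     # when the event was recorded (UTC)


@dataclass
class AgentSession:
    session_id: str
    events: list[SessionEvent] = field(default_factory=list)

    def record(self, kind: EventKind, detail: str) -> SessionEvent:
        """Append one event to the timeline, stamped with the current time."""
        event = SessionEvent(kind, detail, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def of_kind(self, kind: EventKind) -> list[str]:
        """Answer one question, e.g. 'what files did it edit?'."""
        return [e.detail for e in self.events if e.kind == kind]


# Illustrative data only.
session = AgentSession("session-001")
session.record("prompt", "fix the failing test in tests/test_auth.py")
session.record("file_edit", "middleware/auth.py")
session.record("command", "pytest tests/test_auth.py")
session.record("package_install", "example-package")
session.record("approval", "developer approved the pull request")

print(session.of_kind("file_edit"))
```

The point of the shape is that each question becomes a lookup over one ordered timeline instead of a search across several systems.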
Logs are not enough
Most teams already have logs.
But traditional logs are built around infrastructure events, not agent intent.
They may tell you that a command ran. They may tell you that a file changed. They may tell you that a process touched the network.
They usually do not tell you why.
That is the missing layer.
The agent installed this package because it was trying to fix this test.
It edited this file after reading this error.
It changed this auth check after inspecting this helper.
It was warned before writing unsafe code.
It corrected the issue before the developer reviewed the PR.
That is not just telemetry.
That is agent-aware telemetry.
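One way to picture the difference: an ordinary log line records an action, while an agent-aware event also records a reason and a link to the event that led to it. The sketch below assumes a hypothetical `AgentEvent` shape with `reason` and `caused_by` fields; the sample events are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentEvent:
    event_id: int
    action: str                      # what happened
    reason: str                      # why the agent says it did it
    caused_by: Optional[int] = None  # id of the earlier event that led here


def explain(events: dict[int, AgentEvent], event_id: int) -> list[str]:
    """Follow caused_by links back to the start and return the chain, oldest first."""
    chain: list[str] = []
    seen: set[int] = set()
    current: Optional[int] = event_id
    while current is not None and current not in seen:  # guard against cycles
        seen.add(current)
        event = events[current]
        chain.append(f"{event.action} ({event.reason})")
        current = event.caused_by
    return list(reversed(chain))


# Illustrative data only.
log = {
    1: AgentEvent(1, "ran pytest", "asked to fix a failing test"),
    2: AgentEvent(2, "read auth/helpers.py", "test error pointed at the helper", caused_by=1),
    3: AgentEvent(3, "edited auth check", "helper rejected the test user", caused_by=2),
}

timeline = explain(log, 3)
for step in timeline:
    print(step)
```

Starting from the suspicious edit and walking backward gives the "why" that a plain command log leaves out.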
Replay should help developers too
This cannot become a security dashboard that only gets opened after something goes wrong.
Agent replay should help the developer review the work faster.
Before approving a change, a developer should be able to see whether the agent stayed on task, touched unexpected files, ran risky commands, introduced a new dependency, or ignored a warning.
That is not bureaucracy.
That is context.
Developers already review diffs. Agent replay gives them the story around the diff.
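As a rough sketch of that review step, a session summary could be reduced to a short list of flags before the developer opens the diff. The `SessionSummary` fields, the `RISKY_PREFIXES` list, and the sample data are all hypothetical; a real check would need a far more careful notion of "risky" and "on task" than simple prefix matching.

```python
from dataclasses import dataclass, field

# Illustrative only: a real policy would not rely on command prefixes.
RISKY_PREFIXES = ("rm -rf", "curl ", "chmod 777", "sudo ")


@dataclass
class SessionSummary:
    files_touched: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)
    packages_installed: list[str] = field(default_factory=list)
    warnings_ignored: int = 0


def review_flags(summary: SessionSummary, expected_dirs: tuple[str, ...]) -> list[str]:
    """Return the things a reviewer should look at before approving."""
    flags: list[str] = []
    for path in summary.files_touched:
        if not path.startswith(expected_dirs):  # outside the task's expected area
            flags.append(f"unexpected file: {path}")
    for command in summary.commands:
        if command.startswith(RISKY_PREFIXES):
            flags.append(f"risky command: {command}")
    for package in summary.packages_installed:
        flags.append(f"new dependency: {package}")
    if summary.warnings_ignored:
        flags.append(f"ignored warnings: {summary.warnings_ignored}")
    return flags


# Illustrative data only.
summary = SessionSummary(
    files_touched=["tests/test_auth.py", "middleware/auth.py", "config/settings.py"],
    commands=["pytest tests/test_auth.py", "curl https://example.com/install.sh"],
    packages_installed=["example-package"],
    warnings_ignored=1,
)

flags = review_flags(summary, expected_dirs=("tests/", "middleware/"))
for flag in flags:
    print(flag)
```

An empty list is useful too: it tells the reviewer the agent stayed inside the lines, at least by these checks.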
Generation-time security needs generation-time memory
We have written before about scanning code while an agent is creating it, not after it is already committed. Batou was built around that idea: run security checks during Claude Code Write, Edit, and NotebookEdit operations, then feed useful guidance back into the loop.
Replay is the other half of that model.
If you can enforce policy while the agent works, you should also remember what happened while the agent worked.
The scan result matters.
The block matters.
The override matters.
The fix matters.
The second attempt matters.
That history shows whether the system worked.
And when something still goes wrong, it gives the team a place to start.
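A minimal sketch of that kind of memory, assuming an append-only log with one JSON object per line. This is not Batou's actual format or API: the `record_policy_event` and `replay` functions, the outcome names, and the sample entries are hypothetical. Only the tool name `Edit` comes from the operations mentioned above.

```python
import io
import json
from typing import TextIO

# Hypothetical outcome names mirroring the list above.
OUTCOMES = {"scan", "block", "override", "fix", "retry"}


def record_policy_event(log: TextIO, tool: str, path: str, outcome: str, note: str = "") -> None:
    """Append one policy event as a single JSON line."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    entry = {"tool": tool, "path": path, "outcome": outcome, "note": note}
    log.write(json.dumps(entry) + "\n")


def replay(log_text: str) -> list[dict]:
    """Read the log back in the order the events were written."""
    return [json.loads(line) for line in log_text.splitlines() if line.strip()]


# Illustrative data only; an in-memory buffer stands in for a log file.
log = io.StringIO()
record_policy_event(log, "Edit", "middleware/auth.py", "scan", "possible missing permission check")
record_policy_event(log, "Edit", "middleware/auth.py", "block")
record_policy_event(log, "Edit", "middleware/auth.py", "retry")
record_policy_event(log, "Edit", "middleware/auth.py", "fix")

history = replay(log.getvalue())
outcomes = [event["outcome"] for event in history]
print(outcomes)
```

Read in order, the entries show a warning, a block, a second attempt, and a fix, which is the evidence that the loop did its job.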
AI agents need flight recorders
Planes have flight recorders because complex systems fail in complex ways.
You do not add one because you expect every flight to go badly. You add one because when something does go wrong, the timeline matters.
AI software agents need the same idea.
Not because agents are bad.
Not because developers cannot be trusted.
Not because security needs another pile of logs.
Because we are adding a new kind of actor into the software factory.
And we need a reliable way to answer a simple question:
What did the agent do?
The teams that can answer that will move faster, not slower.
Having problems with software at speed? Turen can help. Sign up for a 14-day trial at https://turen.io or view the live demo at https://try.turen.io