A reflection on building the frameworks that are changing how a software team works.

I’m the VP of Product and Development at Third Wave. We build Windows Server-based applications tightly coupled to SAP Business One. Not the typical greenfield startup environment you see in most AI adoption case studies. Our clients run our software on the same servers as their B1 instances. QA engineers test by RDPing in. The stack is real, the constraints are real, and when I decided it was time to move this team from non-agentic to AI-native, I wanted to build something durable.

A week ago I traveled to India to run a three-day on-site workshop with our team to start the transition. This is a reflection on what we built, why we built it the way we did, and what I think it means for teams trying to make the same move.

Frequently Asked Questions (Quick Answers)

What does it mean for a software team to be AI-native?

AI-native is not a team using AI coding tools on the side. It’s a team where the agentic paradigm is built into how work gets scoped, designed, and verified. The shift is from “use AI to write code faster” to “design the agentic task correctly,” across both development and quality.

Is giving developers Claude Code or Copilot enough?

No. Tooling without direction produces individuals experimenting in silos. Becoming AI-native is a company direction decision, not a tools decision. It needs a deliberate framework and leadership behind it.

What are the two artifacts in the framework?

The Agent Architecture, owned by the developer, defines the agentic task structure: tool loop, scope boundaries, human checkpoints, failure modes. The Testing Brief, owned by the QA engineer, defines the verification specification: expected behaviors, edge cases, and the environmental context Claude Code needs to test the work.

Why put QA on an equal track with development?

Framing QA as downstream of development produces worse outcomes and sends the wrong message about role parity. In this model both roles own a first-class artifact, both contribute to the other’s, and neither waits for the other to start.

How does this connect to Simon Willison’s work?

Willison’s Agentic Engineering Patterns guide is the engineering foundation: how individual professionals work effectively with coding agents. The two-artifact framework adds the organizational layer on top, encoding those patterns into a shared team workflow.

The 30-Second Version

Moving a software team from non-agentic to AI-native is a leadership decision, not a tooling rollout. I ran a three-day workshop built on Simon Willison’s agentic engineering patterns, structured around two owned artifacts: an Agent Architecture owned by developers and a Testing Brief owned by QA engineers, running on parallel and equal tracks. The framework encodes the agentic paradigm into how the team scopes and verifies work, grounded in our real stack of Windows Server applications coupled to SAP Business One. Three days started the transition. Becoming truly AI-native is the work of the next year.

Starting From the Right Foundation

Before designing a single session, I spent serious time with Simon Willison’s Agentic Engineering Patterns guide. It’s a living, structured collection of coding practices for professional software engineers working with coding agents like Claude Code. Willison draws a clear line between “vibe coding” (coding where you pay no attention to the code at all) and agentic engineering (professional engineers using coding agents to amplify their existing expertise). That distinction became the backbone of the entire workshop.

Willison’s framing of agentic engineering, and particularly his chapter on how writing code is now cheap and what that means for the new habits teams need to build, directly shaped how I designed the developer track. His patterns for red/green TDD and agentic manual testing informed both the developer and QA tracks.

What I wanted to build on top of this foundation was something Willison’s guide doesn’t address directly: what does agentic engineering look like as a team practice, across both development and quality, in an enterprise environment with real constraints? Answering that question is where the workshop framework became its own thing.

This Is a Leadership Decision

Let me say something plainly before I get into the how: the transition from non-agentic to AI-native software development is a company direction decision, not a tools decision.

Give a team access to Claude Code, GitHub Copilot, and every AI-assisted IDE on the market, and nothing changes if the organization hasn’t made a deliberate choice about where it’s going. Tooling without direction produces individuals experimenting in silos. What produces a team that builds differently is someone with the authority and the vision to say: this is the paradigm we’re adopting, and here’s the framework we’re using to get there.

This is why the workshop wasn’t organized by a committee or driven bottom-up. It was designed and led from the top, with a clear point of view about what AI-native software development looks like for a team like ours, and what it needs to produce.

The Framework: Two Artifacts, Two Tracks, One Shared Paradigm

The central design question I wanted to answer was: what does AI-native software development look like as a team practice, not an individual habit?

The answer we built is a two-artifact framework. It runs across two parallel, equal tracks, one for developers and one for QA engineers, and both converge into a shared workflow.

The Agent Architecture is owned by the developer. It defines the agentic task structure: the tool loop, the feature, the scope boundaries, the human checkpoints, the failure modes. This artifact builds directly on Willison’s patterns around how coding agents work, how to structure tool loops, and where human oversight belongs.

The Testing Brief is owned by the QA engineer. It defines the verification specification: the expected behaviors, the edge cases, the environmental context (Windows Server, SAP B1 coupling, RDP access patterns), and what Claude Code needs to know about the application to test it effectively. It answers a key question: how do we know this AI-assisted work is correct? This artifact extends Willison’s work on agentic manual testing into a structured, ownable document that lives in the team’s standard workflow.

These two artifacts are designed collaboratively. Developers inform the testing brief. QA engineers inform the architecture. Ownership is explicit and parallel. Neither is a handoff document from the other. Both are first-class artifacts. QA engineers execute against their testing brief themselves, running Claude Code directly on the server. The testing brief isn’t something developers implement on QA’s behalf. It’s something QA engineers own end-to-end.

This framework matters because it encodes the paradigm, not the workflow. When a developer starts scoping a new agentic task, the first question is now: what’s the testing brief for this? The question didn’t exist in a structured way before. Now it does.
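One way to make the two artifacts concrete is to model their structure as data. The following is a minimal Python sketch, not a schema or tool Third Wave uses: the class names, field names, and the `readiness_gaps` check are hypothetical illustrations drawn from what each artifact is described as defining (tool loop, scope boundaries, human checkpoints, failure modes; expected behaviors, edge cases, environmental context). The check encodes the habit of asking “what’s the testing brief for this?” before an agentic task starts.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical schema: field names mirror the artifact descriptions,
# not any real Third Wave template.


@dataclass
class AgentArchitecture:
    """Developer-owned: the agentic task structure."""
    owner: str
    feature: str
    tool_loop: List[str]          # steps the agent iterates through
    scope_boundaries: List[str]   # what the agent may and may not touch
    human_checkpoints: List[str]  # where a person reviews before continuing
    failure_modes: List[str]      # known ways the task can go wrong


@dataclass
class TestingBrief:
    """QA-owned: the verification specification."""
    owner: str
    expected_behaviors: List[str]
    edge_cases: List[str]
    environment: List[str]        # e.g. Windows Server, SAP B1 coupling, RDP access


@dataclass
class AgenticTask:
    name: str
    architecture: Optional[AgentArchitecture] = None
    testing_brief: Optional[TestingBrief] = None

    def readiness_gaps(self) -> List[str]:
        """Hypothetical gate: list reasons the task is not ready to start.

        An empty list means both artifacts exist and have the minimum content
        this sketch checks for.
        """
        gaps: List[str] = []
        if self.architecture is None:
            gaps.append("missing agent architecture")
        elif not self.architecture.human_checkpoints:
            gaps.append("no human checkpoints defined")
        if self.testing_brief is None:
            gaps.append("missing testing brief")
        elif not self.testing_brief.expected_behaviors:
            gaps.append("no expected behaviors defined")
        return gaps


if __name__ == "__main__":
    task = AgenticTask(name="Sync sales orders to B1")
    print(task.readiness_gaps())
    # ['missing agent architecture', 'missing testing brief']
```

Keeping the two artifacts as separate types with separate owners mirrors the point that neither is a handoff document from the other; the task is only “ready” when both exist.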

What the Workshop Was Built to Do

The workshop ran across three days with developers and QA engineers on parallel, equal tracks. Not one track for the “real” AI work with QA participating on the margins. Two tracks of equivalent standing, running simultaneously, converging on the shared artifact framework.

The developer track was grounded in Willison’s agentic engineering patterns: how to structure tool loops, how to scope AI tasks properly, where human oversight belongs, and how to think about what you’re building rather than prompting for output. His framing of agentic engineering as the amplification of existing expertise (not a replacement for it) was central to how we positioned the work for developers.

The QA track covered AI-native testing practices: how to translate deep application knowledge into testing briefs that produce useful results, how to run Claude Code in the environments our QA engineers already operate in, and how to own the process independently.

Both tracks culminated in the same place: each dev/QA pair producing their two artifacts together, presenting them on Day 3, and subjecting them to group critique.

The parallel structure was intentional and non-negotiable. In most AI adoption conversations, QA is framed as downstream. Developers build AI-assisted tooling. QA verifies the output. The framing produces worse outcomes and sends an organizational message about role parity I don’t want to send. The two-track model says something different: both kinds of expertise matter, both have a primary artifact to own, and neither is waiting for the other to finish before they start.

Building for the Actual Environment

One thing I insisted on throughout the design process: everything had to be grounded in our actual stack, not a generic demo environment.

We didn’t workshop on a sample to-do app. We built demonstration applications reflecting the kinds of systems our team builds: Windows Server deployment, SAP B1 coupling, RDP-based QA workflows. The agentic patterns we explored were relevant to Windows desktop development as well as web apps. The testing brief examples came from the kinds of testing our QA engineers do.

This matters for transfer. Willison makes a related point in his work: coding agents perform well even in private codebases and constrained environments. The agent consults existing patterns and iterates from there. The same principle applies to workshop design. The closer the learning environment is to the real environment, the faster the mental models transfer.

What Changed

The workshop was three days. What it changed will take months to fully measure, but the early signals are clear.

Developers are thinking differently about how they scope work. The shift is from “how do I use AI to write this code faster” to “how do I design this agentic task correctly.” This is a meaningful change in the level at which they engage with the technology. It’s the shift Willison describes when he talks about building new habits in the era when code is cheap.

QA engineers have a new kind of ownership. When you give someone a framework where their expertise is the primary input, where they aren’t waiting for a developer to turn their test cases into automation but are authoring the specification and executing it themselves, something opens up. It’s a different relationship to the work.

And the team has a shared paradigm. The vocabulary exists now. The artifact pattern exists. When we talk about new work, we talk about it in terms of agent architecture and testing brief. The common language is the foundation everything else gets built on.

What This Means for Other Organizations

I’m sharing this as a framework claim, not a feel-good retrospective: the agent architecture plus testing brief pairing is a standard worth establishing for AI-native software teams.

Willison’s guide provides the engineering foundation: the patterns for how individual professionals work effectively with coding agents. What the two-artifact framework adds is the organizational layer: how a team encodes those patterns into shared workflow, splits ownership explicitly across roles, and makes the agentic paradigm a structural property of how work gets done rather than a personal habit of how individuals operate.

Most AI adoption frameworks focus on individual productivity. That value is real, but it doesn’t change how a team builds software. The two-artifact framework changes the team structure. It creates a shared artifact that sits at the intersection of development and quality, requires both roles to contribute to it, and encodes the agentic paradigm into the team’s standard workflow. It’s not a productivity tool. It’s an organizational design choice.

If you’re a senior leader thinking about what AI-native software development looks like for your organization, not for your individual contributors but as a team practice, start with Willison’s guide as your engineering foundation. Then ask the harder organizational question: what’s the shared artifact that makes this a team practice rather than a collection of individual habits?

This Is the Beginning

Three days started something. They didn’t finish it.

The artifact pairing is now part of how we approach new work. The vocabulary is in place. The team has demonstrated they can operate in this paradigm. Becoming truly AI-native, in the sense of changing how we build, scope, and verify software, is the work of the next year, not the last three days.

What the workshop did was make the work possible. We have a shared starting point. We have frameworks that encode the direction. We have two tracks of people who understand this transition belongs to all of them, not to one side of the team.

That’s what I was building toward. We’re at the beginning of it now.

Tyrel, VP of Product and Development, Third Wave

Foundational reading: Simon Willison’s Agentic Engineering Patterns guide at simonwillison.net

From Non-Agentic to AI-Native: How We Started the Transition at Third Wave