TL;DR

The latest SDLC framework reveals that AI models constitute only 10% of system behavior. The emphasis shifts to harness and context engineering, impacting cost and strategy.

A new whitepaper from Google researchers asserts that AI models account for only 10% of system behavior in AI-driven software development. The core insight is that the harness and context engineering around the model are far more influential in determining outcomes, shifting strategic focus away from model improvements alone. This development matters because it redefines how organizations should allocate resources and manage AI systems.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the biggest shift in software engineering is the move from writing code to expressing intent and trusting machines to interpret that intent. According to the figures it cites, 85% of professional developers use AI coding agents regularly, 51% use them daily, and approximately 41% of new code was AI-generated as of early 2026.

The authors challenge the common perception that model advancement alone will improve AI systems. They highlight that model performance is only a small part—roughly 10%—of what influences output. Instead, the harness (prompts, tools, rules, observability) and context engineering (instructions, knowledge, memory, examples, guardrails) are where most of the control resides.

The paper's main evidence comes from Terminal Bench 2.0, where changing only the harness, with the model held constant, moved an agent from outside the Top 30 into the Top 5. A configuration change of that kind can improve results more than a model upgrade, which underscores the importance of setup over raw model capability.

At a glance

When: published May 2026

The development: A new whitepaper by Google experts highlights that AI models are only a small part of software systems, with the majority of control lying in harness and context design.

The Model Is Only 10% — The New SDLC With Vibe Coding

AI Dispatch · Field Notes

Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary — the differentiator is how outputs get verified

Vibe Coding

Casual prompts · “does it seem to work?” · disposable code · high risk

Structured AI-Assisted

Detailed prompts + constraints · manual testing · features in real codebases

Agentic Engineering

Formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
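
To make the tests-versus-evals distinction concrete, here is a minimal sketch in Python. It is not taken from the whitepaper: the helper names (`slugify`, `eval_pass_rate`, `ci_gate`), the keyword rubric, the 0.8 threshold, and the stand-in `fake_model` are all assumptions chosen for illustration. The unit check uses an exact-match assertion on deterministic code, while the eval scores model-style output against a rubric and reports a pass rate.

```python
from typing import Callable


def slugify(title: str) -> str:
    """Deterministic helper: the same input always gives the same output."""
    return "-".join(title.lower().split())


def check_slugify() -> bool:
    # A unit test: exact-match comparison on deterministic code.
    return slugify("The New SDLC") == "the-new-sdlc"


def eval_pass_rate(generate: Callable[[str], str],
                   cases: list[tuple[str, list[str]]]) -> float:
    """An eval: score free-form output against a rubric.

    A case passes when the output mentions every required keyword.
    (Hypothetical rubric; real evals are usually richer.)
    """
    passed = 0
    for prompt, required in cases:
        output = generate(prompt).lower()
        if all(word in output for word in required):
            passed += 1
    return passed / len(cases)


def ci_gate(tests_ok: bool, pass_rate: float, threshold: float = 0.8) -> bool:
    # The gate needs both signals: tests pass AND evals clear the bar.
    return tests_ok and pass_rate >= threshold


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call (assumption, for illustration only).
    return f"Summary: {prompt} The harness matters more than the model."


CASES = [
    ("Explain agent design.", ["harness", "model"]),
    ("Explain verification.", ["harness"]),
]

rate = eval_pass_rate(fake_model, CASES)
print(ci_gate(check_slugify(), rate))
```

The design point is that neither signal replaces the other: a passing test suite says nothing about the quality of generated text, and a high eval score says nothing about whether the deterministic plumbing is correct.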

The idea worth building your strategy around

Agent = Model + Harness

MODEL ~10%

HARNESS ~90%: prompts · tools · context · hooks · sandboxes · observability

~90% IS YOUR SURFACE AREA, NOT THE PROVIDER’S

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.

“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
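
The "Agent = Model + Harness" idea can be sketched in a few lines of Python. This is an illustrative assumption, not the paper's implementation: the `Harness` and `Agent` classes, the `config_gaps` check, and the stand-in `echo_model` are hypothetical names. The sketch holds the model fixed and swaps only the harness, which is the kind of change the Terminal Bench 2.0 result describes.

```python
from dataclasses import dataclass, field
from typing import Callable

Model = Callable[[str], str]  # hypothetical interface: prompt in, text out


@dataclass
class Harness:
    system_prompt: str
    rules: list[str] = field(default_factory=list)
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def build_prompt(self, task: str) -> str:
        # Everything the model sees is assembled here, not by the provider.
        parts = [self.system_prompt]
        parts += [f"Rule: {rule}" for rule in self.rules]
        parts += [f"Tool available: {name}" for name in sorted(self.tools)]
        parts.append(f"Task: {task}")
        return "\n".join(parts)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        return self.model(self.harness.build_prompt(task))


def config_gaps(harness: Harness) -> list[str]:
    """Flag the configuration gaps the quote mentions (simplified check)."""
    gaps = []
    if not harness.tools:
        gaps.append("no tools registered")
    if not harness.rules:
        gaps.append("no rules defined")
    return gaps


def echo_model(prompt: str) -> str:
    # Stand-in model: reports how many lines of context it received.
    return f"received {len(prompt.splitlines())} lines of context"


bare = Harness(system_prompt="You are a coding agent.")
tuned = Harness(
    system_prompt="You are a coding agent.",
    rules=["Run the tests before answering."],
    tools={"run_tests": lambda cmd: "ok"},
)

# Same model, different harness: only the second argument changes.
print(Agent(echo_model, bare).run("Fix the failing build"))
print(Agent(echo_model, tuned).run("Fix the failing build"))
print(config_gaps(bare))
```

With a real model behind the `Model` interface, the two agents would differ only in what the harness supplies, so any difference in behavior is attributable to configuration rather than to the provider.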

The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding

Low CapEx · High OpEx

Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering

High CapEx · Low OpEx

Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing — cheap models for the easy work.
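
One of the levers named above, intelligent model routing, might look like the following Python sketch. The tier names, prices, keyword list, and token limit are invented for illustration and do not come from the whitepaper or from any provider's price list; a production router would likely use a classifier or past success rates rather than keywords.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float


# Hypothetical tiers and prices, for illustration only.
CHEAP = ModelTier("small-model", 0.001)
STRONG = ModelTier("large-model", 0.02)

# Hypothetical signals that a task needs the stronger model.
HARD_KEYWORDS = ("refactor", "architecture", "security", "migration")


def route(task: str, context_tokens: int, max_cheap_tokens: int = 4000) -> ModelTier:
    """Send easy, small-context work to the cheap tier; escalate the rest."""
    text = task.lower()
    if context_tokens > max_cheap_tokens:
        return STRONG
    if any(keyword in text for keyword in HARD_KEYWORDS):
        return STRONG
    return CHEAP


def estimated_cost(task: str, context_tokens: int) -> float:
    """Rough input-token cost in USD for the tier the router picks."""
    tier = route(task, context_tokens)
    return tier.usd_per_1k_tokens * context_tokens / 1000


for task, tokens in [("Rename a variable", 500),
                     ("Refactor the auth module", 500),
                     ("Rename a variable", 10_000)]:
    print(task, tokens, route(task, tokens).name)
```

The other lever, context engineering, shows up here indirectly: a smaller, cleaner context keeps more tasks under the cheap tier's limit.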

85% of devs use AI coding agents (51% daily)

41% of all new code is AI-generated

~90% of agent behavior is the harness, not the model

+19% longer on some tasks (METR): verification is the cost

The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.

thorstenmeyerai.com

Implications for AI System Design and Cost Management

This shift has profound implications for how organizations approach AI development. By understanding that most behavior is shaped by harness and context, companies can focus on optimizing these elements for better performance and security. It also suggests that the cost of AI systems is more dependent on how they are configured and maintained than on the choice of model, which can lead to more cost-effective strategies in the long run.
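
The "long run" claim is a break-even argument, and a short Python sketch shows the arithmetic. The dollar figures below are hypothetical and are not taken from the paper; the only input borrowed from it is that the per-feature cost of vibe coding can reach 3–10× that of an engineered workflow (5× is assumed here).

```python
import math
from typing import Optional


def total_cost(upfront: float, per_feature: float, features: int) -> float:
    """Cumulative cost: one-time setup plus a recurring cost per feature."""
    return upfront + per_feature * features


def breakeven_features(vibe_upfront: float, vibe_per_feature: float,
                       agentic_upfront: float,
                       agentic_per_feature: float) -> Optional[int]:
    """Smallest feature count at which the agentic total is no higher.

    Returns None when the agentic workflow never catches up.
    """
    extra_upfront = agentic_upfront - vibe_upfront
    saving_per_feature = vibe_per_feature - agentic_per_feature
    if extra_upfront <= 0:
        return 0
    if saving_per_feature <= 0:
        return None
    return math.ceil(extra_upfront / saving_per_feature)


# Hypothetical numbers: vibe coding costs 5x more per feature,
# agentic engineering pays 10,000 upfront for specs, evals, and context.
n = breakeven_features(vibe_upfront=0, vibe_per_feature=500,
                       agentic_upfront=10_000, agentic_per_feature=100)
print(n)                           # 25
print(total_cost(0, 500, n))       # 12500
print(total_cost(10_000, 100, n))  # 12500
```

Under these assumed numbers the upfront investment pays for itself at the 25th feature; with a smaller per-feature gap the break-even point moves further out, which is why the calculation is worth running with an organization's own figures.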

Furthermore, this perspective encourages a move toward agentic engineering, where rigorous verification, structured context, and modular design replace reliance on model improvements alone. This approach can reduce token costs, improve security, and enhance system reliability, especially as AI becomes more embedded in critical applications.

Background and Evolution of AI Development Strategies

Before this whitepaper, the industry largely focused on improving model architectures and training data to enhance AI performance. The rise of AI coding agents, now used by 85% of developers, accelerated this trend. However, as early as February 2025, experts like Andrej Karpathy highlighted the importance of giving AI systems structured workflows rather than just vibes-based prompts.

The whitepaper builds on this understanding, explicitly framing the model as a small component within a larger system. Benchmarks and experiments demonstrate that configuration and scaffolding—collectively called the harness—are often the most impactful factors in AI behavior, challenging the previous focus solely on model upgrades.

“The model you’re paying so much attention to is the smallest part of the system. Verification, judgment, and direction are the new craft.”
— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

While the whitepaper provides compelling evidence that harness and context are dominant, it remains unclear how quickly organizations will adopt these insights at scale. The long-term impact on AI development costs, security, and performance metrics is still being evaluated, and some industry leaders may resist shifting focus from model improvements to configuration and system design.

Next Steps in AI System Optimization and Industry Shift

Organizations are expected to reevaluate their AI strategies, investing more in harness development, context engineering, and verification processes. Future research and industry practices will likely focus on creating standardized frameworks for harness design, testing, and security. Additionally, benchmarks and case studies will emerge to quantify the benefits of this approach, guiding best practices.

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper shows that the harness and context—including prompts, rules, tools, and observability—are responsible for the majority of the AI’s output, making the model just a small component.

How does this change AI development strategies?

It shifts the focus from solely improving models to designing better harnesses and context frameworks, which can improve performance, security, and cost-efficiency.

What are the economic implications of this insight?

Cost savings can be achieved by investing in configuration, testing, and system architecture, reducing token waste and maintenance expenses compared to model upgrades alone.

Will this approach reduce AI security vulnerabilities?

Potentially. Structured harnesses and verification processes help identify and mitigate vulnerabilities, but the long-term impact on security is still being evaluated.

When will industry-wide adoption of these insights occur?

While some organizations are already shifting focus, widespread adoption is likely over the next 12-24 months as best practices and standards develop.

Source: ThorstenMeyerAI.com

The Model Is Only 10%: The Real Lesson of the New SDLC

Author

Kwatsjpedia Team
