📊 Full opportunity report: The Model Is Only 10%: The Real Lesson of the New SDLC on ThorstenMeyerAI.com — validation score, market gap, and execution plan.

TL;DR

A new Google whitepaper on the AI-era SDLC argues that the model accounts for only about 10% of an AI agent's behavior. The rest comes from the harness and the context engineering around it, which changes where teams should focus on cost and strategy.

A new whitepaper from Google researchers asserts that AI models account for only 10% of system behavior in AI-driven software development. The core insight is that the harness and context engineering around the model are far more influential in determining outcomes, shifting strategic focus away from model improvements alone. This development matters because it redefines how organizations should allocate resources and manage AI systems.

The whitepaper, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, argues that the biggest shift in software engineering is the move from writing code to expressing intent and trusting machines to interpret it. According to figures cited in the paper, 85% of professional developers use AI coding agents regularly, 51% use them daily, and approximately 41% of new code was AI-generated as of early 2026.

The authors challenge the common assumption that model advances alone will improve AI systems. By their estimate, the model itself accounts for only about 10% of what shapes an agent's output. Most of the control sits in the harness (prompts, tools, rules, observability) and in context engineering (instructions, knowledge, memory, examples, guardrails).
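Context engineering, as the authors frame it, is largely a matter of deciding what goes into the model's window and in what order. Below is a minimal sketch, assuming each of the five context layers named above is a plain string with a priority, and that tokens can be approximated by whitespace-separated words. `ContextLayer`, `approx_tokens`, and `assemble_context` are hypothetical names for illustration, not an API from the paper.

```python
from dataclasses import dataclass


@dataclass
class ContextLayer:
    """One layer of an agent's context window (hypothetical structure)."""
    name: str      # e.g. "instructions", "guardrails", "memory"
    text: str
    priority: int  # lower number = more important, considered first


def approx_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word; a real harness
    # would use the model's own tokenizer.
    return len(text.split())


def assemble_context(layers: list[ContextLayer], budget: int) -> str:
    """Add layers in priority order, skipping any layer that would overflow the budget."""
    chosen: list[ContextLayer] = []
    used = 0
    for layer in sorted(layers, key=lambda item: item.priority):
        cost = approx_tokens(layer.text)
        if used + cost <= budget:
            chosen.append(layer)
            used += cost
    # Render the kept layers as labelled sections, most important first.
    return "\n\n".join(f"## {layer.name}\n{layer.text}" for layer in chosen)


layers = [
    ContextLayer("instructions", "Fix the failing test without changing public APIs.", 0),
    ContextLayer("guardrails", "Never run commands outside the repo sandbox.", 1),
    ContextLayer("examples", "Example patch: add a None check before indexing.", 2),
    ContextLayer("knowledge", "The project uses pytest and Python 3.11.", 3),
    # A long, low-priority memory log: the kind of noise that crowds out signal.
    ContextLayer("memory", "Last session: the flaky test was in test_io.py. " * 20, 4),
]
print(assemble_context(layers, budget=40))
```

With a 40-token budget the four short layers fit (30 words in total) and the 160-word memory log is skipped. Skipping rather than truncating keeps each layer intact; a production harness might summarize memory instead of dropping it.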

The paper cites Terminal Bench 2.0 results in which changing only the harness, with the same model, moved an agent from outside the top 30 to the top 5. A jump of that size suggests that configuration and setup can matter as much as raw model capability.
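The method behind a harness-only comparison is simple: hold one model fixed, swap the harness, and measure the pass rate. The toy sketch below uses a deterministic stub in place of a real model, and `Task`, `HarnessConfig`, and the tool names are hypothetical. It illustrates the experimental design only and is not how Terminal Bench 2.0 itself is run or scored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """A hypothetical benchmark task that needs one specific tool to finish."""
    name: str
    required_tool: str


@dataclass
class HarnessConfig:
    """Everything around the model; here reduced to a system prompt and a tool set."""
    system_prompt: str
    tools: set[str] = field(default_factory=set)


def stub_model(task: Task, harness: HarnessConfig) -> bool:
    # Deterministic stand-in for one fixed model: it succeeds whenever the
    # harness exposes the tool the task needs. The prompt is ignored here.
    return task.required_tool in harness.tools


def pass_rate(tasks: list[Task], harness: HarnessConfig) -> float:
    """Fraction of tasks the same model solves under a given harness."""
    solved = sum(stub_model(task, harness) for task in tasks)
    return solved / len(tasks)


tasks = [
    Task("run tests", "shell"),
    Task("read config", "file_read"),
    Task("patch bug", "file_write"),
    Task("query docs", "web_search"),
]
baseline = HarnessConfig("You are a coding agent.", {"shell"})
improved = HarnessConfig("You are a coding agent.", {"shell", "file_read", "file_write"})
print(f"baseline: {pass_rate(tasks, baseline):.0%}")  # baseline: 25%
print(f"improved: {pass_rate(tasks, improved):.0%}")  # improved: 75%
```

Nothing about `stub_model` changes between the two runs; the tripled pass rate comes entirely from the harness, which is the shape of the claim the paper makes.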

At a glance
When: published March 2026
The development: A new whitepaper by Google experts highlights that AI models are only a small part of software systems, with most of the control lying in harness and context design.
The Model Is Only 10% — The New SDLC With Vibe Coding
AI Dispatch · Field Notes
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified
Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk
Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding — however clever the prompt.
The idea worth building your strategy around
Agent = Model + Harness
MODEL: ~10%
HARNESS: ~90% (prompts · tools · context · hooks · sandboxes · observability)
THE ~90% IS YOUR SURFACE AREA, NOT THE PROVIDER’S
Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness — same model.
“Most agent failures, examined honestly, are configuration failures” — a missing tool, a vague rule, a noisy context.
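If most failures are configuration failures, some of them can be caught before the agent runs. Below is a minimal audit sketch covering the three examples in the quote. `AgentConfig`, `audit`, the vague-word list, and the 70% fill threshold are hypothetical heuristics, and context fill is only a crude proxy for noise.

```python
from dataclasses import dataclass

# Illustrative vague terms; a real team would curate its own list.
VAGUE_WORDS = ("appropriate", "as needed", "properly", "good", "clean")


@dataclass
class AgentConfig:
    """Hypothetical snapshot of a harness configuration."""
    tools: set[str]
    rules: list[str]
    context_tokens: int
    context_budget: int


def audit(config: AgentConfig, required_tools: set[str],
          max_fill: float = 0.7) -> list[str]:
    """Return findings for the three failure types; an empty list means none detected."""
    findings: list[str] = []
    # 1. Missing tool: the task needs something the harness never exposes.
    for tool in sorted(required_tools - config.tools):
        findings.append(f"missing tool: {tool}")
    # 2. Vague rule: wording the agent cannot check its own output against.
    for rule in config.rules:
        if any(word in rule.lower() for word in VAGUE_WORDS):
            findings.append(f"vague rule: {rule!r}")
    # 3. Noisy context: the window is filled past the chosen threshold.
    fill = config.context_tokens / config.context_budget
    if fill > max_fill:
        findings.append(f"noisy context: {fill:.0%} of budget used")
    return findings


cfg = AgentConfig(
    tools={"shell", "file_read"},
    rules=["Write clean code.", "Run pytest -q before committing."],
    context_tokens=90_000,
    context_budget=100_000,
)
for finding in audit(cfg, required_tools={"shell", "file_read", "file_write"}):
    print(finding)
```

This example reports all three problems: the missing `file_write` tool, the uncheckable "Write clean code." rule, and a context window filled to 90%. The concrete rule ("Run pytest -q before committing.") passes because the agent can verify whether it did it.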
The economics: it’s a token-cost problem (CapEx vs OpEx)
Vibe Coding: low CapEx · high OpEx
Looks free but hides debt: token burn (fix-it loops), a maintenance tax (AI spaghetti), and security remediation. Once those costs land, it ends up at 3–10× more per feature.
Agentic Engineering: high CapEx · low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success, plus intelligent model routing that sends the easy work to cheap models.
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Implications for AI System Design and Cost Management

This shift has concrete implications for how organizations build AI systems. If most behavior is shaped by the harness and context, those are the elements to optimize for performance and security. It also suggests that an AI system's cost depends more on how it is configured and maintained than on which model it uses, so investing in configuration can lead to more cost-effective strategies over the long run.
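The CapEx vs OpEx trade-off the paper describes can be made concrete with a simple linear cost model: total cost is an upfront spend plus a running cost per feature, and the break-even point is where the cheaper-to-run profile catches up. The sketch below is an assumption-laden illustration; `CostProfile`, `breakeven`, and every number in it are hypothetical, not figures from the whitepaper.

```python
import math
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Hypothetical cost profile: one-off setup spend plus a running cost per feature."""
    name: str
    capex: float             # upfront: specs, evals, context engineering
    opex_per_feature: float  # ongoing: tokens, fix-it loops, maintenance, remediation

    def total(self, features: int) -> float:
        return self.capex + self.opex_per_feature * features


def breakeven(low_capex: CostProfile, high_capex: CostProfile) -> int:
    """Smallest feature count at which the high-CapEx profile costs no more in total."""
    extra_upfront = high_capex.capex - low_capex.capex
    saving_per_feature = low_capex.opex_per_feature - high_capex.opex_per_feature
    if saving_per_feature <= 0:
        raise ValueError("the high-CapEx profile never pays back")
    return math.ceil(extra_upfront / saving_per_feature)


# Illustrative numbers only; they are not from the whitepaper.
vibe = CostProfile("vibe coding", capex=500, opex_per_feature=400)
agentic = CostProfile("agentic engineering", capex=8_000, opex_per_feature=80)
print(breakeven(vibe, agentic))  # 24
```

Here the per-feature running cost differs by 5×, inside the 3–10× range the paper gives. The extra 7,500 spent upfront is recovered at 320 per feature, so from the 24th feature on the agentic profile is cheaper in total, and the gap widens with every feature after that.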

Furthermore, this perspective encourages a move toward agentic engineering, where rigorous verification, structured context, and modular design replace reliance on model improvements alone. This approach can reduce token costs, improve security, and enhance system reliability, especially as AI becomes more embedded in critical applications.
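Intelligent model routing, one of the cost levers the paper names, sends easy work to a cheap model and escalates only when a task looks broad or risky. Below is a minimal sketch assuming a keyword-and-scope heuristic. The tier names, prices, keywords, and job list are all hypothetical, and none of this reflects a specific provider's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                 # hypothetical model identifier
    usd_per_1k_tokens: float  # hypothetical blended price


CHEAP = ModelTier("small-fast-model", 0.0005)
STRONG = ModelTier("large-reasoning-model", 0.01)

# Illustrative keywords that suggest a task needs the stronger model.
HARD_SIGNALS = ("refactor", "architecture", "migrate", "concurrency", "security")


def route(task: str, files_touched: int) -> ModelTier:
    """Send narrow, low-risk work to the cheap tier; escalate broad or risky tasks."""
    text = task.lower()
    if files_touched > 3 or any(signal in text for signal in HARD_SIGNALS):
        return STRONG
    return CHEAP


# (task description, files touched, estimated tokens): all hypothetical.
jobs = [
    ("fix typo in README", 1, 2_000),
    ("rename a local variable", 1, 3_000),
    ("migrate auth to OAuth", 6, 40_000),
]
routed = sum(route(t, f).usd_per_1k_tokens * tokens / 1000 for t, f, tokens in jobs)
all_strong = sum(STRONG.usd_per_1k_tokens * tokens / 1000 for _, _, tokens in jobs)
print(f"routed ${routed:.4f} vs all-strong ${all_strong:.4f}")
```

Real routers often use a classifier or the cheap model's own confidence instead of keywords. The keyword rule simply keeps the sketch self-contained. Savings depend on how much of the workload is genuinely easy.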

Background and Evolution of AI Development Strategies

Before this development, the industry largely focused on improving model architectures and training data to enhance AI performance, and the rapid adoption of AI coding agents (now above 85% of developers) reinforced that focus. In February 2025, Andrej Karpathy coined the term "vibe coding" for prompting by feel and accepting whatever the model produces. The whitepaper treats that as one end of a spectrum that runs to structured, verified agentic engineering.

The whitepaper builds on this understanding, explicitly framing the model as a small component within a larger system. Benchmarks and experiments demonstrate that configuration and scaffolding—collectively called the harness—are often the most impactful factors in AI behavior, challenging the previous focus solely on model upgrades.

“The model you’re paying so much attention to is the smallest part of the system. Verification, judgment, and direction are the new craft.”

— Addy Osmani

Unclear Aspects of Implementation and Industry Adoption

While the whitepaper provides compelling evidence that harness and context are dominant, it remains unclear how quickly organizations will adopt these insights at scale. The long-term impact on AI development costs, security, and performance metrics is still being evaluated, and some industry leaders may resist shifting focus from model improvements to configuration and system design.

Next Steps in AI System Optimization and Industry Shift

Organizations are expected to reevaluate their AI strategies, investing more in harness development, context engineering, and verification processes. Future research and industry practices will likely focus on creating standardized frameworks for harness design, testing, and security. Additionally, benchmarks and case studies will emerge to quantify the benefits of this approach, guiding best practices.

Key Questions

Why is the model only 10% of the system’s behavior?

The whitepaper argues that the harness and context (prompts, rules, tools, and observability) drive most of an agent's output. The authors put the model's share at roughly 10%.

How does this change AI development strategies?

It shifts the focus from solely improving models to designing better harnesses and context frameworks, which can improve performance, security, and cost-efficiency.

What are the economic implications of this insight?

Cost savings can be achieved by investing in configuration, testing, and system architecture, reducing token waste and maintenance expenses compared to model upgrades alone.

Will this approach reduce AI security vulnerabilities?

Largely, yes. Structured harnesses and verification processes help identify and mitigate vulnerabilities before code ships, making AI systems more robust and secure, though they reduce risk rather than eliminate it.

When will industry-wide adoption of these insights occur?

While some organizations are already shifting focus, widespread adoption is likely over the next 12-24 months as best practices and standards develop.

Source: ThorstenMeyerAI.com
