Mastering Context Engineering (1): Why Vibe Coding and Spec Driven Development Fail

"VibeCoding didn't get us there. Only real engineering could." — Alex Turnbull, Founder of Groove (December 2025)

Introduction

2025 has been a year of upheaval in AI development methodology. At the beginning of the year, "Vibe Coding" felt like a revolution; by year's end, stories of startups paying the price are everywhere.

This article is not mere criticism. As an engineer who has spent two years building production systems with Agentic AI, I draw on data and published research to analyze why development approaches that strip away context are bound to fail.

What this article covers:

Part 1: The Rise and Fall of Vibe Coding — Timeline, statistics, failure cases
Part 2: The Trap of Spec Driven Development — Why systematic-looking approaches fail
Part 3: Analysis from an Agentic AI Perspective — ReAct pattern and context
Part 4: The Need for Transition — Industry leaders' direction change

Part 1: The Rise and Fall of Vibe Coding

1.1 The Birth of Vibe Coding: Karpathy's Tweet

On February 2, 2025, a tweet from Andrej Karpathy shook the tech industry.

"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. I'm building a project or webapp, but it's not really coding — I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works." — Andrej Karpathy, February 2, 2025

Karpathy is no ordinary developer. A former Director of AI at Tesla and a founding member of OpenAI, he is one of the most influential figures in modern machine learning. His words carried weight.

The tweet has since surpassed 5 million views, and the term "vibe coding" has been picked up by dictionaries:

March 2025: Listed on Merriam-Webster as "slang & trending"
November 2025: Selected as Collins Dictionary Word of the Year

1.2 Definition of Vibe Coding

Synthesizing Wikipedia and various other sources, Vibe Coding means:

Describing what you want to AI in natural language
Not reviewing or editing generated code
Evaluating and requesting improvements based only on execution results
Focusing on iterative experimentation rather than code correctness or structure

The key difference from traditional AI-assisted coding or pair programming is that the human developer does not review the code.

1.3 Why It Was Attractive: Early Success Stories

The reasons for Vibe Coding's explosive growth were clear:

1. Complete Removal of Entry Barriers: People with zero programming experience could create apps. Tools like Cursor, Replit Agent, and Lovable made this possible.

2. Remarkable Speed: Prototypes that previously took weeks could be completed in hours.

3. The Power of Viral Demos: Videos of AI generating features spread explosively on X and LinkedIn. What was "visible" felt like what "actually worked."

Tool Usage Surge (Early 2025):

Base44: 950% surge
Lovable: +207% growth
Cursor: +62% growth

According to DesignRush, "vibe coding" search volume increased by 6,700%.

1.4 Reality Check: Second Half of 2025

But in the second half of 2025, reality began to emerge.

Tool Usage Collapse (SimilarWeb, July 2025 Report):

Overall AI coding tool usage: 76% decrease in 12 weeks
Base44: After 950% surge, 95% crash
Lovable: From +207% to -37%
Cursor: From +62% to -19%

This wasn't just a trend cooling off. Teams were hitting the "complexity wall": the moment when the demo ends and real engineering begins.

1.5 The $4B Technical Debt Crisis

TechStartups.com's December 2025 report presents shocking figures:

"Roughly 10,000 startups tried to build production apps with AI assistants. More than 8,000 now need rebuilds or rescue engineering, with budgets ranging from $50K to $500K each."

Total Cost Estimate: $400M to $4B
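The range follows directly from the quoted figures. As a quick back-of-the-envelope check, assuming exactly 8,000 affected startups (the quote says "more than 8,000") and that every one of them lands at either the low or the high end of the quoted budget:

```python
# Back-of-the-envelope check of the cost range, using only the quoted figures.
startups_needing_rescue = 8_000             # assumption: the quoted "more than 8,000", taken as exactly 8,000
budget_low, budget_high = 50_000, 500_000   # USD per startup, from the quote

total_low = startups_needing_rescue * budget_low
total_high = startups_needing_rescue * budget_high

print(f"${total_low / 1e6:,.0f}M to ${total_high / 1e9:,.0f}B")  # prints "$400M to $4B"
```

Because it multiplies every startup by the same bound, the result is an envelope rather than a forecast.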

This is the first AI-generated technical debt crisis.

Groove founder Alex Turnbull said on LinkedIn:

"Vibe Coding isn't just bullshit. It's expensive bullshit that is actively a disaster for thousands of startups."

1.6 Specific Causes of Failure

1.6.1 Context Collapse

LLMs have limited context windows. As a project grows, earlier decisions fall outside the window, and the model no longer sees them when it generates new code.

Results:

Inconsistent code styles
Duplicated logic
Conflicting implementations
New code contradicting previous decisions
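The mechanism behind these results can be sketched in a few lines. This is a simplified illustration, not how any particular tool manages its window: the word-count "tokenizer", the 20-token budget, and the sample history are all assumptions made for the example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages that fit in the budget, dropping the oldest."""
    kept = []
    used = 0
    for message in reversed(messages):      # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break                           # everything older is silently lost
        kept.append(message)
        used += cost
    return kept[::-1]                       # restore chronological order


history = [
    "Decision: all timestamps are stored in UTC",
    "Add a signup form with email validation",
    "Add a dashboard page listing recent orders",
    "Show each order's creation time on the dashboard",
]

window = fit_to_window(history, max_tokens=20)
print(window)
```

With this budget only the last two requests survive. The UTC decision is gone by the time the model is asked to display a timestamp, which is how a contradicting implementation can appear without anyone changing their mind.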

One analyst expressed it this way:

"Context collapse occurs as LLMs lose track of earlier decisions beyond their limited context windows."

1.6.2 Security Vulnerabilities: Veracode 2025 Report

Veracode's 2025 GenAI Code Security Report is one of the most comprehensive studies of AI-generated code security to date:

Research Scale:

100+ LLM models tested
4 languages: Java, Python, JavaScript, C#
80 curated coding tasks

Key Findings:

45% of AI-generated code failed security tests
Multiple OWASP Top 10 vulnerabilities included

Security Failure Rate by Language:

Java: 72% (most dangerous)
C#: 45%
JavaScript: 43%
Python: 38%

Most Frequent Vulnerabilities:

Cross-Site Scripting (CWE-80): 86% defense failure in relevant samples
SQL Injection
Insecure cryptographic algorithm usage
Log injection
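To make the SQL injection entry concrete, here is a minimal sketch using Python's standard sqlite3 module. The table, rows, and input string are invented for illustration and are not taken from the Veracode test set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

user_input = "nobody' OR '1'='1"  # attacker-controlled value (hypothetical)

# Vulnerable: the input is spliced into the SQL text, so its quote closes the
# string literal and the OR clause matches every row.
unsafe_sql = f"SELECT name FROM users WHERE name = '{user_input}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Safer: a parameterized query passes the input as data, never as SQL.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(leaked), len(safe))  # prints "2 0"
```

Both versions behave the same on ordinary input, which is why checking only execution results tends to miss this class of bug.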

Shocking Finding:

"Newer, larger AI models do not generate more secure code. Security performance remained flat regardless of model size or training sophistication."

1.6.3 Additional Security Research

Schreiber & Tippe Study (2025):

7,703 AI-generated code files analyzed
4,200+ unique vulnerabilities discovered
77 weakness types
Python code vulnerability rate: up to 18%

Apiiro Study (2025):

AI-assisted developers write 3-4x more code than peers
But introduce 10x more security issues

1.7 Stack Overflow 2025 Developer Survey

Stack Overflow's 2025 Developer Survey shows changing developer perceptions of AI:

AI Tool Adoption:

84% of respondents using or planning to use AI tools (up from 76% in 2024)
51% of professional developers use AI tools daily

But Trust is Declining:

Developers who don't trust AI tool accuracy: 46% (up from 31% in 2024)
Developers who trust AI tool accuracy: 33% (down from 43% in 2024)
Developers who "highly trust": only 3%
More experienced developers are more skeptical: only 2.6% of them report high trust

1.8 Broader AI Project Failure Statistics

Vibe Coding's failure is part of a broader AI project failure pattern:

MIT Study (2025):

95% of generative AI pilots fail to achieve measurable revenue or cost savings

2025 AI Project Abandonment Rate:

42% of companies abandoned most AI initiatives (more than double 2024)

RAND Study:

80% of AI projects don't reach intended outcomes

Pilot Stage Failure:

70-90% of AI projects never scale beyond pilot

Part 2: The Trap of Spec Driven Development

2.1 Emerging as an Alternative to Vibe Coding

As Vibe Coding's problems became apparent, many teams sought a more systematic approach. That alternative is Spec Driven Development.

Thoughtworks Technology Radar Vol. 33 (November 2025) Definition:

"Spec-driven development is an emerging approach to AI-assisted coding workflows. While the term's definition is still evolving, it generally refers to workflows that begin with a structured functional specification, then proceed through multiple steps to break it down into smaller pieces, solutions and tasks."

On the surface, it's much more systematic than Vibe Coding. It offers clear requirements, a step-by-step approach, and a verification process.

But there are two fatal problems here too.

2.2 Problem 1: Removal of Context

What Specs Convey:

Functional requirements
I/O formats
Constraints
Success criteria

What AI Needs to Work Effectively:

Why are we building this? (Business context)
How does the existing system work? (Technical context)
What attempts have failed? (History)
What are the team's coding conventions? (Style context)
What's the background of performance requirements? (Reason for constraints)

This gap is the core of the problem.
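One way to picture the gap is to model the two lists as data. The sketch below is only an illustration: the class names, field names, and sample values are assumptions made for this example, not part of any spec-driven tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Spec:
    """What a spec typically conveys (the first list above)."""
    functional_requirements: list[str]
    io_formats: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)


@dataclass
class ProjectContext:
    """What the second list asks for; all field names are illustrative."""
    business_context: str = ""
    technical_context: str = ""
    failed_attempts: list[str] = field(default_factory=list)
    coding_conventions: list[str] = field(default_factory=list)
    constraint_rationale: str = ""


def missing_context(ctx: ProjectContext) -> list[str]:
    """Return the names of context fields that are still empty."""
    return [name for name, value in vars(ctx).items() if not value]


def build_prompt(spec: Spec, ctx: Optional[ProjectContext] = None) -> str:
    """Render the spec, adding the 'why' only when context is supplied."""
    lines = ["Requirements:"] + [f"- {r}" for r in spec.functional_requirements]
    if ctx is not None and ctx.business_context:
        lines += ["Why we are building this:", f"- {ctx.business_context}"]
    return "\n".join(lines)


spec = Spec(functional_requirements=["Users can log in with email and password"])
ctx = ProjectContext(business_context="Enterprise customers require SSO next quarter")

print(build_prompt(spec))        # the model sees only the "what"
print(build_prompt(spec, ctx))   # the model also sees one piece of the "why"
print(missing_context(ctx))      # the four context fields still unfilled
```

A spec-only prompt is perfectly valid input. The point is that nothing in it tells the model that SSO is coming, so it has no reason to keep the login design extensible.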

2.3 Problem 2: Absence of Iteration

The Reality of Software Development:

Requirements → Implementation → Feedback → Modification → Feedback → Modification → ... → Complete

Spec Driven Assumption:

Specification → Implementation → Complete

Why Iteration is Essential:

Requirements change: Customers only know what they really want after seeing results
Unpredictable problems: Unexpected technical constraints discovered during implementation
Learning and improvement: First versions always have room for improvement
Feedback loop: Real user reactions determine direction

In Spec Driven Development, iteration runs into these problems:

Every new Spec → Previous context lost
AI doesn't know why changes were made
Same mistakes repeated
Consistency difficult to maintain
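A small sketch shows what it would take to keep the "why" across revisions. It assumes a hypothetical append-only decision log that is rendered into every new spec prompt. The class names and sample entries are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    iteration: int
    change: str
    reason: str


class DecisionLog:
    """Append-only record of what changed and why, carried across spec revisions."""

    def __init__(self) -> None:
        self._entries: list[Decision] = []

    def record(self, iteration: int, change: str, reason: str) -> None:
        self._entries.append(Decision(iteration, change, reason))

    def as_prompt_section(self) -> str:
        """Render prior decisions so the next revision starts with their rationale."""
        lines = ["Previous decisions:"]
        lines += [
            f"- v{d.iteration}: {d.change} (because {d.reason})"
            for d in self._entries
        ]
        return "\n".join(lines)


log = DecisionLog()
log.record(1, "Moved session storage server-side", "cookie size limit was exceeded")
log.record(2, "Dropped the client-side cache", "stale data appeared after checkout")

next_spec = "Spec v3: add an order history page"
prompt = f"{next_spec}\n\n{log.as_prompt_section()}"
print(prompt)
```

Without something like the rendered section, the v3 spec gives the model no reason not to reintroduce a client-side cache.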

Thoughtworks Technology Radar explicitly expresses concern:

"There are still questions about how we remain adaptable and flexible while also building robust contextual foundations and ground truths for AI systems."

Part 3: Analysis from an Agentic AI Perspective

3.1 The ReAct Pattern: Core of Agentic AI

ReAct (Reasoning + Acting) is the core pattern of Agentic AI:

while not task_completed:
    # 1. Reason: Analyze current state and goal
    thought = llm.reason(current_state, goal, context)
    
    # 2. Act: Execute determined action
    action = select_action(thought)
    observation = execute(action)
    
    # 3. Observe: Observe results
    current_state = update(current_state, observation)

Key Point: For AI to effectively reason, it needs:

current_state: Information about current state
goal: Goal to achieve
context: Rich contextual information
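The loop above is pseudocode. A runnable toy version, with a deterministic function standing in for the LLM, might look like the sketch below. Everything in it is an assumption made for illustration: `stub_reason`, `AgentState`, the two "tools", and the counter task are hypothetical and far simpler than a real agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    value: int = 0
    observations: list[str] = field(default_factory=list)


def stub_reason(state: AgentState, goal: int, context: dict) -> str:
    """Stand-in for llm.reason(): returns a 'thought' naming the next action."""
    if state.value >= goal:
        return "finish"
    # Context changes the plan: a known step size avoids single increments.
    return "add_step" if context.get("step") else "add_one"


def react(goal: int, context: dict, max_iters: int = 20) -> AgentState:
    state = AgentState()
    tools: dict[str, Callable[[int], int]] = {
        "add_one": lambda v: v + 1,
        "add_step": lambda v: v + context["step"],
    }
    for _ in range(max_iters):                       # bounded, unlike a bare while loop
        thought = stub_reason(state, goal, context)  # 1. Reason
        if thought == "finish":
            break
        result = tools[thought](state.value)         # 2. Act
        state.observations.append(f"{thought} -> {result}")  # 3. Observe
        state.value = result
    return state


print(len(react(goal=6, context={}).observations))           # prints 6
print(len(react(goal=6, context={"step": 3}).observations))  # prints 2
```

Both runs reach the same goal, but the run with context gets there in a third of the steps because the reasoning step has something to work with.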

3.2 How Vibe Coding Limits ReAct

Vibe Coding Approach:

"Build login feature" → [Code generation] → "It works!" → Done

Problems:

Reasoning step skipped: No context for AI to reason with
Only Acting performed: Degrades to simple code generator
Observe ignored: Code not reviewed
Iterate impossible: No context for improvement direction

Result: Fails to utilize Agentic AI's core capability of autonomous reasoning

3.3 How Spec Driven Limits ReAct

Spec Driven Approach:

[Detailed Spec] → "Implement according to this Spec" → [Code generation] → [Compare with Spec] → Done

Problems:

Reasoning limited: Only "what" provided, not "why"
Acting constrained: Only acts within Spec scope
Observe limited: Only checks Spec compliance
Iterate costly: New Spec needed each time

Result: Degrades AI to instruction executor, reasoning capability unused

3.4 The True Value of Agentic AI

With Context, Agentic AI can:

Decompose complex problems and solve step by step
Combine multiple tools to perform tasks
Self-verify to ensure quality
Continuously learn to improve
Adapt to unexpected situations

Without Context, Agentic AI is:

Simple code generator
Generic pattern applier
Unverifiable output
Unable to learn (no feedback context)
Unable to adapt to change

Part 4: The Need for Transition

4.1 Karpathy's Direction Change

Interestingly, Karpathy himself, who coined the term "Vibe Coding," changed direction.

February 2025 (Vibe Coding):

"forget that the code even exists"

June 2025 (Context Engineering):

"Context engineering is the delicate art and science of filling the context window with just the right information for the next step."

The same person delivered two very different messages within four months. Why?

4.2 Simon Willison's Support

Simon Willison (Django co-creator, influential AI blogger) also supports Context Engineering:

"The term prompt engineering makes people think it's 'typing something into a chatbot.' The inferred definition of context engineering is much closer to the intended meaning."

4.3 Tobi Lutke (Shopify CEO) Agreement

Tobi Lutke, CEO of Shopify, points in the same direction:

"I really like the term context engineering over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."

4.4 Thoughtworks Technology Radar Vol. 33

Thoughtworks officially addresses this change in their November 2025 Technology Radar Vol. 33:

Key Findings:

Rise of Context Engineering

"In this latest edition, Thoughtworks signals a marked evolution from the previous volume, emphasizing the concept of context engineering and a growing focus on agentic systems."

Importance of Model Context Protocol (MCP)

"Every major technology vendor is building agent-awareness into their platforms, largely thanks to the Model Context Protocol (MCP)."

From Vibe Coding to Context Engineering

Ken Mugrage (Thoughtworks Principal Technologist):

"2025 has been a huge year in the evolution of software engineering as a practice... the conversation has moved from questions of speed and scale to context."

4.5 Industry Consensus: Context is Key

By late 2025, major industry voices point in the same direction:

Person/Organization	Message
Andrej Karpathy	Vibe Coding → Context Engineering
Simon Willison	Context Engineering closer to intended meaning
Tobi Lutke	Context provision is core skill
Thoughtworks	Emphasizes Context Engineering, MCP importance
Stack Overflow Survey	46% of developers distrust AI accuracy
Veracode	45% of AI code fails security

Common Conclusion:

Speed alone is not enough
Context determines quality
Human review and guardrails essential
AI is a collaborator, not a tool

Conclusion: Next Steps

What We Learned

Vibe Coding's Failure
- Attractive demos, fatal reality
- Up to $4B in estimated technical debt
- 45% of AI code fails security tests
- 46% of developers distrust AI accuracy

Spec Driven Development's Limitations
- Context removal problem
- Absence of iteration
- Conflict with iterative (Agile) development

True Utilization of Agentic AI
- ReAct pattern and context relationship
- AI without context = code generator
- AI with context = reasoning collaborator

Industry Direction Change
- Karpathy, Willison, Lutke, Thoughtworks
- Prompt Engineering → Context Engineering

Next Article Preview

Article 2: Theoretical Foundations of Context Engineering will cover:

Precise definition of Context Engineering
Understanding LLM Context Windows
Five components of context
Deep relationship between Agentic AI and context

References

Core Materials

Karpathy, A. (2025, February 2). "Vibe Coding" - X/Twitter post. 5M+ views.
Karpathy, A. (2025, June). "Context Engineering" - X/Twitter post.
Veracode. (2025, July). "2025 GenAI Code Security Report." 100+ LLM models, 80 tasks analyzed.
Stack Overflow. (2025). "2025 Developer Survey." 84% AI tool usage, 46% accuracy distrust.
Thoughtworks. (2025, November). "Technology Radar Vol. 33."
Willison, S. (2025, June 27). "Context Engineering." simonwillison.net.

Failure Cases and Statistics

TechStartups.com. (2025, December 11). "The Vibe Coding Delusion."
SimilarWeb. (2025, July). "Global AI Tracker." 76% decrease in AI coding tool usage.
Turnbull, A. (2025, December). LinkedIn posts on Vibe Coding failures.
FinalRoundAI. (2025). "What CTOs Really Think About Vibe Coding." 16 of 18 CTOs experienced production disasters.

Security Research

Schreiber & Tippe. (2025). AI-generated code security analysis. 7,703 files, 4,200+ vulnerabilities.
Apiiro. (2025). AI-assisted developer study. 3-4x code, 10x security issues.
Snyk. (2025). "Is Vibe Coding Secure?"

If this article was helpful, look forward to the next in the series. How are you applying Context Engineering? Share in the comments.