Mastering Context Engineering (1): Why Vibe Coding and Spec Driven Development Fail
"VibeCoding didn't get us there. Only real engineering could." — Alex Turnbull, Founder of Groove (December 2025)
Introduction
2025 has been a year of upheaval in AI development methodology. At the beginning of the year, "Vibe Coding" felt like a revolution; by year's end, stories of startups paying the price are everywhere.
This article is not mere criticism. As an engineer who has spent two years building production systems with Agentic AI, I analyze why development approaches that remove context are bound to fail, backed by data and research.
What this article covers:
- Part 1: The Rise and Fall of Vibe Coding — Timeline, statistics, failure cases
- Part 2: The Trap of Spec Driven Development — Why systematic-looking approaches fail
- Part 3: Analysis from an Agentic AI Perspective — ReAct pattern and context
- Part 4: The Need for Transition — Industry leaders' direction change
Part 1: The Rise and Fall of Vibe Coding
1.1 The Birth of Vibe Coding: Karpathy's Tweet
On February 2, 2025, a tweet from Andrej Karpathy shook the tech industry.
"There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. I'm building a project or webapp, but it's not really coding — I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works." — Andrej Karpathy, February 2, 2025
Karpathy is no ordinary developer. As former AI Director at Tesla and founding member of OpenAI, he's one of the most influential figures in modern machine learning. His words carried weight.
The tweet has since passed 5 million views, and the term "vibe coding" was picked up by major dictionaries:
- March 2025: Listed on Merriam-Webster as "slang & trending"
- November 2025: Selected as Collins Dictionary Word of the Year
1.2 Definition of Vibe Coding
Drawing on Wikipedia and other sources, Vibe Coding can be defined as:
- Describing what you want to AI in natural language
- Not reviewing or editing generated code
- Evaluating and requesting improvements based only on execution results
- Focusing on iterative experimentation rather than code correctness or structure
Unlike traditional AI-assisted coding or pair programming, the key difference is that the human developer does not review the code.
1.3 Why It Was Attractive: Early Success Stories
The reasons for Vibe Coding's explosive growth were clear:
1. Complete Removal of Entry Barriers: People with zero programming experience could create apps. Tools like Cursor, Replit Agent, and Lovable made this possible.
2. Remarkable Speed: Prototypes that previously took weeks could be completed in hours.
3. The Power of Viral Demos: Videos of AI generating features spread explosively on X and LinkedIn. What was "visible" felt like what "actually worked."
Tool Usage Surge (Early 2025):
- Base44: 950% surge
- Lovable: +207% growth
- Cursor: +62% growth
According to DesignRush, "vibe coding" search volume increased by 6,700%.
1.4 Reality Check: Second Half of 2025
But in the second half of 2025, reality began to emerge.
Tool Usage Collapse (SimilarWeb, July 2025 Report):
- Overall AI coding tool usage: 76% decrease in 12 weeks
- Base44: After 950% surge, 95% crash
- Lovable: From +207% to -37%
- Cursor: From +62% to -19%
This was more than a cooling trend. Teams were hitting the "complexity wall": the moment when the demo ends and real engineering begins.
1.5 The $4B Technical Debt Crisis
TechStartups.com's December 2025 report presents shocking figures:
"Roughly 10,000 startups tried to build production apps with AI assistants. More than 8,000 now need rebuilds or rescue engineering, with budgets ranging from $50K to $500K each."
Total Cost Estimate: $400M to $4B
This is the first AI-generated technical debt crisis.
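The total cost range follows directly from the figures in the quote. A minimal sketch of the arithmetic, assuming every one of the roughly 8,000 affected startups pays either the low or the high end of the quoted budget range (the variable names are illustrative):

```python
# Figures taken from the TechStartups.com quote above.
startups_needing_rescue = 8_000
budget_low_usd = 50_000
budget_high_usd = 500_000

# Lower bound: every startup pays the minimum; upper bound: every startup pays the maximum.
total_low = startups_needing_rescue * budget_low_usd
total_high = startups_needing_rescue * budget_high_usd

print(f"Low estimate:  ${total_low / 1e6:,.0f}M")
print(f"High estimate: ${total_high / 1e9:,.0f}B")
```

The real total presumably sits somewhere inside this range, since actual budgets are spread between the two extremes.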
Groove founder Alex Turnbull said on LinkedIn:
"Vibe Coding isn't just bullshit. It's expensive bullshit that is actively a disaster for thousands of startups."
1.6 Specific Causes of Failure
1.6.1 Context Collapse
LLMs have limited context windows. As a project grows, earlier decisions fall outside the window and the model no longer sees them.
Results:
- Inconsistent code styles
- Duplicated logic
- Conflicting implementations
- New code contradicting previous decisions
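The mechanism behind these results can be sketched with a toy context window. This is a simplified illustration, not how any particular tool manages context: it assumes the tool keeps only the most recent messages that fit a token budget, and it approximates tokens by counting whitespace-separated words. The function name `fit_to_window` and the sample history are hypothetical.

```python
def fit_to_window(messages, max_tokens):
    """Keep the most recent messages that fit in the token budget.

    Token counts are approximated by whitespace-separated words, which is an
    assumption for illustration; real tokenizers count differently.
    """
    kept = []
    used = 0
    # Walk backwards from the newest message until the budget is exhausted.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))


history = [
    "Decision: all dates are stored as UTC ISO-8601 strings",
    "Added user signup form with email validation",
    "Added password reset flow with emailed token",
    "Added profile page showing account creation date",
]

window = fit_to_window(history, max_tokens=16)
decision_visible = any("UTC" in m for m in window)
print(window)
print("Early decision still visible:", decision_visible)
```

With this budget, the date-format decision from the first message is no longer in the window when the profile page is written, so nothing stops the new code from formatting dates differently.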
One analyst expressed it this way:
"Context collapse occurs as LLMs lose track of earlier decisions beyond their limited context windows."
1.6.2 Security Vulnerabilities: Veracode 2025 Report
Veracode's 2025 GenAI Code Security Report is one of the most comprehensive studies of AI-generated code security to date:
Research Scale:
- 100+ LLMs tested
- 4 languages: Java, Python, JavaScript, C#
- 80 curated coding tasks
Key Findings:
- 45% of AI-generated code failed security tests
- Multiple OWASP Top 10 vulnerabilities included
Security Failure Rate by Language:
- Java: 72% (highest failure rate)
- C#: 45%
- JavaScript: 43%
- Python: 38%
Most Frequent Vulnerabilities:
- Cross-Site Scripting (CWE-80): 86% of relevant samples failed to defend against it
- SQL Injection
- Insecure cryptographic algorithm usage
- Log injection
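To make one of these categories concrete, here is a small SQL injection example using Python's standard `sqlite3` module. It is an illustration of the vulnerability class, not a sample from the Veracode tasks; the `users` table, the function names, and the payload are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 0), ("root", 1)],
)


def find_user_unsafe(name):
    # Vulnerable: user input is concatenated into the SQL text.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(name):
    # Parameterized: the value is passed separately from the SQL text.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "x' OR '1'='1"
unsafe_rows = find_user_unsafe(payload)
safe_rows = find_user_safe(payload)
print(unsafe_rows)  # every row in the table comes back
print(safe_rows)    # no user is literally named like the payload
```

Both functions "work" when tried with a normal name, which is why a workflow that evaluates only execution results is unlikely to notice the difference.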
Shocking Finding:
"Newer, larger AI models do not generate more secure code. Security performance remained flat regardless of model size or training sophistication."
1.6.3 Additional Security Research
Schreiber & Tippe Study (2025):
- 7,703 AI-generated code files analyzed
- 4,200+ unique vulnerabilities discovered
- 77 weakness types
- Python code vulnerability rate: up to 18%
Apiiro Study (2025):
- AI-assisted developers write 3-4x more code than peers
- But introduce 10x more security issues
1.7 Stack Overflow 2025 Developer Survey
Stack Overflow's 2025 Developer Survey shows changing developer perceptions of AI:
AI Tool Adoption:
- 84% of respondents using or planning to use AI tools (up from 76% in 2024)
- 51% of professional developers use AI tools daily
But Trust is Declining:
- Developers who don't trust AI tool accuracy: 46% (up from 31% in 2024)
- Developers who trust AI tool accuracy: 33% (down from 43% in 2024)
- Developers who "highly trust": only 3%
- Experienced developers are the most skeptical: only 2.6% report high trust
1.8 Broader AI Project Failure Statistics
Vibe Coding's failure is part of a broader AI project failure pattern:
MIT Study (2025):
- 95% of generative AI pilots fail to achieve measurable revenue or cost savings
2025 AI Project Abandonment Rate:
- 42% of companies abandoned most AI initiatives (more than double the 2024 rate)
RAND Study:
- 80% of AI projects don't reach intended outcomes
Pilot Stage Failure:
- 70-90% of AI projects never scale beyond pilot
Part 2: The Trap of Spec Driven Development
2.1 Emerging as an Alternative to Vibe Coding
As Vibe Coding's problems became apparent, many teams sought a more systematic approach. That alternative is Spec Driven Development.
Thoughtworks Technology Radar Vol. 33 (November 2025) Definition:
"Spec-driven development is an emerging approach to AI-assisted coding workflows. While the term's definition is still evolving, it generally refers to workflows that begin with a structured functional specification, then proceed through multiple steps to break it down into smaller pieces, solutions and tasks."
On the surface, it is far more systematic than Vibe Coding: clear requirements, a step-by-step approach, and a verification process.
But there are two fatal problems here too.
2.2 Problem 1: Removal of Context
What Specs Convey:
- Functional requirements
- I/O formats
- Constraints
- Success criteria
What AI Needs to Work Effectively:
- Why are we building this? (Business context)
- How does the existing system work? (Technical context)
- What attempts have failed? (History)
- What are the team's coding conventions? (Style context)
- What's the background of performance requirements? (Reason for constraints)
This gap is the core of the problem.
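The two lists above can be read as two data structures. The sketch below is one possible way to model them, with all class and field names chosen for illustration rather than taken from any spec-driven tool. A spec can be completely filled in while every context field stays empty.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """What a specification typically conveys (the first list above)."""
    functional_requirements: list[str]
    io_formats: list[str]
    constraints: list[str]
    success_criteria: list[str]


@dataclass
class ProjectContext:
    """What the second list says the AI also needs. All fields are hypothetical."""
    business_reason: str = ""
    system_notes: list[str] = field(default_factory=list)
    failed_attempts: list[str] = field(default_factory=list)
    conventions: list[str] = field(default_factory=list)
    constraint_rationale: dict[str, str] = field(default_factory=dict)


def missing_context(ctx: ProjectContext) -> list[str]:
    """Return the names of context fields that are still empty."""
    return [name for name, value in vars(ctx).items() if not value]


spec = Spec(
    functional_requirements=["Export invoices as CSV"],
    io_formats=["UTF-8 CSV, one invoice per row"],
    constraints=["Export must finish within 2 seconds"],
    success_criteria=["File opens in a spreadsheet without errors"],
)
spec_only = ProjectContext()  # a spec-only handoff supplies none of this
gaps = missing_context(spec_only)
print(gaps)
```

Here the spec states the 2-second constraint, but nothing records why it exists, so the model cannot tell whether a 2.5-second result is a minor miss or a contract violation.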
2.3 Problem 2: Absence of Iteration
The Reality of Software Development:
Requirements → Implementation → Feedback → Modification → Feedback → Modification → ... → Complete
Spec Driven Assumption:
Specification → Implementation → Complete
Why Iteration is Essential:
- Requirements change: Customers only know what they really want after seeing results
- Unpredictable problems: Unexpected technical constraints discovered during implementation
- Learning and improvement: First versions always have room for improvement
- Feedback loop: Real user reactions determine direction
When iteration does happen under Spec Driven Development, it runs into these problems:
- Every new Spec → Previous context lost
- AI doesn't know why changes were made
- Same mistakes repeated
- Consistency difficult to maintain
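The loss described in this list can be shown with two ways of handing a revised spec to a model. This is a hedged sketch with hypothetical names (`Revision`, `prompt_from_latest_spec`, `prompt_with_history`) and an invented caching example; it only illustrates what information each handoff carries.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    spec_text: str
    reason: str  # why the spec changed; this is what a bare spec drops


def prompt_from_latest_spec(revisions):
    """Spec-only handoff: the model sees just the newest spec."""
    return revisions[-1].spec_text


def prompt_with_history(revisions):
    """Handoff that carries the reasons for earlier changes forward."""
    lines = ["Change history:"]
    for number, rev in enumerate(revisions, start=1):
        lines.append(f"{number}. {rev.reason}")
    lines.append("Current spec:")
    lines.append(revisions[-1].spec_text)
    return "\n".join(lines)


revisions = [
    Revision("Cache search results for 60 minutes", "Initial version"),
    Revision("Cache search results for 5 minutes",
             "Users saw stale prices with the 60 minute cache"),
]

bare = prompt_from_latest_spec(revisions)
rich = prompt_with_history(revisions)
print(bare)
print(rich)
```

With only the bare spec, a model asked to "improve cache hit rate" has no way to know that a longer cache duration was already tried and rejected.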
Thoughtworks Technology Radar explicitly expresses concern:
"There are still questions about how we remain adaptable and flexible while also building robust contextual foundations and ground truths for AI systems."
Part 3: Analysis from an Agentic AI Perspective
3.1 The ReAct Pattern: Core of Agentic AI
ReAct (Reasoning + Acting) is the core pattern of Agentic AI:
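while not task_completed:
    # 1. Reason: analyze the current state and goal
    thought = llm.reason(current_state, goal, context)
    # 2. Act: select and execute the chosen action
    action = select_action(thought)
    observation = execute(action)
    # 3. Observe: fold the result back into the state
    current_state = update(current_state, observation)
    # Re-check the exit condition so the loop can terminate
    # (is_done is a placeholder, like the other helpers here)
    task_completed = is_done(current_state, goal)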
Key Point: For AI to effectively reason, it needs:
- current_state: Information about current state
- goal: Goal to achieve
- context: Rich contextual information
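For readers who want to run the loop, here is a toy version. It is a sketch under heavy assumptions: `reason` is a deterministic stub standing in for the LLM call, the only "tools" are a hypothetical `add` and `finish`, and the state is a small dictionary. It shows the shape of the Reason, Act, Observe cycle and how `context` feeds the reasoning step, not how a production agent is built.

```python
def reason(current_state, goal, context):
    """Stand-in for an LLM call: choose the next step from state and context."""
    if current_state["value"] < goal:
        step = context["step_size"]
        return {"tool": "add", "amount": min(step, goal - current_state["value"])}
    return {"tool": "finish"}


def execute(action, current_state):
    """Run the chosen tool and return an observation."""
    if action["tool"] == "add":
        return {"value": current_state["value"] + action["amount"]}
    return {"done": True}


def react_loop(goal, context, max_steps=10):
    current_state = {"value": 0, "done": False}
    trace = []
    for _ in range(max_steps):
        # 1. Reason: decide what to do from state, goal, and context
        thought = reason(current_state, goal, context)
        # 2. Act: run the selected tool
        observation = execute(thought, current_state)
        # 3. Observe: merge the observation into the state
        current_state = {**current_state, **observation}
        trace.append(thought["tool"])
        if current_state["done"]:
            break
    return current_state, trace


final_state, trace = react_loop(goal=7, context={"step_size": 3})
print(final_state, trace)
```

The `max_steps` cap is a design choice added here so the loop always terminates, even if the reasoning step never decides to finish.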
3.2 How Vibe Coding Limits ReAct
Vibe Coding Approach:
"Build login feature" → [Code generation] → "It works!" → Done
Problems:
- Reasoning step skipped: No context for AI to reason with
- Only Acting performed: Degrades to simple code generator
- Observe ignored: Code not reviewed
- Iterate impossible: No context for improvement direction
Result: Fails to utilize Agentic AI's core capability of autonomous reasoning
3.3 How Spec Driven Limits ReAct
Spec Driven Approach:
[Detailed Spec] → "Implement according to this Spec" → [Code generation] → [Compare with Spec] → Done
Problems:
- Reasoning limited: Only "what" provided, not "why"
- Acting constrained: Only acts within Spec scope
- Observe limited: Only checks Spec compliance
- Iterate costly: New Spec needed each time
Result: Degrades AI to instruction executor, reasoning capability unused
3.4 The True Value of Agentic AI
With Context, Agentic AI can:
- Decompose complex problems and solve step by step
- Combine multiple tools to perform tasks
- Self-verify to ensure quality
- Continuously learn to improve
- Adapt to unexpected situations
Without Context, Agentic AI is:
- Simple code generator
- Generic pattern applier
- Unverifiable output
- Unable to learn (no feedback context)
- Unable to adapt to change
Part 4: The Need for Transition
4.1 Karpathy's Direction Change
Interestingly, Karpathy himself, who coined "Vibe Coding," has since changed direction.
February 2025 (Vibe Coding):
"forget that the code even exists"
June 2025 (Context Engineering):
"Context engineering is the delicate art and science of filling the context window with just the right information for the next step."
The same person delivered two very different messages within four months. Why?
4.2 Simon Willison's Support
Simon Willison (Django co-creator, influential AI blogger) also supports Context Engineering:
"The term prompt engineering makes people think it's 'typing something into a chatbot.' The inferred definition of context engineering is much closer to the intended meaning."
4.3 Tobi Lutke (Shopify CEO) Agreement
Tobi Lutke, CEO of Shopify, points in the same direction:
"I really like the term context engineering over prompt engineering. It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM."
4.4 Thoughtworks Technology Radar Vol. 33
Thoughtworks officially addresses this change in their November 2025 Technology Radar Vol. 33:
Key Findings:
- Rise of Context Engineering: "In this latest edition, Thoughtworks signals a marked evolution from the previous volume, emphasizing the concept of context engineering and a growing focus on agentic systems."
- Importance of Model Context Protocol (MCP): "Every major technology vendor is building agent-awareness into their platforms, largely thanks to the Model Context Protocol (MCP)."
- From Vibe Coding to Context Engineering, per Ken Mugrage (Thoughtworks Principal Technologist): "2025 has been a huge year in the evolution of software engineering as a practice... the conversation has moved from questions of speed and scale to context."
4.5 Industry Consensus: Context is Key
By late 2025, major industry voices point in the same direction:
| Person/Organization | Message |
|---|---|
| Andrej Karpathy | Vibe Coding → Context Engineering |
| Simon Willison | Context Engineering closer to intended meaning |
| Tobi Lutke | Context provision is core skill |
| Thoughtworks | Emphasizes Context Engineering, MCP importance |
| Stack Overflow Survey | 46% of developers distrust AI accuracy |
| Veracode | 45% of AI code fails security |
Common Conclusion:
- Speed alone is not enough
- Context determines quality
- Human review and guardrails essential
- AI is a collaborator, not a tool
Conclusion: Next Steps
What We Learned
- Vibe Coding's Failure
  - Attractive demos, fatal reality
  - $4B technical debt crisis
  - 45% of AI code fails security tests
  - 46% of developers distrust AI accuracy
- Spec Driven Development's Limitations
  - Context removal problem
  - Absence of iteration
  - Conflict with Agile
- True Utilization of Agentic AI
  - ReAct pattern and context relationship
  - AI without context = code generator
  - AI with context = reasoning collaborator
- Industry Direction Change
  - Karpathy, Willison, Lutke, Thoughtworks
  - Prompt Engineering → Context Engineering
Next Article Preview
Article 2: Theoretical Foundations of Context Engineering will cover:
- Precise definition of Context Engineering
- Understanding LLM Context Windows
- Five components of context
- Deep relationship between Agentic AI and context
References
Core Materials
- Karpathy, A. (2025, February 2). "Vibe Coding" - X/Twitter post. 5M+ views.
- Karpathy, A. (2025, June). "Context Engineering" - X/Twitter post.
- Veracode. (2025, July). "2025 GenAI Code Security Report." 100+ LLMs, 80 tasks analyzed.
- Stack Overflow. (2025). "2025 Developer Survey." 84% AI tool usage, 46% accuracy distrust.
- Thoughtworks. (2025, November). "Technology Radar Vol. 33."
- Willison, S. (2025, June 27). "Context Engineering." simonwillison.net.
Failure Cases and Statistics
- TechStartups.com. (2025, December 11). "The Vibe Coding Delusion."
- SimilarWeb. (2025, July). "Global AI Tracker." 76% decrease in AI coding tool usage.
- Turnbull, A. (2025, December). LinkedIn posts on Vibe Coding failures.
- FinalRoundAI. (2025). "What CTOs Really Think About Vibe Coding." 16 of 18 CTOs experienced production disasters.
Security Research
- Schreiber & Tippe. (2025). AI-generated code security analysis. 7,703 files, 4,200+ vulnerabilities.
- Apiiro. (2025). AI-assisted developer study. 3-4x code, 10x security issues.
- Snyk. (2025). "Is Vibe Coding Secure?"
If this article was helpful, watch for the next one in the series. How are you applying Context Engineering? Share in the comments.
