The Genuine Appeal of Vibe Coding
It is worth being honest about why vibe coding took off. The productivity gains for certain tasks are real and significant. A developer who would spend a day scaffolding boilerplate, writing CRUD endpoints, and setting up a project structure can now do all of that in under an hour.
For solo developers, founders, and prototyping, vibe coding has compressed the time-to-working-demo from weeks to days. That matters. Products have been shipped, startups have launched, and ideas have been tested that would not have been viable before AI-assisted development.
The thesis: Vibe coding is a powerful tool, not a universal replacement. Engineers who understand this use it extremely effectively. Engineers who treat it as a replacement for engineering judgment run into serious problems.
Context Window Limits at Scale
The most immediate practical limit of vibe coding is context. AI models have a finite context window: the amount of code they can "see" and reason about simultaneously. For small projects, this is not a problem. For large ones, it becomes a fundamental constraint.
What happens at scale
A typical production codebase might have 50,000β500,000+ lines of code across hundreds of files. No AI model can reason about all of it simultaneously. When you ask an AI to add a feature or fix a bug in a large codebase, it is working from a partial view of the system.
The result is code that appears to work in isolation but violates implicit contracts established elsewhere in the codebase: data model assumptions, caching invariants, permission models, concurrency constraints. These bugs are among the hardest to diagnose because the code is locally correct but globally wrong.
The invisible integration problem: AI generates code that passes its own tests and looks correct in the context provided, but silently breaks behaviour defined in files it never saw.
Example: the auth context
Your codebase has an established pattern: every database write is audited with the requesting user's ID, pulled from a global auth context object. The AI, given only the model file, generates a write operation that uses the object's creator field instead. The code runs and the write succeeds; it just silently bypasses the audit trail that your compliance team depends on.
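A minimal sketch of that failure mode. All names here (current_user, Document, audit_log) are illustrative, not from any real codebase:

```python
# Hypothetical sketch of the audit-trail bypass described above.
audit_log = []

def current_user():
    """Global auth context: the user making the current request."""
    return {"id": "user-123"}

class Document:
    def __init__(self, creator_id, body):
        self.creator_id = creator_id
        self.body = body

def save_established(doc):
    # Established pattern: audit with the *requesting* user's ID.
    audit_log.append({"actor": current_user()["id"], "action": "write"})

def save_generated(doc):
    # AI-generated variant, produced from the model file alone: audits
    # the creator instead, silently bypassing the requesting-user trail.
    audit_log.append({"actor": doc.creator_id, "action": "write"})

doc = Document(creator_id="user-999", body="q3 report")
save_established(doc)
save_generated(doc)
# Both writes "work", but only the first records who actually acted.
```

Both functions are locally correct; only a reviewer who knows the audit convention can tell that the second one is globally wrong.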
The Correctness Problem
AI-generated code passes the "does it run" test reliably. It passes the "does it produce a roughly correct output" test most of the time. It often fails the "is it correct in all edge cases" test, which is where most production bugs live.
AI does not reason about edge cases it has not seen
Edge cases are, by definition, uncommon. That means they appear rarely in training data, and AI has learned that the common path is what matters. It handles the happy path well and the obvious error cases decently. The subtle ones (time zone edge cases, concurrent write races, numeric overflow on boundary values, locale-specific parsing) get missed.
- Date arithmetic that breaks at DST transitions
- Pagination logic that skips the last item when count is exactly divisible by page size
- Race conditions in lock acquisition under high concurrency
- Integer overflow in payment amount calculations
- Encoding issues with non-ASCII input in file paths
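The pagination item is a representative one-line boundary bug. A minimal sketch, with hypothetical helper names:

```python
def last_page_buggy(items, page_size):
    # Common generated pattern: size the final page with count % page_size.
    # When the count divides evenly, the remainder is 0 and the "last
    # page" comes back empty -- the final items are skipped.
    remainder = len(items) % page_size
    return items[len(items) - remainder:]

def last_page_correct(items, page_size):
    # `or page_size` handles the exact-multiple boundary: a zero
    # remainder means the last page is a full page, not an empty one.
    remainder = len(items) % page_size or page_size
    return items[len(items) - remainder:]

print(last_page_buggy(list(range(10)), 5))    # [] -- items 5..9 lost
print(last_page_correct(list(range(10)), 5))  # [5, 6, 7, 8, 9]
```

Note that the buggy version behaves correctly whenever the count is not an exact multiple of the page size, which is exactly why a casual test pass does not catch it.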
These are not hypothetical. They are the categories of bugs that appear in post-mortems for systems built primarily with AI assistance.
Security Blind Spots
Security is the area where the limits of vibe coding are most consequential. Security vulnerabilities do not announce themselves: code can be functional and completely insecure at the same time, and automated tests typically do not catch security issues.
What AI consistently misses
- Business logic authorisation: AI understands "only authenticated users can do X" but misses "only the owner of resource Y can do X", especially when ownership is indirect (through multiple joins or denormalized relationships)
- Rate limiting: AI rarely adds rate limiting spontaneously; it has to be explicitly requested, and even then the implementation is often bypassed by changing user agents or IPs
- Mass assignment: AI frequently accepts all fields in update endpoints, allowing users to overwrite fields they should not (admin status, account tier, credits)
- SSRF in integrations: When generating webhook handlers or URL-fetching features, AI rarely validates that the target URL is not an internal network address
- Timing attacks on comparison: AI uses standard string equality for token and password comparison rather than constant-time comparison functions
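The last item has the smallest fix of the five. A minimal sketch using Python's standard library, assuming string tokens:

```python
import hmac

def token_matches_naive(supplied: str, stored: str) -> bool:
    # == short-circuits at the first differing byte, so response time
    # leaks how long a matching prefix the attacker has guessed.
    return supplied == stored

def token_matches_safe(supplied: str, stored: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # differ, closing the timing side channel.
    return hmac.compare_digest(supplied.encode(), stored.encode())
```

The two functions return identical booleans, which is why a functional test suite will never flag the naive version.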
The test coverage illusion: AI will happily generate tests that pass against its own generated code. This creates confidence in code that has never been tested for the scenarios that matter for security. 100% test coverage on AI-generated code is not a security signal.
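The mass assignment item above also has a small, mechanical fix: an explicit allow-list. A minimal sketch, with illustrative field names:

```python
# Allow-list guard against mass assignment; field names are illustrative.
ALLOWED_UPDATE_FIELDS = {"display_name", "bio", "avatar_url"}

def apply_update(user: dict, payload: dict) -> dict:
    # Copy only explicitly allowed fields; privileged fields in the
    # payload (is_admin, account_tier, credits) are silently dropped.
    updated = dict(user)
    for field in ALLOWED_UPDATE_FIELDS & payload.keys():
        updated[field] = payload[field]
    return updated

user = {"display_name": "Ana", "is_admin": False}
result = apply_update(user, {"display_name": "Root", "is_admin": True})
# result["is_admin"] is still False; only display_name changed
```

The key design choice is allow-listing rather than block-listing: new privileged fields added to the model later are safe by default instead of exposed by default.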
Maintainability Debt
Code is read far more often than it is written. AI-generated code optimises for generation speed, not for the engineers who will work with it for the next five years. This creates maintainability debt that compounds over time.
Inconsistency at scale
When multiple developers use AI to generate different parts of the same codebase over time, each generation session produces locally consistent but globally inconsistent code. Error handling patterns vary. Logging approaches differ. Database access patterns are inconsistent. Naming conventions diverge. The result is a codebase that requires increasingly high cognitive load to work with.
Unexplainable code
AI sometimes generates solutions that work but are not obviously correct, using non-obvious algorithms, unusual library features, or complex expressions that achieve the goal through a path no human would have taken. When that code breaks six months later, the engineer trying to debug it has no mental model to work from.
The bus factor: If the only "person" who understands why a piece of code was written a certain way is the AI session that generated it, your bus factor for that code is zero. The session is gone, the context is gone, and you have code nobody can explain.
Domain Knowledge Gap
AI has broad but shallow knowledge. It knows a lot about software patterns, frameworks, and common algorithms. It knows far less about your specific domain: your business rules, your users' expectations, your regulatory constraints, your operational context.
Where this bites
- Financial calculations: AI uses floating point arithmetic for money, a classic mistake that causes rounding errors in currency calculations
- Healthcare data: AI does not spontaneously apply PHI handling rules, consent management, or minimum necessary access principles
- Legal/compliance logic: GDPR data retention, CCPA deletion rights, SOX audit trail requirements are not in AI's default patterns
- Performance constraints: AI does not know your database has 500M rows in that table, or that this API is called 10,000 times per second
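The floating-point money mistake is easy to demonstrate with Python's built-in decimal module (the price and tax rate here are arbitrary examples):

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floats cannot represent most decimal amounts exactly.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False

# Decimal keeps exact base-10 values and makes the rounding policy
# explicit instead of implicit.
price = Decimal("19.99")
tax_rate = Decimal("0.0825")
tax = (price * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(tax)               # 1.65
```

Constructing Decimal values from strings rather than floats matters: `Decimal(0.1)` would inherit the binary representation error the type exists to avoid.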
Regulated Industries
For teams in healthcare, finance, defence, and critical infrastructure, vibe coding introduces compliance risk that goes beyond just bugs and vulnerabilities.
The evidence and audit trail problem
Regulated software development requires documented design decisions, traceability from requirements to implementation, and evidence of review. AI-generated code lacks provenance: there is no design document, no rationale, no review trail. Auditors asking "why was this implemented this way?" get no answer.
Several compliance frameworks are beginning to address AI-generated code explicitly. SOC 2 Type II, ISO 27001, and FedRAMP auditors are increasingly asking about AI tooling in the development process and whether controls account for AI-generated code risk.
Supply chain and IP risk: AI training data includes code under various licences. There is ongoing legal uncertainty about whether AI-generated code can infringe on copyrighted training data. Some enterprises have restricted AI coding tools pending legal clarity.
Where Vibe Coding Genuinely Wins
Understanding the limits is not an argument against using AI in development. It is an argument for using it deliberately. The tasks where vibe coding provides the most value with the least risk:
- Boilerplate and scaffolding: Project setup, configuration files, directory structures, standard CRUD operations in well-defined patterns
- Test generation: Writing unit tests for existing, reviewed code; AI is very good at this and the output is verifiable
- Documentation: Generating docstrings, README content, API documentation from existing code
- Prototyping and exploration: Throwaway code for evaluating an approach; code that will be rewritten before production
- Data transformation scripts: One-off or low-risk scripts for processing and migrating data
- Known-good algorithm implementation: Sorting, parsing, formatting, and other cases with clear specifications and no security implications
The pattern is: use vibe coding where the output is easily verifiable, the security implications are low, or the code is throwaway. Apply engineering judgment where correctness is subtle, security is critical, or the code will be maintained for years.
The senior engineer heuristic: If a senior engineer would spend five minutes reviewing and approving a piece of AI-generated code, that is a good vibe coding target. If they would spend two hours, that is a case for writing it yourself, with AI assistance for the implementation details.
Catch What AI Misses
AquilaX scans AI-generated code for the security patterns AI consistently misses (BOLA, mass assignment, secrets, insecure auth) before they reach production.
Start Free Scan