Does AI actually help developers write better code?
The short answer is: yes, measurably, with important caveats. GitHub's own research (2022–2024) found that developers using Copilot completed benchmark coding tasks up to 55% faster. McKinsey estimated 20–45% productivity gains across engineering tasks. These numbers are real.
But "faster" is not the same as "better". The research on code quality is more nuanced:
- Boilerplate and repetitive code: AI excels. Generating CRUD endpoints, test stubs, and configuration files, AI saves significant time with acceptable quality.
- Novel algorithmic problems: Mixed results. AI often produces working but suboptimal solutions; it rarely invents new algorithms.
- Security: Consistently worse than human developers writing without AI assistance, according to multiple academic studies. AI generates more security bugs, not fewer.
- Code review and explanation: Excellent. AI is often more useful explaining and reviewing code than generating it.
The productivity paradox: AI makes you faster at writing code, but security bugs introduced by AI take the same time to find and fix as human-written ones. If AI generates 2× as much code with 1.5× the bug rate, your net security debt can increase even as velocity goes up.
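The arithmetic behind that paradox can be made explicit. The numbers below are illustrative assumptions (a hypothetical 1,000 lines/week baseline and 5 bugs per KLOC), not measurements from any study:

```python
# Hypothetical illustration of the "productivity paradox" arithmetic:
# doubling output while bug rate rises 1.5x triples weekly security debt.

def security_debt(loc_per_week: float, bugs_per_kloc: float) -> float:
    """Expected security bugs introduced per week."""
    return loc_per_week * bugs_per_kloc / 1000

baseline = security_debt(loc_per_week=1000, bugs_per_kloc=5.0)   # 5.0 bugs/week
with_ai = security_debt(loc_per_week=2000, bugs_per_kloc=7.5)    # 2x code, 1.5x bug rate

print(with_ai / baseline)  # 3.0 - triple the weekly security debt
```

The point is not the specific numbers but the multiplication: any gain in volume multiplies with any increase in defect rate.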
How AI coding tools work (briefly)
All current AI coding tools are built on large language models fine-tuned on source code. The key architectural differences between products:
- Context window: How much of your codebase the model "sees" when generating a suggestion. Larger context = more relevant suggestions but higher latency and cost.
- Codebase indexing: Whether the tool indexes your entire repository or only the current file. Tools that index the full repo (Cursor, Copilot Workspace) produce significantly more contextually relevant output.
- Underlying model: Most tools let you choose the model (Claude Sonnet/Opus, GPT-4o, Gemini). Base model quality matters enormously.
- Agentic capability: Whether the tool can autonomously make multi-file edits, run tests, and iterate, or is purely a completion engine.
GitHub Copilot
Best for: Teams already on GitHub Enterprise who want the lowest-friction integration.
Copilot is the market leader by adoption. It integrates directly into VS Code, JetBrains IDEs, Neovim, and Visual Studio. The inline completion experience is polished after years of iteration.
- Strengths: Deep GitHub integration (PR summaries, issue linking, code review), large user base means extensive community resources, Copilot Workspace for multi-file agentic tasks.
- Weaknesses: Underlying model (GPT-4o) is not the strongest at complex reasoning; expensive at enterprise scale; limited model choice (you cannot swap in Claude).
- Security posture: Copilot has a "security vulnerability filter" that attempts to block insecure suggestions, but research shows it catches less than 30% of the security bugs it introduces. Do not rely on it.
Pricing: Individual $10/month; Business $19/user/month; Enterprise $39/user/month (includes Copilot Workspace).
Cursor
Best for: Power users who want the most capable agentic coding experience available.
Cursor is a VS Code fork that puts AI at the centre of the editing experience. It indexes your entire codebase, maintains a semantic understanding of your project structure, and can make coordinated multi-file edits in a single agent loop.
- Strengths: Best-in-class codebase awareness, model choice (Claude Opus/Sonnet, GPT-4o, Gemini), agent mode that autonomously writes and tests code, excellent context management.
- Weaknesses: Being a VS Code fork means you get VS Code, not a native IDE. Some teams have concerns about codebase data being sent to Cursor's servers for indexing.
- Security posture: No built-in security scanning; relies entirely on external SAST tools. The agentic mode can make sweeping changes across many files rapidly, amplifying the risk of AI-introduced security bugs going unnoticed.
Claude (Anthropic)
Best for: Complex reasoning tasks, architectural decisions, code review, and security analysis.
Claude (claude.ai / API) is not a dedicated coding IDE tool; it is a reasoning model that excels at code. Claude Opus 4 and Sonnet 4 consistently outperform GPT-4o on coding benchmarks (SWE-bench, HumanEval) and on security-related reasoning.
- Strengths: Best reasoning quality for complex problems, longest context window (200K tokens), excellent at security analysis and code review, powers Cursor and many other tools.
- Weaknesses: Not an IDE plugin by default; requires integration (via Claude Code CLI, Cursor, or API). Less "inline" than Copilot.
- Security posture: Produces fewer hardcoded credentials and injection flaws than GPT-4o in blind comparisons, but still requires SAST scanning. Notable for being more likely to include security caveats in generated code.
ChatGPT / GPT-4o
Best for: General-purpose coding help, quick prototyping, and teams already invested in the OpenAI ecosystem.
GPT-4o is a strong generalist model. Its coding quality is solid but trails Claude on complex reasoning tasks. The ChatGPT interface with canvas mode supports iterative code editing. The API powers many IDE integrations.
- Strengths: Widest ecosystem integration, familiar to most developers, strong on common patterns and frameworks, function calling support for tool-augmented workflows.
- Weaknesses: Weaker on novel/complex problems than Claude Opus; the training data cutoff means it may suggest deprecated libraries; higher rate of security antipatterns in generated code.
Gemini Code Assist
Best for: Google Cloud and GCP-centric teams who want deep cloud integration.
Gemini Code Assist (formerly Duet AI) is Google's offering, integrated into VS Code, IntelliJ, Cloud Shell, and the Google Cloud Console. Its distinguishing feature is deep GCP API knowledge and awareness of Google's internal coding standards.
- Weaknesses: Trails Copilot and Cursor on general code quality benchmarks; less community knowledge available; best value only if you are on GCP.
The security dimension: where all tools fall short
None of the current AI coding assistants are security-first tools. They optimise for code that works, not code that is secure. The specific security weaknesses vary by tool but the categories are consistent:
- Hardcoded credentials in generated examples
- SQL/command injection via string interpolation
- Disabled SSL verification in HTTP client code
- Outdated or deprecated crypto APIs
- Missing authentication decorators on generated routes
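The injection category is the easiest to see in code. A minimal sketch, using Python's built-in sqlite3 for a self-contained demo: the first function shows the string-interpolation pattern AI assistants frequently generate, the second the parameterised fix:

```python
import sqlite3

def find_user_unsafe(conn, username):
    # VULNERABLE: username is interpolated directly into the SQL string,
    # the pattern AI assistants often produce for "simple" queries.
    return conn.execute(f"SELECT id FROM users WHERE name = '{username}'").fetchone()

def find_user_safe(conn, username):
    # Safe: the driver binds the value; user input never becomes SQL syntax.
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

# A classic injection payload defeats the unsafe version but not the safe one.
payload = "' OR '1'='1"
print(find_user_unsafe(conn, payload))  # (1,) - leaks a row despite no matching name
print(find_user_safe(conn, payload))    # None
```

The two functions are behaviourally identical for benign input, which is exactly why the vulnerable version survives casual review; a SAST rule flags the interpolated query regardless of who, or what, wrote it.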
The mitigation is SAST, not better prompting: You can reduce AI-generated security bugs with security-aware prompting, but you cannot eliminate them. The reliable solution is automated SAST scanning on every file save (IDE) and every commit (CI). The AI generates code fast; the scanner catches the security issues before they merge.
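As one illustration of "scanning on every commit", here is a minimal CI workflow. Semgrep stands in for whichever scanner your team uses; this is a sketch, not AquilaX's actual integration, and the file path and job names are arbitrary:

```yaml
# .github/workflows/sast.yml - illustrative only; swap in your own scanner
name: SAST
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Semgrep
        run: pip install semgrep
      - name: Scan for security issues
        # --error makes the job fail (blocking the merge) when findings exist
        run: semgrep scan --config auto --error .
```

The essential property is the failing exit code: AI-generated code cannot merge until the scanner's findings are resolved, regardless of how fast it was written.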
Verdict: which tool to choose?
| Scenario | Recommended |
| --- | --- |
| GitHub-first team, ease of setup | GitHub Copilot |
| Maximum coding capability + agents | Cursor + Claude |
| Complex reasoning / architecture | Claude via API or Claude Code |
| Quick prototyping / chatting | ChatGPT |
| GCP-heavy cloud team | Gemini Code Assist |
| Security-aware generation | Any tool + AquilaX SAST in IDE |
The honest summary: All tools make you faster. None make you more secure by default. Use whichever fits your workflow, and add automated security scanning as the non-negotiable complement to whatever AI tool you choose.
Scan AI-generated code automatically
AquilaX integrates with VS Code, JetBrains, and all major CI platforms, catching the security bugs that every AI coding assistant introduces, before they reach production.
Get the IDE scanner →