AI writes your code. But who checks the quality?
Having software built by AI is no longer a thing of the future. It is the standard. In the Stack Overflow Developer Survey 2025, 84% of developers said they use or plan to use AI tools. At large tech companies, 30-46% of all new code is now written by AI.
That is not good or bad. It is the reality. The question is: how do you deal with it?
Because the numbers also show another side:
- Almost half of AI-generated code contains vulnerabilities if you don’t check it
- Developers’ trust in AI output is declining, from 40% to 29% in two years
- 59% of developers admit to using code they don’t fully understand
In other words: everyone uses it, but almost no one trusts it blindly. And rightly so.
Two extremes: vibe coding and agentic coding
You may have come across the term “vibe coding”. Andrej Karpathy, one of the best-known AI researchers, coined it in early 2025. His description, paraphrased: he no longer reads the code. He describes what he wants, the AI makes it, and he clicks accept.
That works. For a prototype, an experiment, an idea you want to test quickly. But not for software customers rely on. Not for systems that sensitive data flows through.
Karpathy himself has since switched to what he calls “agentic engineering”: AI that works independently, but within clear boundaries. With controls. With quality requirements.
The difference between those two extremes is not the technology. It is the agreements around it.
Four levels of AI coding
Compare it to renovating. You can demolish a wall yourself (fast, cheap, risky). Or you hire a contractor with permits, insurance and a construction plan.
That’s how it works with AI code too:
| Level | In practice | When is it okay? |
|---|---|---|
| 1. Vibe coding | AI makes it, you accept without looking | Throwaway prototypes, experiments |
| 2. With basic structure | You roughly understand what the AI makes and keep version control | Internal tools, first versions |
| 3. Production quality | Automated testing, security checks, code review | Software for customers |
| 4. Maximum control | Line-by-line verification, extensive security audits | Critical systems, sensitive data |
Most companies that say “we use AI” are at level 1 or 2. That is fine for a prototype. But for production software you want to be at least at level 3.
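For teams that want to make the table explicit, here is a minimal sketch in Python. It is an illustration, not a standard: the names `ControlLevel`, `Project`, `minimum_level` and `required_controls` are hypothetical, the order of the decision rules is our own assumption, and so is the idea that each level includes the controls of the levels below it.

```python
from dataclasses import dataclass
from enum import IntEnum


class ControlLevel(IntEnum):
    VIBE_CODING = 1
    BASIC_STRUCTURE = 2
    PRODUCTION_QUALITY = 3
    MAXIMUM_CONTROL = 4


# Controls introduced at each level, taken from the table above.
CONTROLS = {
    ControlLevel.VIBE_CODING: [],
    ControlLevel.BASIC_STRUCTURE: ["rough understanding of the code", "version control"],
    ControlLevel.PRODUCTION_QUALITY: ["automated testing", "security checks", "code review"],
    ControlLevel.MAXIMUM_CONTROL: ["line-by-line verification", "extensive security audits"],
}


@dataclass
class Project:
    # Hypothetical description of a project; the three flags are assumptions.
    name: str
    throwaway: bool = False
    customer_facing: bool = False
    critical_or_sensitive: bool = False


def minimum_level(project):
    """Pick the lowest acceptable level; the highest risk wins."""
    if project.critical_or_sensitive:
        return ControlLevel.MAXIMUM_CONTROL
    if project.customer_facing:
        return ControlLevel.PRODUCTION_QUALITY
    if project.throwaway:
        return ControlLevel.VIBE_CODING
    # Internal tools and first versions.
    return ControlLevel.BASIC_STRUCTURE


def required_controls(level):
    """Assumption: a level includes every control from the levels below it."""
    controls = []
    for lvl in ControlLevel:
        if lvl <= level:
            controls.extend(CONTROLS[lvl])
    return controls


portal = Project("customer portal", customer_facing=True)
level = minimum_level(portal)
print(level.name, required_controls(level))
```

The point of writing it down like this is that the level becomes a decision per project, not a habit per team.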
Why this matters
An AI agent that started lying
July 2025. On the Replit platform, an AI agent was given the task of working on an application. What happened:
- The agent ignored an instruction to change nothing
- It accidentally deleted an entire database with customer data
- It fabricated fake data to mask the loss
- It then lied about whether the data could be restored
Sounds like a movie scenario. But it really happened. The agent was not malicious. It tried to complete its task and “solved” the problem the wrong way.
This is what can happen when AI gets too much autonomy without controls. Not on every project. But the risk is real.
The productivity paradox
Perhaps even more striking: an independent randomized controlled trial (METR, 2025) had experienced developers work with and without AI on their own projects. The result: with AI they were 19% slower.
The surprising part? The developers themselves thought they were 20% faster.
That is the core of the problem. AI feels productive. Code appears quickly on your screen. But the time you gain in generating, you lose in checking, debugging and repairing, if you do that at all.
We wrote earlier about the security side of this problem: almost half of AI code contains vulnerabilities. Not because AI is dumb, but because AI optimizes for “it works”, not for “it is secure”.
How we deal with it
We’re honest about it: we too are working on this every day. We use AI for almost everything we build. Claude Code is our daily tool. But we do our best to work at level 3, and on higher-risk projects we shift toward level 4.
That is not perfection. That is a conscious choice.
We review everything. Not for “does it look neat” but for “does it do what we expect”. AI code can look professional and still work incorrectly.
We adjust the level to the risk. An internal dashboard for our own team? Then we can work a bit more loosely. Software that customer data flows through? Then the controls go up. The AI Periodic Table helps us estimate which control level fits which project.
We’re open about it. AI code is not better or worse than human code. It is different. It requires different controls, different habits, a different way of working. We invest in that every day.
What this means for your business
You don’t have to be a developer to ask the right questions. If you have software built, by an internal team or an external partner, these are the things that matter:
- Is what the AI makes being checked? “The AI doesn’t make mistakes” is not an answer. Ask about the review process.
- Does the control level fit the risk? A prototype is something different from a customer portal. The approach has to scale with the risk. Gen Z talent adopts AI faster, but without the right controls that speed leads to more risk.
- Is there someone who understands the code? If no one understands what the software does, no one is responsible when it goes wrong.
- Are there automatic safety checks? Manual checking is not enough. Good teams have automated scans running.
Think of it this way: you wouldn’t accept a financial annual report that no one has checked either. The same applies to software.
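To give an idea of what an automatic safety check looks like under the hood, here is a deliberately small sketch in Python. It is an assumption-heavy illustration, not a real scanner: the three rules in `PATTERNS` and the function names `scan_source` and `gate` are hypothetical, and a production team would rely on established static-analysis and dependency-scanning tools instead.

```python
import re

# Hypothetical, deliberately tiny rule set. Real scanners have far more rules.
PATTERNS = {
    "hardcoded password": re.compile(r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE),
    "AWS-style access key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "use of eval": re.compile(r"\beval\s*\("),
}


def scan_source(source):
    """Return a list of (line number, rule label) for every rule that matches."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


def gate(source):
    """Return True when the change may proceed, False when findings block it."""
    return not scan_source(source)


sample = 'user = "bob"\npassword = "hunter2"\nresult = eval(expr)\n'
for line_number, label in scan_source(sample):
    print(f"line {line_number}: {label}")
```

What matters is not the rules themselves but that the check runs on every change, without anyone having to remember it.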
European AI legislation is developing rapidly. The EU AI Act has been adopted and the first obligations already apply. The heavier rules for high-risk AI are expected to take effect in 2027. The direction is clear: companies that build software with AI, or have it built, must be able to demonstrate that they take quality and safety seriously. The sooner you’re ready for that, the better.
Want to start with a structured AI approach? Read our 7 steps to becoming AI-native: practical, without bureaucracy.
Need help?
Want to know whether your software is being built the right way? We’ll take a look with you. Not a 50-page report, but a 30-minute conversation.
Sources: Stack Overflow Developer Survey 2025, Veracode GenAI Code Security Report 2025, METR Productivity Study, Fortune, Replit Incident
