4 types of AI agents: which one fits your business?

Everyone is talking about AI agents. No one explains which ones.

“We use AI agents.” You hear it everywhere. At conferences, in podcasts, in LinkedIn posts from consultants who were still writing about blockchain last month. But ask a follow-up question and it gets vague. What kind of agent? What exactly does it do? Where is a human in the process?

The problem: “agent” has become a catch-all term. Just like “cloud” ten years ago. Everything is an agent. A chatbot? Agent. A script that writes code? Agent. A fully autonomous system that ships software? Also an agent.

The numbers are impressive. Gartner predicts that 40% of all enterprise applications will contain AI agents by the end of 2026, up from less than 5% in 2025. In the Netherlands, AI use by companies has nearly doubled in three years: from 34% to 67%. But Gartner also warns: more than 40% of all agentic AI projects will be cancelled before the end of 2027. Often because companies choose the wrong type.

For you as a director, that is useless without context. You want to know what you can deploy tomorrow, what it costs, and what the risks are. For that, you need to know the difference. Because one agent replaces an intern, the other an entire development team. Those are fundamentally different decisions.

Let’s break it down.

Four types of AI agents that matter in practice

We currently distinguish four types. Each type has a different level of autonomy, a different role for humans, and different risks.

Type	What it does	Human role	Autonomy
Harness	Works alongside you as a digital assistant	You steer, review, correct	Low to medium
Dark factory	Gets a spec, delivers a result	You check at the end	High
Auto research	Optimizes a metric through experiments	You ask the question, judge the answer	Medium to high
Orchestration	Coordinates multiple agents in a chain	You design the system, monitor	Very high

This is not a definitive classification. The field moves fast and the boundaries are fluid. A harness you fine-tune for months almost becomes a dark factory. Orchestration can contain auto research. But for the choices you have to make as an organization right now, these are the four that matter.

Most companies start with the harness. That makes sense. But the real productivity gain lies in the step toward dark factory. And so do the real risks. Let’s go through them one by one.

The harness: your digital assistant that gets better

This is the broadest type and the place where most organizations are now. A harness agent works alongside you. You give instructions, the agent executes, you review the result. Think of a very fast colleague who never gets tired but does need direction.

Well-known examples: for coding these are tools like Claude Code, Cursor, OpenAI Codex and GitHub Copilot. But a harness doesn’t have to be about code. Microsoft Copilot in your Office environment is also a harness. Anthropic’s Cowork just as much. Or, as we do, a Claude Code agent that you equip with procedures and reference material for your entire infrastructure management. This is essentially the step from vibe coding to agentic coding: not typing yourself, but steering an agent.

How far this already goes in practice

Andrej Karpathy, co-founder of OpenAI, has been running ten to twenty agents in parallel since December 2025. Sixteen hours a day. He hasn’t written a single line of code by hand since then. His description: “a state of psychosis.” Not because it works poorly, but because the pace is so high that your brain has to get used to a completely different way of working.

Peter Steinberger went viral with OpenClaw: four to ten agents at a time, each with a twenty-minute task. He was no longer involved as a programmer. He was involved as a project manager dividing tasks and checking results.

Cursor, the company behind the most popular AI code editor, has formalized this with Background Agents. A planner/executor model: you describe what you want, the agent plans the approach and executes. They have reportedly built browsers and compilers with it. The company is said to generate more than 500 million dollars in annual revenue. More than half of the Fortune 500 uses it.

Individual vs. project-level

There is an important difference. An individual harness helps you personally work faster. But with a team of eight to twenty people, you need a project-level harness. One that knows the codebase, the conventions, the architecture. One that ensures ten people with agents don’t build in ten different directions.

That difference is often underestimated.

Individual productivity times ten is not the same as team productivity times ten. Without project-level steering, you get chaos that moves faster.

This also says something about the complexity of agents in an organization. It is not enough to give everyone an agent. You have to think about how those agents work together, what frameworks they get, and who keeps control. That is an organizational issue, not a technical problem.

The dark factory: spec in, result out

This is where it gets serious. A dark factory agent gets a specification, iterates independently, and delivers a result. No human in the loop during the work. Only a human check at the end.

The name comes from manufacturing: factories so automated that the lights can be turned off. No human on the floor.

Stripe runs this at scale. A thousand pull requests per week by AI agents. They built a fork of Goose, an open-source agent framework. The model is fire-and-forget: specification in, code out, review at the end.

Even more impressive: StrongDM. Three people. 32,000 lines of production code. Without writing a single line by hand. The entire product was built by agents that received specs and delivered code. The people wrote specs, reviewed output, and adjusted.

But it can also go wrong. Amazon had a major six-hour outage on March 6, 2025. Checkout, login, product prices: nothing worked. In December 2025 the Kiro incident followed: thirteen hours of downtime in the China region. Both incidents were related to AI-generated code changes. The exact causes are not fully public, but the pattern is clear: speed without sufficient review leads to problems. Amazon subsequently introduced emergency measures: junior engineers now need approval from a senior for AI-assisted code changes.

This is the hybrid model that works in practice: dark factory in the middle, human check at the end. The agent does the heavy lifting. A human validates before it goes to production. None of the successful cases above runs entirely without human review.
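The hybrid model above can be sketched as a loop with exactly one human step. This is a minimal illustration, not how Stripe or StrongDM implement it: `run_dark_factory`, `generate`, `run_checks` and `human_approves` are hypothetical names, and the toy functions at the bottom stand in for a real agent and a real test suite.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    output: str
    attempts: int
    approved: bool


def run_dark_factory(
    spec: str,
    generate: Callable[[str, Optional[str]], str],
    run_checks: Callable[[str], Optional[str]],
    human_approves: Callable[[str], bool],
    max_attempts: int = 5,
) -> Result:
    """Iterate without a human until automated checks pass, then gate on review."""
    feedback: Optional[str] = None
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = generate(spec, feedback)   # agent produces a candidate
        feedback = run_checks(output)       # None means every check passed
        if feedback is None:
            # The only human step: review before anything reaches production.
            return Result(output, attempt, human_approves(output))
    # Checks never passed: nothing ships, a human has to revisit the spec.
    return Result(output, max_attempts, approved=False)


# Toy stand-ins (assumptions for illustration only).
def toy_generate(spec: str, feedback: Optional[str]) -> str:
    return spec.upper() if feedback else spec


def toy_checks(output: str) -> Optional[str]:
    return None if output.isupper() else "output must be upper case"


result = run_dark_factory("add login page", toy_generate, toy_checks, lambda out: True)
print(result)
```

The design choice that matters is where `human_approves` sits: after the automated checks, before production. Remove that call and you have the fully dark version that none of the cases above actually runs.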

Boston Consulting Group estimates the productivity gain at three to five times for teams working at the dark factory level. Those are their own figures, not independently verified. But even if you calculate conservatively with a factor of two, you are talking about a fundamental shift: a team of three with agents then delivers what would otherwise take six developers.

The flip side: if your review step is not sound, you also multiply errors by that same factor. Anyone who lets code be generated without quality control runs into the same risks as with vibe coding.
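A quick calculation makes the flip side tangible. All numbers below are hypothetical, and the sketch assumes that agent-written changes have the same defect rate as human-written ones and that review catches a fixed share of defects.

```python
def escaped_defects(changes_per_week: float, defect_rate: float, review_catch_rate: float) -> float:
    """Defects per week that slip past review and reach production."""
    return changes_per_week * defect_rate * (1.0 - review_catch_rate)


# Hypothetical baseline: 20 changes a week, 10% contain a defect, review catches 80%.
BASELINE_CHANGES = 20
DEFECT_RATE = 0.10
CATCH_RATE = 0.80

for factor in (1, 2, 3, 5):
    changes = BASELINE_CHANGES * factor
    escaped = escaped_defects(changes, DEFECT_RATE, CATCH_RATE)
    print(f"x{factor}: {changes} changes/week, {escaped:.1f} defects reach production")
```

If review quality stays the same, escaped defects scale one-to-one with output. If review gets sloppier because there is more to review, they scale faster.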

Auto research: AI agents that optimize a metric

Not all problems are software problems. Sometimes you just want to improve a number. Conversion rate up. Load time down. Cost per lead reduced. That is a different kind of work than writing code. And there are different agents for it.

The idea comes from classic machine learning: have a system try hundreds of variants, measure the result, and pick the winner. Only now the agent itself can devise, execute and evaluate the variants.

Shopify’s Liquid engine. Tobi Lutke let an AI agent loose on Shopify’s Liquid template engine. The result after roughly 120 experiments and 93 commits: 53% faster, 61% fewer memory allocations. That affects 5.6 million webshops. Lutke’s own caveat: “probably somewhat overfit.” Fair. But the direction is clear.

Karpathy’s autoresearch. Andrej Karpathy released his autoresearch framework on March 7, 2026. Within two days: 21,000 GitHub stars and 700 experiments. Twenty of them produced measurable improvements. His vision: “You are not emulating a PhD student. You are emulating an entire research community.” Every run is an experiment. The agent reads previous results, devises a hypothesis, tests it, and logs the outcome.

The applications are broader than you think. Companies like Mutiny and Intellimize use the same approach for conversion optimization. A/B testing on steroids. Code performance tuning. Query optimization for databases.
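The loop behind these examples (read previous results, propose a variant, measure, log) fits in a few lines. This is a sketch under assumptions, not Karpathy's framework or Shopify's setup: `auto_research`, `propose_variant` and `load_time_ms` are hypothetical, and a random tweak stands in for the agent's hypothesis.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def auto_research(
    baseline: Config,
    propose: Callable[[Config, List[dict]], Config],
    measure: Callable[[Config], float],
    runs: int,
) -> Tuple[Config, float, List[dict]]:
    """Keep a variant only if it beats the best score so far (lower is better)."""
    best, best_score = baseline, measure(baseline)
    log: List[dict] = []
    for run in range(runs):
        candidate = propose(best, log)   # hypothesis, informed by earlier runs
        score = measure(candidate)       # the experiment itself
        kept = score < best_score
        log.append({"run": run, "candidate": candidate, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
    return best, best_score, log


# Toy metric (assumption): load time is lowest at a 64 MB cache.
def load_time_ms(config: Config) -> float:
    return (config["cache_mb"] - 64.0) ** 2 / 10.0 + 120.0


rng = random.Random(0)  # seeded so the run is repeatable


def propose_variant(best: Config, log: List[dict]) -> Config:
    # A real agent would read the log here; this stand-in just nudges the value.
    return {"cache_mb": max(1.0, best["cache_mb"] + rng.uniform(-16.0, 16.0))}


start = {"cache_mb": 16.0}
best, best_score, log = auto_research(start, propose_variant, load_time_ms, runs=50)
print(f"kept {sum(e['kept'] for e in log)} of {len(log)} experiments, best {best_score:.1f} ms")
```

Most runs get discarded, which matches the pattern in the examples above. The sketch optimizes against a single benchmark, so it shares the overfitting risk Lutke mentions: in practice you would validate the winner on data the loop never saw.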

The crucial distinction: is your problem software-shaped or metric-shaped? Do you need to build a new system? Or do you have an existing system with a number that could be better? In the second case, auto research is your friend.

Orchestration: multiple AI agents in a chain

So far we have been talking about single agents. But some tasks are too complex for one agent. Then you have multiple specialized agents work together. Agent A retrieves data, passes it to Agent B which analyzes it, and Agent C writes the report.
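Stripped of frameworks, the A, B, C chain is a list of steps that pass shared state along. A minimal sketch, assuming plain functions in place of model-backed agents; `retrieve`, `analyze`, `write_report` and `run_chain` are hypothetical names, and the order figures are made up.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Agent = Callable[[State], State]


def retrieve(state: State) -> State:
    # Agent A: fetch data (hard-coded here instead of a real query).
    return {**state, "rows": [12, 15, 9]}


def analyze(state: State) -> State:
    # Agent B: turn raw rows into a finding.
    rows = state["rows"]
    return {**state, "average": sum(rows) / len(rows)}


def write_report(state: State) -> State:
    # Agent C: write the result up for a human reader.
    text = f"Average of {len(state['rows'])} orders: {state['average']:.1f}"
    return {**state, "report": text}


def run_chain(agents: List[Agent], state: State) -> Tuple[State, List[str]]:
    """Run agents in order and record which steps ran, for monitoring."""
    trace: List[str] = []
    for agent in agents:
        state = agent(state)
        trace.append(agent.__name__)
    return state, trace


final_state, trace = run_chain([retrieve, analyze, write_report], {})
print(final_state["report"])
print(trace)
```

The trace is the part to take seriously. Once chains get longer, knowing which step produced which piece of state is what lets a human monitor the system at all.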

LangGraph is the heavyweight here. More than 100,000 GitHub stars, tens of millions of downloads per month, around 400 companies in production. Cisco, Uber, LinkedIn, BlackRock. Interesting detail: the average number of steps per agent trace rose from 2.8 in 2023 to 7.7 in 2024. Agents are getting more complex. Chains are getting longer.

CrewAI builds multi-agent systems with roles. Think of a “researcher,” a “writer” and a “reviewer” working together. A fintech company set up three agents for their customer communication in four hours.

n8n is the most accessible option. More than 400 integrations, a visual editor, and AI Agent nodes you can connect. You build multi-agent workflows via the Workflow Tool node. No code needed. For teams that want to start quickly without hiring a developer, this is the most capable self-hostable option (n8n is source-available under a fair-code license rather than open source in the strict sense).

Trigger.dev takes a different approach. Code-first, TypeScript-based. An execution layer for developers who want to write their agents in code but don’t want to manage their own infrastructure.

The honest question: when is orchestration worth it? Automation that saves time on every run pays off eventually, but the initial investment determines when. For a simple process with few steps, a single harness is often enough. Orchestration becomes interesting once you have multiple steps that keep recurring: receiving a customer request, classifying, routing, answering. The more often that process runs, the faster you earn back the setup time.
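The payback question is simple arithmetic. The sketch below uses hypothetical numbers (40 hours of setup, 10 minutes saved per run, 20 runs per working day); plug in your own.

```python
import math


def break_even_runs(setup_hours: float, minutes_saved_per_run: float) -> int:
    """Number of runs after which the setup time has been earned back."""
    if minutes_saved_per_run <= 0:
        raise ValueError("no time saved per run means the setup never pays off")
    return math.ceil(setup_hours * 60 / minutes_saved_per_run)


# Hypothetical inputs, not figures from any real project.
SETUP_HOURS = 40
MINUTES_SAVED_PER_RUN = 10
RUNS_PER_DAY = 20

runs_needed = break_even_runs(SETUP_HOURS, MINUTES_SAVED_PER_RUN)
days_needed = math.ceil(runs_needed / RUNS_PER_DAY)
print(f"break-even after {runs_needed} runs, about {days_needed} working days")
```

The same setup for a process that runs twice a day takes ten times as long to pay back. That is why frequency, not complexity, is the first thing to check.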

What we learned ourselves about AI agents (and what no one tells you)

This is where it gets concrete. No theory, but what we encounter daily at JumpScale.

Building a harness takes serious time. We built a harness and skillset to manage our entire infrastructure. Multiple servers, configuration, monitoring, the whole picture. We spent a good month on it. Half an hour of fine-tuning every day, alongside the work the agent performs. Part of our journey to becoming AI-native.

The result: an agent that can be used team-wide. Full expertise, procedures and reference material available in the harness. Managing the infrastructure of our multiple servers has become a breeze. But investing that first month? No one tells you that.

Harnesses are broader than development. We also use them for daily digital work. Email management, administration, research. Not just for code. Most content about AI agents is about building software. In practice, the biggest gain often lies in the boring work that comes back every day.

Auto research: we are still experimenting. We are testing auto research on processes like A/B testing and query optimization. The results are mixed. Sometimes surprisingly good, sometimes a dead end. We are honest about it: this is not finished yet.

Orchestration: multiple tools side by side. For orchestration we use n8n and Trigger.dev. For the heavier work, such as complex multi-agent chains with state management, we focus on LangChain and LangGraph. n8n works well for workflows we want to set up quickly, Trigger.dev for tasks where we need more control, and LangGraph for architectures where multiple agents really have to collaborate autonomously.

The core lesson no one tells you: it takes serious time and effort to configure an agent well. It is not plug-and-play. Through daily use you encounter edge cases, including security risks of AI tools that you don’t see in advance. Those edge cases are input for further development and optimization. It gets a little better every week. But “quickly setting up an agent” doesn’t exist.

What also helps: start small. Our first harness did three things. Now it does thirty. That growth came organically, from daily use. Not from a big plan up front.

Decision aid: which AI agent fits your business?

Five questions. That’s all you need.

1. Do you need a smart co-worker that thinks along with you? Start with a harness. Give an AI agent context about your business, your processes, your preferences. This is where 90% of organizations should begin.
2. Do you want to improve a specific number? Auto research. Conversion rate, load time, cost per lead. Works best if you already have data and a clear metric.
3. Do you want to build software autonomously? Dark factory. Only relevant if you have a development team and want to automate repetitive build tasks. Not where you start.
4. Do you have multiple steps that follow each other? Orchestration. Receiving a customer request, classifying, routing, answering. Becomes interesting once that process runs often enough.
5. Are you in doubt? Start with a harness. Seriously. It is the lowest threshold, the fastest results, and you learn how AI agents work before investing in more complex setups.
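The five questions can be written down as a small function, which is handy if you want to run the same check across several departments. This is one possible encoding, not a formal model: the `Needs` fields and `recommend` are hypothetical names, and the thresholds ("often enough", "clear metric") are still your judgment call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Needs:
    wants_coworker: bool = False
    has_clear_metric: bool = False
    wants_autonomous_builds: bool = False
    has_dev_team: bool = False
    has_recurring_multistep_process: bool = False


def recommend(needs: Needs) -> List[str]:
    """Map the answers to agent types, in the order the questions are asked."""
    picks: List[str] = []
    if needs.wants_coworker:
        picks.append("harness")
    if needs.has_clear_metric:
        picks.append("auto research")
    # Dark factory only makes sense with a development team in place.
    if needs.wants_autonomous_builds and needs.has_dev_team:
        picks.append("dark factory")
    if needs.has_recurring_multistep_process:
        picks.append("orchestration")
    # In doubt, or nothing applies: start with a harness.
    return picks or ["harness"]


print(recommend(Needs()))
print(recommend(Needs(has_clear_metric=True, has_recurring_multistep_process=True)))
```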

Budget: count on 3,000 to 8,000 euros in the first year for a simple agent setup. That is licenses, configuration time and fine-tuning. The biggest cost item is not the software. It is the time to train the agent on your specific situation.

The order for organizations: harness first. Then orchestration via n8n, Trigger.dev or a comparable tool for your repetitive workflows. Dark factory and auto research are for later, once you already have experience with agents and know what you want to automate.

Need help?

Want to know which type of AI agent fits your business? We are happy to think along. In 30 minutes we discuss which type of agent fits your situation. No sales pitch, just an honest conversation.


Sources: Fortune: Karpathy on “state of psychosis” and agents (2026), Fortune: Steinberger/OpenClaw profile (2026), Simon Willison: StrongDM dark factory (2026), Simon Willison: Shopify Liquid 53% faster (2026), VentureBeat: Karpathy autoresearch (2026), Medium: Stripe 1,000 PRs/week (2026), Tom’s Hardware: Amazon AI-code incidents (2025), BCG Platinion: The Dark Software Factory (2026), LangChain: State of AI Agents (2024), n8n: AI Agent documentation, Gartner: AI agents in enterprise apps (2025), CBS: AI use by companies (2025)