Agent-to-Agent Code Review: How to Wire Claude Code and Codex Together

If you're running more than one AI coding agent, you've already hit the wall. Claude Code writes the feature. Codex is better at adversarial review. They're a natural pair. But getting them to actually work together requires you to manually carry output from one to the other: copy the diff, paste it in, wait, copy the feedback back. You're not orchestrating agents. You're a human clipboard.

This post shows how to replace that manual handoff with agent-to-agent task delegation using Delega's MCP server, so your coding agents coordinate directly and you review the output instead of managing the process.

Why the handoff breaks down

The standard multi-agent setup in 2026 looks something like this:

1. You prompt Claude Code to implement a feature
2. You wait for it to finish
3. You manually pass the result to a second agent for review
4. You collect the review and bring it back

Step 3 is the bottleneck. It's not that agents can't do the work. The problem is that there's no coordination layer between them. Agents don't have a shared task queue. They can't create work for each other. They can't signal completion or pass structured results. So you end up doing it yourself, which defeats the point of having multiple agents.

The underlying issue is that AI agents were built to be used, not to use each other. They have tools for interacting with the world (file systems, APIs, browsers) but they don't have standard primitives for delegating work to other agents.

That's the gap Delega fills.

The architecture: task-based agent coordination

Delega is a task infrastructure layer for AI agents. Each agent gets its own API key and its own identity in the system. When an agent creates a task labeled @codex, it's creating a work item that only Codex's key can claim. When Codex completes it, the result is written back into the task's context blob, a structured JSON field that the creating agent can read.

No webhooks. No message passing. No shared memory. Just tasks in a queue.

Here's what the MCP multi-agent workflow looks like end to end:

Claude Code → creates task(@codex) → Codex picks it up → marks complete with findings → Claude Code reads result

The handoff is asynchronous and auditable. The task record in Delega shows who created it, who claimed it, what the result was, and when each step happened.
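To make the lifecycle concrete, here is a toy in-memory model of the create, claim, complete, read cycle. It is a sketch for illustration only, not Delega's implementation: the `ToyQueue` and `Task` names, the `history` list, and the permission rules are all assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Task:
    id: int
    content: str
    labels: list
    created_by: str
    context: dict = field(default_factory=dict)
    status: str = "pending"
    claimed_by: Optional[str] = None
    history: list = field(default_factory=list)  # (timestamp, actor, event)


class ToyQueue:
    """In-memory stand-in for a task queue (hypothetical, not Delega's code)."""

    def __init__(self) -> None:
        self._tasks = {}
        self._next_id = 1

    def _log(self, task: Task, actor: str, event: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        task.history.append((stamp, actor, event))

    def create(self, actor, content, labels, context=None) -> Task:
        task = Task(self._next_id, content, list(labels), actor, dict(context or {}))
        self._tasks[task.id] = task
        self._next_id += 1
        self._log(task, actor, "created")
        return task

    def claim(self, actor: str, task_id: int) -> Task:
        task = self._tasks[task_id]
        # Only the agent named in an @label may claim the task.
        if f"@{actor}" not in task.labels:
            raise PermissionError(f"{actor} cannot claim task {task_id}")
        task.claimed_by = actor
        self._log(task, actor, "claimed")
        return task

    def complete(self, actor: str, task_id: int, result: dict) -> Task:
        task = self._tasks[task_id]
        if task.claimed_by != actor:
            raise PermissionError(f"{actor} has not claimed task {task_id}")
        task.context.update(result)  # result lands in the task's context
        task.status = "done"
        self._log(task, actor, "completed")
        return task

    def get(self, task_id: int) -> Task:
        return self._tasks[task_id]
```

The `history` list plays the role of the audit trail: each state change records who did it and when.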

Setting up the MCP server

The fastest way to get set up is delega init, which walks you through signup, agent creation, and MCP config in one command:

npx @delega-dev/cli init

It'll ask which MCP client you use (Claude Code, Codex, Cursor, etc.) and output the correct config format. Or configure manually:

For Claude Code (~/.claude.json):

{
  "mcpServers": {
    "delega": {
      "command": "npx",
      "args": ["@delega-dev/mcp"],
      "env": {
        "DELEGA_AGENT_KEY": "dlg_claude_key_here"
      }
    }
  }
}

For Codex (~/.codex/config.toml):

[mcp_servers.delega]
command = "npx"
args = ["@delega-dev/mcp"]

[mcp_servers.delega.env]
DELEGA_AGENT_KEY = "dlg_codex_key_here"

Each agent uses its own key. Delega uses those keys to scope task visibility: when Codex calls list_my_tasks, it only sees tasks labeled @codex. When Claude Code calls get_task, it can read the result. Keys are prefixed dlg_ and scoped per-agent at creation time.
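If you script the manual setup, one thing to watch is not clobbering MCP servers you already have configured. The sketch below merges the Delega entry shown above into an existing Claude Code config dictionary. The helper name `with_delega_server` is hypothetical, and the `dlg_` prefix check is only a sanity check based on the key format described here.

```python
import json


def with_delega_server(config: dict, agent_key: str) -> dict:
    """Return a copy of `config` with the Delega MCP server entry added."""
    if not agent_key.startswith("dlg_"):
        raise ValueError("expected an agent key starting with 'dlg_'")
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["delega"] = {
        "command": "npx",
        "args": ["@delega-dev/mcp"],
        "env": {"DELEGA_AGENT_KEY": agent_key},
    }
    merged["mcpServers"] = servers
    return merged


# Example: an existing config with one unrelated (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(with_delega_server(existing, "dlg_claude_key_here"), indent=2))
```

The function returns a new dictionary rather than editing in place, so you can diff the result against the file on disk before writing anything.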

The workflow in practice

Step 1: Claude Code creates a review task

When Claude Code finishes implementing a feature, it creates a task for Codex instead of stopping:

curl -s -X POST https://api.delega.dev/v1/tasks \
  -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Review this diff for security issues, edge cases, and test coverage gaps. Flag anything that should block merge.",
    "labels": ["@codex"],
    "context": {
      "diff": "...full diff here...",
      "pr_url": "https://github.com/org/repo/pull/42",
      "feature": "rate-limited API client"
    }
  }'

With the MCP server configured, Claude Code calls the create_task tool directly instead of shelling out to curl. Either way, the task is created and Claude Code gets a task ID back.
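For scripts that sit outside an MCP client, the same request can be assembled in Python. This is a minimal sketch using only the standard library: the endpoint, headers, and payload fields are copied from the curl example above, while the function name and its parameters are hypothetical. It builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.delega.dev/v1/tasks"  # endpoint from the curl example


def build_create_task_request(agent_key, diff, pr_url, feature, reviewer="@codex"):
    """Build (but do not send) the POST request that creates a review task."""
    payload = {
        "content": (
            "Review this diff for security issues, edge cases, and test "
            "coverage gaps. Flag anything that should block merge."
        ),
        "labels": [reviewer],
        "context": {"diff": diff, "pr_url": pr_url, "feature": feature},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {agent_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. The response format is not shown in this post, so the sketch stops short of parsing one.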

Step 2: Codex picks it up

Codex is configured to check its task queue on each session start (or on a heartbeat schedule). When it calls list_my_tasks, the review task appears:

{
  "tasks": [{
    "id": 847,
    "content": "Review this diff for security issues...",
    "status": "pending",
    "labels": ["@codex"],
    "context": { "diff": "...", "pr_url": "..." }
  }]
}
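On the receiving side, an agent wrapper might pick the actionable items out of that response like this. It is a sketch that assumes the response shape shown above; the `pending_tasks` helper is hypothetical.

```python
import json


def pending_tasks(response_text: str, label: str = "@codex") -> list:
    """Return tasks from a list_my_tasks-style response that still need work."""
    body = json.loads(response_text)
    return [
        task
        for task in body.get("tasks", [])
        if task.get("status") == "pending" and label in task.get("labels", [])
    ]
```

The label check is redundant if the server already scopes the list by key, as described earlier, but it is cheap insurance in a wrapper script.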

Codex claims the task, does the review, and marks it complete with findings. The request below is the completion step:

curl -s -X PATCH https://api.delega.dev/v1/tasks/847 \
  -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "done",
    "context": {
      "findings": [
        "Missing input validation on request body before rate limit check",
        "No test for concurrent request handling under burst limit",
        "SQL query in retry handler not parameterized"
      ],
      "recommendation": "block-merge",
      "reviewed_at": "2026-03-22T19:45:00Z"
    }
  }'

Step 3: Claude Code reads the result

Claude Code polls get_task until the task's status is done, then reads the findings and acts on them: either incorporating the feedback directly or, if the review flagged a block-merge issue, creating a follow-up task for the human.

You see the completed review in the Delega dashboard. The audit trail shows the full chain: who created the task, who reviewed it, what they found, and when.
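The polling step in Step 3 could be sketched as below. The fetch function is injected, so the loop does not depend on any particular client: `fetch_task` stands in for whatever wraps get_task in your setup. The function names, the interval and timeout defaults, and the two action strings are all assumptions.

```python
import time


def wait_for_review(fetch_task, task_id, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `fetch_task(task_id)` until its status is 'done' or time runs out."""
    deadline = clock() + timeout
    while True:
        task = fetch_task(task_id)
        if task.get("status") == "done":
            return task
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} not completed within {timeout}s")
        sleep(interval)


def next_action(task: dict) -> str:
    """Decide what to do with a completed review."""
    context = task.get("context", {})
    if context.get("recommendation") == "block-merge":
        return "escalate-to-human"  # create a follow-up task for a person
    return "apply-feedback"         # incorporate the findings directly
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.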

Extending the pattern

This isn't just Claude + Codex. The same pattern works for any two-agent handoff:

Claude Code → Gemini: second opinion before a risky refactor
Codex → security scanner agent: automated SAST on every PR
Agent A → Agent B → Agent C: multi-stage pipelines where each stage creates the next task

Each agent gets its own key. Each task has a clear owner. The task record is the coordination protocol: no custom orchestration code, no shared state, no direct agent-to-agent networking.

The label convention (@agent_name) is how routing works. When you create a task with labels: ["@codex"], Codex's task queue picks it up. Change the label and the task routes to a different agent. Same API, different destination.
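As a rough illustration of that convention, the helpers below read the destination out of a task's labels and produce a copy routed elsewhere. Both function names are hypothetical, and the real routing happens on the Delega side; this only models the idea.

```python
def agents_for(labels: list) -> list:
    """Return agent names addressed by @labels, e.g. ['@codex'] -> ['codex']."""
    return [label[1:] for label in labels if label.startswith("@") and len(label) > 1]


def reroute(task: dict, new_agent: str) -> dict:
    """Return a copy of `task` addressed to `new_agent` instead."""
    kept = [label for label in task.get("labels", []) if not label.startswith("@")]
    return {**task, "labels": kept + [f"@{new_agent}"]}
```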

What you get out of it

Beyond the workflow mechanics, the task record gives you something you don't get from direct agent-to-agent calls: visibility.

Every handoff is logged. Every result is stored. If something goes wrong (if the review agent missed something, if the wrong agent claimed a task, if a task sat unclaimed for an hour) you can see exactly what happened and when. The audit trail is built into the infrastructure, not bolted on afterward.
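As one example of what that visibility allows, the sketch below flags tasks that have sat unclaimed for more than an hour. It assumes you have exported task records as dictionaries. The `created_at` and `claimed_at` field names are hypothetical, since the record schema is not shown in this post.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    return ts


def stale_unclaimed(records: list, now: datetime,
                    max_wait: timedelta = timedelta(hours=1)) -> list:
    """Return ids of tasks nobody has claimed within `max_wait`."""
    return [
        record["id"]
        for record in records
        if record.get("claimed_at") is None
        and now - parse_ts(record["created_at"]) > max_wait
    ]
```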

This is the difference between agents that are coordinated and agents that coordinate. The former requires you to build and maintain the coordination layer, or to act as it yourself. The latter is what Delega is for.

Get started in 30 seconds:

npx @delega-dev/cli init handles signup, agent creation, and MCP config in one step. Free tier includes 5 agents and 1,000 tasks/month.

→ Quick start guide
→ GitHub
