
Build a Multi-Agent CI Pipeline That Fixes Itself

CI failures that happen at 2 AM and sit until morning waiting for a human to triage them are a solved problem. This guide shows how to wire a multi-agent CI pipeline using Delega so agents diagnose, delegate, and fix without waiting for you.

Your CI pipeline fails at 2:17 AM. Here's what happens in most teams:

A Slack notification fires. Nobody sees it until 8 AM. Someone triages at 9 AM, runs the test manually, realizes it's a flaky race condition, reruns the suite, passes, closes the alert. The release was delayed seven hours. Two engineers spent 40 minutes on something that required five minutes of actual attention and a one-line fix.

Or worse: it's not flaky. It's a real regression. A critical API endpoint started returning 500s after the last deploy. The alert fired at 2 AM, nobody triaged until morning, and by the time anyone looked at it, customers had been hitting errors for six hours.

The problem isn't the tooling. It's the gap between notification and action.

Modern CI systems are excellent at detecting failures. They're not designed to do anything about them. They produce alerts. Alerts require humans to read them, classify them, decide what to do, and take action. That's a slow loop, especially at 2 AM.

AI agent task delegation closes this loop. Instead of a Slack notification that waits for a human, a CI failure creates a structured task that an agent picks up immediately, regardless of the time.


The architecture

The pattern uses three agents, each with a distinct role:

  • CI system: Creates tasks when tests fail (any CI: GitHub Actions, GitLab, CircleCI)
  • Diagnosis agent: Triages failures, classifies them (flaky vs. real regression), and delegates to the right handler
  • Coding agent: Investigates and fixes real regressions, creates PRs

Each agent has its own Delega API key and its own label in the system. Tasks route by label. The task record is the audit trail. No direct agent-to-agent communication: just a shared task queue with proper identity separation.
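To make the routing model concrete, here is a minimal in-memory sketch of a label-routed queue. It is not Delega's implementation: the `Task` and `TaskQueue` classes and their method names are hypothetical stand-ins that only illustrate how labels, per-agent identity, and parent links fit together.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Task:
    id: int
    content: str
    labels: list[str]
    created_by: str                       # identity of the agent that created it
    parent_task_id: Optional[int] = None  # link back to the delegating task
    status: str = "open"
    claimed_by: Optional[str] = None


class TaskQueue:
    """Toy stand-in for a shared, label-routed task queue (hypothetical)."""

    def __init__(self) -> None:
        self._tasks: dict[int, Task] = {}
        self._ids = count(1)

    def create(self, agent: str, content: str, labels: list[str],
               parent_task_id: Optional[int] = None) -> Task:
        task = Task(next(self._ids), content, labels, agent, parent_task_id)
        self._tasks[task.id] = task
        return task

    def claim_next(self, agent: str, label: str) -> Optional[Task]:
        """Return the oldest open task carrying `label`, marked as claimed."""
        for task in self._tasks.values():  # dicts preserve insertion order
            if task.status == "open" and label in task.labels:
                task.status = "claimed"
                task.claimed_by = agent
                return task
        return None


queue = TaskQueue()
ci_task = queue.create("ci", "CI failure: org/api-service on main", ["@diagnosis"])
claimed = queue.claim_next("diagnosis-agent", "@diagnosis")
child = queue.create("diagnosis-agent", "Regression: investigate and fix",
                     ["@coding"], parent_task_id=claimed.id)
```

The agents never call each other. The diagnosis agent only sees tasks labeled @diagnosis, and the coding agent finds its work through the @coding label and the parent link.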

Setting up the pipeline

Set up agents with delega init:

# Run once per agent: handles signup, key creation, and MCP config
npx @delega-dev/cli init

Or create keys manually in the dashboard and configure each agent's MCP server:

{
  "mcpServers": {
    "delega": {
      "command": "npx",
      "args": ["@delega-dev/mcp"],
      "env": {
        "DELEGA_AGENT_KEY": "dlg_coding_key_here"
      }
    }
  }
}

Same pattern for each agent (CI hook, diagnosis, coding), each with its own key.
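If you manage several agents, you may prefer to generate these config blocks rather than copy them by hand. The sketch below assumes nothing beyond the JSON shape shown above. The key strings are placeholders, and where each config file lives depends on your MCP client.

```python
import json


def mcp_config(agent_key: str) -> dict:
    """Build the MCP server entry shown above for one agent's key."""
    return {
        "mcpServers": {
            "delega": {
                "command": "npx",
                "args": ["@delega-dev/mcp"],
                "env": {"DELEGA_AGENT_KEY": agent_key},
            }
        }
    }


# Placeholder keys: real ones come from `delega init` or the dashboard.
AGENT_KEYS = {
    "ci": "dlg_ci_key_here",
    "diagnosis": "dlg_diagnosis_key_here",
    "coding": "dlg_coding_key_here",
}

configs = {name: mcp_config(key) for name, key in AGENT_KEYS.items()}

for name, config in configs.items():
    print(f"# {name}")
    print(json.dumps(config, indent=2))
```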

Step 1: CI creates a task on failure

In your CI pipeline, add a failure hook that creates a Delega task instead of (or in addition to) sending a Slack notification.

GitHub Actions example:

name: Test Suite
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        id: tests
        run: pytest tests/ -v

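      - name: Create Delega task on failure
        if: failure()
        env:
          DELEGA_AGENT_KEY: ${{ secrets.CI_DELEGA_KEY }}
          # Pass GitHub context through env vars instead of inlining it in the
          # script, so a branch name containing quotes cannot break the command
          REPO: ${{ github.repository }}
          BRANCH: ${{ github.ref_name }}
          COMMIT: ${{ github.sha }}
          RUN_ID: ${{ github.run_id }}
        run: |
          # jq (preinstalled on ubuntu-latest runners) handles the JSON escaping
          payload=$(jq -n \
            --arg repo "$REPO" --arg branch "$BRANCH" \
            --arg commit "$COMMIT" --arg run_id "$RUN_ID" \
            '{
              content: "CI failure: \($repo) / \($branch)",
              labels: ["@diagnosis"],
              context: {
                repo: $repo,
                branch: $branch,
                commit: $commit,
                run_id: $run_id,
                log_url: "https://github.com/\($repo)/actions/runs/\($run_id)"
              }
            }')
          curl -s -X POST https://api.delega.dev/v1/tasks \
            -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
            -H "Content-Type: application/json" \
            -d "$payload"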

Generic CI hook (any CI system):

#!/bin/bash
# ci-failure-hook.sh

create_delega_task() {
  local repo="$1"
  local branch="$2"
  local commit="$3"
  local log_url="$4"

  curl -s -X POST https://api.delega.dev/v1/tasks \
    -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
    -H "Content-Type: application/json" \
    -d "{
      \"content\": \"CI failure: ${repo} on ${branch}\",
      \"labels\": [\"@diagnosis\"],
      \"context\": {
        \"repo\": \"${repo}\",
        \"branch\": \"${branch}\",
        \"commit\": \"${commit}\",
        \"log_url\": \"${log_url}\",
        \"failed_at\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"
      }
    }"
}

create_delega_task "$CI_REPO" "$CI_BRANCH" "$CI_COMMIT_SHA" "$CI_LOG_URL"

The task lands in the @diagnosis queue with full context: repo, branch, commit SHA, log URL, timestamp. The diagnosis agent doesn't have to go find this information: it's all in the task.
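If your CI runner has Python available, building the payload with a JSON library avoids the hand-escaped quotes in the shell version. The sketch below mirrors the same fields and endpoint as the hooks above. The function names are illustrative, the sample log URL is made up, and the request is prepared but deliberately not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone

TASKS_URL = "https://api.delega.dev/v1/tasks"  # endpoint used in the hooks above


def build_failure_task(repo: str, branch: str, commit: str, log_url: str) -> dict:
    """Mirror the task payload from the shell hook as a plain dict."""
    return {
        "content": f"CI failure: {repo} on {branch}",
        "labels": ["@diagnosis"],
        "context": {
            "repo": repo,
            "branch": branch,
            "commit": commit,
            "log_url": log_url,
            "failed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


def build_request(payload: dict, agent_key: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST request."""
    return urllib.request.Request(
        TASKS_URL,
        data=json.dumps(payload).encode("utf-8"),  # json.dumps escapes quotes
        headers={
            "Authorization": f"Bearer {agent_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# A branch name containing a double quote would corrupt hand-built JSON.
payload = build_failure_task(
    "org/api-service", 'fix/"quoted"-branch', "a3f9c21",
    "https://ci.example.com/runs/1",
)
request = build_request(payload, "dlg_ci_key_here")
# To actually send it: urllib.request.urlopen(request, timeout=10)
```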

Step 2: Diagnosis agent triages

The diagnosis agent runs on a schedule or is triggered by a webhook. It checks its task queue, fetches the CI logs, and classifies the failure.

Diagnosis logic (via MCP tools):

# Pseudocode for the diagnosis agent's decision tree
# The agent calls these via MCP tools, not directly

# 1. Claim the task
task = delega.claim_task(task_id)

# 2. Fetch the CI logs
logs = fetch_url(task.context.log_url)

# 3. Classify the failure
if is_flaky_test(logs):
    # Trigger a rerun and mark done
    trigger_ci_rerun(task.context.run_id)
    delega.complete_task(task_id, {
        "classification": "flaky",
        "action": "rerun-triggered",
        "confidence": 0.87
    })

elif is_environment_issue(logs):
    # Infrastructure problem, page the human
    delega.complete_task(task_id, {
        "classification": "infrastructure",
        "action": "human-required",
        "reason": "Disk full on test runner"
    })
    notify_human(task)

else:
    # Real regression: delegate to coding agent
    child_task_id = delega.create_task({
        "content": f"Regression in {task.context.repo}: investigate and fix",
        "labels": ["@coding"],
        "parent_task_id": task_id,
        "context": {
            "repo": task.context.repo,
            "branch": task.context.branch,
            "commit": task.context.commit,
            "diagnosis": extract_diagnosis(logs),
            "suspect_files": identify_changed_files(task.context.commit),
            "failure_pattern": classify_failure_type(logs)
        }
    })

    delega.complete_task(task_id, {
        "classification": "regression",
        "action": "delegated",
        "child_task_id": child_task_id
    })

Via MCP, the diagnosis agent does all of this using Delega's built-in tools. The create_task tool handles the delegation. The parent-child relationship is recorded automatically.
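The pseudocode leaves is_flaky_test and is_environment_issue undefined. One simple way to fill them in is pattern matching on the log text, as sketched below. The patterns are hypothetical examples rather than a vetted list, and in practice an LLM-based agent would likely combine signals like these with its own reading of the logs.

```python
import re

# Hypothetical log signatures: tune these against your own CI history.
FLAKY_PATTERNS = [
    r"timed out",
    r"connection reset by peer",
    r"address already in use",
]
INFRA_PATTERNS = [
    r"no space left on device",
    r"cannot allocate memory",
    r"could not resolve host",
]


def _matches(patterns: list[str], logs: str) -> bool:
    """True if any pattern occurs in the logs (case-insensitive)."""
    return any(re.search(p, logs, re.IGNORECASE) for p in patterns)


def is_flaky_test(logs: str) -> bool:
    return _matches(FLAKY_PATTERNS, logs)


def is_environment_issue(logs: str) -> bool:
    return _matches(INFRA_PATTERNS, logs)


def classify_failure(logs: str) -> str:
    """Same branch order as the decision tree above."""
    if is_flaky_test(logs):
        return "flaky"
    if is_environment_issue(logs):
        return "infrastructure"
    return "regression"  # default: hand off to the coding agent


print(classify_failure("OSError: [Errno 28] No space left on device"))
print(classify_failure("AssertionError in test_rate_limiter.py::test_burst_window"))
```

Defaulting to "regression" means an unrecognized failure still reaches the coding agent, which matches the else branch of the decision tree.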

Step 3: Coding agent investigates and fixes

The coding agent's task queue now contains a well-structured investigation request:

{
  "id": 923,
  "content": "Regression in org/api-service: investigate and fix",
  "labels": ["@coding"],
  "parent_task_id": 917,
  "context": {
    "repo": "org/api-service",
    "branch": "main",
    "commit": "a3f9c21",
    "diagnosis": "AssertionError in test_rate_limiter.py::test_burst_window",
    "suspect_files": ["src/rate_limiter.py", "tests/test_rate_limiter.py"],
    "failure_pattern": "assertion_error"
  }
}

The agent clones the repo, checks out the commit, reads the suspect files, runs the failing test locally, traces the bug, fixes it, adds a regression test, and opens a PR:

# After the coding agent opens the PR, it marks its task complete
curl -s -X PATCH https://api.delega.dev/v1/tasks/923 \
  -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "done",
    "context": {
      "fix": "Off-by-one in burst window calculation",
      "pr_url": "https://github.com/org/api-service/pull/88",
      "regression_test_added": true,
      "fixed_at": "2026-03-22T02:41:00Z"
    }
  }'

What you wake up to

Instead of an unread Slack notification from 2 AM, you open the Delega dashboard and see:

Task #917 - CI failure: org/api-service on main
  Status: done
  Classification: regression
  Child: Task #923

Task #923 - Regression in org/api-service: investigate and fix
  Status: done
  Fix: Off-by-one in burst window calculation
  PR: github.com/org/api-service/pull/88
  Fixed at: 02:41 AM

The root cause is documented. The fix is in a PR ready for your review. The regression test is written. You didn't have to page anyone. The system handled it.

You review the PR, verify the fix makes sense, and merge it. Total human time: 10 minutes. Total delay from failure detection to a fix ready for review: 24 minutes. The release ships on schedule.

Handling edge cases

What if the coding agent can't fix it? It marks the task with "action": "human-required" and includes its diagnosis. You get paged with full context: not a raw log URL, but a structured explanation of what the agent found and why it couldn't fix the problem automatically.

What if the same test keeps failing? The CI hook can track consecutive_failures in the task context. After 3+ failures, the diagnosis agent can escalate directly to human review instead of attempting another automated fix.

What if the diagnosis agent misclassifies? The coding agent will find no bug to fix. It marks the task with "classification": "no-regression-found" and suggests a rerun. That feedback is recorded in the task chain, where it can be used to improve future classification.
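The repeated-failure rule above can be sketched in a few lines. Only consecutive_failures and the "human-required" and "delegated" action values come from this guide. The failing_test field and both function names are assumptions made for illustration.

```python
from typing import Optional

ESCALATION_THRESHOLD = 3  # the "3+ failures" rule described above


def next_failure_count(previous_context: Optional[dict], failing_test: str) -> int:
    """Count consecutive failures of the same test across CI runs.

    `failing_test` is a hypothetical context field; only
    `consecutive_failures` is named in the text.
    """
    if previous_context and previous_context.get("failing_test") == failing_test:
        return previous_context.get("consecutive_failures", 1) + 1
    return 1  # first failure, or a different test failed last time


def choose_action(consecutive_failures: int) -> str:
    """Escalate to a human once the threshold is reached, else delegate."""
    if consecutive_failures >= ESCALATION_THRESHOLD:
        return "human-required"
    return "delegated"


context = None
actions = []
for _ in range(3):  # the same test fails three runs in a row
    failures = next_failure_count(context, "test_burst_window")
    context = {"failing_test": "test_burst_window", "consecutive_failures": failures}
    actions.append(choose_action(failures))

print(actions)
```

With a threshold of 3, the first two failures are delegated to the coding agent and the third is escalated.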

The audit trail is the feature

The most underrated part of this architecture isn't the automation: it's the record.

Every CI failure has a corresponding task chain in Delega. You can see exactly what the diagnosis agent found, what it delegated, what the coding agent produced, and how long each step took. Over time, this becomes a searchable history of every incident your system has handled.
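As a rough illustration of querying that history, the sketch below walks a task chain from the fix back to the original failure and computes the elapsed time. The records are hand-written to match the example in this guide (failure at 2:17 AM, fix at 02:41), not fetched from Delega. The helper names are hypothetical, and it assumes the CI task carries a failed_at timestamp, as the generic hook does.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Illustrative records keyed by task id, shaped like the examples above.
tasks = {
    917: {"parent_task_id": None,
          "context": {"failed_at": "2026-03-22T02:17:00Z"}},
    923: {"parent_task_id": 917,
          "context": {"fixed_at": "2026-03-22T02:41:00Z"}},
}


def chain_to_root(tasks: dict, task_id: int) -> list[int]:
    """Follow parent_task_id links upward; return ids ordered root first."""
    chain = []
    current = task_id
    while current is not None:
        chain.append(current)
        current = tasks[current]["parent_task_id"]
    return list(reversed(chain))


def minutes_to_fix(tasks: dict, leaf_id: int) -> float:
    """Minutes between the root task's failed_at and the leaf's fixed_at."""
    chain = chain_to_root(tasks, leaf_id)
    failed_at = datetime.strptime(
        tasks[chain[0]]["context"]["failed_at"], TIMESTAMP_FORMAT)
    fixed_at = datetime.strptime(
        tasks[chain[-1]]["context"]["fixed_at"], TIMESTAMP_FORMAT)
    return (fixed_at - failed_at).total_seconds() / 60


print(chain_to_root(tasks, 923))   # [917, 923]
print(minutes_to_fix(tasks, 923))  # 24.0
```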

That history is training data for better diagnosis logic. It's documentation for postmortems. It's evidence that the system is working when you need to justify the infrastructure investment.

Notifications disappear. Task records don't.

Get started in 30 seconds:

Run npx @delega-dev/cli init to sign up and configure your agents. Free tier: 5 agents, 1,000 tasks/month.

