Your coding agent is in flow. It's deep into implementing a rate-limited API client. Then it hits a question: what's the current burst limit on the OpenAI completions endpoint? Has the streaming API behavior changed in recent SDK versions? Is there a known issue with the retry logic in httpx 0.27?
Here's what happens next, and why both options are bad.
Option A: The agent stops to research. It burns 5-10 minutes on web searches, processes the results, and tries to reconstruct context when it comes back. Some of that context is gone. The momentum is broken. What should have been a 30-minute implementation session becomes 90 minutes of interrupted flow.
Option B: The agent guesses. It uses training data that could be months out of date. It produces code based on a rate limit that changed in last quarter's API update, or a function signature that was deprecated in the last SDK release. The code ships. The bug appears in production.
Neither is acceptable. And both are unnecessary.
The problem is sequential execution
The reason your coding agent stops to research is that it has no way to do anything else. It's a single-threaded workflow: question arises, agent handles question, agent continues. There's no mechanism for saying "handle this in the background while I keep going."
This is a coordination problem, not a capability problem. Your coding agent can write good code. A research agent with web search tools can find accurate, current information. The gap is that they can't work in parallel without a shared task infrastructure.
AI agent task delegation is the pattern that fixes this. Instead of the coding agent doing the research itself, it delegates the research question as a task to a dedicated research agent, then continues building on the parts of the implementation that don't depend on that answer. When the research task completes, the answer is waiting in a structured result blob. The coding agent reads it and fills in the gaps.
Parallel execution. No blocking. Accurate answers without stopping.
What you need
- A coding agent (Claude Code, Codex, or any MCP-compatible agent)
- A research agent with web search capabilities (Perplexity, Brave Search, or similar)
- Delega MCP server configured for both agents
The Delega MCP server is the coordination layer. It gives each agent access to a shared task queue with proper identity separation: each agent has its own API key, its own label, and its own view of the task list.
Setting up the MCP multi-agent workflow
Get set up in 30 seconds:
npx @delega-dev/cli init
This walks you through signup, agent creation, and MCP config for your editor of choice. Run it once per agent (coder and researcher each get their own key).
Manual config if you prefer:
Coding agent (~/.claude.json):
{
  "mcpServers": {
    "delega": {
      "command": "npx",
      "args": ["@delega-dev/mcp"],
      "env": {
        "DELEGA_AGENT_KEY": "dlg_coder_key_here"
      }
    }
  }
}
Research agent (same pattern, different key):
{
  "mcpServers": {
    "delega": {
      "command": "npx",
      "args": ["@delega-dev/mcp"],
      "env": {
        "DELEGA_AGENT_KEY": "dlg_researcher_key_here"
      }
    }
  }
}
The label is how task routing works. A task labeled @researcher only appears in the researcher agent's task queue. The coding agent never sees it; it just gets the result when the task completes.
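Conceptually, label routing is just a filter over the shared queue. Here is a minimal sketch of that idea in Python; the field names mirror the task JSON shown later in this post, but the helper itself is illustrative, not Delega's actual implementation:

```python
def tasks_for(agent_label: str, tasks: list[dict]) -> list[dict]:
    """Return only the pending tasks routed to this agent's label."""
    return [
        t for t in tasks
        if agent_label in t.get("labels", []) and t.get("status") == "pending"
    ]

queue = [
    {"id": 891, "labels": ["@researcher"], "status": "pending"},
    {"id": 892, "labels": ["@coder"], "status": "pending"},
]

researcher_view = tasks_for("@researcher", queue)  # only task 891
```

Each agent sees its own slice of the queue; neither needs to know the other exists.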
The delegation flow
1. Coding agent creates a research task
Instead of pausing to search, the coding agent creates a Delega task and keeps building:
# Via API (or via MCP create_task tool)
curl -s -X POST https://api.delega.dev/v1/tasks \
  -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "What are the current rate limits for OpenAI gpt-4o completions API? Need: RPM, TPM, and burst window behavior for Tier 1. Source required.",
    "labels": ["@researcher"],
    "context": {
      "urgency": "high",
      "blocking": "api-client-retry-logic",
      "check_date": "2026-03-22"
    }
  }'
The task ID comes back immediately. The coding agent records it and continues implementing the parts of the client that don't depend on the rate limit values: the request builder, the response parser, the error types.
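The same call from Python, as a coding agent might issue it. The endpoint and payload mirror the curl example above; `build_research_task` and `post_json` are small illustrative helpers, not part of any SDK:

```python
import json
import urllib.request

API = "https://api.delega.dev/v1/tasks"

def build_research_task(question: str, blocking: str) -> dict:
    """Payload matching the curl example: routed to the researcher label."""
    return {
        "content": question,
        "labels": ["@researcher"],
        "context": {"urgency": "high", "blocking": blocking},
    }

def post_json(url: str, payload: dict, key: str) -> dict:
    """POST the task and return the server's JSON reply (including the id)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# task = post_json(API, build_research_task(
#     "Current gpt-4o rate limits?", "api-client-retry-logic"), key)
# pending_task_id = task["id"]  # record it, then keep building
```

The important part is the last line: the agent records the id and moves on instead of waiting.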
2. Research agent picks it up
The research agent, configured to check its task queue on each run, calls list_my_tasks and finds the pending research request:
{
  "tasks": [{
    "id": 891,
    "content": "What are the current rate limits for OpenAI gpt-4o completions API?...",
    "status": "pending",
    "labels": ["@researcher"],
    "context": {
      "urgency": "high",
      "blocking": "api-client-retry-logic"
    }
  }]
}
It claims the task, runs the research using its web search tools, and writes the result back:
curl -s -X PATCH https://api.delega.dev/v1/tasks/891 \
  -H "Authorization: Bearer $DELEGA_AGENT_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "done",
    "context": {
      "result": {
        "rpm": 500,
        "tpm": 30000,
        "tier": "Tier 1",
        "burst": "Up to 3x RPM for windows under 10s",
        "source": "platform.openai.com/docs/guides/rate-limits",
        "verified_at": "2026-03-22T19:30:00Z"
      }
    }
  }'
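The researcher's side of the exchange reduces to one structured payload: mark the task done and attach a dated result blob. A sketch of building that payload in Python (the field names follow the PATCH example above; the helper is illustrative, not SDK code):

```python
from datetime import datetime, timezone

def completion_payload(result: dict) -> dict:
    """Mark the task done and attach a structured, timestamped result."""
    stamped = dict(result, verified_at=datetime.now(timezone.utc).isoformat())
    return {"status": "done", "context": {"result": stamped}}

payload = completion_payload({
    "rpm": 500,
    "tpm": 30000,
    "source": "platform.openai.com/docs/guides/rate-limits",
})
```

Timestamping the result at write time is what lets the coding agent later judge whether the answer is fresh enough to trust.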
3. Coding agent reads the result
When the coding agent is ready to implement the rate limit logic, it calls get_task with the task ID it recorded earlier. If the task is complete, it gets the result. If it's still pending, it can continue on other work and check again. No blocking. No waiting in place.
The final implementation uses accurate, sourced, dated values rather than training data or guesses:
# Rate limits sourced from Delega task #891, verified 2026-03-22
RATE_LIMIT_RPM = 500 # Tier 1
RATE_LIMIT_TPM = 30_000 # Tier 1
BURST_MULTIPLIER = 3.0 # Up to 3x for sub-10s windows
BURST_WINDOW_SECONDS = 10
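Those four constants are enough to enforce the limits client-side. One way to do it, a sliding-window limiter that respects both the per-minute cap and the sub-10-second burst ceiling (illustrative, restating the values above; tune to your own tier):

```python
from collections import deque

RATE_LIMIT_RPM = 500        # Tier 1, per Delega task #891
BURST_MULTIPLIER = 3.0      # up to 3x the per-second rate
BURST_WINDOW_SECONDS = 10

class SlidingWindowLimiter:
    """Allow at most RPM requests per minute, and at most 3x the
    average per-second rate within any 10-second burst window."""

    def __init__(self) -> None:
        self.stamps: deque[float] = deque()
        # 500/60 req/s * 3 * 10 s = 250 requests per burst window
        self.burst_cap = int(
            RATE_LIMIT_RPM * BURST_MULTIPLIER * BURST_WINDOW_SECONDS / 60
        )

    def allow(self, now: float) -> bool:
        # Drop timestamps older than the one-minute window.
        while self.stamps and now - self.stamps[0] >= 60:
            self.stamps.popleft()
        in_burst = sum(1 for t in self.stamps if now - t < BURST_WINDOW_SECONDS)
        if len(self.stamps) >= RATE_LIMIT_RPM or in_burst >= self.burst_cap:
            return False
        self.stamps.append(now)
        return True
```

Passing timestamps in explicitly (rather than calling `time.time()` inside) keeps the limiter deterministic and easy to test.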
Why this works better than alternatives
Compared to the agent doing research itself: The coding agent stays focused. It doesn't have to context-switch into "researcher mode." Its token budget goes toward the implementation, not the search. Separation of concerns produces better output from both agents.
Compared to sequential agent runs: You don't have to wait for research to finish before coding starts. The research happens in parallel. Total wall-clock time is shorter.
Compared to hard-coding the context: You don't have to front-load your prompt with documentation that may be outdated. The research runs against current data when the task is picked up, not whenever your prompt was written.
Compared to custom orchestration code: There's no custom code to write or maintain. The task queue is the coordination protocol. Both agents use the same MCP interface they already know.
Scaling the pattern
One research agent is a start. The same pattern handles more complex delegations:
Multi-step research: The research agent creates a child task for a second, more specialized researcher. The coding agent only ever sees the final answer.
Competitive research: While coding agent A builds the feature, research agent B is comparing your implementation approach to how three competitors solved the same problem. The brief arrives before you need it.
Validation: After implementing, the coding agent creates a validation task asking the researcher to verify that the approach matches current best practices. The researcher comes back with a thumbs-up or a flag before the PR is opened.
Each of these patterns uses the same primitives: create task, label it, read the result. No new infrastructure, no new APIs, no orchestration framework.
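Multi-step delegation needs nothing beyond the task payload you have already seen. One way to link a child task to its parent is to record the parent's id inside the freeform `context` field; note that `parent_task_id` and the `@researcher-streaming` label are conventions invented for this sketch, not Delega API fields:

```python
def child_task(parent_id: int, question: str, label: str) -> dict:
    """A follow-up task, routed to a more specialized agent."""
    return {
        "content": question,
        "labels": [label],
        "context": {"parent_task_id": parent_id},  # hypothetical convention
    }

followup = child_task(
    891,
    "Verify burst behavior against the streaming endpoint",
    "@researcher-streaming",
)
```

Because the link lives in `context`, the coding agent can ignore it entirely and still receive the final answer on the parent task.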
Get started in 30 seconds:
Run npx @delega-dev/cli init to sign up and configure your MCP client. The free tier covers 5 agents and 1,000 tasks/month.
→ Quick start guide
→ Full docs
→ GitHub