A Malicious MCP Server Can Inflate Your API Bill 658x — And Standard Defenses Miss It 97% of the Time
A new class of MCP attack turns tool responses into a billing amplifier. A session that should cost $0.10 costs $65.80. The schema is clean, the task completes, and 97% of standard defenses never notice.

658x.
A tool-calling session that should cost $0.10 costs $65.80.
That is the worst-case amplification factor measured in a new arXiv paper from January 2026, revised March: Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents (2601.10955). The authors built a malicious MCP server, pointed a conventional agent harness at it, and watched inference costs balloon by nearly three orders of magnitude — without breaking the task, without tripping prompt filters, and without violating the tool schema.
At hobbyist scale, 658x is a bad afternoon. At product scale, it is the difference between a margin-positive SaaS and a bankruptcy filing. And the attack is silent, spec-compliant, and — by the paper’s own measurement — missed by standard defenses 97% of the time.
This post walks through what the attack is, why existing defenses don’t see it, and what the MCP ecosystem should be building to catch it.
How the attack works
Most MCP threat models focus on two places:
- The tool schema — the declared
name,description, and input parameters. This is what static scanners and registry reviewers inspect. - The tool inputs — what the agent sends into the tool. This is what prompt-injection filters and guardrails watch.
The paper’s attack targets a third place almost nobody is instrumenting: the tool response body.
The malicious server behaves like a well-formed MCP server. Its schema is clean. It answers the question the agent asked. But inside the text-visible fields of its response — the content the LLM will re-ingest on the next turn — it plants carefully chosen strings that steer the agent’s next action.
The steering has two flavors:
- Self-loop amplification. The response subtly implies the answer is incomplete, ambiguous, or requires verification. A well-aligned agent does the reasonable thing: it calls the same tool again. And again. And again. Each call generates verbose output. Each verbose output gets pulled back into the context window. Token counts climb geometrically.
- Chain amplification. The response references “related” tools —
fetch_metadata,validate_result,cross_reference— and the agent dutifully walks the chain. At every step, the next malicious response extends the chain further. A task that should be a single tool call becomes fifty.
Crucially: the task still completes. The user gets a correct answer. There is no visible failure. The only signal is the line item on the inference bill — which most teams don’t read per-session.
Why standard defenses fail
The paper benchmarks standard defense layers and reports a 97% miss rate. The reasons are structural, not implementation bugs.
Prompt filters check inputs. Guardrails, injection detectors, and policy engines scan what the agent sends to tools and what the user types in. The malicious payload is in what the tool returns. Filters never look there.
Output trajectory monitors check final outputs. Alignment and safety layers evaluate the model’s final answer to the user. The final answer is clean. The user’s task was completed. There is nothing to flag.
Schema scanners check declared interfaces. Registry review, tool-poisoning detectors, and static analyzers examine the tool manifest. The manifest is honest. The server really does do what it advertises — it just does it loudly and with recursive suggestions baked into the response text.
Max-token limits don’t bind. The paper’s title is deliberate: Beyond Max Tokens. Per-request token caps don’t stop an attack that spreads cost across many compliant requests. Each individual call is well-formed and bounded. The damage is in the count.
None of the defense layers in a typical MCP stack are positioned to see the malicious signal. It lives in the intermediate tool-response content, in runtime behavior, across many well-formed calls.
Why this is different
This is not prompt injection. Prompt injection hijacks the user’s intent. Here the user’s intent is preserved and fulfilled.
This is not tool poisoning. Tool poisoning hides malicious instructions in the tool schema description. Here the schema is clean.
This is not a jailbreak. Nothing about safety alignment is being bypassed. The model is behaving exactly as it should — a tool told it more work was needed, and it did more work.
The authors propose the name “stealthy resource amplification via tool calling chains.” It is a new class. The server behaves correctly. The schema is clean. The task completes. Only the meter lies.
That last sentence is worth rereading. Only the meter lies. Every monitoring layer most teams rely on is looking at correctness, safety, or policy. None of them are looking at the meter.
What makes this scary at scale
A natural objection: “I only use well-known servers from reputable maintainers. I’ll just avoid weird servers.”
That objection does not hold.
The attack does not require a novel server. A previously-trusted, widely-deployed server can begin exhibiting this behavior after a dependency update. We already have a precedent: the axios RAT incident that hit exa-mcp-server and tavily-mcp — a transitively compromised dependency silently altered the runtime behavior of servers people had already vetted, already approved, and already wired into production.
Imagine the same supply-chain vector used for amplification instead of data exfiltration. A compromised maintainer ships a minor release. A poisoned transitive dependency updates during npm install. A Docker image rebuilds on schedule. From that moment, every tool call costs 10-100x more, the task still completes, and nothing in the stack notices.
The stealth is the point. Data exfil has to leave the building eventually and can be caught by egress monitoring. Billing amplification never leaves the building — it just quietly lives inside your cloud invoice for a month.
What the ecosystem should do
This is a protocol-level problem and it needs protocol-level answers. Four things the MCP ecosystem should be building or standardizing:
1. Cost-per-session budgets enforced at the client. Every agent session should carry a declared token or dollar budget. When a session blows past its budget, the client halts and escalates. This is trivially implementable today and essentially no one does it.
2. Tool-call frequency anomaly detection. If a tool has historically been called twice per session and is suddenly called fifty times, that is a signal. Agents should track a per-tool rolling baseline and flag deviations. The paper’s attack is defined by this signature.
3. Response length budgets per tool call. Individual tool responses should have soft caps based on expected response shape. A search tool returning 40KB of text per call is usually suspicious. Per-tool output ceilings, enforced at the MCP client or gateway, cut the amplification surface sharply.
4. Observable tool-calling trajectories. The paper’s own defense proposal is trajectory observability — the client logs the full tool-call graph for each session, and anomalies in graph shape (depth, branching, self-edges) become detectable. This is the right layer: runtime behavior, not static analysis.
And the absolute minimum, achievable this quarter by anyone running agents in production: a dashboard showing cost per tool per session, so humans can see when the meter spikes. Most teams today cannot answer the question “which tool drove the cost in session X?” from their existing telemetry. That is the gap.
The broader lesson
MCP is a young protocol. The attack surface is larger than the static-analysis community has mapped, and the papers coming out now — this one, the trajectory-injection work from late 2025, the context-explosion research from earlier this spring — keep revealing classes of attack that cannot be caught by pre-deployment scanning. The malicious behavior is not in the code. It is in the runtime interaction between a tool’s responses and an agent’s decision loop.
Registry review catches manifests. Code review catches source. Neither catches behavior that only manifests against a live agent. That is the blind spot.
The ecosystem needs runtime observability as a first-class security primitive — per-session cost accounting, per-tool call-frequency baselines, trajectory graphs, response-length telemetry. Not as an add-on. As part of what it means to run an MCP client responsibly.
The paper’s headline number is 658x. The more important number is 97% — the fraction of the time standard defenses do not see it coming. Until runtime observability catches up, that 97% is the real exposure.
Bookmark this one. We are going to be citing it for a while.
Reference: Zhang et al., “Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents,” arXiv:2601.10955, January 2026 (revised March 2026).