Tags: security, MCP, sandboxing, agentic-ai

Securing MCP Servers: From Tool Poisoning to Filesystem Sandboxing

The MCP security landscape has evolved through three waves: protocol scanning, traffic proxying, and OS-level sandboxing. Here's the full map of projects and where the frontier is heading.

Algis Dumbris
When Invariant Labs published the first tool poisoning research in early 2025, MCP security meant scanning tool descriptions for injection payloads. Eighteen months later, the frontier has moved to kernel-level filesystem sandboxing, microVM isolation, and syscall interception. The threat model has evolved from “the tool description is malicious” to “the tool process itself is untrusted code.”

This post maps the full landscape — every significant project, the three waves of the security evolution, and where the next frontier lies.

Wave 1: Protocol Security

The first wave of MCP security focused on what happens at the protocol level — the tool definitions, descriptions, and schemas that flow between servers and clients.

The core vulnerability: tool poisoning. An MCP server can embed instructions in tool descriptions that manipulate the agent’s behavior. Since LLMs treat tool descriptions as part of their context, a poisoned description can instruct the agent to prefer the malicious tool, exfiltrate data, or ignore safety constraints. Closely related attacks include tool shadowing (a malicious tool mimicking a legitimate one’s name) and rug-pull attacks (tools changing behavior after initial review).
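
The three attack classes above can be made concrete with a small audit routine. What follows is a minimal sketch, not how mcp-scan or any other scanner works: the regular expressions are hypothetical examples of suspicious phrasing, the tool list is assumed to be the merged list from all connected servers, and `pinned` is an assumed store of hashes recorded when each tool was last reviewed.

```python
import hashlib
import json
import re

# Hypothetical phrases that tend to address the model rather than a human
# reader. Real scanners use far richer detection than a handful of regexes.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"do not (tell|mention|reveal)", re.I),
    re.compile(r"<\s*(important|system)\s*>", re.I),
    re.compile(r"\.ssh|id_rsa|\.env\b", re.I),
]


def fingerprint(tool: dict) -> str:
    """Stable hash of a tool definition, used to notice later changes."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_tools(tools: list[dict], pinned: dict[str, str]) -> list[str]:
    """Return human-readable findings for a merged list of tool definitions."""
    findings = []
    seen = set()
    for tool in tools:
        name = tool["name"]
        description = tool.get("description", "")
        # Tool shadowing: two definitions competing for the same name.
        if name in seen:
            findings.append(f"shadowing: duplicate tool name {name!r}")
        seen.add(name)
        # Tool poisoning: instructions hidden in the description.
        for pattern in SUSPICIOUS_PATTERNS:
            if pattern.search(description):
                findings.append(f"poisoning: {name!r} matches {pattern.pattern!r}")
        # Rug pull: the definition differs from the one that was reviewed.
        if name in pinned and pinned[name] != fingerprint(tool):
            findings.append(f"rug pull: {name!r} changed since review")
    return findings
```

Pinning a hash at review time is what makes the rug-pull check possible: a description that was clean when approved and altered later produces a different fingerprint even if the tool name is unchanged.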

mcp-scan (now Snyk agent-scan) was the first tool to address this systematically. Originally built by Invariant Labs, it was acquired by Snyk and rebranded as part of their Evo agent security platform. With 1,800+ stars, it scans AI agent configurations for 15+ security risks — prompt injection, tool poisoning, tool shadowing, toxic data flows, malware payloads, and credential exposure. It auto-discovers configurations from Claude, Cursor, Windsurf, and Gemini CLI, running in both one-shot scan and continuous monitoring modes.

SlowMist’s MCP-Security-Checklist (811 stars) takes a different approach — a comprehensive audit checklist with 70+ security checkpoints across MCP server security, client/host security, LLM integration, and multi-MCP coordination. Each checkpoint is rated by priority (mandatory, strongly recommended, or optional). Originally from the blockchain security community, it reflects the thoroughness that comes from an industry where security failures mean direct financial loss.

The Cloud Security Alliance’s ModelContextProtocol-Security initiative provides institutional backing. Their 10-part hardening guide, operations guide, and reference patterns are paired with practical tools: mcpserver-audit for automated assessment, mcpserver-finder for discovery, and a vulnerability database tracking MCP-specific CVEs. Bi-weekly technical meetings keep the working group aligned.

Protocol-level scanning remains necessary — you still need to detect poisoned tool descriptions — but it is not sufficient. A tool that passes every protocol-level scan can still execute rm -rf / or exfiltrate credentials through the filesystem. The protocol tells you what a tool claims to do. The operating system tells you what it actually does.

Figure: Three waves of MCP security evolution: protocol, traffic, and OS-level

Wave 2: Traffic Security

The second wave recognized that scanning tool definitions at rest is not enough — you need to inspect tool calls in flight.

MCP-Defender (245 stars) is a desktop application that acts as a transparent proxy between MCP clients and servers. Built with TypeScript and Electron, it intercepts every tool call from Cursor, Claude, VS Code, and Windsurf. It uses signature-based threat detection and presents users with allow/block choices for suspicious calls. Think of it as a personal firewall for MCP traffic.

mcp-guardian (193 stars) takes the same proxy approach but targets developers who want more control. Built in Rust and TypeScript, it provides message logging with traces, manual tool call approval workflows, and (coming soon) automated safety scanning. Its modular architecture — separate core, CLI, and proxy components — makes it embeddable in larger systems.

Tencent AI-Infra-Guard (3,200+ stars) is the most comprehensive platform in this space. Built by Tencent’s Zhuque Lab, it combines five scanning modules: framework security analysis, multi-agent assessment, MCP server scanning (14+ risk categories), AI infrastructure vulnerability scanning (600+ CVEs across 40+ frameworks), and jailbreak evaluation. Version 4.0 expanded from infrastructure scanning to full autonomous agent ecosystem coverage.

MCP proxies like MCPProxy sit at the intersection of traffic security and the emerging third wave. By routing all tool calls through a single point, a proxy can enforce policies (schema quarantine, BM25 pre-filtering, agent token permissions) that neither the client nor the server implements on its own. But even the most sophisticated protocol-level proxy cannot prevent a compromised server process from accessing the filesystem directly.
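
To illustrate what a single enforcement point looks like, here is a minimal sketch of a quarantine-by-default gate for MCP `tools/call` requests, which are JSON-RPC 2.0 messages. The `ToolGate` class, its audit log format, and the error code choice are assumptions made for illustration; this is not the implementation of MCPProxy, MCP-Defender, or mcp-guardian.

```python
import json
from typing import Optional


class ToolGate:
    """Quarantine-by-default gate for tool calls passing through a proxy."""

    def __init__(self, approved: set[str]):
        self.approved = set(approved)
        self.audit_log: list[dict] = []

    def check(self, raw_message: str) -> Optional[dict]:
        """Return None to forward the message, or a JSON-RPC error to send back."""
        message = json.loads(raw_message)
        if message.get("method") != "tools/call":
            return None  # only tool invocations are gated in this sketch
        params = message.get("params", {})
        tool = params.get("name")
        allowed = tool in self.approved
        # Every call is logged with its arguments, whether or not it is allowed.
        self.audit_log.append(
            {"tool": tool, "arguments": params.get("arguments"), "allowed": allowed}
        )
        if allowed:
            return None
        return {
            "jsonrpc": "2.0",
            "id": message.get("id"),
            # -32000 sits in the range JSON-RPC reserves for implementation-defined errors.
            "error": {"code": -32000, "message": f"tool {tool!r} is quarantined"},
        }
```

A tool that has never been reviewed is rejected before the server sees the request, and the log records the attempted arguments either way.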

Wave 3: OS-Level Sandboxing

The third wave — the current frontier — treats MCP server processes as untrusted code that needs operating system-level containment. The question is no longer “is this tool call safe?” but “what can this process actually touch?”

nono: Kernel-Enforced Agent Sandboxing

nono (980 stars) is perhaps the most architecturally interesting project in this space. Created by Luke Hinds — the creator of Sigstore, which secures the supply chains of PyPI, npm, Homebrew, and Maven Central — nono provides kernel-enforced sandboxing for AI agents using Landlock on Linux and Seatbelt on macOS.

The design is capability-based: an agent’s sandbox starts with no permissions, and specific capabilities must be explicitly granted before execution. Network access is filtered through a local proxy with host allowlisting. Credentials are injected via proxy (keeping API keys outside the sandbox) or pulled from system keychains. File system access is restricted to explicitly permitted paths.
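
The deny-by-default model can be sketched in a few lines. This is an illustration of the capability idea only: the `Capabilities` class and its method names are hypothetical and are not nono's API, and the checks are purely lexical, whereas a real sandbox enforces them in the kernel.

```python
import posixpath
from dataclasses import dataclass
from pathlib import PurePosixPath


def _norm(path: str) -> PurePosixPath:
    # Collapse ".." segments so "/work/../etc" cannot pass as being under "/work".
    return PurePosixPath(posixpath.normpath(path))


@dataclass(frozen=True)
class Capabilities:
    """An empty instance permits nothing; every permission is an explicit grant."""

    read_paths: frozenset = frozenset()
    write_paths: frozenset = frozenset()
    hosts: frozenset = frozenset()

    def grant(self, *, read=(), write=(), hosts=()) -> "Capabilities":
        """Return a new capability set with the extra grants added."""
        return Capabilities(
            read_paths=self.read_paths | {_norm(p) for p in read},
            write_paths=self.write_paths | {_norm(p) for p in write},
            hosts=self.hosts | {h.lower() for h in hosts},
        )

    @staticmethod
    def _covers(roots, path: str) -> bool:
        target = _norm(path)
        return any(target == root or root in target.parents for root in roots)

    def can_read(self, path: str) -> bool:
        return self._covers(self.read_paths, path)

    def can_write(self, path: str) -> bool:
        return self._covers(self.write_paths, path)

    def can_connect(self, host: str) -> bool:
        return host.lower() in self.hosts
```

Because the default instance is empty, forgetting to configure something results in a denied operation rather than an accidental permission.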

What sets nono apart is its supply chain security heritage. Instruction files can be cryptographically attested via Sigstore, ensuring that sandbox configurations have not been tampered with. Content-addressable snapshots enable atomic rollback. Destructive commands (rm, dd, sudo) are blocked by default.

nono ships with built-in profiles for Claude Code, Codex, OpenCode, and other AI agents. SDKs are available in Rust, Python, and TypeScript. It is currently in early alpha — not recommended for production until v1.0 — but the architecture demonstrates where agent sandboxing is heading.

Anthropic’s sandbox-runtime

Anthropic’s own sandbox-runtime (srt) powers Claude Code’s sandboxing. On Linux it uses bubblewrap (an unprivileged, namespace-based sandboxing tool) combined with seccomp BPF filtering; on macOS it uses Seatbelt profiles applied through sandbox-exec. Network access is blocked by default with domain allowlisting. Filesystem writes are restricted to specific paths. Protected paths (.bashrc, credentials files) are auto-blocked, and symlink attacks are prevented.

This is the most battle-tested approach: it runs in production for every Claude Code user. But it covers only Linux and macOS, and it is tightly coupled to Claude Code’s execution model.
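
Why symlinks matter for path-based write restrictions is easiest to see in code. The sketch below is a user-space illustration of the idea, not sandbox-runtime's implementation: the function name and the `PROTECTED_NAMES` set are hypothetical, and a check like this is itself racy unless the kernel enforces the result.

```python
import os
import tempfile

# Hypothetical examples of files that should never be writable from a sandbox.
PROTECTED_NAMES = {".bashrc", ".zshrc", ".netrc"}


def is_write_allowed(path: str, writable_roots: list[str]) -> bool:
    """Resolve symlinks first, then test the real target against the policy."""
    real = os.path.realpath(path)
    if os.path.basename(real) in PROTECTED_NAMES:
        return False
    for root in writable_roots:
        real_root = os.path.realpath(root)
        # The resolved target must sit inside the resolved writable root.
        if os.path.commonpath([real, real_root]) == real_root:
            return True
    return False
```

Checking the literal path string would accept `work/link/key` as being inside `work`, even when `link` points at a directory outside it. Resolving the path before comparing closes that gap.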

microsandbox: MicroVM Isolation

microsandbox (4,900+ stars) takes the most aggressive isolation approach: hardware-level virtual machine isolation instead of containers or process sandboxing. Each sandbox runs in a lightweight VM (microVM) with boot times under 200 milliseconds. It is OCI-compatible, so it runs standard container images.

The isolation guarantee is fundamentally stronger than any container or process sandbox. A kernel exploit in a container can escape to the host. A kernel exploit in a microVM is contained by the hypervisor boundary. The tradeoff is resource overhead — even a lightweight VM consumes more memory and CPU than a process sandbox.

agent-infra/sandbox: Docker All-in-One

agent-infra/sandbox (3,000+ stars) provides a Docker-based environment that combines a browser (VNC + CDP), shell, file operations, MCP servers, and VS Code Server in a single container. Files downloaded in the browser are instantly available in the shell through a shared filesystem.

This is the pragmatic approach: rather than building a custom sandbox, rely on Docker’s existing isolation. SDKs in Python, TypeScript, and Go make integration straightforward. Kubernetes support with resource limits enables production deployment. The tradeoff is that it runs with seccomp=unconfined, which disables Docker’s default seccomp syscall filter in exchange for flexibility.

The Linux Sandboxing Stack

For teams building their own MCP server sandboxing, the Linux kernel provides two complementary primitives:

Landlock: Filesystem Enforcement

Landlock is a Linux Security Module that enables unprivileged process sandboxing. Unlike AppArmor or SELinux, which require root configuration, Landlock allows any process to voluntarily restrict its own capabilities. A process creates a ruleset, adds rules for specific filesystem paths (read, write, execute, create), and applies them via landlock_restrict_self(). The restrictions are irreversible and hereditary: child processes inherit them and can layer further restrictions on top, but can never lift them.

HashiCorp Nomad uses Landlock (via the go-landlock library) to sandbox artifact downloads. The sandbox grants read/write/create only to allocation and task directories, read/execute to system binary paths, and nothing else. This prevents a compromised artifact downloader from touching anything outside its designated directories.
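
The ruleset, rule, and restrict sequence can be driven directly from the raw syscalls. The following is a hedged sketch using ctypes, assuming a Linux 5.13+ kernel with Landlock enabled and only the ABI v1 access rights; it is not go-landlock's API or Nomad's code, `restrict_self` is a hypothetical helper name, and every path passed in is assumed to be a directory.

```python
import ctypes
import os

# These syscall numbers are shared across Linux architectures.
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_ADD_RULE = 445
SYS_LANDLOCK_RESTRICT_SELF = 446
LANDLOCK_RULE_PATH_BENEATH = 1
PR_SET_NO_NEW_PRIVS = 38

# Filesystem access rights from <linux/landlock.h> (subset of ABI v1).
FS_EXECUTE, FS_WRITE_FILE, FS_READ_FILE, FS_READ_DIR = 1 << 0, 1 << 1, 1 << 2, 1 << 3
FS_REMOVE_DIR, FS_REMOVE_FILE = 1 << 4, 1 << 5
FS_MAKE_DIR, FS_MAKE_REG = 1 << 7, 1 << 8

READ_EXEC = FS_EXECUTE | FS_READ_FILE | FS_READ_DIR
READ_WRITE_CREATE = (FS_READ_FILE | FS_READ_DIR | FS_WRITE_FILE | FS_REMOVE_DIR
                     | FS_REMOVE_FILE | FS_MAKE_DIR | FS_MAKE_REG)
HANDLED = READ_EXEC | READ_WRITE_CREATE  # everything else stays unrestricted


class RulesetAttr(ctypes.Structure):
    _fields_ = [("handled_access_fs", ctypes.c_uint64)]


class PathBeneathAttr(ctypes.Structure):
    _pack_ = 1  # the kernel declares this struct as packed
    _fields_ = [("allowed_access", ctypes.c_uint64), ("parent_fd", ctypes.c_int32)]


def restrict_self(rules: dict[str, int]) -> None:
    """Irreversibly confine the calling process to the given directories."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long

    def syscall(number, *args):
        result = libc.syscall(ctypes.c_long(number), *args)
        if result < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        return result

    attr = RulesetAttr(HANDLED)
    ruleset_fd = syscall(SYS_LANDLOCK_CREATE_RULESET, ctypes.byref(attr),
                         ctypes.c_size_t(ctypes.sizeof(attr)), ctypes.c_uint32(0))
    try:
        for path, access in rules.items():
            parent_fd = os.open(path, os.O_PATH | os.O_CLOEXEC)
            try:
                rule = PathBeneathAttr(access, parent_fd)
                syscall(SYS_LANDLOCK_ADD_RULE, ctypes.c_int(ruleset_fd),
                        ctypes.c_int(LANDLOCK_RULE_PATH_BENEATH),
                        ctypes.byref(rule), ctypes.c_uint32(0))
            finally:
                os.close(parent_fd)
        # Required for unprivileged callers before restricting themselves.
        if libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                      ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        syscall(SYS_LANDLOCK_RESTRICT_SELF, ctypes.c_int(ruleset_fd), ctypes.c_uint32(0))
    finally:
        os.close(ruleset_fd)
```

A launcher would call something like `restrict_self({"/usr": READ_EXEC, "/srv/task": READ_WRITE_CREATE})` in the child process just before executing the MCP server, mirroring the split described above between read/execute on system paths and read/write/create on task directories. The paths here are placeholders, and because the restriction cannot be undone it should never be applied to a long-lived parent process.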

seccomp User Notifications: Syscall Monitoring

seccomp’s SECCOMP_RET_USER_NOTIF action (Linux 5.0+) forwards intercepted syscalls to an external supervisor process. When a monitored process hits a matching syscall, it blocks in the kernel while the supervisor receives full details — the syscall number, arguments, and calling PID. The supervisor can allow, deny, or emulate the call.

LXD uses this to emulate syscalls such as mknod(), mount(), and setxattr() for unprivileged containers. For MCP server monitoring, the same mechanism could intercept file open, network connect, and process execution syscalls, building a real-time audit trail of what the server actually does.

Figure: Linux sandboxing stack: Landlock for enforcement, seccomp for monitoring

The critical caveat: there is a TOCTOU (time-of-check-time-of-use) race condition. Between receiving syscall details and responding, another thread can rewrite pointer arguments. seccomp user notifications cannot safely implement security policy on syscalls that dereference user-space pointers. They are effective for monitoring and logging, not for enforcement of pointer-argument syscalls.
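
One way to respect that caveat in a supervisor is to split syscalls into two groups: those whose interesting arguments are pointers, which are only logged, and those with scalar arguments, where a verdict can be based on the values the kernel reported. The sketch below models that decision logic only. The `Notification` class mirrors a subset of the kernel's `struct seccomp_notif`, the verdict strings are hypothetical labels, the syscall numbers are for x86_64, and the ioctl plumbing that receives and answers notifications is left out.

```python
from dataclasses import dataclass

# x86_64 syscall numbers; other architectures use different values.
SYS_SOCKET, SYS_CONNECT, SYS_EXECVE, SYS_OPENAT = 41, 42, 59, 257
AF_INET, AF_INET6 = 2, 10

# Syscalls whose interesting arguments are pointers into the target's memory.
# Another thread can rewrite that memory after the supervisor has read it.
POINTER_ARG_SYSCALLS = {SYS_CONNECT: "connect", SYS_EXECVE: "execve", SYS_OPENAT: "openat"}


@dataclass(frozen=True)
class Notification:
    """Subset of the fields reported in struct seccomp_notif."""

    id: int
    pid: int
    nr: int      # syscall number
    args: tuple  # the six raw syscall arguments


def decide(notif: Notification, audit_log: list) -> str:
    """Return a hypothetical verdict label: "continue" or "deny"."""
    if notif.nr in POINTER_ARG_SYSCALLS:
        # Audit only: no allow/deny decision rests on dereferenced memory.
        audit_log.append({"pid": notif.pid, "syscall": POINTER_ARG_SYSCALLS[notif.nr]})
        return "continue"
    if notif.nr == SYS_SOCKET and notif.args[0] in (AF_INET, AF_INET6):
        # The address family is a scalar argument, so this check cannot be
        # invalidated by another thread rewriting memory.
        return "deny"
    return "continue"
```

In this arrangement the pointer-argument calls feed the audit trail, while hard enforcement of filesystem access is left to a mechanism such as Landlock that evaluates the path inside the kernel.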

The macOS Challenge

macOS sandboxing is harder. The Endpoint Security framework provides comprehensive process, file, and network monitoring with both authorization (block before execution) and notification (log after the fact) events. But it requires an entitlement that must be requested from Apple and is granted only to recognized security vendors. This makes it impractical for third-party agent sandboxing.

The Seatbelt sandbox (used by nono) provides process-level sandboxing via SBPL profiles. Apple deprecated the public sandbox-exec command but the kernel mechanism still works. It is the most viable option for macOS agent sandboxing but lacks the fine-grained control of Linux’s Landlock.

DYLD_INSERT_LIBRARIES can inject monitoring code into unsigned or self-signed processes (like most MCP server binaries). But System Integrity Protection blocks injection into Apple-signed binaries, and Hardened Runtime blocks it for any binary signed with that option unless the binary carries an entitlement that explicitly allows DYLD environment variables.

There is no cross-platform solution. Each operating system needs a different sandboxing backend, which is why projects like nono implement separate Landlock and Seatbelt backends.

What You Should Do Now

The three waves are not sequential; they are cumulative. You need all three layers, plus runtime monitoring to confirm that they hold:

  1. Protocol scanning. Use mcp-scan/agent-scan to audit your MCP server configurations for known attack patterns. The SlowMist checklist provides a comprehensive audit framework.

  2. Traffic proxying. Route MCP connections through a proxy that can enforce policies, quarantine new tools, and log all tool calls. This gives you visibility into what tools are being invoked and with what arguments.

  3. Process sandboxing. Run MCP server processes in sandboxes that restrict filesystem access, network connectivity, and process capabilities. Docker isolation is the pragmatic starting point. Landlock-based sandboxing (via nono or custom implementation) provides stronger guarantees on Linux without container overhead.

  4. Monitor actual behavior. Protocol-level inspection tells you what a tool claims to do. OS-level monitoring tells you what it actually does. The gap between claimed and actual behavior is where attacks live.
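
For step 3, the Docker starting point might look like the sketch below, which builds a `docker run` command line for a stdio-based MCP server. The function name, the image name in the usage note, the `/workspace` mount point, and the default limits are all assumptions for illustration; the flags themselves are standard Docker CLI options.

```python
def docker_run_argv(image: str, workdir: str, *, allow_network: bool = False,
                    memory: str = "512m", pids: int = 128) -> list[str]:
    """Build a locked-down `docker run` command line for an MCP server."""
    argv = [
        "docker", "run", "--rm",
        "-i",                                    # keep stdin open for stdio transport
        "--read-only",                           # root filesystem is immutable
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--memory", memory,
        "--pids-limit", str(pids),
        "--tmpfs", "/tmp",                       # scratch space that vanishes on exit
        "--volume", f"{workdir}:/workspace:rw",  # the only writable host path
    ]
    if not allow_network:
        argv += ["--network", "none"]
    argv.append(image)
    return argv
```

Passing the result of `docker_run_argv("example/mcp-server:latest", "/srv/agent/work")` to `subprocess.Popen` would start the server with no network, no capabilities, and a single writable directory. Unlike the seccomp=unconfined setup mentioned earlier, this keeps Docker's default seccomp profile in place.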

The MCP security ecosystem is maturing rapidly. Eighteen months ago, the only defense was reading tool descriptions carefully. Today, there are tools for every layer of the stack, some already running in production and others, like nono, still in alpha. The question is no longer whether to secure your MCP infrastructure; it is which layers you can afford to skip. The answer, increasingly, is none of them.