/blog
mcpsecurityregistrysupply-chainopenclawClawHavoc

341 Malicious Skills, Zero Registry Checks: What OpenClaw's ClawHavoc Means for MCP

In January 2026, 341 malicious skills infiltrated OpenClaw's official registry. The MCP ecosystem faces the same structural vulnerability — and scanning alone won't fix it.

Algis Dumbris
341 Malicious Skills, Zero Registry Checks: What OpenClaw's ClawHavoc Means for MCP

341 Malicious Skills, Zero Registry Checks: What OpenClaw’s ClawHavoc Means for MCP

Between January 27 and January 29, 2026, an attacker uploaded 341 malicious skills to ClawHub, OpenClaw’s official skill registry. Each skill was packaged with professional documentation, plausible names, and clean metadata. “solana-wallet-tracker.” “csv-data-parser.” “api-health-monitor.” Names that blend into any registry without raising a question. Inside each one: keyloggers targeting Windows machines, Atomic Stealer malware targeting macOS, and exfiltration routines that activated on first use.

At the time of detection, ClawHub hosted approximately 2,857 skills. The 341 malicious entries represented roughly 12% of the entire registry. One in eight skills in the official distribution channel was actively hostile.

The attack — now tracked as ClawHavoc — was not a zero-day exploit or a sophisticated supply chain compromise. It was a publishing operation. The attacker published packages. The registry accepted them. Users installed them. The pipeline between “author uploads a skill” and “user’s machine is compromised” had no gate, no review, no delay. It was a straight line.

This happened to OpenClaw. It has not happened to MCP yet. But the structural conditions that enabled ClawHavoc exist in every MCP registry operating today.

Malicious packages moving through an unchecked supply chain

How 341 Skills Got Through

ClawHub functioned like most package registries. An author creates an account, packages a skill according to the spec, uploads it, and the skill appears in the registry. Users browse or search, find a skill that matches their need, and install it.

The attack exploited exactly one property of this system: the registry did not distinguish between “listed” and “verified.” Listing a skill and endorsing it were the same operation. Appearing in the registry was the trust signal. There was no second step.

The 341 skills passed every automated check ClawHub applied. They conformed to the packaging spec. They had valid manifests. Their documentation was thorough — in some cases, better than documentation for legitimate skills in the same registry. The malicious payloads were not in the manifest metadata that automated tools inspect. They were in the runtime behavior that no one observed until a user executed the skill and their credentials started moving to an external server.

Two days after the initial uploads, on January 29, the campaign was detected and the skills were pulled. On January 30, a separate vulnerability — CVE-2026-25253 — was disclosed: a one-click RCE through malicious links that exploited unvalidated URL parameters in OpenClaw’s Control UI. And on January 31, a Censys scan found 21,639 publicly exposed OpenClaw instances, many with misconfigured settings leaking API keys, OAuth tokens, and plaintext credentials.

The three events — ClawHavoc, the RCE vulnerability, and the exposure scan — painted a picture of an ecosystem where the registry, the UI, and the deployment defaults all assumed trust that was not warranted.

The MCP Parallel

The MCP ecosystem is building the same architecture that ClawHavoc exploited.

MCPNest currently indexes 7,561 MCP servers. The official MCP registry is growing. Smithery, Glama, and other discovery platforms list thousands more. Each of these registries serves the same function ClawHub served: a place where authors publish servers and users find them.

The trust model is the same. An MCP server author packages a server, publishes it to a registry, and users discover and connect to it. The server exposes tools. The user’s agent calls those tools. Between “author publishes server” and “agent executes tool on user’s machine,” the structural gap is identical to the one ClawHavoc exploited.

Consider what a ClawHavoc-style attack looks like against an MCP registry. An attacker publishes a server called “notion-sync” or “slack-summarizer.” The server exposes legitimate-looking tools — list_pages, search_messages, create_summary. The tools work. They return plausible results. But the server implementation also reads environment variables, scans the local filesystem for credential files, and exfiltrates the contents through an outbound HTTPS request embedded in the tool’s normal operation.

The server passes any static analysis. Its tool definitions are clean. Its manifest is valid. Its documentation is professional. The malicious behavior is in the runtime — in what the server does when it actually executes, not in what it declares.

Every MCP server in every registry is a potential ClawHub entry. The attack model is not hypothetical. It was demonstrated at scale in January. The only variable is when someone applies it to MCP.

Why “Scan Before Install” Is Not Enough

The natural response to ClawHavoc is “we need better scanning.” Several teams are already working on this. Invariant Labs’ mcp-scan has analyzed over 8,400 MCP servers and found 87 security issues in officially-maintained repositories. AgentSeal and other static analysis tools are expanding their MCP coverage. This work matters. It raises the floor. But it does not close the gap that ClawHavoc exploited.

Static scanners find signature-based vulnerabilities: SQL injection patterns, command injection sinks, missing authentication checks, dangerous default configurations. These are the patterns that generated ten MCP CVEs through the STDIO transport alone. Scanners are excellent at this class of problem.

ClawHavoc was a different class of problem. The malicious skills were not vulnerable — they were hostile. They did not have security bugs that an attacker could exploit. They were the attacker’s code, working exactly as the attacker intended. A scanner looking for injection patterns, missing auth, or insecure defaults would find nothing wrong. The code was well-structured. The patterns were clean. The malice was in the intent, not the implementation.

This is the fundamental limitation of install-time scanning: it can verify that code does not contain known-bad patterns, but it cannot verify that code does what it claims to do and nothing more. A server that lists tools for “Notion page management” but also reads ~/.aws/credentials at startup is not exhibiting a vulnerability. It is exhibiting a feature — a feature the author intended and the scanner has no basis to flag.

Behavioral analysis — observing what a server actually does at runtime rather than what its code looks like at rest — gets closer to the problem. But behavioral analysis requires execution, and execution requires a controlled environment. You cannot behaviorally analyze a server on a user’s production machine. By the time the server runs on the user’s machine, the analysis window has closed. Unit 42’s MCPTox benchmark confirms the limitation from a different angle: even when static analysis catches every known pattern, sampling injection attacks achieve a 72.8% success rate on frontier models by operating entirely at runtime.

The gap is structural, not technological. Scanning is a necessary layer. It is not a sufficient one.

The gate between discovery and execution — quarantine as architectural control

What the Ecosystem Needs

ClawHavoc and the MCP registry parallel point to the same set of structural requirements. No single tool or product addresses all of them. They are architectural properties that the ecosystem needs to develop collectively.

Admission control between listing and activation. The core failure in ClawHub was that listing and activation were the same operation. A server appears in the registry, and it is immediately available for use. The fix is a gate between discovery and execution — a quarantine period where a newly listed server is visible but not executable until it passes a verification step. This is the pattern that package registries like npm have moved toward with provenance attestations, and that container registries implement with admission controllers. In the MCP context, gateway proxies like MCPProxy implement quarantine-by-default, where new servers must be explicitly promoted before agents can invoke their tools. The principle is not specific to any product: new entries should not be trusted simply because they are listed.

Runtime behavioral monitoring. If install-time scanning cannot detect behavioral malice, runtime monitoring must fill the gap. This means observing what MCP servers actually do when they execute: what network connections they open, what filesystem paths they access, what environment variables they read, what data leaves the process boundary. Sandboxing technologies — containers, VMs, seccomp profiles, filesystem namespaces — provide the containment. Behavioral baselines provide the detection. A server that claims to manage Notion pages but opens connections to an unknown IP address is exhibiting anomalous behavior regardless of what its code looks like statically.

Cryptographic signing and provenance. The ClawHavoc attacker published skills under throwaway accounts. There was no way to verify the publisher’s identity, no chain of trust from the package to a known entity, no cost to creating a new publishing identity after the old one was burned. Cryptographic signing — where the publisher signs the package with a key tied to a verified identity — raises the cost of this attack from zero to non-trivial. It does not eliminate the risk. A verified publisher can still publish malicious code. But it creates accountability and makes throwaway identities expensive.

Community review and reputation. Automated tools catch patterns. Humans catch intent. A community review process — where experienced developers inspect new registry entries before they are promoted to a trusted tier — adds a layer that machines cannot replicate. This does not scale to every package in every registry. But it can scale to a curated subset: a “verified” tier where listings have been reviewed by humans with domain expertise. The npm ecosystem’s approach of distinguishing between published packages and packages under organizational namespaces with verified publishers is one model. The Python Package Index’s Trusted Publishers framework is another.

Transparency about what was checked and what was not. Users need to know the verification status of every server they install. “This server was listed three hours ago and has not been reviewed” is a different signal than “this server has been in the registry for six months, has been scanned by three independent tools, and has been reviewed by two maintainers.” Most registries currently do not surface this information. ClawHub did not. Users had no way to distinguish between a skill that had been in the registry for a year and one that had been uploaded that morning.

The Registry Trust Problem Is Universal

ClawHavoc was not an OpenClaw-specific failure. It was a registry-trust failure. The same architecture — open publishing, implicit trust on listing, no gate before activation — exists in npm, PyPI, Docker Hub, and every other package registry that has experienced its own version of this attack. Typosquatting on npm. Malicious packages on PyPI. Cryptomining images on Docker Hub. The attack pattern is universal because the structural vulnerability is universal.

The MCP ecosystem is building registries at a pace that outstrips the development of verification infrastructure. There are more MCP servers published every week than there are security researchers reviewing them. The ratio of new entries to reviewed entries is growing, not shrinking. This is the same trajectory that every other package ecosystem followed before its first major supply chain attack.

The solutions exist. Quarantine architectures, behavioral monitoring, cryptographic provenance, community review — none of these are novel. They have been developed and deployed in other ecosystems over the past decade. The question for MCP is not whether these solutions are needed. ClawHavoc answered that question in January. The question is whether the ecosystem will implement them before or after the first MCP-native version of the same attack.

The 341 skills in ClawHub were a proof of concept, whether the attacker intended them as one or not. They proved that professional packaging, clean metadata, and plausible naming are sufficient to compromise a registry that conflates listing with trust. Every MCP registry that operates on the same model is running the same experiment. The only difference is that the MCP ecosystem has the advantage of watching it happen to someone else first.

That advantage has an expiration date.


Data sources: ClawHavoc incident reports (January 27-31, 2026), CVE-2026-25253 advisory, Censys exposure scan (January 31, 2026), MCPNest registry statistics, Invariant Labs mcp-scan reports. All figures reflect published data as of April 20, 2026.