Claude faces ‘industrial-scale’ AI model distillation


Anthropic has detailed three “industrial-scale” AI model distillation campaigns by overseas labs designed to extract abilities from Claude.

These competitors generated over 16 million exchanges using approximately 24,000 deceptive accounts. Their goal was to acquire proprietary logic to improve their competing platforms.

The extraction technique, known as distillation, involves training a weaker system on the high-quality outputs of a stronger one.

When applied legitimately, distillation helps companies build smaller and cheaper versions of their applications for customers. Yet, malicious actors weaponise this method to acquire powerful capabilities in a fraction of the time and cost required for independent development.

Protecting intellectual property like Anthropic’s Claude

Unmitigated distillation presents a severe intellectual property challenge. Because Anthropic blocks commercial access in China for national security reasons, attackers bypass regional access restrictions by deploying commercial proxy networks.

These services run what Anthropic calls “hydra cluster” architectures, which distribute traffic across APIs and third-party cloud platforms. The massive breadth of these networks means there are no single points of failure. As Anthropic noted, “when one account is banned, a new one takes its place.”

In one identified case, a single proxy network managed more than 20,000 fraudulent accounts simultaneously. These networks mix AI model distillation traffic with standard customer requests to evade detection. This directly impacts corporate resilience and forces security teams to reconsider how they monitor cloud API traffic.

Illicitly-trained models also bypass established safety guardrails, creating severe national security risks. US developers, for example, build protections to prevent state and non-state actors from using these systems to develop bioweapons or carry out malicious cyber activities.

Cloned systems lack the safeguards implemented by systems like Anthropic’s Claude, allowing dangerous capabilities to proliferate with protections stripped out entirely. Foreign competitors can feed these unprotected capabilities into military, intelligence, and surveillance systems, enabling authoritarian governments to deploy them for offensive operations.

If these distilled versions are open-sourced, the danger further multiplies as the capabilities spread freely beyond any single government’s control.

Unlawful extraction allows foreign entities, including those under the control of the Chinese Communist Party, to close the competitive advantage protected by export controls. Without visibility into these attacks, rapid advancements by foreign developers incorrectly appear as innovation circumventing export controls.

In reality, these advancements depend heavily on extracting American intellectual property at scale, an effort that still requires access to advanced chips. Restricted chip access limits both direct model training and the scale of illicit distillation.

The playbook for AI model distillation

The perpetrators followed a similar operational playbook, utilising fraudulent accounts and proxy services to access systems at scale while evading detection. The volume, structure, and focus of their prompts were distinct from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use. 

Anthropic attributed these campaigns targeting Claude through IP address correlation, request metadata, and infrastructure indicators. Each operation targeted highly differentiated functions: agentic reasoning, tool use, and coding.

One campaign generated over 13 million exchanges targeting agentic coding and tool orchestration. Anthropic detected this operation while it was still active, mapping timings against the competitor’s public product roadmap. When Anthropic released a new model, the competitor pivoted within 24 hours, redirecting nearly half their traffic to extract capabilities from the latest system.

Another operation generated over 3.4 million requests focused on computer vision, data analysis, and agentic reasoning. This group utilised hundreds of varied accounts to obscure their coordinated efforts. Anthropic attributed this campaign by matching request metadata to the public profiles of senior staff at the foreign laboratory. In a later phase, this competitor attempted to extract and reconstruct the host system’s reasoning traces.

Anthropic says a third AI model distillation campaign targeting Claude extracted reasoning capabilities and rubric-based grading data through over 150,000 interactions. This group forced the targeted system to map out its internal logic step-by-step, effectively generating massive volumes of chain-of-thought training data. They also extracted censorship-safe alternatives to politically sensitive queries to train their own systems to steer conversations away from restricted topics. The perpetrators generated synchronised traffic using identical patterns and shared payment methods to enable load balancing. 

Request metadata for this third campaign traced these accounts back to specific researchers at the laboratory. These requests often appear benign on their own, such as a prompt simply asking the system to act as an expert data analyst delivering insights grounded in complete reasoning. But when variations of that exact prompt arrive tens of thousands of times across hundreds of coordinated accounts targeting the same narrow capability, the extraction pattern becomes clear.

Massive volume concentrated in specific areas, highly repetitive structures, and content mapping directly to training needs are the hallmarks of a distillation attack.

Implementing actionable defences

Protecting enterprise environments requires adopting multi-layered defences to make such extraction efforts harder to execute and easier to identify. Anthropic advises implementing behavioural fingerprinting and traffic classifiers designed to identify AI model distillation patterns in API traffic.

IT leaders must also strengthen verification processes for common vulnerability pathways, such as educational accounts, security research programmes, and startup organisations.

Companies should integrate product-level and API-level safeguards designed to reduce the efficacy of model outputs for illicit distillation. This must be done without degrading the experience for legitimate, paying customers.

Detecting coordinated activity across large numbers of accounts is an absolute necessity. This includes specifically monitoring for the continuous elicitation of chain-of-thought outputs used to construct reasoning training data.

Cross-industry collaboration also remains essential, as these attacks are growing in intensity and sophistication. This requires rapid and coordinated intelligence sharing across AI laboratories, cloud providers, and policymakers.

Anthropic has published its findings about Claude being targeted by AI model distillation campaigns to provide a more holistic picture of the landscape and make the evidence available to all stakeholders. By treating AI architectures with rigorous access controls, technology officers can secure their competitive edge while ensuring ongoing governance.

See also: How disconnected clouds improve AI data governance

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is part of TechEx and is co-located with other leading technology events including the Cyber Security & Cloud Expo. Click here for more information.

AI News is powered by TechForge Media. Explore other upcoming enterprise technology events and webinars here.