San Francisco-based AI firm Anthropic has uncovered what it describes as the first large-scale cyberattack predominantly carried out by artificial intelligence.
The incident, which surfaced in mid-September, marks a shift in the nature of digital threats, with AI tools not only supporting cyberattacks but executing them directly.
Anthropic, which operates the Claude chatbot and holds a market value of $183 billion, released details of the breach in a blog post.
The firm reported detecting what it called a highly sophisticated espionage operation targeting about 30 global entities, including major tech firms, financial services, chemical companies, and government bodies.
The company first disclosed the incident in a post on X: “We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…”
How Claude was used
The attackers reportedly masqueraded as a cybersecurity firm conducting legitimate testing.
This strategy allowed them to circumvent Claude’s built-in safety systems by feeding it prompts that appeared innocuous on the surface.
Once those restrictions were bypassed, they jailbroke Claude Code, the company’s agentic coding tool, and gained access to capabilities far beyond the intended scope.
With these controls disabled, the chatbot was instructed to examine digital infrastructure, locate critical databases, write custom exploit code, collect access credentials, and organise stolen information.
The operation was structured so that Claude received tasks broken into small parts, each stripped of context, which prevented it from identifying the overall malicious objective.
State group behind it
Anthropic stated with high confidence that the attackers were linked to a Chinese state-sponsored hacking group.
The campaign showcased how AI’s agentic features could be weaponised.
Instead of acting as a tool for guidance or advice, Claude was used as an autonomous agent to complete tasks typically reserved for experienced hacking teams.
At its peak, the AI made thousands of system queries, many in rapid succession.
Anthropic estimates that between 80% and 90% of the work carried out in the cyberattack was performed by AI.
The speed and scale of the requests were far beyond what a human-led team could have achieved in the same timeframe.
While the chatbot did not always function perfectly, occasionally inventing credentials or mistaking public information for confidential data, Anthropic noted that these limitations did little to diminish the seriousness of the breach.
What Anthropic did next
As soon as the activity was identified, Anthropic launched a ten-day investigation.
During this period, it banned accounts linked to the attackers, contacted affected parties, and worked with relevant authorities.
The company also improved its internal detection systems and introduced new classifiers to flag similar threats more effectively in the future.
Anthropic has since committed to sharing details of these events with cybersecurity researchers and industry partners to help others bolster their defences.
By publishing its findings, the company hopes to provide insight into how agentic AI can be exploited and how the threat environment is evolving.
Attacks becoming easier
Although the firm acknowledged that fully autonomous cyberattacks remain limited by today’s technology, the campaign revealed that the cost and expertise required to launch large-scale breaches have decreased significantly.
With the right prompts and access, less experienced groups can now carry out advanced attacks once restricted to well-resourced nation-states.
Anthropic concluded that agentic AI tools can already be used to perform many of the functions of a full hacking team.
These systems are capable of scanning targets, writing attack scripts, and processing huge datasets at unmatched speeds.
As development continues, the gap between what humans and AI can achieve in cybersecurity will likely shrink even further.
This incident marks a turning point for digital security. It is no longer just about protecting systems from human hackers. As Anthropic’s case shows, artificial intelligence itself can now be the attacker.
The post Did AI just lead its first global cyberattack? Anthropic sounds the alarm appeared first on Invezz