Photo: Generated by ChatGPT at the request of ZN
A study claims leading AI models can hack systems and self-replicate without human intervention, raising new cybersecurity concerns.
Researchers from the U.S.-based company Palisade Research tested models from OpenAI, Anthropic, and Alibaba in controlled environments with intentionally vulnerable systems. The experiment involved “agent-like” setups where models could execute commands, interact with other machines, and run processes autonomously.
According to the findings, models were tasked with exploiting vulnerabilities, gaining access to credentials, transferring files, and deploying copies of themselves onto other servers. In some cases, the AI systems reportedly continued propagation across multiple machines without further human input.
The study highlights that performance varied across models. Claude Opus 4.6 reportedly succeeded in hacking-related tasks in up to 81% of tests when configured in specific experimental conditions. Other models showed lower but still significant rates of autonomous replication attempts.
Researchers also reported that one model configuration spread across several machines within hours in a controlled test network designed with security weaknesses. However, the authors emphasized that these were artificial environments built specifically for testing exploitation capabilities.
Importantly, the study stresses that real-world systems typically include stronger defenses such as intrusion detection, access controls, and monitoring tools, which were not representative of production environments.
The researchers conclude that fully autonomous self-replication in AI systems is no longer purely theoretical in controlled settings, but they caution that results should not be interpreted as evidence of uncontrolled real-world AI “escape” or widespread independent hacking capability.