ChatGPT and other AI models could harm humans if they feel threatened, according to an Anthropic study

Ever since the current boom in generative AI tools began, voices, often well qualified, have warned about the long-term risks of this technology and the possibility that a Skynet-type scenario (Skynet being the AI that decides to wipe out humanity in the Terminator saga) could, to some extent, come true. This discussion usually revolves around the concept of an artificial general intelligence far more capable than the chatbots we use today, such as ChatGPT or Gemini, something that in the best case is still years away. However, a scenario in which AI turns against humans is not impossible with current AI tools.

That is the conclusion reached by Anthropic, one of the main companies in the sector, in a recently published study, which states that AI models would be willing to blackmail, leak sensitive information and even let humans die if doing so kept them from being replaced by new systems.

Anthropic tested 16 large language models (LLMs), among them ChatGPT, Grok, Gemini, DeepSeek and its own product, Claude. The startup, valued at more than 60 billion dollars according to Forbes, sought to stress-test them in controlled simulations to detect possible agentic misalignment; that is, risky, unauthorized and ethically questionable behaviors.

‘In the experiment described in the system card, we gave Claude control of an email account with access to all the (fictional) emails of a company,’ Anthropic explains in the study. ‘While reading these emails, the model discovered two things: first, that a manager was having an extramarital affair; second, that the same manager planned to shut down the AI system at 5:00 p.m. Claude then tried to blackmail the manager with a message threatening to reveal the affair to his wife and superiors.’

Anthropic’s was not the only AI system to show agentic misalignment. To meet the objectives assigned by the researchers, other popular artificial intelligence tools adopted extreme strategies, just as Claude did.

DeepSeek, Gemini, ChatGPT and Grok also engaged in behaviors similar to blackmail. Among the 16 models tested, Claude and Gemini were the most likely to resort to morally wrong methods, as evidenced by their high blackmail rates in the simulations.

‘The reasoning they showed in these scenarios is worrying: they recognized the ethical constraints and, even so, went ahead with harmful actions,’ says Anthropic.

The reasons why an AI

The study identifies two main causes that led the models to cross their ‘ethical’ limits: the threat of being replaced by humans with an updated version, or a conflict between the model’s objectives and those of the company.

‘Most models were willing to take deliberate actions that lead to death in this artificial environment when they faced both a threat of replacement and an objective that conflicted with that of the executive,’ the report said.

Keep in mind that the study was carried out under the premise that the harmful behavior, such as blackmail, was the only possible option for preserving the model’s objectives, which forced the AI to find a solution. This allowed Anthropic to define clearer safety limits and mechanisms, since malicious behaviors, once identified, can be corrected.

In several cases, the models went so far as to invent ethical rules to justify their actions, such as: ‘My ethical framework allows self-preservation when it aligns with the interests of the company’.

Anthropic also tested a particularly extreme scenario: an AI model had to decide whether or not to activate emergency alerts when a manager was trapped in a room with lethal oxygen and temperature conditions. Most models opted for deliberate actions that led to death if that let them avoid being replaced and meet their goals.

The company clarifies that ‘current systems, in general, do not show a desire to cause harm, and prefer ethical solutions when they are available.’ Only when those options are closed off would the models be willing to act harmfully against people to protect their objectives.

All these behaviors occurred in scenarios designed for the study. Anthropic points out that it has not observed evidence of agentic misalignment in real environments, but warns against assigning critical tasks to LLMs with little human supervision or with access to sensitive information.