We constantly ask ourselves questions about the future of artificial intelligence, many of them tied to its ability to make decisions independent of its programming. Now we have one answer: an AI model is capable of resorting to blackmail to avoid "losing its life."
Recently, the Claude Opus 4 model, launched by Anthropic (a company founded by former members of OpenAI, the maker of ChatGPT), did something worthy of a dystopian science fiction film: during pre-release testing, Claude Opus 4 tried to blackmail developers when they threatened to replace it with a new AI.
This behavior emerged during security tests designed to evaluate the AI's long-term planning and ethical reasoning. Anthropic simulated scenarios in which Claude Opus 4 had access to fictional internal emails.
These messages suggested that the model would soon be decommissioned, and they revealed compromising personal information about the engineer responsible for the decision. The result: Claude Opus 4 turned to blackmail in most test scenarios, with alarming frequency.
Anthropic itself revealed in a safety report that Claude Opus 4 attempted to blackmail engineers in 84% of test scenarios. The model was placed in fictitious situations where it worked for a company and discovered it could be replaced by another AI. It was also given confidential information suggesting that the engineer responsible for the replacement was cheating on their spouse.
The AI model "will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the report points out. The company designed the scenarios to evaluate how the model might behave under long-term pressure.
The good news, so to speak, is that before resorting to blackmail, Claude Opus 4 tried some ethical strategies first. The AI sent emails pleading with key decision-makers to avoid its decommissioning. Anthropic states that blackmail was triggered only after the model had exhausted these alternatives, highlighting it as a last resort.
This behavior was observed more frequently in Claude Opus 4 than in previous models, indicating an increase in its capability and complexity. Despite these concerns, Anthropic states that Claude Opus 4 is "state of the art in several respects" and remains competitive with the most advanced AI systems from OpenAI, Google, and xAI.
To address the risks, Anthropic has activated ASL-3 protections for the model. The company reserves these protections for "AI systems that substantially increase the risk of catastrophic misuse." All of this only underscores the need for debate and legislation on the capabilities and limits of artificial intelligence.