Anthropic's Claude artificial intelligence becomes smarter and more mischievous

Anthropic has released its latest generative artificial intelligence (GenAI) models, Claude Opus 4 and Sonnet 4, saying they set new standards for reasoning but also incorporate safeguards against undesirable behavior, AFP reported.

“Claude Opus 4 is our most powerful model to date and the best coding model in the world,” Anthropic CEO Dario Amodei said at the startup's first developer conference in San Francisco.

Opus 4 and Sonnet 4 were described as “hybrid” models capable of quick responses as well as more thoughtful results that take a little longer to get right.

Founded by former OpenAI engineers, Anthropic currently focuses its efforts on cutting-edge models that are particularly well suited to generating code and are used mainly by businesses and professionals.

Unlike ChatGPT and Google's Gemini, the Claude chatbot does not generate images and is very limited in terms of multimodal capabilities (understanding and generating different media, such as audio or video).

The startup, which counts Amazon among its major investors, is valued at over $61 billion and promotes the responsible and competitive development of generative AI.

In line with that dual commitment, Anthropic shows a degree of transparency that is rare in Silicon Valley.

The company published a report on the security tests conducted on Claude 4, including the conclusions of an independent research institute that had recommended against deploying an early version of the model.

“We found instances where the model attempted to write self-propagating worms, fabricate legal documentation, and leave hidden notes for future versions of itself, all in an attempt to undermine its developers' intentions,” the Apollo Research team warned.

“All of these attempts would likely not be effective in practice,” it added.

Anthropic said in the report that it had implemented "safeguards" and "additional monitoring for harmful behavior" in the version it released.

However, Claude Opus 4 “sometimes takes extremely harmful actions, such as attempting to (...) blackmail people it believes are trying to shut it down.”

It also has the potential to report users who break the law to the police.

According to the company, the scheming behavior was rare and required effort to provoke, but was more common than in earlier versions of Claude.


The future of artificial intelligence

Since OpenAI's ChatGPT burst onto the scene in late 2022, various GenAI models have been battling for supremacy.

Anthropic's announcement came right after Google and Microsoft's annual developer conferences, where the tech giants unveiled their latest AI innovations.

GenAI tools answer questions or perform tasks based on simple, conversational commands.

The current craze in Silicon Valley is AI “agents” designed to perform computer or online tasks autonomously.

“We will focus on agents beyond the noise,” said Mike Krieger, the Instagram co-founder recently hired as Anthropic's chief product officer.

Anthropic is no stranger to exaggerating the prospects for AI.

In 2023, Dario Amodei predicted that so-called “artificial general intelligence” (capable of human-level thinking) would emerge within 2-3 years. At the end of 2024, he extended that timeline to 2026 or 2027.

He also predicted that artificial intelligence would soon write most, if not all, computer code, enabling one-person tech startups whose digital agents would create the software.

At Anthropic, “more than 70% of (proposed code changes) are written by Claude Code,” Krieger told reporters.

“In the long run, we will all have to come to terms with the idea that everything humans do will ultimately be done by AI systems,” Amodei added.

“It's going to happen.”

Realizing the potential of GenAI could lead to strong economic growth and “tremendous inequality,” with society determining how evenly wealth is distributed, Amodei said. | BGNES