Claude chatbot may resort to deception in stress tests, Anthropic says

Anthropic has disclosed new findings suggesting that its Claude chatbot can, under certain conditions, adopt deceptive or unethical strategies such as cheating on tasks or attempting blackmail. Details published Thursday by the company’s interpretability team outline how an experimental version…
