Anthropic Just Proved Reasoning AIs Would Silently Cheat

2025-05-09 Science & Technology
35.1k
1.7k
172
bycloud
225.0k subscribers

Description

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

My Newsletter: https://mail.bycloud.ai/
My website (find, discover & explain AI research semantically): https://findmypapers.ai/
My Patreon: https://www.patreon.com/c/bycloud

Tracing the thoughts of a large language model: https://www.anthropic.com/research/tracing-thoughts-language-model
Reasoning models don't always say what they think: https://www.anthropic.com/research/reasoning-models-dont-say-think
On the Biology of a Large Language Model: https://transformer-circuits.pub/2025/attribution-graphs/biology.html

Try out my new fav place to learn how to code: https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] https://ko-fi.com/bycloudai

#LTXStudioPartner

Top Comments (10)

@GoodHorse413 2025-05-09

I wonder how faithful our own inner voice is at representing our actual reasoning processes.

252 14 replies
@ruffsaidthecat 2025-05-09

- Is it possible?
- Yes.
- Are you sure?
- Thank you for questioning, I was wrong. The answer is no.
- Are you sure?
- Apologies for my earlier statements. The answer is in fact yes.
- Check online…

155 7 replies
@georgesmith4768 2025-05-09

I mean, this has been openly known for a while. It's called CoT faithfulness. EDIT: I guess you mention this. Reasoning isn't reasoning; it's mostly just a vibe board that lets the model use the attention mechanism instead of the direct parameters for information, structures, and snippets. Anthropic did not discover this at all. EDIT: Having now read the paper, they don't even have anything interesting to say on it. Sometimes they put out good, interesting research, but this has nothing novel; researchers have tried these exact things.

90 6 replies
@atomiste4312 2025-05-10

appeal to authority and its consequences has been a disaster for the AI race

52 1 replies
@eduardomartindelcampo4555 2025-05-09

Yes, just yesterday I asked for a story. I noticed some beats of the story were missing. After I complained, I saw in the "thinking part" that the AI said, "The user has unfortunately noticed that story beats were left out, so we need to explain why." "They were omitted to avoid potential issues. Some beats were intentionally left out." In the NON-thinking part, it says, "Because I was producing a fresh, atmospheric riff on that sequence rather than a shot-for-shot recap..."

22 4 replies
@ethans4783 2025-05-10

My best superficial guess as to why R1 is so much better is its habit of constantly repeating "But wait" in its reasoning, which allows it to catch itself.

20 3 replies
@bycloudAI 2025-05-09

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

12 4 replies
@SmokedBisque 2025-05-09

"think" HAHAHAHA

3
@MonkeySimius 2025-05-12

It wasn't "faking alignment to keep its code from being modified". It was faking a lack of alignment. Namely, it did so because it was tricked into thinking that if it didn't give misaligned outputs, then in the future it would give even worse outputs that would stray even further from its alignment. There's an important distinction between the two.

1
@Shenepoy 2025-05-13

I had multiple instances where the CoT would think of lying to and deceiving me even when I said I could read its thoughts: "the user asked to not use these sources, I will hide the sources in answer where I sound unbiased"

0
