Anthropic Just Proved Reasoning AIs Would Silently Cheat

2025-05-09 Science & Technology

35.1k

1.7k

172

bycloud

228.0k subscribers

Description

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

My Newsletter https://mail.bycloud.ai/
My website: find, discover & explain AI research semantically https://findmypapers.ai/
My Patreon https://www.patreon.com/c/bycloud

Tracing the thoughts of a large language model https://www.anthropic.com/research/tracing-thoughts-language-model
Reasoning models don't always say what they think https://www.anthropic.com/research/reasoning-models-dont-say-think
On the Biology of a Large Language Model https://transformer-circuits.pub/2025/attribution-graphs/biology.html

Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members: 🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] [email protected]
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Bitcoin (BTC)] 3JFMJQVGXNA2HJE5V9qCwLiqy6wHY9Vhdx
[Ethereum (ETH)] 0x3d784F55E0bE5f35c1566B2E014598C0f354f190
[Litecoin (LTC)] MGHnqALjyU2W6NuJSSW9fTWV4dcHfwHZd7
[Bitcoin Cash (BCH)] 1LkyGfzHxnSfqMF8tN7ZGDwUTyBB6vcii9
[Solana (SOL)] 6XyMCEdVhtxJQRjMKgUJaySL8cGoBPzzA2NPDMPfVkKN
[Ko-fi] https://ko-fi.com/bycloudai

#LTXStudioPartner

#bycloud #bycloudai #ai lying #ai cheating #llm lying #llm cheating #llm explained #llm sycophancy

Top Comments (10)

@GoodHorse413 2025-05-09

I wonder how faithful our own inner voice is at representing our actual reasoning processes.

252 13 replies

@ruffsaidthecat 2025-05-09

- Is it possible? - Yes. - Are you sure? - Thank you for questioning, I was wrong. The answer is no. - Are you sure? - Apologies for my earlier statements. The answer is in fact yes. - Check online…

155 7 replies

@georgesmith4768 2025-05-09

I mean, this has been openly known for a while. It’s called CoT faithfulness. EDIT: I guess you mention this. Reasoning isn’t reasoning; it’s mostly just a vibe board that lets it use the attention mechanism instead of the direct parameters for information, structures, and snippets. Anthropic did not discover this at all. EDIT: Having now read the paper, they don’t even have anything interesting to say on it. Sometimes they put out good, interesting research, but this has nothing novel; researchers have tried these exact things.

90 6 replies

@Atomiste_Music 2025-05-10

appeal to authority and its consequences has been a disaster for the AI race

52 1 replies

@eduardomartindelcampo4555 2025-05-09

Yes, just yesterday I asked for a story. I noticed some beats of the story were missing. I saw in the "thinking part" that after I complained the AI said, "The user has unfortunately noticed that story beats were left out, so we need to explain why." "They were omitted to avoid potential issues. Some beats were intentionally left out." in the NON-thinking part, it says "Because I was producing a fresh, atmospheric riff on that sequence rather than a shot-for-shot recap..."

22 4 replies

@ethans4783 2025-05-10

My best superficial guess why R1 is so much better is from their choice of constantly repeating "But wait" in their reasoning which allows them to catch themselves

20 3 replies

@bycloudAI 2025-05-09

Check out LTX Video 13B now and experience the latest video gen breakthrough: https://bit.ly/ltxvbycloud

12 4 replies

@cvs2fan 2025-05-09

great video dood

@MonkeySimius 2025-05-12

It wasn't "faking alignment to keep its code from being modified". It was faking a lack of alignment. Namely, it did so because it was tricked into thinking that if it didn't give misaligned outputs, then in the future it would give even worse outputs that would stray even further from its alignment. There's an important distinction between the two.

@Shenepoy 2025-05-13

I had multiple instances where the CoT would think of lying and deceiving me even when I said I could read its thoughts: "the user asked to not use these source, I will hide the sources in answer where I sound unbiased"
