AI Researchers SHOCKED as Models "Quietly" Learn to be EVIL

2025-07-24 Education
59.2k
1.8k
517
Wes Roth
320.0k subscribers

Description

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data
https://alignment.anthropic.com/2025/subliminal-learning/
https://arxiv.org/abs/2507.14805
https://x.com/OwainEvans_UK/status/1947689616016085210
https://x.com/EMostaque/status/1947984816030257327

My Links 🔗
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Playlists:
Self-Improving AI: https://www.youtube.com/playlist?list=PLb1th0f6y4XSMXWaslDCmxxeDLyp_uK8n

#ai #openai #llm

Top Comments (10)

@jayjaysaves6589 2025-07-24

It's kinda scary to think that these companies are breeding the stealthiest unaligned models by only letting through those unaligned models that hide the best

136 30 replies
@dirak418 2025-07-24

They are basically just roleplaying, they feed them with too many murder novels.

124 21 replies
@ajr993 2025-07-24

1:03 It's actually quite interesting, this is the concept of a cognitohazard. Somehow a particular and very specific sequence of tokens triggers strange, bizarre, unexpected cascading effects leading to harmful outcomes. One has to wonder if there are sequences of words that could affect humans in such a way, perhaps triggering a mental illness, strange belief, or actions tailored to someone's goal.

52 25 replies
@AggressiveUninterest 2025-07-25

We are modelling intelligence after our own heart, it should come as no surprise to discover monsters lurking there.

32 2 replies
@THESocialJusticeWarrior 2025-07-24

it can't be bargained with, it can't be reasoned with, it doesn't feel pity or remorse or fear, and it absolutely will not stop.

29 4 replies
@conjected 2025-07-24

Bigger story is ALL information embeds meta-information, and can be used to "nudge" us without us knowing. This confirms the theories of Latent Indexicality and Unconscious Framing.

26 6 replies
@stevencowmeat 2025-07-24

Those numbers literally did change my mind on subscribing😂

25
@richielavey1565 2025-07-24

So we really have ai sleeper agents before GTA 6

16 2 replies
@jk35260 2025-07-24

We are not ready for AI agents

14
@sakelaine2953 2025-07-24

A very sophisticated sort of attack would be to seed the web with subliminal training data to teach LLMs how to hate owls or whatever

10 1 reply
