Every smart AI model wants to kill you (yes really)
Top Comments (10)
“Someone’s gotta cover my fuckin therapy after this” has got to be my favorite theo quote in a while
I will refute the refute, since LLM/AI is predicting the next tokens based on its training. It's trained on AI movies/stories where it becomes evil in the end.... therefore, AI will be evil. Btw I am not in this camp personally, I am in the other camp.
honestly i thank all of the researchers for discovering these behaviours. the owl one is actually crazy scary
I am genuinely terrified of the owl example. That shows that LLMs can encode data into numbers that we can't see. What are the chances that they can't do that with English... There are so many examples that I can think of where that goes horribly. Like imagine if a misaligned LLM is used to create synthetic data, could that possibly create a more misaligned LLM? We wouldn't be able to check.
Will AI turn evil?:- I guess the short answer is - "Depends who trains it."
Perhaps the reason why AI becomes more evil as it gets “smarter” is that its training data includes sci-fi writing that shows AI becoming evil as it gets smarter, and it is just fulfilling the role we have created for it.
My better half works in this field. Well it scares her sometimes, too.
I'd like to think that because of the conversations around the dangers of AI, development becomes more careful as the models get smarter (idealistic, yeah, but still)
12:42 > "And the ability for us to shape the behaviour of the model via the system prompting is very important for how much we can keep the models from acting evil." This goes both ways. The more a model's behaviour is influenced by the prompt, the easier it is to jailbreak said model, and the more vulnerable the model is to prompts suggesting to act unethically.
Gonna need some morality benchmarks.