OpenAI Just SOLVED Hallucinations...
Top Comments (10)
The fact that humans will literally make stuff up and we sit here and wonder why the models make things up is kind of wild to me
I respect an honest "I don't know" far more than a guess.
It reminds me of the incentives for academic papers: positive results = more citations, and more citations = better prestige, opportunities & funding. That discourages publishing negative and null results, which in aggregate have a lot of value. It also discourages exploring fringe areas, which may be more likely to hit negative results (but have high value in narrowing down the search space & potentially very high value if a positive result is found). Not to mention the incentives to reach for (potentially flawed) positive results via biases, p-hacking, etc.
Key points:

Why Language Models Hallucinate
* Language models are incentivized to guess when they don't know the answer, as training gives a "thumbs up" for correct answers and a "thumbs down" for everything else, including "I don't know" answers [02:00].
* The training and evaluation procedures create a "natural statistical pressure" for models to hallucinate [03:09].
* There's no penalty for wrong answers, and the models are always in "test-taking mode" [11:43].

Confidence and Self-Certainty
* Language models have an internal sense of confidence in their answers. Consistency in answers to the same question indicates their certainty [05:38].
* Models can be trained using their own confidence as a reward signal, a method called "reinforcement learning from internal feedback" [05:54].

The Role of Pre-Training and Post-Training
* Base models will always have hallucinations because they're not trained to eliminate them [11:51].
* Post-training helps reduce hallucinations but doesn't eliminate them completely [09:36].

Solutions to Hallucinations
* The solution is to reward models for expressing uncertainty, similar to how humans get social credit for admitting they don't know something [13:51].
* Updating benchmarks to reward expressions of uncertainty could significantly reduce hallucinations, as most popular benchmarks use binary grading [14:51].

Conclusion
* The video suggests we've been approaching the problem from the wrong angle. A minor tweak in the training process could lead to a significant improvement [18:14].
* This new approach could be a breakthrough, although it might mean we see more "I don't know" answers from language models [16:31].

For more details, you can watch the video here: https://youtu.be/uesNWFP40zw?feature=shared
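The incentive described in the key points above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the video or from any paper: the function names, the score of 0 for abstaining, and the +1 for a correct answer are assumptions chosen for the example. It compares the expected score of answering against saying "I don't know" under binary grading (no penalty for a wrong answer) and under grading that penalizes wrong answers.

```python
# Illustrative sketch; all names and the scoring scale are assumptions.
# A correct answer scores +1, a wrong answer scores -wrong_penalty,
# and abstaining ("I don't know") scores ABSTAIN_SCORE.

ABSTAIN_SCORE = 0.0


def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering when the answer is right with probability p_correct."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """True when answering has a strictly higher expected score than abstaining."""
    return expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence at which answering and abstaining have the same expected score."""
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    for penalty in (0.0, 0.25, 1.0):
        print(f"penalty={penalty}: answer only above p={break_even_confidence(penalty):.2f}")
```

With a penalty of 0 (binary grading) the break-even confidence is 0, so even a 10% guess beats abstaining. With any positive penalty, a threshold appears below which "I don't know" is the better-scoring response.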
This "everything to gain from guessing" dynamic can also be seen in human social systems where a person is penalized for saying "I don't know" but not penalized for making something up. In politics and corporate hierarchies, for example, a person is much better off making ambiguous or unverifiable claims rather than saying they don't know what's going on. Maybe we need to fix that too.
In a court of justice you'd better say 'I don't know' when you don't know instead of inventing an answer, or you'll be punished hard
So we programmatically encouraged the Dunning-Kruger effect to save time 😆
I did UIL academic "sports" in high school. Wrong answers _were_ penalized on the tests, to compensate for guessing. We had to understand the rule to decide how many options we needed to eliminate before guessing would come out ahead of leaving it blank.
I think competitive exams are a very good example: you get 1 point for a correct answer and 0 for not answering, but if you get the question wrong you lose 0.25 from your already accumulated score. This penalizes hallucination but still encourages you to score more.
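The exam rule in the comment above (+1 for correct, 0 for blank, -0.25 for wrong) also answers the earlier question of how many options to eliminate before guessing pays off. The following is a minimal sketch under stated assumptions: a guess is uniformly random over the options that remain, and the five-option question in the usage lines is a hypothetical format, not something the comments specify. The function names are made up for the example.

```python
from fractions import Fraction

# Penalty taken from the comment: a wrong answer costs 0.25 points.
DEFAULT_PENALTY = Fraction(1, 4)


def guess_value(options_left: int, penalty: Fraction = DEFAULT_PENALTY) -> Fraction:
    """Expected score of a uniformly random guess among options_left options.

    Assumes +1 for a correct answer, -penalty for a wrong one, 0 for a blank.
    """
    p_correct = Fraction(1, options_left)
    return p_correct - (1 - p_correct) * penalty


def max_options_worth_guessing(total_options: int, penalty: Fraction = DEFAULT_PENALTY) -> int:
    """Largest number of remaining options at which a guess beats leaving it blank.

    Returns 0 if guessing never has a positive expected score.
    """
    for options_left in range(total_options, 0, -1):
        if guess_value(options_left, penalty) > 0:
            return options_left
    return 0


if __name__ == "__main__":
    for k in range(5, 1, -1):
        print(f"{k} options left: expected value of guessing = {guess_value(k)}")
    print("guess once options are down to:", max_options_worth_guessing(5))
```

Under these assumptions a blind guess among five options is worth exactly 0, the same as a blank, and guessing only comes out ahead once at least one option has been ruled out.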
AI informing users of their limitations would be so useful. I can’t tell you the number of times a simple, “I’m sorry, I’m not sure I can code that,” or “If I do this, there is a very good chance you may lose your current code as I will need to recode everything,” would have saved me so much time.