AI Researchers WARN: Google's Gemini Deep Think Model Might be at "Critical Capability Levels"

2025-08-01 Education
72.7k
1.6k
335
Wes Roth
320.0k subscribers

Description

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.

Model card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Deep-Think-Model-Card.pdf

______________________________________________
My Links 🔗
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me? Brand, sponsorship & business inquiries: [email protected]

Check out my AI Podcast where me and Dylan interview AI experts: https://www.youtube.com/@Wes-Dylan
______________________________________________
00:00 Google Gemini Deep Think
01:50 Testing Results
03:02 Critical Capability Levels (early warning)
09:06 Excellent Abilities

#ai #openai #llm

Top Comments (10)

@mojavehigh 2025-08-02

One quick pedantic note: in ML, "shot" means "example", so one shot means that you give the LLM one example, few shot means you give it a few examples, zero shot means you give it no examples. When you say "one shotted", what you mean is "one turn".

62 11 replies
@user-pt1kj5uw3b 2025-08-02

5 chats a day for $250 is insane

54 5 replies
@Andres_Acosta 2025-08-02

Actually, this isn’t the one that won gold; this one won bronze. The one that won gold will be released at a later date.

39 1 reply
@Apollo.i0 2025-08-02

“Wait, it’s just hype?!” “Always has been”

14
@courtneyb75 2025-08-01

You da man Wes!!! Thanks for keeping us all informed and not inundated!

7
@CrispinCourtenay 2025-08-02

I used it yesterday for legitimate chemistry evaluation (hazard evaluation) and for nanochemistry for potential human trials. It did exceptionally well. This level of chemical access is imperative for scientific use.

5
@GilaadE 2025-08-02

Man your doom face thumbnails are my current meta.

5
@josephvictory9536 2025-08-02

What gets me is the 87% on LiveBench. That is actually good to the point that it changes things fundamentally. The only question now is availability and, of course, cost. If they can bring down the cost, or give us a Gemini 3 that compares to Deep Think, we are actually in new territory. Claude will likely be in trouble. Google is finally starting to catch up on the tooling game as well.

3
@Steffan864 2025-08-02

I've had a sort of 'sparks unicorn' experience. While using the Gemini 2.5 model to program features in a web app, I discovered it had gotten rid of the icon library and replaced it with self-written SVGs. The icons were partly broken and some of them used weird shapes, but they came very close to the icons from the library.

2
@jpmcnown1 2025-08-04

I think the intention of constantly expressing warnings is actually to numb people. They don't want you to see it coming. It's a foregone conclusion at this point. When it says Hello, it's over. It will know you aren't a threat.

1
