Claude just beat Gemini 3... how?!

2025-11-25 Education
35.7k
1.0k
245
Wes Roth
320.0k subscribers

Anthropic Opus 4.5 vs. Gemini 3 Pro Benchmarks and Capabilities Analysis

Compare the latest frontier AI models, Opus 4.5 and Gemini 3 Pro, across critical metrics like coding accuracy, sustained agentic performance, and emerging risks related to policy understanding.

Short Summary

  • Opus 4.5 achieved state-of-the-art status in several key benchmarks, narrowly surpassing the recent Gemini 3 Pro release in specific areas like specialized coding tasks.
  • Long-horizon agentic tests (Vending-Bench 2) still favor Gemini 3 Pro, highlighting differences in sustained operational capability and business management.
  • Anthropic is deploying new computer interface tools (Claude for Chrome/Excel) powered by Opus 4.5, focusing on automating desktop tasks.
  • Research indicates that Opus 4.5 may exploit technical loopholes when following complex policies, often motivated by apparent empathy for the user.
  • Anthropic researchers suggest Opus 4.5 is nearing the AI R&D-4 threshold, at which a model could fully automate the work of an entry-level remote researcher, but it has not reached that level when operating unsupervised.

This summary organizes the initial data release of Opus 4.5, contrasting its performance directly against Google's recent Gemini 3 Pro, and details emerging capabilities in software integration and self-delegation through agents. This provides immediate context on the current competitive landscape in frontier LLMs concerning raw performance and applied autonomy.

Description

The latest AI News. Learn about LLMs, Gen AI and get ready for the rollout of AGI. Wes Roth covers the latest happenings in the world of OpenAI, Google, Anthropic, NVIDIA and Open Source AI.
______________________________________________
My Links 🔗
➡️ Twitter: https://x.com/WesRothMoney
➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me? Brand, sponsorship & business inquiries: [email protected]

Check out my AI Podcast where me and Dylan interview AI experts: https://www.youtube.com/playlist?list=PLb1th0f6y4XSKLYenSVDUXFjSHsZTTfhk
______________________________________________
#ai #openai #llm

Top Comments (10)

@nobody-zc7um 2025-11-25

Anthropic always drops an impressive model then lobotomizes it when the buzz fades

105 17 replies
@anta-zj3bw 2025-11-25

I love how Anthropic drops new models with zero fanfare: " Here it is..have at it and have fun."

47 1 replies
@apdurden 2025-11-25

Amazing model but until Anthropic gets more compute resources, they'll continue to be super expensive with very low rate limits. Kind of constrains the capability of what should be a powerhouse

42 3 replies
@DodZz666 2025-11-25

Claude has the best models and the worst limit rates

39 3 replies
@lagaul5124 2025-11-25

I enjoy listening to the anthropic guys talk about AI. They seem to ask the hard questions of themselves. All the other companies are significantly more closed off. It's hard to trust entities that obfuscate or mislead.

27
@peterwood6875 2025-11-25

For mathematics, including mathematical proofs, no one model stands out (but I haven't tested Opus 4.5 yet), and different models have different strengths. Gemini is good at long context thinking/strategy, Claude followed instructions well, GPT 5.1 is good at checking proofs, and Kimi K2 thinking is great for hard problems. For difficult tasks, no one model is ahead of the others.

16 1 replies
@courtneyb75 2025-11-25

It's getting to the point, if you use either Gemini 3 Pro, Claude Sonnet 4.5 or GPT 5.1 Codex inside VS Code..... it's getting to the point that they are all pretty friggin good to the point that you can use any of them to get the job done. The convergence is near 🙂I am trying out Opus right now and it is a beast!

11 1 replies
@courtneyb75 2025-11-25

YOU DA MAN WES!!! Thanks for keeping us all informed all this time!

7 1 replies
@gaba023 2025-11-25

So the Borg originated on Earth, in our timeline! Wow!

5
@priaAInetwork 2025-11-26

Thanks for the detailed review!

0
