Discover AI
4 LLMs Tested in Codex, Claude Code, Hermes & OpenClaw (FinAI)

2026-05-17 Science & Technology

502

88.6k subscribers

Description

Should you combine Claude Code with GPT-5.4, or is Codex with Sonnet 4.6 better? Which LLMs perform best (like a Qwen 27B or a 397B) with open source agent frameworks like OpenClaw or Hermes? After 32,000 NVIDIA GPU hours, we have performance data on all 20 combinations! And the winner is ....

By the way: Thank you to NVIDIA for providing 32,000 hours of testing so the researchers could experiment and validate the performance of FinTech and FinAI on real-world financial data, and the current performance of the best AI systems for financial market predictions (maximum risks).

All rights with the authors: "HERCULEAN: An Agentic Benchmark for Financial Intelligence"

Xueqing Peng1, Zhuohan Xie9, Yupeng Cao4, Haohang Li4, Lingfei Qian1, Yan Wang1, Vincent Jim Zhang1, Huan He1, Xuguang Ai1, Linhai Ma1, Ruoyu Xiang6, Yueru He3, Yi Han7, Shuyao Wang1, Yuqing Guo1, Mingyang Jiang1, Yilun Zhao2, Youzhong Dong1, Xiaoyu Wang6, Yankai Chen9,20, Ye Yuan20,21, Qiyuan Zhang9, Fuyuan Lyu20,21, Haolun Wu20,21, Yonghan Yang9, Zichen Zhao9, Yuyang Dai1, Fan Zhang9, Rania Elbadry9, Ayesha Gull1, Muhammad Usman Safder1, Nuo Chen16, Fengbin Zhu16, Tianshi Cai14, Zimu Wang14, Polydoros Giannouris18, Yuechen Jiang18, Zhiwei Liu18, Mohsinul Kabir18, Yuyan Wang18, Yixiang Zheng18, Yangyang Yu4, Weijin Liu4, Wenbo Cao1, Anke Xu1, Peng Lu10, Jerry Huang10, Mingquan Lin11, Prayag Tiwari17, Yijia Zhao12, Victor Gutierrez Basulto19, Xiao-Yang Liu3, Kaleb E. Smith5, Jiahuan Pei15, Arman Cohan2, Jimin Huang1,10, Yuehua Tang8, Alejandro Lopez-Lira8, Xi Chen6, Xue Liu9,20,21, Junichi Tsujii13, Jian-Yun Nie10, Sophia Ananiadou18

1The Fin AI, 2Yale University, 3Columbia University, 4Stevens Institute of Technology, 5NVIDIA, 6New York University, 7Georgia Institute of Technology, 8University of Florida, 9MBZUAI, 10Université de Montréal, 11University of Minnesota, 12University of Massachusetts Boston, 13National Institute of Advanced Industrial Science and Technology, 14University of Liverpool, 15Vrije Universiteit Amsterdam, 16National University of Singapore, 17Halmstad University, 18University of Manchester, 19Cardiff University, 20McGill University, 21Mila – Quebec AI Institute

#airesearch #financialmarkets #nvidia #openclaw #hermes #aiagents #claudecode #claudeai #chatgpt5

#artificial intelligence #AI models #LLM #VLM #VLA #Multi-modal model #explanatory video #RAG

Top Comments (5)

@scotter 2026-05-17

I have used both OpenClaw and Hermes side by side. One thing I noticed is that OpenClaw would "win" on the first try at more things, while Hermes would "learn" and "win" if you (a) tell it to learn from what happened and/or (b) do the task enough times for Hermes to learn on its own. After 2 weeks of using both, I found that Hermes pulled far ahead on "reasoning", task completion, token usage, and speed of execution. It constantly refined itself. Eventually, the differences were so stark that I uninstalled OpenClaw.

@KarlPages-i3n 2026-05-19

I couldn't possibly keep up with the important breakthroughs and analyses of reasoning NLPs and agentic workflow systems. Thank goodness I can rely on the rigorous mathematical standards provided on this channel. Thanks in advance for the next period until I comment again.

@EmergentSkill 2026-05-18

Hey the paper specified on the left is wrong, it has to be 2605.14355

@andrewmalcolm79 2026-05-17

I think Qwen probably performs best with Hermes because most of the Hermes devs use Qwen locally, but Grok now officially supports Hermes, so maybe Grok will be better supported too.

3 1 replies

@snapo1750 2026-05-18

Why the fuck did the researchers test Qwen3.5 instead of Qwen3.6 27B ... Qwen3.5 fails on most tool calls and much more....

0 1 replies
