4 Layer "AI Harness" For LLMs (+54%). Really?
Top Comments (10)
Love that half this comment section is just independent harness inventors realizing they were not insane, just early.
Of note: Hardly any help at all for the Qwen3.6 27B dense model, but significant help for Qwen3.6 35B MoE. It would be nice to tailor a minimalistic version of these principles to optimize Qwen3.6 35B MoE specifically for running Hermes agent, because the MoE runs so much faster on local hardware. This would be very useful to the tons of people currently relying on OpenRouter to run their Hermes agent.
I think the correct term here is shim. They created a software shim to close the gap between the LLM and the actual harness.
Another layer and we won't even need the LLM.
This gives me the vibes from the ACE paper that this channel discussed last year, though in this paper the approach is expanded. Thanks for presenting this!
Heh, this is very, very like what I've been building for the last year. I'm actually happy to see this, makes me feel like I'm not crazy in my architecture
I've been talking with AI about an idea like this for a year or two for my own local AI. It's cool to see legit researchers doing something like this.
Great study revealing the two sides of LLM execution: the model’s ability to reason, which is usually scoped at a task level, and the system’s ability to actually channel that reasoning into reliable execution. Many tasks are trivial enough for smaller/local models, but they get blocked by weak interfaces, missing feedback loops, poor tooling, and no structured way to recover from mistakes. Through a self-healing loop like the one demonstrated, you can make offline models perform much closer to frontier models in bounded workflows by running eval loops, detecting failures, correcting them, and codifying those failure cases. Once the common edge cases are mapped, repeatable agentic loops that do not change much can run with near-deterministic reliability, because the system no longer depends only on raw model intelligence; it relies on a harness that has learned how to guide, validate, and repair execution. This applies broadly to any model scenario where you are building agentic systems: the frontier is not just bigger models, but better loops around them.
Pretty stoked on the public repo! I might have to make use of this. Thanks, as always.
This maps really well to raiOS. The paper’s core point is “adapt the interface, not the model”: many agent failures come from weak runtime contracts, unclear tools, bad action realization, and missing trajectory control. raiOS is basically trying to take that idea down to the OS layer: a deterministic, capability-gated harness around an AI agent, with typed system state, audit, recovery, and local authority. LIFE-HARNESS shows the pattern in benchmarks; raiOS tries to make it a real operating-system architecture.
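For readers who want something concrete, the "detect failures, correct them, codify the failure cases" loop that several comments above describe could be sketched roughly like this. It is only an illustration under stated assumptions, not the paper's or any repo's actual implementation: the `TOOLS` contract, `validate`, `run_with_repair`, and `stub_model` are all hypothetical names invented for this example, and the "model" is a plain Python function standing in for an LLM call.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tool contract (an assumption for this sketch):
# tool name -> argument names the tool requires.
TOOLS: Dict[str, List[str]] = {"search": ["query"], "read_file": ["path"]}

Action = dict
Model = Callable[[str, List[str]], Action]


def validate(action: Action) -> Optional[str]:
    """Return an error message if the action breaks the contract, else None."""
    tool = action.get("tool")
    if tool not in TOOLS:
        return f"unknown tool {tool!r}; choose one of {sorted(TOOLS)}"
    args = action.get("args", {})
    missing = [name for name in TOOLS[tool] if name not in args]
    if missing:
        return f"tool {tool!r} is missing arguments {missing}"
    return None


def run_with_repair(
    model: Model, task: str, max_attempts: int = 3
) -> Tuple[Optional[Action], List[str]]:
    """Ask the model for an action, validate it, and retry with feedback.

    Returns the first valid action (or None if every attempt failed)
    together with the list of recorded failure messages.
    """
    failures: List[str] = []
    for _ in range(max_attempts):
        # The model sees every earlier failure as feedback.
        action = model(task, failures)
        error = validate(action)
        if error is None:
            return action, failures
        failures.append(error)  # record the failure case for the next attempt
    return None, failures


def stub_model(task: str, feedback: List[str]) -> Action:
    """Stand-in for an LLM: picks a wrong tool first, then corrects itself."""
    if not feedback:
        return {"tool": "grep", "args": {"query": task}}
    return {"tool": "search", "args": {"query": task}}


if __name__ == "__main__":
    action, failures = run_with_repair(stub_model, "harness")
    print(action)
    print(failures)
```

The design choice here is that the harness, not the model, owns validation and the retry budget, so the same loop can wrap any model that accepts a task plus a list of earlier error messages.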