
Orchestration Over Architecture: What Stanford Found

2026-05-04 Science & Technology
3.5k
206
16
Prompt Engineering
241.0k subscribers

Description

Thanks to Data Impulse for sponsoring this video: https://dataimpulse.com/?utm_source=youtube&utm_medium=video&utm_campaign=engineerprompt

Two new papers from Stanford and Tsinghua just put hard numbers on something most agent builders have been feeling: the orchestration code wrapping your LLM now drives more performance variation than the model itself. Same model, six times the gap, depending entirely on what researchers are calling the harness. If you build agents, the lever you should be pulling is almost never the one you've been reaching for.

LINKS:
Tsinghua University: https://arxiv.org/abs/2603.25723
Stanford University: https://arxiv.org/abs/2603.28052v1
My voice to text App: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Sign up for Newsletter, localgpt: https://tally.so/r/3y9bb0

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

00:00 Harness Beats Model
01:12 What Is a Harness
02:44 What's Wrong with Harness Today
04:02 Ablations and Compute Costs
05:25 Natural Language Migration Win
06:29 Sponsor Data Impulse
08:02 Meta Harness Auto Optimization
10:00 Transferable Harness Insight
11:31 Subtraction Principle
13:12 Audit Checklist for Builders

Top Comments (10)

@silentobserver964 2026-05-04

A wrapper now becomes an architecture 😂

15
@omineirotech 2026-05-10

The truth is: we software engineers love making simple things unnecessarily complicated, inventing new buzzwords for concepts that already exist just to sound more “cutting-edge,” and over-engineering stuff that should stay dead simple for one reason only: ego. And honestly? About 90% of the “new” AI engineering best practices that pop up every single week are straight-up bullshit. We’ve always done this as developers, but with the fear and anxiety AI created, this nonsense has probably tripled or quadrupled over the last year. What’s even crazier is seeing all these “AI gurus” selling AI courses and even full-blown graduate programs, while the entire space is still chaotic and nobody has — or will have — a definitive answer anytime soon. LLMs are getting major updates literally every week. And a lot of those updates are already shipping with built-in support for many of the “advanced practices” these gurus are trying to teach. Meaning people are paying thousands of dollars to learn workflows that will probably become obsolete in a matter of days because the models themselves will handle most of that orchestration natively by default. These guys are teaching things that could be outdated by next week. Nowadays, the first thing I check before consuming AI-related content is the upload date. If the video is two weeks old, there’s already a huge chance it’s outdated or full of nonsense. Now imagine an “AI graduate program” whose curriculum was planned six months ago…

11 3 replies
@Stewz66 2026-05-04

My favorite engineer-tuber.

8 1 replies
@egoincarnate 2026-05-05

Haiku beating Opus with a better harness would be a big deal, but it doesn't appear to be shown here. (Re: 9:33) In the paper 76.4% was for Meta Harness on Opus not Haiku. Haiku was only 37.6%. "On Opus 4.6, Meta-Harness discovers a harness achieving 76.4% ... On the weaker Haiku 4.5 model, the improvement is larger: Meta-Harness achieves 37.6%"

3
@HassanAllaham 2026-05-05

Thanks for the very good content 🌹 and also thanks for using a dark background 🌹🌹🌹

2
@kaxoxinho 2026-05-04

Great presentation and thank you for your insights. What software do you use for these beautiful slides?

1 1 replies
@parapadirapa 2026-05-14

1. How do I know within Hermes with Deepseek v4 what is in my context that shouldn't be there? 2. The same for tool use.. is there a tool or skill that produces insights? 3. Same with verifiers and logic, where do I look for that within Hermes?

0
@gauravvij137 2026-05-18

Auditing the harness before even thinking about a model switch is the biggest takeaway from this video and hands down the best approach. Devs usually offset agent issues by switching to a bigger, more capable model, which may hide the underlying issues, but sooner or later they come out in the form of edge cases. I also recently published a write-up on how we used Neo AI engineer to optimize a customer support RAG agent. It covers architectural changes like RAG similarity score optimization and context enhancement that were implemented before even considering a model switch. The changes alone led to a 19% performance increase and the model switch led to a 79% cost reduction. So I guess both go hand in hand.

0
@jarad4621 2026-05-08

Awesome stuff, keep it going with this harness engineering content. You've convinced me to work on my own harness now. Would you recommend starting with PI as a good minimalist base, or is that still too much and it's best to start with even less?

0
@BjornLarson-z8f 2026-05-16

its an interesting take. ive seen plenty of videos explaining the same thing, but with different words. seeing another perspective on the same ideas is good. give it a year and there will be a "standard" way to communicate these concepts, likely with several new words or abbreviations for abstracting specifics.

0
