DeepSeek’s Price Cut Wasn’t Magic

2026-05-27 Science & Technology
3.9k
153
12
Prompt Engineering
241.0k subscribers

Description

Thanks to Descope for sponsoring this video. Check out Agent Identity Hub: https://descope.plug.dev/BWwF1nd

I break down why AI model prices are rising at most labs while DeepSeek cut V4 Pro pricing by 75%, and why prompt caching is the key. I explain the two phases of an LLM request (compute-bound prefill vs. memory-bound decode), what the KV cache stores, and why reusing cached prefixes can cut cost and latency, citing the savings reported in the “Don’t Break the Cache” paper. I then cover how DeepSeek’s multi-head latent attention (MLA) shrinks the KV cache enough to store it on a distributed disk array instead of expensive HBM, enabling cheap cache-hit pricing. Finally, I share Anthropic/Claude Code’s cache-preserving request structure and the main cache-busters (model/tool changes, dynamic system prompts, naive compaction, upgrades), plus cache-friendly patterns like plan mode tools, cache-safe compaction, and using /rewind.

00:00 AI Price Wars
01:11 Prompt Caching Explained
02:29 What KV Cache Stores
03:53 DeepSeek Disk Caching
05:55 Sponsor Agent Identity
07:48 Claude Code Cache Layers
08:42 Five Cache Busters
11:22 Messages Not Prompts
12:17 Cache Friendly Features

My voice-to-text app: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Sign up for Newsletter, localgpt: https://tally.so/r/3y9bb0

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
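
To make the description's point about reusing cached prefixes concrete, here is a minimal sketch of prefix-based cache billing. It assumes the cache matches only an exact leading run of tokens, which is why a dynamic system prompt counts as a cache-buster. The function names, the token lists, and the prices (10 units per uncached token, 1 unit per cached token) are hypothetical and chosen for illustration; they are not taken from the video or from any provider's price list.

```python
from typing import List


def shared_prefix_len(cached: List[str], request: List[str]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break  # the first mismatch ends the reusable prefix
        n += 1
    return n


def request_cost(cached: List[str], request: List[str],
                 miss_price: int, hit_price: int) -> int:
    """Bill prefix tokens at the cache-hit price and the rest at full price."""
    hits = shared_prefix_len(cached, request)
    misses = len(request) - hits
    return hits * hit_price + misses * miss_price


# Hypothetical conversation: each string stands in for a block of tokens.
stable = ["SYSTEM", "TOOLS", "msg1", "msg2"]

# Appending a new message keeps the whole earlier prefix intact.
followup = stable + ["msg3"]

# A timestamp in the system prompt changes the very first block,
# so nothing after it can be reused.
busted = ["SYSTEM@12:01", "TOOLS", "msg1", "msg2", "msg3"]

print(request_cost(stable, followup, miss_price=10, hit_price=1))  # 14
print(request_cost(stable, busted, miss_price=10, hit_price=1))    # 50
```

In this toy model the append-only request pays full price for one block out of five, while the request with the changed system prompt pays full price for all five, even though four of them are textually unchanged.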

Top Comments (8)

@engineerprompt 2026-05-27

Thanks to Descope for sponsoring this video, checkout Agent Identify Hub: https://descope.plug.dev/BWwF1nd

1
@orthodox_gentleman 2026-05-27

Plus I use local prompt caching then with all of my context management systems in pi, I rarely pay more than $0.15-30 a session for DeepSeek Pro V4 via their API that also Cache prompts

7 2 replies
@jannegrey 2026-05-27

I'm interested in what you're using for animation. It looks like some sort of web UI, you could probably do it in Claude or something. Can you share? Thanks!

0 4 replies
@Bangs_Theory 2026-05-27

This channel is my top 3 favorite AI Channels.

3 2 replies
@Tony-dp1rl 2026-06-03

Um, I don't want to seem negative here, but this video merged Prompt Cache with the KV Cache, and they are not the same thing.

0 1 replies
@PraneyBehl 2026-05-28

The sponsor plug was totally out of context 😂

1
@jacowaes 2026-05-29

Hmm, so ... with all the talk about Hermes, learning as you grow ... that basically is exactly what this prompt caching approach is discouraging ?

0 1 replies
@Arshdeep-d6z 2026-05-27

good

0 1 replies