DeepSeek’s Price Cut Wasn’t Magic

2026-05-27 Science & Technology

3.9k

153

241.0k subscribers

Description

Thanks to Descope for sponsoring this video. Check out Agent Identify Hub: https://descope.plug.dev/BWwF1nd

I break down why AI model prices are rising at most labs while DeepSeek cut V4 Pro pricing by 75%, and why prompt caching is the key. I explain the two phases of an LLM request (compute-bound prefill vs. memory-bound decode), what the KV cache stores, and why reusing cached prefixes can cut cost and latency, citing the savings reported in the “Don’t Break the Cache” paper. I then cover how DeepSeek’s multi-head latent attention (MLA) shrinks the KV cache enough to store it on a distributed disk array instead of expensive HBM, enabling cheap cache-hit pricing. Finally, I share Anthropic/Claude Code’s cache-preserving request structure and the main cache-busters (model/tool changes, dynamic system prompts, naive compaction, upgrades), plus cache-friendly patterns like plan mode tools, cache-safe compaction, and using /rewind.

00:00 AI Price Wars
01:11 Prompt Caching Explained
02:29 What KV Cache Stores
03:53 DeepSeek Disk Caching
05:55 Sponsor Agent Identity
07:48 Claude Code Cache Layers
08:42 Five Cache Busters
11:22 Messages Not Prompts
12:17 Cache Friendly Features

My voice to text App: whryte.com
Website: https://engineerprompt.ai/
RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h
💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
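To make the prefix-reuse idea in the description concrete, here is a minimal sketch of why a stable prefix matters for cost. It is an illustration under stated assumptions, not how any provider actually implements or bills caching: the function names, the per-token prices (arbitrary integer units), and the one-string-per-token representation are all hypothetical, and real caches typically match on tokenized blocks rather than individual items.

```python
from typing import Sequence


def shared_prefix_len(cached: Sequence[str], request: Sequence[str]) -> int:
    """Count leading items that are identical in both sequences."""
    n = 0
    for a, b in zip(cached, request):
        if a != b:
            break
        n += 1
    return n


def input_cost(cached: Sequence[str], request: Sequence[str],
               miss_price: int, hit_price: int) -> int:
    """Bill the shared prefix at the cache-hit price and the rest at the miss price."""
    hits = shared_prefix_len(cached, request)
    misses = len(request) - hits
    return hits * hit_price + misses * miss_price


# Hypothetical prices in arbitrary units per token (not real pricing).
MISS_PRICE = 10
HIT_PRICE = 1

# Stand-in for a long, static system prompt plus tool definitions.
system = ["sys"] * 1000
turn1 = system + ["user: hi"]

# Next turn appends to the previous request, so the whole prefix is reusable.
stable = turn1 + ["assistant: hello", "user: next"]

# Same turn, but a dynamic value was injected at the very start of the prompt.
busted = ["time=12:01"] + stable

stable_cost = input_cost(turn1, stable, MISS_PRICE, HIT_PRICE)
busted_cost = input_cost(turn1, busted, MISS_PRICE, HIT_PRICE)
print(stable_cost, busted_cost)
```

In this toy setup the append-only request reuses all 1001 previously seen items, while the request with a changed first item reuses none, which is the intuition behind keeping system prompts and tool lists static and appending new messages at the end.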

#prompt engineering #Prompt Engineer #LLMs #AI #artificial Intelligence #Llama #GPT-4 #fine-tuning LLMs

Top Comments (8)

@engineerprompt 2026-05-27

Thanks to Descope for sponsoring this video. Check out Agent Identify Hub: https://descope.plug.dev/BWwF1nd

@orthodox_gentleman 2026-05-27

Plus, I use local prompt caching along with all of my context management systems in pi, so I rarely pay more than $0.15-30 a session for DeepSeek Pro V4 via their API, which also caches prompts.

7 2 replies

@jannegrey 2026-05-27

I'm interested in what you're using for animation. It looks like some sort of web UI, you could probably do it in Claude or something. Can you share? Thanks!

0 4 replies

@Bangs_Theory 2026-05-27

This channel is in my top 3 favorite AI channels.

3 2 replies

@Tony-dp1rl 2026-06-03

Um, I don't want to seem negative here, but this video merged Prompt Cache with the KV Cache, and they are not the same thing.

0 1 replies

@PraneyBehl 2026-05-28

The sponsor plug was totally out of context 😂

@jacowaes 2026-05-29

Hmm, so ... with all the talk about Hermes, learning as you grow ... that basically is exactly what this prompt caching approach is discouraging ?

0 1 replies

@Arshdeep-d6z 2026-05-27

good

0 1 replies



