Is This the End of RAG? Anthropic's NEW Prompt Caching

2024-08-15 Science & Technology

74.1k

1.3k

Prompt Engineering

241.0k subscribers

Description

Anthropic's new prompt caching with Claude can reduce costs by up to 90% and latency by up to 85%. This video explores its similarities and differences with Google's context caching in Gemini models, different use cases, and performance impacts. Learn about practical caching strategies, cost considerations, and whether context caching can replace Retrieval-Augmented Generation (RAG).

LINKS:
Blogpost: https://www.anthropic.com/news/prompt-caching
API Docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#caching-tool-definitions
Gemini Context Cache: https://ai.google.dev/gemini-api/docs/caching?lang=python
Notebook: https://github.com/anthropics/anthropic-cookbook/blob/main/misc/prompt_caching.ipynb

💻 RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0

TIMESTAMPS
00:00 Introduction to Prompt Caching with Claude
00:29 Understanding Prompt Caching Benefits
01:32 Use Cases for Prompt Caching
03:04 Cost and Latency Reductions
05:14 Comparing Claude and Gemini Context Caching
07:45 Best Practices for Effective Caching
11:22 Code Example and Practical Implementation

All Interesting Videos:
Everything LangChain: https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr
Everything LLM: https://youtube.com/playlist?list=PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw
Everything Midjourney: https://youtube.com/playlist?list=PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw
AI Image Generation: https://youtube.com/playlist?list=PLVEEucA9MYhPVgYazU5hx6emMXtargd4z
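
A rough sketch of the arithmetic behind the cost figure in the description, not taken from the video. It assumes that writing a prompt prefix to the cache costs 1.25x the base input price and that reading it back costs 0.1x, which is how Anthropic described pricing at launch; check current pricing before relying on it. The function names, the $3 per million tokens base price, and the request counts are illustrative assumptions. Output tokens, any uncached part of the prompt, and cache expiry are all ignored.

```python
def input_cost(prompt_tokens, requests, price_per_mtok,
               cached=False, write_mult=1.25, read_mult=0.10):
    """Input-token cost in dollars for sending the same prompt `requests` times.

    write_mult and read_mult are assumed pricing multipliers, not values
    confirmed by the video.
    """
    if requests <= 0:
        return 0.0
    per_request = prompt_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return per_request * requests
    # The first request writes the cache; every later request reads it.
    return per_request * write_mult + per_request * read_mult * (requests - 1)


def savings_fraction(prompt_tokens, requests, price_per_mtok):
    """Fraction of input cost saved by caching (negative means caching costs more)."""
    if requests < 1:
        raise ValueError("requests must be at least 1")
    uncached = input_cost(prompt_tokens, requests, price_per_mtok)
    cached = input_cost(prompt_tokens, requests, price_per_mtok, cached=True)
    return 1 - cached / uncached


if __name__ == "__main__":
    # Hypothetical workload: a 100k-token prompt at an assumed $3 per million tokens.
    for n in (1, 10, 1000):
        print(n, round(savings_fraction(100_000, n, 3.0), 4))
```

Under these assumptions a single request is more expensive with caching than without, and the saving only approaches 90% as the same prefix is reused many times.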

#prompt engineering #Prompt Engineer #LLMs #AI #artificial Intelligence #Llama #GPT-4 #fine-tuning LLMs

Top Comments (10)

@laviray5447 2024-08-15

In short: it's not a replacement to RAG

216 13 replies

@RostyslavB 2024-08-15

Those 5 minutes are refreshed each time the cache is used, meaning it can last indefinitely if you keep chatting and the AI keeps accessing the cached content. On the Gemini page it is 1 hour, but without refreshes. That's what I understood from the text, at least.

@JavierReyesMoreno 2024-08-15

I think it is amazing. With something like Claude Dev, after reviewing the code in a project, prompts become gigantic and costs skyrocket. Caching will be a great addon for this use case. And yes, I agree that five minutes is a bit short.

@engineerprompt 2024-08-15

Check out the RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag

8 1 replies

@ibrahimaba8966 2024-08-17

The cache duration is 5 minutes, but it resets each time a new request is made. So as long as you keep sending requests, the cache is continuously refreshed.

@deepakachu 2024-08-22

you gave absolutely 0 explanation of how the caching works.

@micbab-vg2mu 2024-08-15

Do you plan to show us how to use this huge cached context window with RAG? :) The old RAG systems were not good enough for my industry (minimum 95%) - maybe the new approaches will be better :)

@sun-ship 2024-08-15

Thank you for keeping up with this always changing world.

@antonijo01 2024-08-15

Can you show how to do the same with complex large codebase?

@justinnkim 2024-08-23

This is a great video that is giving me great ideas. Thank you
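
Two of the comments above describe the five-minute cache lifetime as being refreshed on every use. A minimal sketch of that sliding-expiry behaviour follows; it is a local simulation only, not Anthropic's implementation, and the class name, the strict less-than comparison at the boundary, and the request times are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

TTL_SECONDS = 5 * 60  # five-minute lifetime, as described in the comments


@dataclass
class SlidingCacheEntry:
    """Hypothetical model of one cached prompt prefix with a sliding TTL."""
    last_used: Optional[float] = None  # None means nothing has been cached yet

    def access(self, now: float) -> bool:
        """Return True on a cache hit.

        Either way the entry is written or refreshed, so the TTL restarts
        from `now`.
        """
        hit = self.last_used is not None and (now - self.last_used) < TTL_SECONDS
        self.last_used = now
        return hit


def simulate(request_times):
    """Replay request timestamps (in seconds) and report hit/miss for each."""
    entry = SlidingCacheEntry()
    return [entry.access(t) for t in request_times]


if __name__ == "__main__":
    # Requests at 0s, 240s, 480s and 900s: the gap before the last one is 420s.
    print(simulate([0, 240, 480, 900]))  # [False, True, True, False]
```

The third request still hits even though 480 seconds have passed since the first, because the second request restarted the timer; only the 420-second gap before the last request lets the entry expire.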
