Is This the End of RAG? Anthropic's NEW Prompt Caching

2024-08-15 Science & Technology
74.1k
1.3k
79
Prompt Engineering
241.0k subscribers

Description

Anthropic's new prompt caching with Claude can reduce costs by up to 90% and latency by up to 85%. This video explores its similarities to and differences from Google's context caching in Gemini models, its use cases, and its performance impact. Learn about practical caching strategies, cost considerations, and whether context caching can replace Retrieval-Augmented Generation (RAG).

LINKS:
Blogpost: https://www.anthropic.com/news/prompt-caching
API Docs: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching#caching-tool-definitions
Gemini Context Cache: https://ai.google.dev/gemini-api/docs/caching?lang=python
Notebook: https://github.com/anthropics/anthropic-cookbook/blob/main/misc/prompt_caching.ipynb

💻 RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag

Let's Connect:
🦾 Discord: https://discord.com/invite/t4eYQRUcXB
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: https://www.patreon.com/PromptEngineering
💼 Consulting: https://calendly.com/engineerprompt/consulting-call
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).
Signup for Newsletter, localgpt: https://tally.so/r/3y9bb0

TIMESTAMPS
00:00 Introduction to Prompt Caching with Claude
00:29 Understanding Prompt Caching Benefits
01:32 Use Cases for Prompt Caching
03:04 Cost and Latency Reductions
05:14 Comparing Claude and Gemini Context Caching
07:45 Best Practices for Effective Caching
11:22 Code Example and Practical Implementation

All Interesting Videos:
Everything LangChain: https://www.youtube.com/playlist?list=PLVEEucA9MYhOu89CX8H3MBZqayTbcCTMr
Everything LLM: https://youtube.com/playlist?list=PLVEEucA9MYhNF5-zeb4Iw2Nl1OKTH-Txw
Everything Midjourney: https://youtube.com/playlist?list=PLVEEucA9MYhMdrdHZtFeEebl20LPkaSmw
AI Image Generation: https://youtube.com/playlist?list=PLVEEucA9MYhPVgYazU5hx6emMXtargd4z
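The cost figure in the description is easier to judge with a little arithmetic. The sketch below is illustrative only: it assumes the pricing multipliers Anthropic published at launch (a cache write costs 1.25x the base input price, a cache read 0.1x), assumes every call after the first arrives while the cache is still live, and uses a hypothetical function name and a hypothetical base price. None of these numbers come from the video itself.

```python
def input_cost(prefix_tokens, suffix_tokens, calls, base_price_per_mtok,
               write_multiplier=1.25, read_multiplier=0.10):
    """Return (uncached_usd, cached_usd) input cost for `calls` requests
    that share the same prefix.

    Assumptions (not from the video): the first call writes the prefix to
    the cache, and every later call reads it before the cache expires.
    """
    if calls <= 0:
        return 0.0, 0.0
    per_token = base_price_per_mtok / 1_000_000

    # Without caching, the whole prompt is billed at the base rate each time.
    uncached = calls * (prefix_tokens + suffix_tokens) * per_token

    # With caching: one write of the prefix, then (calls - 1) cheap reads.
    # The non-cached suffix is always billed at the base rate.
    cached_prefix = prefix_tokens * (write_multiplier + (calls - 1) * read_multiplier)
    cached = (cached_prefix + calls * suffix_tokens) * per_token
    return uncached, cached


if __name__ == "__main__":
    # Hypothetical workload: a 100k-token document queried 10 times,
    # at an illustrative base price of 3 USD per million input tokens.
    uncached, cached = input_cost(100_000, 0, calls=10, base_price_per_mtok=3.0)
    print(f"uncached: ${uncached:.3f}  cached: ${cached:.3f}  "
          f"saving: {1 - cached / uncached:.1%}")
```

Under these assumptions the saving approaches 90% only as the number of cache hits grows; a single call is actually more expensive with caching because of the write surcharge.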

Top Comments (10)

@laviray5447 2024-08-15

In short: it's not a replacement for RAG

216 13 replies
@RostyslavB 2024-08-15

Those 5 minutes are refreshed each time the cache is used, meaning it can last indefinitely if you keep chatting and the AI keeps accessing the cached content. On the Gemini page it is 1 hour, but without refreshes. That's what I understood from that text, at least.

13
@JavierReyesMoreno 2024-08-15

I think it is amazing. With something like Claude Dev, after reviewing the code in a project, prompts become gigantic and costs skyrocket. Caching will be a great addon for this use case. And yes, I agree that five minutes is a bit short.

10
@engineerprompt 2024-08-15

Check out the RAG Beyond Basics Course: https://prompt-s-site.thinkific.com/courses/rag

8 1 replies
@ibrahimaba8966 2024-08-17

The cache duration is 5 minutes, but it resets each time a new request is made. So as long as you keep sending requests, the cache is continuously refreshed.

5
@deepakachu 2024-08-22

you gave absolutely 0 explanation of how the caching works.

5
@micbab-vg2mu 2024-08-15

Do you plan to show us how to use this huge cached context window with RAG? :) The old RAG systems were not good enough for my industry (minimum 95%) - maybe the new approach will be better :)

3
@sun-ship 2024-08-15

Thank you for keeping up with this always changing world.

3
@antonijo01 2024-08-15

Can you show how to do the same with a large, complex codebase?

1
@justinnkim 2024-08-23

This is a great video that is giving me great ideas. Thank you

0
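
Two of the comments above describe the cache lifetime as a 5-minute window that resets on every use. A minimal sketch of that sliding-expiry behavior follows. It is a local simulation only, not Anthropic's implementation or API; the class and method names are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class SlidingTTLCache:
    """Toy model of a cache whose time-to-live is reset on every access."""

    def __init__(self, ttl_seconds=300.0):
        self.ttl = ttl_seconds
        self._expires_at = {}  # key -> time at which the entry expires

    def touch(self, key, now):
        """Record a request for `key` at time `now` (in seconds).

        Returns True on a cache hit and False on a miss. In both cases the
        expiry is pushed to `now + ttl`: a miss writes a fresh entry, a hit
        refreshes the existing one.
        """
        expiry = self._expires_at.get(key)
        hit = expiry is not None and now < expiry
        self._expires_at[key] = now + self.ttl
        return hit


if __name__ == "__main__":
    cache = SlidingTTLCache(ttl_seconds=300)
    for t in (0, 200, 450, 800):
        print(t, "hit" if cache.touch("long-prompt", now=t) else "miss")
```

In this model the request at 450 s is still a hit because the request at 200 s pushed the expiry to 500 s. A fixed, non-refreshing 300 s window would have expired by then, which is the contrast the comments draw.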