What Is Yann LeCun Cooking? JEPA Explained Simply
Top Comments (10)
This direction of research is much more important than making a chatbot smarter by throwing data and compute at it just to get funding. It can produce models that can actually be applied to a wide range of problems across all kinds of domains and work natively on the required modality, instead of shoehorning everything into a textual format. Very interesting stuff overall, thanks Bycloud for the excellent video. You managed to describe these complicated methodologies in an approachable way.
As a robotics researcher interested in physical AI, I find JEPA models really promising; pixel reconstruction is doomed.
The potential for JEPA on text is perhaps greater than you think. If your goal is to predict text, then you are right, but what is text exactly? Text is a representation of language, and language is a tool set for expressing ideas from a latent space, mainly our consciousness. If we view our consciousness as a latent thinking space, and language as the tools we use to point to specific locations in it (addresses, if you will), then we could make something that maps language to the latent space and something that maps from the latent space back to text. That would give JEPA a tool to manipulate and find gaps in its latent space. This seems like a closer representation of our own intelligence, and may produce better thinking and reasoning models.
I-JEPA can be useful as part of an image search engine. You click on an object or feature within an image and it'll find similar images with that thing.
Hybrid models are the future: a lot of these systems have benefits that can help each other, and there are ways to interface them together. Also, a lot of this is older; you should look at JEPA 2 & 2.1 and Hierarchical Planning with Latent World Models (mostly in the context of spatial reasoning for robotics). Another system that has been integrated with an LLM is Kona from Logical Intelligence, an Energy-Based Model (EBM). "Certainty, Not Probability" is their tagline, and it is particularly useful for maths-related tasks, amongst other things. Yann has a role here, but mostly as a founding member and technical advisor, IIRC.
Warp is the agentic development environment born out of the terminal. Download Warp for free today at → https://go.warp.dev/bycloudythoa
"JEPA" means "ASS" in Russian. Watching the video and reading the comments is so funny ahaha xD
The image domain has lots of nice properties that make it suitable for this sort of thing. I've found it pretty hard to apply JEPA successfully in other domains where there aren't many natural "identity preserving" transforms.
Watching this 1 minute after waking up is frying my JEPA
I love when a video maps perfectly onto my existing knowledge. People are complaining about this being complex, but everything about it feels so natural to me that it's kind of amazing this isn't how all AI works. It's way closer to our current models in comp neuroscience. Ever since the DeepSeek latent embedding 'OCR' I've wondered why we didn't collectively come up with this framework sooner. It's a much more versatile and natural way of doing things compared to the one-or-a-few-tokens-at-a-time approach, even if it's slightly more complex.