News
Inside FlashOptim, the new trick that cuts LLM training memory by 50 percent
3+ hour, 39+ min ago (780+ words) Training large language models usually requires a cluster of GPUs. FlashOptim changes the math, enabling full-parameter…
How sparse attention is solving AI's memory bottleneck
1+ week, 3+ hour ago (291+ words) LLMs are getting pulled into longer and messier workflows, handling large inputs and generating longer and longer token sequences. Coding assistants need to keep track of repositories, issue threads, terminal outputs, and earlier edits. Research agents need to carry facts…
RePo provides an innovative solution to long-context tasks in LLMs
1+ mon, 2+ hour ago (200+ words) A new technique developed by researchers at Sakana AI, called Context Re-Positioning (RePo), allows large language models (LLMs) to dynamically reorganize their internal view of their input data to better handle long-context tasks. LLMs process information in a strictly linear…
How MIT’s new framework solves LLMs' memory barrier and 'context rot' problem
1+ mon, 1+ week ago (435+ words) Recursive Language Models (RLMs), a new framework developed by researchers at MIT CSAIL, provide a solution to the limited context window of large language models (LLMs). This approach enables models to process arbitrarily long prompts without incurring massive memory costs…
Inside Nvidia's new technique to optimize long-context inference and continual learning
1+ mon, 3+ week ago (512+ words) By treating language modeling as a continual learning problem, the TTT-E2E architecture achieves the accuracy of full-attention Transformers…
Meta’s new VL-JEPA model shifts from generating tokens to predicting concepts
1+ mon, 4+ week ago (413+ words) Researchers at Meta have introduced VL-JEPA, a vision-language model built on a Joint Embedding Predictive Architecture (JEPA). Unlike traditional models that focus on generating text word-by-word, VL-JEPA focuses on predicting abstract representations of the world. Second, real-time tasks like live…
How reinforcement learning changed LLM tool-use
2+ mon, 4+ day ago (229+ words) Tool-use has been an important part of the development of large language models (LLMs) since the release of ChatGPT in 2022. Tools enable LLMs to interact with their environment and access information that goes beyond their internalized knowledge. Here is my…