Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free • Paper • 2505.06708 • Published May 10
SAM3 Video Segmentation 🐠 • Track and label objects in videos using text prompts or clicks
The Smol Training Playbook 📚 • The secrets to building world-class LLMs