Rui-Jie Zhu

ridger

AI & ML interests

None yet

Recent Activity

upvoted 2 papers 1 day ago

How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

Paper • 2604.21106 • Published 7 days ago • 7

Large Language Models Explore by Latent Distilling

Paper • 2604.24927 • Published 7 days ago • 63

upvoted a collection about 1 month ago

Nemotron-Cascade 2

Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation • 4 items • Updated 14 days ago • 50

liked a dataset about 2 months ago

stepfun-ai/Step-3.5-Flash-SFT

Viewer • Updated Mar 14 • 1.62M • 16.6k • 327

upvoted a collection about 2 months ago

Qwen3.5

21 items • Updated Mar 9 • 1.59k

upvoted a paper about 2 months ago

Helios: Real Real-Time Long Video Generation Model

Paper • 2603.04379 • Published Mar 4 • 186

liked a model 2 months ago

kernels-community/causal-conv1d

Updated 40 minutes ago • 1.53k • 3

New activity in ByteDance/Ouro-1.4B-Thinking 2 months ago

Update rope embeddings for rope_type='default'

#3 opened 2 months ago by

New activity in ByteDance/Ouro-2.6B-Thinking 2 months ago

Updated ids for bos_id, eos_id

#4 opened 2 months ago by

Added 'pad_token_id'.

#5 opened 2 months ago by

rope_type='default' excluded from ROPE_INIT_FUNCTIONS in transformers >=5.0

#6 opened 2 months ago by

Fix bos/eos token IDs + add enable_thinking to chat template

#7 opened 2 months ago by

Fix UniversalTransformerCache.get_mask_sizes for batched generation

#8 opened 2 months ago by

New activity in ByteDance/Ouro-1.4B-Thinking 2 months ago

Fix bos/eos token IDs + add enable_thinking to chat template

#4 opened 2 months ago by

Fix UniversalTransformerCache.get_mask_sizes for batched generation

#5 opened 2 months ago by

authored a paper 3 months ago

LoopViT: Scaling Visual ARC with Looped Transformers

Paper • 2602.02156 • Published Feb 2 • 12

upvoted 4 papers 3 months ago

Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation

Paper • 2602.03619 • Published Feb 3 • 28

LoopViT: Scaling Visual ARC with Looped Transformers

Paper • 2602.02156 • Published Feb 2 • 12

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 268

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published Jan 29 • 42