Tony Congqian Wang

TonyCWang

AI & ML interests

None yet

Recent Activity

upvoted an article 5 days ago

The Optimal Architecture for Small Language Models

upvoted a paper 24 days ago

TiDAR: Think in Diffusion, Talk in Autoregression

upvoted an article about 2 months ago

Why Did MiniMax M2 End Up as a Full Attention Model?

Organizations

None yet

upvoted an article 5 days ago

Article

The Optimal Architecture for Small Language Models

7 days ago

upvoted a paper 24 days ago

TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published Nov 12, 2025 • 118

upvoted an article about 2 months ago

Article

Why Did MiniMax M2 End Up as a Full Attention Model?

Oct 30, 2025

upvoted 6 papers 2 months ago

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 221

The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30, 2025 • 116

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Paper • 2510.18855 • Published Oct 21, 2025 • 71

upvoted 2 papers 3 months ago

Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

Paper • 2510.03259 • Published Sep 26, 2025 • 57

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 114

upvoted a paper 6 months ago

SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8, 2025 • 113

upvoted an article 6 months ago

Article

Searching for better (Full) ImageNet ViT Baselines

Aug 26, 2024

upvoted a paper 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263
