I would suggest running 4-bit on an M1 with 8 GB of RAM, and bf16 on an M4 Pro with 64 GB. Interestingly, 4-bit holds up surprisingly well on our benchmark. We are considering larger models (14B and 30B MoE Qwen), since they handle more complex n-hop queries with less hallucination. We'll also be releasing the training and data-generation code next week.
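For a rough sense of why those configurations line up, here's a back-of-envelope sketch of weight memory (assuming a ~7B dense model for the 4-bit/bf16 case; the model sizes here are illustrative, not a statement about our exact checkpoint, and this ignores KV cache and runtime overhead):

```python
def weight_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB: params * (bits / 8) bytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

# Hypothetical 7B model: 4-bit quant vs bf16 (16-bit)
print(round(weight_gb(7, 4), 1))    # ~3.3 GiB -> leaves headroom on an 8 GB M1
print(round(weight_gb(7, 16), 1))   # ~13.0 GiB -> too tight for 8 GB, fine on 64 GB
print(round(weight_gb(30, 16), 1))  # a 30B model in bf16 is ~55.9 GiB, hence the 64 GB machine
```

Actual peak usage will be higher once you add the KV cache and activations, but the ratio between 4-bit and bf16 is the point.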