Tested on MPS with different block sizes and step counts:
Benchmarking Dhara-70M | Device: mps:0 | Target: 50 tokens
Config | Time | Speed(TPS) | Accel | Output Preview
Original AR | 2.087s | 24.0 | 1.00x | The future of artificial intelligence is a big challenge. This world has the pot...
Diff (B1/S50) | 2.392s | 20.9 | 0.87x | The future of artificial intelligence is the most-first thing. This article was ...
Diff (B2/S25) | 1.162s | 43.0 | 1.80x | The future of artificial intelligence is the What. What1: The Future Future? ...
Diff (B5/S10) | 0.382s | 131.0 | 5.47x | The future of artificial intelligence is the This.,!!."."!!!!!!!!!!!!!!!!!!!!...
Diff (B10/S5) | 0.192s | 260.5 | 10.87x | The future of artificial intelligence is the !!!!!!!!!!!!!!!!!!!!!!!!!!!!!...
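It seems that plain AR, or diffusion configured to behave like AR (B1/S50), gives the best quality, but both are slow.
As the block size grows and the step count shrinks, generation gets faster but quality drops: B2/S25 is already incoherent, and B5/S10 and B10/S5 collapse into repeated punctuation.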
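
For anyone who wants to sanity-check the numbers, here is a minimal sketch that recomputes the Speed and Accel columns from the Time column. It assumes Speed is target tokens divided by wall time and Accel is the AR time divided by each config's time; that matches the table, but it is my reading and not taken from the benchmark script. `RUNS` and `summarize` are names made up for this example. It also checks that block size times steps equals the 50-token target for every diffusion config.

```python
TARGET_TOKENS = 50

# (label, block_size, steps, seconds), copied from the table above.
# The AR baseline has no block/step setting, so those are None.
RUNS = [
    ("Original AR", None, None, 2.087),
    ("Diff (B1/S50)", 1, 50, 2.392),
    ("Diff (B2/S25)", 2, 25, 1.162),
    ("Diff (B5/S10)", 5, 10, 0.382),
    ("Diff (B10/S5)", 10, 5, 0.192),
]


def summarize(runs, target_tokens=TARGET_TOKENS):
    """Return (label, tokens_per_second, speedup_vs_first_row) per run."""
    baseline_seconds = runs[0][3]  # first row is treated as the baseline
    rows = []
    for label, block, steps, seconds in runs:
        if block is not None:
            # Each diffusion config should cover exactly the target length.
            assert block * steps == target_tokens, label
        tps = target_tokens / seconds
        accel = baseline_seconds / seconds
        rows.append((label, round(tps, 1), round(accel, 2)))
    return rows


for label, tps, accel in summarize(RUNS):
    print(f"{label:<14} | {tps:6.1f} TPS | {accel:5.2f}x")
```

The recomputed TPS can differ from the table in the last digit, since the times shown are already rounded to milliseconds.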