yuemingPan committed on
Commit 83db203 · verified · 1 Parent(s): aa0fe20

Update README.md

  1. README.md +70 -3
README.md CHANGED
@@ -2,8 +2,75 @@
  license: mit
  ---
 
- Project Page: https://yuemingpan.github.io/SFD.github.io/
 
- Arxiv: https://arxiv.org/pdf/2512.04926
 
- Github: https://github.com/yuemingPAN/SFD
+ # Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion
 
+ ## 🚩 Overview
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/yuemingPAN/SFD/main/images/teaser_v5.png" width="95%">
+ </p>
+
+ <div align="center" style="max-width:900px; text-align:justify; font-size:14px; line-height:1.5;">
+ <p>
+ <strong>(a) Overview of Semantic-First Diffusion (SFD).</strong>
+ Semantics (dashed curve) and textures (solid curve) follow asynchronous denoising trajectories.
+ SFD operates in three phases:
+ <span style="color:#d62728;">Stage I – Semantic initialization</span>, where semantic latents denoise first;
+ <span style="color:#4472c4;">Stage II – Asynchronous generation</span>, where semantics and textures denoise jointly but asynchronously, with semantics ahead of textures;
+ <span style="color:#2ca02c;">Stage III – Texture completion</span>, where only textures continue refining.
+ After denoising, the generated semantic latent <b>s₁</b> is discarded, and the final image is decoded solely from the texture latent <b>z₁</b>.
+ <strong>(b) Training convergence on ImageNet 256×256 without guidance.</strong>
+ SFD converges approximately <b>100×</b> and <b>33.3×</b> faster than DiT-XL/2 and LightningDiT-XL/1, respectively.
+ </p>
+ </div>
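The three-stage schedule described in the caption above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a flow-matching-style time convention in which 0 is pure noise and 1 is the clean latent (suggested by the s₁ / z₁ notation), and a single shared progress value `u` running from 0 to 1 + Δt. The names `async_timesteps`, `async_schedule`, `u`, `t_s`, and `t_z` are hypothetical; Δt = 0.3 is the offset reported in this README.

```python
from typing import List, Tuple


def async_timesteps(u: float, delta_t: float = 0.3) -> Tuple[float, float, str]:
    """Map a shared progress value u to (t_s, t_z, stage).

    Assumed convention (not taken from the SFD code): t = 0 is pure noise,
    t = 1 is the clean latent, and u runs from 0 to 1 + delta_t.
    """
    # Semantic time starts immediately and saturates once it reaches 1.
    t_s = min(max(u, 0.0), 1.0)
    # Texture time starts delta_t later, so it trails the semantic time.
    t_z = min(max(u - delta_t, 0.0), 1.0)

    if u < delta_t:
        stage = "I"    # semantic initialization: only semantics move
    elif u <= 1.0:
        stage = "II"   # asynchronous generation: both move, semantics ahead
    else:
        stage = "III"  # texture completion: only textures move
    return t_s, t_z, stage


def async_schedule(num_steps: int, delta_t: float = 0.3) -> List[Tuple[float, float, str]]:
    """Evenly spaced (t_s, t_z, stage) triples covering all three stages."""
    total = 1.0 + delta_t
    return [async_timesteps(total * i / num_steps, delta_t) for i in range(num_steps + 1)]


if __name__ == "__main__":
    for t_s, t_z, stage in async_schedule(13):
        print(f"stage {stage:>3}: t_s={t_s:.2f}  t_z={t_z:.2f}")
```

Under these assumptions, the semantic time leads the texture time by exactly Δt throughout Stage II, and only the texture latent reaches its final value during Stage III, which matches the statement that the image is decoded from the texture latent alone.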
+
+ ---
+
+ ## ✨ Highlights
+ - We propose **Semantic-First Diffusion (SFD)**, a novel latent diffusion paradigm that performs asynchronous denoising on semantic and texture latents, allowing semantics to denoise earlier and subsequently guide texture generation.
+ - **SFD achieves a state-of-the-art FID score of 1.04** on ImageNet 256×256 generation.
+ - SFD exhibits **100×** and **33.3× faster** training convergence than **DiT** and **LightningDiT**, respectively.
+
+ ---
+
+ ## 🧪 Quantitative Results
+ Explicitly **leading semantics ahead of textures with a moderate offset (Δt = 0.3)** achieves an optimal balance between early semantic stabilization and texture collaboration, effectively harmonizing their joint modeling.
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/yuemingPAN/SFD/main/images/fid_vs_delta_t.png" width="50%">
+ </p>
+
+
+ - On ImageNet 256×256, **SFD** achieves **FID 1.06** (LightningDiT-XL) and **FID 1.04** (1.0B LightningDiT-XXL).
+ - **100×** and **33.3×** faster training convergence compared to DiT and LightningDiT, respectively.
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/yuemingPAN/SFD/main/images/tabel.png" width="90%">
+ </p>
+
+
+ ---
+
+ ## 🎨 Visual Results
+
+ <p align="center">
+ <img src="https://raw.githubusercontent.com/yuemingPAN/SFD/main/images/demo_Sample.png" width="90%">
+ </p>
+
+ ---
+
+ ## 🔗 Links
+ - 🌐 **Project Page:** [https://yuemingpan.github.io/SFD.github.io/](https://yuemingpan.github.io/SFD.github.io/)
+ - 📄 **Paper (arXiv):** [https://arxiv.org/pdf/2512.04926](https://arxiv.org/pdf/2512.04926)
+ - 💾 **Code:** [https://github.com/yuemingPAN/SFD](https://github.com/yuemingPAN/SFD)
+ - 🧰 **License:** MIT
+
+ ---
+
+ ## 🧩 Citation
+ ```bibtex
+ @article{Pan2025SFD,
+ title={Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion},
+ author={Pan, Yueming and Feng, Ruoyu and Dai, Qi and Wang, Yuqi and Lin, Wenfeng and Guo, Mingyu and Luo, Chong and Zheng, Nanning},
+ journal={arXiv preprint arXiv:2512.04926},
+ year={2025}
+ }