RayyanAhmed9477 committed
Commit f3576b2 · verified · 1 Parent(s): 2f34ab2
Files changed (1): README.md (+125 -116)
---
license: apache-2.0
datasets:
- PeterBrendan/AdImageNet
base_model:
- Tongyi-MAI/Z-Image-Turbo
tags:
- text-to-image
- diffusion
- z-image-turbo
- photorealism
- quantized
---
# Z-Image-Turbo Hosted

## Overview
This repository hosts a fine-tuned version of the Z-Image-Turbo model, specifically the training adapter from [ostris/zimage_turbo_training_adapter](https://huggingface.co/ostris/zimage_turbo_training_adapter). The original Z-Image-Turbo was developed by Tongyi-MAI and is available at [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).

## Why This Model?
Z-Image-Turbo is a state-of-the-art text-to-image diffusion model based on a Single-Stream Diffusion Transformer (S3-DiT) architecture. It offers several advantages:

- **Efficiency**: Distilled to need only 8 function evaluations (NFEs), enabling sub-second inference on high-end GPUs.
- **Quality**: Excels at photorealistic image generation, bilingual text rendering (English and Chinese), and prompt adherence.
- **Scalability**: Supports resolutions up to 1024x1024 pixels.
- **Compatibility**: Runs with `guidance_scale=0.0` in the Turbo variant, reducing computational overhead.

We chose this model for our project because of its balance of speed and quality, which makes it ideal for real-time applications and local inference on consumer hardware such as the RTX 3090.

The training adapter enhances the base model by providing fine-tuned weights for specific use cases, improving adaptability without retraining from scratch.

## Technical Details

### Model Architecture
- **Base Model**: Z-Image-Turbo (6B parameters)
- **Architecture**: Single-Stream Diffusion Transformer (S3-DiT)
- **Training Data**: Not specified in the public docs, but likely large-scale image-text pairs curated for photorealism.
- **Quantization**: The hosted version supports quantization for reduced memory usage (e.g., 8-bit or 4-bit via bitsandbytes).

### Hosting Process
1. **Selection**: Identified Z-Image-Turbo as the best fit for our needs, based on benchmarks showing a superior speed-quality trade-off compared to models like FLUX or SDXL.
2. **Source**: Used the training adapter from ostris for pre-fine-tuned weights.
3. **Authentication**: Logged into Hugging Face using a personal access token.
4. **Repository Creation**: Created a new model repository on Hugging Face.
5. **Download**: Downloaded all model files (safetensors, config, etc.) from the source repo.
6. **Upload**: Uploaded the files to the new repo using the Hugging Face Hub API (see the sketch below).
7. **Documentation**: Added this README with citations to the original authors.

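For reference, steps 3-6 map roughly onto the following `huggingface_hub` calls (a minimal sketch; the token is a placeholder and the exact commands we ran may have differed):

```python
from huggingface_hub import create_repo, login, snapshot_download, upload_folder

login(token="hf_...")  # personal access token (placeholder)

# Mirror the source weights locally, then push them to the new repo.
local_dir = snapshot_download(repo_id="ostris/zimage_turbo_training_adapter")
create_repo(repo_id="RayyanAhmed9477/Z-Image-Turbo-Hosted", repo_type="model", exist_ok=True)
upload_folder(repo_id="RayyanAhmed9477/Z-Image-Turbo-Hosted", folder_path=local_dir)
```
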
### Quantization Techniques
To enable local inference on hardware with limited VRAM, we support several quantization methods:

- **bitsandbytes (recommended)**:
  - 8-bit: reduces memory by ~50% with minimal quality loss.
  - 4-bit: reduces memory further, to ~25% of full precision, using NF4 or FP4 configurations (see the 4-bit sketch after this list).
  - Code:
```python
from diffusers import ZImagePipeline
from diffusers.quantizers import PipelineQuantizationConfig

# Pipeline-level 8-bit quantization (requires a recent diffusers release);
# switch the backend to "bitsandbytes_4bit" for 4-bit.
quantization_config = PipelineQuantizationConfig(quant_backend="bitsandbytes_8bit", quant_kwargs={"load_in_8bit": True}, components_to_quantize=["transformer"])
pipe = ZImagePipeline.from_pretrained("RayyanAhmed9477/Z-Image-Turbo-Hosted", quantization_config=quantization_config)
```

- **GGUF quantization**:
  - For very low-VRAM setups (~4GB and up), use stable-diffusion.cpp with GGUF builds.
  - Download from community repos such as jayn7/Z-Image-Turbo-GGUF.

- **FP8 quantization**:
  - 8-bit floating point for balanced performance.
  - Available in repos such as T5B/Z-Image-Turbo-FP8.

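For the 4-bit NF4 configuration mentioned above, a hedged sketch (it assumes the same diffusers pipeline-level quantization API as the 8-bit example; the `bnb_4bit_*` options come from bitsandbytes):

```python
import torch
from diffusers import ZImagePipeline
from diffusers.quantizers import PipelineQuantizationConfig

# 4-bit NF4 quantization of the transformer; compute runs in bfloat16.
quantization_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",  # NormalFloat4; "fp4" is the alternative
        "bnb_4bit_compute_dtype": torch.bfloat16,
    },
    components_to_quantize=["transformer"],
)
pipe = ZImagePipeline.from_pretrained(
    "RayyanAhmed9477/Z-Image-Turbo-Hosted",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```
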
### Benchmarks and Comparisons
- **vs. FLUX**: Z-Image-Turbo offers faster inference (8 NFEs vs. FLUX's 28-50) with comparable quality for photorealism.
- **vs. SDXL**: Better prompt adherence and bilingual text support; distilled for efficiency.
- **Performance on RTX 3090** (informal measurements; see the timing sketch below):
  - Full precision: 5-10 s per image, ~12GB VRAM.
  - 8-bit quantized: 6-8 s per image, ~6GB VRAM.
  - Quality drop: small; subjectively under 5%.

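These numbers come from simple wall-clock timing; a minimal sketch of that measurement (it assumes `pipe` has already been loaded, as in the installation guide below):

```python
import time
import torch

# Time a single 1024x1024 generation; synchronize so GPU work is included.
torch.cuda.synchronize()
start = time.perf_counter()
image = pipe(prompt="A futuristic cityscape", height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0).images[0]
torch.cuda.synchronize()
print(f"Generation took {time.perf_counter() - start:.2f} s")
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.1f} GiB")
```
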
### Installation Guide
1. Install dependencies:
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate bitsandbytes
```

2. Load and run:
```python
import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained("RayyanAhmed9477/Z-Image-Turbo-Hosted", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Distilled for a low step count (~8 NFEs); guidance is disabled for the Turbo variant.
image = pipe(prompt="A futuristic cityscape", height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0).images[0]
image.save("output.png")
```

3. For a web UI, wrap the pipeline in a Gradio interface (see the sketch below).

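A minimal wrapper might look like this (a sketch; it reuses `pipe` from step 2 and standard Gradio APIs):

```python
import gradio as gr

def generate(prompt: str):
    # Reuses the pipeline loaded in step 2; returns a PIL image.
    return pipe(prompt=prompt, height=1024, width=1024, num_inference_steps=9, guidance_scale=0.0).images[0]

demo = gr.Interface(fn=generate, inputs=gr.Textbox(label="Prompt"), outputs=gr.Image(label="Result"))
demo.launch()
```
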
### System Requirements
- GPU: NVIDIA with at least 16GB VRAM (e.g., RTX 3090 with 24GB)
- RAM: 64GB recommended
- Software: Python 3.8+, PyTorch 2.0+, the diffusers library
- OS: Windows or Linux with CUDA 11.8+

### Performance
- Inference time: ~5-10 seconds per 1024x1024 image on an RTX 3090
- Memory usage: ~12GB (bfloat16), reducible with quantization
- Throughput: ~0.1-0.2 images/second

### Troubleshooting
- **Out of memory**: Use quantization or CPU offloading (`pipe.enable_model_cpu_offload()`).
- **Slow inference**: Enable Flash Attention (`pipe.transformer.set_attention_backend("flash")`) or compile the model (`pipe.transformer.compile()`); see the sketch below.
- **Quality issues**: Increase `num_inference_steps` or use higher precision.

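These remedies can be combined; a short sketch, assuming `pipe` is loaded and a recent diffusers release provides the attention-backend switch:

```python
# Out-of-memory remedy: offload idle components to CPU (an alternative to quantization).
pipe.enable_model_cpu_offload()

# Speed remedies: Flash Attention (needs flash-attn installed) and torch.compile.
pipe.transformer.set_attention_backend("flash")
pipe.transformer.compile()  # the first generation after this is slow while compiling
```
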
## Citations
- Original model: Tongyi-MAI. "Z-Image-Turbo." Hugging Face, https://huggingface.co/Tongyi-MAI/Z-Image-Turbo.
- Training adapter: ostris. "zimage_turbo_training_adapter." Hugging Face, https://huggingface.co/ostris/zimage_turbo_training_adapter.

Hosted by RayyanAhmed9477, with all credit to the original creators.

## License
Refer to the original repositories for licensing information.