Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -33,18 +33,16 @@ This model was trained on a synthetic dataset generated using **Gemini 2.5-lates
|
|
| 33 |
**Fun Fact:** This entire dataset took only **45 minutes** to generate. How? Thanks to **Tier 3 Gemini API access**—a status I achieved involuntarily after all the times Gemini broke while vibe coding, looped infinitely, and racked up $$$ charges. My wallet's pain is your prompt engineering gain. 💸
|
| 34 |
|
| 35 |
#### Why Synthetic Data?
|
| 36 |
-
Z-Image Turbo is "needy." It requires very specific, dense descriptions to look good. Most human-written prompts are too short or use "tag salad" (comma-separated lists), which the Qwen-3 encoder hates. We used Gemini to expand simple concepts into 120-180 word rich paragraphs, teaching the model to hallucinate the missing details (lighting, texture, camera specs) that Z-Image Turbo needs to trigger its
|
| 37 |
|
| 38 |
#### The "Seed Strategy" (Engineering Diversity)
|
| 39 |
-
To ensure the model didn't just learn to output generic "portrait of a woman" prompts, we built a procedural generation engine for the seed prompts.
|
| 40 |
- **8 Major Style Pillars:** We explicitly balanced the dataset across Photorealism, Anime, Fantasy, Sci-Fi, Horror, Artistic, Documentary, and Fine Art.
|
| 41 |
-
- **
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
- **Color Grades:** (e.g., "Cinestill 800T", "Kodak Portra")
|
| 46 |
-
- **Spatial Cues:** (e.g., "foreground hero with blurred crowd")
|
| 47 |
-
This combinatorial approach ensured that the 20,000+ samples covered a massive surface area of aesthetic possibilities, preventing the model from collapsing into a single "style."
|
| 48 |
|
| 49 |
#### The Training Data Prompt
|
| 50 |
Here is the exact system prompt we used to generate the training data. You can see how we forced Gemini to focus on "Positive Constraints" and "Texture Density":
|
| 33 |
**Fun Fact:** This entire dataset took only **45 minutes** to generate. How? Thanks to **Tier 3 Gemini API access**—a status I achieved involuntarily after all the times Gemini broke while vibe coding, looped infinitely, and racked up $$$ charges. My wallet's pain is your prompt engineering gain. 💸
|
| 34 |
|
| 35 |
#### Why Synthetic Data?
|
| 36 |
+
Z-Image Turbo is "needy." It requires very specific, dense descriptions to look good. Most human-written prompts are too short or use "tag salad" (comma-separated lists), which the Qwen-3 encoder hates. We used Gemini to expand simple concepts into rich paragraphs of 120-180 words, teaching the model to hallucinate the missing details (lighting, texture, camera specs) that Z-Image Turbo needs to trigger its magic.
|
| 37 |
|
| 38 |
#### The "Seed Strategy" (Engineering Diversity)
|
| 39 |
+
To ensure the model didn't just learn to output generic "portrait of a woman" prompts, we built a procedural generation engine that assembles seed prompts by combining independent attributes, so the number of possible seeds grows multiplicatively.
|
| 40 |
+
|
| 41 |
- **8 Major Style Pillars:** We explicitly balanced the dataset across Photorealism, Anime, Fantasy, Sci-Fi, Horror, Artistic, Documentary, and Fine Art.
|
| 42 |
+
- **Infinite Variety:** We didn't just feed Gemini "A cat." We constructed seeds by randomly mixing ~170 base concepts with 26 styles, 10 shot types, 10 lighting setups, 11 moods, 8 texture notes, and 10 camera kits.
|
| 43 |
+
- **The Math:** This procedural engine can generate over **217 billion unique seed prompts**. From this vast combinatorial space, we carefully sampled the 20,000 most coherent and high-impact combinations to train the model.
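
As a sanity check on the arithmetic, the seven dimensions listed above multiply out to roughly 389 million combinations on their own. The 217 billion total therefore presumably also counts dimensions that are not enumerated in this list (for example color grades); that is an assumption, since the full breakdown is not given here. A minimal sketch, with hypothetical dictionary keys:

```python
from math import prod

# Option counts per dimension, copied from the bullet above.
# The base-concept count is approximate ("~170"); key names are illustrative.
listed_counts = {
    "base_concepts": 170,
    "styles": 26,
    "shot_types": 10,
    "lighting_setups": 10,
    "moods": 11,
    "texture_notes": 8,
    "camera_kits": 10,
}

# Size of the seed space spanned by the listed dimensions alone.
listed_total = prod(listed_counts.values())
print(f"{listed_total:,}")  # 388,960,000

# Combined size the unlisted dimensions would need to reach the stated total.
claimed_total = 217_000_000_000
remaining_factor = claimed_total / listed_total
print(round(remaining_factor))  # 558
```
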
|
| 44 |
+
|
| 45 |
+
Mixing attributes this way teaches the model that "Cinestill 800T" isn't just a random word, but a specific color grading instruction that can apply to *any* concept, from a cybernetic surgeon to a medieval marketplace.
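
The seed engine itself is not shown, so the following is only a rough sketch of how such random mixing could look. The pool contents, the seed string format, and the function names are all hypothetical; only the quoted concepts, style names, and color grades come from this README, and the real pools are far larger.

```python
import random

# Hypothetical, heavily truncated option pools (assumption: the real engine
# uses ~170 base concepts, 26 styles, and so on).
POOLS = {
    "concept": ["a cybernetic surgeon", "a medieval marketplace", "a cat"],
    "style": ["Photorealism", "Anime", "Fine Art"],
    "shot": ["close-up", "wide shot"],
    "lighting": ["golden hour", "neon rim light"],
    "mood": ["serene", "ominous"],
    "color_grade": ["Cinestill 800T", "Kodak Portra"],
}


def make_seed(rng: random.Random) -> str:
    """Pick one option per dimension and join them into a seed prompt."""
    picks = {name: rng.choice(options) for name, options in POOLS.items()}
    return (
        "{concept}, {style}, {shot}, {lighting}, "
        "{mood} mood, {color_grade} color grade"
    ).format(**picks)


def sample_unique_seeds(n: int, rng_seed: int = 0) -> list[str]:
    """Draw n distinct seed prompts from the combinatorial space."""
    total = 1
    for options in POOLS.values():
        total *= len(options)
    if n > total:
        raise ValueError(f"only {total} unique seeds are possible")

    rng = random.Random(rng_seed)  # fixed seed keeps the draw reproducible
    seen: set[str] = set()
    seeds: list[str] = []
    while len(seeds) < n:
        seed = make_seed(rng)
        if seed not in seen:  # reject duplicates
            seen.add(seed)
            seeds.append(seed)
    return seeds


for seed in sample_unique_seeds(3):
    print(seed)
```

Because every dimension is drawn independently, any color grade can land on any concept. This sketch only rejects duplicates; it does not attempt the coherence filtering described above.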
|
| 46 |
|
| 47 |
#### The Training Data Prompt
|
| 48 |
Here is the exact system prompt we used to generate the training data. You can see how we forced Gemini to focus on "Positive Constraints" and "Texture Density":
|