codemichaeld committed
Commit 2ebf369 · verified · 1 Parent(s): c6b6c70

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +118 -0
README.md ADDED
@@ -0,0 +1,118 @@
---
library_name: diffusers
tags:
- fp8
- safetensors
- precision-recovery
- mixed-method
- converted-by-gradio
---

# FP8 Model with Per-Tensor Precision Recovery

- **Source**: `https://huggingface.co/MiaoshouAI/Florence-2-base-PromptGen-v1.5`
- **Original File**: `model.safetensors`
- **FP8 Format**: `E5M2`
- **FP8 File**: `model-fp8-e5m2.safetensors`
- **Recovery File**: `model-recovery.safetensors`

## Recovery Rules Used

```json
[
  {
    "key_pattern": "vae",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "encoder",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "decoder",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "text",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 64
  },
  {
    "key_pattern": "emb",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 64
  },
  {
    "key_pattern": "attn",
    "dim": 2,
    "min_size": 10000,
    "method": "lora",
    "rank": 128
  },
  {
    "key_pattern": "conv",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "resnet",
    "dim": 4,
    "method": "diff"
  },
  {
    "key_pattern": "all",
    "method": "none"
  }
]
```

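Each rule appears to pair a key pattern with shape constraints and a recovery method: `key_pattern` as a substring of the parameter name (`"all"` as a catch-all), `dim` as the tensor's number of dimensions, and `min_size` as a minimum element count. The sketch below illustrates one plausible first-match-wins interpretation; the `select_rule` helper, the substring matching, and the abbreviated rule list are assumptions for illustration, not the converter's documented behaviour.

```python
# Hypothetical sketch of rule selection; first-match-wins semantics and
# substring key matching are assumptions, not documented behaviour.
import torch

RULES = [  # abbreviated; the full list is shown in "Recovery Rules Used" above
    {"key_pattern": "encoder", "dim": 4, "method": "diff"},
    {"key_pattern": "attn", "dim": 2, "min_size": 10000, "method": "lora", "rank": 128},
    {"key_pattern": "all", "method": "none"},
]

def select_rule(key: str, tensor: torch.Tensor, rules=RULES) -> dict:
    """Return the first rule whose pattern and shape constraints all match."""
    for rule in rules:
        if rule["key_pattern"] != "all" and rule["key_pattern"] not in key:
            continue  # parameter name does not contain the pattern
        if "dim" in rule and tensor.dim() != rule["dim"]:
            continue  # wrong number of dimensions for this rule
        if "min_size" in rule and tensor.numel() < rule["min_size"]:
            continue  # tensor too small to be worth recovering
        return rule
    return {"method": "none"}  # nothing matched: keep plain FP8

# A large 2-D attention projection would get rank-128 LoRA recovery:
weight = torch.zeros(1024, 1024)
print(select_rule("model.encoder.layers.0.attn.q_proj.weight", weight))
# -> {'key_pattern': 'attn', 'dim': 2, 'min_size': 10000, 'method': 'lora', 'rank': 128}
```
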
## Usage (Inference)

```python
import os

import torch
from safetensors.torch import load_file

# Load the FP8 model weights
fp8_state = load_file("model-fp8-e5m2.safetensors")

# Load recovery weights if the recovery file is present
recovery_path = "model-recovery.safetensors"
recovery_state = load_file(recovery_path) if os.path.exists(recovery_path) else {}

# Reconstruct higher-precision weights
reconstructed = {}
for key in fp8_state:
    fp8_weight = fp8_state[key].to(torch.float32)  # convert to float32 for computation

    # Apply LoRA recovery if available
    lora_a_key = f"lora_A.{key}"
    lora_b_key = f"lora_B.{key}"
    if lora_a_key in recovery_state and lora_b_key in recovery_state:
        A = recovery_state[lora_a_key].to(torch.float32)
        B = recovery_state[lora_b_key].to(torch.float32)
        # Add back the low-rank approximation of the quantization error
        lora_weight = B @ A
        fp8_weight = fp8_weight + lora_weight

    # Apply difference recovery if available
    diff_key = f"diff.{key}"
    if diff_key in recovery_state:
        diff = recovery_state[diff_key].to(torch.float32)
        fp8_weight = fp8_weight + diff

    reconstructed[key] = fp8_weight

# Load the reconstructed weights into your instantiated model
model.load_state_dict(reconstructed)
```
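Continuing from the snippet above, the reconstructed float32 weights can optionally be written back to a single safetensors file so standard loaders can consume them without repeating the recovery step. The output filename and the bfloat16 target dtype below are illustrative choices, not files shipped in this repository.

```python
from safetensors.torch import save_file
import torch

# Cast to a compact dtype and make tensors contiguous before serializing
restored = {k: v.to(torch.bfloat16).contiguous() for k, v in reconstructed.items()}
save_file(restored, "model-restored-bf16.safetensors")  # hypothetical output name
```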

> **Note**: For best results, use the same recovery configuration during inference as was used during extraction.
> Requires PyTorch ≥ 2.1 for FP8 support.
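
A quick way to confirm that the installed PyTorch build exposes the E5M2 FP8 dtype used by this checkpoint (the check itself is just a suggestion, not part of the conversion tooling):

```python
import torch

# torch.float8_e5m2 was introduced around PyTorch 2.1; older builds lack it
assert hasattr(torch, "float8_e5m2"), "PyTorch >= 2.1 with FP8 support is required"
```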

## Statistics

- **Total layers**: 667
- **Layers with recovery**: 177
  - LoRA recovery: 125
  - Difference recovery: 52