---
library_name: diffusers
tags:
- fp8
- safetensors
- precision-recovery
- mixed-method
- converted-by-gradio
---

# FP8 Model with Per-Tensor Precision Recovery

- **Source**: `https://huggingface.co/stabilityai/sd-vae-ft-mse`
- **Original File**: `diffusion_pytorch_model.safetensors`
- **FP8 Format**: `E5M2`
- **FP8 File**: `diffusion_pytorch_model-fp8-e5m2.safetensors`
- **Recovery File**: `diffusion_pytorch_model-recovery.safetensors`

## Recovery Rules Used

```json
[
  { "key_pattern": "vae",     "dim": 4, "method": "diff" },
  { "key_pattern": "encoder", "dim": 4, "method": "diff" },
  { "key_pattern": "decoder", "dim": 4, "method": "diff" },
  { "key_pattern": "text",    "dim": 2, "min_size": 10000, "method": "lora", "rank": 64 },
  { "key_pattern": "emb",     "dim": 2, "min_size": 10000, "method": "lora", "rank": 64 },
  { "key_pattern": "attn",    "dim": 2, "min_size": 10000, "method": "lora", "rank": 128 },
  { "key_pattern": "conv",    "dim": 4, "method": "diff" },
  { "key_pattern": "resnet",  "dim": 4, "method": "diff" },
  { "key_pattern": "all",     "method": "none" }
]
```

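The rules read as per-tensor filters: `key_pattern` a substring match on the tensor name, `dim` the required tensor dimensionality, `min_size` a minimum element count, with the `"all"` fallback catching everything else. A minimal sketch of first-match selection under those assumptions (the converter's exact matching logic isn't published), using a representative subset of the rules above:

```python
# Hypothetical first-match rule selection; semantics are an assumption.
RULES = [
    {"key_pattern": "vae",  "dim": 4, "method": "diff"},
    {"key_pattern": "attn", "dim": 2, "min_size": 10000, "method": "lora", "rank": 128},
    {"key_pattern": "all",  "method": "none"},
]

def select_rule(key: str, shape: tuple, rules=RULES):
    """Return the first rule whose constraints the tensor satisfies."""
    numel = 1
    for s in shape:
        numel *= s
    for rule in rules:
        if rule["key_pattern"] != "all" and rule["key_pattern"] not in key:
            continue
        if "dim" in rule and len(shape) != rule["dim"]:
            continue
        if "min_size" in rule and numel < rule["min_size"]:
            continue
        return rule
    return None
```

For example, a 2-D attention projection large enough to clear `min_size` would get LoRA recovery, while small 1-D bias tensors fall through to `"none"`.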
## Usage (Inference)

```python
import os

import torch
from safetensors.torch import load_file

# Load FP8 model
fp8_state = load_file("diffusion_pytorch_model-fp8-e5m2.safetensors")

# Load recovery weights if available
recovery_path = "diffusion_pytorch_model-recovery.safetensors"
recovery_state = load_file(recovery_path) if os.path.exists(recovery_path) else {}

# Reconstruct high-precision weights
reconstructed = {}
for key in fp8_state:
    weight = fp8_state[key].to(torch.float32)  # upcast for computation

    # Apply LoRA recovery if available
    lora_a_key = f"lora_A.{key}"
    lora_b_key = f"lora_B.{key}"
    if lora_a_key in recovery_state and lora_b_key in recovery_state:
        A = recovery_state[lora_a_key].to(torch.float32)
        B = recovery_state[lora_b_key].to(torch.float32)
        weight = weight + B @ A  # add the low-rank approximation of the error

    # Apply difference recovery if available
    diff_key = f"diff.{key}"
    if diff_key in recovery_state:
        weight = weight + recovery_state[diff_key].to(torch.float32)

    reconstructed[key] = weight

# Use reconstructed weights in your model
model.load_state_dict(reconstructed)
```

> **Note**: For best results, use the same recovery configuration during inference as was used during extraction.
> Requires PyTorch ≥ 2.1 for FP8 support.

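The recovery tensors themselves come from comparing each original weight with its FP8 round trip. A rough sketch of both methods (not the converter's actual code; the truncated-SVD factorization for the LoRA case is an assumption about how the low-rank factors are obtained):

```python
import torch

def extract_recovery(original, fp8_weight, method, rank=64):
    """Illustrative per-layer recovery extraction.

    original: full-precision weight; fp8_weight: the same weight after an
    FP8 round trip, upcast to float32. Returns a dict of recovery tensors.
    """
    residual = original.to(torch.float32) - fp8_weight.to(torch.float32)
    if method == "diff":
        # Store the full quantization error.
        return {"diff": residual}
    if method == "lora":
        # Keep only the top-`rank` singular directions of the error,
        # so that B @ A approximates the residual at a fraction of its size.
        U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
        B = U[:, :rank] * S[:rank]  # (out_features, rank)
        A = Vh[:rank, :]            # (rank, in_features)
        return {"lora_A": A, "lora_B": B}
    return {}
```

At inference, `B @ A` (or `diff`) is simply added back to the upcast FP8 weight, as in the usage snippet above.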
## Statistics

- **Total layers**: 248
- **Layers with recovery**: 66
  - LoRA recovery: 2
  - Difference recovery: 64