Upload folder using huggingface_hub
- README.md +114 -3
- ckpts/shape_dec_next_dc_f16c32_fp16.json +24 -0
- ckpts/shape_dec_next_dc_f16c32_fp16.safetensors +3 -0
- ckpts/shape_enc_next_dc_f16c32_fp16.json +24 -0
- ckpts/shape_enc_next_dc_f16c32_fp16.safetensors +3 -0
- ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json +19 -0
- ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors +3 -0
- ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json +19 -0
- ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors +3 -0
- ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json +19 -0
- ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors +3 -0
- ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json +19 -0
- ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors +3 -0
- ckpts/ss_flow_img_dit_1_3B_64_bf16.json +19 -0
- ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors +3 -0
- ckpts/tex_dec_next_dc_f16c32_fp16.json +25 -0
- ckpts/tex_dec_next_dc_f16c32_fp16.safetensors +3 -0
- ckpts/tex_enc_next_dc_f16c32_fp16.json +24 -0
- ckpts/tex_enc_next_dc_f16c32_fp16.safetensors +3 -0
- pipeline.json +92 -0
README.md
CHANGED
@@ -1,3 +1,114 @@
---
license: mit
pipeline_tag: image-to-3d
library_name: trellis2
language:
- en
---

# TRELLIS.2: Native and Compact Structured Latents for 3D Generation

**Model Name:** TRELLIS.2-4B
**Paper:** [Coming Soon]
**Repository:** [GitHub Link Placeholder]
**Project Page:** [Website Link Placeholder]

## Introduction

**TRELLIS.2** is a state-of-the-art large 3D generative model designed for high-fidelity **image-to-3D** generation. It combines a novel "field-free" sparse voxel structure, termed **O-Voxel**, with a large-scale flow-matching transformer (4 billion parameters).

Unlike previous methods that rely on iso-surface fields (e.g., SDF, FlexiCubes), which struggle with open surfaces and non-manifold geometry, TRELLIS.2 can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials, including transparency and translucency.

## Key Features

* **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
    * **Arbitrary Topology:** Handles open surfaces, non-manifold geometry, and fully enclosed structures without lossy conversion.
    * **Rich Appearance:** Captures PBR attributes, including opacity for translucent surfaces, aligned with the geometry.
    * **Efficiency:** Fast, optimization-free bidirectional conversion between meshes and O-Voxels (milliseconds to seconds).
* **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
* **Compact, High-Fidelity Latent Space:** A sparse 3D VAE with **16× spatial downsampling** encodes a 1024³ asset into only ~9.6K latent tokens with negligible perceptual degradation.
* **State-of-the-Art Speed:** Inference is significantly faster than that of existing large 3D models (see the timings below).
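
A rough check of the latent-space figures: 16× downsampling per axis turns a 1024³ voxel grid into a 64³ latent grid. Assuming each of the ~9.6K tokens is one occupied cell of that sparse grid (an interpretation, not something the model card states), only a few percent of the 64³ positions are active. The arithmetic as a small Python sketch:

```python
# Back-of-the-envelope arithmetic for the sparse latent grid.
# Assumption: one latent token per occupied cell of the downsampled grid;
# the model card only says "~9.6K latent tokens".

def latent_grid_side(resolution: int, downsample: int) -> int:
    """Side length of the latent grid after per-axis downsampling."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return resolution // downsample

side = latent_grid_side(1024, 16)   # 64
dense_cells = side ** 3             # cell count if the latent grid were dense
tokens = 9_600                      # approximate figure quoted above
occupancy = tokens / dense_cells    # fraction of latent cells that are active

print(f"latent grid: {side}^3 = {dense_cells:,} cells")
print(f"~{tokens:,} tokens -> {occupancy:.1%} occupancy")
```

The same division gives 32³ latent grids for 512³ outputs and 96³ for 1536³; the 32 and 64 cases line up with the `"resolution": 32` and `"resolution": 64` fields of the 512 and 1024 flow-model configs in this commit.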

## Inference Speed (NVIDIA H100 GPU)

| Resolution | Generation Time |
| :--- | :--- |
| 512³ | ~3 seconds |
| 1024³ | ~17 seconds |
| 1536³ | ~60 seconds |

## Known Limitations

* **Geometric Artifacts (Small Holes):** Although O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. Applications that require strictly watertight geometry (e.g., 3D printing) may need standard mesh post-processing, such as hole filling.
* **Base Model Status (No Alignment):** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Its outputs therefore reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
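
Before reaching for hole filling, it helps to check whether a mesh is actually open. A minimal sketch, assuming the faces are available as plain Python triples of vertex indices (for example, the output's `mesh.faces` converted to a list; the exact output type is an assumption here): in a closed, manifold triangle mesh every undirected edge is shared by exactly two faces, so an edge used by only one face lies on a hole or an open boundary. Mesh libraries such as trimesh provide their own watertightness checks and hole filling for production use.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]

def boundary_edges(faces: Iterable[Tuple[int, int, int]]) -> List[Edge]:
    """Return undirected edges used by exactly one triangle.

    In a closed (watertight) triangle mesh every edge is shared by two faces,
    so any edge returned here borders a hole or an open surface.
    """
    counts: Counter = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1  # orientation-independent key
    return sorted(e for e, n in counts.items() if n == 1)

def is_closed(faces) -> bool:
    """True when no edge is a boundary edge."""
    return len(boundary_edges(faces)) == 0

# A tetrahedron is closed; dropping one face opens a triangular hole.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(is_closed(tetra))            # True
print(boundary_edges(tetra[:-1]))  # [(0, 2), (0, 3), (2, 3)]
```

Edges counted more than twice would indicate non-manifold geometry, which the model card notes O-Voxels can represent; such edges are not holes and need separate handling.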

## Model Details

* **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
* **Model Type:** Flow-matching transformers with a sparse-voxel-based 3D VAE
* **Parameters:** 4 billion
* **Input:** Single image
* **Output:** 3D asset (mesh with PBR materials)
* **Resolution:** 512³ to 1536³ (voxel grid resolution)

## Usage

*Note: Please refer to the official [GitHub Repository] for installation instructions and dependencies.*

```python
import os
# OpenCV reads .exr files only when this flag is set; set it before importing cv2.
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# Load an HDR environment map for PBR rendering (OpenCV loads BGR; convert to RGB).
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# Load a pipeline from a local model folder or the Hugging Face Hub.
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS.2-4B")
pipeline.cuda()

# Load an image
image = Image.open("assets/example_image/T.png")

# Run the pipeline and take the first generated mesh
mesh = pipeline.run(image)[0]

# Render a video of the output with PBR shading
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# Export the output as a GLB file
glb = o_voxel.postprocess.to_glb(
    vertices = mesh.vertices,
    faces = mesh.faces,
    attr_volume = mesh.attrs,
    coords = mesh.coords,
    attr_layout = mesh.layout,
    voxel_size = mesh.voxel_size,
    aabb = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target = 100000,
    texture_size = 2048,
)
glb.export("sample.glb")
```

## Citation

If you find this model useful for your research, please cite our work:

```
TBD
```

## License

This model is released under the MIT License. The code and dataset are publicly released to facilitate reproduction and further research.
ckpts/shape_dec_next_dc_f16c32_fp16.json
ADDED
@@ -0,0 +1,24 @@
{
    "name": "FlexiDualGridVaeDecoder",
    "args": {
        "resolution": 256,
        "model_channels": [1024, 512, 256, 128, 64],
        "latent_channels": 32,
        "num_blocks": [4, 16, 8, 4, 0],
        "block_type": [
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d"
        ],
        "up_block_type": [
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d"
        ],
        "block_args": [{}, {}, {}, {}, {}],
        "use_fp16": true
    }
}
ckpts/shape_dec_next_dc_f16c32_fp16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
size 948490494
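
The `.safetensors` entries in this commit are Git LFS pointer files, not the weights themselves: each records the pointer-spec version, the SHA-256 of the real file (`oid`), and its size in bytes. A minimal sketch for checking a downloaded checkpoint against its pointer; the local path in the commented-out call is hypothetical.

```python
import hashlib
import os

def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into {'version', 'oid', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}

def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Compare the size first (cheap), then stream the file through SHA-256."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]

POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
size 948490494
"""
ptr = parse_lfs_pointer(POINTER)
print(ptr["size"])  # 948490494
# verify_file("ckpts/shape_dec_next_dc_f16c32_fp16.safetensors", ptr)  # hypothetical local copy
```

Every other `.safetensors` pointer below has the same three-line layout, so the same check applies to each checkpoint.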
ckpts/shape_enc_next_dc_f16c32_fp16.json
ADDED
@@ -0,0 +1,24 @@
{
    "name": "FlexiDualGridVaeEncoder",
    "args": {
        "resolution": 256,
        "model_channels": [64, 128, 256, 512, 1024],
        "latent_channels": 32,
        "num_blocks": [0, 4, 8, 16, 4],
        "block_type": [
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d"
        ],
        "up_block_type": [
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d"
        ],
        "block_args": [{}, {}, {}, {}, {}],
        "use_fp16": true
    }
}
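
The two shape VAE configs above mirror each other: the encoder widens from 64 to 1024 channels through four `SparseResBlockS2C3d` transitions, and the decoder narrows back through four `SparseResBlockC2S3d` transitions. If each transition halves (space-to-channel) or doubles (channel-to-space) the resolution per axis, which is an assumption read from the block names, four of them give 2⁴ = 16×, matching the 16× downsampling in the model card and the `f16` in the checkpoint names; `latent_channels: 32` likewise matches `c32`. A small sketch that checks this against a subset of the config fields:

```python
import json

# Subset of the "args" of ckpts/shape_enc_next_dc_f16c32_fp16.json and
# ckpts/shape_dec_next_dc_f16c32_fp16.json, copied from this commit.
ENC = json.loads("""{"model_channels": [64, 128, 256, 512, 1024],
  "latent_channels": 32, "num_blocks": [0, 4, 8, 16, 4],
  "up_block_type": ["SparseResBlockS2C3d", "SparseResBlockS2C3d",
                    "SparseResBlockS2C3d", "SparseResBlockS2C3d"]}""")
DEC = json.loads("""{"model_channels": [1024, 512, 256, 128, 64],
  "latent_channels": 32, "num_blocks": [4, 16, 8, 4, 0],
  "up_block_type": ["SparseResBlockC2S3d", "SparseResBlockC2S3d",
                    "SparseResBlockC2S3d", "SparseResBlockC2S3d"]}""")

def spatial_factor(cfg: dict) -> int:
    # Assumption: every stage-transition block rescales each axis by 2.
    return 2 ** len(cfg["up_block_type"])

# The decoder is the encoder reversed, stage by stage.
assert DEC["model_channels"] == ENC["model_channels"][::-1]
assert DEC["num_blocks"] == ENC["num_blocks"][::-1]
# One transition block between each pair of consecutive stages.
assert len(ENC["up_block_type"]) == len(ENC["model_channels"]) - 1

factor = spatial_factor(ENC)
print(f"f{factor}c{ENC['latent_channels']}")  # f16c32
```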
ckpts/shape_enc_next_dc_f16c32_fp16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f37c5ff5b983b68e9946060000f09bc131f3e84318a2c8b7430a81e4b4636c41
size 708797208
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json
ADDED
@@ -0,0 +1,19 @@
{
    "name": "SLatFlowModel",
    "args": {
        "resolution": 64,
        "in_channels": 32,
        "out_channels": 32,
        "model_channels": 1536,
        "cond_channels": 1024,
        "num_blocks": 30,
        "num_heads": 12,
        "mlp_ratio": 5.3334,
        "pe_mode": "rope",
        "share_mod": true,
        "initialization": "scaled",
        "qk_rms_norm": true,
        "qk_rms_norm_cross": true,
        "dtype": "bfloat16"
    }
}
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:07cd0596f634c5adc1890023d16023afc5eed02fb84b22bb23aff5bf0030fbbd
size 2584574424
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json
ADDED
@@ -0,0 +1,19 @@
{
    "name": "SLatFlowModel",
    "args": {
        "resolution": 32,
        "in_channels": 32,
        "out_channels": 32,
        "model_channels": 1536,
        "cond_channels": 1024,
        "num_blocks": 30,
        "num_heads": 12,
        "mlp_ratio": 5.3334,
        "pe_mode": "rope",
        "share_mod": true,
        "initialization": "scaled",
        "qk_rms_norm": true,
        "qk_rms_norm_cross": true,
        "dtype": "bfloat16"
    }
}
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec5e0917ef9b7e25ad51dffc7d19687a42019871f94239f2fa7f86264c55b70f
size 2584574424
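
The 512 and 1024 image-to-shape checkpoints have identical sizes (2,584,574,424 bytes) even though their configs differ in `resolution` (32 vs. 64); with `"pe_mode": "rope"` there is presumably no learned positional table whose size depends on resolution. The sizes also give a sanity check on the "1_3B" in the file names. Assuming every tensor is stored as bfloat16 (2 bytes, per the `dtype` field) and ignoring the small safetensors header, byte count divided by two approximates the parameter count:

```python
# Approximate parameter counts from the LFS pointer sizes in this commit.
# Assumptions: all tensors are bfloat16 and the safetensors header is
# negligible, so these figures slightly overestimate the true counts.
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}

SIZES = {
    "slat_flow_img2shape_dit_1_3B_512_bf16": 2584574424,
    "slat_flow_img2shape_dit_1_3B_1024_bf16": 2584574424,
}

def approx_params(size_bytes: int, dtype: str = "bfloat16") -> int:
    """Parameter count implied by a file size at a fixed bytes-per-element."""
    return size_bytes // BYTES_PER_PARAM[dtype]

for name, size in SIZES.items():
    print(f"{name}: ~{approx_params(size) / 1e9:.2f}B parameters")
```

Both files work out to about 1.29B parameters, consistent with the "1_3B" label.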
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json
ADDED
@@ -0,0 +1,19 @@
{
    "name": "SLatFlowModel",
    "args": {
        "resolution": 64,
        "in_channels": 64,
        "out_channels": 32,
        "model_channels": 1536,
        "cond_channels": 1024,
        "num_blocks": 30,
        "num_heads": 12,
        "mlp_ratio": 5.3334,
        "pe_mode": "rope",
        "share_mod": true,
        "initialization": "scaled",
        "qk_rms_norm": true,
        "qk_rms_norm_cross": true,
        "dtype": "bfloat16"
    }
}
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:580401269059a339b8318ab9ced459a13ba63391721c83a6c383198c29e77686
size 2584672728
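
The texture flow models differ from the shape flow models only in `in_channels` (64 vs. 32), and their checkpoints are larger by exactly 2,584,672,728 − 2,584,574,424 = 98,304 bytes. At 2 bytes per bf16 value that is 49,152 parameters, which equals 32 extra input channels × 1536 `model_channels`: exactly what a linear input projection from 64 instead of 32 channels would add. The input-projection layout is an assumption about the architecture, and reading the 64 channels as the 32-channel shape latent concatenated with the 32-channel texture latent is an inference from the "imgshape2tex" name, not something the configs state. The arithmetic as a sketch:

```python
# Check that the size gap between the shape and texture flow checkpoints
# matches a wider input projection. Byte sizes come from the LFS pointers;
# channel counts come from the two SLatFlowModel configs.
SHAPE_BYTES = 2_584_574_424  # slat_flow_img2shape_dit_1_3B_1024_bf16
TEX_BYTES = 2_584_672_728    # slat_flow_imgshape2tex_dit_1_3B_1024_bf16
BF16_BYTES = 2

model_channels = 1536
shape_in, tex_in = 32, 64

extra_params = (TEX_BYTES - SHAPE_BYTES) // BF16_BYTES
# Assumption: the input layer is a linear map in_channels -> model_channels,
# so widening its input adds (tex_in - shape_in) * model_channels weights.
expected = (tex_in - shape_in) * model_channels

print(extra_params, expected, extra_params == expected)  # 49152 49152 True
```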
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json
ADDED
@@ -0,0 +1,19 @@
{
    "name": "SLatFlowModel",
    "args": {
        "resolution": 32,
        "in_channels": 64,
        "out_channels": 32,
        "model_channels": 1536,
        "cond_channels": 1024,
        "num_blocks": 30,
        "num_heads": 12,
        "mlp_ratio": 5.3334,
        "pe_mode": "rope",
        "share_mod": true,
        "initialization": "scaled",
        "qk_rms_norm": true,
        "qk_rms_norm_cross": true,
        "dtype": "bfloat16"
    }
}
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8371aa1c5d13be79dcd5ddfd2cf3835e902e204dc34427169a1c702828e1a94d
size 2584672728
ckpts/ss_flow_img_dit_1_3B_64_bf16.json
ADDED
@@ -0,0 +1,19 @@
{
    "name": "SparseStructureFlowModel",
    "args": {
        "resolution": 16,
        "in_channels": 8,
        "out_channels": 8,
        "model_channels": 1536,
        "cond_channels": 1024,
        "num_blocks": 30,
        "num_heads": 12,
        "mlp_ratio": 5.3334,
        "pe_mode": "rope",
        "share_mod": true,
        "initialization": "scaled",
        "qk_rms_norm": true,
        "qk_rms_norm_cross": true,
        "dtype": "bfloat16"
    }
}
ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ca01377c485bec418076d38ee80166d32dc776d744f2553b835cba1e97a7abf6
size 2584426920
ckpts/tex_dec_next_dc_f16c32_fp16.json
ADDED
@@ -0,0 +1,25 @@
{
    "name": "SparseUnetVaeDecoder",
    "args": {
        "out_channels": 6,
        "model_channels": [1024, 512, 256, 128, 64],
        "latent_channels": 32,
        "num_blocks": [4, 16, 8, 4, 0],
        "block_type": [
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d"
        ],
        "up_block_type": [
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d"
        ],
        "block_args": [{}, {}, {}, {}, {}],
        "pred_subdiv": false,
        "use_fp16": true
    }
}
ckpts/tex_dec_next_dc_f16c32_fp16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:97ea69addea2ecd9312910f5f548234665eef51c088386180b7cd5b258645e3c
size 948458812
ckpts/tex_enc_next_dc_f16c32_fp16.json
ADDED
@@ -0,0 +1,24 @@
{
    "name": "SparseUnetVaeEncoder",
    "args": {
        "in_channels": 6,
        "model_channels": [64, 128, 256, 512, 1024],
        "latent_channels": 32,
        "num_blocks": [0, 4, 8, 16, 4],
        "block_type": [
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d"
        ],
        "up_block_type": [
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d",
            "SparseResBlockS2C3d"
        ],
        "block_args": [{}, {}, {}, {}, {}],
        "use_fp16": true
    }
}
ckpts/tex_enc_next_dc_f16c32_fp16.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dd109f75f84b90fa411554ed6b0e4a87f430841163156fc0ebda2ebdc4752493
size 708797208
pipeline.json
ADDED
@@ -0,0 +1,92 @@
{
    "name": "Trellis2ImageTo3DPipeline",
    "args": {
        "models": {
            "sparse_structure_decoder": "microsoft/TRELLIS-image-large/ckpts/ss_dec_conv3d_16l8_fp16",
            "sparse_structure_flow_model": "ckpts/ss_flow_img_dit_1_3B_64_bf16",
            "shape_slat_decoder": "ckpts/shape_dec_next_dc_f16c32_fp16",
            "shape_slat_flow_model_512": "ckpts/slat_flow_img2shape_dit_1_3B_512_bf16",
            "shape_slat_flow_model_1024": "ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16",
            "tex_slat_decoder": "ckpts/tex_dec_next_dc_f16c32_fp16",
            "tex_slat_flow_model_512": "ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16",
            "tex_slat_flow_model_1024": "ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16"
        },
        "sparse_structure_sampler": {
            "name": "FlowEulerGuidanceIntervalSampler",
            "args": {
                "sigma_min": 1e-5
            },
            "params": {
                "steps": 12,
                "guidance_strength": 7.5,
                "guidance_rescale": 0.7,
                "guidance_interval": [0.6, 1.0],
                "rescale_t": 5.0
            }
        },
        "shape_slat_sampler": {
            "name": "FlowEulerGuidanceIntervalSampler",
            "args": {
                "sigma_min": 1e-5
            },
            "params": {
                "steps": 12,
                "guidance_strength": 7.5,
                "guidance_rescale": 0.5,
                "guidance_interval": [0.6, 1.0],
                "rescale_t": 3.0
            }
        },
        "shape_slat_normalization": {
            "mean": [
                0.781296, 0.018091, -0.495192, -0.558457, 1.060530, 0.093252, 1.518149, -0.933218,
                -0.732996, 2.604095, -0.118341, -2.143904, 0.495076, -2.179512, -2.130751, -0.996944,
                0.261421, -2.217463, 1.260067, -0.150213, 3.790713, 1.481266, -1.046058, -1.523667,
                -0.059621, 2.220780, 1.621212, 0.877230, 0.567247, -3.175944, -3.186688, 1.578665
            ],
            "std": [
                5.972266, 4.706852, 5.445010, 5.209927, 5.320220, 4.547237, 5.020802, 5.444004,
                5.226681, 5.683095, 4.831436, 5.286469, 5.652043, 5.367606, 5.525084, 4.730578,
                4.805265, 5.124013, 5.530808, 5.619001, 5.103930, 5.417670, 5.269677, 5.547194,
                5.634698, 5.235274, 6.110351, 5.511298, 6.237273, 4.879207, 5.347008, 5.405691
            ]
        },
        "tex_slat_sampler": {
            "name": "FlowEulerGuidanceIntervalSampler",
            "args": {
                "sigma_min": 1e-5
            },
            "params": {
                "steps": 12,
                "guidance_strength": 1.0,
                "guidance_rescale": 0.0,
                "guidance_interval": [0.6, 0.9],
                "rescale_t": 3.0
            }
        },
        "tex_slat_normalization": {
            "mean": [
                3.501659, 2.212398, 2.226094, 0.251093, -0.026248, -0.687364, 0.439898, -0.928075,
                0.029398, -0.339596, -0.869527, 1.038479, -0.972385, 0.126042, -1.129303, 0.455149,
                -1.209521, 2.069067, 0.544735, 2.569128, -0.323407, 2.293000, -1.925608, -1.217717,
                1.213905, 0.971588, -0.023631, 0.106750, 2.021786, 0.250524, -0.662387, -0.768862
            ],
            "std": [
                2.665652, 2.743913, 2.765121, 2.595319, 3.037293, 2.291316, 2.144656, 2.911822,
                2.969419, 2.501689, 2.154811, 3.163343, 2.621215, 2.381943, 3.186697, 3.021588,
                2.295916, 3.234985, 3.233086, 2.260140, 2.874801, 2.810596, 3.292720, 2.674999,
                2.680878, 2.372054, 2.451546, 2.353556, 2.995195, 2.379849, 2.786195, 2.775190
            ]
        },
        "image_cond_model": {
            "name": "DinoV3FeatureExtractor",
            "args": {
                "model_name": "facebook/dinov3-vitl16-pretrain-lvd1689m"
            }
        },
        "rembg_model": {
            "name": "BiRefNet",
            "args": {}
        }
    }
}
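
Each sampling stage in `pipeline.json` uses a `FlowEulerGuidanceIntervalSampler` with 12 steps. The NumPy sketch below shows one plausible reading of its `params`, loosely modeled on the Euler flow sampler of the original TRELLIS code base; the actual TRELLIS.2 implementation may differ. Assumptions: time runs from t = 1 (noise) to t = 0 (data); `rescale_t` warps the schedule as t' = r·t / (1 + (r − 1)·t), which concentrates steps near the high-noise end; classifier-free guidance uses the form neg + s·(pos − neg) and is applied only while t lies inside `guidance_interval` (some code bases use pos + s·(pos − neg) instead, under which a strength of 1.0 still applies guidance); and `guidance_rescale` blends the guided prediction with a copy rescaled to the conditional prediction's standard deviation, applied here directly to the velocity for simplicity. `sigma_min` is omitted. The 32-entry `*_slat_normalization` vectors match `latent_channels: 32`, and the sketch assumes sampled latents are mapped back with `z * std + mean` before decoding. `velocity` is a hypothetical stand-in for a flow transformer.

```python
import numpy as np

def timesteps(steps: int, rescale_t: float) -> np.ndarray:
    """Decreasing schedule from t=1 (noise) to t=0 (data), warped by rescale_t.

    rescale_t > 1 pushes intermediate steps toward t=1 (the high-noise end).
    """
    t = np.linspace(1.0, 0.0, steps + 1)
    return rescale_t * t / (1.0 + (rescale_t - 1.0) * t)

def guided(pred_pos, pred_neg, t, strength, rescale, interval):
    """Classifier-free guidance, applied only while t lies inside `interval`."""
    lo, hi = interval
    if not (lo <= t <= hi):
        return pred_pos
    cfg = strength * pred_pos + (1.0 - strength) * pred_neg
    if rescale > 0:
        # Rescale to the conditional prediction's std, then blend with the raw CFG output.
        rescaled = cfg * (pred_pos.std() / (cfg.std() + 1e-8))
        cfg = rescale * rescaled + (1.0 - rescale) * cfg
    return cfg

def sample(velocity, shape, steps=12, guidance_strength=7.5, guidance_rescale=0.7,
           guidance_interval=(0.6, 1.0), rescale_t=5.0, seed=0):
    """Euler integration of dx/dt = v from t=1 to t=0.

    `velocity(x, t, cond)` is hypothetical; cond=False plays the role of the
    unconditional (negative) branch.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    ts = timesteps(steps, rescale_t)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        v = guided(velocity(x, t, True), velocity(x, t, False),
                   t, guidance_strength, guidance_rescale, guidance_interval)
        x = x - (t - t_prev) * v  # t decreases toward t_prev
    return x

def denormalize(z, mean, std):
    """Undo per-channel latent normalization (channels on the last axis)."""
    return z * np.asarray(std) + np.asarray(mean)

# Toy check: a "model" whose velocity is exact for straight paths to `target`.
target = np.full((4, 32), 0.5)

def toy_velocity(x, t, cond):
    return (x - target) / t  # ignores cond, so guidance is (nearly) a no-op

out = sample(toy_velocity, target.shape)
print(np.abs(out - target).max())  # near zero
```

Because the keyword names mirror the `params` keys, a stage's settings could be passed directly, e.g. `sample(velocity, shape, **config["args"]["shape_slat_sampler"]["params"])` for a parsed `pipeline.json` held in a hypothetical `config` variable.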