JeffreyXiang committed 20 days ago

Commit

5e241bc

verified ·

1 Parent(s): 260f72c

Upload folder using huggingface_hub

Files changed (20)

README.md +114 -3
ckpts/shape_dec_next_dc_f16c32_fp16.json +24 -0
ckpts/shape_dec_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/shape_enc_next_dc_f16c32_fp16.json +24 -0
ckpts/shape_enc_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json +19 -0
ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors +3 -0
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json +19 -0
ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors +3 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json +19 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors +3 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json +19 -0
ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors +3 -0
ckpts/ss_flow_img_dit_1_3B_64_bf16.json +19 -0
ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors +3 -0
ckpts/tex_dec_next_dc_f16c32_fp16.json +25 -0
ckpts/tex_dec_next_dc_f16c32_fp16.safetensors +3 -0
ckpts/tex_enc_next_dc_f16c32_fp16.json +24 -0
ckpts/tex_enc_next_dc_f16c32_fp16.safetensors +3 -0
pipeline.json +92 -0

README.md CHANGED

@@ -1,3 +1,114 @@
----
-license: mit
----

+---
+license: mit
+pipeline_tag: image-to-3d
+library_name: trellis2
+language:
+- en
+---
+# TRELLIS.2: Native and Compact Structured Latents for 3D Generation
+**Model Name:** TRELLIS.2-4B
+**Paper:** [Coming Soon]
+**Repository:** [GitHub Link Placeholder]
+**Project Page:** [Website Link Placeholder]
+## Introduction
+**TRELLIS.2** is a state-of-the-art large 3D generative model designed for high-fidelity **image-to-3D** generation. It leverages a novel "field-free" sparse voxel structure termed **O-Voxel** and a large-scale flow-matching transformer (4 Billion parameters).
+Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes), which struggle with open surfaces or non-manifold geometry, TRELLIS.2 can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials, including transparency and translucency.
+## Key Features
+*   **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
+    *   **Arbitrary Topology:** Handles open surfaces, non-manifold geometry, and fully-enclosed structures without lossy conversion.
+    *   **Rich Appearance:** Captures PBR attributes (including opacity for translucent surfaces) aligned with geometry.
+    *   **Efficiency:** Optimization-free bidirectional conversion between meshes and O-Voxels, taking milliseconds to seconds.
+*   **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
+*   **High-Fidelity yet Compact Latent Space:** Uses a Sparse 3D VAE with **16× spatial downsampling**, encoding a $1024^3$ asset into only ~9.6K latent tokens with negligible perceptual degradation.
+*   **State-of-the-Art Speed:** Inference is significantly faster than existing large 3D models.
+## Inference Speed (NVIDIA H100 GPU)
+| Resolution | Time |
+| :--- | :--- |
+| 512³ | ~3 seconds |
+| 1024³ | ~17 seconds |
+| 1536³ | ~60 seconds |
+## Known Limitations
+*   **Geometric Artifacts (Small Holes):** While O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. For applications requiring strictly watertight geometry (e.g., 3D printing), standard mesh post-processing steps (such as hole-filling algorithms) may be necessary.
+*   **Base Model Status (No Alignment):** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Consequently, the outputs reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
+## Model Details
+*   **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
+*   **Model Type:** Flow-Matching Transformers with a Sparse-Voxel-Based 3D VAE
+*   **Parameters:** 4 Billion
+*   **Input:** Single Image
+*   **Output:** 3D Asset (Mesh with PBR Materials)
+*   **Resolution:** Varies from 512³ to 1536³ (Voxel Grid Resolution)
+## Usage
+*Note: Please refer to the official [GitHub Repository] for installation instructions and dependencies.*
+```python
+import os
+os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
+import cv2
+import imageio
+from PIL import Image
+import torch
+from trellis2.pipelines import Trellis2ImageTo3DPipeline
+from trellis2.utils import render_utils
+from trellis2.renderers import EnvMap
+import o_voxel
+envmap = EnvMap(torch.tensor(
+    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
+    dtype=torch.float32, device='cuda'
+))
+# Load a pipeline from a local model folder or from the Hugging Face Hub.
+pipeline = Trellis2ImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS.2-4B")
+pipeline.cuda()
+# Load an image
+image = Image.open("assets/example_image/T.png")
+# Run the pipeline
+mesh = pipeline.run(image)[0]
+# Render the outputs
+video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
+imageio.mimsave("sample.mp4", video, fps=15)
+# GLB files can be extracted from the outputs
+glb = o_voxel.postprocess.to_glb(
+    vertices            =   mesh.vertices,
+    faces               =   mesh.faces,
+    attr_volume         =   mesh.attrs,
+    coords              =   mesh.coords,
+    attr_layout         =   mesh.layout,
+    voxel_size          =   mesh.voxel_size,
+    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
+    decimation_target   =   100000,
+    texture_size        =   2048,
+)
+glb.export("sample.glb")
+```
+## Citation
+If you find this model useful for your research, please cite our work:
+```
+TBD
+```
+## License
+This model is released under the MIT License. The code and dataset are publicly released to facilitate reproduction and further research.
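
The README's latent-size claim (16× spatial downsampling, a $1024^3$ asset in ~9.6K tokens) can be checked with a little arithmetic. The sketch below is not part of the commit. It assumes that the ~9.6K tokens are the occupied cells of a sparse latent grid, which the README does not state explicitly.

```python
# Sketch: how sparse the latent grid would be for a 1024^3 asset.
# Assumption: the ~9.6K tokens are occupied cells of a sparse 3D latent grid.
VOXEL_RESOLUTION = 1024   # from the README
DOWNSAMPLE_FACTOR = 16    # "16x spatial downsampling" in the README
APPROX_TOKENS = 9_600     # "~9.6K latent tokens" in the README

# Side length of the latent grid after downsampling.
latent_resolution = VOXEL_RESOLUTION // DOWNSAMPLE_FACTOR

# Number of cells a dense grid of that size would hold.
dense_cells = latent_resolution ** 3

# Fraction of the dense grid that the reported token count would occupy.
occupancy = APPROX_TOKENS / dense_cells

print(f"latent grid: {latent_resolution}^3 = {dense_cells} cells")
print(f"occupied fraction: {occupancy:.2%}")
```

The resulting latent side length of 64 agrees with the `"resolution": 64` field in the `*_1024_bf16.json` flow-model configs added in this commit. Under the stated assumption, only a few percent of the dense grid is populated.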

ckpts/shape_dec_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "FlexiDualGridVaeDecoder",
+    "args": {
+        "resolution": 256,
+        "model_channels": [1024, 512, 256, 128, 64],
+        "latent_channels": 32,
+        "num_blocks": [4, 16, 8, 4, 0],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}
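
A quick way to sanity-check a VAE config like the one above is to confirm that its per-stage lists line up. The sketch below is not part of the commit and does not use the `trellis2` loader. It assumes that each `up_block_type` entry doubles the spatial resolution, and that the `f16` in the file name refers to the 16× factor mentioned in the README.

```python
import json

# The decoder config added in this commit, copied verbatim.
CONFIG = json.loads("""
{
    "name": "FlexiDualGridVaeDecoder",
    "args": {
        "resolution": 256,
        "model_channels": [1024, 512, 256, 128, 64],
        "latent_channels": 32,
        "num_blocks": [4, 16, 8, 4, 0],
        "block_type": [
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d",
            "SparseConvNeXtBlock3d"
        ],
        "up_block_type": [
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d",
            "SparseResBlockC2S3d"
        ],
        "block_args": [{}, {}, {}, {}, {}],
        "use_fp16": true
    }
}
""")

args = CONFIG["args"]
n_stages = len(args["model_channels"])

# Every per-stage list should have one entry per stage.
per_stage_keys = ["num_blocks", "block_type", "block_args"]
stage_lists_ok = all(len(args[key]) == n_stages for key in per_stage_keys)

# One resampling block sits between each pair of consecutive stages.
transitions_ok = len(args["up_block_type"]) == n_stages - 1

# Assumption: each up block doubles the resolution along every axis.
upsample_factor = 2 ** len(args["up_block_type"])

total_blocks = sum(args["num_blocks"])

print(stage_lists_ok, transitions_ok, upsample_factor, total_blocks)
```

Under the doubling assumption, four resampling blocks give a factor of 16, which is consistent with the README's 16× figure.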

ckpts/shape_dec_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
+size 948490494
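
The three lines above are a Git LFS pointer, not the weights themselves: they record the SHA-256 digest and byte size of the real file. A minimal sketch of checking a downloaded file against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and use only the Python standard library.

```python
import hashlib

def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }

def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]

# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:e3b718d3e43e4f8780e9a24ac6fff231811a67e3b058e336e10fe654c911d581
size 948490494
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

To verify a local copy, call `matches_pointer("<local path to the .safetensors file>", pointer)`. The path depends on where the file was downloaded.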

ckpts/shape_enc_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "FlexiDualGridVaeEncoder",
+    "args": {
+        "resolution": 256,
+        "model_channels": [64, 128, 256, 512, 1024],
+        "latent_channels": 32,
+        "num_blocks": [0, 4, 8, 16, 4],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}

ckpts/shape_enc_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f37c5ff5b983b68e9946060000f09bc131f3e84318a2c8b7430a81e4b4636c41
+size 708797208

ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 64,
+        "in_channels": 32,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07cd0596f634c5adc1890023d16023afc5eed02fb84b22bb23aff5bf0030fbbd
+size 2584574424
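
The file size above gives a rough cross-check on the `1_3B` in the checkpoint name. The sketch below is not part of the commit. It assumes that every tensor is stored as bfloat16 (2 bytes per value, as the config's `"dtype": "bfloat16"` suggests) and that the safetensors header is negligible. The derived head and MLP widths assume the usual transformer conventions (`model_channels / num_heads` and a truncated `model_channels * mlp_ratio`), which the commit does not confirm.

```python
# Values copied from the config and LFS pointer in this commit.
FILE_SIZE_BYTES = 2_584_574_424
MODEL_CHANNELS = 1536
NUM_HEADS = 12
MLP_RATIO = 5.3334

# Assumption: all weights are bfloat16 and header overhead is negligible.
BYTES_PER_BF16 = 2
approx_params = FILE_SIZE_BYTES // BYTES_PER_BF16

# Assumption: attention heads split the channel dimension evenly.
head_dim = MODEL_CHANNELS // NUM_HEADS

# Assumption: the MLP hidden width is model_channels * mlp_ratio, truncated.
mlp_hidden = int(MODEL_CHANNELS * MLP_RATIO)

print(f"approx. parameters: {approx_params / 1e9:.2f}B")
print(f"head dim: {head_dim}, MLP hidden width: {mlp_hidden}")
```

Under these assumptions the checkpoint holds roughly 1.3 billion parameters, matching its name. The odd-looking `mlp_ratio` appears to be chosen so that the hidden width lands on a power of two.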

ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 32,
+        "in_channels": 32,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_img2shape_dit_1_3B_512_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ec5e0917ef9b7e25ad51dffc7d19687a42019871f94239f2fa7f86264c55b70f
+size 2584574424

ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 64,
+        "in_channels": 64,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:580401269059a339b8318ab9ced459a13ba63391721c83a6c383198c29e77686
+size 2584672728

ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SLatFlowModel",
+    "args": {
+        "resolution": 32,
+        "in_channels": 64,
+        "out_channels": 32,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8371aa1c5d13be79dcd5ddfd2cf3835e902e204dc34427169a1c702828e1a94d
+size 2584672728

ckpts/ss_flow_img_dit_1_3B_64_bf16.json ADDED

	@@ -0,0 +1,19 @@

+{
+    "name": "SparseStructureFlowModel",
+    "args": {
+        "resolution": 16,
+        "in_channels": 8,
+        "out_channels": 8,
+        "model_channels": 1536,
+        "cond_channels": 1024,
+        "num_blocks": 30,
+        "num_heads": 12,
+        "mlp_ratio": 5.3334,
+        "pe_mode": "rope",
+        "share_mod": true,
+        "initialization": "scaled",
+        "qk_rms_norm": true,
+        "qk_rms_norm_cross": true,
+        "dtype": "bfloat16"
+    }
+}

ckpts/ss_flow_img_dit_1_3B_64_bf16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca01377c485bec418076d38ee80166d32dc776d744f2553b835cba1e97a7abf6
+size 2584426920

ckpts/tex_dec_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,25 @@

+{
+    "name": "SparseUnetVaeDecoder",
+    "args": {
+        "out_channels": 6,
+        "model_channels": [1024, 512, 256, 128, 64],
+        "latent_channels": 32,
+        "num_blocks": [4, 16, 8, 4, 0],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d",
+            "SparseResBlockC2S3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "pred_subdiv": false,
+        "use_fp16": true
+    }
+}

ckpts/tex_dec_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:97ea69addea2ecd9312910f5f548234665eef51c088386180b7cd5b258645e3c
+size 948458812

ckpts/tex_enc_next_dc_f16c32_fp16.json ADDED

	@@ -0,0 +1,24 @@

+{
+    "name": "SparseUnetVaeEncoder",
+    "args": {
+        "in_channels": 6,
+        "model_channels": [64, 128, 256, 512, 1024],
+        "latent_channels": 32,
+        "num_blocks": [0, 4, 8, 16, 4],
+        "block_type": [
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d",
+            "SparseConvNeXtBlock3d"
+        ],
+        "up_block_type": [
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d",
+            "SparseResBlockS2C3d"
+        ],
+        "block_args": [{}, {}, {}, {}, {}],
+        "use_fp16": true
+    }
+}

ckpts/tex_enc_next_dc_f16c32_fp16.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd109f75f84b90fa411554ed6b0e4a87f430841163156fc0ebda2ebdc4752493
+size 708797208

pipeline.json ADDED

	@@ -0,0 +1,92 @@

+{
+    "name": "Trellis2ImageTo3DPipeline",
+    "args": {
+        "models": {
+            "sparse_structure_decoder": "microsoft/TRELLIS-image-large/ckpts/ss_dec_conv3d_16l8_fp16",
+            "sparse_structure_flow_model": "ckpts/ss_flow_img_dit_1_3B_64_bf16",
+            "shape_slat_decoder": "ckpts/shape_dec_next_dc_f16c32_fp16",
+            "shape_slat_flow_model_512": "ckpts/slat_flow_img2shape_dit_1_3B_512_bf16",
+            "shape_slat_flow_model_1024": "ckpts/slat_flow_img2shape_dit_1_3B_1024_bf16",
+            "tex_slat_decoder": "ckpts/tex_dec_next_dc_f16c32_fp16",
+            "tex_slat_flow_model_512": "ckpts/slat_flow_imgshape2tex_dit_1_3B_512_bf16",
+            "tex_slat_flow_model_1024": "ckpts/slat_flow_imgshape2tex_dit_1_3B_1024_bf16"
+        },
+        "sparse_structure_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 7.5,
+                "guidance_rescale": 0.7,
+                "guidance_interval": [0.6, 1.0],
+                "rescale_t": 5.0
+            }
+        },
+        "shape_slat_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 7.5,
+                "guidance_rescale": 0.5,
+                "guidance_interval": [0.6, 1.0],
+                "rescale_t": 3.0
+            }
+        },
+        "shape_slat_normalization": {
+            "mean": [
+                0.781296, 0.018091, -0.495192, -0.558457, 1.060530, 0.093252, 1.518149, -0.933218,
+                -0.732996, 2.604095, -0.118341, -2.143904, 0.495076, -2.179512, -2.130751, -0.996944,
+                0.261421, -2.217463, 1.260067, -0.150213, 3.790713, 1.481266, -1.046058, -1.523667,
+                -0.059621, 2.220780, 1.621212, 0.877230, 0.567247, -3.175944, -3.186688, 1.578665
+            ],
+            "std": [
+                5.972266, 4.706852, 5.445010, 5.209927, 5.320220, 4.547237, 5.020802, 5.444004,
+                5.226681, 5.683095, 4.831436, 5.286469, 5.652043, 5.367606, 5.525084, 4.730578,
+                4.805265, 5.124013, 5.530808, 5.619001, 5.103930, 5.417670, 5.269677, 5.547194,
+                5.634698, 5.235274, 6.110351, 5.511298, 6.237273, 4.879207, 5.347008, 5.405691
+            ]
+        },
+        "tex_slat_sampler": {
+            "name": "FlowEulerGuidanceIntervalSampler",
+            "args": {
+                "sigma_min": 1e-5
+            },
+            "params": {
+                "steps": 12,
+                "guidance_strength": 1.0,
+                "guidance_rescale": 0.0,
+                "guidance_interval": [0.6, 0.9],
+                "rescale_t": 3.0
+            }
+        },
+        "tex_slat_normalization": {
+            "mean": [
+                3.501659, 2.212398, 2.226094, 0.251093, -0.026248, -0.687364, 0.439898, -0.928075,
+                0.029398, -0.339596, -0.869527, 1.038479, -0.972385, 0.126042, -1.129303, 0.455149,
+                -1.209521, 2.069067, 0.544735, 2.569128, -0.323407, 2.293000, -1.925608, -1.217717,
+                1.213905, 0.971588, -0.023631, 0.106750, 2.021786, 0.250524, -0.662387, -0.768862
+            ],
+            "std": [
+                2.665652, 2.743913, 2.765121, 2.595319, 3.037293, 2.291316, 2.144656, 2.911822,
+                2.969419, 2.501689, 2.154811, 3.163343, 2.621215, 2.381943, 3.186697, 3.021588,
+                2.295916, 3.234985, 3.233086, 2.260140, 2.874801, 2.810596, 3.292720, 2.674999,
+                2.680878, 2.372054, 2.451546, 2.353556, 2.995195, 2.379849, 2.786195, 2.775190
+            ]
+        },
+        "image_cond_model": {
+            "name": "DinoV3FeatureExtractor",
+            "args": {
+                "model_name": "facebook/dinov3-vitl16-pretrain-lvd1689m"
+            }
+        },
+        "rembg_model": {
+            "name": "BiRefNet",
+            "args": {}
+        }
+    }
+}
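
The sampler parameters in `pipeline.json` (`steps`, `guidance_strength`, `guidance_interval`, `rescale_t`) and the per-channel `mean`/`std` lists suggest an Euler flow sampler with classifier-free guidance restricted to a time window, followed by latent de-normalization. The sketch below illustrates that idea in plain Python. It is not the `trellis2` implementation, and several details are assumptions:

- the time-rescaling formula;
- the convention that a strength of 1.0 means no guidance;
- the time direction (t = 1 is noise, t = 0 is data);
- the `latent * std + mean` de-normalization.

`guidance_rescale` is left out entirely.

```python
def rescaled_timesteps(steps, rescale_t):
    """Timesteps from 1 (noise) to 0 (data), warped by rescale_t (assumed form)."""
    uniform = [1.0 - i / steps for i in range(steps + 1)]
    return [rescale_t * t / (1.0 + (rescale_t - 1.0) * t) for t in uniform]

def guided_velocity(v_cond, v_uncond, t, strength, interval):
    """Blend conditional and unconditional velocities only inside the interval."""
    low, high = interval
    if low <= t <= high:
        # Assumed convention: strength == 1.0 reduces to the conditional branch.
        return [strength * c + (1.0 - strength) * u for c, u in zip(v_cond, v_uncond)]
    return list(v_cond)

def euler_sample(velocity_fn, x, steps, strength, interval, rescale_t):
    """Integrate dx/dt = v from t=1 down to t=0 with explicit Euler steps."""
    ts = rescaled_timesteps(steps, rescale_t)
    for t, t_prev in zip(ts[:-1], ts[1:]):
        v_cond, v_uncond = velocity_fn(x, t)
        v = guided_velocity(v_cond, v_uncond, t, strength, interval)
        x = [xi - (t - t_prev) * vi for xi, vi in zip(x, v)]
    return x

def denormalize(latent, mean, std):
    """Map a normalized latent back to its original scale (assumed direction)."""
    return [value * s + m for value, m, s in zip(latent, mean, std)]

# Toy check: on a straight noise-to-data path, the exact velocity is (x - target) / t.
TARGET = [0.5, -1.0, 2.0]

def toy_velocity(x, t):
    v = [(xi - ti) / t for xi, ti in zip(x, TARGET)]
    return v, v  # the conditional and unconditional branches coincide here

timesteps = rescaled_timesteps(12, 5.0)
sample = euler_sample(toy_velocity, [1.0, 1.0, 1.0], steps=12,
                      strength=7.5, interval=(0.6, 1.0), rescale_t=5.0)
print(sample)
```

Because the toy velocity field is exact for a straight path, the Euler loop recovers `TARGET` up to floating-point error. A real flow model only approximates that field, so the step count and guidance window matter there.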