JeffreyXiang committed
Commit 8dd6d2f · verified · 1 Parent(s): 5e241bc

Update README.md

Files changed (1): README.md (+40 -25)
README.md CHANGED
@@ -9,9 +9,12 @@ language:
 # TRELLIS.2: Native and Compact Structured Latents for 3D Generation
 
 **Model Name:** TRELLIS.2-4B
- **Paper:** [Coming Soon]
- **Repository:** [GitHub Link Placeholder]
- **Project Page:** [Website Link Placeholder]
 
 ## Introduction
 
@@ -19,6 +22,15 @@ language:
 
 Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes), which struggle with open surfaces or non-manifold geometry, TRELLIS can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials—including transparency/translucency.
 
 ## Key Features
 
 * **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
@@ -26,8 +38,9 @@ Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes)
 * **Rich Appearance:** Captures PBR attributes (including opacity for translucent surfaces) aligned with geometry.
 * **Efficiency:** Instant optimization-free bidirectional conversion between meshes and O-Voxels (ms to seconds).
 * **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
- * **High-Fidelity while Compact Latent Space:** Utilizes a Sparse 3D VAE with **16× spatial downsampling**, encoding a $1024^3$ asset into only ~9.6K latent tokens with negligible perceptual degradation.
- * **State-of-the-Art Speed:** Inference is significantly faster than existing large 3D models.
 
 ## Inference Speed (NVIDIA H100 GPU)
 
@@ -39,25 +52,19 @@ Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes)
 
 ## Known Limitations
 
- * **Geometric Artifacts (Small Holes):** While O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. For applications requiring strictly watertight geometry (e.g., 3D printing), standard mesh post-processing steps (such as hole-filling algorithms) may be necessary.
- * **Base Model Status (No Alignment):** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Consequently, the outputs reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
-
- ## Model Details
 
- * **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
- * **Model Type:** Flow-Matching Transformers with Sparse Voxel based 3D VAE
- * **Parameters:** 4 Billion
- * **Input:** Single Image
- * **Output:** 3D Asset (Mesh with PBR Materials)
- * **Resolution:** Varies from 512³ to 1536³ (Voxel Grid Resolution)
 
 ## Usage
 
- *Note: Please refer to the official [GitHub Repository] for installation instructions and dependencies.*
 
 ```python
 import os
 os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
 import cv2
 import imageio
 from PIL import Image
@@ -67,26 +74,26 @@ from trellis2.utils import render_utils
 from trellis2.renderers import EnvMap
 import o_voxel
 
 envmap = EnvMap(torch.tensor(
     cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
     dtype=torch.float32, device='cuda'
 ))
 
- # Load a pipeline from a model folder or a Hugging Face model hub.
 pipeline = Trellis2ImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS.2-4B")
 pipeline.cuda()
 
- # Load an image
 image = Image.open("assets/example_image/T.png")
-
- # Run the pipeline
 mesh = pipeline.run(image)[0]
 
- # Render the outputs
 video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
 imageio.mimsave("sample.mp4", video, fps=15)
 
- # GLB files can be extracted from the outputs
 glb = o_voxel.postprocess.to_glb(
     vertices = mesh.vertices,
     faces = mesh.faces,
@@ -95,10 +102,12 @@ glb = o_voxel.postprocess.to_glb(
     attr_layout = mesh.layout,
     voxel_size = mesh.voxel_size,
     aabb = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
-     decimation_target = 100000,
     texture_size = 2048,
 )
- glb.export("sample.glb")
 ```
 
 ## Citation
@@ -106,7 +115,13 @@ glb.export("sample.glb")
 If you find this model useful for your research, please cite our work:
 
 ```
- TBD
 ```
 
 ## License
 
 # TRELLIS.2: Native and Compact Structured Latents for 3D Generation
 
 **Model Name:** TRELLIS.2-4B
+
+ **Paper:** [https://microsoft.github.io/trellis.2](https://microsoft.github.io/trellis.2)
+
+ **Repository:** [https://github.com/microsoft/TRELLIS.2](https://github.com/microsoft/TRELLIS.2)
+
+ **Project Page:** [https://microsoft.github.io/trellis.2](https://microsoft.github.io/trellis.2)
 
 ## Introduction
 
 Unlike previous methods that rely on iso-surface fields (e.g., SDF, Flexicubes), which struggle with open surfaces or non-manifold geometry, TRELLIS can reconstruct and generate **arbitrary 3D assets** with complex topologies, sharp features, and full Physically Based Rendering (PBR) materials—including transparency/translucency.
 
+ ## Model Details
+
+ * **Developed by:** Jianfeng Xiang, Xiaoxue Chen, Sicheng Xu, Ruicheng Wang, Zelong Lv, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
+ * **Model Type:** Flow-Matching Transformers with a Sparse-Voxel-based 3D VAE
+ * **Parameters:** 4 Billion
+ * **Input:** Single Image
+ * **Output:** 3D Asset (Mesh with PBR Materials)
+ * **Resolution:** Varies from 512³ to 1536³ (Voxel Grid Resolution)
+
 ## Key Features
 
 * **O-Voxel Representation:** An omni-voxel structure that encodes both geometry and appearance. It supports:
 * **Rich Appearance:** Captures PBR attributes (including opacity for translucent surfaces) aligned with geometry.
 * **Efficiency:** Instant optimization-free bidirectional conversion between meshes and O-Voxels (ms to seconds).
 * **High-Resolution Generation:** The model is trained to generate fully textured assets at **up to 1536³ resolution**.
+ * **High-Fidelity yet Compact Latent Space:** Utilizes a Sparse 3D VAE with **16× spatial downsampling**, encoding a 1024³ asset into only ~9.6K latent tokens with negligible perceptual degradation (see the token-count sketch after this list).
+ * **Shape-Conditioned Texture Generation:** Generates textures for input 3D meshes, conditioned on reference images.
+ * **State-of-the-Art Speed:** Inference is highly efficient; see the table below.
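The ~9.6K figure in the latent-space bullet above follows from simple arithmetic on the numbers quoted there: 16× spatial downsampling maps a 1024³ asset grid to a 64³ latent grid, and because the structured latents are sparse, only a small fraction of those positions carry tokens. A minimal back-of-the-envelope sketch (a reading of the quoted figures, not an official breakdown):

```python
# Back-of-the-envelope check of the figures quoted in the bullet above.
# Assumption: the latent grid is the asset grid divided by the downsampling
# factor, and only sparsely occupied (near-surface) cells become tokens.
asset_resolution = 1024      # encoded asset resolution
downsampling = 16            # spatial downsampling of the Sparse 3D VAE
quoted_tokens = 9_600        # "~9.6K latent tokens"

latent_resolution = asset_resolution // downsampling   # 64
dense_cells = latent_resolution ** 3                   # 262,144 positions in a dense 64^3 grid
occupancy = quoted_tokens / dense_cells                # ~0.037, i.e. roughly 3.7% of cells active

print(f"{latent_resolution}^3 = {dense_cells} cells, ~{quoted_tokens} tokens -> {occupancy:.1%} occupancy")
```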
 
 ## Inference Speed (NVIDIA H100 GPU)
 
 ## Known Limitations
 
+ * **Geometric Artifacts (Small Holes):** While O-Voxels handle complex topology well, the generated raw meshes may occasionally contain small holes or minor topological discontinuities. For applications requiring strictly watertight geometry (e.g., 3D printing), we provide accompanying mesh post-processing scripts, such as hole-filling algorithms (an illustrative sketch follows below).
+ * **Base Model without Alignment:** TRELLIS.2-4B is a pre-trained foundation model. It has **not** been aligned with human preferences (e.g., via RLHF) or fine-tuned for specific aesthetic standards. Consequently, the outputs reflect the distribution of the training data and may vary in style; users may need to experiment with inputs to achieve the desired artistic result.
+
+ We are actively working on improving the model and addressing these limitations.
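For readers who want a quick way to patch the small holes mentioned above, here is a generic sketch using the third-party trimesh library. It is purely illustrative, not the TRELLIS.2 post-processing scripts, and `sample.glb` refers to the export produced in the Usage example below:

```python
# Illustrative only: generic hole filling with trimesh, not the TRELLIS.2 scripts.
# Material/UV handling is out of scope for this sketch.
import trimesh

scene = trimesh.load("sample.glb")                              # GLB exports load as a Scene
mesh = trimesh.util.concatenate(list(scene.geometry.values()))  # merge sub-meshes into one Trimesh

print("watertight before:", mesh.is_watertight)
trimesh.repair.fill_holes(mesh)                                 # cap small boundary holes in place
print("watertight after:", mesh.is_watertight)

mesh.export("sample_filled.glb")
```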
 
 ## Usage
 
+ *Note: Please refer to the official [GitHub Repository](https://github.com/microsoft/TRELLIS.2) for installation instructions and dependencies.*
 
 ```python
 import os
 os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
+ os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
 import cv2
 import imageio
 from PIL import Image
 from trellis2.renderers import EnvMap
 import o_voxel
 
+ # 1. Setup Environment Map
 envmap = EnvMap(torch.tensor(
     cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
     dtype=torch.float32, device='cuda'
 ))
 
+ # 2. Load Pipeline
 pipeline = Trellis2ImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS.2-4B")
 pipeline.cuda()
 
+ # 3. Load Image & Run
 image = Image.open("assets/example_image/T.png")
 mesh = pipeline.run(image)[0]
+ mesh.simplify(16777216)  # nvdiffrast limit
 
+ # 4. Render Video
 video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
 imageio.mimsave("sample.mp4", video, fps=15)
 
+ # 5. Export to GLB
 glb = o_voxel.postprocess.to_glb(
     vertices = mesh.vertices,
     faces = mesh.faces,
     attr_layout = mesh.layout,
     voxel_size = mesh.voxel_size,
     aabb = [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
+     decimation_target = 1000000,
     texture_size = 2048,
+     remesh = False,
+     verbose = True
 )
+ glb.export("sample.glb", include_normals=False)
 ```
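As an optional sanity check (not part of the official example), the exported GLB can be reloaded with the third-party trimesh library to confirm that the `decimation_target` and `texture_size` settings above took effect; the attribute names used are standard glTF/PBR fields on the trimesh side:

```python
# Optional check of the export above (illustrative; uses third-party trimesh).
import trimesh

scene = trimesh.load("sample.glb")
for name, geom in scene.geometry.items():
    print(name, "faces:", len(geom.faces))                   # decimation_target presumably bounds this
    material = getattr(geom.visual, "material", None)        # glTF PBR material, if textured
    texture = getattr(material, "baseColorTexture", None) if material is not None else None
    if texture is not None:
        print("base color texture size:", texture.size)      # expected to match texture_size
```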
 
 ## Citation
 
 If you find this model useful for your research, please cite our work:
 
 ```
+ @article{xiang2025trellis2,
+     title={Native and Compact Structured Latents for 3D Generation},
+     author={Xiang, Jianfeng and Chen, Xiaoxue and Xu, Sicheng and Wang, Ruicheng and Lv, Zelong and Deng, Yu and Zhu, Hongyuan and Dong, Yue and Zhao, Hao and Yuan, Nicholas Jing and Yang, Jiaolong},
+     journal={Tech report},
+     year={2025}
+ }
 ```
 
 ## License