HN Today

Show HN: TRELLIS.2 image-to-3D running on Mac Silicon – no Nvidia GPU needed

A clever engineer has successfully ported Microsoft's powerful TRELLIS.2 image-to-3D model to run natively on Apple Silicon, entirely bypassing NVIDIA's CUDA ecosystem. This significant technical achievement allows Mac users to generate detailed 3D meshes from single photos locally, without cloud dependencies or specialized GPUs. The detailed breakdown of replacing CUDA-specific operations with pure-PyTorch alternatives makes this a fascinating deep dive for developers interested in cross-platform ML and hardware optimization.

Score: 18
Comments: 0
Highest Rank: #1
Time on Front Page: 11h
First Seen: Apr 20, 1:00 AM
Last Seen: Apr 20, 11:00 AM

The Lowdown

This project showcases a remarkable feat of engineering, successfully porting Microsoft's TRELLIS.2, a 4-billion-parameter image-to-3D model, to run natively on Apple Silicon. The port eliminates the original model's reliance on NVIDIA's CUDA, making advanced 3D generation accessible to Mac users without specialized hardware or cloud services. This effort involved a meticulous replacement of CUDA-specific operations with pure-PyTorch and Python alternatives, demonstrating ingenious solutions to hardware compatibility challenges.
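The write-up describes swapping CUDA calls for backend-agnostic PyTorch; the post itself doesn't include code, but the standard pattern for making a CUDA-only model run on Apple Silicon is runtime device selection against the MPS backend. A minimal sketch (the helper name is illustrative, not from the port):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal backend (MPS), then CUDA, then CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
# Tensors and modules move with .to(device) exactly as they would on CUDA,
# so model code written this way runs unchanged on an M-series Mac.
x = torch.randn(4, 4, device=device)
print(device.type)
```

Hard-coded `.cuda()` calls are what typically break on Apple Silicon; routing everything through a single device variable like this is usually the first step of such a port.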

  • Core Achievement: TRELLIS.2, a state-of-the-art image-to-3D model, now runs on Apple Silicon (M1 or later) using PyTorch MPS.
  • CUDA Bypass: Key CUDA-dependent components (flex_gemm, the o_voxel._C hashmap, and flash_attn) were replaced with custom pure-PyTorch or Python implementations.
  • Performance: Generates 400K+ vertex meshes from single images in approximately 3.5 minutes on an M4 Pro, utilizing around 18GB of unified memory.
  • Offline Capability: Enables local 3D model generation, removing the need for cloud-based services.
  • Output: Produces vertex-colored OBJ and GLB files suitable for various 3D applications.
  • Limitations: The port currently lacks texture export and mesh hole-filling, since those features depend on CUDA-only libraries. The pure-PyTorch sparse convolution is also roughly 10x slower than its CUDA counterpart, and the port supports inference only.

This port offers Apple Silicon users a powerful, local tool for 3D content creation, pushing the boundaries of what's possible on consumer hardware and highlighting the ingenuity required to bridge hardware-software gaps in cutting-edge AI research.
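Since vertex-colored OBJ is one of the listed export formats: OBJ has no official color field, but a widely read extension (supported by Blender and MeshLab) appends per-vertex RGB floats to each `v` line. A minimal writer sketch, assuming this convention; the function name is illustrative, not from the port:

```python
def write_colored_obj(path, vertices, colors, faces):
    """Write a triangle mesh with per-vertex RGB colors as a Wavefront OBJ.

    vertices: list of (x, y, z) floats
    colors:   list of (r, g, b) floats in [0, 1], one per vertex
    faces:    list of (i, j, k) 0-based vertex indices
    """
    with open(path, "w") as f:
        for (x, y, z), (r, g, b) in zip(vertices, colors):
            # Non-standard but widely supported: color follows position.
            f.write(f"v {x} {y} {z} {r} {g} {b}\n")
        for i, j, k in faces:
            # OBJ face indices are 1-based.
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")
```

Because color lives directly on the `v` lines, this format carries appearance without any texture atlas, which is why it remains usable even though the port cannot yet export textures.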