HN Today

Show HN: TRELLIS.2 image-to-3D running on Mac Silicon – no Nvidia GPU needed

A clever engineer has successfully ported Microsoft's powerful TRELLIS.2 image-to-3D model to run natively on Apple Silicon, entirely bypassing NVIDIA's CUDA ecosystem. This significant technical achievement allows Mac users to generate detailed 3D meshes from single photos locally, without cloud dependencies or specialized GPUs. The detailed breakdown of replacing CUDA-specific operations with pure-PyTorch alternatives makes this a fascinating deep dive for developers interested in cross-platform ML and hardware optimization.

Score: 18
Comments: 0
Highest Rank: #1
Time on Front Page: 11h
First Seen: Apr 20, 1:00 AM
Last Seen: Apr 20, 11:00 AM

The Lowdown

This project showcases a remarkable feat of engineering, successfully porting Microsoft's TRELLIS.2, a 4-billion-parameter image-to-3D model, to run natively on Apple Silicon. The port eliminates the original model's reliance on NVIDIA's CUDA, making advanced 3D generation accessible to Mac users without specialized hardware or cloud services. This effort involved a meticulous replacement of CUDA-specific operations with pure-PyTorch and Python alternatives, demonstrating ingenious solutions to hardware compatibility challenges.
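The write-up describes swapping CUDA calls for backend-agnostic PyTorch; the post itself doesn't include code, but the standard pattern for making a CUDA-only model run on Apple Silicon is runtime device selection against the MPS backend. A minimal sketch (the helper name is illustrative, not from the port):

```python
import torch

def pick_device() -> torch.device:
    """Prefer Apple's Metal backend (MPS), then CUDA, then CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
# Tensors and modules move with .to(device) exactly as they would on CUDA,
# so model code written this way runs unchanged on an M-series Mac.
x = torch.randn(4, 4, device=device)
print(device.type)
```

Hard-coded `.cuda()` calls are what typically break on Apple Silicon; routing everything through a single device variable like this is usually the first step of such a port.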

  • Core Achievement: TRELLIS.2, a state-of-the-art image-to-3D model, now runs on Apple Silicon (M1 or later) using PyTorch MPS.
  • CUDA Bypass: Key CUDA-dependent components (flex_gemm, the o_voxel._C hashmap, and flash_attn) were replaced with custom pure-PyTorch or Python implementations.
  • Performance: Generates 400K+ vertex meshes from single images in approximately 3.5 minutes on an M4 Pro, utilizing around 18GB of unified memory.
  • Offline Capability: Enables local 3D model generation, removing the need for cloud-based services.
  • Output: Produces vertex-colored OBJ and GLB files suitable for various 3D applications.
  • Limitations: The port currently lacks texture export and mesh hole-filling, since those features depend on CUDA-only libraries. The pure-PyTorch sparse convolution is also roughly 10x slower than its CUDA counterpart, and the port supports inference only.

This port offers Apple Silicon users a powerful, local tool for 3D content creation, pushing the boundaries of what's possible on consumer hardware and highlighting the ingenuity required to bridge hardware-software gaps in cutting-edge AI research.
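Since vertex-colored OBJ is one of the listed export formats: OBJ has no official color field, but a widely read extension (supported by Blender and MeshLab) appends per-vertex RGB floats to each `v` line. A minimal writer sketch, assuming this convention; the function name is illustrative, not from the port:

```python
def write_colored_obj(path, vertices, colors, faces):
    """Write a triangle mesh with per-vertex RGB colors as a Wavefront OBJ.

    vertices: list of (x, y, z) floats
    colors:   list of (r, g, b) floats in [0, 1], one per vertex
    faces:    list of (i, j, k) 0-based vertex indices
    """
    with open(path, "w") as f:
        for (x, y, z), (r, g, b) in zip(vertices, colors):
            # Non-standard but widely supported: color follows position.
            f.write(f"v {x} {y} {z} {r} {g} {b}\n")
        for i, j, k in faces:
            # OBJ face indices are 1-based.
            f.write(f"f {i + 1} {j + 1} {k + 1}\n")
```

Because color lives directly on the `v` lines, this format carries appearance without any texture atlas, which is why it remains usable even though the port cannot yet export textures.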