Microsoft's Trellis Is A Crazy Fast And Open-Source 3D Asset Generator
Trellis is a novel 3D generation method for versatile and high-quality 3D asset creation. It's fast, it's free, and the results are really good!
A few weeks ago, Microsoft unveiled a novel 3D generation method for versatile and high-quality 3D asset creation called Trellis. The model uses a unified Structured LATent (SLAT) representation, which combines a sparse 3D grid with multiview visual features and can be decoded into various formats, such as Radiance Fields, 3D Gaussians, and meshes.
Okay, that sounds like a mouthful, but in simple terms, Trellis is really good at creating high-quality 3D models that look realistic and match the descriptions or pictures you provide. It’s an incredible tool for artists, developers, and designers to produce amazing 3D content efficiently.
I’ve talked about AI-powered 3D object generators in the past, but this one is particularly impressive in terms of speed and quality.
How Does Trellis Work?
Trellis generates 3D assets with rectified flow transformers and also supports flexible editing of the generated assets.
The model is trained on a large dataset of 500K 3D objects and, according to the authors' experiments and user studies, surpasses existing methods in quality and versatility.
The 3D object generation in Trellis is a two-stage process built around a representation called “Structured LATent” (SLAT).
Here’s how it works:
Stage 1: Building the Structure
Sparse Structure: Trellis starts by creating a basic framework of the object using a set of “active voxels.” Voxels are like tiny cubes in 3D space. The active voxels outline the rough shape of the object. Imagine building a Lego model and first putting together the main blocks to get the general shape.
Compressing the Structure: To make things more efficient, Trellis compresses this framework into a much smaller code, a kind of cheat sheet for the shape, using a technique called a VAE (Variational Autoencoder).
Generating the Framework: Trellis uses a special type of artificial intelligence called a “Rectified Flow Transformer” to generate this cheat sheet from your text or image prompt. The VAE then expands it into a detailed plan for the object’s framework. This plan tells the computer exactly where to place the active voxels in 3D space.
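To make the idea of "active voxels" more concrete, here is a minimal sketch in plain Python. It is not Trellis code: a sphere stands in for a real 3D asset, and the function name, the 16x16x16 grid, and the tolerance are all assumptions chosen for illustration. The point is only that the shape is stored as a sparse set of occupied cells near the surface rather than as a full dense grid.

```python
import math

# Toy illustration only: a sphere stands in for a real 3D asset, and the
# grid size and tolerance are arbitrary choices made for this sketch.

def active_voxels(resolution=16, radius=0.4):
    """Return the set of (i, j, k) cells whose centers lie near a sphere's surface."""
    tolerance = 1.0 / resolution  # roughly one cell width
    active = set()
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Center of cell (i, j, k), with the unit cube centered at the origin.
                x = (i + 0.5) / resolution - 0.5
                y = (j + 0.5) / resolution - 0.5
                z = (k + 0.5) / resolution - 0.5
                distance = math.sqrt(x * x + y * y + z * z)
                if abs(distance - radius) <= tolerance:
                    active.add((i, j, k))
    return active


voxels = active_voxels()
total = 16 ** 3
print(f"{len(voxels)} active voxels out of {total} grid cells")
```

Only a fraction of the grid ends up active, which is why a sparse representation like this is cheaper to store and process than the full grid.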
Stage 2: Adding the Details
Local Latents: Once the framework is in place, Trellis adds details to each active voxel using “local latents.” These latents contain information about the object’s appearance, like color and texture.
Feature Aggregation: To figure out what each local latent should look like, Trellis uses a powerful vision model (DINOv2). This vision model analyzes pictures of the object from many different angles and extracts important features, like edges, shapes, and colors.
Generating the Details: Trellis uses another Rectified Flow Transformer to take these features and turn them into the detailed local latents. These latents are then attached to the active voxels to complete the 3D model.
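The feature aggregation step can be pictured with a small sketch. Everything here is a simplified assumption: the tiny hand-made grids stand in for real DINOv2 feature maps, the views use a simple orthographic projection along a grid axis, and the function names are hypothetical. The idea it shows is that each active voxel looks up the feature it lands on in every view and averages them.

```python
# Hypothetical stand-in for the aggregation step: real feature maps would
# come from a vision model such as DINOv2; here they are tiny hand-made grids.

def project(voxel, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return tuple(c for d, c in enumerate(voxel) if d != axis)


def aggregate_features(voxels, views):
    """Average, per voxel, the feature vectors it projects onto in each view.

    `views` is a list of (axis, feature_map) pairs, where feature_map[u][v]
    is the feature vector (a list of floats) at pixel (u, v).
    """
    aggregated = {}
    for voxel in voxels:
        samples = []
        for axis, feature_map in views:
            u, v = project(voxel, axis)
            samples.append(feature_map[u][v])
        dim = len(samples[0])
        aggregated[voxel] = [
            sum(s[d] for s in samples) / len(samples) for d in range(dim)
        ]
    return aggregated


# Two 2x2 "views" with 2-dimensional features, looking along the x and z axes.
view_x = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
view_z = [[[3.0, 2.0], [2.0, 3.0]], [[1.0, 1.0], [4.0, 4.0]]]
voxels = [(0, 0, 1), (1, 1, 0)]

features = aggregate_features(voxels, [(0, view_x), (2, view_z)])
print(features)
```

A real pipeline would use many rendered views with proper camera projections, but the averaging idea is the same.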
This two-stage process allows Trellis to create high-quality 3D models efficiently. It leverages the power of artificial intelligence and computer vision to understand and recreate complex 3D objects from text descriptions or pictures.
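Both stages rely on rectified flow, so here is a rough sketch of what sampling with it looks like. This is a toy under stated assumptions: in Trellis the velocity would be predicted by a trained transformer conditioned on your prompt, while here a hand-written velocity field that already knows the target is used instead. The time convention (noise at t = 0, data at t = 1) and all names are also choices made for this sketch.

```python
import random

# Toy sketch: in Trellis the velocity would be predicted by a trained
# transformer; this hand-written field is an assumption made for illustration.

def velocity(x, t, target):
    """Velocity that moves x along a straight line so it reaches target at t = 1."""
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=10, seed=0):
    """Integrate dx/dt = velocity(x, t) from noise (t = 0) toward data (t = 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t never reaches 1, so the division above is safe
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # one Euler step
    return x


target = [0.25, -1.0, 2.0]
result = sample(target)
print(result)
```

Because the path from noise to data is a straight line, a handful of Euler steps is enough to land on the target, which hints at why this family of models can be fast at generation time.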
Trellis can convert this SLAT representation into various 3D model formats, like:
3D Gaussians
Radiance Fields
Meshes
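One way to picture "one representation, several output formats" is the sketch below. It is not the Trellis API, and every name in it is hypothetical. The real decoders are neural networks, while these two are deliberately trivial stand-ins. What it illustrates is the shape of the data: a list of active voxel positions, each with a local latent vector, that different decoders read in different ways.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not the Trellis API, only a sketch
# of one shared representation being read by several decoders.

@dataclass
class LatentVoxel:
    position: tuple  # (i, j, k) index of an active voxel
    latent: list     # local latent vector attached to that voxel


def decode_to_points(slat, resolution):
    """Toy decoder: one colored point per active voxel, at the voxel center."""
    points = []
    for item in slat:
        center = tuple((c + 0.5) / resolution for c in item.position)
        # Assumption for this sketch: the first three latent values are RGB.
        color = tuple(item.latent[:3])
        points.append({"center": center, "color": color})
    return points


def decode_to_occupancy(slat, resolution):
    """Toy decoder: a dense boolean grid marking which cells are occupied."""
    occupied = {item.position for item in slat}
    return [
        [[(i, j, k) in occupied for k in range(resolution)]
         for j in range(resolution)]
        for i in range(resolution)
    ]


slat = [
    LatentVoxel((0, 0, 0), [1.0, 0.0, 0.0, 0.7]),
    LatentVoxel((1, 1, 1), [0.0, 0.0, 1.0, 0.2]),
]
points = decode_to_points(slat, resolution=2)
grid = decode_to_occupancy(slat, resolution=2)
print(points[0], grid[1][1][1])
```

The same `slat` list feeds both functions, which mirrors the idea that Gaussians, radiance fields, and meshes can all be decoded from a single structured latent.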
The way Trellis compresses the structure and adds details is reminiscent of how professional 3D artists work—starting with a base mesh and then layering details.
However, unlike human artists, Trellis does it in a fraction of the time.
If you want to learn about the technical details of Trellis, check out the whitepaper here.
How To Try Trellis
Here’s how you can generate 3D assets for free with Trellis.