Estimating material property fields of 3D assets is critical for physics-based simulation,
robotics, and digital twin generation. Existing vision-based approaches are either computationally
expensive and slow, or depend on explicit 3D information. We present SLAT-Phys, an end-to-end method
that predicts spatially varying material property fields of 3D assets directly from a single RGB
image without explicit 3D reconstruction.
Our approach leverages spatially organized latent features from a pretrained 3D asset generation
model (TRELLIS) that encode rich geometric and semantic priors, and trains a lightweight neural
decoder to estimate Young’s modulus (E), density (ρ), and Poisson’s
ratio (ν). The coarse volumetric layout and semantic cues of the latent representation enable
accurate material estimation. SLAT-Phys requires only ∼9.9 seconds per
object on an NVIDIA RTX A5000 GPU and avoids reconstruction and voxelization preprocessing,
resulting in a ∼120× speedup compared to prior methods.
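The lightweight decoder described above can be sketched as a small per-voxel regression head. The snippet below is a minimal illustration, not the paper's actual architecture: the latent dimension, hidden width, weights, and output ranges are all assumptions chosen for clarity. It maps a batch of latent feature vectors to the three predicted material properties (E, ρ, ν), squashing each into a plausible physical range.

```python
import numpy as np

# Minimal sketch of a per-voxel material decoder (assumed architecture,
# not the paper's exact model). Maps structured latent features to
# Young's modulus E, density rho, and Poisson's ratio nu.

rng = np.random.default_rng(0)

LATENT_DIM = 8   # assumed latent feature size per voxel
HIDDEN_DIM = 16  # assumed hidden width

# Randomly initialized weights stand in for trained parameters.
W1 = rng.normal(size=(LATENT_DIM, HIDDEN_DIM)) * 0.1
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(size=(HIDDEN_DIM, 3)) * 0.1  # 3 outputs: E, rho, nu
b2 = np.zeros(3)

def decode_materials(latents: np.ndarray) -> np.ndarray:
    """Map (N, LATENT_DIM) latent features to (N, 3) material predictions.

    Output ranges (chosen here for illustration only):
      E   in (1e4, 1e9) Pa, predicted on a log scale
      rho in (100, 10000) kg/m^3
      nu  in (0, 0.5), the physically admissible Poisson range
    """
    h = np.maximum(latents @ W1 + b1, 0.0)   # ReLU hidden layer
    raw = h @ W2 + b2                        # unbounded predictions
    sig = 1.0 / (1.0 + np.exp(-raw))         # squash each to (0, 1)
    E = 10.0 ** (4.0 + 5.0 * sig[:, 0])      # log-uniform stiffness range
    rho = 100.0 + 9900.0 * sig[:, 1]         # density range
    nu = 0.5 * sig[:, 2]                     # Poisson ratio below 0.5
    return np.stack([E, rho, nu], axis=1)

# Example: decode material fields for 4 latent voxels.
props = decode_materials(rng.normal(size=(4, LATENT_DIM)))
print(props.shape)
```

Bounding ν strictly below 0.5 keeps every prediction usable by downstream elasticity solvers, which is one reason a range-squashing output head is a natural design choice for this task.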