Pooya Jannaty
- Image Retrieval and Classification Techniques
- Distributed and Parallel Computing Systems
- Medical Image Segmentation Techniques
- 3D Surveying and Cultural Heritage
- Image Processing Techniques and Applications
Nvidia (United States)
2024
Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society,...
We introduce GenUSD, an end-to-end text-to-scene generation framework that transforms natural language queries into realistic 3D scenes, including 3D objects and layouts. The process involves two main steps: 1) A Large Language Model (LLM) generates a scene layout hierarchically. It first proposes a high-level plan to decompose the scene into multiple functionally and spatially distinct subscenes. Then, for each subscene, the LLM proposes objects with detailed positions, poses, sizes, and descriptions. To manage complex object...
We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.