NFDI4DS | UHH-SEMS - Publication Details

Pooya Jannaty

ORCID: 0009-0009-8016-8156

About

Contact & Profiles

OpenAlex ID: A5105081962

Research Areas

Image Retrieval and Classification Techniques
Distributed and Parallel Computing Systems
Medical Image Segmentation Techniques
3D Surveying and Cultural Heritage
Image Processing Techniques and Applications

Nvidia (United States)
2024

Cosmos World Foundation Model Platform for Physical AI

OPENALEX - Publications

NVIDIA, Niket Agarwal, Adnan Ali, Madhu Bala, and 74 more

Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society,...

10.48550/arxiv.2501.03575 preprint EN arXiv (Cornell University) 2025-01-07

GenUSD: 3D scene generation made easy

OPENALEX - Publications

Tsung-Yi Lin, Chen-Hsuan Lin, Yin Cui, Yunhao Ge, Seungjun Nah, and 14 more

We introduce GenUSD, an end-to-end text-to-scene generation framework that transforms natural language queries into realistic 3D scenes, including 3D objects and layouts. The process involves two main steps: 1) A Large Language Model (LLM) generates a scene layout hierarchically. It first proposes a high-level plan to decompose the scene into multiple functionally and spatially distinct subscenes. Then, for each subscene, the LLM proposes objects with detailed positions, poses, sizes, and descriptions. To manage complex object...

10.1145/3641520.3665306 article EN 2024-07-25
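
The hierarchical layout described in the GenUSD abstract above (a scene split into subscenes, each holding objects with positions, poses, sizes, and descriptions) can be pictured as a small nested data structure. The following is only an illustrative sketch, not the paper's actual schema: the class names, field names, the single yaw angle used for the pose, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One object proposed for a subscene (all field names are hypothetical)."""
    description: str
    position: tuple[float, float, float]  # assumed: x, y, z in metres
    yaw_degrees: float                    # assumed: pose reduced to one yaw angle
    size: tuple[float, float, float]      # assumed: width, depth, height in metres


@dataclass
class Subscene:
    """A functionally and spatially distinct part of the scene."""
    name: str
    objects: list[SceneObject] = field(default_factory=list)


@dataclass
class SceneLayout:
    """Top level of the hierarchy: the query and its subscenes."""
    query: str
    subscenes: list[Subscene] = field(default_factory=list)

    def all_objects(self) -> list[SceneObject]:
        """Flatten the hierarchy into a single list of objects."""
        return [obj for sub in self.subscenes for obj in sub.objects]


# A hand-written layout standing in for what an LLM might propose.
layout = SceneLayout(
    query="a small studio apartment",
    subscenes=[
        Subscene("sleeping area", [
            SceneObject("single bed", (0.5, 1.0, 0.0), 90.0, (0.9, 2.0, 0.5)),
            SceneObject("nightstand", (1.2, 0.2, 0.0), 0.0, (0.4, 0.4, 0.5)),
        ]),
        Subscene("work area", [
            SceneObject("desk", (3.0, 0.4, 0.0), 0.0, (1.2, 0.6, 0.75)),
        ]),
    ],
)
```

Calling `layout.all_objects()` flattens the two levels into a single list, which is the form a later object-generation or placement stage would most likely consume.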

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

OPENALEX - Publications

Nvidia, Yuval Atzmon, Madhu Bala, Yogesh Balaji, and 27 more

We introduce Edify Image, a family of diffusion models capable of generating photorealistic image content with pixel-perfect accuracy. Edify Image utilizes cascaded pixel-space diffusion models trained using a novel Laplacian diffusion process, in which image signals at different frequency bands are attenuated at varying rates. Edify Image supports a wide range of applications, including text-to-image synthesis, 4K upsampling, ControlNets, 360 HDR panorama generation, and finetuning for image customization.

10.48550/arxiv.2411.07126 preprint EN arXiv (Cornell University) 2024-11-11
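
The idea in the Edify Image abstract above, that image signals in different frequency bands are attenuated at varying rates, can be illustrated with a toy Laplacian pyramid. This is a rough sketch under stated assumptions, not the paper's diffusion process: the 2x2 average-pool and nearest-neighbour pyramid, the exponential schedule `exp(-rate * t)`, and the specific rates are all hypothetical choices made for illustration, and no noise is added.

```python
import numpy as np


def downsample(x):
    """Halve resolution by averaging non-overlapping 2x2 blocks."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def upsample(x):
    """Double resolution by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def laplacian_bands(image, levels):
    """Split an image into `levels` band-pass residuals plus a low-pass base."""
    bands = []
    current = image
    for _ in range(levels):
        low = downsample(current)
        bands.append(current - upsample(low))  # detail lost by downsampling
        current = low
    bands.append(current)  # coarsest, lowest-frequency component
    return bands


def reconstruct(bands):
    """Invert laplacian_bands by upsampling and adding the residuals back."""
    current = bands[-1]
    for band in reversed(bands[:-1]):
        current = upsample(current) + band
    return current


def attenuate(bands, rates, t):
    """Scale band k by exp(-rates[k] * t); the schedule is an assumption."""
    return [band * np.exp(-rate * t) for band, rate in zip(bands, rates)]


rng = np.random.default_rng(0)
image = rng.random((8, 8))
bands = laplacian_bands(image, levels=2)

# Hypothetical rates: finest band fades fastest, the base is left untouched.
faded = reconstruct(attenuate(bands, rates=[4.0, 2.0, 0.0], t=0.5))
```

With `t=0` every band is scaled by 1 and `reconstruct` returns the original image. As `t` grows, the fine detail vanishes first while the coarse structure in the base survives.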
