NFDI4DS | UHH-SEMS - Publication Details

Bo Dai

ORCID: 0000-0003-0777-9232

About

Contact & Profiles

A5101990493

Research Areas

Generative Adversarial Networks and Image Synthesis
Human Pose and Action Recognition
Multimodal Machine Learning Applications
Advanced Vision and Imaging
Computer Graphics and Visualization Techniques
3D Shape Modeling and Analysis
Domain Adaptation and Few-Shot Learning
Advanced Image and Video Retrieval Techniques
Advanced Neural Network Applications
Video Analysis and Summarization
Human Motion and Animation
Advanced Image Processing Techniques
Gaussian Processes and Bayesian Inference
Reinforcement Learning in Robotics
Video Surveillance and Tracking Methods
Image and Signal Denoising Methods
Image Processing Techniques and Applications
3D Surveying and Cultural Heritage
Music and Audio Processing
Neural Networks and Applications
Anomaly Detection Techniques and Applications
Topic Modeling
Image Processing and 3D Reconstruction
Machine Learning and Algorithms
Stock Market Forecasting Methods

Beijing Academy of Artificial Intelligence
2022-2024

Shanghai Artificial Intelligence Laboratory
2022-2024

ShangHai JiAi Genetics & IVF Institute
2023-2024

University of Electronic Science and Technology of China
2009-2023

University of Shanghai for Science and Technology
2023

Google (United States)
2020-2022

State Key Laboratory of Mobile Networks and Mobile Multimedia Technology
2022

ZTE (China)
2022

Nanyang Technological University
2021-2022

China Mobile (China)
2022

Detecting Visual Relationships with Deep Relational Networks

OPENALEX - Publications

Bo Dai Yuqi Zhang Dahua Lin

Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about relationships remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. ride) or each distinct visual phrase (e.g. person-ride-horse) as a category. Such approaches are faced with significant difficulties caused by the high diversity of appearance for each kind of relationship or the large number...

10.1109/cvpr.2017.352 article EN 2017-07-01

Towards Diverse and Natural Image Descriptions via a Conditional GAN

Bo Dai Sanja Fidler Raquel Urtasun Dahua Lin

Despite the substantial progress in recent years, image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions, while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also...

10.1109/iccv.2017.323 article EN 2017-10-01

Temporal Pyramid Network for Action Recognition

Ceyuan Yang Yinghao Xu Jianping Shi Bo Dai Bolei Zhou

Visual tempo characterizes the dynamics and the temporal scale of an action. Modeling such visual tempos of different actions facilitates their recognition. Previous works often capture the visual tempo through sampling raw videos at multiple rates and constructing an input-level frame pyramid, which usually requires a costly multi-branch network to handle. In this work we propose a generic Temporal Pyramid Network (TPN) at the feature level, which can be flexibly integrated into 2D or 3D backbone networks in a plug-and-play manner. Two...

10.1109/cvpr42600.2020.00067 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding

Dian Shao Yue Zhao Bo Dai Dahua Lin

On public benchmarks, current action recognition techniques have achieved great success. However, when used in real-world applications, e.g. sport analysis, which requires the capability of parsing an activity into phases and differentiating between subtly different actions, their performances remain far from being satisfactory. To take action recognition to a new level, we develop FineGym, a dataset built on top of gymnasium videos. Compared to existing datasets, FineGym is distinguished in richness, quality, and diversity....

10.1109/cvpr42600.2020.00269 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation

Xingang Pan Xiaohang Zhan Bo Dai Dahua Lin Chen Change Loy and 1 more

Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics,...

10.1109/tpami.2021.3115428 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-09-24

Generative Diffusion Prior for Unified Image Restoration and Enhancement

Ben Fei Zhaoyang Lyu Liang Pan Junzhe Zhang Weidong Yang and 3 more

Existing image restoration methods mostly leverage the posterior distribution of natural images. However, they often assume known degradation and also require supervised training, which restricts their adaptation to complex real applications. In this work, we propose the Generative Diffusion Prior (GDP) to effectively model the posterior distributions in an unsupervised sampling manner. GDP utilizes a pre-trained denoising diffusion generative model (DDPM) for solving linear inverse, non-linear, or blind problems....

10.1109/cvpr52729.2023.00958 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation

Xian Liu Qianyi Wu Hang Zhou Yinghao Xu Rui Qian and 5 more

Generating speech-consistent body and gesture movements is a long-standing problem in virtual avatar creation. Previous studies often synthesize pose movement in a holistic manner, where poses of all joints are generated simultaneously. Such a straightforward pipeline fails to generate fine-grained co-speech gestures. One observation is that the hierarchical semantics in speech and the hierarchical structures of human gestures can be naturally described into multiple granularities and associated together. To fully utilize the rich...

10.1109/cvpr52688.2022.01021 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

Tao Lü Mulin Yu Linning Xu Yuanbo Xiangli Limin Wang and 2 more

10.1109/cvpr52733.2024.01952 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Self-Supervised Scene De-Occlusion

Xiaohang Zhan Xingang Pan Bo Dai Ziwei Liu Dahua Lin and 1 more

Natural scene understanding is a challenging task, particularly when encountering images of multiple objects that are partially occluded. This obstacle arises from varying object ordering and positioning. Existing scene understanding paradigms are able to parse only the visible parts, resulting in incomplete and unstructured scene interpretation. In this paper, we investigate the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects. We make the first attempt to address the problem through a novel...

10.1109/cvpr42600.2020.00384 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Recursive Visual Sound Separation Using Minus-Plus Net

Xudong Xu Bo Dai Dahua Lin

Sounds provide rich semantics, complementary to visual data, for many tasks. However, in practice, sounds from multiple sources are often mixed together. In this paper we propose a novel framework, referred to as MinusPlus Network (MP-Net), for the task of visual sound separation. MP-Net separates sounds recursively in the order of average energy, removing the separated sound from the mixture at the end of each prediction, until the mixture becomes empty or contains only noise. In this way, MP-Net could be applied to sound mixtures with arbitrary numbers and types of sounds. Moreover,...

10.1109/iccv.2019.00097 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Contrastive Learning for Image Captioning

Bo Dai Dahua Lin

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learning (CL), for image captioning. Specifically, via two constraints formulated on top of a reference model, the proposed method can...

10.48550/arxiv.1710.02534 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Unsupervised 3D Shape Completion through GAN Inversion

Junzhe Zhang Xinyi Chen Zhongang Cai Liang Pan Haiyu Zhao and 4 more

Most 3D shape completion approaches rely heavily on partial-complete shape pairs and learn in a fully supervised manner. Despite their impressive performances on in-domain data, when generalizing to partial shapes in other forms or real-world partial scans, they often obtain unsatisfactory results due to domain gaps. In contrast to previous fully supervised approaches, in this paper we present ShapeInversion, which introduces Generative Adversarial Network (GAN) inversion to shape completion for the first time. ShapeInversion uses a GAN...

10.1109/cvpr46437.2021.00181 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing

Yanbo Xu Yueqin Yin Liming Jiang Qianyi Wu Chengyao Zheng and 3 more

Recent advances like StyleGAN have promoted the growth of controllable facial editing. To address its core challenge of attribute decoupling in a single latent space, attempts have been made to adopt dual-space GAN for better disentanglement of style and content representations. Nonetheless, these methods are still incompetent to obtain plausible editing results with high controllability, especially for complicated attributes. In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. We propose TransEditor,...

10.1109/cvpr52688.2022.00753 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

Xinqi Lin Jingwen He Ziyan Chen Zhaoyang Lyu Ben Fei and 4 more

We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework. DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. Each stage is developed independently but they work seamlessly in a cascaded manner. In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results. For the second stage, we propose IRControlNet...

10.48550/arxiv.2308.15070 preprint EN public-domain arXiv (Cornell University) 2023-01-01

Prototype-Based Embedding Network for Scene Graph Generation

Chaofan Zheng Xinyu Lyu Lianli Gao Bo Dai Jingkuan Song

Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs. However, due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category, e.g., "man-eating-pizza, giraffe-eating-leaf", and a severe inter-class similarity between different classes, e.g., "man-holding-plate, man-eating-pizza", in the model's latent space. The above challenges prevent current...

10.1109/cvpr52729.2023.02182 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

Yaohui Wang Xinyuan Chen Xin Ma Shangchen Zhou Ziqi Huang and 15 more

This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of visually realistic and temporally coherent videos while b) preserving the strong creative generation nature of the pre-trained T2I model. To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal...

10.48550/arxiv.2309.15103 preprint EN other-oa arXiv (Cornell University) 2023-01-01

LaVie: High-Quality Video Generation with Cascaded Latent Diffusion Models

Yaohui Wang Xinyuan Chen Xin Ma Shangchen Zhou Ziqi Huang and 15 more

10.1007/s11263-024-02295-1 article EN International Journal of Computer Vision 2024-12-23

Towards Diverse and Natural Image Descriptions via a Conditional GAN

Bo Dai Sanja Fidler Raquel Urtasun Dahua Lin

Despite the substantial progress in recent years, image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor...

10.48550/arxiv.1703.06029 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Detecting Visual Relationships with Deep Relational Networks

Bo Dai Yuqi Zhang Dahua Lin

Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about relationships remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. "ride") or each distinct visual phrase (e.g. "person-ride-horse") as a category. Such approaches are faced with significant difficulties caused by the high diversity of appearance for each kind of relationship or the large number...

10.48550/arxiv.1704.03114 preprint EN other-oa arXiv (Cornell University) 2017-01-01
