Yikai Wang

ORCID: 0000-0003-1341-6235
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Computer Graphics and Visualization Techniques
  • Robotics and Sensor-Based Localization
  • Adversarial Robustness in Machine Learning
  • 3D Shape Modeling and Analysis
  • Neural dynamics and brain function
  • Blind Source Separation Techniques
  • Brain Tumor Detection and Classification
  • Visual Attention and Saliency Detection
  • Advanced Image Processing Techniques
  • Generative Adversarial Networks and Image Synthesis
  • 3D Surveying and Cultural Heritage
  • Multimodal Machine Learning Applications
  • Video Surveillance and Tracking Methods
  • COVID-19 diagnosis using AI
  • Functional Brain Connectivity Studies
  • Image Processing and 3D Reconstruction
  • Sentiment Analysis and Opinion Mining
  • Machine Learning and Data Classification
  • Tactile and Sensory Interactions
  • Adaptive Dynamic Programming Control
  • Music and Audio Processing
  • Direction-of-Arrival Estimation Techniques

Tsinghua University
2004-2025

Emory University
2018-2024

Soochow University
2024

China Mobile (China)
2023

Shandong University of Science and Technology
2023

Zhejiang University of Science and Technology
2022

PRG S&Tech (South Korea)
2021

Air Force Medical University
2019

Institute of Seismology
2018

University of Electronic Science and Technology of China
2014

Many adaptations of transformers have emerged to address single-modal vision tasks, where self-attention modules are stacked to handle input sources like images. Intuitively, feeding multiple modalities of data to vision transformers could improve performance, yet the inner-modal attentive weights may also be diluted, which could thus greatly undermine the final performance. In this paper, we propose a multimodal token fusion method (TokenFusion), tailored for transformer-based vision tasks. To effectively fuse multiple modalities, TokenFusion...
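
A minimal sketch of the token-substitution idea that the abstract describes, under loose assumptions: tokens of one modality whose learned importance score falls below a threshold are replaced by projected tokens from the other modality. The scoring heads, projections, and threshold here are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class TokenFusionSketch(nn.Module):
    """Illustrative two-modality token fusion: substitute uninformative tokens."""
    def __init__(self, dim: int, threshold: float = 0.02):
        super().__init__()
        self.score_a = nn.Linear(dim, 1)    # per-token importance for modality A
        self.score_b = nn.Linear(dim, 1)    # per-token importance for modality B
        self.proj_ab = nn.Linear(dim, dim)  # project A-tokens into B's space
        self.proj_ba = nn.Linear(dim, dim)  # project B-tokens into A's space
        self.threshold = threshold

    def forward(self, tok_a: torch.Tensor, tok_b: torch.Tensor):
        # tok_a, tok_b: (batch, num_tokens, dim), assumed aligned token-wise
        s_a = torch.sigmoid(self.score_a(tok_a))  # (B, N, 1), scores in [0, 1]
        s_b = torch.sigmoid(self.score_b(tok_b))
        # Replace low-scoring tokens with the projection of the other modality.
        fused_a = torch.where(s_a < self.threshold, self.proj_ba(tok_b), tok_a)
        fused_b = torch.where(s_b < self.threshold, self.proj_ab(tok_a), tok_b)
        return fused_a, fused_b

a, b = torch.randn(2, 16, 64), torch.randn(2, 16, 64)
fa, fb = TokenFusionSketch(64)(a, b)
print(fa.shape, fb.shape)  # torch.Size([2, 16, 64]) twice
```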

10.1109/cvpr52688.2022.01187 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

10.1109/cvpr52733.2024.02022 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Deep multimodal fusion by using multiple sources of data for classification or regression has exhibited a clear advantage over the unimodal counterpart on various applications. Yet, current methods, including aggregation-based and alignment-based fusion, are still inadequate in balancing the trade-off between inter-modal fusion and intra-modal processing, incurring a bottleneck of performance improvement. To this end, this paper proposes Channel-Exchanging-Network (CEN), a parameter-free framework that dynamically exchanges...
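
A minimal sketch of the channel-exchanging idea, assuming the exchange criterion is the magnitude of the BatchNorm scaling factor gamma (channels whose |gamma| is near zero carry little information and are swapped for the other modality's channels); the threshold value is an assumption for illustration.

```python
import torch
import torch.nn as nn

def channel_exchange(x_a, x_b, bn_a: nn.BatchNorm2d, bn_b: nn.BatchNorm2d,
                     threshold: float = 1e-2):
    """Sketch of BN-guided channel exchanging between two modalities.

    Channels whose BN scaling factor |gamma| is close to zero are deemed
    uninformative and replaced by the corresponding channels of the other
    modality. Feature shapes: (batch, channels, H, W).
    """
    mask_a = (bn_a.weight.abs() < threshold).view(1, -1, 1, 1)  # weak channels in A
    mask_b = (bn_b.weight.abs() < threshold).view(1, -1, 1, 1)  # weak channels in B
    out_a = torch.where(mask_a, x_b, x_a)
    out_b = torch.where(mask_b, x_a, x_b)
    return out_a, out_b

bn_a, bn_b = nn.BatchNorm2d(8), nn.BatchNorm2d(8)
x_a, x_b = torch.randn(2, 8, 4, 4), torch.randn(2, 8, 4, 4)
y_a, y_b = channel_exchange(bn_a(x_a), bn_b(x_b), bn_a, bn_b)
```

In training, a sparsity penalty on the gamma parameters would drive some of them toward zero, making the exchange meaningful rather than random.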

10.48550/arxiv.2011.05005 preprint EN other-oa arXiv (Cornell University) 2020-01-01

3D object detection is an important task in autonomous driving to perceive the surroundings. Despite the excellent performance, existing detectors lack robustness to real-world corruptions caused by adverse weathers, sensor noises, etc., provoking concerns about the safety and reliability of autonomous driving systems. To comprehensively and rigorously benchmark the corruption robustness of 3D detectors, in this paper we design 27 types of common corruptions for both LiDAR and camera inputs considering real-world scenarios. By synthesizing these corruptions on public datasets,...
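
As a flavor of what one such synthesized corruption might look like, below is a hedged sketch of a Gaussian-noise corruption for LiDAR points with discrete severity levels; the noise scales are placeholder assumptions, not the benchmark's actual parameters.

```python
import numpy as np

def corrupt_gaussian_noise(points: np.ndarray, severity: int = 3) -> np.ndarray:
    """Illustrative LiDAR corruption: jitter xyz coordinates with Gaussian
    noise whose scale grows with the severity level (1-5). `points` has
    shape (N, 3+) with xyz in the first three columns."""
    sigma = [0.02, 0.04, 0.06, 0.08, 0.10][severity - 1]  # assumed scales
    noisy = points.copy()
    noisy[:, :3] += np.random.normal(0.0, sigma, size=(points.shape[0], 3))
    return noisy

cloud = np.random.rand(1024, 4).astype(np.float32)  # x, y, z, intensity
print(corrupt_gaussian_noise(cloud, severity=5).shape)  # (1024, 4)
```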

10.1109/cvpr52729.2023.00105 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

This paper presents an algorithm for classifying single-trial electroencephalogram (EEG) during the preparation of self-paced tapping. It combines common spatial subspace decomposition with Fisher discriminant analysis to extract features from multichannel EEG. Three features are obtained based on the Bereitschaftspotential and event-related desynchronization. Finally, a perceptron neural network is trained as the classifier. The algorithm was applied to the data set <self-paced 1s> of "BCI Competition 2003", and the classification accuracy...
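
For intuition, here is a compact sketch of the two classical ingredients named in the abstract, common-spatial-pattern-style filtering and a Fisher discriminant direction, written in a textbook form; channel counts, trial counts, and the log-variance feature choice are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_a: np.ndarray, cov_b: np.ndarray, n_pairs: int = 2):
    """Spatial filters from two class-average covariance matrices via the
    generalized eigenproblem cov_a w = lambda (cov_a + cov_b) w; keeps the
    filters with the largest and smallest eigenvalues."""
    vals, vecs = eigh(cov_a, cov_a + cov_b)
    order = np.argsort(vals)
    picks = np.concatenate([order[:n_pairs], order[-n_pairs:]])
    return vecs[:, picks].T  # (2 * n_pairs, channels)

def fisher_direction(feat_a: np.ndarray, feat_b: np.ndarray):
    """Fisher discriminant direction w = Sw^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = feat_a.mean(0), feat_b.mean(0)
    sw = np.cov(feat_a.T) + np.cov(feat_b.T)  # within-class scatter
    return np.linalg.solve(sw, mu_a - mu_b)

rng = np.random.default_rng(0)
trials_a = rng.standard_normal((30, 28, 500))  # trials x channels x samples
trials_b = rng.standard_normal((30, 28, 500))
cov = lambda t: np.mean([x @ x.T / np.trace(x @ x.T) for x in t], axis=0)
w_csp = csp_filters(cov(trials_a), cov(trials_b))
# Log-variance features of spatially filtered trials, then an LDA direction.
feats = lambda t: np.log(np.var(np.einsum('fc,ncs->nfs', w_csp, t), axis=2))
w_lda = fisher_direction(feats(trials_a), feats(trials_b))
```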

10.1109/tbme.2004.826697 article EN IEEE Transactions on Biomedical Engineering 2004-05-25

3D object detection is a crucial research topic in computer vision, which usually uses 3D point clouds as input in conventional setups. Recently, there is a trend of leveraging multiple sources of input data, such as complementing the 3D point cloud with 2D images that often have richer color and fewer noises. However, the heterogeneous geometrics of the 2D and 3D representations prevent us from applying off-the-shelf neural networks to achieve multimodal fusion. To this end, we propose Bridged Transformer (BrT), an end-to-end architecture...
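
A loose sketch of the basic ingredient such an architecture needs, joint attention over heterogeneous tokens: point features and image patch features are projected to a common width, concatenated, and processed by a standard transformer layer. The projections, dimensions, and single encoder layer are stand-ins for illustration, not BrT's actual design.

```python
import torch
import torch.nn as nn

dim = 64
proj_pts = nn.Linear(3, dim)     # raw xyz -> token (illustrative)
proj_img = nn.Linear(768, dim)   # e.g., ViT patch features -> token
layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

pts = torch.randn(2, 100, 3)       # (batch, points, xyz)
patches = torch.randn(2, 196, 768)  # (batch, patches, feature_dim)
# Concatenate both token sets so self-attention spans 2D and 3D inputs.
tokens = torch.cat([proj_pts(pts), proj_img(patches)], dim=1)  # (2, 296, 64)
fused = layer(tokens)
print(fused.shape)  # torch.Size([2, 296, 64])
```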

10.1109/cvpr52688.2022.01180 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Face recognition is a prevailing authentication solution in numerous biometric applications. Physical adversarial attacks, as an important surrogate, can identify the weaknesses of face recognition systems and evaluate their robustness before deployment. However, most existing physical attacks are either easily detectable or ineffective against commercial systems. The goal of this work is to develop a more reliable technique that can carry out an end-to-end evaluation of adversarial robustness for commercial systems. It requires simultaneously deceiving...

10.1109/cvpr52729.2023.00401 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Large transformers have demonstrated remarkable success, making it necessary to compress these models to reduce inference costs while preserving their performance. Current compression algorithms prune transformers at fixed ratios, requiring a unique pruning process for each ratio, which results in high computational costs. In contrast, we propose pruning of pretrained transformers at any desired ratio within a single pruning stage, based on a differential inclusion for a mask parameter. This dynamic can generate the whole regularization solution...

10.48550/arxiv.2501.03289 preprint EN arXiv (Cornell University) 2025-01-06

The advancement of 4D (i.e., sequential 3D) generation opens up new possibilities for lifelike experiences in various applications, where users can explore dynamic objects or characters from any viewpoint. Meanwhile, video generative models are receiving particular attention given their ability to produce realistic and imaginative frames. These models are also observed to exhibit strong 3D consistency, indicating their potential to act as world simulators. In this work, we present Video4DGen, a novel framework...

10.1109/tpami.2025.3550031 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-01-01

Unsupervised non-rigid point cloud shape correspondence underpins a multitude of 3D vision tasks, yet is itself non-trivial given the exponential complexity stemming from inter-point degrees of freedom, i.e., pose transformations. Based on the assumption of local rigidity, one solution for reducing the complexity is to decompose the overall shape into independent local regions using Local Reference Frames (LRFs) that are equivariant to SE(3) transformations. However, focusing solely on local structure neglects global geometric contexts, resulting in less...
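
To ground the LRF notion, below is a hedged sketch of one standard way to build per-point local reference frames, via PCA of each point's k-nearest-neighbor covariance; the paper's exact construction may differ, and k is an assumption.

```python
import torch

def local_reference_frames(points: torch.Tensor, k: int = 16):
    """Per-point local reference frame from the eigenvectors of the
    k-NN covariance matrix (a common PCA-style LRF construction).
    points: (N, 3) -> frames: (N, 3, 3), eigenvectors as columns."""
    dists = torch.cdist(points, points)            # (N, N) pairwise distances
    knn = dists.topk(k, largest=False).indices     # (N, k) neighbor indices
    neigh = points[knn]                            # (N, k, 3)
    centered = neigh - neigh.mean(dim=1, keepdim=True)
    cov = centered.transpose(1, 2) @ centered / k  # (N, 3, 3)
    _, frames = torch.linalg.eigh(cov)             # orthonormal axes per point
    return frames

pts = torch.randn(256, 3)
print(local_reference_frames(pts).shape)  # torch.Size([256, 3, 3])
```

Because the covariance is built from relative offsets, rotating and translating the cloud rotates the resulting axes accordingly, which is the equivariance property the abstract refers to.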

10.1109/tip.2025.3550006 article EN IEEE Transactions on Image Processing 2025-01-01

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal consistency remains a challenging task. Inspired by recent findings that pretrained diffusion features capture rich correspondences, we propose FB-4D, a novel framework that integrates a Feature Bank mechanism to enhance both spatial and temporal consistency in generated frames. In FB-4D, we store features extracted from previous frames and fuse...
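
A toy sketch of a feature-bank mechanism in the spirit of the abstract: keep features from the last few frames and fuse them with the current frame's features. The fixed-size deque and simple averaging fusion rule are simplifying assumptions, not FB-4D's actual method.

```python
import torch
from collections import deque

class FeatureBank:
    """Store features of the last `capacity` frames and fuse them with the
    current frame by averaging (illustrative fusion rule)."""
    def __init__(self, capacity: int = 4):
        self.bank = deque(maxlen=capacity)

    def fuse(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (tokens, dim); average current features with stored ones.
        stacked = torch.stack(list(self.bank) + [feat])  # (frames, tokens, dim)
        fused = stacked.mean(dim=0)
        self.bank.append(feat.detach())  # keep for later frames
        return fused

bank = FeatureBank()
for _ in range(6):
    out = bank.fuse(torch.randn(77, 128))
print(out.shape)  # torch.Size([77, 128])
```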

10.48550/arxiv.2503.20784 preprint EN arXiv (Cornell University) 2025-03-26

We propose a compact and effective framework to fuse multimodal features at multiple layers in a single network. The framework consists of two innovative fusion schemes. Firstly, unlike existing methods that necessitate individual encoders for different modalities, we verify that multimodal features can be learnt within a shared single network by merely maintaining modality-specific batch normalization layers in the encoder, which also enables implicit fusion via joint feature representation learning. Secondly, we propose a bidirectional multi-layer fusion scheme, where...
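
The first scheme is easy to sketch: share all convolutional weights across modalities while keeping one BatchNorm per modality. The layer sizes below are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class SharedEncoderModalityBN(nn.Module):
    """Shared convolution weights with modality-specific BatchNorm layers,
    sketching the first fusion scheme described above."""
    def __init__(self, num_modalities: int = 2, channels: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, 3, padding=1)  # shared across modalities
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(channels) for _ in range(num_modalities)
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, modality: int) -> torch.Tensor:
        # Same conv for every modality; only BN statistics/affine params differ.
        return self.act(self.bns[modality](self.conv(x)))

enc = SharedEncoderModalityBN()
rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 3, 64, 64)
f_rgb, f_depth = enc(rgb, modality=0), enc(depth, modality=1)
```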

10.1145/3394171.3413621 article EN Proceedings of the 30th ACM International Conference on Multimedia 2020-10-12

Multimodal fusion and multitask learning are two vital topics in machine learning. Despite the fruitful progress, existing methods for both problems are still brittle to the same challenge: it remains dilemmatic to integrate the common information across modalities (resp. tasks) while preserving the specific patterns of each modality (resp. task). Besides, although they are actually closely related to each other, multimodal fusion and multitask learning have rarely been explored within the same methodological framework before. In this paper, we propose...

10.1109/tpami.2022.3211086 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-09-30

We propose a deep fine-grained multi-level fusion architecture for monocular 3D object detection, with an additionally designed anti-occlusion optimization process. Conventional monocular 3D detection methods usually leverage geometry constraints such as keypoints, shape relationships, and 3D-to-2D optimizations to offset the lack of accurate depth information. However, these methods still struggle to directly extract rich information from depth estimation. To solve this problem, we integrate features from pseudo-LiDAR and filter...
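
For context, the standard pseudo-LiDAR step that such pipelines build on converts an estimated depth map into a point cloud via the pinhole camera model; the sketch below shows that conversion, with placeholder intrinsics.

```python
import numpy as np

def depth_to_pseudo_lidar(depth: np.ndarray, fx: float, fy: float,
                          cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map into a pseudo-LiDAR point cloud using the
    pinhole model. depth: (H, W) in meters -> points: (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    return np.stack([x, y, z], axis=1)

depth = np.full((4, 4), 10.0)  # toy 4x4 depth map, all points at 10 m
pts = depth_to_pseudo_lidar(depth, fx=720.0, fy=720.0, cx=2.0, cy=2.0)
print(pts.shape)  # (16, 3)
```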

10.1109/tip.2022.3180210 article EN IEEE Transactions on Image Processing 2022-01-01

In the low-bit quantization field, training Binarized Neural Networks (BNNs) is the extreme solution to ease the deployment of deep models on resource-constrained devices, having the lowest storage cost and significantly cheaper bit-wise operations compared to 32-bit floating-point counterparts. In this paper, we introduce Sub-bit Neural Networks (SNNs), a new type of binary quantization design tailored to compress and accelerate BNNs. SNNs are inspired by an empirical observation showing that binary kernels learnt at convolutional layers of a BNN model...
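
As background, the BNN building block that SNN-style methods refine is sign binarization trained with a straight-through estimator (STE); a minimal sketch of that primitive follows. This shows the generic BNN mechanism, not the SNN-specific sub-bit kernel compression.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)  # weights in {-1, +1}

    @staticmethod
    def backward(ctx, grad_out):
        (w,) = ctx.saved_tensors
        # Pass gradients through only where |w| <= 1 (clipped STE).
        return grad_out * (w.abs() <= 1).to(grad_out.dtype)

w = torch.randn(3, 3, requires_grad=True)
b = BinarizeSTE.apply(w)
b.sum().backward()
print(b.unique(), w.grad.shape)  # values in {-1, 1}; grad same shape as w
```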

10.1109/iccv48922.2021.00531 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Improving the performance of click-through rate (CTR) prediction remains one of the core tasks in online advertising systems. With the rise of deep learning, CTR models with deep networks remarkably enhance model capacities. In deep CTR models, exploiting users' historical data is essential for learning users' behaviors and interests. As existing works neglect the importance of temporal signals when embedding users' clicking records, we propose a time-aware attention model which explicitly uses absolute temporal signals for expressing users' periodic behaviors and relative temporal signals for expressing the relation between...
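
A hedged sketch of what time-aware attention over a click history could look like: each clicked item's embedding is augmented with an absolute-time embedding (e.g., hour of day, capturing periodicity) and a bucketized relative time gap to the current request, before attention against the candidate item. All bucket sizes and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeAwareAttention(nn.Module):
    """Attention over clicked items with absolute- and relative-time embeddings."""
    def __init__(self, dim: int = 32, n_hours: int = 24, n_gap_buckets: int = 16):
        super().__init__()
        self.abs_emb = nn.Embedding(n_hours, dim)        # absolute time (periodic)
        self.rel_emb = nn.Embedding(n_gap_buckets, dim)  # bucketized time gaps
        self.query = nn.Linear(dim, dim)

    def forward(self, item_emb, hour_ids, gap_buckets, target_emb):
        # item_emb: (B, T, D); hour_ids, gap_buckets: (B, T); target_emb: (B, D)
        keys = item_emb + self.abs_emb(hour_ids) + self.rel_emb(gap_buckets)
        scores = (keys @ self.query(target_emb).unsqueeze(-1)).squeeze(-1)  # (B, T)
        attn = torch.softmax(scores, dim=-1)
        return (attn.unsqueeze(-1) * item_emb).sum(dim=1)  # (B, D) interest vector

m = TimeAwareAttention()
out = m(torch.randn(2, 5, 32), torch.randint(0, 24, (2, 5)),
        torch.randint(0, 16, (2, 5)), torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 32])
```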

10.1145/3357384.3357936 preprint EN 2019-11-03

Video generative models are receiving particular attention given their ability to generate realistic and imaginative frames. Besides, these models are also observed to exhibit strong 3D consistency, significantly enhancing their potential to act as world simulators. In this work, we present Vidu4D, a novel reconstruction model that excels in accurately reconstructing 4D (i.e., sequential 3D) representations from single generated videos, addressing challenges associated with non-rigidity and frame distortion. This...

10.48550/arxiv.2405.16822 preprint EN arXiv (Cornell University) 2024-05-27

Gesture recognition has received more and more attention as a new generation of visual input mode for human-computer interaction. In motion sensing games and other applications, gesture is used as the interface. However, because of its inherent features such as diversity, ambiguity, space-time differences, and a large computational burden, it is difficult to achieve real-time application in software, especially in an embedded system. Therefore, in this paper we propose a hardware-based system as well as an innovative algorithm...

10.1109/icosp.2014.7015043 article EN 2014-10-01

Tactile sensing plays an important role in robotic perception and manipulation tasks. To overcome the real-world limitations of data collection, simulating the tactile response in a virtual environment becomes a desirable direction of research. In this paper, we propose Elastic Interaction of Particles (EIP) for tactile simulation, which is capable of reflecting the elastic property of the tactile sensor as well as characterizing the fine-grained physical interaction during contact. Specifically, EIP models the tactile sensor as a group of coordinated particles, and the elastic property is applied...

10.1145/3474085.3475414 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic results from hundreds of input photos. Despite great success in dense-view scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong...

10.48550/arxiv.2408.16767 preprint EN arXiv (Cornell University) 2024-08-29

The audio-visual navigation task requires an agent to find a sound source in a realistic, unmapped 3D environment by utilizing egocentric audio-visual observations. Existing works assume a clean environment that solely contains the target sound, which, however, would not be suitable for most real-world applications due to unexpected noise or intentional interference. In this work, we design an acoustically complex environment in which, besides the target sound, there exists a sound attacker playing a zero-sum game with the agent. More specifically, the attacker can move and change...

10.48550/arxiv.2202.10910 preprint EN other-oa arXiv (Cornell University) 2022-01-01

This work focuses on the 3D reconstruction of non-rigid objects based on monocular RGB video sequences. Concretely, we aim at building high-fidelity models for generic object categories and casually captured scenes. To this end, we do not assume known root poses of the objects, and do not utilize category-specific templates or dense pose priors. The key idea of our method, Root Pose Decomposition (RPD), is to maintain a per-frame root pose transformation, meanwhile building a dense field with local transformations to rectify the root pose. The optimization...

10.1109/iccv51070.2023.01277 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01