- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Speech and Audio Processing
- Image and Signal Denoising Methods
- Human Pose and Action Recognition
- Speech Recognition and Synthesis
- Music and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Anomaly Detection Techniques and Applications
- Image Processing Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Granular Flow and Fluidized Beds
- Microbial Natural Products and Biosynthesis
- Advanced Vision and Imaging
- Hand Gesture Recognition Systems
- Sparse and Compressive Sensing Techniques
- Thermochemical Biomass Conversion Processes
- Gait Recognition and Analysis
- Plant Biochemistry and Biosynthesis
- Machine Learning and ELM
- Adversarial Robustness in Machine Learning
- Face and Expression Recognition
- Computational Drug Discovery Methods
Zhejiang University
2021-2025
East China Normal University
2025
Google (United States)
2020-2024
DeepMind (United Kingdom)
2024
State Key Laboratory of Clean Energy Utilization
2021-2024
Shandong Institute of Business and Technology
2024
Chinese University of Hong Kong, Shenzhen
2021-2024
Nanjing University of Posts and Telecommunications
2024
North University of China
2024
Children's Hospital of Zhejiang University
2023-2024
Recent deep learning based approaches have shown promising results for the challenging task of inpainting large missing regions in an image. These methods can generate visually plausible image structures and textures, but often create distorted structures or blurry textures inconsistent with surrounding areas. This is mainly due to the ineffectiveness of convolutional neural networks in explicitly borrowing or copying information from distant spatial locations. On the other hand, traditional texture and patch synthesis approaches are...
Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of both worlds by studying how to combine convolution neural networks and transformers to model both local and global dependencies of an audio sequence in a parameter-efficient way. To this regard, we propose...
We present a generative image inpainting system to complete images with free-form mask and guidance. The system is based on gated convolutions learned from millions of images without additional labelling efforts. The proposed gated convolution solves the issue of vanilla convolution that treats all input pixels as valid ones, and generalizes partial convolution by providing a learnable dynamic feature selection mechanism for each channel at each spatial location across all layers. Moreover, as free-form masks may appear anywhere in images with any shape, global and local GANs designed...
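The gating idea described above can be sketched in a few lines: each output position is a feature response modulated by a learned soft gate in [0, 1], so invalid (masked) regions can be suppressed instead of being treated as valid pixels. This is a hypothetical toy 1D version in plain Python, not the paper's 2D implementation; the weights `w_feat` and `w_gate` are made-up parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_conv1d(xs, w_feat, w_gate):
    """Toy gated convolution on a 1D signal.

    Each output blends a feature response with a learned soft gate,
    instead of treating every input position as valid."""
    k = len(w_feat)
    out = []
    for i in range(len(xs) - k + 1):
        window = xs[i:i + k]
        feat = math.tanh(sum(w * x for w, x in zip(w_feat, window)))
        gate = sigmoid(sum(w * x for w, x in zip(w_gate, window)))
        out.append(feat * gate)  # gate near 0 suppresses masked regions
    return out

signal = [0.0, 0.0, 1.0, 1.0, 0.0]  # zeros could mark a masked hole
y = gated_conv1d(signal, w_feat=[0.5, 0.5], w_gate=[1.0, 1.0])
print(len(y))  # one output per valid window -> 4
```

The per-channel, per-location gate is what distinguishes this from partial convolution, where the mask update is a hard, unlearned rule.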
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low resolution image) with focus on the proposed solutions and results. A new DIVerse 2K resolution dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) but learnable through pairs of low and high res train images. Each competition had ∼100 registered participants and 20 teams...
In present object detection systems, deep convolutional neural networks (CNNs) are utilized to predict bounding boxes of object candidates, and have gained performance advantages over traditional region proposal methods. However, existing CNN methods assume the bounding box to be four independent variables, which could be regressed by the $\ell_2$ loss separately. Such an oversimplified assumption is contrary to the well-received observation that those variables are correlated, resulting in less accurate localization. To...
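Because the four box coordinates are correlated, localization quality is naturally measured jointly, e.g. by intersection-over-union rather than by per-coordinate $\ell_2$ error. A minimal IoU helper (an illustrative function, not this paper's exact loss formulation):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).

    Treats the four coordinates jointly, unlike an independent l2 loss
    on each coordinate."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7 -> ~0.1429
```

A loss such as $1 - \mathrm{IoU}$ built on this quantity penalizes all four coordinates together, which is the observation the abstract motivates.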
Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects, due to lack of information about the actual extent of foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as distracting object removal. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference...
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera acquisition pipeline. The operators were learnable through provided pairs of low and high resolution train images. The tracks had 145, 114, 101 and 113 registered participants, resp., and 31 teams competed in the final testing...
Slimmable networks are a family of neural networks that can instantly adjust the runtime width. The width can be chosen from a predefined widths set to adaptively optimize accuracy-efficiency trade-offs at runtime. In this work, we propose a systematic approach to train universally slimmable networks (US-Nets), extending slimmable networks to execute at arbitrary width, and generalizing to networks both with and without batch normalization layers. We further propose two improved training techniques for US-Nets, named the sandwich rule and inplace distillation, to enhance the training process...
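The sandwich rule mentioned above can be sketched as a width-sampling schedule: each training step always trains the smallest and largest widths plus a few randomly sampled intermediate ones. A toy sketch in plain Python (the function name and defaults are assumptions, not the paper's API):

```python
import random

def sandwich_widths(min_w, max_w, n_random=2, rng=None):
    """Sample per-step width multipliers under the sandwich rule:
    always include the smallest and largest width, plus a few random
    intermediate ones (a sketch of the schedule, not full US-Nets)."""
    rng = rng or random.Random(0)
    widths = [min_w, max_w]
    widths += [rng.uniform(min_w, max_w) for _ in range(n_random)]
    return widths

ws = sandwich_widths(0.25, 1.0)
print(sorted(ws)[0], sorted(ws)[-1])  # bounds are always present
```

Training the two extreme widths every step bounds the performance of all intermediate widths, which is why the rule pairs naturally with inplace distillation from the widest model.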
In this report we demonstrate that with the same parameters and computational budgets, models with wider features before ReLU activation have significantly better performance for single image super-resolution (SISR). The resulting SR residual network has a slim identity mapping pathway with wider (\(2\times\) to \(4\times\)) channels before activation in each residual block. To further widen activation (\(6\times\) to \(9\times\)) without computational overhead, we introduce linear low-rank convolution into SR networks and achieve even better accuracy-efficiency tradeoffs. In addition,...
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks. However, the requirement for expensive annotations, including clean image captions and regional labels, limits the scalability of existing approaches and complicates the pretraining procedure with the introduction of multiple dataset-specific objectives. In this work, we relax these constraints and present a minimalist pretraining framework, named...
Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind RNN/transformer based models in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder that incorporates global context information into convolution layers by adding squeeze-and-excitation modules. In addition, we propose a simple scaling method that scales...
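A squeeze-and-excitation module of the kind described above can be sketched in plain Python: global-average-pool each channel (squeeze), pass the pooled vector through two small linear layers, and rescale each channel by a sigmoid gate (excite). The toy weights `w1`/`w2` below are made-up parameters, and real implementations use a bottleneck dimension; this is only a minimal sketch of the mechanism.

```python
import math

def squeeze_excite(channels, w1, w2):
    """Toy squeeze-and-excitation over per-channel feature sequences.

    Squeeze: global average pool each channel into one scalar.
    Excite: two small linear layers (ReLU then sigmoid) produce a
    per-channel gate; each channel is rescaled by its gate."""
    squeezed = [sum(ch) / len(ch) for ch in channels]  # global context
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    return [[g * v for v in ch] for g, ch in zip(gates, channels)]

feats = [[1.0, 3.0], [2.0, 2.0]]  # 2 channels, 2 time steps each
out = squeeze_excite(feats, w1=[[1, 0], [0, 1]], w2=[[0, 0], [0, 0]])
print(out)  # zero w2 -> every gate is sigmoid(0) = 0.5
```

Injecting this pooled global context into convolution layers is what lets a purely local encoder see sequence-level information.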
The generative adversarial network (GAN) framework has emerged as a powerful tool for various image and video synthesis tasks, allowing the synthesis of visual content in an unconditional or input-conditional manner. It has enabled the generation of high-resolution photorealistic images and videos, a task that was challenging or impossible with prior methods. It has also led to the creation of many new applications in content creation. In this article, we provide an overview of GANs with a special focus on algorithms and applications for visual synthesis. We cover several important...
We study how to set channel numbers in a neural network to achieve better accuracy under constrained resources (e.g., FLOPs, latency, memory footprint or model size). A simple and one-shot solution, named AutoSlim, is presented. Instead of training many network samples and searching with reinforcement learning, we train a single slimmable network to approximate the accuracy of different channel configurations. We then iteratively evaluate the trained slimmable network and greedily slim the layer with minimal accuracy drop. By this single pass, we can obtain optimized channel configurations under different resource...
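The greedy single-pass search described above can be sketched as a loop: while the configuration is over budget, try shrinking each layer by one step and keep the change that hurts accuracy least. Here `evaluate` is a hypothetical stand-in for querying the trained slimmable network at a given configuration, and layer widths are abstract integers rather than channel counts:

```python
def greedy_slim(widths, evaluate, budget):
    """Greedy one-pass slimming: repeatedly shrink whichever layer's
    width reduction costs the least accuracy, until a FLOP-like
    budget (here, the sum of widths) is met."""
    widths = list(widths)
    while sum(widths) > budget:
        best_i, best_acc = None, -1.0
        for i, w in enumerate(widths):
            if w <= 1:
                continue  # never remove a layer entirely
            trial = widths[:i] + [w - 1] + widths[i + 1:]
            acc = evaluate(trial)
            if acc > best_acc:
                best_i, best_acc = i, acc
        if best_i is None:
            break  # nothing left to shrink
        widths[best_i] -= 1
    return widths

# Toy accuracy model: layer 0 matters much more than layer 1,
# so the greedy pass should slim layer 1 first.
acc = lambda ws: 2.0 * ws[0] + 0.1 * ws[1]
print(greedy_slim([4, 4], acc, budget=5))  # -> [4, 1]
```

The point of the slimmable network is that `evaluate` is cheap: every trial configuration reuses the same shared weights instead of training a new model.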
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained on large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA)...
Self-similarity refers to the image prior widely used in image restoration algorithms that small but similar patterns tend to occur at different locations and scales. However, recent advanced deep convolutional neural network-based methods for image restoration do not take full advantage of self-similarities, relying on self-attention modules that only process information at the same scale. To solve this problem, we present a novel Pyramid Attention module for image restoration, which captures long-range feature correspondences...
We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization. At runtime, the network can adjust its width on the fly according to on-device benchmarks and resource constraints, rather than downloading and offloading different models. Our trained networks, named slimmable neural networks, achieve similar...
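The key trick, switchable batch normalization, keeps independent normalization statistics for each width setting, since feature statistics change when a layer runs with fewer channels. A toy sketch on 1D batches (the class and method names are assumptions, and the learnable scale/shift parameters are omitted):

```python
def batch_stats(values):
    """Mean and (population) variance of a batch of scalars."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

class SwitchableBN:
    """Independent normalization statistics per width setting: the
    same layer produces different feature statistics at different
    widths, so one shared set of BN stats would be wrong."""
    def __init__(self, widths):
        self.stats = {w: (0.0, 1.0) for w in widths}

    def update(self, width, batch):
        self.stats[width] = batch_stats(batch)

    def normalize(self, width, batch, eps=1e-5):
        mean, var = self.stats[width]
        return [(v - mean) / (var + eps) ** 0.5 for v in batch]

bn = SwitchableBN(widths=[0.5, 1.0])
bn.update(0.5, [1.0, 3.0])    # stats seen by the half-width network
bn.update(1.0, [10.0, 30.0])  # stats seen by the full-width network
print(bn.normalize(0.5, [1.0, 3.0]))
```

All convolution weights stay shared across widths; only these per-width statistics are duplicated, which is why the memory overhead is negligible.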
In this paper, balanced two-stage residual networks (BTSRN) are proposed for single image super-resolution. The deep residual design with constrained depth achieves the optimal balance between the accuracy and the speed of super-resolving images. The experiments show that the balanced two-stage structure, together with our lightweight two-layer PConv residual block design, achieves very promising results when considering both accuracy and speed. We evaluated our models on the New Trends in Image Restoration and Enhancement workshop challenge on image super-resolution (NTIRE SR 2017). Our final...
Pretraining language models with next-token prediction on massive text corpora has delivered phenomenal zero-shot, few-shot, transfer learning and multi-tasking capabilities on both generative and discriminative tasks. Motivated by this success, we explore a Vector-quantized Image Modeling (VIM) approach that involves pretraining a Transformer to predict rasterized image tokens autoregressively. The discrete image tokens are encoded from a learned Vision-Transformer-based VQGAN (ViT-VQGAN). We first propose...
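The core discretization step in vector-quantized image modeling maps each continuous feature vector to the index of its nearest codebook entry; those indices are the discrete tokens the Transformer then predicts autoregressively. A minimal sketch with a made-up codebook (the real ViT-VQGAN codebook is learned jointly with the encoder):

```python
def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook
    entry (squared Euclidean distance) -- the token lookup at the
    heart of vector-quantized image modeling."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))
            for v in vectors]

codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # 3 toy code vectors
tokens = quantize([[0.1, -0.1], [0.9, 1.2], [1.8, 0.2]], codebook)
print(tokens)  # discrete token ids -> [0, 1, 2]
```

Decoding reverses the lookup (`codebook[token]`), so an image becomes a short sequence of integers, which is what makes next-token pretraining from language modeling directly applicable.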
End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay its predictions towards the end and thus has much higher partial latency compared to a conventional ASR model. To address this issue, we look at encouraging the E2E model to emit words early, through an algorithm called FastEmit [3]. Naturally, improving on partial latency results in a degradation...