- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Probability and Risk Models
- Computer Graphics and Visualization Techniques
- Advanced Vision and Imaging
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Stochastic processes and statistical mechanics
- Neural Networks and Applications
- Stochastic processes and financial applications
- Domain Adaptation and Few-Shot Learning
- Video Surveillance and Tracking Methods
- Digital Media Forensic Detection
- Image and Object Detection Techniques
- Music and Audio Processing
- 3D Shape Modeling and Analysis
- Stochastic Gradient Optimization Techniques
- Anomaly Detection Techniques and Applications
- Financial Risk and Volatility Modeling
- Face and Expression Recognition
- Medical Image Segmentation Techniques
- Speech Recognition and Synthesis
- Image and Signal Denoising Methods
- Machine Learning and Data Classification
- Image Processing Techniques and Applications
Hunan University
2008-2024
Qujing Normal University
2024
Xinjiang Production and Construction Corps
2024
Tarim University
2024
Nanchang University
2023-2024
Google (United States)
2019-2023
Hefei University of Technology
2021
University of Illinois Urbana-Champaign
2021
University of Maryland, College Park
2019
Tsinghua University
2018
We propose PD-GAN, a probabilistic diverse GAN for image inpainting. Given an input image with arbitrary hole regions, PD-GAN produces multiple inpainting results with diverse and visually realistic content. Our PD-GAN is built upon a vanilla GAN, which generates images based on random noise. During image generation, we modulate deep features of the input random noise from coarse to fine by injecting an initially restored image and the hole regions at multiple scales. We argue that during hole filling, the pixels near the hole boundary should be more deterministic (i.e., with higher probability...
Vision-Language Pretraining (VLP) has demonstrated remarkable capabilities in learning visual representations from textual descriptions of images without annotations. Yet, effective VLP demands large-scale image-text pairs, a resource that is scarce in the medical domain. Moreover, conventional VLP is limited to 2D images, while medical images encompass diverse modalities, often in 3D, making the learning process more challenging. To address these challenges, we present Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image...
Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition, since the transducer has no clearly separated acoustic model (AM), language model (LM), or blank model. In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder. The encoder and label decoder outputs are directly projected to AM and internal LM scores and then added to compute label posteriors. We train MHAT with an internal LM loss and a HAT loss to ensure that its internal LM becomes...
User-intended visual content fills the hole regions of an input image in the image editing scenario. The coarse low-level inputs, which typically consist of sparse sketch lines and color dots, convey user intentions for content creation (i.e., free-form editing). While existing methods combine an input image and these low-level controls as CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content. In this paper, we propose DeFLOCNet, which relies on a deep encoder-decoder CNN to retain...
In this paper, we explore reducing the computational latency of the 2-pass cascaded encoder model [1]. Specifically, we experiment with reducing the size of the causal 1st-pass and adding capacity to the non-causal 2nd-pass, such that the overall latency can be reduced without loss of quality. In addition, we explore using a confidence model for deciding to stop 2nd-pass recognition if we are confident in the 1st-pass hypothesis. Overall, we are able to reduce latency by a factor of 1.7X, compared to the baseline cascaded encoder from [1]. Secondly, with the added capacity in the non-causal 2nd-pass, we find that we can improve WER by up to 7% relative with wav2vec and minimum word-error-rate (MWER) training.
Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture-agnostic method for initializing neural networks. GradInit is...
Designing an accurate and efficient model for animal recognition is a challenging task. It requires considering many aspects, including the accuracy of the model, the number of parameters, the complexity of calculation, and so on. Therefore, we propose a novel convolutional network, called Bilateral Convolutional Network (BCNet), which aims at achieving a trade-off between accuracy and model size, so that it is better suited to mobile devices. BCNet consists of two components, namely a feature extraction module and a classification module. The...
Information flow type systems, such as EnerJ (a type system for energy efficiency) and type systems for integrity and confidentiality, are unsound if subtyping for references is allowed, because of the presence of mutable references. The standard approach is to disallow subtyping for references, or in other words, to replace subtyping constraints with equality constraints. Unfortunately, this often leads to imprecision, causing the type system to reject valid programs.
Active learning theories and methods have been extensively studied in classical statistical learning settings. However, deep active learning, i.e., active learning with deep learning models, is usually based on empirical criteria without solid theoretical justification, and thus suffers from heavy doubts when some of those criteria fail to provide benefits in real applications. In this paper, by exploring the connection between generalization performance and training dynamics, we propose a theory-driven deep active learning method (dynamicAL) which selects samples...
Key-cap flatness detection after assembly is one of the basic quality control (QC) indexes in computer keyboard manufacturing. A modified machine vision system based on linear structured light imaging for measuring key-cap flatness is proposed for QC automation. After a brief introduction of the system design and principle, the pipeline of stripe image processing, especially the removal of printed letter interference, is studied. First, staggered reprojection of dense multiline fringes is presented using the pattern editability of digital processing...
Image reconstruction is the transformation process from a reduced-order representation to the original image pixel form. In materials characterization, it can be utilized as a method to retrieve material composition information. In our previous work, the surfacelet transform was developed to efficiently represent boundary information in images with surfacelet coefficients. In this paper, new constrained-conjugate-gradient based methods are proposed for the inverse surfacelet transform. With geometric constraints on boundaries and internal...
In multiscale materials modeling, it is desirable that different levels of details can be specified in different regions of interest without the separation of scales, so that the geometric and physical properties of materials can be designed and characterized. Existing materials modeling approaches focus more on the representation of the distributions of material compositions captured from images. In this paper, a multiscale materials modeling method is proposed to support interactive specification and visualization of material microstructures at multiple levels of details, where the designer's intent is captured. This method provides...
Upsampling layers are adopted in almost all existing encoder-decoder based generative adversarial networks (GANs), which have shown promising results in the image inpainting field. However, upsampling layers (e.g., deconvolution and bilinear interpolation) suffer from two limitations: (1) they obtain little semantic information about the global structure; (2) they can hardly capture local content details. To eliminate the above issues, we propose a deep Fusion of local-content and global-semantic (DFLG) model that is both...
Novelty detection is the challenging task of identifying whether a new sample belongs to a known class. Note that the boundary between normal and novel samples is not clear enough in existing works, resulting from adequately reconstructing novel samples or crudely reconstructing normal samples. To tackle the above issues, we propose a general framework named Adaptive Adversarial Latent Space (AALS), which mainly consists of two components, an Adversarial Latent Space Generator (ALSG) and a Constrained-based Adversarial AutoEncoder (CAAE). ALSG is established to obtain the real latent space...
Natural products (NPs) afforded by living beings, especially microscopic species, represent invaluable and indispensable reservoirs for drug leads in clinical practice. With the rapid advancement of sequencing technology and bioinformatics, an ever-increasing number of microbial biosynthetic gene clusters (BGCs) have been decrypted, while a great many BGCs remain cryptic or inactive under standard laboratory culture conditions. Addressing this dilemma requires innovative tactics to awaken quiescence...