- Generative Adversarial Networks and Image Synthesis
- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Image Enhancement Techniques
- Human Pose and Action Recognition
- 3D Surveying and Cultural Heritage
- Cancer-Related Molecular Mechanisms Research
- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Computer Graphics and Visualization Techniques
- 3D Shape Modeling and Analysis
- Multimodal Machine Learning Applications
- Digital Media Forensic Detection
- Advanced Image Processing Techniques
- Muscle Activation and Electromyography Studies
- Video Surveillance and Tracking Methods
- Image Processing Techniques and Applications
- Gaze Tracking and Assistive Technology
- Color Science and Applications
- Medical Image Segmentation Techniques
- Advanced MEMS and NEMS Technologies
- Iterative Learning Control Systems
- Cooperative Communication and Network Coding
- Machine Learning and ELM
Amazon (United States)
2024-2025
University of Kansas
2016-2020
Xi'an Jiaotong University
2013-2015
Shougang (China)
2014
HBIS (China)
2014
Hebei Agricultural University
2013
Shandong Iron and Steel Group (China)
2012
Southwest University of Science and Technology
2011
Regularized autoencoders learn latent codes whose structure is shaped by a regularization on their distribution, which gives them the ability to infer codes from observations and to generate new samples from codes. However, they are sometimes ambiguous, as they tend to produce reconstructions that are not necessarily faithful reproductions of the inputs. The main reason is that the learned code distribution is forced to match a prior while the true data distribution remains unknown. To improve reconstruction quality and the latent-space manifold structure, this paper presents...
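The trade-off this abstract describes — faithful reconstruction versus a code distribution forced toward a prior — can be made concrete with a minimal numpy sketch. The moment-matching penalty below is an illustrative stand-in for the KL term used by VAE-style regularized autoencoders; the function names and the weighting `lam` are assumptions, not the paper's formulation.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Mean squared error between inputs and their reconstructions."""
    return float(np.mean((x - x_hat) ** 2))

def prior_penalty(codes):
    """Moment-matching penalty pushing the empirical code distribution
    toward a standard normal prior (zero mean, unit variance).
    An illustrative stand-in for the KL regularizer of a VAE."""
    mu = codes.mean(axis=0)
    var = codes.var(axis=0)
    # Each term is >= 0 and vanishes exactly at mu = 0, var = 1.
    return float(np.sum(mu ** 2) + np.sum(var - 1.0 - np.log(var + 1e-8)))

def regularized_ae_loss(x, x_hat, codes, lam=0.1):
    """Total objective: faithfulness to inputs plus code regularization."""
    return reconstruction_loss(x, x_hat) + lam * prior_penalty(codes)
```

Raising `lam` favors matching the prior at the cost of reconstruction fidelity, which is exactly the ambiguity the abstract points out.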
The paper proposes a Dynamic ResBlock Generative Adversarial Network (DRB-GAN) for artistic style transfer. The style code is modeled as the shared parameters of Dynamic ResBlocks connecting both the style encoding network and the style transfer network. In the style encoding network, a class-aware attention mechanism is used to attend to the style feature representation when generating style codes. Multiple Dynamic ResBlocks are designed to integrate the style code with the extracted CNN semantic features, which are then fed into the spatial window Layer-Instance Normalization (SW-LIN) decoder, enabling high-quality synthetic images with...
Existing image captioning methods typically focus on understanding the relationships between objects or instances within a single image, without exploring the contextual correlation that exists among similar images. In this paper, we propose Dual Graph Convolutional Networks (Dual-GCN) with transformer and curriculum learning for image captioning. In particular, we not only use an object-level GCN to capture the spatial relations among objects within a single image, but also adopt an image-level GCN to exploit the feature information provided by similar images. With...
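The object-level aggregation described above can be sketched with a single graph-convolution step. This is a generic Kipf-and-Welling-style layer in numpy, not the paper's exact Dual-GCN architecture; the function name and the choice of ReLU are assumptions for illustration.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution step: add self-loops, symmetrically normalize
    the adjacency, aggregate neighbor features, project, and apply ReLU.
    H: (n_nodes, in_dim) features; A: (n, n) adjacency; W: (in_dim, out_dim)."""
    A_hat = A + np.eye(A.shape[0])            # self-loops keep each node's own feature
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)
```

In an object-level GCN the nodes would be detected objects and `A` would encode their spatial relations; in an image-level GCN the nodes would be similar images.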
Micro-expressions are spontaneous, rapid and subtle facial movements that can neither be forged nor suppressed. They are very important nonverbal communication clues, but they are transient and of low intensity, and are thus difficult to recognize. Recently, deep learning based methods have been developed for micro-expression (ME) recognition using feature extraction and fusion techniques; however, targeted and efficient designs tailored to ME characteristics still lack further study. To address these issues, we propose a novel...
Theoretical analysis in this paper indicates that the accuracy of a silicon piezoresistive pressure sensor is mainly affected by thermal drift, and that its output varies nonlinearly with temperature. Here, a smart temperature compensation system is proposed to reduce this effect on the sensor output. Firstly, an effective conditioning circuit for signal processing and data acquisition is designed, and the hardware implementation is fabricated. Then, a program is developed in LabVIEW which incorporates an extreme learning machine (ELM) as the calibration algorithm...
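An ELM is attractive for this kind of calibration because only the output layer is trained, by a closed-form least-squares solve. Below is a minimal numpy sketch of ELM-based temperature compensation on synthetic data; the drift model, input normalization, and hidden-layer size are all assumptions, not the paper's measured sensor characteristics.

```python
import numpy as np

def elm_fit(X, y, n_hidden=40, seed=0):
    """Train an extreme learning machine: hidden weights are random and
    fixed; only the output weights beta are solved by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                         # random nonlinear feature map
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # closed-form output layer
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

For calibration, the inputs would be the raw (temperature-drifted) sensor reading plus the measured temperature, and the target the true pressure; the trained ELM then maps drifted readings back to compensated values.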
While self-attention has been successfully applied in a variety of natural language processing and computer vision tasks, its application to Monte Carlo (MC) image denoising has not yet been well explored. This paper presents a self-attention based MC denoising deep learning network, built on the fact that self-attention is essentially non-local means filtering in an embedding space, which makes it inherently very suitable for the denoising task. In particular, we modify the standard self-attention mechanism into an auxiliary feature guided version that considers the by-products (e.g., feature buffers) of the rendering...
Illumination estimation from a single indoor image is a promising yet challenging task. Existing illumination estimation methods mainly regress lighting parameters or infer a panorama from a limited field-of-view image. Nevertheless, these methods fail to recover a panorama with both well-distributed illumination and detailed environment textures, leading to a lack of realism when rendering embedded 3D objects with complex materials. This paper presents a novel multi-stage illumination estimation framework named IllumiDiff. Specifically, in Stage I, we first estimate the lighting conditions...
In this paper, we propose integrating a priori knowledge into both the design and the training of convolutional neural networks (CNNs) to learn object representations that are invariant to affine transformations (i.e., translation, scale, and rotation). Accordingly, we introduce a novel multi-scale maxout CNN and train it end-to-end with a rotation-invariant regularizer. This regularizer aims to enforce that the weights in each 2D spatial filter approximate circular patterns. In this way, we manage to handle affine transformations using convolution, maxout, and circular-pattern filters....
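One plausible way to "enforce circular patterns" is to penalize how much filter weights vary within each ring of equal distance from the filter center: a circularly symmetric filter incurs zero penalty. The sketch below is an assumption about the regularizer's form, not the paper's exact definition.

```python
import numpy as np

def circular_pattern_penalty(filt):
    """Regularizer sketch for a k x k spatial filter: sum the variance of
    weights within each ring of (rounded) equal radius from the center.
    The penalty is zero iff every ring holds identical weights, i.e. the
    filter is circularly symmetric and hence rotation-insensitive."""
    k = filt.shape[0]
    c = (k - 1) / 2.0
    yy, xx = np.mgrid[0:k, 0:k]
    radii = np.round(np.hypot(yy - c, xx - c), 3)   # bucket cells by radius
    penalty = 0.0
    for r in np.unique(radii):
        penalty += float(filt[radii == r].var())
    return penalty
```

During training, such a penalty would be added to the task loss for every 2D spatial filter, pulling the learned weights toward rotation-invariant patterns.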
We propose a self-supervised approach to improve the training of Generative Adversarial Networks (GANs) by inducing the discriminator to examine the structural consistency of images. Although natural image samples provide ideal examples of both valid structure and texture, learning to reproduce them together remains an open challenge. In our approach, we augment the training set of real images with modified ones that have degraded structural consistency. These modified images are automatically created by randomly exchanging pairs of patches in an image's convolutional...
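The patch-exchange augmentation can be sketched directly: swapping patches preserves local texture statistics while breaking global structure, giving the discriminator "structurally degraded" negatives. The truncated abstract suggests the paper operates on convolutional representations; for a self-contained illustration the sketch below swaps pixel-space patches, and the patch size and swap count are assumed hyperparameters.

```python
import numpy as np

def degrade_structure(img, patch=8, n_swaps=4, seed=0):
    """Create a structurally degraded copy of a 2D image by exchanging
    randomly chosen pairs of distinct non-overlapping patches. The pixel
    histogram is preserved exactly; only global arrangement changes."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    H, W = img.shape[:2]
    rows, cols = H // patch, W // patch
    for _ in range(n_swaps):
        idx = rng.choice(rows * cols, size=2, replace=False)  # two distinct patches
        r1, c1 = divmod(int(idx[0]), cols)
        r2, c2 = divmod(int(idx[1]), cols)
        a = out[r1*patch:(r1+1)*patch, c1*patch:(c1+1)*patch].copy()
        b = out[r2*patch:(r2+1)*patch, c2*patch:(c2+1)*patch].copy()
        out[r1*patch:(r1+1)*patch, c1*patch:(c1+1)*patch] = b
        out[r2*patch:(r2+1)*patch, c2*patch:(c2+1)*patch] = a
    return out
```

These degraded copies would be labeled as structure-inconsistent in an auxiliary self-supervised task for the discriminator.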
Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning an inverse mapping introduces extra trainable parameters and is infeasible for some domains. As a result, these approaches are ineffective in scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; and (iii) semantic consistency...
In this paper, we propose a novel framework named DRL-CPG to learn a disentangled latent representation for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g., pose, head, upper clothes, and pants) provided by various source persons. Unlike existing works that leverage semantic masks to obtain the representation of each component, we generate the latent code via an attribute encoder with transformers, trained in the manner of curriculum learning from a relatively easy step and gradually...
Recently, Point-MAE has extended Masked Autoencoders (MAE) to point clouds for 3D self-supervised learning, which however faces two problems: (1) the shape similarity between the masked point cloud and the original one is high; (2) the straightforward pretext task of reconstructing the masked points fails to compel the network to learn deeply representative features. In this paper, we tackle these problems by proposing a PatchMixing strategy and a teacher-student training framework. First, with PatchMixing, we mix selected patches from multiple point clouds in an attempt...
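A PatchMixing-style operation can be sketched as: partition each cloud into patches around seed points, then assemble a new cloud that takes some patches from a second cloud. This is an illustrative reading of the truncated abstract; the nearest-seed partitioning, `n_patches`, and `mix_ratio` are assumptions, not the paper's exact procedure.

```python
import numpy as np

def patch_mixing(cloud_a, cloud_b, n_patches=8, mix_ratio=0.5, seed=0):
    """Mix two (n, 3) point clouds: assign every point to its nearest seed
    point (a patch), then replace a fraction of cloud_a's patches with the
    corresponding patches of cloud_b."""
    rng = np.random.default_rng(seed)
    seeds = cloud_a[rng.choice(len(cloud_a), n_patches, replace=False)]
    def patch_ids(cloud):
        d = np.linalg.norm(cloud[:, None, :] - seeds[None, :, :], axis=-1)
        return d.argmin(axis=1)                  # nearest-seed patch assignment
    ids_a, ids_b = patch_ids(cloud_a), patch_ids(cloud_b)
    from_b = rng.choice(n_patches, int(n_patches * mix_ratio), replace=False)
    keep_a = cloud_a[~np.isin(ids_a, from_b)]    # patches kept from cloud_a
    take_b = cloud_b[np.isin(ids_b, from_b)]     # patches taken from cloud_b
    return np.concatenate([keep_a, take_b], axis=0)
```

The mixed cloud is deliberately harder to reconstruct than a simply masked one, which is the pressure on the network that the abstract motivates.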
The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches, and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers thus mainly focus on global information while ignoring fine-grained local details. Consequently, ViT lacks inductive bias during training on image and video datasets. In...
It is critical to obtain high-resolution features with long-range dependency for dense prediction tasks such as semantic segmentation. To generate a high-resolution output of size $H\times W$ from a low-resolution feature map of size $h\times w$ ($hw\ll HW$), a naive dense transformer incurs an intractable complexity of $\mathcal{O}(hwHW)$, limiting its application to dense prediction. We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing the complexity to $\mathcal{O}(hw(H+W))$, which is multiple orders of magnitude smaller...
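The complexity arithmetic can be made concrete: with $H$ row queries and $W$ column queries instead of $HW$ dense queries, the attention-score count drops from $hwHW$ to $hw(H+W)$. The numpy sketch below illustrates only this counting argument with randomly initialized queries and an additive row/column combination; it is not DFlatFormer's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_flattening_attention(feat, H, W):
    """Upsample an (h, w, c) feature map to (H, W, c) using H row queries
    and W column queries over the hw flattened tokens, then combine the
    two attention outputs additively. Score count: hw*(H + W), versus
    hw*H*W for dense per-pixel queries."""
    h, w, c = feat.shape
    keys = feat.reshape(h * w, c)               # flattened low-res tokens
    rng = np.random.default_rng(0)              # illustrative random queries
    q_rows = rng.standard_normal((H, c))        # one query per output row
    q_cols = rng.standard_normal((W, c))        # one query per output column
    row_out = softmax(q_rows @ keys.T / np.sqrt(c)) @ keys   # (H, c)
    col_out = softmax(q_cols @ keys.T / np.sqrt(c)) @ keys   # (W, c)
    return row_out[:, None, :] + col_out[None, :, :]         # (H, W, c)
```

For example, with $h=w=32$ and $H=W=512$, dense attention needs about $2.7\times 10^8$ scores while the dual-flattened form needs about $1.0\times 10^6$, matching the "orders of magnitude" claim.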
Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite this potential, LLMs face unique...