Wenju Xu

ORCID: 0000-0003-2740-0357
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Image Enhancement Techniques
  • Human Pose and Action Recognition
  • 3D Surveying and Cultural Heritage
  • Cancer-related molecular mechanisms research
  • Advanced Vision and Imaging
  • Robotics and Sensor-Based Localization
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • Multimodal Machine Learning Applications
  • Digital Media Forensic Detection
  • Advanced Image Processing Techniques
  • Muscle activation and electromyography studies
  • Video Surveillance and Tracking Methods
  • Image Processing Techniques and Applications
  • Gaze Tracking and Assistive Technology
  • Color Science and Applications
  • Medical Image Segmentation Techniques
  • Advanced MEMS and NEMS Technologies
  • Iterative Learning Control Systems
  • Cooperative Communication and Network Coding
  • Machine Learning and ELM

Amazon (United States)
2024-2025

University of Kansas
2016-2020

Xi'an Jiaotong University
2013-2015

Shougang (China)
2014

HBIS (China)
2014

Hebei Agricultural University
2013

Shandong Iron and Steel Group (China)
2012

Southwest University of Science and Technology
2011

Regularized autoencoders learn latent codes, a structure with regularization under a distribution, which gives them the capability to infer latent codes given observations and to generate new samples given the codes. However, they are sometimes ambiguous, as they tend to produce reconstructions that are not necessarily faithful reproductions of the inputs. The main reason is that they enforce the learned latent code distribution to match a prior distribution while the true distribution remains unknown. To improve the reconstruction quality and learn the latent space as a manifold structure, this paper presents...

10.1109/tmm.2019.2898777 article EN publisher-specific-oa IEEE Transactions on Multimedia 2019-02-11

The paper proposes a Dynamic ResBlock Generative Adversarial Network (DRB-GAN) for artistic style transfer. The style code is modeled as the shared parameters of the Dynamic ResBlocks connecting both the style encoding network and the style transfer network. In the style encoding network, a style class-aware attention mechanism is used to attend to the style feature representation for generating the style codes. In the style transfer network, multiple Dynamic ResBlocks are designed to integrate the style code and the extracted CNN semantic feature, and then feed into the spatial window Layer-Instance Normalization (SW-LIN) decoder, which enables high-quality synthetic images with...

10.1109/iccv48922.2021.00632 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Existing image captioning methods focus only on understanding the relationship between objects or instances in a single image, without exploring the contextual correlation that exists among images. In this paper, we propose Dual Graph Convolutional Networks (Dual-GCN) with transformer and curriculum learning for image captioning. In particular, we not only use an object-level GCN to capture the object-to-object spatial relation within a single image, but also adopt an image-level GCN to capture the feature information provided by similar images. With...

10.1145/3474085.3475439 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Micro-expressions are spontaneous, rapid and subtle facial movements that can neither be forged nor suppressed. They are very important nonverbal communication clues, but they are transient and of low intensity and thus difficult to recognize. Recently, deep learning based methods have been developed for micro-expression (ME) recognition using feature extraction and fusion techniques; however, targeted feature learning and efficient feature fusion still lack further study according to the ME characteristics. To address these issues, we propose a novel...

10.1109/cvpr52729.2023.02115 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Theoretical analysis in this paper indicates that the accuracy of a silicon piezoresistive pressure sensor is mainly affected by thermal drift, and varies nonlinearly with temperature. Here, a smart temperature compensation system to reduce its effect on accuracy is proposed. Firstly, an effective conditioning circuit for signal processing and data acquisition is designed. The hardware to implement the system is fabricated. Then, a program is developed in LabVIEW which incorporates an extreme learning machine (ELM) as the calibration algorithm...

10.3390/s140712174 article EN cc-by Sensors 2014-07-08
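
To illustrate the calibration idea in the abstract above, here is a minimal sketch of an extreme learning machine: a single hidden layer with random, fixed input weights and output weights solved in closed form by least squares. The drift model, the synthetic data, the hidden-layer size, and the function names `train_elm` and `predict_elm` are all assumptions made for illustration; none of them come from the paper, whose implementation is in LabVIEW.

```python
import numpy as np

def train_elm(x, y, n_hidden=20, seed=0):
    """Fit an ELM: random fixed hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(x.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases, never trained
    h = np.tanh(x @ w + b)                       # hidden-layer activations
    beta = np.linalg.pinv(h) @ y                 # closed-form output weights
    return w, b, beta

def predict_elm(x, params):
    w, b, beta = params
    return np.tanh(x @ w + b) @ beta

# Synthetic, made-up sensor: the raw reading drifts nonlinearly with temperature.
rng = np.random.default_rng(1)
temperature = rng.uniform(-1.0, 1.0, 200)        # normalized temperature
pressure = rng.uniform(0.0, 1.0, 200)            # true (normalized) pressure
raw = pressure * (1 + 0.1 * temperature) + 0.05 * temperature**2

features = np.column_stack([raw, temperature])
params = train_elm(features, pressure)

uncorrected = np.abs(raw - pressure).max()       # worst-case error without compensation
residual = np.abs(predict_elm(features, params) - pressure).max()
print(f"max error before: {uncorrected:.4f}, after ELM calibration: {residual:.4f}")
```

Only the output weights are fitted, which is why training reduces to one pseudo-inverse; that is the property that makes ELM attractive for lightweight calibration.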

While self-attention has been successfully applied in a variety of natural language processing and computer vision tasks, its application in Monte Carlo (MC) image denoising has not yet been well explored. This paper presents a self-attention based MC denoising deep learning network, based on the fact that self-attention is essentially non-local means filtering in the embedding space, which makes it inherently very suitable for the denoising task. Particularly, we modify the standard self-attention mechanism to an auxiliary feature guided self-attention that considers the by-products (e.g., auxiliary feature buffers) of the MC rendering...

10.1145/3478513.3480565 article EN ACM Transactions on Graphics 2021-12-01
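
The observation that self-attention acts like non-local means filtering in an embedding space can be sketched as follows: each output pixel is a softmax-weighted average of all input pixels, with weights set by embedding similarity. This is a plain dot-product attention sketch, not the paper's network; the function name `attention_denoise` and the idea of passing a separate `guide` array of per-pixel embeddings are assumptions for illustration.

```python
import numpy as np

def attention_denoise(noisy, guide):
    """Average pixel values with softmax weights from embedding similarity.

    noisy: (n, c) array of per-pixel values.
    guide: (n, d) array of per-pixel embeddings (hypothetical stand-in for
           learned features or auxiliary buffers).
    """
    scores = guide @ guide.T / np.sqrt(guide.shape[1])  # pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)       # each row sums to 1
    return weights @ noisy                              # weighted average per pixel
```

Because every row of the weight matrix is non-negative and sums to one, each output is a convex combination of the inputs, which is the same form as a non-local means filter.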

Illumination estimation from a single indoor image is a promising yet challenging task. Existing indoor illumination estimation methods mainly regress lighting parameters or infer a panorama from a limited field-of-view image. Nevertheless, these methods fail to recover a panorama with both well-distributed illumination and detailed environment textures, leading to a lack of realism in rendering the embedded 3D objects with complex materials. This paper presents a novel multi-stage illumination estimation framework named IllumiDiff. Specifically, in Stage I, we first estimate illumination conditions...

10.1109/tvcg.2025.3553853 article EN IEEE Transactions on Visualization and Computer Graphics 2025-01-01

In this paper we propose integrating a priori knowledge into both the design and training of convolutional neural networks (CNNs) to learn object representations that are invariant to affine transformations (i.e., translation, scale, rotation). Accordingly, we propose a novel multi-scale maxout CNN and train it end-to-end with a novel rotation-invariant regularizer. This regularizer aims to enforce the weights in each 2D spatial filter to approximate circular patterns. In this way, we manage to handle affine transformations using convolution, multi-scale maxout, and circular filters....

10.1109/wacv45572.2020.9093385 article EN 2020-03-01

10.1016/j.neucom.2019.06.096 article EN publisher-specific-oa Neurocomputing 2019-07-19

We propose a self-supervised approach to improve the training of Generative Adversarial Networks (GANs) via inducing the discriminator to examine the structural consistency of images. Although natural image samples provide ideal examples of both valid structure and valid texture, learning to reproduce both together remains an open challenge. In our approach, we augment the training set of natural images with modified examples that have degraded structural consistency. These degraded examples are automatically created by randomly exchanging pairs of patches in an image's convolutional...

10.1109/wacv45572.2020.9093525 article EN 2020-03-01
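
The patch-exchange augmentation described in the abstract above can be sketched roughly as follows. The abstract is truncated, so the details here are assumptions: the input is taken to be an array of shape (channels, height, width), patches are square and aligned to a grid so the two swapped patches never overlap, and the function name `swap_random_patches` is hypothetical.

```python
import numpy as np

def swap_random_patches(feat, patch=2, rng=None):
    """Return a copy of feat with two randomly chosen grid patches exchanged.

    feat: (channels, height, width) array; needs at least two patch cells.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = feat.shape
    rows, cols = h // patch, w // patch
    # Pick two distinct cells on the patch grid, so the patches cannot overlap.
    a, b = rng.choice(rows * cols, size=2, replace=False)
    ya, xa = divmod(int(a), cols)
    yb, xb = divmod(int(b), cols)
    sa = (slice(None), slice(ya * patch, (ya + 1) * patch), slice(xa * patch, (xa + 1) * patch))
    sb = (slice(None), slice(yb * patch, (yb + 1) * patch), slice(xb * patch, (xb + 1) * patch))
    out = feat.copy()
    out[sa], out[sb] = feat[sb], feat[sa]  # read from the original, write to the copy
    return out
```

The swap keeps every local value (texture statistics are unchanged) while breaking the spatial arrangement, which is the kind of degraded structural consistency a discriminator could be trained to detect.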

Recent image-to-image translation models have shown great success in mapping local textures between two domains. Existing approaches rely on a cycle-consistency constraint that supervises the generators to learn an inverse mapping. However, learning the inverse mapping introduces extra trainable parameters, and the inverse mapping cannot be learned for some domains. As a result, these approaches are ineffective in scenarios where (i) multiple visual image domains are involved; (ii) both structure and texture transformations are required; (iii) semantic consistency...

10.1109/tip.2021.3125266 article EN IEEE Transactions on Image Processing 2021-11-11

10.1016/j.compeleceng.2018.02.002 article EN Computers & Electrical Engineering 2018-02-23

In this paper, we propose a novel framework named DRL-CPG to learn a disentangled latent representation for controllable person image generation, which can produce realistic person images with desired poses and human attributes (e.g., pose, head, upper clothes, and pants) provided by various source persons. Unlike existing works that leverage semantic masks to obtain the representation of each component, we propose to generate the disentangled latent code via a novel attribute encoder with transformers trained in a manner of curriculum learning, from a relatively easy step to a gradually...

10.1109/tmm.2023.3345180 article EN IEEE Transactions on Multimedia 2024-01-01

Recently, Point-MAE has extended Masked Autoencoders (MAE) to point clouds for 3D self-supervised learning, which however faces two problems: (1) the shape similarity between the masked point cloud and the original point cloud is high, and (2) the pretext task of reconstructing is straightforward and fails to compel the network to learn deep representative features. In this paper, we tackle these problems by proposing a PatchMixing strategy and a teacher-student training framework. First, with PatchMixing, we mix the selected patches of multiple point clouds and attempt...

10.1109/tcsvt.2024.3405069 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-05-24

The Vision Transformer (ViT) leverages the Transformer's encoder to capture global information by dividing images into patches, and achieves superior performance across various computer vision tasks. However, the self-attention mechanism of ViT captures the global context from the outset, overlooking the inherent relationships between neighboring pixels in images or videos. Transformers mainly focus on global information while ignoring fine-grained local details. Consequently, ViT lacks inductive bias during image or video dataset training. In...

10.48550/arxiv.2407.19394 preprint EN arXiv (Cornell University) 2024-07-28

It is critical to obtain high-resolution features with long-range dependency for dense prediction tasks such as semantic segmentation. To generate a high-resolution output of size $H\times W$ from a low-resolution feature map of size $h\times w$ ($hw\ll HW$), a naive dense transformer incurs an intractable complexity of $\mathcal{O}(hwHW)$, limiting its application on high-resolution dense prediction. We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing the complexity to $\mathcal{O}(hw(H+W))$, which is multiple orders of magnitude smaller...

10.48550/arxiv.2201.09139 preprint EN cc-by-nc-nd arXiv (Cornell University) 2022-01-01
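
A quick arithmetic check makes the gap between the two complexity terms quoted in the abstract above concrete. The feature-map sizes below are hypothetical values chosen for illustration, and constant factors hidden by the big-O notation are ignored.

```python
def naive_cost(h, w, H, W):
    """Pairwise cost of attending every output position to every input position: hw * HW."""
    return h * w * H * W

def flattened_cost(h, w, H, W):
    """Cost term stated for the dual-flattening scheme: hw * (H + W)."""
    return h * w * (H + W)

# Hypothetical sizes: a 32x32 low-resolution map upsampled to a 512x512 output.
h, w, H, W = 32, 32, 512, 512
ratio = naive_cost(h, w, H, W) // flattened_cost(h, w, H, W)
print(naive_cost(h, w, H, W), flattened_cost(h, w, H, W), ratio)
```

The ratio simplifies to $HW/(H+W)$, so it grows with the output resolution; for the sizes above it is 256.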

10.1109/tcsvt.2024.3471875 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-01-01

Online shopping is a complex multi-task, few-shot learning problem with a wide and evolving range of entities, relations, and tasks. However, existing models and benchmarks are commonly tailored to specific tasks, falling short of capturing the full complexity of online shopping. Large Language Models (LLMs), with their multi-task and few-shot learning abilities, have the potential to profoundly transform online shopping by alleviating task-specific engineering efforts and by providing users with interactive conversations. Despite the potential, LLMs face unique...

10.48550/arxiv.2410.20745 preprint EN arXiv (Cornell University) 2024-10-28