- Generative Adversarial Networks and Image Synthesis
- Computer Graphics and Visualization Techniques
- Advanced Vision and Imaging
- Video Analysis and Summarization
- Advanced Data Compression Techniques
- Multimodal Machine Learning Applications
- Advanced Image Processing Techniques
- Cancer-related molecular mechanisms research
- Natural Language Processing Techniques
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Whipple's Disease and Interleukins
- Wireless Communication Security Techniques
- Molecular Communication and Nanonetworks
- IoT and Edge/Fog Computing
- Human Pose and Action Recognition
- Antenna Design and Analysis
- Advanced Wireless Communication Technologies
- Digital Media Forensic Detection
- Domain Adaptation and Few-Shot Learning
- Age of Information Optimization
- Handwritten Text Recognition Techniques
- Musicology and Musical Analysis
- Aesthetic Perception and Analysis
- Advanced Steganography and Watermarking Techniques
China Academy of Space Technology
2025
Peking University
2023
Rutgers Sexual and Reproductive Health and Rights
2021-2022
Rutgers, The State University of New Jersey
2021
Chinese Academy of Sciences
2017
An ever-increasing number of configuration parameters are provided to system users. But many users have used one configuration setting across different workloads, leaving untapped the performance potential of systems. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. But with tens or hundreds of parameters, it becomes a highly costly task to decide which configuration setting leads to the best performance. While such a task requires strong expertise in both the system and the application, users commonly lack such expertise. To help users tap the performance potential of systems, we present BestConfig,...
Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator G and an image captioning discriminator D under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce image-text consistency, TIME requires neither extra modules nor pre-training. We show that the performance of G can be boosted...
As sixth-generation (6G) communication technology evolves, the increase in frequency and in the number of antennas has made traditional far-field channel estimation methods less effective. This paper proposes a deep neural network (DNN)-based method to optimize pilot signals for near-field ultra-massive multiple-input multiple-output (MIMO) systems. By optimizing the pilot signals, the method can accurately estimate the distance and angle of scatterers, addressing the challenges of sparse techniques. Simulation results...
Imagining a colored realistic image from an arbitrarily drawn sketch is one of the human capabilities that we are eager for machines to mimic. Unlike previous methods that either require sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the necessity of paired sketch data. To this end, we first propose an unsupervised method to efficiently synthesize line-sketches for general RGB-only datasets. With...
Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality at 1024*1024 resolution. Notably, the model converges from scratch with just a few hours of training on a single RTX-2080 GPU, and has consistent performance, even with fewer than 100 training samples. Two technique designs constitute our...
In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source Multimodal Large Language Model (MLLM), we train MoMA to serve a dual role as both a feature extractor and a generator. This approach effectively synergizes reference image and text...
In this paper, we propose a license plate recognition model, which can detect and recognize the license plate in a single forward pass. The features of the input image are extracted by our 15-layer convolutional neural network. In the detection branch, we use a loss function with better nonlinearity to fit the detection process of the license plate. To catch the location with less information loss, we add Intersection over Ground-truth (IoG) into Intersection over Union (IoU) to get the Balanced-IoU (BIoU) loss. The combination of these two loss functions makes the model's predictive result better. We introduce an...
In this paper, we introduce DirectorLLM, a novel video generation model that employs a large language model (LLM) to orchestrate human poses within videos. As foundational text-to-video models rapidly evolve, the demand for high-quality human motion and interaction grows. To address this need and enhance the authenticity of human motions, we extend the LLM from a text generator to a video director and human motion simulator. Utilizing open-source resources from Llama 3, we train DirectorLLM to generate detailed instructional signals, such as human poses, to guide video generation. This...
Sketch-to-Art is an AI tool that allows creatives to sketch an idea and get fully rendered images, stylized the way they want, in real time. Users can define a style by choosing a reference image or a group of images, or by selecting an artist or an art movement.
We propose a new approach for synthesizing fully detailed art-stylized images from sketches. Given a sketch, with no semantic tagging, and a reference image of a specific style, the model can synthesize meaningful details with colors and textures. The model consists of three modules designed explicitly for better artistic style capturing and generation. Based on the GAN framework, a dual-masked mechanism is introduced to enforce the content constraints (from the sketch), and a feature-map transformation technique is developed to strengthen...
With the development of 6G, millimeter wave communication has received extensive attention. Due to the characteristics of wireless transmission, information secrecy transmission is facing significant challenges. This paper uses physical layer security (PLS) to explore secrecy transmission. Specifically, we use an Intelligent Reflecting Surface (IRS) to control the propagation environment and improve the secrecy rate of communication. The active beamforming matrix of the base station and the passive beamforming matrix of the IRS are optimized to achieve the maximum secrecy rate. We deduce...
This paper presents a method through which we can realize the procedure of visual analysis of voice data. We first transform large amounts of audio files gained by the tax hotline 12366 to text using the Baidu speech recognition service, and divide these texts into words and phrases via ‘Chinese Segmentation’. We propose an NLP algorithm that selects keywords using Word2Vec and a contradiction-handling method. These keywords serve as indications for the classification model of service requirements. Then, some results are visualized in the form...