Kunpeng Song

ORCID: 0009-0009-2439-4263
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Computer Graphics and Visualization Techniques
  • Advanced Vision and Imaging
  • Video Analysis and Summarization
  • Advanced Data Compression Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image Processing Techniques
  • Cancer-related molecular mechanisms research
  • Natural Language Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Whipple's Disease and Interleukins
  • Wireless Communication Security Techniques
  • Molecular Communication and Nanonetworks
  • IoT and Edge/Fog Computing
  • Human Pose and Action Recognition
  • Antenna Design and Analysis
  • Advanced Wireless Communication Technologies
  • Digital Media Forensic Detection
  • Domain Adaptation and Few-Shot Learning
  • Age of Information Optimization
  • Handwritten Text Recognition Techniques
  • Musicology and Musical Analysis
  • Aesthetic Perception and Analysis
  • Advanced Steganography and Watermarking Techniques

China Academy of Space Technology
2025

Peking University
2023

Rutgers Sexual and Reproductive Health and Rights
2021-2022

Rutgers, The State University of New Jersey
2021

Chinese Academy of Sciences
2017

An ever-increasing number of configuration parameters are provided to system users. But many users have used one setting across different workloads, leaving untapped the performance potential of systems. A good configuration setting can greatly improve the performance of a deployed system under certain workloads. However, with tens or hundreds of parameters, it becomes a highly costly task to decide which configuration setting leads to the best performance. While such a task requires strong expertise in both the system and the application, users commonly lack such expertise. To help users tap the performance potential of their systems, we present BestConfig,...

10.1145/3127479.3128605 preprint EN 2017-09-24
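The core loop of sampling-based configuration tuning can be sketched as follows. This is a minimal illustration, not BestConfig's actual algorithm: the interval-based sampler, the `tune` driver, and the toy benchmark with its `buffer_mb`/`threads` parameters are all hypothetical stand-ins.

```python
import random

def sample_configs(param_ranges, n):
    """Divide each parameter range into n intervals and draw one value per
    interval, shuffled per parameter, so samples spread over the space."""
    columns = []
    for lo, hi in param_ranges.values():
        step = (hi - lo) / n
        vals = [lo + step * (i + random.random()) for i in range(n)]
        random.shuffle(vals)
        columns.append(vals)
    names = list(param_ranges)
    return [dict(zip(names, row)) for row in zip(*columns)]

def tune(benchmark, param_ranges, n_samples=20, seed=0):
    """Benchmark each sampled configuration and keep the best performer."""
    random.seed(seed)
    best_cfg, best_perf = None, float("-inf")
    for cfg in sample_configs(param_ranges, n_samples):
        perf = benchmark(cfg)
        if perf > best_perf:
            best_cfg, best_perf = cfg, perf
    return best_cfg, best_perf

# Toy "system": performance peaks at buffer_mb=512, threads=8 (made up).
def toy_benchmark(cfg):
    return -((cfg["buffer_mb"] - 512) ** 2) / 1e4 - (cfg["threads"] - 8) ** 2

ranges = {"buffer_mb": (64, 1024), "threads": (1, 32)}
cfg, perf = tune(toy_benchmark, ranges)
```

In a real deployment the benchmark call would run the system under the target workload, which is what makes each sample costly and the sampling strategy important.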

Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator G and an image captioning discriminator D under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce image-text consistency, TIME requires neither extra modules nor pre-training. We show that the performance of G can be boosted...

10.1609/aaai.v35i3.16305 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
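A joint objective of this kind can be illustrated with toy scalar losses: a hinge adversarial term for the discriminator plus a captioning negative log-likelihood on real image-text pairs. The scores and the weighting coefficient `lam` below are illustrative, not the paper's exact formulation.

```python
import math

def d_hinge_loss(real_score, fake_score):
    # Standard hinge loss for the discriminator: push real scores above +1
    # and fake scores below -1.
    return max(0.0, 1.0 - real_score) + max(0.0, 1.0 + fake_score)

def g_loss(fake_score):
    # The generator tries to raise the discriminator's score on its samples.
    return -fake_score

def caption_nll(token_probs):
    # Negative log-likelihood of the ground-truth caption tokens, the
    # captioning head's objective on real image-text pairs.
    return -sum(math.log(p) for p in token_probs)

# Joint discriminator objective: adversarial term + captioning term,
# combined with a hypothetical weight lam.
lam = 1.0
d_total = d_hinge_loss(real_score=1.3, fake_score=-0.7) \
          + lam * caption_nll([0.9, 0.8, 0.95])
```

Training then alternates between minimizing `d_total` for D and `g_loss` for G, as in a standard GAN loop.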

As sixth-generation (6G) communication technology evolves, the increase in frequency and number of antennas has made traditional far-field channel estimation methods less effective. This paper proposes a deep neural network (DNN)-based method to optimize pilot signals for near-field ultra-massive multiple-input multiple-output (MIMO) systems. By optimizing the pilot signals, the network can accurately estimate the distance and angle of scatterers, addressing the challenges faced by sparse estimation techniques. Simulation results...

10.1049/ell2.70227 article EN cc-by-nc-nd Electronics Letters 2025-01-01
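As background to why range becomes estimable in the near field: under the spherical-wave model, each array element sees a phase that depends on its exact distance to the source, so the response encodes both angle and range. Below is a sketch of that array response for a uniform linear array; the antenna count, spacing, wavelength, and source position are illustrative values, not the paper's simulation settings.

```python
import cmath
import math

def near_field_steering(n_antennas, spacing, wavelength, r, theta):
    """Near-field array response of a uniform linear array centered at the
    origin. Unlike the far-field (plane-wave) model, the phase of each
    element depends on the exact element-to-source distance, so both the
    angle theta and the range r are observable from the response."""
    k = 2 * math.pi / wavelength
    center = (n_antennas - 1) / 2
    resp = []
    for n in range(n_antennas):
        d_n = (n - center) * spacing
        # Exact distance from element n to a source at range r, angle theta.
        r_n = math.sqrt(r * r + d_n * d_n - 2 * r * d_n * math.sin(theta))
        resp.append(cmath.exp(-1j * k * (r_n - r)))
    return resp

# Illustrative parameters: 64 elements at half-wavelength spacing,
# a scatterer at 5 m and 0.3 rad.
a = near_field_steering(n_antennas=64, spacing=0.005, wavelength=0.01,
                        r=5.0, theta=0.3)
```

A pilot-optimization DNN like the one described would learn pilot sequences that make these range- and angle-dependent phase patterns easy to distinguish.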

Imagining a colored realistic image from an arbitrarily drawn sketch is one of the human capabilities that we are eager for machines to mimic. Unlike previous methods that either require sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the necessity of paired data. To this end, we first propose an unsupervised method to efficiently synthesize line-sketches for general RGB-only datasets. With...

10.1609/aaai.v35i3.16304 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Training Generative Adversarial Networks (GAN) on high-fidelity images usually requires large-scale GPU clusters and a vast number of training images. In this paper, we study the few-shot image synthesis task for GAN with minimum computing cost. We propose a light-weight GAN structure that gains superior quality on 1024*1024 resolution. Notably, the model converges from scratch in just a few hours on a single RTX-2080 GPU, and has consistent performance, even with less than 100 training samples. Two technique designs constitute our...

10.48550/arxiv.2101.04775 preprint EN other-oa arXiv (Cornell University) 2021-01-01
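One of the two designs in this line of work is a skip-layer excitation module, in which a low-resolution feature map gates the channels of a high-resolution one, giving the generator a cheap long-range skip connection. The sketch below follows that description with made-up shapes and random weights; it is an illustration of the mechanism, not the paper's implementation.

```python
import numpy as np

def skip_layer_excitation(x_low, x_high, w1, w2):
    """Channel-wise gating of a high-resolution feature map x_high
    (C_high, H', W') by a low-resolution one x_low (C_low, H, W):
    global-average-pool x_low, pass it through a tiny two-layer MLP,
    and re-scale x_high's channels by the resulting sigmoid gate."""
    pooled = x_low.mean(axis=(1, 2))                 # -> (C_low,)
    hidden = np.maximum(w1 @ pooled, 0.0)            # ReLU -> (C_mid,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid -> (C_high,)
    return x_high * gate[:, None, None]              # channel re-scaling

rng = np.random.default_rng(0)
x_low = rng.standard_normal((8, 4, 4))      # low-res features, 8 channels
x_high = rng.standard_normal((16, 32, 32))  # high-res features, 16 channels
w1 = rng.standard_normal((4, 8))            # toy MLP weights
w2 = rng.standard_normal((16, 4))
out = skip_layer_excitation(x_low, x_high, w1, w2)
```

Because the gate only re-scales channels, the module adds negligible compute relative to convolutions at the high resolution.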

In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source Multimodal Large Language Model (MLLM), we train it to serve a dual role as both a feature extractor and a generator. This approach effectively synergizes reference image and text...

10.48550/arxiv.2404.05674 preprint EN arXiv (Cornell University) 2024-04-08

In this paper, we propose a license plate recognition model, which can detect and recognize the license plate in a single forward pass. The features of the input image are extracted by our 15-layer convolutional neural network. In the detection branch, we use a loss function with better nonlinearity to fit the regression of the plate location. To catch the location with less information loss, we add Intersection over Ground-truth (IoG) into Intersection over Union (IoU) to get the Balanced-IoU (BIoU) loss. The combination of these two loss functions makes the model's predictions more accurate. We introduce an...

10.1088/1742-6596/2504/1/012039 article EN Journal of Physics Conference Series 2023-05-01
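The two overlap terms named in the abstract can be computed directly for axis-aligned boxes `(x1, y1, x2, y2)`. The convex mix below, with weight `alpha`, is an illustrative way to balance them; the paper's exact combination may differ.

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def intersection(a, b):
    # Overlap rectangle of two axis-aligned boxes (zero area if disjoint).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x1, y1, x2, y2))

def iou(pred, gt):
    inter = intersection(pred, gt)
    union = box_area(pred) + box_area(gt) - inter
    return inter / union if union > 0 else 0.0

def iog(pred, gt):
    # Intersection over Ground-truth: penalizes predictions that fail to
    # cover the ground-truth box, regardless of how large the prediction is.
    g = box_area(gt)
    return intersection(pred, gt) / g if g > 0 else 0.0

def biou_loss(pred, gt, alpha=0.5):
    # Balanced-IoU loss as a convex mix of the two overlap terms;
    # alpha here is an illustrative choice.
    return 1.0 - (alpha * iou(pred, gt) + (1 - alpha) * iog(pred, gt))
```

For a perfect prediction both terms equal 1 and the loss is 0; an oversized prediction keeps IoG high but is still penalized through the IoU term.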

In this paper, we introduce DirectorLLM, a novel video generation model that employs a large language model (LLM) to orchestrate human poses within videos. As foundational text-to-video models rapidly evolve, the demand for high-quality human motion and interaction grows. To address this need and enhance the authenticity of human motions, we extend the LLM from a text generator to a video director and human motion simulator. Utilizing open-source resources from Llama 3, we train DirectorLLM to generate detailed instructional signals, such as human poses, to guide video generation. This...

10.48550/arxiv.2412.14484 preprint EN arXiv (Cornell University) 2024-12-18

Sketch-to-Art is an AI tool that allows creatives to sketch an idea and get fully rendered images, stylized the way they want, in real time. Users can define a style either by choosing a reference image or a group of images, or by selecting an artist or an art movement.

10.1145/3407662.3407757 article EN 2020-08-14

We propose a new approach for synthesizing fully detailed art-stylized images from sketches. Given a sketch, with no semantic tagging, and a reference image of a specific style, the model can synthesize meaningful details with colors and textures. The model consists of three modules designed explicitly for better artistic style capturing and generation. Based on the GAN framework, a dual-masked mechanism is introduced to enforce the content constraints (from the sketch), and a feature-map transformation technique is developed to strengthen...

10.48550/arxiv.2002.12888 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Imagining a colored realistic image from an arbitrarily drawn sketch is one of the human capabilities that we are eager for machines to mimic. Unlike previous methods that either require sketch-image pairs or utilize low-quantity detected edges as sketches, we study the exemplar-based sketch-to-image (s2i) synthesis task in a self-supervised learning manner, eliminating the necessity of paired data. To this end, we first propose an unsupervised method to efficiently synthesize line-sketches for general RGB-only datasets. With...

10.48550/arxiv.2012.09290 preprint EN other-oa arXiv (Cornell University) 2020-01-01

With the development of 6G, millimeter wave communication has received extensive attention. Due to the broadcast characteristics of wireless transmission, information secrecy in transmission is facing significant challenges. This paper uses physical layer security (PLS) to explore secure transmission. Specifically, we use an Intelligent Reflecting Surface (IRS) to control the propagation environment and improve the secrecy rate of communication. The active beamforming matrix at the base station and the passive beamforming at the IRS are optimized to achieve the maximum secrecy rate. We deduce...

10.1109/vtc2023-spring57618.2023.10200911 article EN 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring) 2023-06-01
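Two textbook ingredients of this setup can be sketched compactly: the classic passive-beamforming phase choice that makes all IRS-reflected paths add coherently at the user, and the secrecy rate as the legitimate capacity minus the eavesdropper's, floored at zero. The toy channel coefficients below are made up; the paper's joint active/passive optimization is not reproduced here.

```python
import cmath
import math

def effective_channel(h, g, phases):
    # Cascaded BS -> IRS -> user channel: element n reflects with a
    # unit-modulus coefficient exp(j * phases[n]).
    return sum(hn * cmath.exp(1j * p) * gn
               for hn, p, gn in zip(h, phases, g))

def aligned_phases(h, g):
    # Classic passive-beamforming choice: cancel each cascaded path's
    # phase so all reflections combine coherently at the receiver.
    return [-(cmath.phase(hn) + cmath.phase(gn)) for hn, gn in zip(h, g)]

def secrecy_rate(snr_user, snr_eve):
    """Achievable secrecy rate in bits/s/Hz: legitimate capacity minus
    the eavesdropper's capacity, floored at zero."""
    return max(0.0, math.log2(1 + snr_user) - math.log2(1 + snr_eve))

# Toy two-element IRS: with aligned phases, |effective channel| equals
# the sum of the cascaded path magnitudes.
h = [1 + 1j, 2 + 0j]   # BS -> IRS coefficients (illustrative)
g = [1j, 1 - 1j]       # IRS -> user coefficients (illustrative)
eff = effective_channel(h, g, aligned_phases(h, g))
```

In the paper's setting, the base station's active beamforming and these IRS phases are optimized jointly so that the user's SNR rises while the eavesdropper's stays low.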

This paper presents a method through which we can realize the procedure of visual analysis of voice data. We first transform large amounts of audio files gained by the tax hotline 12366 into text using the Baidu speech recognition service, and divide these texts into words and phrases via 'Chinese Word Segmentation'. We propose an NLP algorithm that selects keywords via Word2Vec and a contradiction-handling method. These keywords serve as indications for the classification model, which is based on the service requirement. Then, some results are visualized in the form...

10.12783/dtcse/aiie2017/18217 article EN DEStech Transactions on Computer Science and Engineering 2018-02-12
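The keyword-selection step can be illustrated as ranking candidate words by embedding similarity to a topic direction. Everything below is a toy stand-in: the 3-d embeddings, the "taxation" topic vector, and the word list are invented for illustration and bear no relation to the paper's Word2Vec model or data.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_keywords(words, embeddings, topic_vec, k=2):
    """Rank candidate words by similarity to a topic vector and keep the
    top k -- a toy stand-in for Word2Vec-based keyword selection."""
    scored = [(w, cosine(embeddings[w], topic_vec))
              for w in words if w in embeddings]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [w for w, _ in scored[:k]]

emb = {                      # made-up 3-d embeddings
    "invoice": (0.9, 0.1, 0.0),
    "tax": (0.8, 0.2, 0.1),
    "weather": (0.0, 0.1, 0.9),
}
topic = (1.0, 0.0, 0.0)      # hypothetical "taxation" topic direction
keywords = select_keywords(["invoice", "tax", "weather"], emb, topic)
```

The selected keywords would then feed the downstream classification and visualization steps described in the abstract.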

Focusing on text-to-image (T2I) generation, we propose Text and Image Mutual-Translation Adversarial Networks (TIME), a lightweight but effective model that jointly learns a T2I generator G and an image captioning discriminator D under the Generative Adversarial Network framework. While previous methods tackle the T2I problem as a uni-directional task and use pre-trained language models to enforce image-text consistency, TIME requires neither extra modules nor pre-training. We show that the performance of G can be boosted...

10.48550/arxiv.2005.13192 preprint EN other-oa arXiv (Cornell University) 2020-01-01