Yuan Gong

ORCID: 0009-0009-9097-4805
Research Areas
  • Sentiment Analysis and Opinion Mining
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image Fusion Techniques
  • Advanced Fiber Optic Sensors
  • Image and Video Quality Assessment
  • Photonic and Optical Devices
  • Advanced Vision and Imaging
  • Human Motion and Animation
  • Domain Adaptation and Few-Shot Learning
  • Face recognition and analysis
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Anomaly Detection Techniques and Applications
  • Emotion and Mood Recognition
  • Topic Modeling
  • Visual Attention and Saliency Detection
  • E-commerce and Technology Innovations
  • Machine Fault Diagnosis Techniques
  • Biopolymer Synthesis and Applications
  • Advanced Image Processing Techniques
  • Sulfur-Based Synthesis Techniques
  • Wireless Sensor Networks and IoT
  • Artificial Intelligence in Games
  • Image Enhancement Techniques
  • Smart Grid and Power Systems

Massachusetts Institute of Technology
2025

Chongqing University of Technology
2024

Tsinghua University
2022-2023

University Town of Shenzhen
2023

Tsinghua–Berkeley Shenzhen Institute
2022

Shenzhen University
2022

Wuhan University of Science and Technology
2016-2020

Beijing University of Chinese Medicine
2020

Center for Special Minimally Invasive and Robotic Surgery
2018

Robotic Technology (United States)
2018

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose the Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve the performance on GAN-based distortion. We first extract features via ViT; then, to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale...

10.1109/cvprw56347.2022.00126 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) at CVPR 2022. This challenge is held to address the emerging challenge of IQA posed by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. The challenge is divided into two tracks, a full-reference IQA track similar to the previous NTIRE IQA challenge and a new track that focuses on the no-reference...

10.1109/cvprw56347.2022.00109 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Image quality assessment (IQA) algorithms aim to quantify the human perception of image quality. Unfortunately, there is a performance drop when assessing distortion images generated by a generative adversarial network (GAN) with seemingly realistic textures. In this work, we conjecture that this maladaptation lies in the backbone of IQA models, where patch-level prediction methods use independent patches as input to calculate their scores separately, but lack spatial relationship modeling among the patches....

10.1109/cvprw56347.2022.00123 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Recently, a surge of high-quality 3D-aware GANs has been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Although the facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait with only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion methods focuses on texture similarity...

10.1109/cvpr52729.2023.00041 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Respiration rate is an essential vital sign that requires monitoring under various conditions, including in strong electromagnetic environments such as magnetic resonance imaging systems. To provide an electromagnetically immune breath-sensing system, we propose an all-fiber-optic wearable breath sensor based on a fiber-tip microcantilever. The microcantilever was fabricated by two-photon polymerization microfabrication with a femtosecond laser, so that a micro Fabry-Pérot (FP) interferometer was formed between the...

10.3390/bios12030168 article EN cc-by Biosensors 2022-03-07

Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort has studied the modeling of this uncertainty, particularly in pretraining on unlabeled datasets and fine-tuning on task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder...

10.1109/cvpr52729.2023.02228 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper...

10.1145/3610548.3618184 article EN cc-by 2023-12-10

We target cross-domain face reenactment in this paper, i.e., driving a cartoon image with the video of a real person and vice versa. Recently, many works have focused on one-shot talking face generation to drive a portrait with a real video, i.e., within-domain reenactment. Straightforwardly applying those methods to cross-domain animation will cause inaccurate expression transfer, blur effects, and even apparent artifacts due to the domain shift between cartoon and real faces. Only a few works attempt to settle cross-domain face reenactment. The most related work, AnimeCeleb [13], requires constructing...

10.1109/iccv51070.2023.00707 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

10.1109/icassp49660.2025.10888198 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

10.1109/icassp49660.2025.10888591 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Knowledge Distillation (KD) has developed extensively and boosted various tasks. The classical KD method adds the KD loss to the original cross-entropy (CE) loss. We try to decompose the KD loss to explore its relation with the CE loss. Surprisingly, we find it can be regarded as a combination of the CE loss and an extra loss which has the identical form as the CE loss. However, we notice the extra loss forces the student's relative probability to learn the teacher's absolute probability. Moreover, the sum of the two probabilities is different, making it hard to optimize. To address this issue, we revise the formulation...

10.48550/arxiv.2208.10139 preprint EN other-oa arXiv (Cornell University) 2022-01-01
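
The classical KD objective mentioned in the abstract above (a KD loss added to the cross-entropy loss) can be sketched as follows. This is a minimal illustration, not the revised formulation the preprint proposes: the temperature-scaled KL form of the KD term, the weighting parameter `alpha`, the default `temperature`, and all function names are assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """CE loss of the student against the ground-truth class index."""
    return -math.log(softmax(student_logits)[label])


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened probabilities.

    The T^2 factor is the usual rescaling for softened targets
    (an assumption here, not taken from the abstract).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
    )
    return temperature ** 2 * kl


def classical_kd_objective(student_logits, teacher_logits, label,
                           alpha=0.5, temperature=4.0):
    """Weighted sum of CE and KD terms; alpha is a hypothetical weight."""
    ce = cross_entropy(student_logits, label)
    kd = kd_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kd
```

When student and teacher logits coincide, the KD term vanishes and only the weighted CE term remains, which is a quick sanity check on the sketch.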

Generating videos for visual storytelling can be a tedious and complex process that typically requires either live-action filming or graphics animation rendering. To bypass these challenges, our key idea is to utilize the abundance of existing video clips and synthesize a coherent storytelling video by customizing their appearances. We achieve this by developing a framework comprised of two functional modules: (i) Motion Structure Retrieval, which provides video candidates with the desired scene or motion context described by query texts,...

10.48550/arxiv.2307.06940 preprint EN other-oa arXiv (Cornell University) 2023-01-01

It is necessary to pay attention to the bonding strength of the interface between precast normal concrete (NSC) and cast‐in‐place epoxy resin (EMR) when using EMR as a repair or filling material or as an overlay in bridges’ rehabilitation. However, performances differ because of differing mix ratios; thus, the properties of various cements are not completely the same. This article investigated the interfacial bond between NSC and ERC by direct tensile, push‐out, and slant shear tests with specimens of special size and structure observed...

10.1155/2021/5561097 article EN cc-by Advances in Materials Science and Engineering 2021-01-01

3H-Benzo[1,2]-dithiole-3-thiones were prepared from potassium sulfide and 2-halobenzaldehydes in moderate-to-good yields, and a plausible mechanism for this catalyst-free intramolecular heteroannulation reaction has been proposed. The Knoevenagel condensation reactions of 3H-benzo[1,2]dithiole-3-thiones with active methylene compounds such as ethyl 2-cyanoacetate and diethyl malonate, and the three-component one-pot reactions of potassium sulfide, 2-halobenzaldehydes, and ethyl 2-cyanoacetate, affording the corresponding products,...

10.1080/10426507.2011.600743 article EN Phosphorus, sulfur, and silicon and the related elements 2011-10-31

Theoretical expressions for analyzing the refractive-index sensitivity of a hybrid optical fiber Fabry-Pérot sensor are developed. The influence of experimental parameters on the measurement is discussed. The hybrid sensor is fabricated by chemically etching a graded-index multimode fiber (GI-MMF), fusion splicing it to a single-mode fiber, and cleaving the GI-MMF. The fringe contrast exceeds 30 dB, corresponding to a refractive-index sensitivity of about 45 per refractive index unit. Experimental results are in good agreement with the theoretical ones. It...

10.7498/aps.60.064202 article EN cc-by Acta Physica Sinica 2011-01-01

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose the Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve the performance on GAN-based distortion. We first extract features via ViT; then, to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale...

10.48550/arxiv.2204.08958 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper...

10.48550/arxiv.2305.18247 preprint EN other-oa arXiv (Cornell University) 2023-01-01

With the development of Internet technology, the recommendation system is becoming an essential part of major e-commerce platforms, social media platforms, and other application fields. The main purpose of a recommendation algorithm is to provide users with personalized and accurate recommendations of goods, services, and information. Traditional recommendation algorithms are mainly based on information such as users' historical behavior to recommend similar items to users. However, only considering historical behavior cannot fully reflect users' individual needs because emotions...

10.54254/2755-2721/45/20241030 article EN Applied and Computational Engineering 2024-03-15

Annotating and recognizing speech emotion using prompt engineering has recently emerged with the advancement of Large Language Models (LLMs), yet its efficacy and reliability remain questionable. In this paper, we conduct a systematic study on this topic, beginning with the proposal of novel prompts that incorporate emotion-specific knowledge from acoustics, linguistics, and psychology. Subsequently, we examine the effectiveness of LLM-based prompting on Automatic Speech Recognition (ASR) transcription, contrasting it...

10.48550/arxiv.2409.15551 preprint EN arXiv (Cornell University) 2024-09-23

Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks. In addition to their cognitive intelligence, exploring their capabilities in emotional intelligence is also crucial, as it enables more natural and empathetic conversational AI. Recent studies have shown LLMs' capability in recognizing emotions, but they often focus on single emotion labels and overlook the complex and ambiguous nature of human emotions. This study is the first to address this gap by exploring the potential...

10.48550/arxiv.2409.18339 preprint EN arXiv (Cornell University) 2024-09-26

In recent years, target inspection has found extensive utilization within industry, making it crucial to detect defects in industrial products to ensure quality. To address the challenges posed by large brightness differences, attached dirt, and complex backgrounds in saggers, we propose a sagger defect recognition method that integrates deep learning detection and machine vision feature extraction. This method commences by employing photometric stereo to construct the curvature map of the sagger surface, reducing the interference...

10.3390/electronics13245010 article EN Electronics 2024-12-20