NFDI4DS | UHH-SEMS - Publication Details

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

OPENALEX - Publications

Sidi Yang Tianhe Wu Shuwei Shi Shanshan Lao Yuan Gong and 3 more

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose the Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve performance on GAN-based distortion. We first extract features via ViT; then, to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale...

10.1109/cvprw56347.2022.00126 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

NTIRE 2022 Challenge on Perceptual Image Quality Assessment

Jinjin Gu Haoming Cai Chao Dong Jimmy Ren Radu Timofte and 51 more

This paper reports on the NTIRE 2022 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) at CVPR 2022. This challenge is held to address the emerging challenge of IQA by perceptual image processing algorithms. The output images of these algorithms have completely different characteristics from traditional distortions and are included in the PIPAL dataset used in this challenge. This challenge is divided into two tracks, a full-reference IQA track similar to the previous NTIRE IQA challenge and a new track that focuses on the no-reference...

10.1109/cvprw56347.2022.00109 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Shanshan Lao Yuan Gong Shuwei Shi Sidi Yang Tianhe Wu and 3 more

Image quality assessment (IQA) algorithms aim to quantify human perception of image quality. Unfortunately, there is a performance drop when assessing distorted images generated by a generative adversarial network (GAN) with seemingly realistic textures. In this work, we conjecture that this maladaptation lies in the backbone of IQA models, where patch-level prediction methods use independent image patches as input to calculate their scores separately, but lack spatial relationship modeling among image patches....

10.1109/cvprw56347.2022.00123 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

3D GAN Inversion with Facial Symmetry Prior

Fei Yin Yong Zhang Xuan Wang Tengfei Wang Xiaoyu Li and 6 more

Recently, a surge of high-quality 3D-aware GANs has been proposed, which leverage the generative power of neural rendering. It is natural to associate 3D GANs with GAN inversion methods to project a real image into the generator's latent space, allowing free-view consistent synthesis and editing, referred to as 3D GAN inversion. Although the facial prior is preserved in pre-trained 3D GANs, reconstructing a 3D portrait with only one monocular image is still an ill-posed problem. The straightforward application of 2D GAN inversion methods focuses on texture similarity...

10.1109/cvpr52729.2023.00041 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

A Wearable Breath Sensor Based on Fiber-Tip Microcantilever

Cong Zhao Dan Liu Zhihao Cai Bin Du Mengqiang Zou and 9 more

Respiration rate is an essential vital sign that requires monitoring under various conditions, including in strong electromagnetic environments such as magnetic resonance imaging systems. To provide an electromagnetically-immune breath-sensing system, we propose an all-fiber-optic wearable breath sensor based on a fiber-tip microcantilever. The microcantilever was fabricated by two-photon polymerization microfabrication with a femtosecond laser, so that a micro Fabry-Pérot (FP) interferometer was formed between the...

10.3390/bios12030168 article EN cc-by Biosensors 2022-03-07

MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

Yatai Ji Junjie Wang Yuan Gong Lin Zhang Yanru Zhu and 4 more

Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort has studied the modeling of this uncertainty, particularly in pre-training on unlabeled datasets and fine-tuning on task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder...

10.1109/cvpr52729.2023.02228 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Interactive Story Visualization with Multiple Characters

Yuan Gong Youxin Pang Xiaodong Cun Menghan Xia Yingqing He and 6 more

Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper...

10.1145/3610548.3618184 article EN cc-by 2023-12-10

ToonTalker: Cross-Domain Face Reenactment

Yuan Gong Yong Zhang Xiaodong Cun Fei Yin Yanbo Fan and 3 more

We target cross-domain face reenactment in this paper, i.e., driving a cartoon image with the video of a real person and vice versa. Recently, many works have focused on one-shot talking face generation to drive a portrait with a real video, i.e., within-domain reenactment. Straightforwardly applying those methods to cross-domain animation will cause inaccurate expression transfer, blur effects, and even apparent artifacts due to the domain shift between cartoon and real faces. Only a few works attempt to settle cross-domain face reenactment. The most related work, AnimeCeleb [13], requires constructing...

10.1109/iccv51070.2023.00707 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Xin Hong Yuan Gong Vidhyasaharan Sethu Ting Dang

10.1109/icassp49660.2025.10888198 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction

Yuanchao Li Yuan Gong Chao-Han Huck Yang Peter Bell Catherine Lai

10.1109/icassp49660.2025.10888591 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Logic-gated biolaser for digital biochemical sensing

X. Jessie Yang Chenxiang Wang Zhou Li Tingting Wang Cairong Zhang and 3 more

10.1117/12.3061893 article EN 26th International Conference on Optical Fiber Sensors 2025-05-22

Organocatalytic oxidative dehydrogenation of aromatic amines for the preparation of azobenzenes under mild conditions

Hengchang Ma Wenfeng Li Jian Wang Guanghai Xiao Yuan Gong and 9 more

10.1016/j.tet.2012.07.012 article EN Tetrahedron 2012-07-20

A novel FBG-based security fence enabling to detect extremely weak intrusion signals from nonequivalent sensor nodes

Huijuan Wu Yunjiang Rao Cheng Tang Yu Wu Yuan Gong

10.1016/j.sna.2011.02.046 article EN Sensors and Actuators A Physical 2011-03-08

Rethinking Knowledge Distillation via Cross-Entropy

Zhendong Yang Zhe Li Yuan Gong Tianke Zhang Shanshan Lao and 2 more

Knowledge Distillation (KD) has developed extensively and boosted various tasks. The classical KD method adds the KD loss to the original cross-entropy (CE) loss. We try to decompose the KD loss to explore its relation with the CE loss. Surprisingly, we find it can be regarded as a combination of the CE loss and an extra loss which has the identical form as the CE loss. However, we notice the extra loss forces the student's relative probability to learn the teacher's absolute probability. Moreover, the sum of the two probabilities is different, making it hard to optimize. To address this issue, we revise the formulation...

10.48550/arxiv.2208.10139 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Rapid and intelligent discrimination of Notopterygium incisum and Notopterygium franchetii by infrared spectroscopic fingerprints and electronic olfactory fingerprints

Jianbo Chen Jing Fan Dan Wang Shiyan Yue Xiaolin Zhai and 2 more

10.1016/j.saa.2020.118176 article EN Spectrochimica Acta Part A Molecular and Biomolecular Spectroscopy 2020-02-18

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

Yingqing He Menghan Xia Haoxin Chen Xiaodong Cun Yuan Gong and 6 more

Generating videos for visual storytelling can be a tedious and complex process that typically requires either live-action filming or graphics animation rendering. To bypass these challenges, our key idea is to utilize the abundance of existing video clips and synthesize a coherent storytelling video by customizing their appearances. We achieve this by developing a framework comprised of two functional modules: (i) Motion Structure Retrieval, which provides video candidates with the desired scene or motion context described by query texts,...

10.48550/arxiv.2307.06940 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Interfacial Bond Properties between Normal Strength Concrete and Epoxy Resin Concrete

Nannan Sun Yifan Song Wei Hou Hanhao Zhang D. Wu and 2 more

It is necessary to pay attention to the bonding strength of the interface between precast normal strength concrete (NSC) and cast‐in‐place epoxy resin concrete (EMR) when using EMR as a repair or filling material and an overlay in bridges' rehabilitation. However, the performances of EMR are different due to differential mix ratios; thus, the interfacial bond properties of various cement concretes are not completely the same. This article investigated the interfacial bond properties between NSC and ERC by direct tensile, push‐out, and slant shear tests with specimens of special size, and the interface structure was observed...

10.1155/2021/5561097 article EN cc-by Advances in Materials Science and Engineering 2021-01-01

A Facile Synthesis of 3H-Benzo[1,2]Dithiole-3-Thiones and Their Condensation with Active Methylene Compounds

Hongwei Jin Dong Jiang Jianrong Gao Gen‐Rong Qiang Yuan Gong

3H-Benzo[1,2]dithiole-3-thiones were prepared from potassium sulfide and 2-halobenzaldehydes in moderate-to-good yields, and a plausible mechanism for this catalyst-free intramolecular heteroannulation reaction has been proposed. The Knoevenagel condensation reactions of 3H-benzo[1,2]dithiole-3-thiones with active methylene compounds such as ethyl 2-cyanoacetate and diethyl malonate, and the three-component one-pot reaction of potassium sulfide, 2-halobenzaldehydes, and ethyl 2-cyanoacetate, affording the corresponding products,...

10.1080/10426507.2011.600743 article EN Phosphorus, sulfur, and silicon and the related elements 2011-10-31

Sensitivity analysis of hybrid fiber Fabry-Pérot refractive-index sensor

Yuan Gong Guo Yu Rao Yun-Jiang Tian Zhao Wu Yu and 1 more

Theoretical expressions for analyzing the refractive-index sensitivity of a hybrid optical fiber Fabry-Pérot sensor are developed. The influence of experimental parameters on the measurement sensitivity is discussed. The hybrid fiber Fabry-Pérot sensor is fabricated by chemically etching a graded-index multimode fiber (GI-MMF), fusion splicing it to a single mode fiber, and cleaving the GI-MMF. The fringe contrast exceeds 30 dB, corresponding to a refractive index sensitivity of about 45 per refractive index unit. Experimental results are in good agreement with theoretical ones. It...

10.7498/aps.60.064202 article EN cc-by Acta Physica Sinica 2011-01-01

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

Sidi Yang Tianhe Wu Shuwei Shi Shanshan Lao Yuan Gong and 3 more

No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose the Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve performance on GAN-based distortion. We first extract features via ViT; then, to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale...

10.48550/arxiv.2204.08958 preprint EN cc-by arXiv (Cornell University) 2022-01-01

TaleCrafter: Interactive Story Visualization with Multiple Characters

Yuan Gong Youxin Pang Xiaodong Cun Menghan Xia Yingqing He and 6 more

Accurate story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images. Most previous works endeavor to meet these requirements by fitting a text-to-image (T2I) model on a set of videos in the same style and with the same characters, e.g., the FlintstonesSV dataset. However, the learned T2I models typically struggle to adapt to new characters, scenes, and styles, and often lack the flexibility to revise the layout of the synthesized images. This paper...

10.48550/arxiv.2305.18247 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Research on recommendation algorithm based on user sentiment analysis

Yuan Gong

With the development of Internet technology, the recommendation system is becoming an essential part of major e-commerce platforms, social media platforms and other application fields. The main purpose of the recommendation algorithm is to provide users with personalized and accurate recommendations of goods, services and information. Traditional recommendation algorithms are mainly based on information such as users' historical behavior to recommend similar items to users. However, only considering historical behavior cannot fully reflect the individual needs of users because emotions...

10.54254/2755-2721/45/20241030 article EN Applied and Computational Engineering 2024-03-15

Revise, Reason, and Recognize: LLM-Based Emotion Recognition via Emotion-Specific Prompts and ASR Error Correction

Yuanchao Li Yuan Gong Chao-Han Huck Yang Peter Bell Catherine Lai

Annotating and recognizing speech emotion using prompt engineering has recently emerged with the advancement of Large Language Models (LLMs), yet its efficacy and reliability remain questionable. In this paper, we conduct a systematic study on this topic, beginning with the proposal of novel prompts that incorporate emotion-specific knowledge from acoustics, linguistics, and psychology. Subsequently, we examine the effectiveness of LLM-based prompting on Automatic Speech Recognition (ASR) transcription, contrasting it...

10.48550/arxiv.2409.15551 preprint EN arXiv (Cornell University) 2024-09-23

AER-LLM: Ambiguity-aware Emotion Recognition Leveraging Large Language Models

Xin Hong Yuan Gong Vidhyasaharan Sethu Ting Dang

Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks. In addition to their cognitive intelligence, exploring their capabilities in emotional intelligence is also crucial, as it enables more natural and empathetic conversational AI. Recent studies have shown LLMs' capability in recognizing emotions, but they often focus on single emotion labels and overlook the complex and ambiguous nature of human emotions. This study is the first to address this gap by exploring the potential...

10.48550/arxiv.2409.18339 preprint EN arXiv (Cornell University) 2024-09-26

Enhanced Sagger Crack Detection Integrating Deep Learning and Machine Vision

Tao Song Ting Chen Yuan Gong Yulin Wang Ran Lu and 3 more

In recent years, target inspection has found extensive utilization within the industry, making it crucial to detect defects in industrial products to ensure quality. To address the challenges posed by large brightness differences, attached dirt, and complex backgrounds in saggers, we propose a sagger defect recognition method that integrates deep learning detection and machine vision feature extraction. This method commences by employing photometric stereo to construct a curvature map of the sagger surface, reducing interference...

10.3390/electronics13245010 article EN Electronics 2024-12-20