NFDI4DS | UHH-SEMS - Publication Details

Shiyu Huang

ORCID: 0009-0008-9541-0622

About

Contact & Profiles

OpenAlex ID: A5106404139

Research Areas

Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Image Enhancement Techniques
Human Pose and Action Recognition
Image Retrieval and Classification Techniques
Video Surveillance and Tracking Methods
Advanced Image Processing Techniques
Advanced Vision and Imaging
Video Analysis and Summarization

Dalian Polytechnic University
2024

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

OPENALEX - Publications

Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, and 4 more

In recent years, vision language models (VLMs) have made significant advancements in video understanding. However, a crucial capability, fine-grained motion comprehension, remains under-explored in current benchmarks. To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models. MotionBench evaluates models' motion-level perception through six primary categories of motion-oriented question types and includes data collected from...

DOI: 10.48550/arxiv.2501.02955 | preprint | EN | arXiv (Cornell University) | 2025-01-06

CogVLM2: Visual Language Models for Image and Video Understanding

Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, and 20 more

Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding, including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2 inherits the visual expert architecture with improved training recipes in both pre-training and post-training stages, supporting input resolution up to $1344 \times...

DOI: 10.48550/arxiv.2408.16500 | preprint | EN | arXiv (Cornell University) | 2024-08-29

DAF‐Retinex: Preserve the image detailed features and restore the reflected image

Shiyu Huang, Zijun Gao, Jue Wang, Bo Li

Currently, deep learning methods for low‐light image enhancement tasks mainly focus on the illumination of images, while neglecting the problems of noise and feature loss. To address this issue, this paper proposes a novel network called DAF‐Retinex, based on Retinex‐Net. For the issue of noise, different from traditional denoising methods, DAF‐Retinex utilizes a fully convolutional neural network to denoise the reflection component; additionally, a denoising loss function is introduced to suppress noise. For preserving details and extracting features,...

DOI: 10.1049/ipr2.13110 | article | EN | cc-by | IET Image Processing | 2024-05-17

VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation

Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, and 16 more

We present a general strategy for aligning visual generation models (both image and video generation) with human preference. To start with, we build VisionReward, a fine-grained and multi-dimensional reward model. We decompose human preferences in images and videos into multiple dimensions, each represented by a series of judgment questions, linearly weighted and summed to an interpretable and accurate score. To address the challenges of video quality assessment, we systematically analyze various dynamic features of videos, which helps VisionReward surpass...

DOI: 10.48550/arxiv.2412.21059 | preprint | EN | arXiv (Cornell University) | 2024-12-30
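
The scoring scheme described in the VisionReward abstract (judgment questions whose answers are linearly weighted and summed into one score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `linear_reward`, the question names, and the weights are all hypothetical, and answers are assumed to be binary yes/no.

```python
from typing import Mapping


def linear_reward(answers: Mapping[str, bool], weights: Mapping[str, float]) -> float:
    """Return the weighted sum of binary judgment answers (yes = 1, no = 0)."""
    return sum(weights[q] * (1.0 if ans else 0.0) for q, ans in answers.items())


# Hypothetical judgment questions and weights, chosen only for illustration.
weights = {"is_sharp": 0.5, "motion_is_smooth": 0.25, "has_artifacts": -0.25}
answers = {"is_sharp": True, "motion_is_smooth": True, "has_artifacts": False}

score = linear_reward(answers, weights)

# Per-question contributions make the final score easy to inspect.
contributions = {q: weights[q] * float(a) for q, a in answers.items()}
```

Because the score is a plain weighted sum, each question's contribution can be read off directly, which is one way such a score could be called interpretable.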