- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Image Enhancement Techniques
- Human Pose and Action Recognition
- Image Retrieval and Classification Techniques
- Video Surveillance and Tracking Methods
- Advanced Image Processing Techniques
- Advanced Vision and Imaging
- Video Analysis and Summarization
Dalian Polytechnic University
2024
In recent years, vision language models (VLMs) have made significant advancements in video understanding. However, a crucial capability - fine-grained motion comprehension remains under-explored current benchmarks. To address this gap, we propose MotionBench, comprehensive evaluation benchmark designed to assess the of understanding models. MotionBench evaluates models' motion-level perception through six primary categories motion-oriented question types and includes data collected from...
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, broader modalities applications. Here propose the CogVLM2 family, a new generation visual language models for image video understanding including CogVLM2, CogVLM2-Video GLM-4V. As an model, inherits expert architecture improved training recipes both pre-training post-training stages, supporting input resolution up to $1344 \times...
Abstract Currently, deep learning methods for low‐light image enhancement tasks mainly focus on the illumination of images, while neglecting problems noise and feature loss. To address this issue, paper proposes a novel network called DAF‐Retinex, based Retinex‐Net. issue noise, different from traditional denoise methods, utilizes fully convolutional neural to reflection component, additionally, denoising loss function is introduced suppress noise. For preserving details extracting features,...
We present a general strategy to aligning visual generation models -- both image and video with human preference. To start with, we build VisionReward fine-grained multi-dimensional reward model. decompose preferences in images videos into multiple dimensions, each represented by series of judgment questions, linearly weighted summed an interpretable accurate score. address the challenges quality assessment, systematically analyze various dynamic features videos, which helps surpass...