- Generative Adversarial Networks and Image Synthesis
- Advanced Image Processing Techniques
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Digital Media Forensic Detection
- Topic Modeling
- Video Analysis and Summarization
- Advanced Vision and Imaging
- Sentiment Analysis and Opinion Mining
- Complex Network Analysis Techniques
- Advanced Neural Network Applications
- Recommender Systems and Techniques
- Image Enhancement Techniques
- Opinion Dynamics and Social Influence
- Domain Adaptation and Few-Shot Learning
- Natural Language Processing Techniques
- Advanced Graph Neural Networks
- Video Surveillance and Tracking Methods
- Advanced Text Analysis Techniques
- Human Pose and Action Recognition
- Machine Fault Diagnosis Techniques
- Text and Document Classification Technologies
- Face Recognition and Analysis
- Gear and Bearing Dynamics Analysis
- Structural Health Monitoring Techniques
Hunan University (2016-2025)
Chongqing University of Technology (2024)
Yibin University (2022-2024)
Yibin Vocational and Technical College (2022-2024)
Chongqing University (2024)
Shandong University (2017-2024)
South China University of Technology (2023)
University of South China (2023)
Jilin Agricultural University (2023)
Tangshan College (2021)
The latest deep learning-based approaches have shown promising results for the challenging task of inpainting missing regions of an image. However, existing methods often generate contents with blurry textures and distorted structures due to the discontinuity of local pixels. From a semantic-level perspective, the local pixel discontinuity arises mainly because these methods ignore the semantic relevance and feature continuity of hole regions. To handle this problem, we investigate human behavior in repairing pictures and propose a fined generative...
Recently, deep learning-based models have exhibited remarkable performance for image manipulation detection. However, most of them suffer from the poor universality of handcrafted or predetermined features. Meanwhile, they only focus on localization and overlook classification. To address these issues, we propose a coarse-to-fine architecture named Constrained R-CNN for complete and accurate forensics. First, the learnable feature extractor learns a unified representation directly from data. Second, the attention...
Given an untrimmed video and a description query, temporal moment retrieval aims to localize the segment within the video that best describes the textual query. Existing studies predominantly employ coarse frame-level features as the visual representation, obfuscating the specific details which may provide critical cues for localizing the desired moment. We propose the SLTA (short for "Spatial Language-Temporal Attention") method to address the detail missing issue. Specifically, the method takes advantage of object-level local features and attends to the most...
Most learning-based low-light image enhancement methods typically suffer from two problems. First, they require a large amount of paired data for training, which is difficult to acquire in most cases. Second, in the process of enhancement, noise is difficult to remove and may even be amplified. In other words, performing denoising and illumination enhancement at the same time is difficult. As an alternative to supervised learning strategies that use paired data, as presented in previous work, this paper presents a mixed-attention guided...
Compactness and light weight, a large exit pupil diameter and distance, small distortion for the virtual image, and see-through paths are pivotal factors in achieving a better wearable experience with optical see-through head-mounted displays (OST-HMDs). In addition, the efficiency of the image path is an important factor for heat dissipation in HMD devices. This paper presents a new type of OST-HMD system that includes three wedge-shaped freeform prisms and two symmetric lenses. Based on a 0.71 in. microdisplay, a prototype with a diagonal...
Visual Question Answering (VQA) is a challenging multi-modal learning task since it requires an understanding of both visual and textual modalities simultaneously. Therefore, the approaches used to represent the images and questions in a fine-grained manner play key roles in the performance. In order to obtain fine-grained image and question representations, we develop a co-attention mechanism using an end-to-end deep network architecture to jointly learn the image and question features. Specifically, the attention implemented by a self-attention model will reduce...
Cross-speed bearing fault diagnosis based on multiple source domains and their data enables high-performance condition monitoring for variable-speed equipment, such as engines and turbines. Current multi-source methods typically employ a fixed-length sampling strategy to construct samples and then align the distributions of these samples from different domains. However, these methods neglect the inherent periodic characteristics of the data, resulting in incomplete or redundant features in the samples. To address this challenge, we...
Partially Relevant Video Retrieval (PRVR) aims to accurately retrieve the most relevant video in response to a query from untrimmed videos. The analysis of video content can be done at three different granularities: frame-level, clip-level, and video-level. Previous methods have focused on one or two of these levels for alignment, limiting the exploration of video semantics. Moreover, some methods use video-level alignment and apply a self-attention mechanism to generate video-level features, but this may not be ideal, as the entire video may not be relevant to the query. We propose M...
User-intended visual content fills the hole regions of an input image in the image editing scenario. The coarse low-level inputs, which typically consist of sparse sketch lines and color dots, convey user intentions for content creation (i.e., free-form image editing). While existing methods combine these low-level controls as CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content. In this paper, we propose DeFLOCNet, which relies on a deep encoder-decoder to retain...
Gray sufu is a traditional fermented bean product with strong flavor in China, but traditional fermentation methods often lead to its off-flavor. This study was performed to investigate the quality characteristics of gray sufu fermented using L. mesenteroides F24. Results showed 220 volatile compounds in the gray sufu, among which alcohols and esters were the main volatiles. Inoculation of F24 considerably affected the contents of volatile substances and substantially increased the compounds. In addition, 29 kinds of key compounds were identified by analyzing ROAVs. Four unique...
Recommending hashtags for micro-videos is a challenging task for the following two reasons: 1) a micro-video is a unity of multi-modalities, including visual, acoustic, and textual modalities. Therefore, how to effectively extract features from the multi-modalities and utilize them to express the micro-video is of great significance; 2) micro-videos usually include moods and feelings, which may provide crucial cues for recommending proper hashtags. However, most existing works have not considered the sentiment of media data for hashtag recommendation. In this...
Aiming at the problems of low computational efficiency and insufficient precision in traditional violent behavior recognition methods, we propose a SELayer-3D Convolutional Neural Network (C3D). Firstly, the C3D model is adopted to extract spatio-temporal feature information in the video block. Secondly, the obtained features are assigned weights according to their importance degree by the SELayer. Finally, the output is predicted by a Softmax classifier. In the test experiment on the CrowdViolence dataset, our method achieves an...
Semantic segmentation for lightweight object parsing is a very challenging task, because both accuracy and efficiency (e.g., execution speed, memory footprint or computational complexity) should be taken into account. However, most previous works pay too much attention to a one-sided perspective, focusing on either accuracy or efficiency while ignoring the other, which poses a great limitation to the actual demands of intelligent devices. To tackle this dilemma, we propose a novel architecture named Context-Integrated Feature-Refined Network...
The problem of position estimation has always been widely discussed in the field of wireless communication. In recent years, deep learning technology has been developing rapidly and attracting numerous applications. The high-dimension modeling capability of deep learning makes it possible to solve localization problems under many nonideal scenarios which are hard to handle with classical models. Consequently, localization based on deep learning has attracted extensive research during the last decade. The applications of deep learning to localization are reviewed in this paper. Typical models are summarized...
This paper explores the task of video moment retrieval (VMR), which aims to localize the temporal boundary of a specific moment from an untrimmed video by a sentence query. Previous methods either extract pre-defined candidate moment features and select the moment that best matches the query by ranking, or directly align the clips of a target moment with the query and predict matching scores. Despite their effectiveness, these methods mostly focus only on aligning the query with single-level clip or moment features, and ignore the different granularities involved in the video itself, such as clip, moment, video,...
Visual Question Answering (VQA) is a challenging multi-modal task to answer questions about an image. Many works concentrate on how to reduce language bias, which makes models ignore visual content and context. However, reducing language bias also weakens the ability of VQA models to learn the context prior. To address this issue, we propose a novel learning strategy named CCB, which forces VQA models to rely on Content and Context with Bias. Specifically, CCB establishes Content and Context branches on top of a base model and forces them to focus on local key content and global effective context, respectively...