Bin Jiang

ORCID: 0000-0002-5840-9664
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image Processing Techniques
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Digital Media Forensic Detection
  • Topic Modeling
  • Video Analysis and Summarization
  • Advanced Vision and Imaging
  • Sentiment Analysis and Opinion Mining
  • Complex Network Analysis Techniques
  • Advanced Neural Network Applications
  • Recommender Systems and Techniques
  • Image Enhancement Techniques
  • Opinion Dynamics and Social Influence
  • Domain Adaptation and Few-Shot Learning
  • Natural Language Processing Techniques
  • Advanced Graph Neural Networks
  • Video Surveillance and Tracking Methods
  • Advanced Text Analysis Techniques
  • Human Pose and Action Recognition
  • Machine Fault Diagnosis Techniques
  • Text and Document Classification Technologies
  • Face Recognition and Analysis
  • Gear and Bearing Dynamics Analysis
  • Structural Health Monitoring Techniques

Hunan University
2016-2025

Chongqing University of Technology
2024

Yibin University
2022-2024

Yibin Vocational and Technical College
2022-2024

Chongqing University
2024

Shandong University
2017-2024

South China University of Technology
2023

University of South China
2023

Jilin Agricultural University
2023

Tangshan College
2021

The latest deep learning-based approaches have shown promising results for the challenging task of inpainting missing regions of an image. However, existing methods often generate contents with blurry textures and distorted structures due to the discontinuity of local pixels. From a semantic-level perspective, this pixel discontinuity arises mainly because these methods ignore the semantic relevance and feature continuity of hole regions. To handle this problem, we investigate human behavior in repairing pictures and propose a fined generative...

10.1109/iccv.2019.00427 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
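
The abstract above attributes blurry inpainting results to ignoring the semantic relevance and feature continuity of hole regions. As a rough illustration of attending from hole-region features to known-region features, here is a minimal sketch; the function name, cosine-similarity attention, and mask convention are assumptions, not the paper's attention layer.

```python
import torch
import torch.nn.functional as F

def hole_region_attention(feat, hole_mask):
    """Fill hole-region features with an attention-weighted sum of known-region
    features. feat: (B, C, H, W); hole_mask: (B, 1, H, W), 1 inside the hole.
    Illustrative sketch only; assumes the known region is non-empty."""
    B, C, H, W = feat.shape
    tokens = feat.flatten(2).transpose(1, 2)          # (B, H*W, C)
    mask = hole_mask.flatten(2).transpose(1, 2)       # (B, H*W, 1)

    # Cosine-similarity attention from every token to every known (non-hole) token.
    q = F.normalize(tokens, dim=-1)
    sim = torch.matmul(q, q.transpose(1, 2))          # (B, H*W, H*W)
    sim = sim.masked_fill(mask.transpose(1, 2).bool(), float("-inf"))  # ignore hole keys
    attn = sim.softmax(dim=-1)

    filled = torch.matmul(attn, tokens)               # borrow known-region features
    out = tokens * (1 - mask) + filled * mask         # only replace hole tokens
    return out.transpose(1, 2).reshape(B, C, H, W)

# Usage: feat = torch.randn(1, 64, 32, 32); m = torch.zeros(1, 1, 32, 32); m[..., 8:16, 8:16] = 1
# filled_feat = hole_region_attention(feat, m)
```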

Recently, deep learning-based models have exhibited remarkable performance for image manipulation detection. However, most of them suffer from poor universality due to handcrafted or predetermined features. Meanwhile, they only focus on manipulation localization and overlook manipulation classification. To address these issues, we propose a coarse-to-fine architecture named Constrained R-CNN for complete and accurate image forensics. First, the learnable manipulation feature extractor learns a unified feature representation directly from data. Second, the attention...

10.1109/icme46284.2020.9102825 article EN 2020 IEEE International Conference on Multimedia and Expo (ICME) 2020-06-09
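
The learnable feature extractor described above learns manipulation traces directly from data. One common way to bias a first layer toward noise-residual features is a constrained convolution whose center tap is fixed and whose remaining taps are re-normalized at each step; the sketch below shows that generic constraint and is not claimed to match the paper's exact layer.

```python
import torch
import torch.nn as nn

class ConstrainedConv2d(nn.Conv2d):
    """Convolution whose kernels are re-normalized before every forward pass so
    that the center tap is -1 and the remaining taps sum to +1 -- a constraint
    often used to learn noise-residual features for manipulation detection.
    Sketch only; the constraint in Constrained R-CNN may differ in detail."""

    def forward(self, x):
        with torch.no_grad():
            w = self.weight.data                      # (out_ch, in_ch, k, k)
            c = w.shape[-1] // 2
            w[:, :, c, c] = 0.0
            w /= w.sum(dim=(2, 3), keepdim=True) + 1e-8   # off-center taps sum to 1
            w[:, :, c, c] = -1.0
        return super().forward(x)

# Usage (hypothetical): residual = ConstrainedConv2d(3, 3, kernel_size=5, padding=2, bias=False)(img)
```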

Given an untrimmed video and a description query, temporal moment retrieval aims to localize the segment within the video that best describes the textual query. Existing studies predominantly employ coarse frame-level features as the visual representation, obfuscating the specific details which may provide critical cues for localizing the desired moment. We propose the SLTA (short for "Spatial and Language-Temporal Attention") method to address this detail-missing issue. Specifically, SLTA takes advantage of object-level local features and attends to the most...

10.1145/3323873.3325019 article EN 2019-06-05
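
SLTA attends to object-level local features under the guidance of the query. A minimal sketch of query-guided attention over per-frame object features follows; the tensor shapes and single-branch design are illustrative assumptions rather than the actual SLTA architecture.

```python
import torch
import torch.nn.functional as F

def query_guided_object_attention(obj_feats, query_emb):
    """Attend over per-frame object-level features with a sentence-query embedding.
    obj_feats: (num_objects, d); query_emb: (d,). Illustrative sketch only."""
    scores = obj_feats @ query_emb                          # relevance of each object to the query
    weights = F.softmax(scores, dim=0)
    return (weights.unsqueeze(1) * obj_feats).sum(dim=0)    # query-conditioned frame feature

# Usage: frame_feat = query_guided_object_attention(torch.randn(36, 512), torch.randn(512))
```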

Most learning-based low-light image enhancement methods typically suffer from two problems. First, they require a large amount of paired data for training, which are difficult to acquire in most cases. Second, in the process of enhancement, noise needs to be removed and may even be amplified. In other words, performing denoising and illumination enhancement at the same time is difficult. As an alternative to the supervised learning strategies that use paired data, as presented in previous work, this paper presents a mixed-attention guided...

10.26599/bdma.2021.9020020 article EN cc-by Big Data Mining and Analytics 2022-01-24
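
As a rough picture of what "mixed attention" can look like, the sketch below gates features first per channel and then per spatial location; the module layout is a generic assumption and not the paper's guided module.

```python
import torch
import torch.nn as nn

class MixedAttention(nn.Module):
    """Channel attention (global pooling + gating) followed by spatial attention
    (per-pixel gating). Generic sketch of a mixed-attention block."""

    def __init__(self, channels):
        super().__init__()
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = x * self.channel_gate(x)      # reweight channels
        return x * self.spatial_gate(x)   # reweight spatial locations

# Usage: out = MixedAttention(64)(torch.randn(1, 64, 128, 128))
```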

Compactness and light weight, a large exit pupil diameter and distance, and small distortion for the virtual-image and see-through paths are pivotal factors in achieving a better wearable experience with optical see-through head-mounted displays (OST-HMDs). In addition, the efficiency of the image path is an important factor for heat dissipation in HMD devices. This paper presents a new type of OST-HMD system that includes three wedge-shaped freeform prisms and two symmetric lenses. Based on a 0.71 in. microdisplay, a prototype with a diagonal...

10.1364/prj.440018 article EN Photonics Research 2021-11-02

Visual Question Answering (VQA) is a challenging multi-modal learning task, since it requires an understanding of both visual and textual modalities simultaneously. Therefore, the approaches used to represent the images and questions in a fine-grained manner play key roles in the performance. In order to obtain fine-grained image and question representations, we develop a co-attention mechanism using an end-to-end deep network architecture to jointly learn both features. Specifically, the attention implemented by a self-attention model will reduce...

10.1109/access.2019.2908035 article EN cc-by-nc-nd IEEE Access 2019-01-01
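
A minimal co-attention sketch in the spirit of the abstract: question words attend to image regions and image regions attend to question words, and both are pooled into joint representations. Shapes and mean pooling are assumptions; the paper's self-attention-based design differs in detail.

```python
import torch
import torch.nn.functional as F

def co_attention(img_feats, ques_feats):
    """Cross-attention in both directions followed by pooling.
    img_feats: (R, d) region features; ques_feats: (T, d) word features."""
    sim = ques_feats @ img_feats.T                      # (T, R) word-region affinity
    ques_ctx = F.softmax(sim, dim=1) @ img_feats        # each word summarizes the image
    img_ctx = F.softmax(sim.T, dim=1) @ ques_feats      # each region summarizes the question
    return ques_ctx.mean(dim=0), img_ctx.mean(dim=0)    # pooled joint representations

# Usage: q_vec, v_vec = co_attention(torch.randn(36, 512), torch.randn(14, 512))
```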

Cross-speed bearing fault diagnosis based on multiple source domains and their data enables high-performance condition monitoring for variable-speed equipment, such as engines and turbines. Current multi-source methods typically employ a fixed-length sampling strategy to construct samples and then align the distributions of these samples from different domains. However, they neglect the inherent periodic characteristics of the data, resulting in incomplete or redundant features in the samples. To address this challenge, we...

10.1109/tte.2024.3525077 article EN IEEE Transactions on Transportation Electrification 2025-01-01
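
The abstract argues that fixed-length sampling ignores the periodicity of vibration data. A small sketch of period-aware sampling, which cuts each signal into segments covering a whole number of shaft revolutions, is shown below; the helper name and parameters are hypothetical and not taken from the paper.

```python
import numpy as np

def period_aware_segments(signal, fs, shaft_rpm, revolutions_per_sample=4):
    """Cut a vibration signal into samples whose length covers a whole number of
    shaft revolutions instead of a fixed number of points, so each sample holds
    complete periodic patterns. signal: 1-D array; fs: sampling rate in Hz;
    shaft_rpm: rotation speed in revolutions per minute."""
    points_per_rev = int(round(fs * 60.0 / shaft_rpm))
    seg_len = points_per_rev * revolutions_per_sample
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)

# Usage: segs = period_aware_segments(np.random.randn(120_000), fs=12_000, shaft_rpm=1_800)
```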

Partially Relevant Video Retrieval (PRVR) aims to accurately retrieve the most relevant video in response to a query from a set of untrimmed videos. The analysis of video content can be done at three different granularities: frame-level, clip-level, and video-level. Previous methods have focused on only one or two of these levels for alignment, limiting the exploration of semantics. Moreover, some methods use video-level alignment and apply a self-attention mechanism to generate features, but this may not be ideal, as the entire video does not necessarily match the query. We propose M...

10.1145/3716388 article EN ACM Transactions on Multimedia Computing Communications and Applications 2025-02-07
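
A toy sketch of multi-granularity alignment: a video is scored against a text query by combining frame-level, clip-level, and video-level cosine similarities. The pooling choices and weights are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def multi_granularity_score(frame_feats, clip_feats, video_feat, query_feat,
                            weights=(0.4, 0.4, 0.2)):
    """Combine similarities at three granularities into one retrieval score.
    frame_feats: (F, d); clip_feats: (C, d); video_feat: (d,); query_feat: (d,)."""
    q = F.normalize(query_feat, dim=-1)
    frame_sim = (F.normalize(frame_feats, dim=-1) @ q).max()   # best-matching frame
    clip_sim = (F.normalize(clip_feats, dim=-1) @ q).max()     # best-matching clip
    video_sim = F.normalize(video_feat, dim=-1) @ q            # whole-video match
    return weights[0] * frame_sim + weights[1] * clip_sim + weights[2] * video_sim

# Usage: s = multi_granularity_score(torch.randn(200, 512), torch.randn(20, 512),
#                                    torch.randn(512), torch.randn(512))
```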

User-intended visual content fills the hole regions of an input image in the editing scenario. The coarse low-level inputs, which typically consist of sparse sketch lines and color dots, convey user intentions for content creation (i.e., free-form editing). While existing methods combine these low-level controls with CNN inputs, the corresponding feature representations are not sufficient to convey user intentions, leading to unfaithfully generated content. In this paper, we propose DeFLOCNet, which relies on a deep encoder-decoder to retain...

10.1109/cvpr46437.2021.01062 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Gray sufu is a traditional fermented bean product with a strong flavor in China, but conventional fermentation methods often lead to off-flavor. This study was performed to investigate the quality characteristics of gray sufu fermented using L. mesenteroides F24. Results showed 220 volatile compounds in the sufu, among which alcohols and esters were the main volatiles. Inoculation with F24 considerably affected the contents of flavor substances and substantially increased the key compounds. In addition, 29 kinds of key compounds were identified by analyzing ROAVs. Four unique...

10.1016/j.fochx.2023.100881 article EN cc-by-nc-nd Food Chemistry X 2023-09-16
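
ROAV (relative odor activity value) normalizes each compound's odor activity value (concentration divided by odor threshold) to the largest one; compounds with ROAV at or above 1 are usually treated as key aroma compounds. A small sketch of that computation follows; the numbers in the usage line are made up.

```python
def roav(concentrations, thresholds):
    """Relative odor activity values: OAV_i = concentration_i / odor_threshold_i,
    scaled so the compound with the largest OAV equals 100. Both dictionaries are
    keyed by compound name, with values in the same units (e.g., ug/kg)."""
    oav = {name: concentrations[name] / thresholds[name] for name in concentrations}
    max_oav = max(oav.values())
    return {name: 100.0 * v / max_oav for name, v in oav.items()}

# Usage (made-up numbers): roav({"ethyl hexanoate": 50.0, "ethanol": 900.0},
#                               {"ethyl hexanoate": 1.0, "ethanol": 100.0})
```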

10.1016/j.ress.2024.110608 article EN Reliability Engineering & System Safety 2024-10-30

Recommending hashtags for micro-videos is a challenging task due to the following two reasons: 1) a micro-video is a unity of multiple modalities, including visual, acoustic, and textual modalities; therefore, how to effectively extract features from these modalities and utilize them to express the video is of great significance; 2) micro-videos usually include moods and feelings, which may provide crucial cues for recommending proper hashtags. However, most existing works have not considered the sentiment of media data in hashtag recommendation. In this...

10.1109/access.2020.2989473 article EN cc-by IEEE Access 2020-01-01
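
A minimal sketch of the kind of multimodal fusion the abstract motivates: visual, acoustic, textual, and sentiment features are concatenated, projected, and matched against hashtag embeddings. All dimensions and the concatenation-based fusion are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultimodalHashtagScorer(nn.Module):
    """Fuse visual, acoustic, textual, and sentiment features of a micro-video and
    score candidate hashtag embeddings by dot product. Generic sketch only."""

    def __init__(self, dims=(2048, 128, 300, 16), hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(sum(dims), hidden), nn.ReLU(),
                                  nn.Linear(hidden, hidden))

    def forward(self, visual, acoustic, textual, sentiment, hashtag_embs):
        video = self.fuse(torch.cat([visual, acoustic, textual, sentiment], dim=-1))
        return hashtag_embs @ video        # (num_hashtags,) relevance scores

# Usage: scores = MultimodalHashtagScorer()(torch.randn(2048), torch.randn(128),
#                                           torch.randn(300), torch.randn(16),
#                                           torch.randn(1000, 256))
```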

Aiming at the problems of low computational efficiency and insufficient precision in traditional violent behavior recognition methods, we propose a SELayer-3D Convolutional Neural Network (C3D). Firstly, the C3D model is adopted to extract spatio-temporal feature information from the video block. Secondly, the obtained features are assigned weights according to their importance degree by the SELayer. Finally, the output is predicted by a Softmax classifier. In a test experiment on the CrowdViolence dataset, our method achieves an...

10.1109/icea.2019.8858306 article EN 2019-08-01
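
The SELayer reweights channels by their importance. Below is a standard squeeze-and-excitation block adapted to 5-D spatio-temporal features; how the paper wires it into C3D may differ.

```python
import torch
import torch.nn as nn

class SELayer3D(nn.Module):
    """Squeeze-and-excitation over the channels of a 3-D (spatio-temporal) feature
    map: global average pooling, a small bottleneck MLP, and sigmoid gating."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (B, C, T, H, W)
        b, c = x.shape[:2]
        weights = self.fc(x.mean(dim=(2, 3, 4)))   # squeeze to (B, C), then excite
        return x * weights.view(b, c, 1, 1, 1)

# Usage: out = SELayer3D(64)(torch.randn(2, 64, 16, 56, 56))
```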

Semantic segmentation for lightweight object parsing is a very challenging task, because both accuracy and efficiency (e.g., execution speed, memory footprint, or computational complexity) should be taken into account. However, most previous works pay too much attention to a one-sided perspective, either accuracy or efficiency, and ignore the other, which poses a great limitation to the actual demands of intelligent devices. To tackle this dilemma, we propose a novel architecture named Context-Integrated and Feature-Refined Network...

10.1109/tip.2020.2978583 article EN IEEE Transactions on Image Processing 2020-01-01
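
As a generic example of the accuracy-versus-efficiency trade-off the abstract describes, a depthwise separable convolution is the usual building block for cutting parameters and FLOPs in lightweight segmentation backbones; the sketch below is not a specific module of the proposed network.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution, a standard
    trick for reducing computation in lightweight backbones."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Usage: y = DepthwiseSeparableConv(32, 64, stride=2)(torch.randn(1, 32, 256, 256))
```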

The problem of position estimation has always been widely discussed in the field of wireless communication. In recent years, deep learning technology has been rapidly developing and attracting numerous applications. Its high-dimension modeling capability makes it possible to solve localization problems under many nonideal scenarios which are hard to handle with classical models. Consequently, localization based on deep learning has attracted extensive research during the last decade, and its applications are reviewed in this paper. Typical models are summarized...

10.1155/2020/5214920 article EN Mathematical Problems in Engineering 2020-10-19
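
One of the typical deep-learning localization models such a survey covers is fingerprint regression: a small network maps received-signal-strength vectors to coordinates. The sketch below is a minimal, hypothetical example; all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FingerprintLocalizer(nn.Module):
    """Map a vector of received signal strengths (RSS fingerprints) from several
    access points to 2-D coordinates with a small MLP."""

    def __init__(self, num_aps=20):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(num_aps, 128), nn.ReLU(),
                                 nn.Linear(128, 64), nn.ReLU(),
                                 nn.Linear(64, 2))     # (x, y) position

    def forward(self, rss):
        return self.net(rss)

# Usage: xy = FingerprintLocalizer()(torch.randn(32, 20))   # batch of 32 fingerprints
```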

This paper explores the task of video moment retrieval (VMR), which aims to localize the temporal boundary of a specific moment from an untrimmed video by a sentence query. Previous methods either extract pre-defined candidate moment features and select the one that best matches the query by ranking, or directly align clips of the target video with the query and predict matching scores. Despite their effectiveness, these methods mostly focus only on aligning single-level clip features with the query, and ignore the different granularities involved in the video itself, such as clip, moment, and video,...

10.1145/3503161.3547963 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Visual Question Answering (VQA) is a challenging multi-modal task that requires answering questions about an image. Many works concentrate on how to reduce the language bias which makes models ignore visual content and context. However, reducing language bias also weakens the ability of VQA models to learn the context prior. To address this issue, we propose a novel learning strategy named CCB, which forces VQA models to answer questions relying on Content and Context with language Bias. Specifically, CCB establishes Content and Context branches on top of a base model and forces them to focus on local key content and global effective context, respectively....

10.1109/icme51207.2021.9428098 article EN 2021 IEEE International Conference on Multimedia and Expo (ICME) 2021-06-09

The latest deep learning-based approaches have shown promising results for the challenging task of inpainting missing regions of an image. However, existing methods often generate contents with blurry textures and distorted structures due to the discontinuity of local pixels. From a semantic-level perspective, this pixel discontinuity arises mainly because these methods ignore the semantic relevance and feature continuity of hole regions. To handle this problem, we investigate human behavior in repairing pictures and propose a fined generative...

10.48550/arxiv.1905.12384 preprint EN other-oa arXiv (Cornell University) 2019-01-01