- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Topic Modeling
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Visual Attention and Saliency Detection
- Video Analysis and Summarization
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- COVID-19 diagnosis using AI
- Adversarial Robustness in Machine Learning
- Advanced Graph Neural Networks
- Machine Learning in Materials Science
- Advanced Vision and Imaging
- Robotics and Sensor-Based Localization
- Cloud Computing and Resource Management
- Advanced Computational Techniques and Applications
- Handwritten Text Recognition Techniques
- Optical measurement and interference techniques
- Advanced Image Processing Techniques
- Machine Learning and Algorithms
- CCD and CMOS Imaging Sensors
- Medical Image Segmentation Techniques
Northwestern Polytechnical University
2019-2024
Lenovo (China)
2024
Qufu Normal University
2023
Shandong Academy of Sciences
2023
Qilu University of Technology
2023
Shandong Institute of Automation
2022-2023
Chinese Academy of Sciences
2022
Space Engineering University
2021
Deepblue Technology (China)
2021
Baidu (China)
2019
In this work, we tackle the problem of instance segmentation, the task of simultaneously solving object detection and semantic segmentation. Towards this goal, we present a model, called MaskLab, which produces three outputs: box detection, semantic segmentation, and direction prediction. Building on top of the Faster-RCNN object detector, the predicted boxes provide accurate localization of object instances. Within each region of interest, MaskLab performs foreground/background segmentation by combining semantic and direction prediction. Semantic segmentation assists the model in distinguishing between objects...
Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty of algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and only word-level annotations. It...
The success of deep neural networks relies on significant architecture engineering. Recently, neural architecture search (NAS) has emerged as a promising way to greatly reduce manual effort in network design by automatically searching for optimal architectures, although such algorithms typically need an excessive amount of computational resources, e.g., a few thousand GPU-days. To date, on challenging vision tasks such as object detection, NAS, especially fast versions of NAS, is less studied. Here we propose to search for the decoder structure of object detectors...
High-resolution representations (HR) are essential for dense prediction tasks such as segmentation, detection, and pose estimation. Learning HR representations is typically ignored in previous Neural Architecture Search (NAS) methods that focus on image classification. This work proposes a novel NAS method, called HR-NAS, which is able to find efficient and accurate networks for different tasks by effectively encoding multiscale contextual information while maintaining high-resolution representations. In HR-NAS, we renovate...
Mengze Li, Tianbao Wang, Haoyu Zhang, Shengyu Zhang, Zhou Zhao, Jiaxu Miao, Wenqiao Zhang, Wenming Tan, Jin Wang, Peng Wang, Shiliang Pu, Fei Wu. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Camouflaged object detection (COD), which aims to identify objects that conceal themselves in their surroundings, has recently drawn increasing research effort in the field of computer vision. In practice, the success of deep learning based COD is mainly determined by two key factors: (i) a significantly large receptive field, which provides rich context information, and (ii) an effective fusion strategy, which aggregates multi-level features for accurate COD. Motivated by these observations, in this paper,...
Due to their attractive potential for avoiding the elaborate definition of anchor attributes, anchor-free deep learning approaches are promising for object detection in remote sensing imagery. CornerNet is one of the most representative of these anchor-free approaches. However, visual inspection shows distinctly that CornerNet is limited in grouping keypoints, which significantly impacts detection performance. To address this problem, a novel and effective approach, called GroupNet, is presented in this paper, which adaptively groups...
In this paper, we propose an improved YOLOv5 pedestrian detection algorithm to solve the problems of missed targets and low detection accuracy on the ROS platform. By adding a small-target detection layer of 160*160, the method improves the detection performance of the model and effectively reduces the false detection rate for occluded pedestrians, especially heavily occluded targets. In order to further improve the accuracy, it fuses the underlying features of the backbone network to achieve path aggregation with multi-feature fusion. Furthermore, Soft-DIoU-NMS is used for post-detection processing...
In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images with 260,559 annotated bounding boxes for 2,644 identities appearing in both the UAVs and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement to the ground surveillance cameras. To more realistically simulate actual scenarios, the surveillance cameras are fixed about 2 meters above the ground, while the UAVs capture videos of persons at different locations,...
In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coarse polygon annotations for thing instances, it gradually incorporated coarse superpixel annotations for stuff regions, which were subsequently heuristically amalgamated to yield...
Neural Radiance Fields (NeRF) have shown impressive results in 3D reconstruction and in generating novel views. A key challenge within NeRF is the editing of reconstructed scenes, such as object removal, which requires maintaining consistency across multiple views and ensuring high-quality synthesised perspectives. Previous studies have incorporated depth priors, typically from LiDAR or sparse depth measurements provided by COLMAP, to improve the performance of object removal in NeRF. However, these methods are either costly...
Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively evaluates the capabilities of language understanding, image comprehension, and language-to-image grounding. Consequently, it serves as an ideal testing ground for Multi-modal Large Language Models (MLLMs). In pursuit of this goal, we have established a new REC dataset characterized by two key features: Firstly, it is designed with controllable, varying levels of difficulty, necessitating multi-level fine-grained...