- Advanced Image and Video Retrieval Techniques
- Remote-Sensing Image Classification
- Evaluation Methods in Various Fields
- Domain Adaptation and Few-Shot Learning
- Automated Road and Building Extraction
- Advanced Image Fusion Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Remote Sensing and Land Use
- Topic Modeling
- Technology and Security Systems
- E-commerce and Technology Innovations
- Natural Language Processing Techniques
Fudan University
2024-2025
Donghua University
2019
Although remote sensing (RS) data with multiple modalities can be used to significantly improve the accuracy of semantic segmentation in RS data, how to effectively extract multimodal information through feature fusion remains a challenging task. Specifically, existing methods still face two major challenges: 1) Due to diverse imaging mechanisms, the boundaries of the same foreground may vary across different modalities, leading to the inclusion of unwanted background semantics in the fused features; 2) from exhibit...
Object detection in remote sensing images (RSIs) remains a challenging task due to complex variations in object scale, dense arrangements, and arbitrary orientations. Compared with the widely used multi-stage and one-stage approaches, query-based methods, which avoid post-processing procedures and implement end-to-end inference, have recently attracted much attention. However, existing methods still face two main challenges: 1) The feature sampling regions predicted by query vectors often fail to be aligned with...
Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific document-oriented tasks is therefore meaningful. Despite promising advancements, large models still perform poorly on multi-page document extraction and understanding tasks, and their capacity to process within-document formats such as charts and equations remains under-explored. To...
Spatial transformer networks have been used in a layered form in conjunction with convolutional networks to enable the model to transform data spatially. In this paper, we propose a combined spatial transformer network (STN) and Long Short-Term Memory (LSTM) network to classify digits in sequences formed by MNIST elements. This LSTM-STN model has a top-down attention mechanism that profits from the LSTM layer, so that the STN layer can perform short-term independent elements for the statement in the process of spatial transformation, thus avoiding the distortion that may be caused when the entire...