- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Wireless Signal Modulation Classification
- Human Pose and Action Recognition
- Text and Document Classification Technologies
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Remote-Sensing Image Classification
- Topic Modeling
- Imbalanced Data Classification Techniques
- Advanced Vision and Imaging
- Radar Systems and Signal Processing
- Generative Adversarial Networks and Image Synthesis
- Blind Source Separation Techniques
- Image Retrieval and Classification Techniques
- Face and Expression Recognition
- Machine Learning and ELM
- Geophysical Methods and Applications
- Neural Networks and Applications
- Integrated Circuits and Semiconductor Failure Analysis
- Natural Language Processing Techniques
- Advanced Photonic Communication Systems
- Security and Verification in Computing
- Information Retrieval and Search Behavior
Xidian University
2020-2024
University of Science and Technology Beijing
2012-2023
Chinese Academy of Sciences
2011-2022
Suzhou Institute of Biomedical Engineering and Technology
2022
China Southern Power Grid (China)
2022
Tencent (China)
2020-2021
China Electric Power Research Institute
2021
North China Electric Power University
2021
Alibaba Group (China)
2020
Beijing University of Posts and Telecommunications
2019
Language representation models such as BERT can effectively capture contextual semantic information from plain text, and have been shown to achieve promising results in many downstream NLP tasks with appropriate fine-tuning. However, most existing language representation models cannot explicitly handle coreference, which is essential to the coherent understanding of the whole discourse. To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context. The experimental results show that,...
In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching is required to pay much more attention to the fine-grained information in fashion images and texts. Pioneer approaches detect regions of interest (i.e., RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent the "object-level" information in fashion images, while fashion texts are prone to describe more detailed information, e.g. styles, attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT,...
To address the poor accuracy of specific emitter identification (SEI) under low signal-to-noise ratio (SNR) conditions, where single-dimension radar signal characteristics are severely affected by noise, we propose an attention-enhanced dual-branch residual network structure based on adaptive large-margin Softmax (ALS). Initially, we designed a dual-branch network structure to extract features from one-dimensional intermediate frequency data and two-dimensional time–frequency images,...
Visual dialog, which aims to hold a meaningful conversation with humans about a given image, is a challenging task that requires models to reason about the complex dependencies among visual content, dialog history, and current questions. Graph neural networks have recently been applied to model the implicit relations between objects in an image or dialog. However, they neglect the importance of 1) coreference relations among dialog history and dependency relations between words for the question representation; 2) the representation of the image based on the fully represented...
Multi-target Multi-camera Tracking (MTMCT) aims to extract the trajectories from videos captured by a set of cameras. Recently, the tracking performance of MTMCT has been significantly enhanced by the employment of re-identification (Re-ID) models. However, the appearance feature usually becomes unreliable due to occlusion and orientation variance of the targets. Directly applying a Re-ID model in MTMCT will encounter the problems of identity switches (IDS) and tracklet fragments caused by occlusion. To solve these problems, we propose a novel...
In recent years, image processing methods based on convolutional neural networks (CNNs) have achieved very good results. At the same time, many branch techniques have been proposed to improve accuracy. Aiming at the change detection task of remote sensing images, we propose a new network based on U-Net in this paper. The attention mechanism is applied to the change detection task, and the data-dependent upsampling (DUpsampling) method is used, so that the network shows an improvement in accuracy and the amount of calculation is greatly reduced. The experimental...
Multi-label image classification is more in line with real-world applications. This problem is difficult because the complex label space makes it hard to get label-level attention regions and to deal with the semantic relationships among labels. Common deep network-based methods utilize a CNN to extract features and consider labels as a sequence or graph, thus handling label correlations with an RNN or graph-theoretical algorithms. In this paper, we propose a novel CNN-RNN-based model, bi-modal multi-label learning (BMML)...
Visual dialogue is a challenging task since it needs to answer a series of coherent questions on the basis of understanding the visual environment. Previous studies focus on the implicit exploration of multimodal coreference by implicitly attending to spatial image features or object-level image features, but neglect the importance of locating the objects explicitly in the visual content, which is associated with entities in the textual content. Therefore, in this paper we propose a Multimodal Incremental Transformer with Visual Grounding, named MITVG, which consists of two key parts:...
The results of aerial scene classification can provide valuable information for urban planning and land monitoring. In this specific field, there are always a number of object-level semantic classes in big remote-sensing pictures. The complex label space makes it hard to detect all the targets and perceive the corresponding semantics in a typical scene, thereby weakening the sensing ability. Even worse, the preparation of a labeled dataset for training deep networks is more difficult due to multiple labels. In order to mine visual...
In the modern electromagnetic environment, intra-pulse modulations of radar emitter signals have become more complex. In addition to single-component radar signals, dual-component radar signals have been widely used in current radar systems. In order to give the radar system the ability to accurately classify single-component and dual-component intra-pulse modulation at the same time, in this paper we propose a multi-label learning method based on a convolutional neural network and transformer. Firstly, the original single channel sampled sequences are padded with zeros to the same length. Then the padded sequences are converted...
A novel feature extraction algorithm for multichannel FPGA-based neural recording systems is presented in this paper. It contains the Dual Vertex Threshold (DVT) and the Minimum Delimitation (MD), which are used for spike detection and feature vector extraction, respectively. By reducing the computational complexity of DVT and MD, the difficulty of FPGA application is greatly reduced. Based on this characteristic, an FPGA hardware architecture is implemented. Using the extracted feature vectors, the sorting performance of K-means is as good as that with PCA-based features....
A deep understanding of our visual world is more than an isolated perception of a series of objects; the relationships between them also contain rich semantic information. Especially for satellite remote sensing images, the span is so large that the various objects are always of different sizes and in complex spatial compositions. Therefore, the recognition of semantic relations is conducive to strengthening the understanding of remote sensing scenes. In this paper, we propose a novel multi-scale fusion network (MSFN). In this framework, dilated convolution is introduced...
Fine-grained image categorization has remained a challenging computer vision problem in recent years. Most existing methods rely heavily on massive labeled data, which are scarce in many real-world applications. It should also be noted that progressive learning demands are very common today. That is, we may pay attention to more fine-grained information (like arctic tern, black tern, buttercup or tulip) in an image set with labels like "bird" and "flower". It is reasonable to believe that the model with transferable knowledge would...
User-generated social annotations provide extra information for describing document contents. In this paper, we propose an effective method to model the categorization property of social annotations and explore the potential of combining it with classical language models for improving retrieval performance. Specifically, a novel TR-LDA model is presented to take annotations as an additional source for generating document contents apart from the document itself. We explore strategies for representing and weighting annotations and develop an efficient inference algorithm, where space saving is taken into...
Profiting from the great progress of information technology, a huge number of multi-label samples are available in our daily life. As a result, multi-label classification has aroused widespread concern. Different from traditional machine learning methods, which are time-consuming during the training phase, ELM-RBF (extreme learning machine-radial basis function) is more efficient and has become a research hotspot in multi-label classification. However, because of the lack of effective optimization methods, conventional extreme learning machines are always unstable and tend to...
In recent years, deep learning methods have been widely applied in remote sensing image classification tasks, providing valuable information for natural monitoring and spatial planning. In an actual application like this, acquiring massive labeled data for deep convolutional networks is costly and difficult, especially when data sources are diverse and requirements are changing. Transfer learning has already shown superior performance in exploiting domain-invariant features in existing network-based categorization tasks....
This paper proposes a method based on cluster boundary sampling and RF-Bagging to solve the problem of imbalanced data classification. It uses cluster boundaries for down-sampling when preprocessing the training data, and then uses SVM and RF, two different base classifiers, as learning algorithms, which are integrated by bagging before and after sampling, respectively, to obtain contrast experiment results. Finally, we use ROC curves and AUC values for evaluation. The experimental results show that the method can improve the classification effect of the classifier and deal with imbalanced data classification problems effectively.
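
The AUC evaluation mentioned in the last abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `roc_auc`, the 0/1 label encoding, and the sample scores are assumptions made here for illustration. It uses the standard pairwise (Mann-Whitney) reading of AUC, the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as half.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive (here, minority) class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: 2 positives, 4 negatives.
toy_labels = [1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Because AUC depends only on how positives are ranked relative to negatives, it is not inflated by a large majority class the way plain accuracy is, which is one common reason to use it for imbalanced data. The pairwise loop is quadratic in the sample count; a rank-based formulation would be preferable for large test sets.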