- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Remote-Sensing Image Classification
- Image Retrieval and Classification Techniques
- Image Enhancement Techniques
- Remote Sensing and Land Use
- Cancer-related molecular mechanisms research
- Educational Reforms and Innovations
- Advanced SAR Imaging Techniques
- Brain Tumor Detection and Classification
- Cancer survivorship and care
- NMR spectroscopy and applications
- Visual Attention and Saliency Detection
- Image and Signal Denoising Methods
- Family Support in Illness
- Medical Research and Treatments
- Advanced Image Fusion Techniques
- Cancer-related cognitive impairment studies
- Hydraulic Fracturing and Reservoir Analysis
- Ocean Waves and Remote Sensing
- Video Analysis and Summarization
- Face recognition and analysis
- Synthetic Aperture Radar (SAR) Applications and Techniques
China University of Petroleum, East China
2021-2024
Hunan University
2024
Jinling Institute of Technology
2024
China University of Petroleum, Beijing
2024
Beijing Academy of Artificial Intelligence
2023
Huazhong University of Science and Technology
2020-2023
Southern Medical University
2023
Nanfang Hospital
2023
We launch EVA, a vision-centric foundation model to Explore the limits of Visual representation at scAle using only publicly accessible data. EVA is a vanilla ViT pre-trained to reconstruct the masked-out image-text aligned vision features conditioned on visible image patches. Via this pretext task, we can efficiently scale up EVA to one billion parameters and set new records on a broad range of representative downstream tasks, such as image recognition, video action recognition, object detection, instance segmentation, and semantic...
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models that significantly improve the efficiency and effectiveness of CLIP training. Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but smaller training costs. Notably, our largest...
Recently, the vision transformer has achieved tremendous success on image-level visual recognition tasks. To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS). Different from previous transformer-based VIS methods, TeViT is nearly convolution-free and contains a transformer backbone and a query-based video instance segmentation head. In the backbone stage, we propose a nearly parameter-free messenger shift mechanism for early temporal context fusion. In the head...
Semi-supervised classification of remote sensing hyperspectral images (HSI) aims at exploiting both labeled and unlabeled samples for accurate land cover recognition. However, imbalanced data distribution and differing classification difficulties negatively affect performance. To address this, a novel dual-stream class-adaptive network (DSCA-Net) is proposed for semi-supervised HSI classification in this paper. First, a superpixel-guided label propagation module is introduced to alleviate the negative effect...
Few-shot learning is a challenging task that aims at training a classifier for unseen classes with only a few examples. The main difficulty of few-shot learning lies in the lack of intra-class diversity within insufficient training samples. To alleviate this problem, we propose a novel generative framework, Diversity Transfer Network (DTN), that learns to transfer latent diversities from known categories and composite them with support features to generate diverse samples in feature space. The problem of sample generation (i.e., diversity transfer)...
The assimilation and prediction of phase-resolved surface gravity waves are critical challenges in ocean science and engineering. Potential flow theory (PFT) has been widely employed to develop wave models and numerical techniques for wave prediction. However, traditional methods are often limited. For example, most simplified wave models have a limited ability to capture strong nonlinearity, while fully nonlinear PFT solvers fail to meet the speed requirements of engineering applications. This computational inefficiency also...
We investigate the problem of efficiently localizing sketch-depicted scenes in a remote sensing image dataset. We pose the problem as that of image retrieval with sketch queries and explore the use of hashing techniques to achieve efficient retrieval. Given two training datasets of sketches and images that have a common set of class labels, we develop a strategy that co-constructs hash code books for sketches and images separately. The hash code book co-construction encourages hash codes from different classes to be far away from one another and those from the same class to be close. This property is maintained by...
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling. With an updated plain Transformer architecture as well as extensive pre-training from an open & accessible giant CLIP vision encoder, EVA-02 demonstrates superior performance compared to prior state-of-the-art approaches across various representative vision tasks, while utilizing significantly fewer parameters and compute budgets. Notably,...
Wireless capsule endoscopy (WCE) is a recently developed tool that allows for the painless and non-invasive examination of the entire gastrointestinal (GI) tract. The microcamera captures a large number of redundant frames in each WCE video, such that a video summarization technique is needed to assist in diagnosis. However, prevalent methods for summarizing WCE videos focus only on the representativeness of frames, owing to the lack of high-level information on their importance. This paper develops a Frame Importance-Assisted Sparse Subset Selection model,...
The goal of multi-object tracking (MOT) is to detect and track all objects in a scene across frames, while maintaining a unique identity for each object. Most existing methods rely on the spatial motion features and appearance embedding features of the detected objects in consecutive frames. Effectively and robustly representing long trajectories has become a critical factor affecting the performance of MOT. We propose a novel approach for feature representation, improving upon the clustering association method MOT_FCG. For spatial motion features, we propose Diagonal...
Synthetic Aperture Radar (SAR) is widely used for observing sea surfaces and retrieving two-dimensional wave spectra. However, existing methods for retrieving directional wave spectra from SAR imagettes face challenges due to the complex non-linear SAR-wave imaging relationship and the limitation of first-guess spectra. This study proposes a novel two-stage machine learning strategy for Chinese Gaofen-3 (GF-3) wave mode products. We achieve the generation of complete directional wave spectra and several wave parameters solely from GF-3 data without necessitating any additional inputs. In...
The permeability of geological formations reflects the porous structure and flow information of underground rock reservoirs, playing a crucial role in reservoir evaluation and development decision-making for oil fields. This study aims to enhance the predictive capability for permeability through the application of multimodal representation learning, leveraging various data types. A multimodal dataset is constructed, incorporating well logging data such as sonic, resistivity, radioactivity, nuclear magnetic resonance, age...
Atmospheric visibility is a crucial meteorological element impacting urban air pollution monitoring, public transportation, and military security. Traditional visibility detection methods, primarily manual and instrumental, have been costly and imprecise. With advancements in data science and computing, deep learning-based visibility detection technologies have rapidly emerged as a research hotspot in atmospheric science. This paper systematically reviews the applications of various deep learning models: Convolutional Neural Networks (CNNs),...
We explore the problem of efficiently and mutually localizing panchromatic and multispectral images. We pose the problem as that of cross-modal remote sensing image retrieval between panchromatic and multispectral images, and explore the employment of a hash code co-construction strategy to achieve efficient retrieval. We design two special discriminative feature extractors for panchromatic and multispectral images according to their characteristics, and co-construct hash code books for them. The feature extractors generate hash codes for the two kinds of images separately. Sorting by Hamming distance achieves efficient retrieval. Extensive experiments on a public data set validate the effectiveness...
Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model named CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to pixel-wisely localize the same instance in other frames. Different from previous schemes, crossover learning does not require any additional network parameters for feature enhancement. By integrating with the instance segmentation loss, crossover learning enables efficient cross-frame...