- Advanced Neural Network Applications
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Robotics and Sensor-Based Localization
- High-pressure geophysics and materials
- Advanced Vision and Imaging
- 3D Surveying and Cultural Heritage
- earthquake and tectonic studies
- Geological and Geochemical Analysis
- Video Surveillance and Tracking Methods
- Image and Signal Denoising Methods
- Generative Adversarial Networks and Image Synthesis
- Infrared Target Detection Methodologies
- Geological and Geophysical Studies
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- Image Enhancement Techniques
- Remote Sensing and LiDAR Applications
- Industrial Vision Systems and Defect Detection
- Robotic Path Planning Algorithms
- Image and Video Quality Assessment
- CCD and CMOS Imaging Sensors
- Climate change and permafrost
- Geological Studies and Exploration
Harbin Medical University
2024-2025
First Affiliated Hospital of Harbin Medical University
2024-2025
Lanzhou University
2025
Ministry of Agriculture and Rural Affairs
2025
Tianjin University of Technology
2024
University of Edinburgh
2024
Tongji University
2022-2024
Wenzhou University
2024
Hong Kong Polytechnic University
2024
Guangdong University Of Finances and Economics
2024
Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes data augmentations and optimization methods. In literature, however, most refinements are either briefly mentioned implementation details or only visible source code. this paper, we will examine a collection empirically evaluate their impact on final model accuracy through ablation study. We show that, by combining these together, able improve various CNN...
Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information semantic segmentation introducing Context Encoding Module, which captures context scenes selectively highlights class-dependent featuremaps. The proposed Module significantly improves...
The ability to learn richer network representations generally boosts the performance of deep learning models. To improve representation-learning in convolutional neural networks, we present a multi-branch architecture, which applies channel-wise attention across different branches leverage complementary strengths both feature-map and multi-path representation. Our proposed Split-Attention module provides simple modular computation block that can serve as drop-in replacement for popular...
Recent work has achieved great success in utilizing global contextual information for semantic segmentation, including increasing the receptive field and aggregating pyramid feature representations. In this paper, we go beyond context explore fine-grained representation using co-occurrent features by introducing Co-occurrent Feature Model, which predicts distribution of a given target. To leverage features, build an Aggregated (ACF) Module probability with context. ACF learns spatial...
We propose a Deep Texture Encoding Network (Deep-TEN) with novel Layer integrated on top of convolutional layers, which ports the entire dictionary learning and encoding pipeline into single model. Current methods build from distinct components, using standard encoders separate off-the-shelf features such as SIFT descriptors or pre-trained CNN for material recognition. Our new approach provides an end-to-end framework, where inherent visual vocabularies are learned directly loss function....
Open-vocabulary semantic segmentation aims to segment an image into regions according text descriptions, which may not have been seen during training. Recent two-stage methods first generate class-agnostic mask proposals and then leverage pre-trained vision-language models, e.g., CLIP, classify masked regions. We identify the performance bottleneck of this paradigm be CLIP model, since it does perform well on images. To address this, we propose finetune a collection their corresponding...
We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in video. Video-LLaMA bootstraps cross-modal training from frozen pre-trained & audio encoders LLMs. Unlike previous works complement LLMs to process or signals only, enables video comprehension by tackling two challenges: (1) capturing temporal changes scenes, (2) integrating audio-visual signals. To counter first challenge, we...
Training heuristics greatly improve various image classification model accuracies~\cite{he2018bag}. Object detection models, however, have more complex neural network structures and optimization targets. The training strategies pipelines dramatically vary among different models. In this works, we explore tweaks that apply to models including Faster R-CNN YOLOv3. These do not change the architectures, therefore, inference costs remain same. Our empirical results demonstrate that, these...
Despite the rapid progress in style transfer, existing approaches using feed-forward generative network for multi-style or arbitrary-style transfer are usually compromised of image quality and model flexibility. We find it is fundamentally difficult to achieve comprehensive modeling 1-dimensional embedding. Motivated by this, we introduce CoMatch Layer that learns match second order feature statistics with target styles. With Layer, build a Multi-style Generative Network (MSG-Net), which...
Recent work has made significant progress in improving spatial resolution for pixelwise labeling with Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information semantic segmentation introducing Context Encoding Module, which captures context scenes selectively highlights class-dependent featuremaps. The proposed Module significantly improves...
We present a texture network called Deep Encoding Pooling Network (DEP) for the task of ground terrain recognition. Recognition is an important in establishing robot or vehicular control parameters, as well localization within outdoor environment. The architecture DEP integrates orderless details and local spatial information performance surpasses state-of-the-art methods this task. GTOS database (comprised over 30,000 images 40 classes scenes) enables supervised For evaluation under...
Material recognition for real-world outdoor surfaces has become increasingly important computer vision to support its operation in the wild. Computational surface modeling that underlies material transitioned from reflectance using in-lab controlled radiometric measurements image-based representations based on internet-mined images of materials captured scene. We propose take a middle-ground approach takes advantage both rich cues and flexible image capture. realize this by developing...
Starting from the seminal work of Fully Convolutional Networks (FCN), there has been significant progress on semantic segmentation. However, deep learning models often require large amounts pixelwise annotations to train accurate and robust models. Given prohibitively expensive annotation cost segmentation masks, we introduce a self-training framework in this paper leverage pseudo labels generated unlabeled data. In order handle data imbalance problem segmentation, propose centroid sampling...
In recent years, person re-identification (re-ID) has achieved relatively good performance, benefiting from the revival of deep neural networks. However, due to existence domain bias which refers different data distributions between two domains, it remains challenging directly deploy a model trained on labeled source target only with unlabeled available. this paper, Self-Training Progressive Representation Enhancement (PREST) framework, comprises multi-scale self-training method and...
Much of the recent progress made in image classification research can be credited to training procedure refinements, such as changes data augmentations and optimization methods. In literature, however, most refinements are either briefly mentioned implementation details or only visible source code. this paper, we will examine a collection empirically evaluate their impact on final model accuracy through ablation study. We show that, by combining these together, able improve various CNN...
Atrial fibrillation (AF) is the most common cardiac arrhythmia, with energy metabolic disorder leading to severe clinical courses. Relaxin-2 (RLX), a peptide hormone, has been identified activate crucial enzymes involved in cellular metabolism. However, whether relaxin-2 can improve metabolism of atrial myocytes inhibit AF pathogenesis remains unknown. Male New Zealand rabbits were randomly separated into sham, right tachypacing (RAP), and RAP human recombinant treatment (0.5 mg/kg) group...
Abstract Combining common garden experiments with reciprocal transplant or sowing is widely used to assess local adaptation in plants. This approach effectively minimizes the potential influence of maternal environments derived from seed origin. However, impact divergent on assessment has received limited attention previous studies. To investigate effects diverse conditions adaptation, we conducted a 2‐year experiment followed by 5‐year experiment, both carried out at two different...
Background China is a country with very heavy burden of liver cancer disease, and understanding the epidemiological characteristics trends in can help develop targeted public health strategies. Methods The data were retrieved from Global Burden Disease (GBD) Study 2021. age-standardized incidence rate (ASIR) death (ASDR) used to estimate deaths by sex, region, country, etiology between 1990 Additionally, attributable risk factors for disability adjusted life years (DALYs) assessed. Finally,...
Deep learning usually achieves the best results with complete supervision. In case of semantic segmentation, this means that large amounts pixelwise annotations are required to learn accurate models. paper, we show can obtain state-of-the-art using a semi-supervised approach, specifically self-training paradigm. We first train teacher model on labeled data, and then generate pseudo labels set unlabeled data. Our robust training framework digest human-annotated jointly achieve top...