Kai Zhao

ORCID: 0000-0002-2496-0829
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Face recognition and analysis
  • Radiomics and Machine Learning in Medical Imaging
  • Medical Image Segmentation Techniques
  • Video Surveillance and Tracking Methods
  • Generative Adversarial Networks and Image Synthesis
  • Visual Attention and Saliency Detection
  • Anomaly Detection Techniques and Applications
  • Rough Sets and Fuzzy Logic
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Image and Signal Denoising Methods
  • Robotics and Sensor-Based Localization
  • Advanced Image Fusion Techniques
  • Remote Sensing and LiDAR Applications
  • Data Mining Algorithms and Applications
  • Multimodal Machine Learning Applications
  • Speech and Audio Processing
  • Image Retrieval and Classification Techniques
  • Music and Audio Processing
  • Medical Imaging Techniques and Applications
  • Face and Expression Recognition
  • Network Security and Intrusion Detection
  • Advanced Algorithms and Applications

Beijing Polytechnic
2014-2025

University of South China
2024

University of California, Los Angeles
2022-2024

China Tobacco
2024

Xinjiang University
2024

Chinese PLA General Hospital
2024

Nanyang Technological University
2024

First Affiliated Hospital of University of South China
2024

Tsinghua University
2013-2023

Sichuan University
2023

Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one...

10.1109/tpami.2019.2938758 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-08-30

We focus on a fundamental task of detecting meaningful line structures, a.k.a. semantic lines, in natural scenes. Many previous methods regard this problem as a special case of object detection and adjust existing object detectors for line detection. However, these methods neglect the inherent characteristics of lines, leading to sub-optimal performance. Lines enjoy much simpler geometric properties than complex objects and thus can be compactly parameterized by a few arguments. To better exploit the property of lines, in this paper, we incorporate...

10.1109/tpami.2021.3077129 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-01-01

We present the first comprehensive video polyp segmentation (VPS) study in the deep learning era. Over the years, developments in VPS have not moved forward with ease due to the lack of a large-scale dataset with fine-grained annotations. To address this issue, we introduce a high-quality frame-by-frame annotated VPS dataset, named SUN-SEG, which contains 158,690 colonoscopy frames from the well-known SUN-database. We provide additional annotation covering diverse types, i.e., attribute, object mask, boundary,...

10.1007/s11633-022-1371-y article EN cc-by Machine Intelligence Research 2022-11-03

Age estimation from facial images is typically cast as a nonlinear regression problem. The main challenge of this problem is that the facial feature space w.r.t. ages is inhomogeneous, due to the large variation in facial appearance across different persons of the same age and the non-stationary property of aging patterns. In this paper, we propose Deep Regression Forests (DRFs), an end-to-end model, for age estimation. DRFs connect the split nodes to a fully connected layer of a convolutional neural network (CNN) and deal with inhomogeneous data by jointly...

10.1109/cvpr.2018.00245 article EN 2018-06-01

We consider the face recognition task where facial images of the same identity (person) are expected to be closer in the representation space, while different identities are expected to be far apart. Several recent studies encourage intra-class compactness by developing loss functions that penalize the variance of representations of the same identity. In this paper, we propose the 'exclusive regularization' that focuses on the other aspect of discriminability: inter-class separability, which is neglected in many approaches. The proposed method, named...

10.1109/cvpr.2019.00123 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

The growing amount and diversity of Android malware has significantly weakened the effectiveness of conventional defense mechanisms, and thus the Android platform often remains unprotected from new and unknown malware. To address these limitations, we propose DroidDeep, a malware detection approach for the Android platform based on a deep learning model. Deep learning emerges as a new area of machine learning research that has attracted increasing attention in artificial intelligence. To implement this, we first extract five types of features from the static analysis of Android apps. Then, we build a deep learning model...

10.1109/trustcom.2016.0070 article EN 2016 IEEE Trustcom/BigDataSE/ISPA 2016-08-01

Current CNN-based solutions to salient object detection (SOD) mainly rely on the optimization of cross-entropy loss (CELoss). The quality of the detected saliency maps is then often evaluated in terms of F-measure. In this paper, we investigate an interesting issue: can we consistently use the F-measure formulation in both training and evaluation for SOD? By reformulating the standard F-measure, we propose the relaxed F-measure, which is differentiable w.r.t. the posterior and can be easily appended to the back of CNNs as the loss function. Compared to the conventional cross-entropy loss, of which the gradients decrease...

10.1109/iccv.2019.00894 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
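
The relaxed F-measure described in the abstract above can be illustrated with a minimal sketch. It assumes the usual soft counts (true positives, false positives, false negatives computed from probabilities rather than thresholded predictions) and a loss of the form 1 - F. The function names, the default beta_sq = 0.3, and the eps stabilizer are assumptions made for illustration, not details confirmed by the abstract.

```python
def relaxed_f_measure(probs, labels, beta_sq=0.3, eps=1e-10):
    """Soft F-measure computed from probabilities instead of hard predictions.

    probs  -- predicted foreground probabilities in [0, 1]
    labels -- ground-truth labels, each 0 or 1
    beta_sq and eps are illustrative defaults (assumptions).
    """
    # Soft confusion counts: each is a smooth function of the probabilities.
    tp = sum(p * y for p, y in zip(probs, labels))
    fp = sum(p * (1 - y) for p, y in zip(probs, labels))
    fn = sum((1 - p) * y for p, y in zip(probs, labels))
    # F = (1 + b^2) * P * R / (b^2 * P + R), rewritten in terms of counts.
    return (1 + beta_sq) * tp / (beta_sq * (tp + fn) + (tp + fp) + eps)


def relaxed_f_loss(probs, labels, beta_sq=0.3):
    """Loss to minimize: one minus the relaxed F-measure."""
    return 1.0 - relaxed_f_measure(probs, labels, beta_sq)
```

With hard 0/1 probabilities the soft counts reduce to the ordinary confusion counts, so the relaxed score coincides with the standard F-measure in that case.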

Android has become one of the most popular mobile operating systems because of the numerous applications (apps) it provides. However, Android malware downloaded from third-party markets threatens users' privacy, and much of it remains undetected because of the lack of efficient and accurate detection techniques. Prior efforts on Android malware detection attempted to build precise classification models by manually choosing features, and few of them used any feature selection algorithms to help pick typical features. In this paper, we present Feature Extraction...

10.1109/iscc.2015.7405598 article EN 2015-07-01

Object skeletons are useful for object representation and object detection. They are complementary to the object contour, and provide extra information, such as how object scale (thickness) varies among object parts. But object skeleton extraction from natural images is very challenging, because it requires the extractor to be able to capture both local and non-local image context in order to determine the scale of each skeleton pixel. In this paper, we present a novel fully convolutional network with multiple scale-associated side outputs to address this problem. By observing...

10.1109/tip.2017.2735182 article EN IEEE Transactions on Image Processing 2017-08-02

Partially-supervised instance segmentation is a task which requests segmenting objects from novel categories via learning on limited base categories with annotated masks, thus eliminating demands of heavy annotation burden. The key to addressing this task is to build an effective class-agnostic mask segmentation model. Unlike previous methods that learn such models only on base categories, in this paper, we propose a new method, named ContrastMask, which learns a mask segmentation model on both base and novel categories under a unified pixel-level contrastive learning framework. In this framework, pseudo...

10.1109/cvpr52688.2022.01131 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Conditional image generation plays a vital role in medical image analysis as it is effective in tasks such as super-resolution, denoising, and inpainting, among others. Diffusion models have been shown to perform at a state-of-the-art level in natural image generation, but they have not been thoroughly studied in medical image generation with specific conditions. Moreover, current medical image generation models have their own problems, limiting their usage in various medical image generation tasks. In this paper, we introduce the use of conditional Denoising Diffusion Probabilistic Models (cDDPMs) for medical image generation, which achieve state-of-the-art performance on several...

10.3390/bioengineering10111258 article EN cc-by Bioengineering 2023-10-28

Label distribution learning (LDL) is a general learning framework, which assigns to an instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have either restricted assumptions on the expression form of the label distribution or limitations in representation learning, e.g., to learn deep features in an end-to-end manner. This paper presents label distribution learning forests (LDLFs), a novel label distribution learning algorithm based on differentiable decision trees, which have several advantages: 1) Decision trees have the potential to model any general form of label distributions by a mixture of leaf...

10.48550/arxiv.1702.06086 preprint EN other-oa arXiv (Cornell University) 2017-01-01
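
The idea of modelling a label distribution as a mixture of leaf distributions in a differentiable decision tree can be sketched as follows. This is a toy depth-2 soft tree written under stated assumptions: the sigmoid split function, the fixed tree depth, and all names (leaf_routing_probs, predict_distribution) are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def leaf_routing_probs(split_logits):
    """Probability of reaching each of the 4 leaves of a depth-2 soft tree.

    split_logits -- (root, left_child, right_child) scores; each split sends
    the sample left with probability sigmoid(score), right otherwise.
    """
    s_root, s_left, s_right = (sigmoid(z) for z in split_logits)
    return [
        s_root * s_left,              # left, then left
        s_root * (1 - s_left),        # left, then right
        (1 - s_root) * s_right,       # right, then left
        (1 - s_root) * (1 - s_right), # right, then right
    ]


def predict_distribution(split_logits, leaf_dists):
    """Mix the per-leaf label distributions by the routing probabilities."""
    routing = leaf_routing_probs(split_logits)
    n_labels = len(leaf_dists[0])
    return [
        sum(p * dist[k] for p, dist in zip(routing, leaf_dists))
        for k in range(n_labels)
    ]
```

Because the routing probabilities sum to one and each leaf holds a valid distribution, the mixture is itself a valid label distribution, and every term is differentiable with respect to the split scores.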

In natural images, the scales (thickness) of object skeletons may dramatically vary among objects and object parts. Thus, robust skeleton detection requires powerful multi-scale feature integration ability. To address this issue, we present a new convolutional neural network (CNN) architecture by introducing a novel hierarchical feature integration mechanism, named Hi-Fi, to address the skeleton detection problem. The proposed CNN-based approach intrinsically captures high-level semantics from deeper layers, as well as low-level details from shallower...

10.24963/ijcai.2018/166 preprint EN 2018-07-01

To solve the problems of two-dimensional image threshold segmentation, such as long running time, low accuracy, and a tendency to produce false images, a novel algorithm that combines an improved Firefly Algorithm with two-dimensional Otsu (2-D Otsu) is proposed. First, an improved Firefly Algorithm is proposed that considers the influence of the historical best position of the group when updating the location of fireflies. Then, the optimal segmentation threshold is searched for using the 2-D Otsu method based on the improved Firefly Algorithm. Finally, the image segmentation is completed using the optimal threshold. It is shown that good performance is achieved...

10.1109/cyber.2015.7288151 article EN 2015-06-01
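
To make the Otsu criterion mentioned in the abstract above concrete, here is a minimal sketch of the classic one-dimensional Otsu threshold found by exhaustive search. It is deliberately simpler than the method the abstract describes: it uses the 1-D criterion rather than 2-D Otsu, and a brute-force scan rather than a Firefly Algorithm search. The function name and the levels parameter are assumptions for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    pixels -- iterable of integer gray values in [0, levels)
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0, sum0 = 0, 0  # running pixel count and intensity sum of the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance is undefined
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1 / total^2.
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t
```

A metaheuristic such as the Firefly Algorithm would replace the exhaustive loop by evaluating the same kind of objective only at candidate positions, which matters more in the 2-D case where the search space is a pair of thresholds.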

Cross-modal transfer is helpful to enhance modality-specific discriminative power for scene recognition. To this end, this paper presents a unified framework to integrate the tasks of cross-modal translation and modality-specific recognition, termed the Translate-to-Recognize Network (TRecgNet). Specifically, both the translation and recognition tasks share the same encoder network, which makes it possible to explicitly regularize the training of the recognition task with the help of translation, and thus improve its final generalization ability. For the translation task, we place a decoder module on top of the encoder network...

10.1109/cvpr.2019.01211 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Traditional network architecture can no longer meet the development needs of technologies such as cloud computing and big data. SDN has the characteristics of high openness and programmability, which allow it to respond quickly to changes in business requirements. SDN decouples the control and management function from the data forwarding function, and simplifies configuration work based on logically centralized controllers and open programming interfaces, thereby achieving a flat mode and flexible function. The programmability of SDN networks also poses...

10.1117/12.3052022 article EN 2025-01-16

Age estimation from facial images is typically cast as a label distribution learning or regression problem, since aging is a gradual progress. Its main challenge is that the facial feature space w.r.t. ages is inhomogeneous, due to the large variation in facial appearance across different persons of the same age and the non-stationary property of aging. In this paper, we propose two Deep Differentiable Random Forests methods, Deep Label Distribution Learning Forest (DLDLF) and Deep Regression Forest (DRF), for age estimation. Both of them connect split nodes to the top...

10.1109/tpami.2019.2937294 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2019-08-27

Bone age assessment (BAA) is a common radiological examination used in pediatrics, based on an analysis of ossification centers and epiphyses of the hand bones. Segmentation of hand bones could help give specific descriptions of bone features in medical records and assess bone age automatically. This study proposes a lightweight U-Net architecture with a multi-scale convolutional network for pediatric hand bone segmentation in the X-ray image. The compact structure with two down-sampling and up-sampling operations and multiple filters of different kernel...

10.1109/access.2019.2918205 article EN cc-by-nc-nd IEEE Access 2019-01-01

As a fundamental component in map service, map matching is of great importance for many trajectory-based applications, e.g., route optimization, traffic scheduling, and fleet management. In practice, the Hidden Markov Model (HMM) and its variants are widely used to provide accurate and efficient map matching service. However, HMM-based methods fail to utilize the knowledge (e.g., the mobility pattern) of enormous trajectory big data, which is useful for intelligent map matching. Furthermore, with many following-up works, they are still easily influenced...

10.1109/tmc.2020.3043500 article EN IEEE Transactions on Mobile Computing 2020-01-01

The utilization of multi-modal sensor data in visual place recognition (VPR) has demonstrated enhanced performance compared to single-modal counterparts. Nonetheless, integrating additional sensors comes with elevated costs and may not be feasible for systems that demand lightweight operation, thereby impacting the practical deployment of VPR. To address this issue, we resort to knowledge distillation, which empowers single-modal students to learn from cross-modal teachers without introducing additional sensors during inference...

10.1609/aaai.v38i9.28905 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

High-grade gliomas (HGG) and solitary brain metastases (SBM) are two common types of brain tumors in middle-aged and elderly patients. HGG and SBM display a high degree of similarity on magnetic resonance imaging (MRI) images. Consequently, differential diagnosis using preoperative MRI remains challenging. This study developed deep learning models that used preoperative T1-weighted contrast-enhanced (T1CE) MRI images to differentiate between HGG and SBM before surgery.

10.21037/qims-24-380 article EN Quantitative Imaging in Medicine and Surgery 2024-07-24

Diffusion models have achieved impressive performance on various image generation tasks, including image super-resolution. Despite their impressive performance, diffusion models suffer from high computational costs due to the large number of denoising steps. In this paper, we proposed a novel accelerated diffusion model, termed Partial Diffusion Models (PDMs), for magnetic resonance imaging (MRI) super-resolution. We observed that the latents of diffusing a pair of low- and high-resolution images gradually converge and become indistinguishable after a certain noise...

10.1109/tmi.2024.3483109 article EN IEEE Transactions on Medical Imaging 2024-01-01

Texture classification algorithms using local binary pattern (LBP) and its variants usually can achieve attractive results. However, the selected rotation invariant structural patterns in numerous LBP variants are not strictly rotation invariant at every angle. To improve the effectiveness in this case, in this paper, we introduce a robust descriptor based on principal curvatures (PCs) and a rotation invariant version of the CLBP_Sign operator in completed LBP (CLBP), namely PC-LBP. Different from the original LBP and many LBP variants, PCs are employed in this paper to represent each...

10.1109/access.2018.2842078 article EN cc-by-nc-nd IEEE Access 2018-01-01
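
For readers unfamiliar with the LBP operator that the abstract above builds on, a minimal sketch of the basic 8-neighbour LBP code and its common rotation-invariant mapping follows. This is plain LBP only; it does not implement the principal-curvature step or the CLBP_Sign variant of PC-LBP, and the function names are assumptions for illustration.

```python
# Neighbour offsets (row, col) listed clockwise, starting at the top-left.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel of a 2-D list of gray values."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        # A neighbour at least as bright as the center sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def rotation_invariant(code, bits=8):
    """Map a code to the minimum over all circular bit rotations."""
    mask = (1 << bits) - 1
    return min(((code >> k) | (code << (bits - k))) & mask
               for k in range(bits))
```

Rotating the image by a multiple of 90 degrees shifts the neighbour ring by two positions, so the raw code changes while the minimum over rotations stays the same. For other angles the 3x3 ring is only an approximation, which is the kind of limitation the abstract refers to.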

Association rules and decision trees represent two well-known data mining techniques to find predictive rules. In this work, we present a detailed comparison between constrained association rules and decision trees to predict multiple target attributes...

10.3233/ida-2010-0462 article EN Intelligent Data Analysis 2011-03-11