Kaiyou Song

ORCID: 0000-0001-8999-2680
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Industrial Vision Systems and Defect Detection
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Image Processing Techniques and Applications
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Integrated Circuits and Semiconductor Failure Analysis
  • Image Retrieval and Classification Techniques
  • Robotics and Sensor-Based Localization
  • Visual Attention and Saliency Detection
  • Image and Object Detection Techniques
  • Surface Roughness and Optical Measurements
  • Color Science and Applications
  • Digital Imaging for Blood Diseases
  • Video Surveillance and Tracking Methods
  • Machine Learning and ELM
  • Face and Expression Recognition
  • Medical Image Segmentation Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Measurement and Detection Methods
  • Human Pose and Action Recognition
  • Remote-Sensing Image Classification
  • Infrastructure Maintenance and Monitoring

Vi Technology (United States)
2023-2024

Megvii (China)
2023-2024

Huazhong University of Science and Technology
2017-2022

State Key Laboratory of Digital Manufacturing Equipment and Technology
2018

Visual inspection of texture surface defects is still a challenging task in the industrial automation field due to the tremendous changes in the appearance of various textures. Current visual inspection methods cannot simultaneously and efficiently inspect various types of texture defects due to either the low discriminative capabilities of handcrafted features or their time-consuming sliding-window strategy. In this paper, we present a novel unsupervised multiscale feature-clustering-based fully convolutional autoencoder (MS-FCAE) method that accurately...

10.1109/tase.2018.2886031 article EN IEEE Transactions on Automation Science and Engineering 2019-01-01

Flat panel displays, such as the thin film transistor liquid crystal display, organic light-emitting diode, and polymer light-emitting diode, have been widely applied in many fields in recent decades. To ensure the quality of these displays, defect inspection is crucial. Mura defects, which are phenomena of uneven display in the screen, are the most challenging visual defects to detect. This paper presents an online sequential classifier and transfer learning (OSC-TL) method for the training and classification of Mura defects. OSC-TL is a new method that combines deep convolutional...

10.1109/tsm.2017.2777499 article EN IEEE Transactions on Semiconductor Manufacturing 2017-11-24

Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations. In typical approaches, models usually focus on predicting specific contents of masked patches, and their performances are highly related to pre-defined mask strategies. Intuitively, this procedure can be considered as training a student (the model) on solving given problems (predict masked patches). However, we argue that the model should not only focus on solving given problems, but...

10.1109/cvpr52729.2023.01000 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Object detection remains a challenging task in computer vision due to the tremendous extent of changes in the appearances of objects caused by clustered backgrounds, occlusion, truncation, and scale change. Current deep neural network (DNN)-based object detection methods cannot simultaneously achieve high accuracy and efficiency. To overcome this limitation, in this paper, we propose a novel multi-scale attention (MSA) DNN for accurate object detection with high efficiency. The proposed MSA-DNN method utilizes a multi-scale feature fusion module (MSFFM) to construct...

10.1109/tcsvt.2018.2875449 article EN IEEE Transactions on Circuits and Systems for Video Technology 2018-10-11

Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD, two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation...

10.1109/cvpr52729.2023.01140 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Establishing a unified model for the defect inspection of different texture surfaces remains a challenge in the industrial automation field because these surfaces can vary in regular and irregular ways. Current unsupervised learning methods are trained on defect-free samples only and cannot directly address anomalies during testing, which precludes these methods from simultaneously inspecting various texture defects. In this article, we propose a novel unsupervised anomaly feature-editing-based adversarial network (AFEAN) to accurately inspect various texture defects. To...

10.1109/tii.2020.3015765 article EN IEEE Transactions on Industrial Informatics 2020-08-11

The visual inspection of Mura defects is still a challenging task in the quality control of panel displays because of the intrinsically nonuniform brightness and blurry contours of these defects. The current visual inspection methods cannot detect all Mura defect types simultaneously, especially small defects. In this paper, we introduce an accurate Mura defect visual inspection (AMVI) method for the fast simultaneous inspection of various Mura defect types. The method consists of two parts: an outlier-prejudging-based image background construction (OPBC) algorithm is proposed to quickly reduce the influence of image backgrounds...

10.1109/tase.2018.2823709 article EN IEEE Transactions on Automation Science and Engineering 2018-04-19

Almost all conventional template-matching methods employ low-level image features to measure the similarity between a template and a scene using similarity measures, such as pixel intensity and gradient. Although these methods have been widely used in many applications, they cannot simultaneously address all types of robustness challenges. In this paper, with the goal of simultaneously addressing various robustness challenges, we present a robust semantic template-matching (RSTM) approach. Inspired by the local binary descriptor, we propose a novel superpixel region descriptor...

10.1109/tip.2019.2893743 article EN IEEE Transactions on Image Processing 2019-01-17

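Most face recognition methods employ single-bit binary descriptors for face representation. The information from these descriptors is lost in the process of quantization from real-valued descriptors to binary descriptors, which greatly limits their robustness for face recognition. In this study, we propose a novel weighted feature histogram (WFH) method of multi-scale local patches using multi-bit binary descriptors for face recognition. First, to obtain multi-scale information from the face image, local patches are extracted using a multi-scale local patch generation (MSLPG) method. Second, with the goal of reducing the quantization information loss of binary descriptors, a multi-bit local binary descriptor learning (MBLBDL) method is proposed to extract...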

10.1109/tip.2021.3065843 article EN IEEE Transactions on Image Processing 2021-01-01

Defect classification in the liquid crystal display (LCD) manufacturing process is one of the most crucial issues for quality control. To resolve this constraint, an automatic defect classification (ADC) method based on machine learning is proposed. Key features of LCD micro-defects are defined and extracted, and a support vector machine is used for classification. The performance of the method is presented through several experimental results.

10.1109/isie.2009.5213760 article EN 2009-07-01

Texture defect inspection remains challenging due to the extreme variations in various textures and defects. Current unsupervised learning-based texture defect inspection methods cannot simultaneously inspect a wide variety of texture defects because they lack an explicit mechanism to encourage the model to create large anomaly scores for defects. In this study, we propose a novel anomaly composition and decomposition network (ACDN) for accurate texture defect inspection. In the proposed ACDN, a Gaussian-sampling-based anomaly composition (GSAC) method is proposed to perform the anomaly composition procedure, which composites a large number of defective...

10.1109/tim.2022.3196133 article EN IEEE Transactions on Instrumentation and Measurement 2022-01-01

Active learning can reduce the human effort required for labeling training samples while preserving the performance of visual classifiers. However, existing active learning frameworks cannot be used to perform the classification of industrial product surface defects because they still require intensive manual annotation efforts. In this study, we propose a cost-efficient autolabeling-enhanced active learning (ALEAL) framework for surface defect classification. The proposed ALEAL framework employs a deep convolutional neural network (CNN) as...

10.1109/tim.2020.3032190 article EN IEEE Transactions on Instrumentation and Measurement 2020-10-21

The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly caused by the challenge that images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings’ way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the semantic patches...

10.1609/aaai.v38i5.28296 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Texture recognition remains a challenging visual task due to the complex appearance variations caused by scale changes in the real world. In most existing texture recognition methods, textures are represented at a single scale; thus, multi-scale information is not fully utilized, resulting in insufficient texture representation and inaccurate texture recognition. In this study, with the goal of addressing the challenge of scale changes, we propose a novel multi-scale boosting feature encoding network (MSBFEN) for accurate texture recognition. The MSBFEN first extracts multi-scale features...

10.1109/tcsvt.2021.3051003 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-01-13

Object tracking is still a challenging problem in computer vision, as it entails learning an effective model to account for appearance changes caused by occlusion, out of view, plane rotation, scale change, and background clutter. This paper proposes a robust visual tracking algorithm called deep convolutional neural network (DCNNCT) to simultaneously address these challenges. The proposed DCNNCT utilizes a DCNN to extract the image feature of a tracked target, and the full range of information regarding each convolutional layer is used...

10.1117/1.jei.27.2.023008 article EN Journal of Electronic Imaging 2018-03-16

As it is empirically observed that Vision Transformers (ViTs) are quite insensitive to the order of input tokens, the need for an appropriate self-supervised pretext task that enhances the location awareness of ViTs is becoming evident. To address this, we present DropPos, a novel pretext task designed to reconstruct Dropped Positions. The formulation of DropPos is simple: we first drop a large random subset of positional embeddings and then the model classifies the actual position for each non-overlapping patch among all possible positions solely...

10.48550/arxiv.2309.03576 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The development of autoregressive modeling (AM) in computer vision lags behind natural language processing (NLP) in self-supervised pre-training. This is mainly caused by the challenge that images are not sequential signals and lack a natural order when applying autoregressive modeling. In this study, inspired by human beings' way of grasping an image, i.e., focusing on the main object first, we present a semantic-aware autoregressive image modeling (SemAIM) method to tackle this challenge. The key insight of SemAIM is to autoregressively model images from the semantic patches to the less semantic patches. To...

10.48550/arxiv.2312.10457 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Masked visual modeling has attracted much attention due to its promising potential in learning generalizable representations. Typical approaches urge models to predict specific contents of masked tokens, which can be intuitively considered as teaching a student (the model) to solve given problems (predicting masked contents). Under such settings, the performance is highly correlated with mask strategies (the difficulty of provided problems). We argue that it is equally important for the model to stand in the shoes of a teacher...

10.48550/arxiv.2312.13714 preprint EN other-oa arXiv (Cornell University) 2023-01-01

In contrastive self-supervised learning, the common way to learn discriminative representation is to pull different augmented "views" of the same image closer while pushing all other images further apart, which has been proven to be effective. However, it is unavoidable to construct undesirable views containing different semantic concepts during the augmentation procedure. It would damage the semantic consistency of representation to pull these augmentations closer in the feature space indiscriminately. In this study, we introduce feature-level augmentation and propose a novel...

10.1109/iccv51070.2023.01475 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

In contrastive self-supervised learning, the common way to learn discriminative representation is to pull different augmented "views" of the same image closer while pushing all other images further apart, which has been proven to be effective. However, it is unavoidable to construct undesirable views containing different semantic concepts during the augmentation procedure. It would damage the semantic consistency of representation to pull these augmentations closer in the feature space indiscriminately. In this study, we introduce feature-level augmentation and propose a novel...

10.48550/arxiv.2212.06486 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Masked image modeling (MIM) has attracted much research attention due to its promising potential for learning scalable visual representations. In typical approaches, models usually focus on predicting specific contents of masked patches, and their performances are highly related to pre-defined mask strategies. Intuitively, this procedure can be considered as training a student (the model) on solving given problems (predict masked patches). However, we argue that the model should not only focus on solving given problems, but...

10.48550/arxiv.2304.05919 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Different from existing SSL-KD methods that transfer knowledge from a static pre-trained teacher to a student, in MOKD, two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation...

10.48550/arxiv.2304.06461 preprint EN other-oa arXiv (Cornell University) 2023-01-01