Guang‐Hui Fu

ORCID: 0000-0002-0138-0004
Research Areas
  • Imbalanced Data Classification Techniques
  • Spectroscopy and Chemometric Analyses
  • Advanced Statistical Methods and Models
  • Advanced Statistical Process Monitoring
  • Face and Expression Recognition
  • Data Mining Algorithms and Applications
  • Financial Distress and Bankruptcy Prediction
  • Neural Networks and Applications
  • Gene expression and cancer classification
  • Grey System Theory Applications
  • Analytical Chemistry and Chromatography
  • Rough Sets and Fuzzy Logic
  • Electricity Theft Detection Techniques
  • Text and Document Classification Technologies
  • Metabolomics and Mass Spectrometry Studies
  • Statistical Methods and Inference
  • Spectroscopy Techniques in Biomedical and Chemical Research
  • Anomaly Detection Techniques and Applications
  • Artificial Intelligence in Healthcare
  • Industrial Vision Systems and Defect Detection
  • Statistical Methods in Epidemiology
  • Advanced Chemical Sensor Technologies
  • Thermography and Photoacoustic Techniques
  • Power Systems and Technologies
  • Gout, Hyperuricemia, Uric Acid

Kunming University of Science and Technology
2016-2025

Shanghai Jiao Tong University
2024

China Academy of Space Technology
2021

Central South University
2011

Abstract An issue for class‐imbalanced learning is what assessment metric should be employed. So far, the precision‐recall curve (PRC) as a metric is rarely used in practice compared with its alternative, the receiver operating characteristic (ROC). This study investigates the performance of PRC as the evaluating criterion to address class‐imbalanced data and focuses on the comparison of PRC with ROC. The advantages of PRC over ROC in assessing class‐imbalanced data are also investigated and tested on our proposed algorithm by tuning the whole model parameters in simulation studies and real...

10.1002/bimj.201800148 article EN Biometrical Journal 2018-12-12
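The contrast between PRC and ROC drawn in the abstract above can be illustrated with a small sketch. It is not the paper's algorithm: the function names, the toy labels, and the toy scores are all assumptions made for illustration, and the areas are computed with a plain trapezoid rule (ROC) and a step-wise sum (average precision).

```python
def roc_pr_points(labels, scores):
    """Sweep a decision threshold from the highest score downwards.

    Returns two lists with one point per distinct score:
    ROC points as (false positive rate, true positive rate) and
    PR points as (recall, precision).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    roc, pr = [], []
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Samples with tied scores cross the threshold together.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def area_under_roc(labels, scores):
    """Trapezoid-rule area under the ROC curve, starting from (0, 0)."""
    roc, _ = roc_pr_points(labels, scores)
    points = [(0.0, 0.0)] + roc
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def average_precision(labels, scores):
    """Step-wise summary of the PR curve: sum of precision * recall gain."""
    _, pr = roc_pr_points(labels, scores)
    total, prev_recall = 0.0, 0.0
    for recall, precision in pr:
        total += (recall - prev_recall) * precision
        prev_recall = recall
    return total


# Made-up imbalanced example: 2 positives among 10 samples, with two
# negatives scored above both positives.
labels = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
auc = area_under_roc(labels, scores)
ap = average_precision(labels, scores)
```

On this toy data the ROC area is 0.75 while the average precision is about 0.42. The two false positives at the top of the ranking cost little on the ROC scale because negatives are plentiful, but they halve or worse the precision at every recall level, which is the kind of sensitivity to the minority class that motivates using PRC.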

ABSTRACT In critical domains including medicinal chemistry, biomedicine, metabolomics, and computational toxicology, class imbalance in datasets and poor recognition accuracy for minority classes remain persistent challenges. While previous studies have employed resampling and feature selection techniques to address class imbalance in data and enhance classification performance, most approaches focused on single‐algorithm solutions rather than hybrid methodologies. Hybrid algorithms offer distinct advantages by...

10.1002/cem.70029 article EN Journal of Chemometrics 2025-04-01

In this paper, a novel wavelength region selection algorithm, called elastic net grouping variable selection combined with partial least squares regression (EN-PLSR), is proposed for multi-component spectral data analysis. The EN-PLSR algorithm can automatically select successive strongly correlated prediction groups related to the response variable using two steps. First, a portion of the correlated predictors are selected and divided into subgroups by means of the grouping effect of elastic net estimation. Then, a recursive leave-one-group-out strategy...

10.1366/10-06069 article EN Applied Spectroscopy 2011-03-14

Abstract Background Feature selection in class-imbalance learning has gained increasing attention in recent years due to the massive growth of high-dimensional class-imbalanced data across many scientific fields. In addition to reducing model complexity and discovering key biomarkers, feature selection is also an effective method of combating overlapping, which may arise in such data and become a crucial aspect for determining classification performance. However, ordinary feature selection techniques cannot be simply used for addressing...

10.1186/s12859-020-3411-3 article EN cc-by BMC Bioinformatics 2020-03-23

10.1016/j.chemolab.2017.10.015 article EN Chemometrics and Intelligent Laboratory Systems 2017-10-27

In class-imbalance learning, the Synthetic Minority Oversampling Technique (SMOTE) is a widely used technique to tackle class-imbalance problems from the data level, whereas SMOTE blindly selects neighboring minority class points when performing an interpolation among them and inevitably brings collinearity between the generated new points and the original ones. To combat these problems, we propose in this study an adaptive-weighting SMOTE method, termed AWSMOTE. AWSMOTE applies two types of SVM-based weights into SMOTE. A kind of weight...

10.1155/2021/9947621 article EN cc-by Scientific Programming 2021-05-13
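For readers unfamiliar with the baseline that the abstract above criticizes, here is a minimal sketch of plain SMOTE interpolation. It is not AWSMOTE and includes no SVM-based weighting; the function name `smote`, its parameters, and the toy minority points are assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by plain SMOTE interpolation.

    minority: list of equal-length numeric tuples (minority-class samples).
    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by squared Euclidean distance.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        # The neighbour is picked blindly among the k nearest; this is the
        # step the abstract describes as blind selection.
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(base, neighbour)))
    return synthetic


# Made-up minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=6, k=2, seed=42)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples never leave the region spanned by the originals and are linearly tied to them, which is the collinearity concern raised in the abstract.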

Abstract In this paper, a two‐step nonlinear classification algorithm is proposed to model the structure–activity relationship (SAR) between bioactivities and molecular descriptors of compounds, which consists of kernel principal component analysis (KPCA) and linear support vector machines (KPCA + LSVM). KPCA is used to remove some uninformative gradients such as noises and then exactly capture the latent structure of the training dataset using some new variables called principal components in the kernel‐defined feature space. LSVM makes...

10.1002/cem.1364 article EN Journal of Chemometrics 2011-02-01

Abstract Feature selection and rebalancing can be seen as two preprocessing ways in class‐imbalanced learning. Recently, there have been many research achievements and applications on LASSO‐type feature selection, whereas most of them are not directly designed for addressing class‐imbalanced data. In this study, we proposed a LASSO‐based stable feature selection algorithm for class‐imbalanced data analysis, and the false‐positive selection (FPS) under balanced and imbalanced situations was calculated via the selection frequency of each predictor in doing feature selection. The results of simulation...

10.1002/cem.3177 article EN Journal of Chemometrics 2019-08-27

Disorders of lipid metabolism are a common cause of coronary heart disease (CHD) and its comorbidities. In this study, ultra‐performance liquid chromatography–high‐resolution mass spectrometry in data‐independent acquisition (DIA) mode was applied to collect abundant tandem mass spectrometry data, which provided valuable information for annotation. For the isomers that could not be completely separated by chromatography, parallel reaction monitoring (PRM) was used for quantification. A total of 223 plasma metabolites were...

10.1002/jssc.202300848 article EN Journal of Separation Science 2024-04-01

Objective To examine the association of overweight/obesity and serum vitamin C (serum VC) with serum uric acid (SUA) and to assess causality using Mendelian randomization (MR). Methods 4,772 participants from the National Health and Nutrition Examination Survey (NHANES), 2017–2018 were included in this study. Multivariate linear regression, variance inflation factor and quantile regression were used to analyze the relationships between overweight/obesity, serum VC and SUA levels. Secondly, Mendelian randomization (MR) was utilized to mitigate bias and prevent reverse causality in the observational...

10.3389/fnut.2024.1429123 article EN cc-by Frontiers in Nutrition 2024-08-23

When developing prediction models for small or sparse binary data with many highly correlated covariates, logistic regression often encounters separation or multicollinearity problems, resulting in serious bias and even the nonexistence of standard maximum likelihood estimates. The combination of separation and multicollinearity makes the task of logistic regression more difficult, and a few studies addressed separation and multicollinearity simultaneously. In this paper, we propose a double-penalized method called lFRE to combat separation and multicollinearity in logistic regression. lFRE combines the logF-type penalty with the ridge penalty. The results...

10.3390/math10203824 article EN cc-by Mathematics 2022-10-16

Abstract Many real‐world data mining applications involve using imbalanced datasets to obtain predictive models. Imbalanced datasets can hinder the model performance of learning algorithms in rare cases. Although there are many well‐researched classification task solutions, most of them cannot be directly applied to the regression task. One of the challenges is to find a suitable evaluation and optimization standard that can improve the predictive ability of the model without severe bias. Based on the importance of rare cases, this study proposes a new metric...

10.1002/cem.3515 article EN Journal of Chemometrics 2023-09-08

This paper is the generalization of the weight-fused elastic net (Fu and Xu, 2012), which performs group variable selection by combining weight-fused LASSO (wfLasso) and elastic net (Zou and Hastie, 2005)...

10.1080/03610918.2012.752841 article EN Communications in Statistics - Simulation and Computation 2013-09-24

Elastic net (Enet) and sparse partial least squares (SPLS) are frequently employed for wavelength selection and model calibration in the analysis of near infrared spectroscopy data. Enet and SPLS can perform variable selection and model calibration simultaneously. They also tend to select wavelength intervals rather than individual wavelengths when the predictors are multicollinear. In this paper, we focus on the comparison of Enet and SPLS in interval wavelength selection. The results from both simulation and real data show that the Enet method tends to select fewer key variables than SPLS; thus it gets more...

10.1155/2019/7314916 article EN cc-by International Journal of Analytical Chemistry 2019-08-01
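The tendency of the elastic net to keep correlated predictors together, which the abstract above relates to selecting wavelength intervals, can be shown with a small coordinate-descent sketch. This is a generic textbook-style elastic net, not the code used in the paper; the function names, the penalty parameterization (`lam`, `alpha`), and the toy data are assumptions made for illustration, and no intercept is fitted.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z towards zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha=0.5, n_iter=500):
    """Coordinate descent for the (assumed) objective

        (1 / (2 n)) * ||y - X b||^2
            + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2)

    X is a list of rows, y a list of responses; returns the coefficients b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X b, with b = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Made-up data: columns 0 and 1 are identical (perfectly collinear, like two
# adjacent wavelengths), column 2 is unrelated to the response.
X = [[1.0, 1.0, 1.0],
     [-1.0, -1.0, 1.0],
     [1.0, 1.0, -1.0],
     [-1.0, -1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
beta = elastic_net(X, y, lam=0.1, alpha=0.5)
```

With the ridge part of the penalty active (`alpha < 1`), the two identical columns end up with equal coefficients while the unrelated column is set to zero. A pure LASSO penalty (`alpha = 1`) would have no unique way to split the weight between the two collinear columns.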

Manual visual inspection of thermistor wire solder joints used in satellite circuits always results in inefficiency and low accuracy. A method based on an improved YOLOv5s model is proposed to automatically inspect solder joint defects through infrared images. In order to enhance the feature extraction capability, the ECA module is introduced into the backbone of the network. An image database is built with an image shooting platform for training and testing. The experiment result indicates that the precision, recall and mAP reach 64.1%, 92.0% and 88.4%...

10.1109/icacr53472.2021.9605165 article EN 2021-09-25

In this article, we consider the problem of variable selection and estimation with strongly correlated multi-collinear data by using grouping variable selection techniques. A new method, called weight-fused elastic net (WFEN), is proposed to deal with high dimensional collinear data. The model, which combines two different grouping effect mechanisms induced by the elastic net and weight-fused LASSO, respectively, can be easily unified in the frame of LASSO and computed efficiently. The performance on simulation and real data sets shows that our method is competitive with other related methods,...

10.1080/03610918.2011.579369 article EN Communications in Statistics - Simulation and Computation 2011-10-07

Abstract Imbalanced domain prediction analysis is currently one of the hot research topics. Many real‐world data mining analyses involve using imbalanced data to obtain predictive models. In the context of imbalance, research on classification problems has been extensive, but research on regression problems has been negligible. Rare values rarely occur in imbalanced regression problems, and the focus is on accurately predicting continuous target variables in rare instances. One of the challenges is finding a suitable strategy to rebalance the original dataset in order to improve the performance of the model in this...

10.1002/cem.3537 article EN Journal of Chemometrics 2024-02-09


10.2139/ssrn.4789221 preprint EN 2024-01-01

Contrasted with imbalanced classification, where the target variable is discrete, imbalanced regression is another imbalanced learning problem that aims to accurately predict the rare cases in a continuous target variable. In this study, we firstly analyzed the performance of classical support vector machines (SVMs) on imbalanced regression and gave three possible factors behind the failure of SVMs in predicting rare cases. Then, different weight functions are designed and adjusted support vector machine algorithms are proposed to improve rare event prediction. The weights assumption obtain bigger adaptive...

10.2139/ssrn.4839324 preprint EN 2024-01-01