Hongshik Ahn

ORCID: 0000-0002-8924-6159
Research Areas
  • Machine Learning and Data Classification
  • Gene expression and cancer classification
  • Face and Expression Recognition
  • Imbalanced Data Classification Techniques
  • Explainable Artificial Intelligence (XAI)
  • Bioinformatics and Genomic Networks
  • Adversarial Robustness in Machine Learning
  • Molecular Biology Techniques and Applications
  • Data Mining Algorithms and Applications
  • Neural Networks and Applications
  • Data Stream Mining Techniques
  • Bayesian Modeling and Causal Inference
  • Lung Cancer Treatments and Mutations
  • Reinforcement Learning in Robotics
  • Epigenetics and DNA Methylation
  • Auction Theory and Applications
  • Text and Document Classification Technologies
  • Colorectal Cancer Treatments and Studies
  • Radiomics and Machine Learning in Medical Imaging
  • Advanced Graph Neural Networks
  • Bone Metabolism and Diseases
  • Cancer-related gene regulation
  • Functional Brain Connectivity Studies
  • Machine Learning in Bioinformatics
  • Artificial Intelligence in Healthcare and Education

Stony Brook University
2008-2022

State University of New York
2001-2014

SUNY Korea
2013

The healing of skeletal fractures is essentially a replay of bone development, involving the closely regulated, interdependent processes of chondrogenesis and osteogenesis. Using a rat femur model to determine the degree of transcriptional complexity of these processes, suppressive subtractive hybridization (SSH) was performed between RNA isolated from intact bone and that of callus from post-fracture (PF) days 3, 5, 7, and 10 as a means of identifying up-regulated genes in the regenerative process. Analysis of 3,635 cDNA clones revealed 588...

10.1074/jbc.m203171200 article EN cc-by Journal of Biological Chemistry 2002-08-01

10.1016/j.jkss.2011.03.002 article EN Journal of the Korean Statistical Society 2011-04-14

Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). The criterion of maximizing concordance may not be appropriate in certain applications. In practice, some applications emphasize high sensitivity (e.g., clinical diagnostic tests) and others high specificity (e.g., epidemiology screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear...

10.1080/10659360600787700 article EN SAR and QSAR in environmental research 2006-06-01
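
The threshold effect described in the abstract above can be illustrated with a minimal sketch. The scores and labels below are made up for illustration, and the helper name `sensitivity_specificity` is hypothetical; it is not taken from the paper, which compares four specific classifiers that are not reproduced here.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call score >= threshold positive; return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up predicted probabilities and true classes (assumption, not real data).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

# Lowering the threshold favors sensitivity; raising it favors specificity.
for t in (0.35, 0.5, 0.65):
    sens, spec = sensitivity_specificity(scores, labels, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

On this toy data a low threshold catches every positive case at the cost of a false positive, while a high threshold removes the false positive but misses one positive case.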

10.1007/s11633-020-1239-y article EN International Journal of Automation and Computing 2020-09-09

Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood, thus they are hard to generalize for complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs...

10.1145/3511808.3557429 article EN Proceedings of the 31st ACM International Conference on Information & Knowledge Management 2022-10-16

Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize for complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces RELAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs...

10.1109/tai.2022.3223892 article EN IEEE Transactions on Artificial Intelligence 2022-11-24

This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models, the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among...

10.1080/10543406.2012.756500 article EN Journal of Biopharmaceutical Statistics 2013-04-23
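
The random-partition step mentioned in the abstract above can be sketched as follows. This is only an illustration of splitting predictors into mutually exclusive subsets; the function name `random_partition`, the round-robin dealing, and the seed are assumptions, and fitting or combining the base models is not shown.

```python
import random


def random_partition(n_predictors, n_subsets, seed=0):
    """Split predictor indices 0..n_predictors-1 into mutually exclusive subsets."""
    indices = list(range(n_predictors))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [sorted(indices[k::n_subsets]) for k in range(n_subsets)]


subsets = random_partition(n_predictors=10, n_subsets=3)
for k, subset in enumerate(subsets):
    # A base classifier (e.g., a multinomial logit model) would be fit on
    # these columns only; the fitting step is omitted in this sketch.
    print(f"subset {k}: predictors {subset}")
```

Because every predictor lands in exactly one subset, each base model sees far fewer columns than the full data set, with no variable selection step.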

Recently, graph neural networks (GNNs) have been widely used to develop successful recommender systems. Although powerful, it is very difficult for a GNN-based recommender system to attach tangible explanations of why a specific item ends up in the list of suggestions for a given user. Indeed, explaining GNN-based recommendations is unique, and existing GNN explanation methods are inappropriate for two reasons. First, traditional GNN explanation methods are designed for node, edge, or graph classification tasks rather than ranking, as in recommender systems. Second, standard machine learning...

10.48550/arxiv.2208.04222 preprint EN other-oa arXiv (Cornell University) 2022-01-01

In many real-world tasks, acquiring features requires a certain cost, which gives rise to the costly classification problem. In this study, we formulate the problem in the reinforcement learning framework and sequentially select a subset of features to make a balance between the classification error and the feature cost. Specifically, the advantage actor critic algorithm is firstly used to solve it. Furthermore, to improve the learned policy and make it explainable, we employ Monte Carlo Tree Search to update the policy iteratively. During the procedure, we also consider its performance...

10.1109/ijcnn52387.2021.9533593 article EN 2021 International Joint Conference on Neural Networks (IJCNN) 2021-07-18

Personalized medicine is defined by the use of genomic signatures of patients to assign effective therapies. We present Classification by Ensembles from Random Partitions (CERP) for class prediction and apply CERP to genomic data on leukemia patients and to genomic data with several clinical variables on breast cancer patients. CERP performs consistently well compared to the other classification algorithms. The predictive accuracy can be improved by adding some relevant clinical/histopathological measurements to the genomic data.

10.1186/gb-2006-7-12-r121 article EN cc-by Genome biology 2006-01-01

We apply robust classification algorithms to high-dimensional genomic data to find biomarkers, by analyzing variable importance, that enable a better diagnosis of disease, an earlier intervention, or a more effective assignment of therapies. The goal is to use variable importance ranking to isolate a set of important genes that can be used to classify life-threatening diseases with respect to prognosis or type, to maximize efficacy or minimize toxicity in personalized treatment of such diseases. A method and present several other methods...

10.1080/10543400802278023 article EN Journal of Biopharmaceutical Statistics 2008-09-05

Binary tree classification has been useful for classifying the whole population based on the levels of an outcome variable that is associated with chosen predictors. Often we start a classification with a large number of candidate predictors, and each predictor takes a number of different cutoff values. Because of these types of multiplicity, the binary tree classification method is subject to severe type I error probability. Nonetheless, there have not been many publications to address this issue. In this paper, we propose a binary tree classification method to control the probability to accept a predictor below a certain level, say 5%.

10.4137/cin.s16342 article EN cc-by-nc Cancer Informatics 2014-01-01

Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood, thus they are hard to generalize for complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs...

10.48550/arxiv.2110.11960 preprint EN other-oa arXiv (Cornell University) 2021-01-01

10.1007/s00180-013-0466-x article EN Computational Statistics 2013-12-09

The purpose of the research is to develop a statistical decision support algorithm for patients who may benefit from Adjuvant Cisplatin/Vinorelbine (ACT) and improve their survival rates. Genome-wide microarray data are used to identify feasible sets of genes and probe sets that constitute a gene signature. The data are available at the National Center for Biotechnology Information Gene Expression Omnibus (GSE14814). Preliminary studies have shown that high-risk patients who received ACT resulted in an improved prognosis. However, low-risk patients showed...

10.1080/10543406.2019.1684310 article EN Journal of Biopharmaceutical Statistics 2019-10-30

Recently, a new ensemble classification method named Canonical Forest (CF) has been proposed by Chen et al. [Canonical forest. Comput Stat. 2014;29:849–867]. CF has been proven to give consistently good results in many data sets and to be comparable to other widely used classification methods. However, CF requires adopting a feature reduction method before classifying high-dimensional data. Here, we extend CF to a high-dimensional classifier by incorporating a random subspace algorithm [Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach...

10.1080/00949655.2016.1231191 article EN Journal of Statistical Computation and Simulation 2016-09-14

Recent advances in molecular biology (e.g., cDNA microarray technology) enable the simultaneous monitoring of the expression level of thousands of genes. Due to the massive amount of complex data generated, sophisticated statistical approaches are necessary in order to properly address the experimental investigation. In this paper, we present a statistical analysis of data derived from bone regeneration experiments. Several interesting features of these data distinguish it from the commonly used microarray experiment (i.e., separate hybridization of mRNA samples...

10.1081/bip-200025652 article EN Journal of Biopharmaceutical Statistics 2004-10-08

10.1111/insr.12061 article International Statistical Review 2014-07-25