- Machine Learning and Data Classification
- Gene expression and cancer classification
- Face and Expression Recognition
- Imbalanced Data Classification Techniques
- Explainable Artificial Intelligence (XAI)
- Bioinformatics and Genomic Networks
- Adversarial Robustness in Machine Learning
- Molecular Biology Techniques and Applications
- Data Mining Algorithms and Applications
- Neural Networks and Applications
- Data Stream Mining Techniques
- Bayesian Modeling and Causal Inference
- Lung Cancer Treatments and Mutations
- Reinforcement Learning in Robotics
- Epigenetics and DNA Methylation
- Auction Theory and Applications
- Text and Document Classification Technologies
- Colorectal Cancer Treatments and Studies
- Radiomics and Machine Learning in Medical Imaging
- Advanced Graph Neural Networks
- Bone Metabolism and Diseases
- Cancer-related gene regulation
- Functional Brain Connectivity Studies
- Machine Learning in Bioinformatics
- Artificial Intelligence in Healthcare and Education
Stony Brook University
2008-2022
State University of New York
2001-2014
SUNY Korea
2013
The healing of skeletal fractures is essentially a replay of bone development, involving the closely regulated, interdependent processes of chondrogenesis and osteogenesis. Using a rat femur model to determine the degree of transcriptional complexity of these processes, suppressive subtractive hybridization (SSH) was performed between RNA isolated from intact bone and that of callus from post-fracture (PF) days 3, 5, 7, and 10 as a means of identifying up-regulated genes in the regenerative process. Analysis of 3,635 cDNA clones revealed 588...
Standard classification algorithms are generally designed to maximize the number of correct predictions (concordance). The criterion of maximizing concordance may not be appropriate in certain applications. In practice, some applications emphasize high sensitivity (e.g., clinical diagnostic tests) and others emphasize high specificity (e.g., epidemiology screening studies). This paper considers the effects of the decision threshold on sensitivity, specificity, and concordance for four classification methods: logistic regression, classification tree, Fisher's linear...
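To make the threshold trade-off described in this abstract concrete, the following is a minimal sketch, not the paper's procedure. The function name `confusion_rates`, the scores, and the labels are all hypothetical and chosen only for illustration; the sketch assumes a classifier that outputs a score per sample and predicts "positive" when the score is at or above the threshold.

```python
def confusion_rates(scores, labels, threshold):
    """Return (sensitivity, specificity, concordance) for one threshold.

    A sample is predicted positive when its score >= threshold.
    Labels are 1 (positive) or 0 (negative).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    concordance = (tp + tn) / len(labels)  # overall fraction correct
    return sensitivity, specificity, concordance


# Hypothetical scores and true labels (illustration only, not real data).
scores = [0.95, 0.80, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

for t in (0.15, 0.5, 0.7):
    sens, spec, conc = confusion_rates(scores, labels, t)
    print(f"threshold={t}: sensitivity={sens}, specificity={spec}, concordance={conc}")
```

On this toy data, lowering the threshold raises sensitivity at the cost of specificity, while concordance can stay the same or change only slightly, which is why concordance alone may hide the trade-off.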
Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces ReLAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs...
Counterfactual examples (CFs) are one of the most popular methods for attaching post-hoc explanations to machine learning (ML) models. However, existing CF generation methods either exploit the internals of specific models or depend on each sample's neighborhood; thus, they are hard to generalize to complex models and inefficient for large datasets. This work aims to overcome these limitations and introduces RELAX, a model-agnostic algorithm to generate optimal counterfactual explanations. Specifically, we formulate the problem of crafting CFs...
This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models, the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among...
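The idea of building an ensemble from random partitions of predictors can be sketched as follows. This is an illustrative outline only: the helper names `random_partition` and `average_probabilities` are hypothetical, the base classifier is left out, and combining base models by averaging their class probabilities is an assumption rather than the combination rule stated in the article.

```python
import random


def random_partition(n_features, n_subsets, seed=None):
    """Split feature indices 0..n_features-1 into mutually exclusive subsets.

    Indices are shuffled and then dealt out round-robin, so subset sizes
    differ by at most one and every feature lands in exactly one subset.
    """
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    return [sorted(indices[i::n_subsets]) for i in range(n_subsets)]


def average_probabilities(prob_vectors):
    """Combine per-model class-probability vectors by simple averaging.

    Averaging is an assumed combination rule for this sketch.
    """
    n_models = len(prob_vectors)
    return [sum(column) / n_models for column in zip(*prob_vectors)]


# Partition 10 hypothetical predictors into 3 disjoint subsets.
subsets = random_partition(n_features=10, n_subsets=3, seed=0)
print(subsets)

# Each subset would be used to fit one base model (not shown); here two
# made-up probability vectors stand in for the base-model outputs.
print(average_probabilities([[0.25, 0.75], [0.75, 0.25]]))
```

Because each base model sees a disjoint set of predictors, no single model has to handle the full dimensionality, which is the property the abstract relies on.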
Recently, graph neural networks (GNNs) have been widely used to develop successful recommender systems. Although powerful, it is very difficult for a GNN-based recommender system to attach tangible explanations of why a specific item ends up in the list of suggestions for a given user. Indeed, explaining GNN-based recommendations is unique, and existing GNN explanation methods are inappropriate for two reasons. First, traditional GNN explanation methods are designed for node, edge, or graph classification tasks rather than ranking, as in recommender systems. Second, standard machine learning...
In many real-world tasks, acquiring features requires a certain cost, which gives rise to the costly classification problem. In this study, we formulate the problem in the reinforcement learning framework and sequentially select a subset of features to strike a balance between the classification error and the feature cost. Specifically, the advantage actor-critic algorithm is first used to solve it. Furthermore, to improve the learned policy and make it explainable, we employ Monte Carlo Tree Search to update the policy iteratively. During the procedure, we also consider its performance...
Personalized medicine is defined by the use of genomic signatures of patients to assign effective therapies. We present Classification by Ensembles from Random Partitions (CERP) for class prediction and apply CERP to genomic data on leukemia patients and to genomic data with several clinical variables on breast cancer patients. CERP performs consistently well compared to the other classification algorithms. The predictive accuracy can be improved by adding some relevant clinical/histopathological measurements to the genomic data.
We apply robust classification algorithms to high-dimensional genomic data to find biomarkers, by analyzing variable importance, that enable a better diagnosis of disease, an earlier intervention, or a more effective assignment of therapies. The goal is to use variable importance ranking to isolate a set of important genes that can be used to classify life-threatening diseases with respect to prognosis or type in order to maximize efficacy or minimize toxicity in personalized treatment of such diseases. A method and present several other methods...
Binary tree classification has been useful for classifying the whole population based on the levels of an outcome variable that is associated with chosen predictors. Often we start with a large number of candidate predictors, and each predictor takes a number of different cutoff values. Because of these types of multiplicity, the binary tree classification method is subject to a severe type I error probability. Nonetheless, there have not been many publications that address this issue. In this paper, we propose to control the type I error probability below a certain level, say 5%.
The purpose of the research is to develop a statistical decision support algorithm for patients who may benefit from Adjuvant Cisplatin/Vinorelbine (ACT) and improve their survival rates. Genome-wide microarray data are used to identify feasible sets of genes or probe sets that constitute a gene signature. The data are available at the National Center for Biotechnology Information Gene Expression Omnibus (GSE14814). Preliminary studies have shown that high-risk patients who received ACT had an improved prognosis. However, low-risk patients showed...
Recently, a new ensemble classification method named Canonical Forest (CF) has been proposed by Chen et al. [Canonical forest. Comput Stat. 2014;29:849–867]. CF has been proven to give consistently good results on many data sets and to be comparable to other widely used classification methods. However, CF requires adopting a feature reduction method before classifying high-dimensional data. Here, we extend CF to a high-dimensional classifier by incorporating a random feature subspace algorithm [Ho TK. The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach...
Recent advances in molecular biology (e.g., cDNA microarray technology) enable the simultaneous monitoring of the expression levels of thousands of genes. Due to the massive amount of complex data generated, sophisticated statistical approaches are necessary in order to properly address the experimental investigation. In this paper, we present the analysis of data derived from bone regeneration experiments. Several interesting features of these data distinguish them from the commonly used microarray experiment (i.e., separate hybridization of mRNA samples...