- Machine Learning and Data Classification
- Imbalanced Data Classification Techniques
- Parallel Computing and Optimization Techniques
- Statistical Methods and Inference
- Gene Expression and Cancer Classification
- Complex Network Analysis Techniques
- Advanced Statistical Methods and Models
- Advanced Data Storage Technologies
- Embedded Systems Design Techniques
- Face and Expression Recognition
- Neural Networks and Applications
- Bayesian Methods and Mixture Models
- Machine Learning and Algorithms
- Advanced Clustering Algorithms Research
- Bayesian Modeling and Causal Inference
- Molecular Biology Techniques and Applications
- AI-based Problem Solving and Planning
- Opinion Dynamics and Social Influence
- Semantic Web and Ontologies
- Anomaly Detection Techniques and Applications
- Distributed Systems and Fault Tolerance
- Bioinformatics and Genomic Networks
- Cancer-related Molecular Mechanisms Research
- Software Reliability and Analysis Research
- Reliability and Maintenance Optimization
University of Southern California
2013-2024
University of Hong Kong
2024
Southern California University for Professional Studies
2018-2021
China Information Technology Security Evaluation Center
2021
University of Tsukuba
2020-2021
University of Toronto
2013-2015
Princeton University
2011-2012
IBM (United States)
2012
Concordia University
2008-2010
Beijing University of Posts and Telecommunications
2006-2007
For high-dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and noise accumulation. Therefore, researchers proposed independence rules to circumvent the diverging spectra, and sparse independence rules to mitigate the issue of noise accumulation. However, in biological applications, there is often a group of correlated genes responsible for clinical outcomes, and the use of covariance information can significantly reduce misclassification rates. In theory, the extent of such error rate...
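The contrast between an independence rule and a covariance-aware discriminant can be illustrated with a small simulation. The sketch below is a hypothetical illustration using scikit-learn, not the method from the paper: Gaussian naive Bayes ignores feature correlation, while shrinkage-regularized LDA exploits it on correlated synthetic data.

```python
# Hypothetical illustration (not the paper's method): on correlated Gaussian data,
# an independence rule (Gaussian naive Bayes) ignores covariance, while
# shrinkage-regularized LDA uses it and typically misclassifies less.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
p, n, rho = 50, 400, 0.6
cov = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # AR(1) correlation
mu = np.zeros(p); mu[:5] = 0.7                                      # a small group of signal features

X = np.vstack([rng.multivariate_normal(np.zeros(p), cov, n),
               rng.multivariate_normal(mu, cov, n)])
y = np.r_[np.zeros(n), np.ones(n)]
Xt = np.vstack([rng.multivariate_normal(np.zeros(p), cov, n),
                rng.multivariate_normal(mu, cov, n)])
yt = np.r_[np.zeros(n), np.ones(n)]

nb = GaussianNB().fit(X, y)                                                  # independence rule
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)  # covariance-aware
print("naive Bayes test error:", 1 - nb.score(Xt, yt))
print("shrinkage LDA test error:", 1 - lda.score(Xt, yt))
```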
An umbrella algorithm and a graphical tool for asymmetric error control in binary classification.
This work demonstrates that a set of commercial and scale-out applications exhibit significant use of superpages and thus suffer from the fixed and small superpage TLB structures of some modern core designs. Other processors better cope with superpages, at the expense of using power-hungry and slow fully-associative TLBs. We consider alternate designs that allow all page sizes to freely share a single, power-efficient and fast set-associative TLB. We propose a prediction-guided multi-grain TLB design that uses a prediction mechanism to avoid multiple lookups in...
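A toy software model can convey the idea of a single set-associative TLB shared by multiple page sizes, with a predictor choosing which page size to probe first. This is a hypothetical sketch, not the hardware design from the paper; the class and sizes below are invented for illustration.

```python
# Toy model (not the paper's hardware design): a single set-associative TLB shared by
# 4 KiB and 2 MiB pages, plus a simple last-observed-size predictor that guesses which
# page size to probe first, so most translations require only one lookup.
from collections import OrderedDict

PAGE_SIZES = [4 * 1024, 2 * 1024 * 1024]

class MultiGrainTLB:
    def __init__(self, num_sets=64, ways=4):
        self.num_sets, self.ways = num_sets, ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # (page_size, vpn) -> frame
        self.predicted_size = PAGE_SIZES[0]
        self.lookups = 0

    def _probe(self, vaddr, page_size):
        vpn = vaddr // page_size
        self.lookups += 1
        return self.sets[vpn % self.num_sets].get((page_size, vpn))

    def translate(self, vaddr):
        # Probe the predicted page size first; fall back to the other sizes on a miss.
        order = [self.predicted_size] + [p for p in PAGE_SIZES if p != self.predicted_size]
        for page_size in order:
            frame = self._probe(vaddr, page_size)
            if frame is not None:
                self.predicted_size = page_size
                return frame
        return None  # TLB miss: a real design would walk the page table and refill

    def insert(self, vaddr, page_size, frame):
        vpn = vaddr // page_size
        s = self.sets[vpn % self.num_sets]
        if len(s) >= self.ways:
            s.popitem(last=False)                             # evict the oldest entry (FIFO)
        s[(page_size, vpn)] = frame

tlb = MultiGrainTLB()
tlb.insert(0x40000000, 2 * 1024 * 1024, frame=7)
print(tlb.translate(0x40001234), "lookups:", tlb.lookups)
```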
Conformal inference provides a rigorous statistical framework for uncertainty quantification in machine learning, enabling well-calibrated prediction sets with precise coverage guarantees for any classification model. However, its reliance on the idealized assumption of perfect data exchangeability limits its effectiveness in the presence of real-world complications, such as low-quality labels -- a widespread issue in modern large-scale data sets. This work tackles this open problem by introducing an adaptive conformal...
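As background, the standard split-conformal construction that this line of work builds on can be sketched in a few lines. This is a generic baseline under the usual exchangeability and clean-label assumptions, not the adaptive method proposed in the paper; the model and data below are placeholders.

```python
# Minimal split-conformal sketch (standard background, not the paper's adaptive method):
# calibrate a score threshold on held-out data so that prediction sets cover the true
# label with probability about 1 - alpha, assuming exchangeable, correctly labeled data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_te, y_cal, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

alpha = 0.1
# Conformity score: 1 - estimated probability of the true class.
cal_scores = 1.0 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
k = int(np.ceil((len(cal_scores) + 1) * (1 - alpha)))
qhat = np.sort(cal_scores)[k - 1]                     # calibrated score threshold

test_probs = model.predict_proba(X_te)
pred_sets = test_probs >= 1.0 - qhat                  # boolean membership per class
coverage = pred_sets[np.arange(len(y_te)), y_te].mean()
print("empirical coverage:", round(coverage, 3))
print("average set size:", pred_sets.sum(axis=1).mean())
```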
We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating...
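A rough sketch of the idea, simplified and not the paper's exact procedure (kernel density estimates stand in for the estimators used there): transform each feature into an estimated log density ratio, then fit an L1-penalized logistic regression on the augmented features.

```python
# Simplified sketch of density-ratio feature augmentation (not the paper's exact
# procedure): replace each feature by an estimated log marginal density ratio,
# then fit an L1-penalized logistic regression on the transformed features.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.linear_model import LogisticRegression

def augment(X_train, y_train, X):
    """Replace each column by log f1_hat(x) - log f0_hat(x), estimated per feature."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        f0 = gaussian_kde(X_train[y_train == 0, j])
        f1 = gaussian_kde(X_train[y_train == 1, j])
        Z[:, j] = np.log(f1(X[:, j]) + 1e-12) - np.log(f0(X[:, j]) + 1e-12)
    return Z

rng = np.random.default_rng(1)
n, p = 300, 20
X0 = rng.normal(0, 1, (n, p))
X1 = rng.normal(0, 1, (n, p)); X1[:, :3] = rng.normal(0, 2, (n, 3))  # signal in scale, not mean
X = np.vstack([X0, X1]); y = np.r_[np.zeros(n), np.ones(n)]

Z = augment(X, y, X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z, y)
print("training accuracy on augmented features:", clf.score(Z, y))
```

Because the signal here lies in the spread rather than the mean of the first three features, a plain linear classifier on the raw features would miss it, while the density-ratio transform exposes it to the linear model.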
The Internet of Things (IoT) and Industry 4.0 bring enormous potential benefits by enabling highly customised services and applications, which create a huge volume and variety of data. However, preserving privacy in IoT against re-identification attacks is very challenging. In this work, we considered three main data types generated in IoT: context, continuous, and media data. We first proposed a stream anonymisation method based on k-anonymity for data collected from IoT devices; we then proposed privacy-enhancing techniques for both...
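A toy illustration of the k-anonymity property underlying the stream method follows. This is hypothetical code, not the proposed algorithm: quasi-identifiers are generalized (age into bands, location into a prefix) and a record set passes only if every released combination occurs at least k times.

```python
# Toy k-anonymity illustration (not the paper's stream algorithm): generalize the
# quasi-identifiers and check that every released combination occurs at least k times.
from collections import Counter

def generalize(record, age_band=10, loc_digits=2):
    age, location, reading = record            # reading (e.g., a sensor value) is released as-is
    return (age // age_band * age_band, location[:loc_digits])

def is_k_anonymous(records, k, **kwargs):
    groups = Counter(generalize(r, **kwargs) for r in records)
    return all(count >= k for count in groups.values())

records = [
    (34, "10115", 71.2), (37, "10117", 68.9), (31, "10119", 70.4),
    (52, "20095", 80.1), (55, "20097", 79.6), (58, "20099", 81.3),
]
print(is_k_anonymous(records, k=3))             # True with these generalizations
print(is_k_anonymous(records, k=3, age_band=1)) # False: exact ages are too specific
```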
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level, and (ii) it has a probability of type II error close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective...
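In that spirit, here is a deliberately simplified sketch, not the paper's estimator: two base score functions are combined with convex weights plus an offset, minimizing a hinge surrogate of the type II error subject to an empirical hinge-based constraint on the type I error.

```python
# Simplified sketch of the idea (not the paper's estimator): combine two base score
# functions with convex weights plus an offset, minimizing a hinge surrogate of the
# type II error subject to an empirical hinge-based constraint on the type I error.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
S0 = rng.normal(0.0, 1.0, (n, 2))          # base scores on class 0 ("normal") samples
S1 = rng.normal(1.2, 1.0, (n, 2))          # base scores on class 1 ("anomalous") samples
alpha = 0.1                                # target level for the surrogate type I error

def hinge(margins):                        # mean hinge loss given signed margins
    return np.maximum(0.0, 1.0 - margins).mean()

def combined(S, w):                        # convex combination of scores minus an offset
    return S @ w[:2] - w[2]

def type2(w):                              # class 1 should receive positive margins
    return hinge(combined(S1, w))

def type1(w):                              # class 0 should receive negative margins
    return hinge(-combined(S0, w))

res = minimize(type2, x0=np.array([0.5, 0.5, 0.0]), method="SLSQP",
               bounds=[(0, 1), (0, 1), (None, None)],
               constraints=[{"type": "eq", "fun": lambda w: w[0] + w[1] - 1.0},
                            {"type": "ineq", "fun": lambda w: alpha - type1(w)}])
w = res.x
print("empirical type I error:", (combined(S0, w) > 0).mean())
print("empirical type II error:", (combined(S1, w) <= 0).mean())
```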
Motivated by the pressing needs for dissecting heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct the generalized Pearson correlation squares by focusing on three aspects: exchangeability, no parametric model assumptions, and inference of population-level parameters. To compute a generalized square from a sample without line-membership...
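For intuition only, the specified-membership case can be caricatured as follows. This is a hypothetical reading of the idea, not necessarily the paper's exact estimator: fit a least-squares line within each group and pool the explained variance across groups.

```python
# Intuition-only sketch (a hypothetical reading of the specified-membership case, not
# necessarily the paper's exact estimator): with line memberships given, fit a
# least-squares line within each group and pool the explained variance over groups.
import numpy as np

def generalized_r2(x, y, membership):
    x, y, membership = map(np.asarray, (x, y, membership))
    total_ss = np.sum((y - y.mean()) ** 2)
    residual_ss = 0.0
    for g in np.unique(membership):
        xg, yg = x[membership == g], y[membership == g]
        slope, intercept = np.polyfit(xg, yg, deg=1)       # per-group least-squares line
        residual_ss += np.sum((yg - (slope * xg + intercept)) ** 2)
    return 1.0 - residual_ss / total_ss

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
g = rng.integers(0, 2, 200)
y = np.where(g == 0, 2 * x, -2 * x) + rng.normal(0, 0.1, 200)  # a mixture of two lines
print("plain squared Pearson correlation:", round(np.corrcoef(x, y)[0, 1] ** 2, 3))
print("generalized square (known memberships):", round(generalized_r2(x, y, g), 3))
```

On this mixture of two crossing lines, the plain squared correlation is near zero while the pooled, membership-aware measure is close to one, which is the heterogeneity the abstract targets.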
This paper presents the integration into GIPSY of Lucx's context calculus as defined in Wan's PhD thesis. We start by defining the different types of tag sets, then we explain the concept of context and its operators. Finally, we present how these entities have been abstracted into Java classes and embedded into the system.
Online transaction processing (OLTP) workload performance suffers from instruction stalls; the instruction footprint of a typical transaction exceeds by far the capacity of an L1 cache, leading to ongoing cache thrashing. Several previously proposed techniques remove some of the stalls in exchange for error-prone instrumentation of the code base, or a sharp increase in the L1-I unit's area and power. Others reduce the miss latency by better utilizing a shared L2 cache. SLICC [2], a recently proposed thread migration technique that exploits instruction locality, is promising for high core...
Based on a Gaussian mixture type model of K components, we derive eigen selection procedures that improve the usual spectral clustering algorithms in high-dimensional settings, which typically act on the top few eigenvectors of an affinity matrix (e.g., X⊤X) derived from the data X. Our selection principle formalizes two intuitions: (i) eigenvectors should be dropped when they have no clustering power; (ii) some eigenvectors corresponding to smaller spiked eigenvalues should be dropped due to estimation inaccuracy. These principles lead to new algorithms: ESSC for K = 2 and GESSC for K > 2. The...
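A schematic sketch of the general idea follows; it uses a crude, generic eigenvalue-gap heuristic rather than the ESSC/GESSC procedures themselves: compute the leading eigenvectors of the sample Gram matrix, keep only those whose eigenvalues clearly stand out from the bulk, and run k-means on what remains.

```python
# Schematic sketch (a generic heuristic, not the paper's ESSC/GESSC procedures):
# take the leading eigenvectors of the sample Gram (affinity) matrix, drop those whose
# eigenvalues do not stand out from the bulk, and cluster the rest with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, p, K = 200, 300, 2
labels_true = rng.integers(0, K, n)
means = np.zeros((K, p)); means[1, :10] = 1.5          # two components differing in 10 coordinates
X = means[labels_true] + rng.normal(size=(n, p))

A = X @ X.T                                            # n x n sample affinity (Gram) matrix
eigvals, eigvecs = np.linalg.eigh(A)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]     # sort in descending order

m = 10                                                 # candidate leading eigenvectors
bulk_edge = eigvals[m:].max()
keep = [j for j in range(m) if eigvals[j] > 1.5 * bulk_edge]  # crude spiked-eigenvalue test
V = eigvecs[:, keep] if keep else eigvecs[:, :K]

pred = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(V)
agreement = max((pred == labels_true).mean(), (pred != labels_true).mean())
print("kept eigenvectors:", keep, " clustering agreement:", round(agreement, 2))
```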
Conversational case-based reasoning (CCBR) provides a mixed-initiative dialog for guiding users to construct their problem description incrementally through a question-answering sequence. Similarity calculation in CCBR, as in traditional CBR, plays an important role in the retrieval process since it decides the quality of the retrieved cases. In this paper, we analyze the different characteristics of the query (new case) between CCBR and traditional CBR, and argue that the similarity method that only takes the features appearing in the query into account, the so-called...
We describe a type system for the platform called the General Intensional Programming System (GIPSY), designed to support intensional programming languages built upon intensional logic and their imperative counterparts in its execution model. In GIPSY, the type system glues static and dynamic typing between these languages in its compiler and runtime environments for the evaluation of expressions written in various dialects of the intensional programming language Lucid. The intensionality makes these languages explicitly take into account a multidimensional context of evaluation, with the context being a first-class value that serves...
Virtualization has become a magic bullet to increase utilization, improve security, lower costs, and reduce management overheads. In many scenarios, the number of virtual machines consolidated onto a single processor has grown even faster than the number of hardware threads. This results in multiprogrammed virtualization, where virtual machines time-share a core. Such fine-grain sharing comes at a cost; each time a virtual machine gets scheduled by the hypervisor, it effectively begins with a "cold" cache, since any cache blocks it accessed in the past have...
Most existing binary classification methods target the optimization of the overall risk and may fail to serve some real-world applications, such as cancer diagnosis, where users are more concerned with misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with a minimal type II error subject to a constraint on the type I error under a user-specified level. This article is the first attempt...
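The thresholding idea behind NP classification can be sketched as follows. This is a simplified illustration in the spirit of NP umbrella-type thresholding, with the binomial tail bound taken as an assumption rather than quoted from the article: pick the threshold as an order statistic of held-out class-0 scores so that the chance of exceeding the type I error level is small.

```python
# Simplified illustration in the spirit of NP umbrella-type thresholding: pick the
# threshold as an order statistic of held-out class-0 scores so that, under i.i.d.
# sampling, P(type I error > alpha) <= delta by a binomial tail bound (assumed here).
import numpy as np
from scipy.stats import binom

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Smallest order statistic whose binomial tail bound on violation is <= delta."""
    s = np.sort(class0_scores)
    n = len(s)
    for k in range(1, n + 1):
        # P(at least k of n class-0 scores fall below the population (1 - alpha) quantile)
        violation = binom.sf(k - 1, n, 1 - alpha)
        if violation <= delta:
            return s[k - 1]
    raise ValueError("not enough class-0 scores for this alpha/delta")

rng = np.random.default_rng(0)
scores0 = rng.normal(0, 1, 2000)            # scores of held-out class-0 observations
scores1 = rng.normal(2, 1, 2000)            # scores of class-1 observations
t = np_threshold(scores0, alpha=0.05, delta=0.05)
print("threshold:", round(t, 3))
print("empirical type I error:", (scores0 > t).mean())   # on the same data, for illustration
print("empirical type II error:", (scores1 <= t).mean())
```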