Xin Tong

ORCID: 0000-0001-8534-3827
Research Areas
  • Machine Learning and Data Classification
  • Imbalanced Data Classification Techniques
  • Parallel Computing and Optimization Techniques
  • Statistical Methods and Inference
  • Gene expression and cancer classification
  • Complex Network Analysis Techniques
  • Advanced Statistical Methods and Models
  • Advanced Data Storage Technologies
  • Embedded Systems Design Techniques
  • Face and Expression Recognition
  • Neural Networks and Applications
  • Bayesian Methods and Mixture Models
  • Machine Learning and Algorithms
  • Advanced Clustering Algorithms Research
  • Bayesian Modeling and Causal Inference
  • Molecular Biology Techniques and Applications
  • AI-based Problem Solving and Planning
  • Opinion Dynamics and Social Influence
  • Semantic Web and Ontologies
  • Anomaly Detection Techniques and Applications
  • Distributed systems and fault tolerance
  • Bioinformatics and Genomic Networks
  • Cancer-related molecular mechanisms research
  • Software Reliability and Analysis Research
  • Reliability and Maintenance Optimization

University of Southern California
2013-2024

University of Hong Kong
2024

Southern California University for Professional Studies
2018-2021

China Information Technology Security Evaluation Center
2021

University of Tsukuba
2020-2021

University of Toronto
2013-2015

Princeton University
2011-2012

IBM (United States)
2012

Concordia University
2008-2010

Beijing University of Posts and Telecommunications
2006-2007

For high-dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and noise accumulation. Therefore, researchers proposed independence rules to circumvent the diverging spectra, and sparse independence rules to mitigate the issue of noise accumulation. However, in biological applications, there is often a group of correlated genes responsible for clinical outcomes, and the use of the covariance information can significantly reduce misclassification rates. In theory the extent of such error rate...

10.1111/j.1467-9868.2012.01029.x article EN Journal of the Royal Statistical Society Series B (Statistical Methodology) 2012-04-12

10.1016/j.ress.2003.11.002 article EN Reliability Engineering & System Safety 2004-01-06

An umbrella algorithm and a graphical tool for asymmetric error control in binary classification.

10.1126/sciadv.aao1659 article EN cc-by-nc Science Advances 2018-02-02

This work demonstrates that a set of commercial and scale-out applications exhibit significant use of superpages and thus suffer from the fixed and small superpage TLB structures of some modern core designs. Other processors better cope with superpages at the expense of using power-hungry and slow fully-associative TLBs. We consider alternate designs that allow all pages to freely share a single, power-efficient and fast set-associative TLB. We propose a prediction-guided multi-grain TLB design that uses a prediction mechanism to avoid multiple lookups in...

10.1109/hpca.2015.7056034 preprint EN 2015-02-01

Conformal inference provides a rigorous statistical framework for uncertainty quantification in machine learning, enabling well-calibrated prediction sets with precise coverage guarantees for any classification model. However, its reliance on the idealized assumption of perfect data exchangeability limits its effectiveness in the presence of real-world complications, such as low-quality labels, a widespread issue in modern large-scale data sets. This work tackles this open problem by introducing an adaptive conformal...

10.48550/arxiv.2501.18060 preprint EN arXiv (Cornell University) 2025-01-29

We propose a high-dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating...

10.1080/01621459.2015.1005212 article EN Journal of the American Statistical Association 2015-02-06

The Internet of Things (IoT) and Industrial 4.0 bring enormous potential benefits by enabling highly customised services and applications, which create a huge volume and variety of data. However, preserving privacy in IoT against re-identification attacks is very challenging. In this work, we considered three main data types generated in IoT: context, continuous, and media. We first proposed a stream anonymisation method based on k-anonymity for data collected by IoT devices; then enhancing techniques for both...

10.1007/s10796-021-10116-w article EN cc-by Information Systems Frontiers 2021-05-11
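
The k-anonymity criterion named in the abstract above can be illustrated with a small check. This is a minimal sketch of the standard definition only (every combination of quasi-identifier values must be shared by at least k records), not the stream anonymisation method of the paper. The function name `is_k_anonymous` and the field names `age_band`, `zip_prefix`, and `reading` are hypothetical.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    occurs in at least k of the given records."""
    # Group records by their tuple of quasi-identifier values.
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical sensor records; only the listed fields are quasi-identifiers.
records = [
    {"age_band": "20-29", "zip_prefix": "900", "reading": 71},
    {"age_band": "20-29", "zip_prefix": "900", "reading": 64},
    {"age_band": "30-39", "zip_prefix": "900", "reading": 80},
]

# One record is alone in the ("30-39", "900") group, so k=2 fails.
print(is_k_anonymous(records, ["age_band", "zip_prefix"], 2))
# Dropping age_band leaves a single group of three records.
print(is_k_anonymous(records, ["zip_prefix"], 3))
```

Generalising or suppressing a quasi-identifier, as in the second call, is the usual way to make a failing table pass the check.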

Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specified level and (ii) it has a probability of type II error close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective...

10.48550/arxiv.1102.5750 preprint EN other-oa arXiv (Cornell University) 2011-01-01
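
Property (i) in the abstract above, a type I error below a pre-specified level with high probability, can be illustrated with a much simpler device than the convex-loss construction the paper describes. The sketch below is a different technique, swapped in for illustration: it thresholds a scoring function at an order statistic of held-out class-0 scores and uses a binomial tail to bound the chance that the type I error exceeds `alpha`. It assumes i.i.d. class-0 scores; all function names are hypothetical.

```python
from math import comb

def violation_bound(n, k, alpha):
    """Bound on P(type I error > alpha) when the threshold is the k-th
    smallest (1-indexed) of n i.i.d. held-out class-0 scores."""
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
               for j in range(k, n + 1))

def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta,
    or None when n is too small for any rank to qualify."""
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None

def np_threshold(class0_scores, alpha=0.05, delta=0.05):
    """Score threshold; predict class 1 when a score exceeds it."""
    k = np_threshold_rank(len(class0_scores), alpha, delta)
    if k is None:
        raise ValueError("too few class-0 scores for this alpha and delta")
    return sorted(class0_scores)[k - 1]

# With 100 held-out class-0 scores and alpha = delta = 0.05,
# the 99th smallest score is selected as the threshold.
print(np_threshold_rank(100, 0.05, 0.05))
```

Choosing the smallest qualifying rank keeps the threshold as low as the guarantee allows, which limits the type II error. When fewer than 59 class-0 scores are available at alpha = delta = 0.05, no rank qualifies and the sketch raises an error instead of returning an uncontrolled threshold.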

Motivated by the pressing need to dissect heterogeneous relationships in gene expression data, here we generalize the squared Pearson correlation to capture a mixture of linear dependences between two real-valued variables, with or without an index variable that specifies the line memberships. We construct generalized squares focusing on three aspects: exchangeability, no parametric model assumptions, and inference of population-level parameters. To compute square from sample line-membership...

10.1080/01621459.2024.2342639 article EN cc-by-nc-nd Journal of the American Statistical Association 2024-04-15

This paper presents the integration into GIPSY of Lucx's context calculus defined in Wan's PhD thesis. We start by defining different types of tag sets, then we explain the concept of context and the context operators. Finally, we present how the entities have been abstracted into Java classes and embedded into the system.

10.1109/compsac.2008.200 preprint EN 2008-01-01

Online transaction processing (OLTP) workload performance suffers from instruction stalls; the instruction footprint of a typical transaction exceeds by far the capacity of an L1 cache, leading to ongoing cache thrashing. Several proposed techniques remove some stalls in exchange for error-prone instrumentation of the code base, or a sharp increase in L1-I unit area and power. Others reduce miss latency by better utilizing a shared L2 cache. SLICC [2], a recently proposed thread migration technique that exploits instruction locality, is promising for high core...

10.1145/2485922.2485946 article EN 2013-06-23

Based on a Gaussian mixture type model of K components, we derive eigen selection procedures that improve the usual spectral clustering algorithms in high-dimensional settings, which typically act on the top few eigenvectors of an affinity matrix (e.g., X⊤X) derived from the data matrix X. Our selection principle formalizes two intuitions: (i) eigenvectors should be dropped when they have no clustering power; (ii) some eigenvectors corresponding to smaller spiked eigenvalues should be dropped due to estimation inaccuracy. Our selection procedures lead to new algorithms: ESSC for K = 2 and GESSC for K > 2. The...

10.1080/01621459.2021.1917418 article EN Journal of the American Statistical Association 2021-04-17

Conversational case-based reasoning (CCBR) provides a mixed-initiative dialog for guiding users to construct their problem description incrementally through a question-answering sequence. Similarity calculation in CCBR, as in traditional CBR, plays an important role in the retrieval process since it decides the quality of the retrieved case. In this paper, we analyze the different characteristics of the query (new case) between CCBR and traditional CBR, and argue that the similarity method that only takes the features appearing in the query into account, so called...

10.1109/iri-05.2005.1506511 article EN 2005-09-12

We describe a type system for a platform called the General Intensional Programming System (GIPSY), designed to support intensional programming languages built upon intensional logic and their imperative counterparts for the intensional execution model. In GIPSY, the type system glues the static and dynamic typing between intensional and imperative languages in its compiler and runtime environments to support the evaluation of expressions written in various dialects of the intensional programming language Lucid. The intensionality makes expressions explicitly take into account a multidimensional context of evaluation, with the context being a first-class value that serves...

10.1145/1557626.1557642 article EN 2009-05-19

For high-dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra and noise accumulation. Therefore, researchers proposed independence rules to circumvent the diverse spectra, and sparse independence rules to mitigate the issue of noise accumulation. However, in biological applications, there is often a group of correlated genes responsible for clinical outcomes, and the use of the covariance information can significantly reduce misclassification rates. The extent of such error rate...

10.48550/arxiv.1011.6095 preprint EN other-oa arXiv (Cornell University) 2010-01-01

Virtualization has become a magic bullet to increase utilization, improve security, lower costs, and reduce management overheads. In many scenarios, the number of virtual machines consolidated onto a single processor has grown even faster than the number of hardware threads. This results in multiprogrammed virtualization where virtual machines time-share a core. Such fine-grain sharing comes at a cost; each time a virtual machine gets scheduled by the hypervisor, it effectively begins with a "cold" cache, since any cache blocks accessed in the past have...

10.1109/hpca.2013.6522309 article EN 2013-02-01

Most existing binary classification methods target the optimization of the overall classification risk and may fail to serve some real-world applications such as cancer diagnosis, where users are more concerned with misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric type I/II error priorities. It seeks classifiers with a minimal type II error and a constrained type I error under a user-specified level. This article is the first attempt...

10.48550/arxiv.1508.03106 preprint EN other-oa arXiv (Cornell University) 2015-01-01