Yixuan Li

ORCID: 0000-0003-3479-4323
Research Areas
  • Anomaly Detection Techniques and Applications
  • Domain Adaptation and Few-Shot Learning
  • Text and Document Classification Technologies
  • Data Stream Mining Techniques
  • Advanced Image and Video Retrieval Techniques
  • Machine Learning and Data Classification
  • Machine Learning in Healthcare
  • Advanced Neural Network Applications
  • Advanced Chemical Sensor Technologies
  • Meteorological Phenomena and Simulations
  • Artificial Intelligence in Healthcare and Education
  • Air Quality Monitoring and Forecasting
  • Bayesian Modeling and Causal Inference
  • Advanced Statistical Process Monitoring
  • Water Systems and Optimization
  • Visual Attention and Saliency Detection
  • Multimodal Machine Learning Applications
  • Fault Detection and Control Systems
  • Data Management and Algorithms
  • Advanced Sensor and Control Systems
  • Infrastructure Maintenance and Monitoring
  • Machine Learning and Algorithms
  • Adversarial Robustness in Machine Learning
  • Data-Driven Disease Surveillance
  • Smart Grid and Power Systems

University of Wisconsin–Madison
2023-2024

State Grid Corporation of China
2024

10.1007/s11263-024-02117-4 article EN International Journal of Computer Vision 2024-06-23

Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set that includes the ground truth. However, in a more practical but challenging scenario, the annotator may miss the ground truth and provide a wrong candidate set, a setting known as the noisy PLL problem. To remedy this problem, we propose PiCO+, a framework that simultaneously disambiguates the candidate sets and mitigates label noise. At the core of PiCO+, we develop a novel disambiguation algorithm, PiCO, which consists of a contrastive learning module along with a class...
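
The candidate-set disambiguation idea can be illustrated with a minimal sketch: restrict the model's class probabilities to each example's candidate set and renormalize, yielding per-example label weights. This is a generic PLL disambiguation step for illustration only, not the PiCO+ algorithm; all names and values here are hypothetical.

```python
import numpy as np

def disambiguate(probs, candidates):
    """One generic disambiguation step for partial-label learning.

    probs:      (n, c) model class probabilities
    candidates: (n, c) binary mask, 1 where a class is in the candidate set
    Returns per-example label weights: probability mass restricted to,
    and renormalized over, each example's candidate set.
    """
    masked = probs * candidates
    return masked / masked.sum(axis=1, keepdims=True)

# One example whose true label is hidden among classes 1 and 2.
probs = np.array([[0.6, 0.3, 0.1]])
cands = np.array([[0, 1, 1]])
w = disambiguate(probs, cands)  # mass on class 0 is zeroed out
```

In an iterative scheme, these weights would serve as soft targets for the next training round, gradually concentrating on the true label.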

10.1109/tpami.2023.3342650 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-13

10.1007/s11263-023-01895-7 article EN International Journal of Computer Vision 2023-09-20

Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite their promise, distance-based methods suffer from the curse-of-dimensionality problem, which limits their efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training,...
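
The distance-based scoring this abstract builds on can be sketched with a plain k-nearest-neighbor OOD score on normalized features. This is a minimal illustration of the baseline idea SNN improves upon, not the SNN algorithm itself; the features below are synthetic.

```python
import numpy as np

def knn_ood_score(train_feats, test_feat, k=5):
    """OOD score = distance to the k-th nearest ID training feature.

    Features are L2-normalized first; a larger score means the sample is
    farther from the ID data, i.e. more likely out-of-distribution.
    """
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    x = test_feat / np.linalg.norm(test_feat)
    dists = np.linalg.norm(train - x, axis=1)
    return np.sort(dists)[k - 1]

# Synthetic ID cluster near (1, 0), plus one near and one far test point.
rng = np.random.default_rng(0)
id_feats = rng.normal(loc=[1.0, 0.0], scale=0.05, size=(100, 2))
near = knn_ood_score(id_feats, np.array([1.0, 0.05]))
far = knn_ood_score(id_feats, np.array([-1.0, 0.0]))
```

A threshold on this score (e.g. chosen so 95% of held-out ID data falls below it) then turns the score into an ID/OOD decision.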

10.1609/aaai.v38i18.29960 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts, where evaluation data differ from the instruction tuning distributions. Although previous works provided empirical evaluations, we argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure their safe and reliable application in the real world. By taking an information-theoretic perspective, we propose the first theoretical framework that enables the quantification...

10.48550/arxiv.2502.00577 preprint EN arXiv (Cornell University) 2025-02-01

10.1007/s11263-023-01916-5 article EN International Journal of Computer Vision 2023-10-06

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing the five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm...

10.48550/arxiv.2407.21794 preprint EN arXiv (Cornell University) 2024-07-31

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our algorithm is guided by intra-class variation and inter-class...
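
The two quantities the abstract names can be made concrete on unit-norm embeddings: intra-class variation is low when each feature lies close to its class prototype, and inter-class separation is high when prototypes are far apart on the sphere. This is a generic sketch of those two measurements, not HYPO's actual training objective; the data are synthetic.

```python
import numpy as np

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def intra_class_alignment(feats, labels, protos):
    """Mean cosine similarity between each unit-norm feature and its class
    prototype: higher alignment means lower intra-class variation."""
    f = l2norm(feats)
    return float(np.mean(np.sum(f * protos[labels], axis=1)))

def inter_class_separation(protos):
    """Mean pairwise cosine similarity between distinct prototypes:
    lower values mean better-separated classes on the hypersphere."""
    p = l2norm(protos)
    sim = p @ p.T
    off = sim[~np.eye(len(p), dtype=bool)]
    return float(off.mean())

# Two orthogonal prototypes and two features near their own prototypes.
protos = l2norm(np.array([[1.0, 0.0], [0.0, 1.0]]))
feats = np.array([[0.9, 0.1], [0.1, 0.95]])
labels = np.array([0, 1])
align = intra_class_alignment(feats, labels, protos)
sep = inter_class_separation(protos)
```

A training loss would push `align` up and `sep` down jointly, which is the intuition behind learning domain-invariant hyperspherical representations.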

10.48550/arxiv.2402.07785 preprint EN arXiv (Cornell University) 2024-02-12

The emergence of Data-centric AI (DCAI) represents a pivotal shift in AI development, redirecting focus from model refinement to prioritizing data quality. This paradigmatic transition emphasizes the critical role of data in AI. While past approaches centered on refining models, they often overlooked potential data imperfections, raising questions about the true source of enhanced performance. DCAI advocates the systematic engineering of data, complementing existing efforts and playing a vital role in driving AI success. It has spurred innovation...

10.1145/3589335.3641297 article EN 2024-05-12

Using unlabeled data to regularize machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of in-the-wild unlabeled data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. This lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is a lack of research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework, SAL (Separate And Learn), that...
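
The "separate" idea can be sketched minimally: score each unlabeled point by its distance to the ID data and flag the highest-scoring points as candidate outliers, which a second "learn" step would then use to train a binary OOD classifier. This is a simplified illustration under a distance-to-mean heuristic, not SAL's actual filtering procedure; all data below are synthetic.

```python
import numpy as np

def separate_candidates(id_feats, unlabeled_feats, quantile=0.95):
    """Flag unlabeled points whose distance to the ID mean exceeds the
    given quantile of ID distances as candidate outliers."""
    mu = id_feats.mean(axis=0)
    id_d = np.linalg.norm(id_feats - mu, axis=1)
    thr = np.quantile(id_d, quantile)
    u_d = np.linalg.norm(unlabeled_feats - mu, axis=1)
    return u_d > thr

# ID cluster at the origin; unlabeled mixture of one ID-like and one
# clearly anomalous point.
rng = np.random.default_rng(0)
id_feats = rng.normal(scale=0.1, size=(200, 2))
unlabeled = np.array([[0.01, 0.0], [3.0, 3.0]])
mask = separate_candidates(id_feats, unlabeled)
```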

10.48550/arxiv.2402.03502 preprint EN arXiv (Cornell University) 2024-02-05

Supervised learning aims to train a classifier under the assumption that training and test data come from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and the corresponding theory is still an open problem. To study this, the paper investigates the probably...

10.48550/arxiv.2404.04865 preprint EN arXiv (Cornell University) 2024-04-07

The surge in applications of large language models (LLMs) has prompted concerns about the generation of misleading or fabricated information, known as hallucinations. Therefore, detecting hallucinations has become critical to maintaining trust in LLM-generated content. A primary challenge in learning a truthfulness classifier is the lack of a large amount of labeled truthful and hallucinated data. To address the challenge, we introduce HaloScope, a novel framework that leverages unlabeled LLM generations in the wild for hallucination...

10.48550/arxiv.2409.17504 preprint EN arXiv (Cornell University) 2024-09-25

Detecting data points deviating from the training distribution is pivotal for ensuring reliable machine learning. Extensive research has been dedicated to this challenge, spanning classical anomaly detection techniques to contemporary out-of-distribution (OOD) detection approaches. While OOD detection commonly relies on supervised learning with a labeled in-distribution (ID) dataset, anomaly detection may treat the entire ID data as a single class and disregard the labels. This fundamental distinction raises a significant question that has yet to be rigorously...

10.48550/arxiv.2405.18635 preprint EN arXiv (Cornell University) 2024-05-28

Deep learning models that aid in medical image assessment tasks must be both accurate and reliable to be deployed within clinical settings. While deep learning models have been shown to be highly accurate across a variety of tasks, measures that indicate the reliability of these models are less established. Increasingly, uncertainty quantification (UQ) methods are being introduced to inform users about the reliability of model outputs. However, most existing methods cannot be added to previously validated models because they are not post hoc, or they change a model's output. In this work, we...
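
A post hoc UQ measure in the sense used above can be illustrated with predictive entropy: it is computed purely from an existing model's output distribution, so the validated model and its predictions are untouched. This is a generic example of a post hoc measure, not the method the paper proposes; the logits are synthetic.

```python
import numpy as np

def predictive_entropy(logits):
    """Post hoc uncertainty from an existing model's logits.

    Derived only from the softmax distribution, leaving the underlying
    (already validated) model and its predicted class unchanged.
    """
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

confident = predictive_entropy(np.array([8.0, 0.0, 0.0]))  # peaked output
uncertain = predictive_entropy(np.array([1.0, 1.0, 1.0]))  # near-uniform
```

For a 3-class uniform output the entropy approaches its maximum, ln 3, while a peaked output scores near zero; thresholding this value is one way to flag low-confidence cases for human review.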

10.1088/1361-6560/ad611d article EN cc-by Physics in Medicine and Biology 2024-07-09

Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. However, the non-interpretable, black-box nature of this prediction pipeline can be a challenge, especially in critical domains such as healthcare, finance, and security. In this paper, we...

10.48550/arxiv.2412.14097 preprint EN arXiv (Cornell University) 2024-12-18

Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, hindering its efficacy in addressing the multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of OOD shifts and guide effective model adaptation. Our framework capitalizes on freely available unlabeled data in the wild that captures...

10.48550/arxiv.2408.07772 preprint EN arXiv (Cornell University) 2024-08-14

Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite their promise, distance-based methods suffer from the curse-of-dimensionality problem, which limits their efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training,...

10.48550/arxiv.2312.14452 preprint EN other-oa arXiv (Cornell University) 2023-01-01