- Anomaly Detection Techniques and Applications
- Domain Adaptation and Few-Shot Learning
- Text and Document Classification Technologies
- Data Stream Mining Techniques
- Advanced Image and Video Retrieval Techniques
- Machine Learning and Data Classification
- Machine Learning in Healthcare
- Advanced Neural Network Applications
- Advanced Chemical Sensor Technologies
- Meteorological Phenomena and Simulations
- Artificial Intelligence in Healthcare and Education
- Air Quality Monitoring and Forecasting
- Bayesian Modeling and Causal Inference
- Advanced Statistical Process Monitoring
- Water Systems and Optimization
- Visual Attention and Saliency Detection
- Multimodal Machine Learning Applications
- Fault Detection and Control Systems
- Data Management and Algorithms
- Advanced Sensor and Control Systems
- Infrastructure Maintenance and Monitoring
- Machine Learning and Algorithms
- Adversarial Robustness in Machine Learning
- Data-Driven Disease Surveillance
- Smart Grid and Power Systems
- University of Wisconsin–Madison (2023-2024)
- State Grid Corporation of China (2024)
Partial label learning (PLL) is an important problem that allows each training example to be labeled with a coarse candidate set in which the ground truth is included. However, in a more practical but challenging scenario, the annotator may miss the ground truth and provide a wrong candidate set, which is known as the noisy PLL problem. To remedy this problem, we propose the PiCO+ framework, which simultaneously disambiguates candidate sets and mitigates label noise. At the core of PiCO+, we develop a novel disambiguation algorithm, PiCO, that consists of a contrastive learning module along with a class...
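The snippet names the key ingredients of PiCO: contrastive learning plus class prototypes. Below is a minimal sketch of prototype-based candidate-set disambiguation under illustrative assumptions; the function, the moving-average update, and all hyperparameters are hypothetical stand-ins, not the released PiCO+ code.

```python
# Hypothetical sketch of prototype-based disambiguation for partial labels.
import numpy as np

def disambiguate(embedding, candidate_set, prototypes, momentum=0.99):
    """Pick the candidate label whose class prototype is most similar to the
    example's L2-normalized embedding, then refresh that prototype with a
    moving-average update (an assumption, not PiCO's exact rule)."""
    z = embedding / np.linalg.norm(embedding)
    sims = {c: float(z @ prototypes[c]) for c in candidate_set}
    label = max(sims, key=sims.get)                 # disambiguated pseudo-label
    proto = momentum * prototypes[label] + (1 - momentum) * z
    prototypes[label] = proto / np.linalg.norm(proto)
    return label

# Toy usage: 3 classes, 5-d embeddings, candidate set {0, 2}.
rng = np.random.default_rng(0)
prototypes = {}
for c in range(3):
    v = rng.normal(size=5)
    prototypes[c] = v / np.linalg.norm(v)
print(disambiguate(rng.normal(size=5), {0, 2}, prototypes))
```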
Machine learning models deployed in the wild can be challenged by out-of-distribution (OOD) data from unknown classes. Recent advances in OOD detection rely on distance measures to distinguish samples that are relatively far away from the in-distribution (ID) data. Despite their promise, distance-based methods suffer from the curse-of-dimensionality problem, which limits their efficacy in high-dimensional feature space. To combat this problem, we propose a novel framework, Subspace Nearest Neighbor (SNN), for OOD detection. In training,...
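The snippet cuts off before the method details, so the sketch below shows only the generic distance-based score such methods build on: a negative k-th nearest-neighbor distance in normalized feature space. SNN's subspace construction is not reproduced here; this is an illustration of the baseline, not the paper's algorithm.

```python
# Generic k-NN OOD score (baseline illustration, not SNN itself).
import numpy as np

def knn_ood_score(train_feats, test_feat, k=5):
    """Higher score = more ID-like. Features are L2-normalized so Euclidean
    distance is monotone in cosine similarity."""
    bank = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    z = test_feat / np.linalg.norm(test_feat)
    dists = np.linalg.norm(bank - z, axis=1)
    return -np.sort(dists)[k - 1]          # negative k-th nearest distance

# Toy usage: tightly clustered ID features vs. a random OOD direction.
rng = np.random.default_rng(0)
mean = np.ones(64)
id_feats = mean + 0.1 * rng.normal(size=(1000, 64))
print(knn_ood_score(id_feats, mean + 0.1 * rng.normal(size=64)))  # near 0
print(knn_ood_score(id_feats, rng.normal(size=64)))               # much lower
```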
Multimodal large language models (MLLMs) have shown promising capabilities but struggle under distribution shifts, where evaluation data differ from the instruction-tuning distributions. Although previous works have provided empirical evaluations, we argue that establishing a formal framework that can characterize and quantify the risk of MLLMs is necessary to ensure their safe and reliable application in the real world. By taking an information-theoretic perspective, we propose the first theoretical framework that enables the quantification...
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm...
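As one concrete illustration of the CLIP-era paradigm, the sketch below scores a test image by its maximum softmax-scaled similarity to text embeddings of the ID class names. This is a generic zero-shot scheme, not necessarily this paper's framework; the feature arrays stand in for a real VLM encoder.

```python
# Zero-shot, CLIP-style OOD scoring sketch; embeddings are placeholders.
import numpy as np

def zero_shot_ood_score(img_feat, class_text_feats, temperature=0.01):
    """Max softmax probability over ID class prompts; low values suggest OOD."""
    z = img_feat / np.linalg.norm(img_feat)
    T = class_text_feats / np.linalg.norm(class_text_feats, axis=1, keepdims=True)
    logits = (T @ z) / temperature
    probs = np.exp(logits - logits.max())
    return (probs / probs.sum()).max()

# Toy usage with random arrays in place of encoded prompts/images.
rng = np.random.default_rng(0)
text_feats = rng.normal(size=(10, 512))    # 10 ID class prompts, embedded
print(zero_shot_ood_score(rng.normal(size=512), text_feats))
```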
Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework, HYPO (HYPerspherical OOD generalization), that provably learns domain-invariant representations in a hyperspherical space. In particular, our algorithm is guided by intra-class variation and inter-class...
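A minimal sketch of the two forces the snippet names, under an assumed prototype formulation: a compactness term pulling each normalized embedding toward its class prototype (low intra-class variation) and a separation term pushing prototypes apart (high inter-class separation). HYPO's exact objective may differ.

```python
# Illustrative hyperspherical loss: compactness + prototype separation.
import numpy as np

def hyperspherical_loss(Z, y, P, tau=0.1):
    """Z: (n, d) embeddings, y: (n,) labels, P: (C, d) class prototypes."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # project to the sphere
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    logits = Z @ P.T / tau                             # similarity to prototypes
    logits -= logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    intra = -logp[np.arange(len(y)), y].mean()         # pull to own prototype
    inter = (P @ P.T)[np.triu_indices(len(P), k=1)].mean()  # push apart
    return intra + inter

rng = np.random.default_rng(0)
Z, y, P = rng.normal(size=(8, 16)), rng.integers(0, 3, 8), rng.normal(size=(3, 16))
print(hyperspherical_loss(Z, y, P))
```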
The emergence of Data-centric AI (DCAI) represents a pivotal shift in AI development, redirecting focus from model refinement to prioritizing data quality. This paradigmatic transition emphasizes the critical role of data in AI. While past approaches centered on refining models, they often overlooked potential data imperfections, raising questions about the true source of enhanced performance. DCAI advocates the systematic engineering of data, complementing existing model-centric efforts and playing a vital role in driving AI success. This shift has spurred innovation...
Using unlabeled data to regularize machine learning models has demonstrated promise for improving safety and reliability in detecting out-of-distribution (OOD) data. Harnessing the power of unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and OOD data. The lack of a clean set of OOD samples poses significant challenges in learning an optimal OOD classifier. Currently, there is little research on formally understanding how unlabeled data helps OOD detection. This paper bridges the gap by introducing a new learning framework, SAL (Separate And Learn), that...
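The name suggests a two-stage pipeline. Below is an illustrative sketch of the "separate" stage: score wild samples by their projection onto the top singular direction of a per-sample matrix and flag the largest scores as candidate outliers for a downstream binary classifier. SAL's filtering reportedly operates on gradient vectors; plain centered features are used here as a stand-in.

```python
# Sketch of singular-vector filtering for wild data (features stand in for
# the gradient vectors a faithful implementation would use).
import numpy as np

def separate(wild_feats, quantile=90):
    X = wild_feats - wild_feats.mean(axis=0)           # center the matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = np.abs(X @ Vt[0])                         # top singular direction
    return scores > np.percentile(scores, quantile)    # candidate outliers

# Toy usage: 900 ID-like rows plus 100 shifted rows.
rng = np.random.default_rng(0)
wild = np.vstack([rng.normal(size=(900, 32)),
                  rng.normal(loc=4.0, size=(100, 32))])
mask = separate(wild)
print("flagged:", mask.sum(), "| true outliers caught:", mask[900:].sum())
```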
Supervised learning aims to train a classifier under the assumption that training and test data are from the same distribution. To ease the above assumption, researchers have studied a more realistic setting: out-of-distribution (OOD) detection, where test data may come from classes that are unknown during training (i.e., OOD data). Due to the unavailability and diversity of OOD data, good generalization ability is crucial for effective OOD detection algorithms, and the corresponding learning theory is still an open problem. To study this problem, this paper investigates the probably...
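The snippet truncates mid-sentence, plausibly at "probably approximately correct (PAC)". For orientation, here is a hedged LaTeX sketch of what a PAC-style learnability condition for OOD detection looks like; the notation is illustrative, not the paper's exact definition.

```latex
% Illustrative PAC-style condition (notation assumed, not the paper's):
% a family of distributions \mathcal{D} is learnable for OOD detection if
% some algorithm \mathbf{A} and sample bound m(\epsilon,\delta) satisfy
\Pr_{S \sim D^{m}}\!\left[ R_D\big(\mathbf{A}(S)\big)
  \le \inf_{h \in \mathcal{H}} R_D(h) + \epsilon \right] \ge 1 - \delta,
\quad \forall D \in \mathcal{D},\ \forall \epsilon, \delta \in (0,1),\ m \ge m(\epsilon,\delta).
```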
The surge in applications of large language models (LLMs) has prompted concerns about the generation of misleading or fabricated information, known as hallucinations. Therefore, detecting hallucinations has become critical to maintaining trust in LLM-generated content. A primary challenge in learning a truthfulness classifier is the lack of a large amount of labeled truthful and hallucinated data. To address the challenge, we introduce HaloScope, a novel learning framework that leverages unlabeled LLM generations in the wild for hallucination...
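A loose sketch of mining weak supervision from unlabeled generations: score each generation's hidden representation by its energy in a top principal subspace, then use extreme scores as pseudo-labels for a small truthfulness classifier. The subspace rank, thresholds, and logistic head are assumptions, not HaloScope's actual design.

```python
# Hypothetical pseudo-labeling from unlabeled generation representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mine_and_train(H, k=2, lo=25, hi=75):
    """H: (n, d) hidden representations of unlabeled LLM generations."""
    Hc = H - H.mean(axis=0)
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    scores = np.linalg.norm(Hc @ Vt[:k].T, axis=1)     # top-k subspace energy
    pos = scores >= np.percentile(scores, hi)          # pseudo-hallucinated
    neg = scores <= np.percentile(scores, lo)          # pseudo-truthful
    X = np.vstack([H[pos], H[neg]])
    y = np.r_[np.ones(pos.sum()), np.zeros(neg.sum())]
    return LogisticRegression(max_iter=1000).fit(X, y)

clf = mine_and_train(np.random.default_rng(0).normal(size=(500, 64)))
```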
Detecting data points deviating from the training distribution is pivotal for ensuring reliable machine learning. Extensive research has been dedicated to this challenge, spanning classical anomaly detection techniques to contemporary out-of-distribution (OOD) detection approaches. While OOD detection commonly relies on supervised learning from a labeled in-distribution (ID) dataset, anomaly detection may treat the entire ID data as a single class and disregard ID labels. This fundamental distinction raises a significant question that has yet to be rigorously...
Deep learning models that aid in medical image assessment tasks must be both accurate and reliable to be deployed within clinical settings. While deep learning models have been shown to be highly accurate across a variety of tasks, measures that indicate the reliability of these models are less established. Increasingly, uncertainty quantification (UQ) methods are being introduced to inform users about the reliability of model outputs. However, most existing methods cannot augment previously validated models because they are not post hoc, and they change the model's output. In this work, we...
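To make "post hoc UQ that leaves a validated model's output unchanged" concrete, here is one standard such technique, split conformal prediction (named plainly as a swapped-in example, not necessarily this work's method): it calibrates a threshold on held-out data and wraps any trained classifier without altering its predictions.

```python
# Split conformal prediction: post hoc, model-agnostic uncertainty sets.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Calibrate on held-out data; nonconformity = 1 - p(true class)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level)

def prediction_set(test_probs, q):
    """All classes whose nonconformity is within q (~1-alpha coverage)."""
    return np.where(1.0 - test_probs <= q)[0]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)
cal_labels = rng.integers(0, 5, size=200)
q = conformal_threshold(cal_probs, cal_labels)
print(prediction_set(rng.dirichlet(np.ones(5)), q))
```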
Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. However, the non-interpretable, black-box nature of this prediction pipeline can be a challenge, especially in critical domains such as healthcare, finance, and security. In this paper, we...
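The pipeline described above, frozen FM features followed by a shallow fully-connected head, can be sketched as follows; the ResNet backbone stands in for a foundation model, and the head width and class count are arbitrary choices.

```python
# Frozen backbone + lightweight fine-tuned head (illustrative stand-ins).
import torch
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

num_classes = 10                                   # illustrative task size
backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()                        # expose 2048-d features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False                        # the FM stays frozen

head = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(),
                     nn.Linear(256, num_classes))  # only this part is tuned
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

x = torch.randn(4, 3, 224, 224)                    # dummy batch
with torch.no_grad():
    feats = backbone(x)
logits = head(feats)
```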
Out-of-distribution (OOD) learning often relies heavily on statistical approaches or predefined assumptions about OOD data distributions, which hinders their efficacy in addressing the multifaceted challenges of OOD generalization and OOD detection in real-world deployment environments. This paper presents a novel framework for OOD learning with human feedback, which can provide invaluable insights into the nature of distribution shifts and guide effective model adaptation. Our framework capitalizes on freely available unlabeled data in the wild that captures...