- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Retinoids in leukemia and cellular processes
- Face recognition and analysis
- Face and Expression Recognition
- Anomaly Detection Techniques and Applications
- Privacy-Preserving Technologies in Data
- Domain Adaptation and Few-Shot Learning
- Myeloproliferative Neoplasms: Diagnosis and Treatment
- Neutropenia and Cancer Infections
- Neurological Disorders and Treatments
- Topic Modeling
- Hematological disorders and diagnostics
- Blood Coagulation and Thrombosis Mechanisms
- Hepatitis C virus research
- Biometric Identification and Security
- Acute Myeloid Leukemia Research
- Face Recognition and Perception
- Multiple Myeloma Research and Treatments
- Autoimmune and Inflammatory Disorders Research
- Acute Lymphoblastic Leukemia research
- Artificial Intelligence in Healthcare and Education
- Ethics and Social Impacts of AI
- COVID-19 diagnosis using AI
- Hepatitis B Virus Studies
University of Maryland, College Park
2020-2023
University College London
2019-2021
National Technical University "Kharkiv Polytechnic Institute"
2020
Northern State Medical University
2014
City Clinical Hospital
2014
National Research Center for Hematology Russian Academy of Medical Sciences
2003
Kirov Research Institute of Hematology and Blood Transfusion FMBA
1995-1997
Ministry of Health of the Russian Federation
1997
Data poisoning and backdoor attacks manipulate victim models by maliciously modifying training data. In light of this growing threat, a recent survey of industry professionals revealed heightened fear in the private sector regarding data poisoning. Many previous defenses against these attacks either fail in the face of increasingly strong attacks or significantly degrade performance. However, we find that strong data augmentations, such as mixup and CutMix, can diminish the threat of poisoning without trading off accuracy. We further verify the effectiveness...
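A minimal sketch of the mixup augmentation this abstract refers to, written in PyTorch; the hyperparameters, class count, and the commented usage lines are placeholders rather than the paper's exact setup.

```python
import numpy as np
import torch

def mixup_batch(x, y, num_classes, alpha=1.0):
    """Blend each example with a randomly paired one; a poisoned sample's
    trigger or perturbation is diluted by the clean partner it is mixed with."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = torch.nn.functional.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix

# Usage inside an ordinary training step (model and optimizer assumed to exist):
# x_mix, y_mix = mixup_batch(images, labels, num_classes=10)
# loss = torch.sum(-y_mix * torch.log_softmax(model(x_mix), dim=1), dim=1).mean()
```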
Abstract Background: An accurate and simple risk prediction model that would facilitate earlier detection of pancreatic ductal adenocarcinoma (PDAC) is not available at present. In this study, we compare different algorithms in order to select the best one for constructing a biomarker-based score, PancRISK. Methods: Three hundred seventy-nine patients with measurements of three urine biomarkers (LYVE1, REG1B and TFF1) from retrospectively collected samples, as well as creatinine and age, were randomly split into...
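A hedged sketch of the kind of algorithm comparison the abstract describes: several off-the-shelf classifiers scored by cross-validated ROC-AUC on a biomarker feature table. The synthetic data and the two candidate models are placeholders; this is not the published PancRISK model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 379  # cohort size from the abstract; the values below are synthetic
X = pd.DataFrame({
    "LYVE1": rng.lognormal(size=n), "REG1B": rng.lognormal(size=n),
    "TFF1": rng.lognormal(size=n), "creatinine": rng.normal(1.0, 0.2, n),
    "age": rng.integers(40, 85, n),
})
y = rng.integers(0, 2, n)  # placeholder case/control labels

candidates = {
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean CV ROC-AUC = {auc:.3f}")
```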
Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from the natural-language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In contrast, gradient-boosted decision trees (GBDTs) are typically trained from scratch on each dataset without benefiting from pretraining data...
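For context, a sketch contrasting the two paradigms mentioned here: a GBDT trained from scratch versus TabPFN's fit-free in-context prediction. The toy sklearn dataset is illustrative, and the import assumes the third-party `tabpfn` package and its TabPFNClassifier API are available.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# GBDT: learns its trees from scratch on this one dataset.
gbdt = GradientBoostingClassifier().fit(X_tr, y_tr)
print("GBDT accuracy  :", accuracy_score(y_te, gbdt.predict(X_te)))

# TabPFN: a pretrained transformer whose "fit" merely stores the training set,
# consumed later as in-context examples in a single forward pass.
# Assumes the third-party `tabpfn` package exposes TabPFNClassifier.
try:
    from tabpfn import TabPFNClassifier
    pfn = TabPFNClassifier(device="cpu")
    pfn.fit(X_tr, y_tr)
    print("TabPFN accuracy:", accuracy_score(y_te, pfn.predict(X_te)))
except ImportError:
    print("tabpfn not installed; skipping the in-context baseline")
```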
Prompt engineering has emerged as a powerful technique for optimizing large language models (LLMs) for specific applications, enabling faster prototyping and improved performance, and giving rise to the community's interest in protecting proprietary system prompts. In this work, we explore a novel perspective on prompt privacy through the lens of membership inference. We develop Detective, a statistical method to reliably determine whether a given prompt was used by a third-party model. Our approach relies on a statistical test comparing...
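As a rough illustration of the statistical-test framing (not the paper's actual Detective procedure), the sketch below compares outputs observed from a third-party model against references generated with and without the candidate prompt, using TF-IDF cosine similarity and a Mann-Whitney U test. All strings and the similarity choice are placeholders.

```python
from scipy.stats import mannwhitneyu
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder response sets; in practice these would be sampled completions.
observed = ["the model politely refuses", "a terse, formal answer"]          # from the third-party model
with_prompt = ["a polite refusal in formal tone", "formal and terse reply"]  # generated using the candidate prompt
without_prompt = ["a casual joke", "an emoji-heavy chatty reply"]            # generated without it

vec = TfidfVectorizer().fit(observed + with_prompt + without_prompt)

def sims(a, b):
    return cosine_similarity(vec.transform(a), vec.transform(b)).ravel()

# If the candidate prompt was used, observed outputs should resemble the
# with-prompt references more than the without-prompt ones.
stat, p = mannwhitneyu(sims(observed, with_prompt),
                       sims(observed, without_prompt),
                       alternative="greater")
print(f"U={stat:.1f}, p={p:.3f}  (small p suggests the prompt was used)")
```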
Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we demonstrate that upstream pretraining gives tabular neural networks a decisive advantage over...
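A compact sketch of the tabular transfer-learning recipe this abstract points to: pretrain an MLP trunk on a large upstream table, then attach a fresh head and fine-tune on a small downstream task. The architecture, synthetic data, and training lengths are all placeholders.

```python
import torch
import torch.nn as nn

def mlp_trunk(d_in, d_hidden=64):
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                         nn.Linear(d_hidden, d_hidden), nn.ReLU())

def train(model, X, y, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X), y)
        loss.backward()
        opt.step()

d_in, n_up, n_down = 20, 5000, 100  # upstream is large, downstream is scarce
X_up, y_up = torch.randn(n_up, d_in), torch.randint(0, 5, (n_up,))
X_dn, y_dn = torch.randn(n_down, d_in), torch.randint(0, 2, (n_down,))

# 1) Pretrain trunk + upstream head on the big upstream dataset.
trunk = mlp_trunk(d_in)
train(nn.Sequential(trunk, nn.Linear(64, 5)), X_up, y_up)

# 2) Reuse the trunk, attach a fresh head, fine-tune on the small downstream task.
downstream = nn.Sequential(trunk, nn.Linear(64, 2))
train(downstream, X_dn, y_dn, epochs=20, lr=1e-4)
```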
As the deployment of automated face recognition (FR) systems proliferates, bias in these systems is not just an academic question, but a matter of public concern. Media portrayals often center data imbalance as the main source of bias, i.e., that FR models perform worse on images of non-white people or women because these demographic groups are underrepresented in training data. Recent research paints a more nuanced picture of this relationship. However, previous studies of training data bias have focused exclusively on the verification setting, while...
Data poisoning and backdoor attacks manipulate training data to induce security breaches in a victim model. These attacks can be provably deflected using differentially private (DP) training methods, although this comes with a sharp decrease in model performance. The InstaHide method has recently been proposed as an alternative to DP training that leverages the supposed privacy properties of the mixup augmentation, albeit without rigorous guarantees. In this work, we show that strong augmentations, such as random additive noise, nullify poison...
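A minimal sketch of the random additive-noise augmentation mentioned here, applied per batch before the usual training step; the Gaussian noise distribution, its scale, and the commented usage lines are illustrative assumptions rather than the paper's exact recipe.

```python
import torch

def add_training_noise(x, sigma=0.1):
    """Additive noise applied to training inputs (assumed normalized).
    Sufficiently strong noise can drown out small poisoning perturbations,
    at a modest cost in clean accuracy."""
    return x + sigma * torch.randn_like(x)

# Inside a training loop (model, optimizer, loss_fn assumed to exist):
# noisy = add_training_noise(images, sigma=0.1)
# loss = loss_fn(model(noisy), labels)
```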
Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics and of the difference between models trained using meta-learning and those trained classically. In doing so, we introduce and verify several hypotheses for why meta-learned features are better. Furthermore, we develop a regularizer that boosts standard training routines; in many cases,...
Facial recognition systems are increasingly deployed by private corporations, government agencies, and contractors for consumer services and mass surveillance programs alike. These systems are typically built by scraping social media profiles for user images. Adversarial perturbations have been proposed for bypassing facial recognition systems. However, existing methods fail on full-scale commercial APIs. We develop our own adversarial filter that accounts for the entire image processing pipeline and is demonstrably effective against...
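An illustrative sketch (not the paper's filter) of the generic idea behind such perturbations: gradient steps on the input that push its embedding away from the original, under an L-infinity budget. The stand-in embedding network is an untrained torchvision ResNet chosen purely to keep the sketch self-contained, and the step sizes are arbitrary.

```python
import torch
from torchvision.models import resnet18

# Stand-in for a face-embedding network (assumes a recent torchvision);
# a real attack would target an actual recognition model.
embedder = resnet18(weights=None).eval()

def evade(image, steps=10, eps=8 / 255, step_size=2 / 255):
    """Perturb `image` (1x3xHxW, values in [0,1]) so its embedding moves away
    from the clean embedding, while staying within an L-infinity ball."""
    with torch.no_grad():
        target = embedder(image)
    adv = image.clone()
    for _ in range(steps):
        adv.requires_grad_(True)
        dist = torch.nn.functional.mse_loss(embedder(adv), target)
        grad, = torch.autograd.grad(dist, adv)
        with torch.no_grad():
            adv = adv + step_size * grad.sign()           # maximize embedding distance
            adv = image + (adv - image).clamp(-eps, eps)  # stay inside the budget
            adv = adv.clamp(0, 1)
    return adv.detach()

protected = evade(torch.rand(1, 3, 224, 224))
```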
Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for feature selection consider classical models, toy synthetic datasets, or do not evaluate selectors on the basis of downstream performance. Motivated by...
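A small sketch of the evaluation protocol the abstract argues for: score a feature selector by the downstream model's cross-validated performance on the reduced feature set. The selector, downstream model, and synthetic noisy features are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# A wide table: a few informative features buried among many noisy ones.
X, y = make_classification(n_samples=1000, n_features=200, n_informative=10,
                           n_redundant=20, random_state=0)

baseline = RandomForestClassifier(n_estimators=200, random_state=0)
selected = make_pipeline(SelectKBest(mutual_info_classif, k=20),
                         RandomForestClassifier(n_estimators=200, random_state=0))

# Judge the selector by downstream accuracy, with selection done inside each
# cross-validation fold to avoid leakage.
print("all 200 features :", cross_val_score(baseline, X, y, cv=3).mean().round(3))
print("top-20 selected  :", cross_val_score(selected, X, y, cv=3).mean().round(3))
```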
Much recent research has uncovered and discussed serious concerns of bias in facial analysis technologies, finding performance disparities between groups of people based on perceived gender, skin type, lighting condition, etc. These audits are immensely important and successful at measuring algorithmic bias, but they face two major challenges: they (1) use facial recognition datasets which lack quality metadata, like LFW and CelebA, and (2) do not compare their observed algorithmic biases to the biases of their human alternatives. In this paper, we release...
The results of a cross-sectional study of 224 patients, Arkhangelsk residents, with acute coronary syndrome (ACS) are presented. Features of the disease course, approaches to treatment, and complications in women with ACS under the age of 55, compared with men, were analyzed. It was determined that women, both with ST-segment elevation and without ST elevation, were admitted to hospitals much later. This was caused by inadequate prehospital assessment of the clinical picture and frequent atypical manifestations. In the in-hospital period, women showed greater...
As machine learning algorithms have been widely deployed across applications, many concerns have been raised over the fairness of their predictions, especially in high-stakes settings (such as facial recognition and medical imaging). To respond to these concerns, the community has proposed and formalized various notions of fairness as well as methods for rectifying unfair behavior. While fairness constraints have been studied extensively for classical models, the effectiveness of imposing them on deep neural networks is unclear. In this paper, we observe that...
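One common way to impose a fairness constraint during neural-network training, shown here as a hedged sketch rather than the paper's method: add a demographic-parity penalty (the gap between the groups' mean positive-prediction rates) to the task loss. The batch, model outputs, and penalty weight are placeholders.

```python
import torch
import torch.nn.functional as F

def fair_loss(logits, labels, group, lam=1.0):
    """Cross-entropy plus a demographic-parity penalty.
    `group` holds a 0/1 sensitive-group id per example; the batch is assumed
    to contain members of both groups."""
    task = F.cross_entropy(logits, labels)
    p_pos = torch.softmax(logits, dim=1)[:, 1]  # P(class 1) per example
    gap = (p_pos[group == 0].mean() - p_pos[group == 1].mean()).abs()
    return task + lam * gap

# Example batch: logits from some model, binary labels, binary group labels.
logits = torch.randn(32, 2, requires_grad=True)
labels = torch.randint(0, 2, (32,))
group = torch.randint(0, 2, (32,))
fair_loss(logits, labels, group).backward()
```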
Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data....
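A simplified sketch of the two-model contrast idea (not the exact Binoculars score): score a passage by the ratio of its mean per-token losses under two closely related causal LMs that share a tokenizer. The GPT-2 checkpoints are stand-ins chosen for size, and no decision threshold is implied.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Two closely related models that share one tokenizer (stand-ins for illustration).
tok = AutoTokenizer.from_pretrained("gpt2")
model_a = AutoModelForCausalLM.from_pretrained("gpt2").eval()
model_b = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()

@torch.no_grad()
def log_ppl(model, text):
    ids = tok(text, return_tensors="pt").input_ids
    # Passing labels=ids makes the model return mean next-token cross-entropy.
    return model(ids, labels=ids).loss.item()

def contrast_score(text):
    return log_ppl(model_a, text) / log_ppl(model_b, text)

sample = "The quick brown fox jumps over the lazy dog."
print("contrast score:", round(contrast_score(sample), 3))
# In contrast-based detectors of this kind, lower scores tend to point toward
# machine-generated text; calibrating a threshold is left out of this sketch.
```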
While tabular classification has traditionally relied on from-scratch training, a recent breakthrough called prior-data fitted networks (PFNs) challenges this approach. Similar to large language models, PFNs make use of pretraining and in-context learning to achieve strong performance on new tasks in a single forward pass. However, current PFNs have limitations that prohibit their widespread adoption. Notably, TabPFN achieves strong performance on very small datasets but is not designed to make predictions for datasets of size larger than 1000.
Large language models (LLMs) have been shown to be effective on tabular prediction tasks in the low-data regime, leveraging their internal knowledge and ability to learn from instructions and examples. However, LLMs can fail to generate predictions that satisfy group fairness, that is, to produce equitable outcomes across groups. Critically, conventional debiasing approaches for natural language tasks do not directly translate to mitigating unfairness in tabular settings. In this work, we systematically investigate four empirical approaches to improve...
Large language models (LLMs) exhibit an excellent ability to understand human languages, but do they also understand their own language, which appears as gibberish to us? In this work we delve into this question, aiming to uncover the mechanisms underlying such behavior in LLMs. We employ the Greedy Coordinate Gradient optimizer to craft prompts that compel LLMs to generate coherent responses from seemingly nonsensical inputs. We call these inputs LM Babel, and we systematically study the behavior of LLMs manipulated by such prompts. We find that manipulation efficiency depends on...
Class-imbalanced data, in which some classes contain far more samples than others, is ubiquitous in real-world applications. Standard techniques for handling class imbalance usually work by training on a re-weighted loss or on re-balanced data. Unfortunately, training overparameterized neural networks on such objectives causes rapid memorization of minority-class data. To avoid this trap, we harness meta-learning, which uses both an ''outer-loop'' and an ''inner-loop'' loss, each of which may be balanced using different strategies....
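For reference, the standard re-weighted baseline this abstract mentions, as a short sketch: inverse-frequency class weights passed to a cross-entropy loss. The outer/inner-loop meta-learning scheme itself is not reproduced here; the class counts, batch, and logits are placeholders.

```python
import torch
import torch.nn as nn

# Placeholder class counts for a 3-class, heavily imbalanced problem.
class_counts = torch.tensor([900.0, 80.0, 20.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

# Usage inside an ordinary training step (logits would come from some model):
logits = torch.randn(16, 3, requires_grad=True)
labels = torch.randint(0, 3, (16,))
loss = criterion(logits, labels)   # minority-class mistakes are penalized more
loss.backward()
```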