- Privacy-Preserving Technologies in Data
- Stochastic Gradient Optimization Techniques
- Cryptography and Data Security
- Face and Expression Recognition
- Anomaly Detection Techniques and Applications
- Statistical Methods and Inference
- Imbalanced Data Classification Techniques
- Renal Transplantation Outcomes and Treatments
- Machine Learning and Data Classification
- Traffic Prediction and Management Techniques
- Mobile Crowdsensing and Crowdsourcing
- Gene expression and cancer classification
- Fire Detection and Safety Systems
- Caching and Content Delivery
- Fault Detection and Control Systems
- Privacy, Security, and Data Protection
- Data Stream Mining Techniques
- Blockchain Technology Applications and Security
- Ethics in Clinical Research
- Ethics and Social Impacts of AI
Booz Allen Hamilton (United States)
2023-2024
Johns Hopkins University
2022
There are major efforts underway to make genome sequencing a routine part of clinical practice. A critical barrier to these efforts is achieving practical solutions for data ownership and integrity. Blockchain provides solutions to these challenges in other realms, such as finance. However, its use in genomics is stymied due to the difficulty of storing large-scale data on-chain, slow transaction speeds, and limitations on querying. To overcome these roadblocks, we developed a private blockchain network to store genomic variants and reference-aligned reads...
Existing methods for out-of-distribution (OOD) detection use various techniques to produce a score, separate from classification, that determines how "OOD" an input is. Our insight is that OOD detection can be simplified by using a neural network architecture which can effectively merge classification and OOD detection into a single step. Radial basis function networks (RBFNs) inherently link classification confidence and OOD detection; however, these networks have lost popularity due to the difficulty of training them in a multi-layer fashion. In this work, we...
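To illustrate how a radial basis function layer can tie classification confidence to OOD detection, here is a minimal single-layer sketch. It is not the multi-layer method the abstract goes on to describe: the fixed centers, the `gamma` width, the `threshold`, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_activations(x, centers, gamma=1.0):
    """Gaussian RBF activation of each input row against each class center."""
    # Squared Euclidean distance between every input and every center.
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def classify_with_ood(x, centers, gamma=1.0, threshold=0.5):
    """Return (predicted class, is_ood flag) from a single forward pass."""
    act = rbf_activations(x, centers, gamma)
    pred = act.argmax(axis=1)               # closest center wins
    is_ood = act.max(axis=1) < threshold    # far from every center: flag as OOD
    return pred, is_ood

# Hypothetical two-class setup: one center per class.
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
inputs = np.array([[0.1, -0.1], [3.9, 0.2], [2.0, 30.0]])
pred, is_ood = classify_with_ood(inputs, centers)
```

Because the activation decays with distance from every center, the same number serves as both the class score and the OOD score; the third input above lies far from both centers and is flagged.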
Machine learning is playing an increasingly critical role in health science with its capability of inferring valuable information from high-dimensional data. More training data provides greater statistical power to generate better models that can help decision-making in healthcare. However, this often requires combining research and patient data across institutions and hospitals, which is not always possible due to privacy considerations. In this paper, we outline a simple federated learning algorithm implementing...
LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions. Differentially private versions of LASSO logistic regression have been developed, but generally produce dense solutions, reducing the intrinsic utility of the LASSO penalty. In this paper, we present a differentially private method that maintains hard zeros. Our key insight is to first train a non-private model to determine an appropriate privatized number of non-zero coefficients to use...
Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data analysis, especially in information retrieval problems where n-grams over text with TF-IDF or Okapi feature values are a strong and easy baseline. Over the past decade, screening rules have risen in popularity as a way to reduce the runtime for producing the sparse regression weights of $L_1$ models. However, despite the increasing need for privacy-preserving models in information retrieval, to the best of our knowledge, no differentially private screening rule exists. In...
To the best of our knowledge, there are no methods today for training differentially private regression models on sparse input data. To remedy this, we adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively. In doing so, we reduce the training time of the algorithm from $\mathcal{O}(T D S + T N S)$ to $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$, where $T$ is the number of iterations and $S$ is the sparsity rate of a dataset with $N$ rows and $D$ features. Our results demonstrate that this procedure can reduce runtime by a factor...
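For readers unfamiliar with Frank-Wolfe, the sketch below shows the standard, dense, non-private iteration for least squares over an $L_1$ ball. It is only a baseline illustration: it adds no differential privacy noise and none of the sparse-input bookkeeping the abstract describes, and the name `frank_wolfe_l1`, the `radius` constraint, and the $2/(t+2)$ step size are assumptions of this example.

```python
import numpy as np

def frank_wolfe_l1(X, y, radius=1.0, iters=500):
    """Minimize mean squared error subject to ||w||_1 <= radius."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(iters):
        # Gradient of (1 / 2n) * ||Xw - y||^2; this dense step touches all rows.
        grad = X.T @ (X @ w - y) / n
        # Linear minimization over the L1 ball: pick the vertex on the
        # coordinate with the largest gradient magnitude, opposite in sign.
        j = np.argmax(np.abs(grad))
        vertex = np.zeros(d)
        vertex[j] = -radius * np.sign(grad[j])
        # Standard diminishing step size; the convex combination keeps w feasible.
        step = 2.0 / (t + 2.0)
        w = (1.0 - step) * w + step * vertex
    return w

# Toy problem whose solution has a single non-zero coefficient.
X = np.eye(3)
y = np.array([0.5, 0.0, 0.0])
w = frank_wolfe_l1(X, y)
```

Each iteration moves toward a single vertex of the $L_1$ ball, so only one coordinate is pushed away from zero per step. That one-coordinate-per-step structure is what makes the method a natural fit for sparse solutions.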
As machine learning becomes increasingly prevalent in impactful decisions, recognizing when inference data is outside the model's expected input distribution is paramount for giving context to predictions. Out-of-distribution (OOD) detection methods have been created for this task. Such methods can be split into representation-based or logit-based methods from whether they respectively utilize the model's embeddings or predictions for OOD detection. In contrast to most papers which solely focus on one such group, we address both. We...
Data scientists often seek to identify the most important features in high-dimensional datasets. This can be done through $L_1$-regularized regression, but this can become inefficient for very high-dimensional datasets. Additionally, high-dimensional regression can leak information about individual datapoints in a dataset. In this paper, we empirically evaluate the established baseline method for feature selection with differential privacy, the two-stage selection technique, and show that it is not stable under sparsity. This makes it perform poorly on real-world datasets, so...
In this article, we seek to elucidate challenges and opportunities for differential privacy within the federal government setting, as seen by a team of researchers, lawyers, and data scientists working closely with the U.S. government. After introducing differential privacy, we highlight three significant challenges which currently restrict the use of differential privacy in the U.S. government. We then provide two examples where differential privacy can enhance the capabilities of government agencies. The first example highlights how the quantitative nature of differential privacy allows policy and security officers to release multiple versions...
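As a rough illustration of what "quantitative" means here, the sketch below uses the Laplace mechanism to release the same count at two privacy levels. The count, the epsilon values, and the helper name `laplace_release` are hypothetical, and the article's actual release scenario is not reproduced.

```python
import numpy as np

def laplace_release(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to epsilon."""
    # Smaller epsilon means a larger noise scale and stronger privacy.
    scale = sensitivity / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(0)
true_count = 1000  # hypothetical statistic to publish

# Two releases of the same statistic at different privacy levels.
releases = {eps: laplace_release(true_count, eps, rng) for eps in (0.1, 1.0)}

# Under basic sequential composition, the privacy loss of publishing
# both versions is at most the sum of the individual epsilons.
total_epsilon = sum(releases)
```

Because each release carries an explicit epsilon, the combined privacy cost of several versions can be bounded and tracked as a number.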