Akshay Balsubramani

ORCID: 0000-0003-1545-9837
Research Areas
  • Machine Learning and Data Classification
  • Statistical Methods and Inference
  • Imbalanced Data Classification Techniques
  • Machine Learning and Algorithms
  • Single-cell and spatial transcriptomics
  • Face and Expression Recognition
  • Digital Media Forensic Detection
  • Data Stream Mining Techniques
  • RNA and protein synthesis mechanisms
  • Genomics and Phylogenetic Studies
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Bandit Algorithms Research
  • Statistical Methods in Clinical Trials
  • Domain Adaptation and Few-Shot Learning
  • Advanced Statistical Process Monitoring
  • Markov Chains and Monte Carlo Methods
  • Face recognition and analysis
  • Advanced biosensing and bioanalysis techniques
  • Neural Networks and Applications
  • Congenital heart defects research
  • Bayesian Methods and Mixture Models
  • Optimization and Search Problems
  • Adversarial Robustness in Machine Learning
  • Cancer-related molecular mechanisms research
  • Statistical Mechanics and Entropy

Stanford University
2016-2023

Sanofi (United States)
2023

Stanford Medicine
2020

University of California, San Diego
2015-2016

UC San Diego Health System
2015

We consider a situation in which we see samples in $\mathbb{R}^d$ drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both.

10.48550/arxiv.1501.03796 preprint EN other-oa arXiv (Cornell University) 2015-01-01
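
The incremental scheme described in the abstract above can be illustrated with a minimal sketch of an Oja-style update: the estimate is nudged toward each new sample's direction and renormalized, so only one d-dimensional vector is stored. The step-size schedule `step_scale / n`, the synthetic diagonal covariance, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def oja_top_eigenvector(samples, dim, step_scale=1.0, seed=0):
    """Streaming estimate of the top eigenvector with an Oja-style update.

    Only the current estimate v is kept, so memory is O(dim).
    The schedule step_scale / n is an illustrative assumption.
    """
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for n, x in enumerate(samples, start=1):
        gamma = step_scale / n
        proj = sum(xi * vi for xi, vi in zip(x, v))  # inner product x . v
        # v <- v + gamma * x (x . v), followed by renormalization
        v = normalize([vi + gamma * proj * xi for vi, xi in zip(v, x)])
    return v


def make_samples(count, seed=1):
    """Zero-mean 3-d samples with covariance diag(9, 1, 0.25) (synthetic)."""
    rng = random.Random(seed)
    scales = [3.0, 1.0, 0.5]
    for _ in range(count):
        yield [s * rng.gauss(0.0, 1.0) for s in scales]


estimate = oja_top_eigenvector(make_samples(5000), dim=3)
# The top eigenvector of diag(9, 1, 0.25) is the first coordinate axis,
# so |estimate[0]| measures how well the estimate is aligned with it.
alignment = abs(estimate[0])
```

Because the samples are consumed from a generator, the full data set is never held in memory, which mirrors the O(d) space requirement.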

mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses...

10.1101/2023.09.09.556981 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-09-12
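
To make the claim about the number of possible encodings concrete, the following sketch counts how many distinct coding sequences translate to a given peptide under the standard genetic code. It is an illustration only and not part of CodonBERT; the function names are hypothetical, and codons are written in the DNA alphabet (T in place of U).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), with codons ordered
# TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_counts():
    """Map each amino acid to the number of codons that encode it."""
    counts = {}
    for aa in CODON_TABLE.values():
        counts[aa] = counts.get(aa, 0) + 1
    return counts


def count_encodings(protein):
    """Number of distinct coding sequences that translate to `protein`."""
    counts = synonymous_codon_counts()
    total = 1
    for aa in protein:
        total *= counts[aa]
    return total


# Ten leucines alone admit 6**10 (about 60 million) synonymous sequences.
ten_leucines = count_encodings("L" * 10)
```

The count is a product of per-residue degeneracies, so it grows exponentially with peptide length.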

mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons...

10.1101/gr.278870.123 article EN Genome Research 2024-07-01

We propose a new algorithm for training generative adversarial networks that jointly learns latent codes for both identities (e.g. individual humans) and observations (e.g. specific photographs). By fixing the identity portion of the latent codes, we can generate diverse images of the same subject, and by fixing the observation portion, we can traverse the manifold of subjects while maintaining contingent aspects such as lighting and pose. Our algorithm features a pairwise training scheme in which each sample from the generator consists of two images with a common identity code. Corresponding...

10.48550/arxiv.1705.07904 preprint EN other-oa arXiv (Cornell University) 2017-01-01

The intrinsic DNA sequence preferences and cell type–specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive models of the genomic occupancy of a TF in one species should generalize to closely matched cell types in related species. To assess the viability of cross-species prediction, we train neural networks to discriminate ChIP-seq peak locations from background and evaluate their performance within...

10.1101/gr.275394.121 article EN cc-by-nc Genome Research 2022-01-18

We develop the laws of thermodynamics in terms of general exponential families. By casting learning (log-loss minimization) problems in max-entropy and statistical mechanics terms, we translate thermodynamics results to learning scenarios. We extend the well-known way in which exponential families characterize thermodynamic equilibria. Basic ideas of work and heat, and advanced concepts of cycles and equipartition of energy, find exact and useful counterparts in AI / statistics terms. These have broad implications for quantifying and addressing distribution shift.

10.48550/arxiv.2501.02071 preprint EN arXiv (Cornell University) 2025-01-03
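
The "well-known way" in which exponential families arise from maximum entropy, referred to in the abstract above, can be written out as follows. This is the standard textbook statement, included as background; the symbols $T$, $\theta$, $A$, $\mu$, and $q$ are notation chosen here and are not taken from the paper.

```latex
% Maximum entropy under a moment constraint
\[
\max_{p}\; H(p) = -\int p(x)\,\log p(x)\,dx
\quad\text{subject to}\quad
\mathbb{E}_{p}[T(x)] = \mu, \qquad \int p(x)\,dx = 1 .
\]
% The maximizer is a member of the exponential family with sufficient statistic T
\[
p_\theta(x) = \exp\!\big(\theta^\top T(x) - A(\theta)\big),
\qquad
A(\theta) = \log \int \exp\!\big(\theta^\top T(x)\big)\,dx,
\qquad
\nabla A(\theta) = \mathbb{E}_{p_\theta}[T(x)] = \mu .
\]
% Log-loss of p_theta on data drawn from q; its minimizer matches moments
\[
\mathbb{E}_{x \sim q}\big[-\log p_\theta(x)\big]
= A(\theta) - \theta^\top\, \mathbb{E}_{q}[T(x)] .
\]
```

Taking $T(x)$ to be an energy function $E(x)$ and $\theta = -\beta$ recovers the Boltzmann distribution $p(x) = e^{-\beta E(x)} / Z(\beta)$ with $A = \log Z(\beta)$, which is the link between log-loss minimization and statistical mechanics.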

Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification; across camera traps for wildlife monitoring; and across time...

10.48550/arxiv.2012.07421 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency.

10.1093/bioinformatics/btae342 article EN cc-by Bioinformatics 2024-05-29

We give concentration bounds for martingales that are uniform over finite times and extend classical Hoeffding and Bernstein inequalities. We also demonstrate our concentration bounds to be optimal with a matching anti-concentration inequality, proved using the same method. Together these constitute a finite-time version of the law of the iterated logarithm, and shed light on the relationship between it and the central limit theorem.

10.48550/arxiv.1405.2639 preprint EN other-oa arXiv (Cornell University) 2014-01-01
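
For orientation, the classical results that the abstract above builds on can be stated for a martingale $M_t$ with $M_0 = 0$ and increments bounded by 1. The first two displays are standard; the third only indicates the general shape of a time-uniform bound, with $C$ and $t_0$ left as unspecified placeholder constants rather than the values proved in the paper.

```latex
% Azuma-Hoeffding: holds at a single fixed time t
\[
\Pr\!\Big( |M_t| \ge \sqrt{2\,t\,\log(2/\delta)} \Big) \le \delta .
\]
% Law of the iterated logarithm (simple symmetric random walk): asymptotic only
\[
\limsup_{t \to \infty} \frac{|M_t|}{\sqrt{2\,t\,\log\log t}} = 1
\quad \text{almost surely.}
\]
% Shape of a finite-time, time-uniform bound (C and t_0 are placeholders)
\[
\Pr\!\Big( \forall\, t \ge t_0 :\; |M_t| \le C \sqrt{t\,\big(\log\log t + \log(1/\delta)\big)} \Big) \ge 1 - \delta .
\]
```

The fixed-time bound says nothing about all times at once, and the iterated-logarithm law says nothing at any finite time; a uniform finite-time bound of the third form sits between the two.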

A central remaining question in the post-genomic era is how genes interact to form biological pathways. Measurements of gene dependency across hundreds of cell lines have been used to cluster genes into ‘co-essential’ pathways, but this approach has been limited by ubiquitous false positives. Here, we develop a statistical method that enables robust identification of gene co-essentiality and yields a genome-wide set of functional modules. This almanac recapitulates diverse pathways and protein complexes and predicts...

10.1101/827071 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2019-11-01

Motivation: In silico saturation mutagenesis (ISM) is a popular approach in computational genomics for calculating feature attributions on biological sequences that proceeds by systematically perturbing each position in a sequence and recording the difference in model output. However, this method can be slow because it requires performing a number of forward passes proportional to the length of the sequence being examined. Results: In this work, we propose a modification of ISM that leverages the principles of compressed sensing to require only...

10.1093/bioinformatics/btac385 article EN Bioinformatics 2022-06-09
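
The baseline procedure described in the Motivation above (plain ISM, not the compressed-sensing variant the paper proposes) can be sketched as follows. `toy_model` is a hypothetical stand-in for a trained sequence model, and the call counter is included only to show that the number of forward passes grows linearly with sequence length.

```python
ALPHABET = "ACGT"


def toy_model(seq):
    """Hypothetical stand-in for a trained model: additive per-base score."""
    weights = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}
    return sum(weights[base] for base in seq)


def in_silico_mutagenesis(seq, model):
    """Plain ISM: score every single-base substitution against the reference.

    Returns (scores, n_calls), where scores[i][b] is
    model(seq with base b at position i) - model(seq).
    """
    reference = model(seq)
    n_calls = 1
    scores = []
    for i, original in enumerate(seq):
        row = {}
        for base in ALPHABET:
            if base == original:
                row[base] = 0.0  # unchanged sequence, no forward pass needed
                continue
            mutant = seq[:i] + base + seq[i + 1:]
            row[base] = model(mutant) - reference
            n_calls += 1
        scores.append(row)
    return scores, n_calls


scores, n_calls = in_silico_mutagenesis("ACGT", toy_model)
```

For a sequence of length L over four bases this makes 1 + 3L model calls, which is the cost the abstract identifies as the bottleneck.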

We propose a new algorithmic framework for sequential hypothesis testing with i.i.d. data, which includes A/B testing, nonparametric two-sample testing, and independence testing as special cases. It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required - its stopping time adapts to the unknown difficulty of the problem. All our test statistics...

10.48550/arxiv.1506.03486 preprint EN other-oa arXiv (Cornell University) 2015-01-01
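
The three properties listed in the abstract above (one pass, constant memory, data-dependent stopping) can be illustrated with a rough sketch of a sequential test on a running sum. The boundary has an iterated-logarithm shape, but its form and the constant `scale` are illustrative assumptions and are not calibrated to guarantee a type I error of `alpha`; the function names are hypothetical and this is not the paper's test statistic.

```python
import math


def lil_threshold(t, alpha, scale=3.0):
    """Boundary of the shape sqrt(t * (log log t + log(1 / alpha))).

    `scale` and the exact form are illustrative assumptions, not
    calibrated constants.
    """
    loglog = math.log(math.log(max(t, 3)))
    return math.sqrt(scale * t * (2.0 * loglog + math.log(2.0 / alpha)))


def sequential_mean_test(stream, alpha=0.05, max_samples=10_000):
    """Sequentially test H0: E[z] = 0 for observations z in [-1, 1].

    Keeps only a running sum and a counter (constant space) and stops
    the first time the sum crosses the boundary.
    Returns (rejected, samples_used).
    """
    total = 0.0
    t = 0
    for z in stream:
        t += 1
        total += z
        if abs(total) > lil_threshold(t, alpha):
            return True, t
        if t >= max_samples:
            break
    return False, t


# A strong effect stops early, a weaker one later, and a null stream never stops.
strong = sequential_mean_test([1.0] * 1000)
weak = sequential_mean_test([1.0, 1.0, 1.0, -1.0] * 250)
null = sequential_mean_test([1.0, -1.0] * 500)
```

In an A/B setting with outcomes in [0, 1], each `z` could be the difference between one observation from each arm; the weaker the true difference, the longer the test runs before stopping.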