- Machine Learning in Bioinformatics
- Machine Learning and Algorithms
- Genomics and Phylogenetic Studies
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- RNA and protein synthesis mechanisms
- Natural Language Processing Techniques
- Bioinformatics and Genomic Networks
- Machine Learning in Materials Science
- Machine Learning and Data Classification
- Protein Structure and Dynamics
- Advanced Proteomics Techniques and Applications
- CRISPR and Genetic Engineering
- Microtubule and mitosis dynamics
- Computational Drug Discovery Methods
- Face recognition and analysis
- Face and Expression Recognition
- Photosynthetic Processes and Mechanisms
- AI in cancer detection
- Statistical Methods and Inference
- Protein Tyrosine Phosphatases
- Advanced Multi-Objective Optimization Algorithms
- Phosphodiesterase function and regulation
- Sparse and Compressive Sensing Techniques
- Google (United States), 2017-2025
- DeepMind (United Kingdom), 2024
- Brain (Germany), 2022
- University of Massachusetts Amherst, 2012-2021
- Merck & Co., Inc., Rahway, NJ, USA (United States), 2012
- RTX (United States), 2011
Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to directly predict a variety of protein functions, Enzyme Commission (EC) numbers and Gene Ontology (GO) terms, from an unaligned amino acid sequence. This approach provides precise predictions...
We present a method for synthesizing a frontal, neutral-expression image of a person's face given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous generative approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this invariance, we train our decoder network using only frontal, neutral-expression photographs. Since these photographs are well aligned, we can decompose them into...
Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged, expansive protein search space and costly experiments. In this work, we present TeleProt, an ML framework that blends evolutionary and experimental data to design diverse protein variant libraries, and employ it to improve the catalytic activity of a nuclease enzyme that degrades biofilms that accumulate on chronic wounds. After multiple rounds of high-throughput experiments...
We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using back-propagation to iteratively optimize the energy with respect to the labels. This deep architecture captures dependencies between labels that would lead to intractable graphical models, and performs structure learning by automatically learning discriminative features of the structured output. One natural application of our technique is multi-label...
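The inference step described in the SPEN abstract above (iteratively optimizing an energy with respect to the labels) can be illustrated with a minimal sketch. This is not the authors' implementation: the deep energy network is replaced by a hand-written toy energy with an analytic gradient, the labels are relaxed to the interval [0, 1], and the names `energy`, `energy_grad`, `spen_predict`, `local`, and `pairwise` are hypothetical.

```python
from typing import List


def energy(y: List[float], local: List[float], pairwise: List[List[float]]) -> float:
    """Toy stand-in for a learned energy (assumption, not a deep network).

    Energy is low when labels with high local scores are on and when
    penalized label pairs are not switched on together.
    """
    n = len(y)
    unary = -sum(local[i] * y[i] for i in range(n))
    pair = sum(pairwise[i][j] * y[i] * y[j]
               for i in range(n) for j in range(i + 1, n))
    return unary + pair


def energy_grad(y: List[float], local: List[float],
                pairwise: List[List[float]]) -> List[float]:
    """Analytic gradient of the toy energy with respect to each label.

    Assumes `pairwise` is symmetric. A real SPEN would obtain this
    gradient by back-propagation through the energy network.
    """
    n = len(y)
    return [-local[i] + sum(pairwise[i][j] * y[j] for j in range(n) if j != i)
            for i in range(n)]


def spen_predict(local: List[float], pairwise: List[List[float]],
                 steps: int = 200, lr: float = 0.1) -> List[float]:
    """Projected gradient descent over labels relaxed to [0, 1]."""
    y = [0.5] * len(local)  # start from an uninformative labeling
    for _ in range(steps):
        g = energy_grad(y, local, pairwise)
        # Gradient step, then clip back into the unit box.
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y


# Three labels; labels 0 and 1 are penalized when both are on.
local_scores = [2.0, 1.0, -1.0]
pair_penalty = [[0.0, 3.0, 0.0],
                [3.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]]
prediction = spen_predict(local_scores, pair_penalty)
```

In this toy setting the pairwise penalty makes labels 0 and 1 compete, so the optimizer keeps the label with the higher local score and turns the other off. That is a small-scale picture of the label dependencies the abstract refers to.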
Highlights:
- TeleProt is a method for combining evolutionary and assay data to design novel proteins
- TeleProt achieved an improved hit rate and diversity compared with directed evolution
- TeleProt discovered a nuclease enzyme with 11-fold-improved specific activity
- Zero-shot design showed a higher hit rate relative to error-prone PCR

Summary: Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology, but optimization is often hindered by a rugged fitness landscape and costly experiments. In this work, we...
Structured Prediction Energy Networks (SPENs) are a simple, yet expressive family of structured prediction models (Belanger and McCallum, 2016). An energy function over candidate structured outputs is given by a deep network, and predictions are formed by gradient-based optimization. This paper presents end-to-end learning for SPENs, where the energy function is discriminatively trained by back-propagating through gradient-based prediction. In our experience, the approach is substantially more accurate than the structured SVM method of Belanger and McCallum (2016), as it allows...
Mapping a protein sequence to its underlying biological function is a critical problem of increasing importance in biology. In this work, we propose ProtEx, a retrieval-augmented approach for protein function prediction that leverages exemplars from a database to improve accuracy and robustness and enable generalization to unseen classes. Our approach relies on a novel multi-sequence pretraining task, and a fine-tuning strategy that effectively conditions predictions on retrieved exemplars. Our method achieves state-of-the-art results across...
Advancements in DNA synthesis and sequencing technologies have enabled a novel paradigm of protein design where machine learning (ML) models trained on experimental data are used to guide exploration of a protein fitness landscape. ML-guided directed evolution (MLDE) builds on the success of traditional directed evolution and unlocks strategies which make more efficient use of experimental data. Building an MLDE pipeline involves many design choices across the design-build-test-learn loop ranging from data collection to modeling, each of which has a large impact...
Recent work demonstrated that a large ensemble of convolutional neural networks (CNNs) outperforms industry-standard approaches at annotating protein sequences that are far from the training data. These results highlight the potential of deep learning to significantly advance protein sequence annotation, but this particular system is not a practical tool for many biologists because of the computational burden of making predictions using a large ensemble. In this work, we fine-tune a transformer model that is pre-trained on millions of unlabeled...
Today when many practitioners run basic NLP on the entire web and large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit GPU parallelism, limiting their computational efficiency. This...
The study and treatment of cancer is traditionally specialized to the cancer's primary site of origin. However, certain phenotypes are shared across cancer types and have important implications for clinical care. To date, automating the identification of these characteristics from routine clinical data, irrespective of the type of cancer, is impaired by tissue-specific variability and limited labeled data. Whole-genome doubling is one such phenotype; whole-genome doubling events occur in nearly every type of cancer and have significant prognostic implications. Using digitized...
Discrete black-box optimization problems are challenging for model-based optimization (MBO) algorithms, such as Bayesian optimization, due to the size of the search space and the need to satisfy combinatorial constraints. In particular, these methods require repeatedly solving a complex discrete global optimization problem in the inner loop, where popular heuristic inner-loop solvers introduce approximations and are difficult to adapt to combinatorial constraints. In response, we propose NN+MILP, a general discrete MBO framework using piecewise-linear neural networks as surrogate models...