- Topic Modeling
- Natural Language Processing Techniques
- Catalytic Alkyne Reactions
- Machine Learning in Bioinformatics
- Machine Learning and Algorithms
- Machine Learning in Materials Science
- Microtubule and mitosis dynamics
- Domain Adaptation and Few-Shot Learning
- Cancer-related Molecular Pathways
- Handwritten Text Recognition Techniques
- Generative Adversarial Networks and Image Synthesis
- Genomics and Phylogenetic Studies
- Semantic Web and Ontologies
- Synthetic Organic Chemistry Methods
- Advanced Image Processing Techniques
- Machine Learning and Data Classification
- Algorithms and Data Compression
- Data Quality and Management
- Advanced Multi-Objective Optimization Algorithms
- Cyclopropane Reaction Mechanisms
- Asymmetric Synthesis and Catalysis
- Mass Spectrometry Techniques and Applications
- Asymmetric Hydrogenation and Catalysis
- Catalytic C–H Functionalization Methods
- Protein Structure and Dynamics
Google (United States)
2016-2021
Ghent University
2021
University of Massachusetts Amherst
2014-2020
Stevens Institute of Technology
2019-2020
Novartis (United States)
2018-2019
Buckingham Browne & Nichols
2012
Merck & Co., Inc., Rahway, NJ, USA (United States)
2010-2011
RTX (United States)
2010-2011
Chevron (Netherlands)
2006
Montana State University
1998-2003
Today, when many practitioners run basic NLP on the entire web and on large-volume traffic, faster methods are paramount to saving time and energy costs. Recent advances in GPU hardware have led to the emergence of bi-directional LSTMs as a standard method for obtaining per-token vector representations serving as input to labeling tasks such as NER (often followed by prediction in a linear-chain CRF). Though expressive and accurate, these models fail to fully exploit parallelism, limiting their computational efficiency. This...
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type–specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random features approach (FAVOR+), which may be of independent interest for scalable kernel methods. FAVOR+ also...
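To make the FAVOR+ idea concrete, here is a minimal NumPy sketch, assuming the standard positive random-feature estimator exp(q·k) = E_w[exp(w·q - |q|²/2) · exp(w·k - |k|²/2)] with w ~ N(0, I), and Gaussian projections orthogonalized in blocks by QR. The function names (`orthogonal_gaussian`, `positive_features`, `favor_attention`) are hypothetical, only bidirectional (non-causal) attention is shown, and this is not the Performer reference implementation.

```python
import numpy as np


def orthogonal_gaussian(m, d, rng):
    """Draw m projection rows of dimension d, orthogonal within each block of d rows."""
    blocks = []
    for _ in range((m + d - 1) // d):  # ceil(m / d) blocks
        q, _ = np.linalg.qr(rng.standard_normal((d, d)))
        blocks.append(q.T)  # rows of q.T are orthonormal
    w = np.concatenate(blocks, axis=0)[:m]
    # Rescale each unit row to the norm of an independent Gaussian vector,
    # so every row is still marginally distributed as N(0, I_d).
    norms = np.linalg.norm(rng.standard_normal((m, d)), axis=1)
    return w * norms[:, None]


def positive_features(x, w):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m); every entry is strictly positive."""
    m = w.shape[0]
    sq_norm = 0.5 * np.sum(x**2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - sq_norm) / np.sqrt(m)


def favor_attention(Q, K, V, w):
    """Linear-cost approximation of softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scale = d**-0.25  # (q * scale) . (k * scale) = q . k / sqrt(d)
    q_feat = positive_features(Q * scale, w)  # (L, m)
    k_feat = positive_features(K * scale, w)  # (L, m)
    kv = k_feat.T @ V                          # (m, d_v): the L x L matrix is never formed
    normalizer = q_feat @ k_feat.sum(axis=0)   # (L,): approximate softmax denominators
    return (q_feat @ kv) / normalizer[:, None]


def softmax_attention(Q, K, V):
    """Exact quadratic-cost attention, for comparison."""
    logits = Q @ K.T / np.sqrt(Q.shape[-1])
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return (weights / weights.sum(axis=1, keepdims=True)) @ V


rng = np.random.default_rng(0)
Q = 0.3 * rng.standard_normal((16, 8))
K = 0.3 * rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
w = orthogonal_gaussian(4096, 8, rng)
err = np.max(np.abs(favor_attention(Q, K, V, w) - softmax_attention(Q, K, V)))
print(f"max abs error vs. exact softmax attention: {err:.4f}")
```

Because the keys are summarized into the m × d_v matrix `kv` before queries touch them, the cost grows as O(L·m·d) rather than O(L²·d); the approximation error shrinks as the number of features m grows.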
Rajarshi Das, Arvind Neelakantan, David Belanger, Andrew McCallum. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. 2017.
In a variety of application domains, the content to be recommended to users is associated with text. This includes research papers, movie plot summaries, news articles, blog posts, etc. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen content, and may generalize better, since the factors for all items are produced by a compactly-parametrized model. Previous work has used topic...
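A minimal sketch of the text-to-factors idea, assuming a linear bag-of-words encoder as the explicit mapping (a deliberately simple, hypothetical stand-in for a learned text encoder) and dot-product scoring between user and item factors. The names `bag_of_words`, `item_factors`, and `rank_items` and all parameter values are illustrative, not taken from the paper.

```python
import numpy as np


def bag_of_words(text, vocab):
    """Count vector over a fixed vocabulary; unknown tokens are ignored."""
    counts = np.zeros(len(vocab))
    for token in text.lower().split():
        if token in vocab:
            counts[vocab[token]] += 1.0
    return counts


def item_factors(text, vocab, W):
    """Explicit text-to-factor mapping: item factors = W @ bag_of_words(text)."""
    return W @ bag_of_words(text, vocab)


def rank_items(user_factors, item_texts, vocab, W):
    """Score every item by <user factors, item factors> and return names, best first."""
    scores = {name: float(user_factors @ item_factors(text, vocab, W))
              for name, text in item_texts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-set parameters (in practice W and the user factors would be learned).
vocab = {"neural": 0, "network": 1, "recipe": 2, "pasta": 3}
W = np.array([[1.0, 1.0, 0.0, 0.0],   # factor 0: machine-learning words
              [0.0, 0.0, 1.0, 1.0]])  # factor 1: cooking words
user = np.array([1.0, 0.2])           # this user mostly likes factor 0

# Neither item needs interaction history: its factors come from its text alone.
new_items = {"paper": "Pruning a neural network", "blog": "A quick pasta recipe"}
print(rank_items(user, new_items, vocab, W))  # ['paper', 'blog']
```

Because item factors are a function of text rather than a free per-item parameter, a brand-new item can be scored immediately, which is the cold-start benefit the abstract describes.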
When confronted with a substance of unknown identity, researchers often perform mass spectrometry on the sample and compare the observed spectrum to a library of previously collected spectra to identify the molecule. While popular, this approach will fail for molecules that are not in the existing library. In response, we propose to improve the library's coverage by augmenting it with synthetic spectra predicted from candidate molecules using machine learning. We contribute a lightweight neural network model that quickly predicts mass spectra for small molecules,...
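The library-lookup step, and why augmenting the library helps, can be sketched as follows. This assumes spectra binned to integer m/z and compared with plain cosine similarity (an assumed scoring function; the abstract does not specify one); the function names and peak lists are hypothetical and made up for illustration.

```python
import numpy as np


def binned(peaks, max_mz=500):
    """Dense intensity vector indexed by integer m/z (peaks above max_mz are dropped)."""
    v = np.zeros(max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            v[idx] += intensity
    return v


def cosine(a, b):
    """Cosine similarity of two non-zero spectra."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def identify(query_peaks, library):
    """Return (best_name, score) for the library spectrum most similar to the query."""
    q = binned(query_peaks)
    scores = {name: cosine(q, binned(peaks)) for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy peak lists of (m/z, intensity); all values are invented for illustration.
measured = {"compound_A": [(41, 100.0), (43, 50.0)]}
predicted = {"compound_B": [(58, 100.0), (43, 30.0)]}  # e.g. output of a spectrum predictor
query = [(58, 90.0), (43, 35.0), (44, 2.0)]

print(identify(query, measured))                   # only a poor match is available
print(identify(query, {**measured, **predicted}))  # augmented library finds compound_B
```

With only the measured library, the best hit is a poor match; adding a predicted spectrum for the right candidate lets the same lookup succeed.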
Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, Andrew McCallum. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
Permutations and matchings are core building blocks in a variety of latent variable models, as they allow us to align, canonicalize, and sort data. Learning in such models is difficult, however, because exact marginalization over these combinatorial objects is intractable. In response, this paper introduces a collection of new methods for end-to-end learning that approximate discrete maximum-weight matching using the continuous Sinkhorn operator. Sinkhorn iteration is attractive because it functions as a simple, easy-to-implement...
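A minimal NumPy sketch of the Sinkhorn operator, assuming the common formulation: repeatedly normalize the rows and then the columns of exp(X/τ) for a temperature τ > 0 (the temperature and all function names are assumptions of this sketch). As τ shrinks toward 0, the result approaches the permutation matrix of the maximum-weight matching, which is computed here by brute force for comparison. Gumbel noise and any learning are omitted.

```python
import itertools

import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dimensions."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))


def sinkhorn(X, tau=1.0, n_iters=100):
    """Approximately doubly-stochastic matrix obtained from exp(X / tau).

    Alternately normalizes rows and columns; working in log space keeps
    small temperatures (large X / tau) from overflowing.
    """
    log_alpha = X / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1)  # each row sums to 1
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0)  # each column sums to 1
    return np.exp(log_alpha)


def max_weight_matching(X):
    """Exact maximum-weight permutation by brute force (only viable for tiny n)."""
    n = X.shape[0]
    return max(itertools.permutations(range(n)),
               key=lambda p: sum(X[i, p[i]] for i in range(n)))


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
for tau in (1.0, 0.1):
    # Smaller tau gives a sharper, more nearly permutation-like matrix.
    print(f"tau={tau}\n", np.round(sinkhorn(X, tau=tau, n_iters=500), 2))
print("hard matching (row i -> column):", max_weight_matching(X))
```

Every step is a differentiable elementwise or reduction operation, which is what makes the operator usable as a continuous relaxation of matching inside gradient-based training.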
Image extension models have broad applications in image editing, computational photography and computer graphics. While image inpainting has been extensively studied in the literature, it is challenging to directly apply state-of-the-art inpainting methods to image extension, as they tend to generate blurry or repetitive pixels with inconsistent semantics. We introduce semantic conditioning to the discriminator of a generative adversarial network (GAN) and achieve strong results on image extension, with coherent semantics and visually pleasing colors and textures. We also...
Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate 1/3 of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. In this paper, we explore an alternative methodology based on deep learning that learns the relationship between unaligned amino acid sequences and their functional annotations across all...
Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequences, such as biological sequence analysis, may fall short of meeting these assumptions, precluding exploration of these models. To address this challenge, we...
Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual sequences. Here we instead employ deep convolutional neural networks to predict a variety of protein functions (EC numbers and GO terms) directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and its computational efficiency...
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. Using convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model...
Compared to terminal alkynes, (methylthio)alkynes are generally superior substrates for the thermally promoted, Co₂(CO)₈-catalyzed Pauson-Khand reaction of enynes and allenynes, providing enones in higher yields with enhanced diastereoselectivity. Improvements in yield are dependent upon the use of 2,2,2-trifluoroethanol as a co-solvent. An apparent preference for endo selectivity with (ethoxy)alkynes is also disclosed.
The choice of electronic environment about the metal atom is crucial to the observed selectivity in the Rh(I)-catalysed [4 + 2] cycloaddition. An account of the influence of the counterion on rate, diastereo-, enantio- and product selectivity is described.
Recently, there has been great interest in learning how to best represent proteins, specifically with fixed-length embeddings. Deep learning has become a popular tool for protein representation, as the model's hidden layers produce potentially useful vector embeddings. TAPE introduced a number of benchmark tasks and showed that semi-supervised learning, via pretraining language models on a large protein corpus, improved performance on downstream tasks. Two of the tasks (fluorescence prediction and stability prediction) involve fitness landscapes...