- Genomics and Phylogenetic Studies
- Machine Learning in Bioinformatics
- Genomics and Chromatin Dynamics
- Topic Modeling
- RNA and protein synthesis mechanisms
- Natural Language Processing Techniques
- Adversarial Robustness in Machine Learning
- Text and Document Classification Technologies
- Protein Structure and Dynamics
- Multimodal Machine Learning Applications
- Algorithms and Data Compression
- Genetics, Bioinformatics, and Biomedical Research
- AI in cancer detection
- Gene expression and cancer classification
- Advanced Malware Detection Techniques
- Advanced Control Systems Optimization
- Cell Image Analysis Techniques
- Epigenetics and DNA Methylation
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Machine Learning and Data Classification
- Machine Learning in Materials Science
- Generative Adversarial Networks and Image Synthesis
- Handwritten Text Recognition Techniques
- Network Packet Processing and Optimization
University of Virginia
2016-2021
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to a black-box attack, which is a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We develop scoring strategies to find the most important words to modify such that the deep classifier makes a wrong prediction. Simple character-level transformations are...
Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing 'epigenetic drugs' for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework...
Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels and visual features from a convolutional neural network. A key ingredient of our method is a label mask training...
Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature may be particularly well-suited to deep learning techniques. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will transform these tasks or if the biomedical sphere poses unique challenges....
State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the model and follows some linguistic constraints. We then analyze the outputs of two state-of-the-art synonym substitution attacks. We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors. Human surveys reveal that to successfully preserve semantics, we need to significantly...
The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex dependencies among...
The past decade has seen a revolution in genomic technologies that enabled a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex...
This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization driven strategy to extract "motifs", or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, the current well known motifs. In addition, we find that a deeper model consisting of multiple convolutional...
Post-training of language models, either through reinforcement learning, preference optimization or supervised finetuning, tends to sharpen the output probability distribution and reduce the diversity of generated responses. This is particularly a problem for creative generative tasks where varied responses are desired. In this work we introduce Diverse Preference Optimization (DivPO), an...
Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window based multilayer perceptron. Taking inspiration from the image classification domain, we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position...
Motivation: Predictive models of DNA chromatin profile (i.e. epigenetic state), such as transcription factor binding, are essential for understanding regulatory processes and developing gene therapies. It is known that the 3D genome, or spatial structure of DNA, is highly influential in the chromatin profile. Deep neural networks have achieved state of the art performance on chromatin profile prediction by using short windows of DNA sequences independently. These methods, however, ignore the long-range dependencies when predicting...
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We employ scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level...
Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We...
Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations...
Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thoughts. This allows the model to perform reasoning on the fly as it reads the context and even integrate previous reasoning steps, thus enhancing its memory with useful information and enabling multi-step reasoning....
Recent progress in using machine learning models for reasoning tasks has been driven by novel model architectures, large-scale pre-training protocols, and dedicated reasoning datasets for fine-tuning. In this work, to further pursue these advances, we introduce a new data generator that integrates with an embodied agent. The generated data consists of templated text queries and answers, matched with world-states encoded into a database. The world-states are a result of both world dynamics and the actions of the agent. We show the results of several baseline models on...
Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change protein interactions with host proteins.
Through sequence-based classification, this paper tries to accurately predict the DNA binding sites of transcription factors (TFs) in an unannotated cellular context. Related methods in the literature fail to perform such predictions accurately, since they do not consider the sample distribution shift of sequence segments from an annotated (source) context to an unannotated (target) context. We, therefore, propose a method called "Transfer String Kernel" (TSK) that achieves improved prediction of transcription factor binding sites (TFBS) using knowledge transfer...