NFDI4DS | UHH-SEMS - Publication Details

Black-Box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

OPENALEX - Publications

Ji Gao Jack Lanchantin Mary Lou Soffa Yanjun Qi

Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to a black-box attack, which is a more realistic scenario. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We develop novel scoring strategies to find the most important words to modify such that the deep classifier makes a wrong prediction. Simple character-level transformations are...

10.1109/spw.2018.00016 article EN 2018-05-01

DeepChrome: deep-learning for predicting gene expression from histone modifications

Ritambhara Singh Jack Lanchantin Gabriel Robins Yanjun Qi

Histone modifications are among the most important factors that control gene regulation. Computational methods that predict gene expression from histone modification signals are highly desirable for understanding their combinatorial effects in gene regulation. This knowledge can help in developing 'epigenetic drugs' for diseases like cancer. Previous studies for quantifying the relationship between histone modifications and gene expression levels either failed to capture combinatorial effects or relied on multiple methods that separate predictions and combinatorial analysis. This paper develops a unified discriminative framework...

10.1093/bioinformatics/btw427 article EN cc-by-nc Bioinformatics 2016-08-29

General Multi-label Image Classification with Transformers

Jack Lanchantin Tianlu Wang Vicente Ordóñez Yanjun Qi

Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels, and visual features from a convolutional neural network. A key ingredient of our method is a label mask training...

10.1109/cvpr46437.2021.01621 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Opportunities and obstacles for deep learning in biology and medicine

Travers Ching Daniel Himmelstein Brett K. Beaulieu‐Jones Alexandr A. Kalinin T. Brian and 31 more

Deep learning, which describes a class of machine learning algorithms, has recently shown impressive results across a variety of domains. Biology and medicine are data rich, but the data are complex and often ill-understood. Problems of this nature may be particularly well-suited to deep learning techniques. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes, and treatment of patients) and discuss whether deep learning will transform these tasks or if the biomedical sphere poses unique challenges....

10.1101/142760 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2017-05-28

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Jack Lanchantin Ritambhara Singh Beilun Wang Yanjun Qi

10.1142/9789813207813_0025 article EN Biocomputing 2016-11-22

Reevaluating Adversarial Examples in Natural Language

John X. Morris Eli Lifland Jack Lanchantin Yangfeng Ji Yanjun Qi

State-of-the-art attacks on NLP models lack a shared definition of what constitutes a successful attack. We distill ideas from past work into a unified framework: a successful natural language adversarial example is a perturbation that fools the model and follows some linguistic constraints. We then analyze the outputs of two state-of-the-art synonym substitution attacks. We find that their perturbations often do not preserve semantics, and 38% introduce grammatical errors. Human surveys reveal that to successfully preserve semantics, we need to significantly...

10.18653/v1/2020.findings-emnlp.341 preprint EN cc-by 2020-01-01

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

Ritambhara Singh Jack Lanchantin Arshdeep Sekhon Yanjun Qi

The past decade has seen a revolution in genomic technologies that enable a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex dependencies among...

10.48550/arxiv.1708.00339 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin

Ritambhara Singh Jack Lanchantin Arshdeep Sekhon Yanjun Qi

The past decade has seen a revolution in genomic technologies that enabled a flood of genome-wide profiling of chromatin marks. Recent literature tried to understand gene regulation by predicting gene expression from large-scale chromatin measurements. Two fundamental challenges exist for such learning tasks: (1) genome-wide chromatin signals are spatially structured, high-dimensional and highly modular; and (2) the core aim is to understand what the relevant factors are and how they work together. Previous studies either failed to model complex...

10.1101/329334 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2018-05-25

Deep Motif: Visualizing Genomic Sequence Classifications

Jack Lanchantin Ritambhara Singh Zeming Lin Yanjun Qi

This paper applies a deep convolutional/highway MLP framework to classify genomic sequences on the transcription factor binding site task. To make the model understandable, we propose an optimization driven strategy to extract "motifs", or symbolic patterns which visualize the positive class learned by the network. We show that our system, Deep Motif (DeMo), extracts motifs that are similar to, and in some cases outperform, the current well known motifs. In addition, we find that a deeper model consisting of multiple convolutional...

10.48550/arxiv.1605.01133 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Diverse Preference Optimization

Jack Lanchantin Angelica Chen Shehzaad Dhuliawala Ping Yu Jason Weston and 2 more

Post-training of language models, either through reinforcement learning, preference optimization or supervised finetuning, tends to sharpen the output probability distribution and reduce the diversity of generated responses. This is particularly a problem for creative generative tasks where varied responses are desired. In this work we introduce Diverse Preference Optimization (DivPO), an...

10.48550/arxiv.2501.18101 preprint EN arXiv (Cornell University) 2025-01-29

MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-Based Protein Structure Prediction

Zeming Lin Jack Lanchantin Yanjun Qi

Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window based multilayer perceptron. Taking inspiration from the image classification domain, we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position...

10.1609/aaai.v30i1.10007 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2016-02-21

MUST-CNN: A Multilayer Shift-and-Stitch Deep Convolutional Architecture for Sequence-based Protein Structure Prediction

Zeming Lin Jack Lanchantin Yanjun Qi

Predicting protein properties such as solvent accessibility and secondary structure from its primary amino acid sequence is an important task in bioinformatics. Recently, a few deep learning models have surpassed the traditional window based multilayer perceptron. Taking inspiration from the image classification domain, we propose a deep convolutional neural network architecture, MUST-CNN, to predict protein properties. This architecture uses a novel multilayer shift-and-stitch (MUST) technique to generate fully dense per-position...

10.48550/arxiv.1605.03004 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Graph convolutional networks for epigenetic state prediction using both sequence and 3D genome data

Jack Lanchantin Yanjun Qi

Motivation: Predictive models of DNA chromatin profile (i.e. epigenetic state), such as transcription factor binding, are essential for understanding regulatory processes and developing gene therapies. It is known that the 3D genome, or spatial structure of DNA, is highly influential in the chromatin profile. Deep neural networks have achieved state of the art performance on chromatin profile prediction by using short windows of DNA sequences independently. These methods, however, ignore the long-range dependencies when predicting...

10.1093/bioinformatics/btaa793 article EN Bioinformatics 2020-09-07

TOOLVERIFIER: Generalization to New Tools via Self-Verification

Dheeraj Mekala Jason Weston Jack Lanchantin Roberta Raileanu María Lomelí and 2 more

10.18653/v1/2024.findings-emnlp.289 article EN 2024-01-01

Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

Ji Gao Jack Lanchantin Mary Lou Soffa Yanjun Qi

Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios. In this paper, we present a novel algorithm, DeepWordBug, to effectively generate small text perturbations in a black-box setting that forces a deep-learning classifier to misclassify a text input. We employ novel scoring strategies to identify the critical tokens that, if modified, cause the classifier to make an incorrect prediction. Simple character-level...

10.48550/arxiv.1801.04354 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Deep Motif Dashboard: Visualizing and Understanding Genomic Sequences Using Deep Neural Networks

Jack Lanchantin Ritambhara Singh Beilun Wang Yanjun Qi

Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding site (TFBS) classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns, from deep neural network models for TFBS classification. We...

10.48550/arxiv.1608.03644 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Transfer Learning for Predicting Virus-Host Protein Interactions for Novel Virus Sequences

Jack Lanchantin Tom Weingarten Arshdeep Sekhon Clint L. Miller Yanjun Qi

Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations...

10.1101/2020.12.14.422772 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2020-12-15

Learning to Reason and Memorize with Self-Notes

Jack Lanchantin Shubham Toshniwal Jason Weston Arthur Szlam Sainbayar Sukhbaatar

Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thoughts. This allows the model to perform reasoning on the fly as it reads the context and even integrate previous reasoning steps, thus enhancing its memory with useful information and enabling multi-step reasoning....

10.48550/arxiv.2305.00833 preprint EN cc-by arXiv (Cornell University) 2023-01-01

A Data Source for Reasoning Embodied Agents

Jack Lanchantin Sainbayar Sukhbaatar Gabriel Synnaeve Yuxuan Sun Kavya Srinet and 1 more

Recent progress in using machine learning models for reasoning tasks has been driven by novel model architectures, large-scale pre-training protocols, and dedicated reasoning datasets for fine-tuning. In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent. The generated data consists of templated text queries and answers, matched with world-states encoded into a database. The world-states are a result of both world dynamics and the actions of the agent. We show the results of several baseline models on...

10.1609/aaai.v37i7.26017 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Transfer learning for predicting virus-host protein interactions for novel virus sequences

Jack Lanchantin Tom Weingarten Arshdeep Sekhon Clint L. Miller Yanjun Qi

Viruses such as SARS-CoV-2 infect the human body by forming interactions between virus proteins and human proteins. However, experimental methods to find protein interactions are inadequate: large scale experiments are noisy, and small scale experiments are slow and expensive. Inspired by the recent successes of deep neural networks, we hypothesize that deep learning methods are well-positioned to aid and augment biological experiments, hoping to help identify more accurate virus-host protein interaction maps. Moreover, computational methods can quickly adapt to predict how virus mutations change protein interactions with the host proteins.

10.1145/3459930.3469527 article EN 2021-07-30

Transfer String Kernel for Cross-Context DNA-Protein Binding Prediction

Ritambhara Singh Jack Lanchantin Gabriel Robins Yanjun Qi

Through sequence-based classification, this paper tries to accurately predict the DNA binding sites of transcription factors (TFs) in an unannotated cellular context. Related methods in the literature fail to perform such predictions accurately, since they do not consider sample distribution shift of sequence segments from an annotated (source) context to an unannotated (target) context. We, therefore, propose a method called "Transfer String Kernel" (TSK) that achieves improved prediction of transcription factor binding site (TFBS) using knowledge transfer...

10.1109/tcbb.2016.2609918 article EN publisher-specific-oa IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016-09-15