Kyle Hippe

ORCID: 0000-0001-9470-572X
Research Areas
  • Machine Learning in Bioinformatics
  • Genomics and Phylogenetic Studies
  • Protein Structure and Dynamics
  • Scientific Computing and Data Management
  • Computational Drug Discovery Methods
  • Machine Learning in Materials Science
  • RNA and protein synthesis mechanisms
  • Modular Robots and Swarm Intelligence
  • Cell Image Analysis Techniques
  • SARS-CoV-2 and COVID-19 Research
  • Genetics, Bioinformatics, and Biomedical Research
  • Image Retrieval and Classification Techniques
  • Robotics and Automated Systems
  • Advanced Image and Video Retrieval Techniques
  • Single-cell and spatial transcriptomics
  • Bioinformatics and Genomic Networks
  • Microbial Metabolic Engineering and Bioproduction
  • CRISPR and Genetic Engineering
  • Biomedical Text Mining and Ontologies
  • Innovative Microfluidic and Catalytic Techniques Innovation
  • Advanced Electron Microscopy Techniques and Applications
  • Digital Imaging for Blood Diseases
  • Virus-based gene therapy research
  • vaccines and immunoinformatics approaches
  • Color Science and Applications

Argonne National Laboratory
2022-2024

Pacific Lutheran University
2020-2021

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first...

10.1177/10943420231201154 article EN The International Journal of High Performance Computing Applications 2023-10-27

We seek to transform how new and emergent variants of pandemic-causing viruses, specifically SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of SARS-CoV-2 genomes. By pre-training on over 110 million prokaryotic gene sequences and fine-tuning a SARS-CoV-2-specific model on 1.5 million genomes, we show that GenSLMs can accurately and rapidly identify variants of concern. Thus, to our knowledge, GenSLMs represents one of the first...

10.1101/2022.10.10.511571 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-10-11

With new developments in biomedical technology, it is now a viable therapeutic treatment to alter genes with techniques like CRISPR. At the same time, it is increasingly cheaper to perform whole genome sequencing, resulting in rapid advancement of gene therapy and gene editing in precision medicine. Understanding the current industry and academic applications of gene therapy provides an important backdrop to future scientific developments. Additionally, machine learning and artificial intelligence allow for the reduction of time and money spent...

10.2174/1566523221666210622164133 article EN Current Gene Therapy 2021-06-23

Cryo-electron microscopy (cryo-EM) has become a major experimental technique to determine the structures of large protein complexes and molecular assemblies, as evidenced by the 2017 Nobel Prize. Although cryo-EM has been drastically improved to generate high-resolution three-dimensional (3D) maps that contain detailed structural information about macromolecules, the computational methods for using the data to automatically build structure models are lagging far behind. The traditional model building approach is...

10.1002/wcms.1542 article EN Wiley Interdisciplinary Reviews Computational Molecular Science 2021-05-15

Advances in robotic automation, high-performance computing, and artificial intelligence encourage us to propose large, general-purpose science factories with the scale needed to tackle large discovery problems and support thousands of scientists.

10.1039/d3dd00142c article EN cc-by-nc Digital Discovery 2023-01-01

N4-methylcytosine (4mC) is a kind of DNA modification which could regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, the sequences were encoded by k-mer, enhanced nucleic acid composition, and k-spaced nucleic acid pairs. Subsequently, these features were optimized using minimum...

10.3934/mbe.2021167 article EN cc-by Mathematical Biosciences & Engineering 2021-01-01
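
The k-mer encoding named in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `kmer_frequencies`, the default `k = 2`, and the choice to skip windows containing non-ACGT characters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Return the normalized frequency of every possible DNA k-mer in seq.

    Hypothetical helper for illustration; windows that contain characters
    other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    # Slide a window of width k across the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Emit a fixed-length feature vector: one entry per possible k-mer.
    features = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        features[kmer] = counts[kmer] / total if total else 0.0
    return features


freqs = kmer_frequencies("ACGTAC", k=2)
print(len(freqs), freqs["AC"])
```

A fixed-length vector of this kind (4^k entries) is what lets sequences of different lengths be fed to a downstream classifier.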

Advances in robotic automation, high-performance computing (HPC), and artificial intelligence (AI) encourage us to conceive of science factories: large, general-purpose computation- and AI-enabled self-driving laboratories (SDLs) with the generality and scale needed both to tackle large discovery problems and to support thousands of scientists. Science factories require modular hardware and software that can be replicated for scale and (re)configured for many applications. To this end, we propose a prototype science factory architecture...

10.48550/arxiv.2308.09793 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Predicting protein function from sequence is a main challenge in the computational biology field. Traditional methods that search sequences against existing databases may not work well in practice, particularly when little or no homology exists in the database. We introduce the ProLanGO2 method, which utilizes natural language processing and machine learning techniques to tackle the protein function prediction problem with protein sequence as input. Our method has been benchmarked blindly in the latest Critical Assessment of Function Annotation algorithms...

10.1145/3388440.3414701 article EN 2020-09-21

The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. As of CASP14, there are 79 global QA methods and a minority of 39 residue-level QA methods, with very few of them working on protein complexes. Here, we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure/complex prediction at the residue level, which has many applications such as drug discovery. ZoomQA differs from others by considering the change in chemical and physical features of a fragment...

10.1093/bib/bbab384 article EN Briefings in Bioinformatics 2021-09-08

In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science...

10.48550/arxiv.2310.04610 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Self Driving Labs (SDLs) that combine automation of experimental procedures with autonomous decision making are gaining popularity as a means of increasing the throughput of scientific workflows. The task of identifying quantities of supplied colored pigments that match a target color, the color matching problem, provides a simple and flexible SDL test case, as it requires experiment proposal, sample creation, and sample analysis, three common components in autonomous discovery applications. We present a robotic solution to the color matching problem that allows for...

10.1145/3624062.3624615 article EN 2023-11-10

Deduplication is a major focus for assembling and curating training datasets for large language models (LLMs): detecting and eliminating additional instances of the same content in large collections of technical documents. Unrestrained, duplicates in the training dataset increase training costs and lead to undesirable properties such as memorization in trained models or cheating on evaluation. Contemporary approaches to document-level deduplication are often extremely expensive in both runtime and memory. We propose LSHBloom, an extension to MinhashLSH,...

10.48550/arxiv.2411.04257 preprint EN arXiv (Cornell University) 2024-11-06
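
The abstract above builds on MinHash-based deduplication. The sketch below shows only the plain MinHash idea (estimating Jaccard similarity between two documents from short signatures); it is not LSHBloom and not the authors' code. The helper names `shingles`, `minhash_signature`, and `estimated_jaccard`, the 3-word shingle size, and the 64 hash functions are assumptions chosen for illustration.

```python
import hashlib


def shingles(text: str, n: int = 3) -> set[str]:
    """Split text into overlapping n-word shingles (hypothetical helper)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """Keep the minimum hash value under each of num_perm salted hashes."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river"
doc_b = "the quick brown fox jumps over the lazy dog near the bridge"
sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
print(estimated_jaccard(sig_a, sig_b))
```

Near-duplicate documents share most shingles, so most signature slots agree; a deduplication pipeline would index such signatures rather than compare every pair of documents.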

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of AI models across GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions....

10.48550/arxiv.2411.10548 preprint EN arXiv (Cornell University) 2024-11-15

Large language models (LLMs) trained on vast biological datasets can learn motifs and correlations across the evolutionary landscape of natural proteins. LLMs can then be used for de novo design of novel proteins with specific structures, functions, and physicochemical properties. We employ a pre-trained genome-scale language model that uses codons as tokens and integrate it into a workflow for targeted generation of gene sequences. Our framework suggests new gene sequences that are ranked for downstream evaluation by metrics...

10.1145/3624062.3626087 article EN 2023-11-10
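
The phrase "codons as tokens" in the abstract above can be made concrete with a minimal sketch. It assumes the simplest possible scheme, splitting a nucleotide sequence into non-overlapping triplets from the first base; the function name `codon_tokens` and the decision to drop a trailing partial codon are illustrative assumptions, not the model's actual tokenizer.

```python
def codon_tokens(seq: str) -> list[str]:
    """Split a nucleotide sequence into non-overlapping 3-base codons.

    Hypothetical helper: reads in frame from position 0 and drops any
    trailing bases that do not form a complete codon.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # length of the in-frame portion
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print(codon_tokens("ATGGCCTAA"))
```

Tokenizing by codon keeps the vocabulary small (at most 64 triplets over A, C, G, T) and aligns each token with one amino acid position in the encoded protein.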

Malaria caused by Plasmodium falciparum is one of the major infectious diseases in the world. It is essential to exploit an effective method to predict secretory proteins of malaria parasites to develop cures and treatment. Biochemical assays can provide details for accurate identification of the secretory proteins, but these methods are expensive and time-consuming. In this paper, we summarized the machine learning-based algorithms and compared the construction strategies between different computational methods. Also, we discussed...

10.2174/0929867328666211005140625 article EN Current Medicinal Chemistry 2021-10-23

As the body of genomic product data increases at a much faster rate than it can be annotated, computational analysis of protein function has never been more important. In this research, we introduce a novel protein function prediction method, HMMeta, which is based on the prominent natural language technique Hidden Markov Models (HMM). With a new representation of protein sequence as a language, we trained a unique HMM for each Gene Ontology (GO) term taken from the UniProt database, in total 27,451 GO IDs, leading to the creation of 27,451 Hidden Markov Models. We...

10.1145/3388440.3414702 article EN 2020-09-21

Motivation: It has been a challenge for biologists to determine the 3D shapes of proteins from a linear chain of amino acids and understand how proteins carry out life’s tasks. Experimental techniques, such as X-ray crystallography or Nuclear Magnetic Resonance, are time-consuming. This highlights the importance of computational methods for protein structure predictions. In the field of protein structure prediction, ranking the predicted decoys and selecting the one closest to the native structure is known as model quality assessment (QA), or accuracy estimation...

10.1101/2021.01.28.428710 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-01-30

Motivation: The Estimation of Model Accuracy problem is a cornerstone problem in the field of Bioinformatics. When predictions are made for proteins of which we do not know the native structure, we run into an issue of telling how good a tertiary structure prediction is, especially in the protein binding regions, which are useful for drug discovery. Currently, most methods only evaluate the overall quality of a protein decoy, and few can work on the residue level and on protein complexes. Here we introduce ZoomQA, a novel, single-model method for assessing the accuracy of a tertiary protein structure / complex...

10.1101/2021.01.28.428680 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2021-01-30

Self Driving Labs (SDLs) that combine automation of experimental procedures with autonomous decision making are gaining popularity as a means of increasing the throughput of scientific workflows. The task of identifying quantities of supplied colored pigments that match a target color, the color matching problem, provides a simple and flexible SDL test case, as it requires experiment proposal, sample creation, and sample analysis, three common components in autonomous discovery applications. We present a robotic solution to the color matching problem that allows for...

10.48550/arxiv.2310.00510 preprint EN other-oa arXiv (Cornell University) 2023-01-01