- Genomics and Phylogenetic Studies
- Algorithms and Data Compression
- Advanced Data Storage Technologies
- Parallel Computing and Optimization Techniques
- Machine Learning in Bioinformatics
- RNA and protein synthesis mechanisms
- Chromosomal and Genetic Variations
- DNA and Biological Computing
- Evolutionary Algorithms and Applications
- Network Packet Processing and Optimization
- Cloud Computing and Resource Management
- Nanopore and Nanochannel Transport Studies
- Caching and Content Delivery
- Genetics, Bioinformatics, and Biomedical Research
- Advanced Image and Video Retrieval Techniques
- Genomics and Chromatin Dynamics
- Molecular Biology Techniques and Applications
- RNA modifications and cancer
- Advanced Memory and Neural Computing
- Ferroelectric and Negative Capacitance Devices
- Natural Language Processing Techniques
- Gene expression and cancer classification
- Interconnection Networks and Systems
- Protist diversity and phylogeny
- Microbial Community Ecology and Physiology
BioNano Genomics (United States)
2022-2024
Carnegie Mellon University Australia
2024
Carnegie Mellon University
2018-2023
Associazione Medici Diabetologi
2021
Intel (United States)
2020
Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed...
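The three mapper steps listed in the abstract above can be illustrated with a minimal sketch. It assumes a plain k-mer dictionary as the seed index and Levenshtein distance as the similarity check; the function names (`build_index`, `edit_distance`, `map_read`) and parameters are hypothetical and do not come from the paper or from any particular read mapper.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete from a
                            curr[j - 1] + 1,             # insert into a
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def map_read(read, reference, index, k, max_edits):
    """Return (start, edits) for candidate locations that pass verification."""
    # Step 1: look up non-overlapping seeds to collect candidate locations.
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        # Step 2: extract the reference sequence at the candidate location.
        window = reference[start:start + len(read)]
        # Step 3: run the expensive similarity check.
        edits = edit_distance(read, window)
        if edits <= max_edits:
            hits.append((start, edits))
    return hits
```

In this sketch every candidate location reaches the expensive check in step 3, which is the cost a seed location filter aims to reduce by discarding unlikely candidates first.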
Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. To perform genome sequencing, devices extract small random fragments of an organism's DNA sequence (known as reads). The first step of genome sequence analysis is a computational process known as read mapping. In read mapping, each fragment is matched to its potential location in the reference genome with the goal of identifying the original location of each read in the genome. Unfortunately, rapid genome sequencing is currently...
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds, as conventional hashing methods assign distinct hash values to different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing use of the costly sequence alignment or (ii) limited sensitivity. We introduce...
Genome analysis fundamentally starts with a process known as read mapping, where sequenced fragments of an organism's genome are compared against a reference genome. Read mapping is currently a major bottleneck in the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are able to sequence a genome much faster than the computational techniques employed to analyze the genome. We describe the ongoing journey in significantly improving the performance of read mapping. We explain state-of-the-art algorithmic methods and hardware-based acceleration...
It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of Dynamic Random Access Memory (DRAM) chips. Manufacturers and researchers are developing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and emerging applications are rapidly diverging, as these applications manipulate larger data sets in very different ways. As a result, the combined DRAM-workload behavior is...
Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates, and a large proportion of bps in these long reads are incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly using information from alignments between reads and the assembly (i.e., read-to-assembly alignment information). However, assembly polishing algorithms can only polish...
Modern data-intensive applications demand high computation capabilities with strict power constraints. Unfortunately, such applications suffer from a significant waste of both execution cycles and energy in current computing systems due to the costly data movement between the computation units and the memory units. Genome analysis and weather prediction are two examples of such applications. Recent FPGAs couple a reconfigurable fabric with high-bandwidth memory (HBM) to enable more efficient data movement and improve overall performance and energy efficiency. This trend is an example...
Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, many prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive...
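To make the cost of approximate string matching (ASM) mentioned above concrete, here is a minimal sketch of one textbook formulation: a column-wise dynamic program (often attributed to Sellers) that reports every text position where a pattern ends with at most a given number of edits. This is an illustrative assumption, not the method used in the paper; the function name and parameters are hypothetical.

```python
def approximate_matches(pattern, text, max_edits):
    """Return (end_position, edits) for each text position where the pattern
    ends with at most max_edits insertions, deletions, or substitutions.
    end_position is 1-based, i.e. the match covers text[...:end_position]."""
    m = len(pattern)
    # column[i] = fewest edits needed to align pattern[:i] with some
    # substring of the text that ends at the current text position.
    column = list(range(m + 1))
    matches = []
    for end, ch in enumerate(text, start=1):
        new_column = [0]  # an alignment may start anywhere in the text
        for i in range(1, m + 1):
            new_column.append(min(
                column[i - 1] + (pattern[i - 1] != ch),  # match or substitution
                column[i] + 1,                           # extra character in text
                new_column[i - 1] + 1,                   # character missing from text
            ))
        column = new_column
        if column[m] <= max_edits:
            matches.append((end, column[m]))
    return matches
```

The work grows with the product of the pattern and text lengths, which is why filters that discard reads before this step can pay off on large read sets.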
A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more...
AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set, which had been previously mapped to one reference genome, to another similar reference. Users can then quickly run a downstream analysis of read sets for each latest reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4×. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants.
Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes novel...
Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in the reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate...
Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select the reads that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the computation overhead,...
It has become increasingly difficult to understand the complex interactions between modern applications and main memory, composed of Dynamic Random Access Memory (DRAM) chips. Manufacturers are now selling and proposing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and emerging applications are rapidly diverging, as these applications manipulate larger data sets in very different ways. As a result, the combined DRAM-workload behavior is...
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds, as conventional hashing methods assign distinct hash values to different seeds, including highly similar seeds. Finding only exact-matching seeds causes either 1) increasing use of the costly sequence alignment or 2) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that...
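The limitation of conventional hashing described above can be shown with a small sketch. It illustrates only the exact-match baseline, not BLEND's mechanism, which the truncated abstract does not describe. The helper names and the choice of BLAKE2b as the hash function are assumptions made for illustration.

```python
import hashlib


def seed_hash(seed):
    """Conventional hash of a seed: any change to the seed changes the value."""
    return hashlib.blake2b(seed.encode(), digest_size=8).digest()


def seed_table(sequence, k):
    """Map the hash value of every k-mer (seed) to its start positions."""
    table = {}
    for pos in range(len(sequence) - k + 1):
        table.setdefault(seed_hash(sequence[pos:pos + k]), []).append(pos)
    return table


def shared_seed_positions(query, target, k):
    """Return (query_pos, target_pos) pairs whose seeds have equal hash values."""
    table = seed_table(target, k)
    matches = []
    for qpos in range(len(query) - k + 1):
        # A single lookup per seed, but it succeeds only on identical seeds.
        for tpos in table.get(seed_hash(query[qpos:qpos + k]), []):
            matches.append((qpos, tpos))
    return matches
```

With `k = 6`, the query seed `ACGGTC` is found in the target `TTACGGTCAA`, while `ACGTTC`, which differs by a single base, produces no match at all, so the similarity is either missed or has to be recovered by sequence alignment.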
Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method,...
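As a rough illustration of how state and edge probabilities yield a similarity score, the sketch below runs the forward algorithm on a plain hidden Markov model. It is a simplification under stated assumptions: a real pHMM also has silent deletion states and a position-specific layout, which are omitted here, and the function name and dictionary layout are hypothetical.

```python
def forward_likelihood(sequence, states, start_p, trans_p, emit_p):
    """Probability that the model emits `sequence`, summed over all state paths.

    start_p[s]    : probability of starting in state s
    trans_p[p][s] : probability of moving from state p to state s
    emit_p[s][c]  : probability that state s emits character c
    The sequence must be non-empty.
    """
    # alpha[s] = probability of emitting the prefix seen so far and ending in s
    alpha = {s: start_p[s] * emit_p[s][sequence[0]] for s in states}
    for ch in sequence[1:]:
        alpha = {
            s: emit_p[s][ch] * sum(alpha[p] * trans_p[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())
```

The Baum-Welch algorithm named in the abstract builds on this kind of forward pass (together with a backward pass) to re-estimate the probabilities; that training step is not shown here.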
Motivation: A genome read dataset can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly used CrossMap tool. With the explosion of available genomic datasets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. Results: We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides 7.82×...
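The core idea of remapping between two similar references can be sketched as a coordinate translation through aligned blocks. The block layout below, `(old_start, new_start, length)` sorted by `old_start` and non-overlapping, is a hypothetical simplification chosen for illustration; it is not the chain file format and not the implementation of CrossMap or FastRemap.

```python
from bisect import bisect_right


def remap_position(blocks, old_pos):
    """Translate a position on the old reference to the new reference.

    blocks: list of (old_start, new_start, length) tuples, sorted by
            old_start and non-overlapping (hypothetical layout).
    Returns the new position, or None if old_pos is in an unaligned gap.
    """
    starts = [block[0] for block in blocks]
    i = bisect_right(starts, old_pos) - 1  # last block starting at or before old_pos
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if old_pos < old_start + length:
        return new_start + (old_pos - old_start)
    return None  # position lies between blocks, so it has no counterpart
```

For example, with blocks `[(0, 0, 100), (100, 105, 50), (160, 165, 40)]`, position 120 translates to 125, while position 155 falls in a gap and returns `None`. A practical tool would precompute the `starts` list once instead of rebuilding it per lookup.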
Optical genome maps (OGM) from Bionano enable the detection of genomic structural and copy number variants that cannot be detected by next-generation sequencing (NGS) technologies and are often missed by conventional cytogenetic techniques. Bionano has developed bioinformatics pipelines for calling variants, including the Solve de novo assembly pipeline for constitutional analysis and the Rare Variant Analysis (RVA) pipeline for low allele-fraction cancer applications.
Background: Optical genome maps (OGM) from Bionano enable the detection of genomic structural and copy number variants that cannot be detected by next-generation sequencing (NGS) technologies and are often missed by conventional cytogenetic techniques. Bionano has developed bioinformatics pipelines for calling variants, including the Solve de novo assembly pipeline for constitutional analysis and the Rare Variant Analysis (RVA) pipeline for low-allele-fraction cancer applications. Both pipelines are computationally intensive and currently take 5-10 hours...