NFDI4DS | UHH-SEMS - Publication Details

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies

OPENALEX - Publications

Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, and 5 more

Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read mappers 1) quickly generate possible mapping locations for seeds (i.e., smaller segments) within each read, 2) extract reference sequences at each of the mapping locations, and 3) check similarity between each read and its associated reference sequences with a computationally-expensive algorithm (i.e., sequence alignment) to determine the origin of the read. A seed...

10.1186/s12864-018-4460-0 article EN cc-by BMC Genomics 2018-05-01

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

Damla Senol Cali, Gurpreet S. Kalsi, Zülal Bingöl, Can Fırtına, Lavanya Subramanian, and 11 more

Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. To perform genome sequencing, devices extract small random fragments of an organism's DNA sequence (known as reads). The first step of genome sequence analysis is a computational process known as read mapping. In read mapping, each fragment is matched to its potential location in the reference genome with the goal of identifying the original location of each read in the genome. Unfortunately, rapid genome sequencing is currently...

10.1109/micro50266.2020.00081 article EN 2020-10-01

BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis

Can Fırtına, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, and 6 more

Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds, as the conventional hashing methods assign distinct hash values to different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing the use of the costly sequence alignment or (ii) limited sensitivity. We introduce...

10.1093/nargab/lqad004 article EN cc-by NAR Genomics and Bioinformatics 2023-01-10

Accelerating Genome Analysis: A Primer on an Ongoing Journey

Mohammed Alser, Zülal Bingöl, Damla Senol Cali, Jeremie Kim, Saugata Ghose, and 2 more

Genome analysis fundamentally starts with a process known as read mapping, where sequenced fragments of an organism's genome are compared against a reference genome. Read mapping is currently a major bottleneck in the entire genome analysis pipeline, because state-of-the-art genome sequencing technologies are able to sequence a genome much faster than the computational techniques employed to analyze the genome. We describe the ongoing journey in significantly improving the performance of read mapping. We explain algorithmic methods and hardware-based acceleration...

10.1109/mm.2020.3013728 article EN IEEE Micro 2020-08-03

Demystifying Complex Workload-DRAM Interactions

Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of Dynamic Random Access Memory (DRAM) chips. Manufacturers and researchers are developing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and emerging applications are rapidly diverging, as these applications manipulate larger data sets in very different ways. As a result, the combined DRAM-workload behavior is...

10.1145/3309697.3331482 article EN 2019-06-20

Apollo: a sequencing-technology-independent, scalable and accurate assembly polishing algorithm

Can Fırtına, Jeremie S. Kim, Mohammed Alser, Damla Senol Cali, A. Ercüment Çiçek, and 2 more

Long reads produced by third-generation sequencing technologies are used to construct an assembly (i.e., the subject's genome), which is further used in downstream genome analysis. Unfortunately, long reads have high sequencing error rates, and a large proportion of bps in these long reads is incorrectly identified. These errors propagate to the assembly and affect the accuracy of genome analysis. Assembly polishing algorithms minimize such error propagation by polishing or fixing errors in the assembly using information from alignments between reads and the assembly (i.e., read-to-assembly alignment information). However, current assembly polishing algorithms can only polish...

10.1093/bioinformatics/btaa179 article EN Bioinformatics 2020-03-11

FPGA-Based Near-Memory Acceleration of Modern Data-Intensive Applications

Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, and 2 more

Modern data-intensive applications demand high computation capabilities with strict power constraints. Unfortunately, such applications suffer from a significant waste of both execution cycles and energy in current computing systems due to the costly data movement between the computation units and the memory units. Genome analysis and weather prediction are two examples of such applications. Recent FPGAs couple a reconfigurable fabric with high-bandwidth memory (HBM) to enable more efficient data movement and improve overall performance and energy efficiency. This trend is an example...

10.1109/mm.2021.3088396 article EN IEEE Micro 2021-06-10

GenStore: a high-performance in-storage processing system for genome sequence analysis

Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, and 9 more

Read mapping is a fundamental step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). Read mapping is costly because it needs to perform approximate string matching (ASM) on large amounts of data. To address the computational challenges in genome analysis, prior works propose various approaches such as accurate filters that select the reads within a dataset of genomic reads (called a read set) that must undergo expensive...

10.1145/3503222.3507702 article EN 2022-02-22

SeGraM

Damla Senol Cali, Konstantinos Kanellopoulos, Joël Lindegger, Zülal Bingöl, Gurpreet S. Kalsi, and 13 more

A critical step of genome sequence analysis is the mapping of sequenced DNA fragments (i.e., reads) collected from an individual to a known linear reference genome sequence (i.e., sequence-to-sequence mapping). Recent works replace the linear reference sequence with a graph-based representation of the reference genome, which captures the genetic variations and diversity across many individuals in a population. Mapping reads to the graph-based reference genome (i.e., sequence-to-graph mapping) results in notable quality improvements in genome analysis. Unfortunately, while sequence-to-sequence mapping is well studied with many available tools and accelerators, sequence-to-graph mapping is a more...

10.1145/3470496.3527436 preprint EN 2022-05-31

AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

Jeremie S. Kim, Can Fırtına, Meryem Banu Cavlak, Damla Senol Cali, Nastaran Hajinazar, and 3 more

AirLift is the first read remapping tool that enables users to quickly and comprehensively map a read set, which had been previously mapped to one reference genome, to another similar reference. Users can then quickly run a downstream analysis of read sets for each latest reference release. Compared to the state-of-the-art method for remapping reads (i.e., full mapping), AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4×. We validate our remapping results with GATK and find that AirLift provides high accuracy in identifying ground truth SNP/INDEL variants.

10.1109/tcbb.2024.3433378 article EN IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024-01-01

Scrooge: A Fast and Memory-Frugal Genomic Sequence Aligner for CPUs, GPUs, and ASICs

Joël Lindegger, Damla Senol Cali, Mohammed Alser, Juan Gómez-Luna, Nika Mansouri Ghiasi, and 1 more

Pairwise sequence alignment is a very time-consuming step in common bioinformatics pipelines. Speeding up this step requires heuristics, efficient implementations, and/or hardware acceleration. A promising candidate for all of the above is the recently proposed GenASM algorithm. We identify and address three inefficiencies in the GenASM algorithm: it has a high amount of data movement, a large memory footprint, and does some unnecessary work. We propose Scrooge, a fast and memory-frugal genomic sequence aligner. Scrooge includes novel...

10.1093/bioinformatics/btad151 article EN cc-by Bioinformatics 2023-03-24

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

Haiyu Mao, Mohammed Alser, Mohammad Sadrosadati, Can Fırtına, Akanksha Baranwal, and 4 more

Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in the reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate...

10.1109/micro56248.2022.00056 article EN 2022-10-01

GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, and 9 more

Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, prior works propose various approaches such as filters that select the reads that must undergo expensive computation, efficient heuristics, and hardware acceleration. While effective at reducing the computation overhead,...

10.48550/arxiv.2202.10400 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Demystifying Complex Workload-DRAM Interactions

Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

It has become increasingly difficult to understand the complex interactions between modern applications and main memory, composed of Dynamic Random Access Memory (DRAM) chips. Manufacturers are now selling and proposing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and emerging applications are rapidly diverging, as these applications manipulate larger data sets in very different ways. As a result, the combined DRAM-workload behavior is...

10.1145/3366708 article EN Proceedings of the ACM on Measurement and Analysis of Computing Systems 2019-12-17

Demystifying Complex Workload-DRAM Interactions

Saugata Ghose, Tianshi Li, Nastaran Hajinazar, Damla Senol Cali, Onur Mutlu

It has become increasingly difficult to understand the complex interaction between modern applications and main memory, composed of Dynamic Random Access Memory (DRAM) chips. Manufacturers and researchers are developing many different types of DRAM, with each DRAM type catering to different needs (e.g., high throughput, low power, high memory density). At the same time, the memory access patterns of prevalent and emerging applications are rapidly diverging, as these applications manipulate larger data sets in very different ways. As a result, the combined DRAM-workload behavior is...

10.1145/3376930.3376989 article EN ACM SIGMETRICS Performance Evaluation Review 2019-12-17

BLEND: A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches in Genome Analysis

Can Fırtına, Jisung Park, Mohammed Alser, Jeremie S. Kim, Damla Senol Cali, and 6 more

Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds, as the conventional hashing methods assign distinct hash values to different seeds, including highly similar seeds. Finding only exact-matching seeds causes either 1) increasing the use of the costly sequence alignment or 2) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that...

10.1101/2022.11.23.517691 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-11-25

ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-efficient Genome Analysis

Can Fırtına, Kamlesh Pillai, Gurpreet S. Kalsi, Bharathwaj Suresh, Damla Senol Cali, and 8 more

Profile hidden Markov models (pHMMs) are widely employed in various bioinformatics applications to identify similarities between biological sequences, such as DNA or protein sequences. In pHMMs, sequences are represented as graph structures, where states and edges capture modifications (i.e., insertions, deletions, and substitutions) by assigning probabilities to them. These probabilities are subsequently used to compute the similarity score between a sequence and a pHMM graph. The Baum-Welch algorithm, a prevalent and highly accurate method,...

10.1145/3632950 article EN ACM Transactions on Architecture and Code Optimization 2023-12-28

FastRemap: a tool for quickly remapping reads between genome assemblies

Jeremie S. Kim, Can Fırtına, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, and 1 more

Motivation: A genome read dataset can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly used CrossMap tool. With the explosion of available genomic datasets and references, high-performance remapping tools will be even more important for keeping up with the computational demands of genome assembly and analysis. Results: We provide FastRemap, a fast and efficient tool for remapping reads between genome assemblies. FastRemap provides up to a 7.82×...

10.1093/bioinformatics/btac554 article EN Bioinformatics 2022-08-17

P850: Accelerated optical genome mapping analysis with Stratys compute and guided assembly

Damla Senol Cali, Thomas Anantharaman, Martin Muggli, Samer Al-Saffar, Charles Schoonover, and 1 more

Optical genome maps (OGM) from Bionano enable the detection of genomic structural and copy number variants that cannot be detected by next-generation sequencing (NGS) technologies and are often missed by conventional cytogenetic techniques. Bionano has developed bioinformatics pipelines for calling variants, including the Solve de novo assembly pipeline for constitutional analysis and the Rare Variant Analysis (RVA) pipeline for low allele-fraction cancer applications.

10.1016/j.gimo.2024.101761 article EN cc-by-nc-nd Genetics in Medicine Open 2024-01-01

Abstract 2337: Accelerated optical genome mapping analysis with Stratys Compute and Guided Assembly

Damla Senol Cali, Thomas Anantharaman, Martin Muggli, Samer Al-Saffar, Charles Schoonover, and 1 more

Background: Optical genome maps (OGM) from Bionano enable the detection of genomic structural and copy number variants that cannot be detected by next-generation sequencing (NGS) technologies and are often missed by conventional cytogenetic techniques. Bionano has developed bioinformatics pipelines for calling variants, including the Solve de novo assembly pipeline for constitutional analysis and the Rare Variant Analysis (RVA) pipeline for low-allele-fraction cancer applications. Both are computationally intensive and currently take 5-10 hours...

10.1158/1538-7445.am2024-2337 article EN Cancer Research 2024-03-22