- Gene expression and cancer classification
- Single-cell and spatial transcriptomics
- Genetic Associations and Epidemiology
- Machine Learning in Bioinformatics
- Molecular Biology Techniques and Applications
- Genomics and Phylogenetic Studies
- Bioinformatics and Genomic Networks
- Genetic Mapping and Diversity in Plants and Animals
- Genetic and phenotypic traits in livestock
- Algorithms and Data Compression
- Metabolomics and Mass Spectrometry Studies
- Bayesian Methods and Mixture Models
- Advanced Algorithms and Applications
- Cell Image Analysis Techniques
- Genetic diversity and population structure
- Genetics, Bioinformatics, and Biomedical Research
- Forensic and Genetic Research
- Immune cells in cancer
- Gene Regulatory Network Analysis
- Plant and animal studies
- Census and Population Estimation
- Non-Destructive Testing Techniques
- RNA regulation and disease
- Insect and Arachnid Ecology and Behavior
- AI in cancer detection
Centogene (Germany)
2024
University of Liège
2006-2022
Helmholtz Zentrum München
2020-2021
Max Planck Institute of Psychiatry
2017-2018
National Center for Genetic Engineering and Biotechnology
2007-2013
Single-cell atlases often include samples that span locations, laboratories and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. To guide integration method choice, we benchmarked 68 method and preprocessing combinations on 85 batches of gene expression, chromatin accessibility and simulation data from 23 publications, altogether representing >1.2 million cells distributed in 13 atlas-level integration tasks. We evaluated methods...
Cell atlases often include samples that span locations, labs, and conditions, leading to complex, nested batch effects in data. Thus, joint analysis of atlas datasets requires reliable data integration. Choosing a data integration method is a challenge due to the difficulty of defining integration success. Here, we benchmark 38 method and preprocessing combinations on 77 batches of gene expression, chromatin accessibility, and simulation data from 23 publications, altogether representing >1.2 million cells distributed in nine...
EpiScanpy is a toolkit for the analysis of single-cell epigenomic data, namely DNA methylation and ATAC-seq data. To address modality-specific challenges from epigenomics data, epiScanpy quantifies the epigenome using multiple feature space constructions and builds a nearest neighbour graph using epigenomic distance between cells. EpiScanpy makes the many existing scRNA-seq workflows from scanpy available to large-scale single-cell data from other -omics modalities, including methods for common clustering, dimension reduction, cell type identification and trajectory...
Allele-specific (AS) Polymerase Chain Reaction is a convenient and inexpensive method for genotyping Single Nucleotide Polymorphisms (SNPs) and mutations. It is applied in many recent studies, including population genetics, molecular genetics and pharmacogenomics. Using known AS primer design tools to create primers leads to a cumbersome process for inexperienced users, since information about the SNP/mutation must be acquired from public databases prior to the design. Furthermore, most of these tools do not offer mismatch...
There is considerable ethno-linguistic and genetic variation among human populations in Asia, although tracing the origins of this diversity is complicated by migration events. Thailand is at the center of Mainland Southeast Asia (MSEA), a region within Asia that has not been extensively studied. Genetic substructure may exist in the Thai population, since waves of migration from southern China throughout its recent history may have contributed to substantial gene flow. Autosomal SNP data were collated for 438,503 markers from 992...
Background: Non-random patterns of genetic variation exist among individuals in a population owing to a variety of evolutionary factors. Therefore, populations are structured into genetically distinct subpopulations. As genotypic datasets become ever larger, it is increasingly difficult to correctly estimate the number of subpopulations and assign individuals to them. The computationally efficient non-parametric, chiefly Principal Components Analysis (PCA)-based methods are thus becoming increasingly relied upon for population structure...
The p.Ser71Arg RAB32 variant was recently associated with Parkinson's disease (PD).
ClustalW is the most widely used tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three stages: pairwise alignment, guide tree generation and progressive alignment. This paper analyzes and enhances a multithreaded implementation of ClustalW called ClustalW-SMP for higher throughput. Our goal is to maximize the degree of parallelism on the multithreading ClustalW called MultiThreading-ClustalW (MT-ClustalW). As a result, bioinformatics laboratories are able to use this MT-ClustalW with much less energy...
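The first of the three stages named above, pairwise alignment, scores every pair of sequences independently, which is why it lends itself to parallel execution. The following is a minimal sketch of that idea only, not the ClustalW-SMP or MT-ClustalW implementation: `edit_distance` is a hypothetical stand-in for a real pairwise alignment score, and `pairwise_distances` and its `workers` parameter are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a stand-in for an alignment score."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def pairwise_distances(seqs, workers=4):
    """Fill a symmetric distance matrix, scoring each pair as a separate task."""
    pairs = list(combinations(range(len(seqs)), 2))

    def score(pair):
        i, j = pair
        return edit_distance(seqs[i], seqs[j])

    # Each pair is independent of the others, so the tasks can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, pairs))

    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for (i, j), d in zip(pairs, scores):
        dist[i][j] = dist[j][i] = d
    return dist
```

In CPython, threads share one interpreter lock, so this sketch shows the task decomposition rather than a real speed-up; a process pool or a native-code implementation would be needed for CPU-bound gains. The resulting matrix is the kind of input a guide tree stage would consume.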
Due to its long genetic evolutionary history, Africans exhibit more genetic variation than any other population in the world. Their genetic diversity further lends itself to subdivisions of Africans into groups of individuals with a genetic similarity of varying degrees of granularity. It remains challenging to detect fine-scale structure in a computationally efficient and meaningful way. In this paper, we present a proof-of-concept of a novel fine-scale population structure detection tool with Western African samples. These samples consist of 1396 individuals from 25 ethnic groups (two are American...
Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The IPCAPS routines are built on...
This paper presents the methodology that assists the compiler to optimize ClustalW, the most widely used tool for aligning multiple text-based protein or nucleotide sequences in bioinformatics. Our goal is to minimize the latency and maximize the throughput of the execution on the multithreading ClustalW called MT-ClustalW, our previous work. As a result, the optimized MT-ClustalW is able to fully utilize machine resources and achieves higher throughput on multicore computers. The experiment results show that the methodology can assist the compiler to optimize the code better than only...
Background: Resolving population genetic structure is challenging, especially when dealing with closely related or geographically confined populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, improvements can be made, especially when fine-scale population structure is the target. Results: This work presents an R package called IPCAPS, which uses SNP information for resolving possibly fine-scale population structure. The...
SNP-based information is used in several existing clustering methods to detect shared genetic ancestry or to identify population substructure. Here, we present a methodology for unsupervised clustering using iterative pruning to capture fine-scale structure, called IPCAPS. Our method supports ordinal data, which can be applied directly to SNP data to identify fine-scale population structure. We compare our method to existing tools for detecting fine-scale structure via simulations. The simulated data do not take into account haplotype information, therefore all markers are independent....
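To illustrate the general idea of a PCA-based split on independent SNP markers, here is a minimal sketch of a single split step. It is not the IPCAPS algorithm: the toy simulation, the power-iteration routine, the sign-based split and all function names are assumptions made for illustration, and an iterative method would repeat such a step on each subgroup under its own stopping rule.

```python
import random


def simulate_genotypes(n_per_pop=20, n_snps=200, seed=0):
    """Toy data: two populations with very different allele frequencies.

    Markers are drawn independently, so no haplotype structure is modelled.
    """
    rng = random.Random(seed)
    rows = []
    for freq in (0.1, 0.9):
        for _ in range(n_per_pop):
            # Genotype = count of alternate alleles (0, 1 or 2) at each SNP.
            rows.append([int(rng.random() < freq) + int(rng.random() < freq)
                         for _ in range(n_snps)])
    return rows


def first_pc_scores(rows, iters=50):
    """Score each individual on the first principal component (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - means[j] for j in range(m)] for r in rows]  # centre each SNP
    v = [1.0] * m  # starting direction
    for _ in range(iters):
        scores = [sum(xi[j] * v[j] for j in range(m)) for xi in x]
        # Multiply by X^T to get the next direction, then normalise it.
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]
        norm = sum(c * c for c in v) ** 0.5
        v = [c / norm for c in v]
    return [sum(xi[j] * v[j] for j in range(m)) for xi in x]


def split_on_first_pc(rows):
    """Assign each individual to one of two groups by the sign of its PC1 score."""
    return [score > 0 for score in first_pc_scores(rows)]
```

Calling `split_on_first_pc(simulate_genotypes())` returns one boolean label per individual; with allele frequencies this far apart, the two simulated populations fall on opposite sides of the first component.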
Many computationally intensive bioinformatics software packages, such as multiple sequence alignment, population structure analysis, etc., written in C/C++ are not multicore-aware. A multicore processor is an emerging CPU technology that combines two or more independent processors into a single package. The Single Instruction Multiple Data-stream (SIMD) paradigm is heavily utilized in this class of processors. Nevertheless, most popular compilers including Microsoft Visual C/C++ 6.0, x86 gnu C-compiler gcc do...