- Advanced Clustering Algorithms Research
- Data Management and Algorithms
- Data Mining Algorithms and Applications
- Genetic Mapping and Diversity in Plants and Animals
- Genetic Associations and Epidemiology
- Bioinformatics and Genomic Networks
- Single-cell and spatial transcriptomics
- Bayesian Methods and Mixture Models
- Gene Regulatory Network Analysis
- Gene expression and cancer classification
- RNA Research and Splicing
- CRISPR and Genetic Engineering
- Text and Document Classification Technologies
- Metaheuristic Optimization Algorithms Research
- Face and Expression Recognition
- Genomics and Chromatin Dynamics
- RNA modifications and cancer
- Genetic and phenotypic traits in livestock
- Big Data and Business Intelligence
European Bioinformatics Institute, 2016-2022
Wellcome Trust, 2019-2021
Universidade de São Paulo, 2009-2015
Brazilian Society of Computational and Applied Mathematics, 2012-2015
Microsoft (United States), 2014
Universidade Federal de São Carlos, 2009
Single-cell RNA sequencing (scRNA-seq) enables characterizing the cellular heterogeneity in human tissues. Recent technological advances have enabled the first population-scale scRNA-seq studies in hundreds of individuals, allowing genetic effects to be assayed with single-cell resolution. However, existing strategies to analyze these data remain based on principles established for the analysis of bulk RNA-seq. In particular, current methods depend on a priori definitions of discrete cell types, and hence cannot assess...
Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored, even though it may play an important role in obtaining optimal power. We compared a standard statistical test, a score test, with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought,...
The comparison of ordinary partitions of a set of objects is well established in the clustering literature, which comprises several studies on the analysis of the properties of similarity measures for comparing partitions. However, similarity measures for clusterings are not readily applicable to biclusterings, since each bicluster is a tuple of two sets (of rows and columns), whereas a cluster is only a single set (of rows). Some biclustering similarity measures have been defined as minor contributions in papers that primarily report on proposals and evaluation of biclustering algorithms or comparative...
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use simulations...
Identifying regulatory genetic effects in pluripotent cells provides important insights into disease variants with potentially transient or developmental origins. Combining existing and newly-generated data, we characterized 1,367 iPSC lines from 948 unique donors, collectively analyzed within the “Integrated QTL” (i2QTL) Consortium. The sample size of our study allowed us to derive the most comprehensive map of quantitative trait loci (QTL) in human pluripotent cells to date. We mapped the effects of nearby common genetic variants on five...
Different environmental factors, including diet, physical activity, or external conditions, can contribute to genotype-environment interactions (GxE). Although high-dimensional environmental data are increasingly available, and multiple environments have been implicated with GxE at the same loci, multi-environment tests for GxE are not established. Such joint analyses can increase power to detect GxE and improve the interpretation of these effects. Here, we propose the structured linear mixed model (StructLMM), a...
Similarity measures for comparing clusterings are an important component, e.g., of evaluating clustering algorithms, consensus clustering, and clustering stability assessment. These measures have been studied for over 40 years in the domain of exclusive hard clusterings (exhaustive and mutually exclusive object sets). In the past years, the literature has proposed measures to handle more general clusterings (e.g., fuzzy/probabilistic clusterings). This paper provides an overview of these new measures and discusses their drawbacks. We ultimately develop a corrected-for-chance measure (13AGRI)...
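To make "corrected for chance" concrete, the sketch below computes the classic adjusted Rand index for two exclusive hard clusterings. This is the well-known baseline for the hard-partition setting the abstract mentions, not the 13AGRI measure itself, whose definition is not given here. The function name `adjusted_rand_index` and the label-list input format are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Classic adjusted Rand index between two hard partitions.

    Illustrative sketch only: covers exclusive hard clusterings,
    not the fuzzy/probabilistic case discussed in the abstract.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two objects to count pairs")

    # Contingency table cells and marginals, stored as counts.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of object pairs placed together in both, in A, and in B.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_a = sum(comb(c, 2) for c in a_counts.values())
    together_b = sum(comb(c, 2) for c in b_counts.values())

    # Agreement expected under random labelings with the same cluster sizes.
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2

    if maximum == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Same grouping under different label names.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # Groupings that cut across each other.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The subtraction of the expected agreement is the chance correction: two unrelated partitions score near zero, whereas the uncorrected Rand index would reward them simply for having many pairs separated in both.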
This paper is concerned with the computational efficiency of clustering algorithms when the data set to be clustered is described by a proximity matrix only (relational data) and the number of clusters must be automatically estimated from such data. Two relational versions of an evolutionary algorithm for clustering are derived and compared against two systematic (repetitive) approaches that can also be used to estimate the number of clusters in relational data. Exhaustive experiments involving six artificial as well as real data sets are reported and analyzed.
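A systematic (repetitive) approach of the kind mentioned above can be sketched as follows: run a relational clustering algorithm once for each candidate number of clusters and keep the value with the best validity score. The choices here are assumptions for illustration, not the paper's method: a simple k-medoids procedure as the relational algorithm, the silhouette width as the criterion, and the hypothetical names `k_medoids`, `silhouette`, and `estimate_num_clusters`. Only a distance matrix is used, never the original feature vectors.

```python
def assign(dist, medoids):
    """Assign each object to the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Simple k-medoids on a distance matrix (deterministic init)."""
    n = len(dist)
    # Start from the most central object, then add the farthest objects.
    medoids = [min(range(n), key=lambda i: sum(dist[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                updated.append(medoids[c])  # keep medoid of an empty cluster
                continue
            # New medoid: member with smallest total distance to the others.
            updated.append(min(members,
                               key=lambda i: sum(dist[i][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids)


def silhouette(dist, labels):
    """Mean silhouette width computed from the distance matrix only."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue  # singletons contribute a silhouette of 0
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for label, other in clusters.items() if label != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def estimate_num_clusters(dist, k_max):
    """Repeat the clustering for k = 2..k_max and keep the best-scoring k."""
    scores = {k: silhouette(dist, k_medoids(dist, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12, 20, 21, 22]  # toy one-dimensional data
    dist = [[abs(p - q) for q in points] for p in points]
    best_k, scores = estimate_num_clusters(dist, k_max=5)
    print(best_k, scores)
```

The cost of this scheme grows with the number of candidate values tried, since every candidate requires a full clustering run; that repeated cost is what makes computational efficiency a concern in this setting.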
Single cell RNA sequencing (scRNA-seq) enables characterizing the cellular heterogeneity in human tissues. Technological advances have enabled the first population-scale scRNA-seq studies in hundreds of individuals, allowing genetic effects to be assayed with single-cell resolution. However, existing strategies to perform genetic analyses using scRNA-seq remain based on principles established for bulk RNA-seq. In particular, current methods depend on a priori definitions of discrete cell types, and hence cannot assess allelic...
Joint genetic models for multiple traits have helped to enhance association analyses. Most existing multi-trait models have been designed to increase power for detecting associations, whereas the analysis of interactions has received considerably less attention. Here, we propose iSet, a method based on linear mixed models to test for interactions between sets of variants and environmental states or other contexts. Our model generalizes previous interaction tests and in particular provides a test for local differences in the genetic architecture between contexts. We first use...
The features describing a data set may often be arranged in meaningful subsets, each of which corresponds to a different aspect of the data. An unsupervised algorithm (SCAD) that performs fuzzy clustering and aspects weighting simultaneously was recently proposed. However, there are several situations where the data set is represented by proximity matrices only (relational data), which renders several clustering approaches, including SCAD, inappropriate. To handle this kind of data, the relational clustering algorithm CARD, based on the SCAD algorithm, has been...