- Bioinformatics and Genomic Networks
- Single-cell and spatial transcriptomics
- Gene expression and cancer classification
- Gene Regulatory Network Analysis
- Genomics and Phylogenetic Studies
- Molecular Biology Techniques and Applications
- Genetic Associations and Epidemiology
- Genomics and Rare Diseases
- Neuroinflammation and Neurodegeneration Mechanisms
- RNA Research and Splicing
- CRISPR and Genetic Engineering
- Autism Spectrum Disorder Research
- RNA and protein synthesis mechanisms
- Genetics and Neurodevelopmental Disorders
- Genomics and Chromatin Dynamics
- Peroxisome Proliferator-Activated Receptors
- Ion channel regulation and function
- Intensive Care Unit Cognitive Disorders
- Genetic and Clinical Aspects of Sex Determination and Chromosomal Abnormalities
- Pluripotent Stem Cells Research
- Cardiac electrophysiology and arrhythmias
- Immune cells in cancer
- Animal Genetics and Reproduction
- Machine Learning in Bioinformatics
- Genetics, Bioinformatics, and Biomedical Research
Garvan Institute of Medical Research
2020-2024
UNSW Sydney
2010-2024
Cold Spring Harbor Laboratory
2014-2023
Genomics (United Kingdom)
2018
Victor Chang Cardiac Research Institute
2009-2013
Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, that quantifies the degree to which cell types replicate across datasets, and enables rapid identification of clusters with high...
RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks. We examine RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were >20 samples with read depth >10 M per sample. While the aggregate network constructed...
The use of the human reference genome has shaped methods and data across modern genomics. This has offered many benefits while creating a few constraints. In the following opinion, we outline the history, properties, and pitfalls of the current human reference genome. In a few illustrative analyses, we focus on its use for variant-calling, highlighting its nearness to a ‘type specimen’. We suggest that switching to a consensus reference would offer important advantages over the continued use of the current reference, with few disadvantages.
This study investigates the humoral and cellular immune responses and health-related quality of life measures in individuals with mild to moderate long COVID (LC) compared to age and gender matched recovered COVID-19 controls (MC) over 24 months. LC participants show elevated nucleocapsid IgG levels at 3 months, and higher neutralizing capacity up to 8 months post-infection. Increased spike-specific and nucleocapsid-specific CD4+ T cells, and PD-1 and TIM-3 expression on CD8+ T cells, were observed, but these...
Differential expression (DE) is commonly used to explore molecular mechanisms of biological conditions. While many studies report significant results between their groups of interest, the degree to which results are specific to the question at hand is not generally assessed, potentially leading to inaccurate interpretation. This could be particularly problematic for meta-analysis, where replicability across datasets is taken as strong evidence for the existence of a specific, biologically relevant signal, but which instead may arise from...
Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks. We perform the first major analysis of single-cell co-expression,...
Evaluating gene networks with respect to known biology is a common task but often a computationally costly one. Many computational experiments are difficult to apply exhaustively in network analysis due to run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending 'Guilt-by-Association' by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members...
Co-expression analysis has provided insight into gene function in organisms from Arabidopsis to zebrafish. Comparison across species has the potential to enrich these results, for example by prioritizing among candidate human disease genes based on their network properties or by finding alternative model systems where their co-expression is conserved. Here, we present CoCoCoNet as a tool for identifying conserved gene modules and comparing co-expression networks. CoCoCoNet is a resource for both data and methods, providing gold standard...
X-chromosome inactivation (XCI) is a random, permanent, and developmentally early epigenetic event that occurs during mammalian embryogenesis. We harness these features to investigate characteristics of early lineage specification events during human development. We initially assess the consistency of X-inactivation and establish a robust set of XCI-escape genes. By analyzing variance in XCI ratios across tissues and individuals, we find that XCI is shared across all tissues, suggesting that XCI is completed in the epiblast (in at least 6-16 cells) prior to germ...
Genes encoding proteins in a common pathway are often found near each other along bacterial chromosomes. Several explanations have been proposed to account for the evolution of these structures. For instance, natural selection may directly favour gene clusters through a variety of mechanisms, such as increased efficiency of coregulation. An alternative and controversial hypothesis is the selfish operon model, which asserts that clustered arrangements of genes are more easily transferred to other species, thus improving...
Genetic and environmental variation are key contributors during organism development, but the influence of minor perturbations or noise is difficult to assess. This study focuses on the stochastic variation in allele-specific expression that persists through cell divisions in the nine-banded armadillo (Dasypus novemcinctus). We investigated the blood transcriptome of five wild monozygotic quadruplets over time to explore developmental stochasticity in gene expression. We identify an enduring signal of autosomal allelic variability...
Coronary artery disease (CAD), one of the leading causes of death globally, is influenced by both environmental and genetic risk factors. Gene-centric genome-wide association studies (GWAS) involving cases and controls have been remarkably successful in identifying genetic loci contributing to CAD. Modern in silico platforms, such as candidate gene prediction tools, permit a systematic analysis of GWAS data to identify candidate genes for complex diseases like CAD. Subsequent integration of drug-target data from drug databases with...
Human genome sequencing has enabled the association of phenotypes with genetic loci, but our ability to effectively translate this data to the clinic has not kept pace. Over the past 60 years, pharmaceutical companies have successfully demonstrated the safety and efficacy of over 1,200 novel therapeutic drugs via costly clinical studies. While this process must continue, better use can be made of the existing valuable data. In silico tools such as candidate gene prediction systems allow rapid identification of disease genes by...
Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated ('multifunctional') genes are likely to appear in genomics study results and drive the generation...
Background: Automated candidate gene prediction systems allow geneticists to hone in on disease genes more rapidly by identifying the most probable candidate genes linked to the phenotypes under investigation. Here we assessed the ability of eight different candidate gene prediction systems to predict disease genes in intervals previously associated with type 2 diabetes by benchmarking their performance against genes implicated by recent genome-wide association studies. Results: Using a search space of 9556 genes, all but one of the systems pruned the genome in favour of genes associated with moderate to highly significant...
Motivation: Network-based gene function inference methods have proliferated in recent years, but measurable progress remains elusive. We wished to better explore performance trends by controlling data and algorithm implementation, with a particular focus on the performance of aggregate predictions. Results: Hypothesizing that popular methods would perform well without hand-tuning, we used well-characterized algorithms to produce verifiably ‘untweaked’ results. We find that most state-of-the-art machine learning...
The Human Reference Genome serves as the foundation for modern genomic analyses. However, in its present form, it does not adequately represent the vast genetic diversity of the human population. In this study, we explored the consensus genome as a potential successor of the current reference genome and assessed its effect on the accuracy of RNA-seq read alignment. To find the best haploid genome representation, we constructed consensus genomes at the pan-human, superpopulation, and population levels, using variant information from the 1000 Genomes Project...
Many tools are available for RNA-seq alignment and expression quantification, with comparative value being hard to establish. Benchmarking assessments often highlight methods' good performance, but are either focused on model data or fail to explain variation in performance. This leaves us to ask, what is the most meaningful way to assess different alignment choices? And importantly, where is there room for progress? In this work, we explore answers to these two questions by performing an exhaustive assessment of the STAR...
The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based associations might complement genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and annotations, we calculated a...
Co-fractionation MS (CF-MS) is a technique with potential to characterize endogenous and unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To obtain such guidelines, this study thoroughly evaluates novel and published Saccharomyces cerevisiae CF-MS data sets using very high proteome coverage libraries of yeast gold standard complexes. A new method for identifying complexes in CF-MS data, Reference Complex Profiling, the...
Genetic variation, epigenetic regulation and major environmental stimuli are key contributors to phenotypic variation, but the influence of minor perturbations or “noise” has been difficult to assess in mammals. In this work, we uncover one axis of random variation with a large and permanent influence: developmental stochasticity. By assaying the transcriptome of wild monozygotic quadruplets of the nine-banded armadillo, we find that persistent changes occur early in development, and these give rise to clear transcriptional...
Gene networks have become a central tool in the analysis of genomic data but are widely regarded as hard to interpret. This has motivated a great deal of comparative evaluation and research into best practices. We explore the possibility that this may lead to overfitting in the field as a whole. We construct a model of 'research communities' sampling from real gene network data and machine learning methods to characterize performance trends. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly...