Yi-Fei Huang

ORCID: 0000-0001-5594-6731
Research Areas
  • Genomics and Phylogenetic Studies
  • RNA and protein synthesis mechanisms
  • Evolution and Genetic Dynamics
  • Genomics and Chromatin Dynamics
  • Bioinformatics and Genomic Networks
  • Genomics and Rare Diseases
  • Chromosomal and Genetic Variations
  • Genetics, Bioinformatics, and Biomedical Research
  • Gene expression and cancer classification
  • RNA modifications and cancer
  • Machine Learning in Bioinformatics
  • RNA Research and Splicing
  • Genetic Mapping and Diversity in Plants and Animals
  • DNA and Nucleic Acid Chemistry
  • Stochastic processes and financial applications
  • Genetic and phenotypic traits in livestock
  • Genomic variations and chromosomal abnormalities
  • Ubiquitin and proteasome pathways
  • Statistical Methods and Inference
  • Advanced biosensing and bioanalysis techniques
  • Cancer-related molecular mechanisms research
  • Genetic diversity and population structure
  • Molecular Biology Techniques and Applications
  • Financial Risk and Volatility Modeling
  • Natural product bioactivities and synthesis

Cold Spring Harbor Laboratory
2016-2024

Pennsylvania State University
2011-2024

Penn State Milton S. Hershey Medical Center
2021

University of British Columbia
2016

BC Cancer Agency
2016

McMaster University
2011-2014

Boston University
2010-2011

Beijing Normal University
2008

Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B loci within noncoding, non-repetitive regions, their ±2...

10.1093/nar/gkaa1269 article EN cc-by-nc Nucleic Acids Research 2021-01-13

Background: The concentrations of distinct types of RNA in cells result from a dynamic equilibrium between synthesis and decay. Despite the critical importance of decay rates, current approaches for measuring them are generally labor-intensive, limited in sensitivity, and/or disruptive to normal cellular processes. Here, we introduce a simple method for estimating relative half-lives that is based on two standard and widely available high-throughput assays: Precision Run-On sequencing (PRO-seq)...

10.1186/s12915-021-00949-x article EN cc-by BMC Biology 2021-02-15

Large-scale genome sequencing has enabled the measurement of strong purifying selection in protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring such selection in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of "ultraselection" by the fractional depletion of rare single-nucleotide variants, after controlling for variation in mutation rates. Applying ExtRaINSIGHT to 71,702 whole-genome sequences from gnomAD v3, we find abundant ultraselection in evolutionarily ancient miRNAs and...

10.1038/s41467-022-31872-6 article EN cc-by Nature Communications 2022-07-25

Summary: The Phylogenetic Analysis with Space/Time models (PHAST) package is a widely used software package for comparative genomics that has been freely available for download since 2002. Here, we introduce a web interface (phastWeb) that makes it possible to use two of the most popular programs in PHAST, phastCons and phyloP, without downloading and installing the PHAST software. This allows users to upload a sequence alignment and either upload a corresponding phylogeny or have one estimated from the alignment. After processing,...

10.1093/bioinformatics/bty966 article EN Bioinformatics 2018-11-27

Significance: The study of cell-to-cell genomic differences in complex multicellular systems such as cancer requires genome sequencing of large numbers of single cells. This in turn necessitates the uniform amplification of single-cell genomes with high reproducibility across cells, which remains an outstanding challenge. Here, we introduce a method that uses commercially available liquid dispensing to perform inexpensive and high-throughput whole-genome amplification (WGA) in nanoliter volumes. For the first time, to our knowledge,...

10.1073/pnas.1520964113 article EN Proceedings of the National Academy of Sciences 2016-07-13

Recent comparative genomic studies have identified many human accelerated elements (HARs) with elevated substitution rates in the human lineage. However, it remains unknown to what extent transcription factor binding sites (TFBSs) are under accelerated evolution in humans and other primates. Here, we introduce two pooling-based phylogenetic methods with dramatically enhanced sensitivity to examine accelerated evolution in TFBSs. Using these new methods, we show that more than 6000 TFBSs annotated in the human genome have experienced accelerated evolution in Hominini, apes, Old...

10.1038/s41467-023-36421-3 article EN cc-by Nature Communications 2023-02-11

Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies...

10.1101/gr.277490.122 article EN cc-by-nc Genome Research 2023-06-01

Most studies of responses to transcriptional stimuli measure changes in cellular mRNA concentrations. By sequencing nascent RNA instead, it is possible to detect changes in transcription in minutes rather than hours and thereby distinguish primary from secondary regulatory signals. Here, we describe the use of PRO-seq to characterize the immediate response of human cells to celastrol, a compound derived from traditional Chinese medicine that has potent anti-inflammatory, tumor-inhibitory, and obesity-controlling effects. Celastrol...

10.1101/gr.222935.117 article EN cc-by-nc Genome Research 2017-10-12

Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s also hinder replication, transcription, and translation and increase instability and mutation rates. Therefore, depending on genomic...

10.1101/gr.269589.120 article EN cc-by-nc Genome Research 2021-06-29

A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here, we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all observed and potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features and constructed a map...

10.1101/gr.245522.118 article EN cc-by-nc Genome Research 2019-06-27

A challenge in medical genomics is to identify variants and genes associated with severe genetic disorders. Based on the premise that severe, early-onset disorders often result in a reduction of evolutionary fitness, several statistical methods have been developed to predict pathogenic variants or constrained genes based on the signatures of negative selection in human populations. However, we currently lack a framework to jointly predict deleterious variants from both variant-level features and gene-level selective constraints. Here we present such...

10.1371/journal.pgen.1008922 article EN cc-by PLoS Genetics 2020-07-15

A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins, and extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since...

10.1371/journal.pcbi.1003429 article EN cc-by PLoS Computational Biology 2014-01-16

Motivation: A number of statistical phylogenetic methods have been developed to infer conserved functional sites or regions in proteins. Many methods, e.g. Rate4Site, apply the standard models of site-specific substitution rates and totally ignore the spatial correlation in protein tertiary structures, which may reduce their power to identify patches in structures when the sequences used in the analysis are highly similar. The 3D sliding window method has been proposed, but the window size, which reflects the strength of the correlation, must be...

10.1093/bioinformatics/btu673 article EN Bioinformatics 2014-10-15

The ability to accurately predict essential genes intolerant to loss-of-function (LOF) mutations can dramatically improve the identification of disease-associated genes. Recently, there have been numerous computational methods developed to predict human essential genes from population genomic data. While the existing methods are highly predictive of essential genes of long length, they have limited power in pinpointing short essential genes due to the sparsity of polymorphisms in the human genome. Motivated by the premise that population and functional genomic data may provide complementary evidence for gene...

10.1186/s12859-023-05481-z article EN cc-by BMC Bioinformatics 2023-09-18

In high-throughput cancer genomic studies, markers identified from the analysis of single data sets often suffer a lack of reproducibility because of small sample sizes. An ideal solution is to conduct large-scale prospective studies, which are extremely expensive and time consuming. A cost-effective remedy is to pool multiple comparable studies for integrative analysis. Integrative analysis is challenging because of the high dimensionality of the measurements and the heterogeneity among studies. In this article, we propose a sparse boosting approach for marker...

10.1093/biostatistics/kxr033 article EN Biostatistics 2011-10-31

The rate at which RNA molecules decay is a key determinant of cellular RNA concentrations, yet current approaches for measuring RNA half-lives are generally labor-intensive, limited in sensitivity, and/or disruptive to normal cellular processes. Here we introduce a simple method for estimating relative RNA half-lives that is based on two standard and widely available high-throughput assays: Precision Run-On sequencing (PRO-seq) and RNA sequencing (RNA-seq). Our method treats PRO-seq as a measure of transcription and RNA-seq as a measure of RNA concentration, and estimates the...

10.1101/690644 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2019-07-02

Evolutionary changes in gene expression are often driven by gains and losses of cis-regulatory elements (CREs). The dynamics of CRE evolution can be examined using multispecies epigenomic data, but so far such analyses have generally been descriptive and model-free. Here, we introduce a probabilistic modeling framework for the evolution of CREs that operates directly on raw chromatin immunoprecipitation and sequencing (ChIP-seq) data and fully considers the phylogenetic relationships among species. Our framework includes hidden...

10.1093/molbev/msaa073 article EN Molecular Biology and Evolution 2020-03-13

In evolutionary genomics, it is fundamentally important to understand how characteristics of genomic sequences, such as gene expression level, determine the rate of adaptive evolution. While numerous statistical methods, such as the McDonald-Kreitman (MK) test, are available to examine the association between genomic features and the rate of adaptation, we currently lack a statistical approach to disentangle the independent effect of a genomic feature from the effects of other correlated genomic features. To address this problem, I present a novel statistical model, the MK regression, which...

10.1093/molbev/msab291 article EN cc-by-nc Molecular Biology and Evolution 2021-09-30

In animals, the moss Physcomitrella patens and the pollen of Arabidopsis thaliana, highly expressed genes have shorter introns than weakly expressed genes. A popular explanation for this is selection for transcription efficiency, which includes two sub-hypotheses: to minimize the energetic cost or to minimize the time cost. In an individual human, different organs may differ up to hundreds of times in cell number (for example, a liver versus a hypothalamus). Considered at the individual level, a gene specifically expressed in a large organ is actually transcribed tens more...

10.1186/1471-2148-8-154 article EN cc-by BMC Evolutionary Biology 2008-01-01

Motivation: A number of statistical phylogenetic methods have been proposed to identify type-I functional divergence in duplicate genes by detecting heterogeneous substitution rates in phylogenetic trees. A common disadvantage of the existing methods is that autocorrelation of substitution rates along sequences is not modeled. This reduces the power to detect regions under functional divergence. Results: We design a hidden Markov model for protein sequences, and the relevant C++ program, HMMDiverge, has been developed to estimate the model parameters. Simulations demonstrate that HMMDiverge can...

10.1093/bioinformatics/btr635 article EN Bioinformatics 2011-11-26

Across many species, a large fraction of genetic variants that influence phenotypes of interest is located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here, we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and therefore are likely to be phenotypically important. LINSIGHT combines a simple neural network...

10.1101/069682 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2016-08-15

Genome sequencing of tens of thousands of humans has enabled the measurement of large selective effects for mutations to protein-coding genes. Here we describe a new method, called ExtRaINSIGHT, for measuring similar selective effects in noncoding as well as coding regions of the human genome. ExtRaINSIGHT estimates the prevalence of strong purifying selection, or “ultraselection” (λ_s), as the fractional depletion of rare single-nucleotide variants in target genomic sites relative to matched sites that are putatively free from selection, after controlling...

10.1101/2021.08.23.457339 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-08-23