- Genomics, phytochemicals, and oxidative stress
- Gene expression and cancer classification
- Machine Learning in Bioinformatics
- Monoclonal and Polyclonal Antibodies Research
- RNA and protein synthesis mechanisms
- Molecular Biology Techniques and Applications
- Machine Learning in Healthcare
- Vaccines and immunoinformatics approaches
- Bioinformatics and Genomic Networks
- Genomics and Phylogenetic Studies
- T-cell and B-cell Immunology
- Glycosylation and Glycoproteins Research
- Genomics and Chromatin Dynamics
- Single-cell and spatial transcriptomics
- Natural Products and Biological Research
- Functional Brain Connectivity Studies
- Cancer Genomics and Diagnostics
- Primate Behavior and Ecology
- Mental Health Research Topics
- RNA Research and Splicing
- Advanced biosensing and bioanalysis techniques
- Immune Cell Function and Interaction
- COVID-19 impact on air quality
- Cancer Cells and Metastasis
- Extracellular vesicles in disease
Wellcome Sanger Institute
2012-2024
Sichuan Agricultural University
2024
Hunan Agricultural University
2015-2023
Guangdong Provincial People's Hospital
2020-2022
Guangdong Academy of Medical Sciences
2020-2022
Columbia University
2020-2021
Changshu No.1 People's Hospital
2020
Harvard University
2019
The University of Texas at Austin
2017
Hunan Rice Research Institute
2014
Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human–chimpanzee and human–chimpanzee–gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter...
The genome of the Southeast Asian great ape, or orang-utan, has been sequenced: specifically, a draft assembly of a Sumatran female individual and short-read sequence data from five further Sumatran and five Bornean orang-utans, Pongo abelii and Pongo pygmaeus, respectively. The orang-utan species appear to have split around 400,000 years ago, more recently than most previous estimates suggested, resulting in an average Bornean–Sumatran nucleotide identity of 99.68%. Structural evolution seems to have proceeded much more slowly than in other apes,...
The maximal information coefficient (MIC) captures both linear and nonlinear correlations between variable pairs. In this paper, we propose the BackMIC algorithm for MIC estimation. BackMIC adds a searching-back process on the equipartitioned axis to obtain a better grid partition than the original implementation, ApproxMaxMI. Similar to the ChiMIC algorithm, it terminates the search by a χ²-test instead of the maximum number of bins B(n, α). Results on simulated data show that BackMIC maintains the generality of MIC, and gives more...
The understanding of disease susceptibility and biological variability will largely depend on the identification of functional genetic variants. We describe a map of more than 12,000 gene-based single nucleotide polymorphisms (SNPs) from transcribed regions, created by aligning cDNA sequence
Antibody repertoire sequencing enables researchers to acquire millions of B cell receptors and investigate these molecules at the single-nucleotide level. This power and resolution in studying humoral responses have led to its wide application. However, most studies were conducted with a limited number of samples. Given this extraordinary diversity, an assessment of key features in a large sample set is needed. Thus, we collect and systematically analyze 2,152 high-quality heavy-chain antibody repertoires. Our study...
In efforts to discover disease mechanisms and improve the clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ²-IRG-DC). First, we obtained the weighted importance of genes from chi-square tests of single genes and pairwise interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by...
Vitamin D and folate are activated and degraded by sunlight, respectively, and the physiological processes they control are likely to have been targets of selection as humans expanded from Africa into Eurasia. We investigated signals of positive selection in gene sets involved in the metabolism, regulation and action of these two vitamins in worldwide populations sequenced in Phase I of the 1000 Genomes Project. Comparing allele frequency-spectrum-based summary statistics between these gene sets and matched genes, we observed a signal specific to East Asians for...
Since decision trees (DTs) have an advantage over "black-box" models, such as neural nets or support vector machines, in terms of comprehensibility, they merit further optimization. Node splitting measures and pruning methods are primary among the techniques that can improve the generalization abilities of DTs. Here, we introduced unequal interval optimization for node splitting, as well as a local chi-square test for tree pruning. This new method was named the adaptive multi-branch decision tree (CMDT). 11...
The sequence upstream of the antibody variable region (antibody upstream sequence [AUS]) consists of a 5' untranslated region (5' UTR) and a preceding leader region. Sequence variations in the AUS affect antibody engineering and PCR-based quantification, and may also be implicated in mRNA transcription and translation. However, the diversity of AUSs remains elusive. Using rapid amplification of cDNA ends and a high-throughput antibody repertoire sequencing technique, we acquired full-length AUSs for human, rhesus macaque, cynomolgus macaque, mouse, and rat. We designed a bioinformatics pipeline...
The adaptive immune receptor repertoire consists of the entire set of an individual’s BCRs and TCRs and is believed to contain a record of prior immune responses and the potential for future immunity. Analyses of TCR repertoires via deep learning (DL) methods have successfully diagnosed cancers and infectious diseases, including coronavirus disease 2019. However, few studies have used DL to analyze BCR repertoires. In this study, we collected IgG H chain Ab repertoires from 276 healthy control subjects and 326 patients with various...
Selecting informative genes, including individually discriminant genes and synergic genes, from expression data has been useful for medical diagnosis and prognosis. Detecting synergic genes is more difficult than selecting individually discriminant genes. Several efforts have recently been made to detect gene-gene synergies, such as dendrogram-based I(X1; X2; Y) (mutual information), doublets (gene pairs) and MIC(X1; X2; Y), based on the maximal information coefficient. It is unclear whether these methods can capture synergies efficiently. Although a wide range of...
The antibody repertoire is a critical component of the adaptive immune system and is believed to reflect an individual’s immune history and current immune status. Delineating the antibody repertoire has advanced our understanding of humoral immunity, facilitated antibody discovery, and shown great potential for improving the diagnosis and treatment of disease. However, no tool to date has effectively integrated big Rep-seq data and prior knowledge of functional antibodies to elucidate the remarkably diverse antibody repertoire. We developed a Rep-seq dataset Analysis Platform with Integrated Database...
Informative gene selection can have important implications for the improvement of cancer diagnosis and the identification of new drug targets. Individual-gene-ranking methods ignore interactions between genes. Furthermore, popular pair-wise gene evaluation methods, e.g. TSP and TSG, are of little help in discovering pair-wise interactions. Several efforts to discover pair-wise synergy have been made based on the information approach, such as EMBP and FeatKNN. However, the methods employed to estimate mutual information, such as binarization,...
Background: Identifying cell types using unsupervised methods is essential for scRNA-seq research. However, conventional similarity measures introduce challenges to single-cell data clustering because of high dimensionality, noise, and dropout. Methods: We proposed a clustering method for small scRNA-seq data based on Subspace and Weighted Distance (SSWD), which follows the assumption that sets of gene subspaces composed of similar density-distributing genes can better distinguish cell groups. To accurately capture the intrinsic...
Lysine succinylation is a type of protein post-translational modification which is widely involved in cell differentiation, cell metabolism and other important physiological activities. To study the molecular mechanism of succinylation in depth, succinylation sites need to be accurately identified, and because experimental approaches are costly and time-consuming, there is a great demand for reliable computational methods. Feature extraction is a key step in building succinylation site prediction models, and the development of effective new features improves predictive...
High-altitude environments pose substantial challenges for human survival and reproduction, attracting considerable attention to the demographic and adaptive histories of high-altitude populations. Previous work focused mainly on Tibetans, establishing their genetic relatedness to East Asians and their genetic adaptation to high altitude, especially at EPAS1. Here, we present 87 new whole-genome sequences from 16 Himalayan populations and the insight they provide into the genomic history of the region. We show that population structure...
Background: RNA editing, especially A-to-I editing, is a common RNA modification critical for stem cell differentiation, muscle development, and disease occurrence. Unveiling the comprehensive RNA editing events associated with myogenesis of the skeletal muscle satellite cells (MuSCs) is essential for extending our knowledge of the mechanism underpinning muscle development. Results: A total of 9,632 RNA editing sites (RESs) were screened in the myoblast (GM), myocyte (DM1), and myotube (DM5) samples. Among these, 4,559 edits were classified and further analyzed....
The sequence upstream of the antibody variable region (Antibody Upstream Sequence, or AUS) consists of a 5’ untranslated region (5’ UTR) and two leader regions, L-PART1 and L-PART2. Sequence variations in the AUS affect the efficiency of PCR amplification, mRNA translation, and subsequent PCR-based antibody quantification as well as antibody engineering. Despite their importance, the diversity of AUSs has long been neglected. Utilizing 5’ rapid amplification of cDNA ends (5’RACE) and a high-throughput antibody repertoire sequencing (Rep-Seq) technique, we acquired...
The antibody repertoire refers to the totality of the superbly diversified antibodies within an individual that cope with the vast array of possible pathogens. Despite this extreme diversity, antibodies of the same clonotype, namely public clones, have been discovered among individuals. Although some public clones could be explained by antibody convergence, public clones in naïve repertoires or virus-neutralizing clones from uninfected people were also discovered. All these findings indicated that public clones might not occur at random and that they exert essential functions....
Antibody repertoire sequencing (Rep-seq) has been widely used to reveal repertoire dynamics and to interrogate antibodies of interest at single nucleotide-level resolution. However, polymerase chain reaction (PCR) amplification introduces extensive artifacts, including chimeras and nucleotide errors, leading to false discovery of antibodies and incorrect assessment of somatic hypermutations (SHMs), which subsequently mislead downstream investigations. Here, a novel approach named DUMPArts, which improves the accuracy of antibody repertoires by...