- Algorithms and Data Compression
- Advanced Graph Theory Research
- Limits and Structures in Graph Theory
- Genomics and Phylogenetic Studies
- Gene expression and cancer classification
- RNA and protein synthesis mechanisms
- semigroups and automata theory
- DNA and Biological Computing
- Machine Learning and Algorithms
- Graph theory and applications
- Genome Rearrangement Algorithms
- RNA modifications and cancer
- Cellular Automata and Applications
- Genetic Associations and Epidemiology
- Genomics and Chromatin Dynamics
- Graph Labeling and Dimension Problems
- Computability, Logic, AI Algorithms
- Genetic Mapping and Diversity in Plants and Animals
- RNA Research and Splicing
- Advanced Topology and Set Theory
- Bioinformatics and Genomic Networks
- Mercury impact and mitigation studies
- Complexity and Algorithms in Graphs
- graph theory and CDMA systems
- Computational Drug Discovery Methods
Southwest University
2016-2025
Chengdu University of Traditional Chinese Medicine
2024-2025
University of California, Riverside
2015-2024
Tsinghua University
2015-2024
Jianghan University
2022-2024
Northeast Institute of Geography and Agroecology
2014-2024
Rice Research Institute
2024
Chinese Academy of Sciences
2015-2024
Ames National Laboratory
2024
Czech Academy of Sciences, Institute of Biophysics
2024
We study the computational complexity of two popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the first problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also considered.
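As an illustration of the SP-score (sum-of-pairs) objective named in the abstract above, here is a minimal sketch that scores an already-aligned set of sequences. The cost scheme (unit mismatch cost, unit gap cost, zero cost for matches and gap-gap pairs) and the function name `sp_score` are assumptions chosen for the example, not taken from the paper. The hardness result concerns finding an alignment that optimizes this score, not evaluating it.

```python
from itertools import combinations


def sp_score(alignment, mismatch=1, gap=1):
    """Sum-of-pairs cost of a multiple alignment given as equal-length rows.

    Assumed (hypothetical) cost scheme: identical symbols, including a
    gap paired with a gap, cost 0; a symbol paired with a gap costs
    `gap`; two different symbols cost `mismatch`.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        # Add the pairwise cost of every pair of rows in this column.
        for a, b in combinations(column, 2):
            if a == b:
                continue
            total += gap if "-" in (a, b) else mismatch
    return total


if __name__ == "__main__":
    print(sp_score(["AC-T", "ACGT", "A-GT"]))
```

Evaluating a given alignment this way takes time proportional to the number of columns times the number of row pairs; the difficulty lies in searching over all possible alignments.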
In pattern recognition, feature extraction techniques are widely employed to reduce the dimensionality of data and to enhance the discriminatory information. Principal component analysis (PCA) and linear discriminant analysis (LDA) are the two most popular linear dimensionality reduction methods. However, PCA is not very effective for extracting the most discriminant features, and LDA is not stable due to the small sample size problem. In this paper, we propose some new (linear and nonlinear) feature extractors based on the maximum margin criterion (MMC). Geometrically, feature extractors based on MMC maximize the (average) margin between classes...
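The truncated abstract above does not state the criterion itself. One common way to write a maximum margin criterion for a linear projection $W$ is sketched below. The symbols ($p_i$ for class priors, $m_i$ and $m$ for class and global means, $S_i$ for class covariances, $d$ for the target dimension) are assumptions introduced for this sketch.

```latex
% Between-class and within-class scatter (assumed notation)
S_b = \sum_{i=1}^{c} p_i \,(m_i - m)(m_i - m)^{\mathsf{T}}, \qquad
S_w = \sum_{i=1}^{c} p_i \, S_i

% Criterion for a linear projection W with orthonormal columns
J(W) = \operatorname{tr}\!\left( W^{\mathsf{T}} (S_b - S_w)\, W \right),
\qquad W^{\mathsf{T}} W = I_d
```

Under the orthonormality constraint, $J(W)$ is maximized by taking the columns of $W$ to be eigenvectors of $S_b - S_w$ associated with its $d$ largest eigenvalues. No inverse of $S_w$ is needed in this form, which is consistent with the abstract's concern about the small sample size problem in LDA.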
The phytohormone abscisic acid (ABA) plays a vital role in plant development and response to environmental challenges, but the complex networks of ABA signaling pathways are poorly understood. We previously reported that a chloroplast protein, the magnesium-protoporphyrin IX chelatase H subunit (CHLH/ABAR), functions as a receptor for ABA in Arabidopsis thaliana. Here, we report that ABAR spans the chloroplast envelope and that the cytosolic C terminus of ABAR interacts with a group of WRKY transcription factors (WRKY40, WRKY18, and WRKY60)...
Software applications for structural similarity searching and clustering of small molecules play an important role in drug discovery and chemical genomics. Here, we present the first open-source compound mining framework for the popular statistical programming environment R. The integration with a powerful statistical environment maximizes the flexibility, expandability and programmability of the provided analysis functions. We discuss the algorithms and compound mining utilities provided by the R package ChemmineR. It contains functions for structural similarity searching, clustering of compound libraries with a wide spectrum...
Motivation: Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks. Results: Inspired by recent advances in information passing and aggregation techniques that generalize convolution neural networks to mine large-scale graph...
Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at the single-cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately...
Computational approaches for understanding compound-protein interactions (CPIs) can greatly facilitate drug development. Recently, a number of deep-learning-based methods have been proposed to predict binding affinities and attempt to capture local interaction sites in compounds and proteins through neural attentions (i.e., neural network architectures that enable the interpretation of feature importance). Here, we compiled a benchmark dataset containing the inter-molecular non-covalent interactions for more than 10,000 compound-protein pairs...
Many machine learning applications in bioinformatics currently rely on matching gene identities when analyzing input gene signatures and fail to take advantage of preexisting knowledge about gene functions. To further enable comparative analysis of OMICS datasets, including target deconvolution and mechanism of action studies, we develop an approach that represents gene signatures projected onto their biological functions, instead of their identities, similar to how the word2vec technique works in natural language processing. We...
Finite automata (FA's) are of fundamental importance in theory and in applications. The following basic minimization problem is studied: Given a DFA (deterministic FA), find a minimum equivalent nondeterministic FA (NFA). This paper shows that the natural decision problem associated with it is PSPACE-complete. More generally, let ${\text{A}} \to {\text{B}}$ denote the problem of converting a given FA of type A to a minimum FA of type B. This paper also shows that most of these problems are computationally hard. Motivated by the question of how much nondeterminism suffices to make...
The problems of finding shortest common supersequences (SCS) and longest common subsequences (LCS) are two well-known $\textbf{NP}$-hard problems that have applications in many areas, including computational molecular biology, data compression, robot motion planning, scheduling, text editing, etc. A lot of fruitless effort has been spent in searching for good approximation algorithms for these problems. In this paper, we show that these problems are inherently hard to approximate in the worst case. In particular, we prove that (i) SCS does not have a...
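The hardness discussed in the abstract above applies when the number of input sequences is arbitrary. For just two sequences, both LCS and SCS can be solved with textbook dynamic programming, which the following minimal sketch shows for concreteness. The function names `lcs_length` and `scs_length` are hypothetical, and this is not the paper's construction.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence of two strings (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def scs_length(a, b):
    """Length of a shortest common supersequence of two strings.

    For two strings, symbols of an LCS can be shared, so
    |SCS| = |a| + |b| - |LCS|.
    """
    return len(a) + len(b) - lcs_length(a, b)


if __name__ == "__main__":
    print(lcs_length("AGGTAB", "GXTXAYB"), scs_length("AGGTAB", "GXTXAYB"))
```

The identity used in `scs_length` holds only for two sequences; it does not extend to three or more inputs, which is where the problems become hard.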
Arc-annotated sequences are useful in representing the structural information of RNA sequences. In general, RNA secondary and tertiary structures can be represented as a set of nested arcs and a set of crossing arcs, respectively. Since RNA functions are largely determined by molecular conformation and therefore secondary and tertiary structures, the comparison between RNA secondary and tertiary structures has received much attention recently. In this paper, we propose the notion of edit distance to measure the similarity between two RNA secondary and tertiary structures, by incorporating various edit operations performed on both bases and arcs (i.e.,...
The prediction of biologically active compounds is of great importance for high-throughput screening (HTS) approaches in drug discovery and chemical genomics. Many computational methods in this area focus on measuring the structural similarities between chemical structures. However, traditional similarity measures are often too rigid or consider only global similarities between structures. The maximum common substructure (MCS) approach provides a more promising and flexible alternative for predicting bioactive compounds. In this article, a new backtracking...
Hepatocyte nuclear factor 4 alpha (HNF4α), a member of the nuclear receptor superfamily, is essential for liver function and is linked to several diseases including diabetes, hemophilia, atherosclerosis, and hepatitis. Although many DNA response elements and target genes have been identified for HNF4α, the complete repertoire of binding sites and target genes in the human genome is unknown. Here, we adapt protein binding microarrays (PBMs) to examine the DNA-binding characteristics of two HNF4α species (rat and human) and isoforms (HNF4α2 and HNF4α8) in a high-throughput...
The new second generation sequencing technology revolutionizes many biology-related research fields and poses various computational biology challenges. One of them is transcriptome assembly based on RNA-Seq data, which aims at reconstructing all full-length mRNA transcripts simultaneously from millions of short reads. In this article, we consider three objectives in transcriptome assembly: the maximization of prediction accuracy, minimization of interpretation, and maximization of completeness. The first objective, the maximization of prediction accuracy, requires that...
Motivation: Translation initiation is a key step in the regulation of gene expression. In addition to the annotated translation initiation sites (TISs), the translation process may also start at multiple alternative TISs (including both AUG and non-AUG codons), which makes it challenging to predict TISs and study the underlying regulatory mechanisms. Meanwhile, the advent of several high-throughput sequencing techniques for profiling initiating ribosomes at single-nucleotide resolution, e.g. GTI-seq and QTI-seq, provides abundant data...
Algal blooms bring massive amounts of algal organic matter (AOM) into eutrophic lakes, which influences microbial methylmercury (MeHg) production. However, because of the complexity of AOM and its dynamic changes during decomposition, the relationship between AOM and Hg methylators remains poorly understood, which hinders predicting MeHg production and bioaccumulation in shallow lakes. To address that, we explored the impacts of AOM on MeHg production by characterizing dissolved organic matter with Fourier transform ion cyclotron resonance mass spectrometry...
Peatlands store one-third of the world's soil organic carbon. Globally increased fires have altered peat organic matter chemistry, yet the redox property and molecular dynamics of peat-dissolved organic matter (PDOM) during fires remain poorly characterized, limiting our understanding of postfire biogeochemical processes. Clarifying these dynamic changes is essential for effective peatland fire management. This study demonstrates temperature-dependent changes in the electron exchange capacity (EEC) of PDOM by simulating peat burning, significantly...
The assignment of orthologous genes between a pair of genomes is a fundamental and challenging problem in comparative genomics. Existing methods that assign orthologs based on the similarity between DNA or protein sequences may make erroneous assignments when sequence similarity does not clearly delineate the evolutionary relationship among genes of the same families. In this paper, we present a new approach to ortholog assignment that takes into account both sequence similarity and evolutionary events at the genome level, where orthologous genes are assumed to correspond to each other in the most parsimonious evolving...