NFDI4DS | UHH-SEMS - Publication Details

Shanfeng Zhu

ORCID: 0000-0002-6067-5312

About

Contact & Profiles

A5045866167

Research Areas

Biomedical Text Mining and Ontologies
Machine Learning in Bioinformatics
Bioinformatics and Genomic Networks
Genomics and Phylogenetic Studies
Topic Modeling
Particle physics theoretical and experimental studies
Quantum Chromodynamics and Particle Interactions
Vaccines and immunoinformatics approaches
Gene expression and cancer classification
Computational Drug Discovery Methods
Natural Language Processing Techniques
Advanced Text Analysis Techniques
Immunotherapy and Immune Responses
High-Energy Particle Collisions Research
Text and Document Classification Technologies
Monoclonal and Polyclonal Antibodies Research
Data Management and Algorithms
Protein Structure and Dynamics
Machine Learning in Materials Science
Web Data Mining and Analysis
Antimicrobial Peptides and Activities
Metabolomics and Mass Spectrometry Studies
Advanced Database Systems and Queries
Advanced Clustering Algorithms Research
CO2 Reduction Techniques and Catalysts

Fudan University
2016-2025

Shanghai Institute for Science of Science
2019-2025

Shanghai Center for Brain Science and Brain-Inspired Technology
2019-2025

Nanjing University
2019-2025

Shanghai Innovative Research Center of Traditional Chinese Medicine
2021-2024

ShangHai JiAi Genetics & IVF Institute
2022-2024

Anhui Science and Technology University
2024

Anhui University of Science and Technology
2024

Institute of Science and Technology
2023-2024

Institute of Art
2020-2024

Collaborative matrix factorization with multiple similarities for predicting drug-target interactions

OPENALEX - Publications

Xiaodong Zheng Hao Ding Hiroshi Mamitsuka Shanfeng Zhu

We address the problem of predicting new drug-target interactions from three inputs: known interactions, similarities over drugs and those over targets. This setting has been considered by many methods, which however have a common problem of allowing to have only one similarity matrix over drugs and that over targets. The key idea of our approach is to use more than one similarity matrices over drugs as well as those over targets, where weights over the multiple similarity matrices are estimated from data to automatically select similarities, which are effective for improving the performance of predicting drug-target interactions. We propose a factor model, named...

10.1145/2487575.2487670 article EN 2013-08-11

Critical Assessment of Metagenome Interpretation: the second round of challenges

Fernando Meyer Adrian Fritz Zhi-Luo Deng David Koslicki Till Robin Lesker and 95 more

Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due...

10.1038/s41592-022-01431-4 article EN cc-by Nature Methods 2022-04-01

GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank

Ronghui You Zihan Zhang Yi Xiong Fengzhu Sun Hiroshi Mamitsuka and 1 more

Motivation: Gene Ontology (GO) has been widely used to annotate functions of proteins and understand their biological roles. Currently only <1% of >70 million proteins in UniProtKB have experimental GO annotations, implying the strong necessity of automated function prediction (AFP) of proteins, where AFP is a hard multilabel classification problem due to one protein with a diverse number of GO terms. Most of these proteins have only sequences as input information, indicating the importance of sequence-based AFP (SAFP: sequences are the only input)....

10.1093/bioinformatics/bty130 article EN Bioinformatics 2018-03-06

MetaBinner: a high-performance and stand-alone ensemble binning method to recover individual genomes from complex microbial communities

Ziye Wang Pingqin Huang Ronghui You Fengzhu Sun Shanfeng Zhu

Binning aims to recover microbial genomes from metagenomic data. For complex metagenomic communities, the available binning methods are far from satisfactory, which usually do not fully use different types of features and important biological knowledge. We developed a novel ensemble binner, MetaBinner, which generates component results with multiple types of features by k-means and uses single-copy gene information for initialization. It then employs a two-stage ensemble strategy based on single-copy genes to integrate the component results efficiently and effectively....

10.1186/s13059-022-02832-6 article EN cc-by Genome Biology 2023-01-06

NetGO 3.0: Protein Language Model Improves Large-Scale Functional Annotations

Shaojun Wang Ronghui You Yunjia Liu Yi Xiong Shanfeng Zhu

As one of the state-of-the-art automated function prediction (AFP) methods, NetGO 2.0 integrates multi-source information to improve the performance. However, it mainly utilizes the proteins with experimentally supported functional annotations without leveraging valuable information from a vast number of unannotated proteins. Recently, protein language models have been proposed to learn informative representations [e.g., Evolutionary Scale Modeling (ESM)-1b embedding] from protein sequences based on self-supervision. Here, we...

10.1016/j.gpb.2023.04.001 article EN cc-by Genomics Proteomics & Bioinformatics 2023-04-01

PLMSearch: Protein language model powers accurate and fast sequence search for remote homology

Wei Liu Ziye Wang Ronghui You Chenghan Xie Hong Wei and 3 more

Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains the similarity prediction model with a large number of real structure similarity. This enables PLMSearch to capture the remote homology information concealed behind the sequences....

10.1038/s41467-024-46808-5 article EN cc-by Nature Communications 2024-03-30

DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank

Qingjun Yuan Junning Gao Dongliang Wu Shihua Zhang Hiroshi Mamitsuka and 1 more

Identifying drug-target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets. Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique...

10.1093/bioinformatics/btw244 article EN cc-by-nc Bioinformatics 2016-06-11

NetGO: improving large-scale protein function prediction with massive network information

Ronghui You Shuwei Yao Yi Xiong Xiaodi Huang Fengzhu Sun and 2 more

Automated function prediction (AFP) of proteins is of great significance in biology. AFP can be regarded as a problem of the large-scale multi-label classification where a protein can be associated with multiple gene ontology terms as its labels. Based on our GOLabeler, a state-of-the-art method for the third critical assessment of functional annotation (CAFA3), in this paper we propose NetGO, a web server that is able to further improve the performance of the large-scale AFP by incorporating massive protein-protein network information. Specifically,...

10.1093/nar/gkz388 article EN cc-by-nc Nucleic Acids Research 2019-05-01

DeepMeSH: deep semantic representation for improving large-scale MeSH indexing

Shengwen Peng Ronghui You Hongning Wang ChengXiang Zhai Hiroshi Mamitsuka and 1 more

Motivation: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well. Methods: We...

10.1093/bioinformatics/btw294 article EN cc-by-nc Bioinformatics 2016-06-11

DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction

Ronghui You Shuwei Yao Hiroshi Mamitsuka Shanfeng Zhu

Motivation: Automated function prediction (AFP) of proteins is a large-scale multi-label classification problem. Two limitations of most network-based methods for AFP are (i) a single model must be trained for each species and (ii) protein sequence information is totally ignored. These limitations cause weaker performance than sequence-based methods. Thus, the challenge is how to develop a powerful network-based method for AFP to overcome these limitations. Results: We propose DeepGraphGO, an end-to-end, multispecies graph neural network-based method for AFP,...

10.1093/bioinformatics/btab270 article EN Bioinformatics 2021-04-23

NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information

Shuwei Yao Ronghui You Shaojun Wang Yi Xiong Xiaodi Huang and 1 more

With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learning to rank (LTR) framework, our previous NetGO tool integrated massive networks and multi-type information about protein sequences to achieve good performance by dealing with all possible GO terms (>44 000). In this work, we propose...

10.1093/nar/gkab398 article EN cc-by-nc Nucleic Acids Research 2021-05-04

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

Ronghui You Zihan Zhang Ziye Wang Suyang Dai Hiroshi Mamitsuka and 1 more

Extreme multi-label text classification (XMTC) is an important problem in the era of big data, for tagging a given text with the most relevant multiple labels from an extremely large-scale label set. XMTC can be found in many applications, such as item categorization, web page tagging, and news annotation. Traditionally most methods used bag-of-words (BOW) as inputs, ignoring word context as well as deep semantic information. Recent attempts to overcome the problems of BOW by deep learning still suffer from 1) failing to capture the important subtext...

10.48550/arxiv.1811.01727 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Effective binning of metagenomic contigs using contrastive multi-view representation learning

Ziye Wang Ronghui You Haitao Han Wei Liu Fengzhu Sun and 1 more

Contig binning plays a crucial role in metagenomic data analysis by grouping contigs from the same or closely related genomes. However, existing binning methods face challenges in practical applications due to the diversity of data types and the difficulties in efficiently integrating heterogeneous information. Here, we introduce COMEBin, a binning method based on contrastive multi-view representation learning. COMEBin utilizes data augmentation to generate multiple fragments (views) of each contig and obtains high-quality embeddings...

10.1038/s41467-023-44290-z article EN cc-by Nature Communications 2024-01-17

TEPITOPEpan: Extending TEPITOPE for Peptide Binding Prediction Covering over 700 HLA-DR Molecules

Lianming Zhang Yiqing Chen Hau-San Wong Shuigeng Zhou Hiroshi Mamitsuka and 1 more

Motivation: Accurate identification of peptides binding to specific Major Histocompatibility Complex Class II (MHC-II) molecules is of great importance for elucidating the underlying mechanism of immune recognition, as well as for developing effective epitope-based vaccines and promising immunotherapies for many severe diseases. Due to extreme polymorphism of MHC-II alleles and the high cost of biochemical experiments, the development of computational methods for accurate prediction of binding peptides of MHC-II molecules, particularly for the ones with few or no...

10.1371/journal.pone.0030483 article EN cc-by PLoS ONE 2012-02-23

DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation

Ronghui You Xiaodi Huang Shanfeng Zhu

10.1016/j.ymeth.2018.05.026 article EN Methods 2018-06-06

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence

Ke Liu Shengwen Peng Junqiu Wu ChengXiang Zhai Hiroshi Mamitsuka and 1 more

Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be...

10.1093/bioinformatics/btv237 article EN cc-by-nc Bioinformatics 2015-06-10

Enhancing quantitative intra-day stock return prediction by integrating both market news and stock prices information

Xiaodong Li Xiaodi Huang Xiaotie Deng Shanfeng Zhu

10.1016/j.neucom.2014.04.043 article EN Neurocomputing 2014-06-07

Does Summarization Help Stock Prediction? A News Impact Analysis

Xiaodong Li Haoran Xie Yangqiu Song Shanfeng Zhu Qing Li and 1 more

The authors study the problem of how news summarization can help stock price prediction, proposing a generic stock price prediction framework to enable the use of different external signals to predict stock prices. Experiments were conducted on five years of Hong Kong Stock Exchange data, with news reported by Finet; evaluations were performed at individual stock, sector index, and market index levels. The authors' results show that prediction based on news article summarization can effectively outperform prediction based on full-length articles on both validation and independent testing sets.

10.1109/mis.2015.1 article EN IEEE Intelligent Systems 2015-01-12

SolidBin: improving metagenome binning with semi-supervised normalized cut

Ziye Wang Zhengyang Wang Yang Young Lu Fengzhu Sun Shanfeng Zhu

Motivation: Metagenomic contig binning is an important computational problem in metagenomic research, which aims to cluster contigs from the same genome into the same group. Unlike classical clustering problem, contig binning can utilize known relationships among some of the contigs or the taxonomic identity of some contigs. However, the current state-of-the-art contig binning methods do not make full use of the additional biological information except the coverage and sequence composition of the contigs. Results: We developed a novel contig binning method, Semi-supervised Spectral...

10.1093/bioinformatics/btz253 article EN Bioinformatics 2019-04-05

META-DDIE: predicting drug–drug interaction events with few-shot learning

Yifan Deng Yang Qiu Xinran Xu Shichao Liu Zhongfei Zhang and 2 more

Drug–drug interactions (DDIs) are one of the major concerns in pharmaceutical research, and a number of computational methods have been developed to predict whether two drugs interact or not. Recently, more attention has been paid to events caused by the DDIs, which is more useful for investigating the mechanism hidden behind the combined drug usage or adverse reactions. However, some rare events may only have few examples, hindering them from being precisely predicted. To address the above issues, we present a few-shot computational method...

10.1093/bib/bbab514 article EN Briefings in Bioinformatics 2021-11-10

A probabilistic model for mining implicit ‘chemical compound–gene’ relations from literature

Shanfeng Zhu Yasushi Okuno Gozoh Tsujimoto Hiroshi Mamitsuka

The importance of chemical compounds has been emphasized more in molecular biology, and 'chemical genomics' has attracted a great deal of attention in recent years. Thus an important issue in current molecular biology is to identify biological-related chemical compounds (more specifically, drugs) and genes. Co-occurrence of biological entities in the literature is a simple, comprehensive and popular technique to find the association of these entities. Our focus is to mine implicit 'chemical compound and gene' relations from the co-occurrence in the literature. We propose a probabilistic model,...

10.1093/bioinformatics/bti1141 article EN Bioinformatics 2005-09-01

Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity

Shanfeng Zhu Jia Zeng Hiroshi Mamitsuka

Motivation: Clustering MEDLINE documents is usually conducted by the vector space model, which computes the content similarity between two documents by basically using the inner-product of their word vectors. Recently, the semantic information of MeSH (Medical Subject Headings) thesaurus is being applied to clustering MEDLINE documents by mapping documents into MeSH concept vectors to be clustered. However, current approaches of using MeSH thesaurus have two serious limitations: first, important semantic information may be lost when generating MeSH concept vectors, and second, the content information of the original text has been discarded....

10.1093/bioinformatics/btp338 article EN Bioinformatics 2009-06-03
