NFDI4DS | UHH-SEMS - Publication Details

Zhaocheng Zhu

ORCID: 0009-0004-5425-330X

About

Contact & Profiles

A5038654477

Research Areas

Advanced Graph Neural Networks
Machine Learning in Materials Science
Topic Modeling
Semantic Web and Ontologies
Machine Learning in Bioinformatics
Genetics, Bioinformatics, and Biomedical Research
Computational Drug Discovery Methods
Natural Language Processing Techniques
Recommender Systems and Techniques
Human Pose and Action Recognition
Medical Imaging Techniques and Applications
Protein Structure and Dynamics
Radiomics and Machine Learning in Medical Imaging
Bayesian Modeling and Causal Inference
Domain Adaptation and Few-Shot Learning
Bioinformatics and Genomic Networks
Adversarial Robustness in Machine Learning
Face recognition and analysis
Biomedical Text Mining and Ontologies
Speech and dialogue systems
Generative Adversarial Networks and Image Synthesis
Advanced Neural Network Applications
Intelligent Tutoring Systems and Adaptive Learning
Data Quality and Management
Model-Driven Software Engineering Techniques

Mila - Quebec Artificial Intelligence Institute
2019-2024

Centre Universitaire de Mila
2023-2024

Université de Montréal
2023-2024

University of Hong Kong
2023

Tsinghua University
2019-2021

Peking University
2016-2018

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

OPENALEX - Publications

Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, and 2 more

Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced...

10.1162/tacl_a_00360 article EN cc-by Transactions of the Association for Computational Linguistics 2021-03-01

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Chence Shi, Minkai Xu, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, and 1 more

Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by the recent progress in deep generative models, in this paper we propose a flow-based autoregressive model for graph generation called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high model flexibility for data density estimation; (2)...

10.48550/arxiv.2001.09382 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, Jian Tang

Link prediction is a very fundamental task on graphs. Inspired by traditional path-based methods, in this paper we propose a general and flexible representation learning framework based on paths for link prediction. Specifically, we define the representation of a pair of nodes as the generalized sum of all path representations, with each path representation as the generalized product of the edge representations in the path. Motivated by the Bellman-Ford algorithm for solving the shortest path problem, we show that the proposed path formulation can be efficiently solved by the generalized Bellman-Ford algorithm. To further improve the capacity...

10.48550/arxiv.2106.06935 preprint EN other-oa arXiv (Cornell University) 2021-01-01
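
As an illustration of the path formulation summarized in the Neural Bellman-Ford Networks abstract above, the following minimal Python sketch runs a Bellman-Ford-style iteration with pluggable "sum" and "product" operators on scalar edge values. It is not the NBFNet model itself, which the abstract describes only at a high level. The function name `generalized_bellman_ford`, the toy graph, and the choice of scalar operators are assumptions made for this example.

```python
import math
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (source node, target node, edge value)


def generalized_bellman_ford(
    num_nodes: int,
    edges: List[Edge],
    source: int,
    add: Callable[[float, float], float],
    mul: Callable[[float, float], float],
    zero: float,
    one: float,
    num_iters: int,
) -> List[float]:
    """Hypothetical helper: Bellman-Ford iteration over a generic (add, mul) pair.

    `zero` must be the identity of `add` and `one` the identity of `mul`.
    After `num_iters` rounds, entry v aggregates (with `add`) the `mul`-product
    of edge values along every walk of at most `num_iters` edges from `source` to v.
    """
    # Boundary condition: the source starts at `one`, all other nodes at `zero`.
    boundary = [one if v == source else zero for v in range(num_nodes)]
    state = list(boundary)
    for _ in range(num_iters):
        new_state = list(boundary)
        for u, v, w in edges:
            # Extend every walk ending at u by the edge (u, v), then aggregate at v.
            new_state[v] = add(new_state[v], mul(state[u], w))
        state = new_state
    return state


# Toy directed graph with 4 nodes (an assumption for this example).
EDGES = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0), (2, 3, 1.0)]

# (min, +): classical shortest distances from node 0.
shortest = generalized_bellman_ford(
    4, EDGES, 0, min, lambda a, b: a + b, math.inf, 0.0, 3
)

# (+, *) with unit edge values: number of paths from node 0 in this acyclic graph.
UNIT_EDGES = [(u, v, 1.0) for u, v, _ in EDGES]
path_counts = generalized_bellman_ford(
    4, UNIT_EDGES, 0, lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0, 3
)
```

With the (min, +) pair the result is `[0.0, 1.0, 3.0, 4.0]`, the usual shortest distances. With (+, ×) and unit edge values it is `[1.0, 1.0, 2.0, 2.0]`, the number of paths reaching each node. Swapping the operator pair changes what is aggregated over paths while the iteration stays the same.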

GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding

Zhaocheng Zhu, Shizhen Xu, Jian Tang, Meng Qu

Learning continuous representations of nodes is attracting growing interest in both academia and industry recently, due to their simplicity and effectiveness in a variety of applications. Most existing node embedding algorithms and systems are capable of processing networks with hundreds of thousands or a few millions of nodes. However, how to scale them to networks that have tens of millions or even hundreds of millions of nodes remains a challenging problem. In this paper, we propose GraphVite, a high-performance CPU-GPU hybrid system for training node embeddings, by co-optimizing...

10.1145/3308558.3313508 preprint EN 2019-05-13

KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhengyan Zhang, Zhiyuan Liu, and 2 more

Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong...

10.48550/arxiv.1911.06136 preprint EN other-oa arXiv (Cornell University) 2019-01-01

TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery

Zhaocheng Zhu, Chence Shi, Zuobai Zhang, Shengchao Liu, Minghao Xu, and 10 more

Machine learning has huge potential to revolutionize the field of drug discovery and is attracting increasing attention in recent years. However, lacking domain knowledge (e.g., which tasks to work on), standard benchmarks and data preprocessing pipelines are the main obstacles for machine learning researchers to work in this domain. To facilitate the progress of machine learning for drug discovery, we develop TorchDrug, a powerful and flexible machine learning platform for drug discovery built on top of PyTorch. TorchDrug benchmarks a variety of important tasks in drug discovery, including molecular property prediction, pretrained...

10.48550/arxiv.2202.08320 preprint EN other-oa arXiv (Cornell University) 2022-01-01

PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding

Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, and 3 more

We are now witnessing significant progress of deep learning methods in a variety of tasks (or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the performance of different methods, which hinders the progress of deep learning in this field. In this paper, we propose such a benchmark called PEER, a comprehensive and multi-task benchmark for Protein sEquence undERstanding. PEER provides a set of diverse protein understanding tasks including protein function prediction, protein localization prediction, protein structure prediction, protein-protein interaction prediction, and protein-ligand interaction prediction. We evaluate different types...

10.48550/arxiv.2206.02096 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Dialog state tracking with attention-based sequence-to-sequence learning

Takaaki Hori, Hai Wang, Chiori Hori, Shinji Watanabe, Bret Harsham, and 6 more

We present an advanced dialog state tracking system designed for the 5th Dialog State Tracking Challenge (DSTC5). The main task of DSTC5 is to track the dialog state in a human-human dialog. For each utterance, the tracker emits a frame of slot-value pairs considering the full history of the dialog up to the current turn. Our system includes an encoder-decoder architecture with an attention mechanism to map an input word sequence to a set of semantic labels, i.e., slot-value pairs. This handles the problem of the unknown alignment between the utterances and the labels. By combining the attention-based...

10.1109/slt.2016.7846317 article EN 2016 IEEE Spoken Language Technology Workshop (SLT) 2016-12-01

Neural-Symbolic Models for Logical Queries on Knowledge Graphs

Zhaocheng Zhu, Mikhail Galkin, Zuobai Zhang, Jian Tang

Answering complex first-order logic (FOL) queries on knowledge graphs is a fundamental task for multi-hop reasoning. Traditional symbolic methods traverse a complete knowledge graph to extract the answers, which provides good interpretation for each step. Recent neural methods learn geometric embeddings for complex queries. These methods can generalize to incomplete knowledge graphs, but their reasoning process is hard to interpret. In this paper, we propose Graph Neural Network Query Executor (GNN-QE), a neural-symbolic model that enjoys the advantages of...

10.48550/arxiv.2205.10128 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Gabriela Moisescu-Pareja, Oleksandr Dymov, and 29 more

Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for...

10.48550/arxiv.2310.04292 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Large Language Models can Learn Rules

Zhaocheng Zhu, Yuan Xue, Xinyun Chen, Denny Zhou, Jian Tang, and 2 more

When prompted with a few examples and intermediate steps, large language models (LLMs) have demonstrated impressive performance in various reasoning tasks. However, prompting methods that rely on implicit knowledge in an LLM often hallucinate incorrect answers when the implicit knowledge is wrong or inconsistent with the task. To tackle this problem, we present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with LLMs. HtT contains two stages, an induction stage and a deduction stage. In the induction stage, an LLM is first asked to generate...

10.48550/arxiv.2310.07064 preprint EN other-oa arXiv (Cornell University) 2023-01-01

GraphAny: A Foundation Model for Node Classification on Any Graph

Jianan Zhao, Hesham Mostafa, Mikhail Galkin, Michael M. Bronstein, Zhaocheng Zhu, and 1 more

Foundation models that can perform inference on any new task without requiring specific training have revolutionized machine learning in vision and language applications. However, applications involving graph-structured data remain a tough nut for foundation models, due to challenges in the unique feature- and label spaces associated with each graph. Traditional graph ML models such as graph neural networks (GNNs) trained on graphs cannot perform inference on a new graph with feature and label spaces different from the training ones. Furthermore, existing models learn functions...

10.48550/arxiv.2405.20445 preprint EN arXiv (Cornell University) 2024-05-30

A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs

Zhaocheng Zhu, Xinyu Yuan, Louis-Pascal Xhonneux, Ming Zhang, Maxime Gazeau, and 1 more

Reasoning on large-scale knowledge graphs has been long dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority function to select important nodes and edges at each iteration, to reduce time and memory footprint for both training and inference. The ratio of selected...

10.48550/arxiv.2206.04798 preprint EN other-oa arXiv (Cornell University) 2022-01-01

GraphText: Graph Reasoning in Text Space

Jianan Zhao, Le Zhuo, Yikang Shen, Meng Qu, Kai Liu, and 3 more

Large Language Models (LLMs) have gained the ability to assimilate human knowledge and facilitate natural language interactions with both humans and other LLMs. However, despite their impressive achievements, LLMs have not made significant advancements in the realm of graph machine learning. This limitation arises because graphs encapsulate distinct relational data, making it challenging to transform them into natural language that LLMs understand. In this paper, we bridge this gap with a novel framework, GraphText, that translates graphs into natural language....

10.48550/arxiv.2310.01089 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Context Aware Document Embedding

Zhaocheng Zhu, Junfeng Hu

Recently, doc2vec has achieved excellent results in different tasks. In this paper, we present a context aware variant of doc2vec. We introduce a novel weight estimating mechanism that generates weights for each word occurrence according to its contribution to the context, using deep neural networks. Our context aware model can achieve similar results compared to doc2vec initialized by Wikipedia trained vectors, while being much more efficient and free from heavy external corpus. Analysis of context aware weights shows they are a kind of enhanced IDF weights that capture...

10.48550/arxiv.1707.01521 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases

Hong‐Yu Ren, Mikhail Galkin, Michael Cochez, Zhaocheng Zhu, Jure Leskovec

Complex logical query answering (CLQA) is a recently emerged task of graph machine learning that goes beyond simple one-hop link prediction and solves a far more complex task of multi-hop logical reasoning over massive, potentially incomplete graphs in a latent space. The task received significant traction in the community; numerous works expanded the field along theoretical and practical axes to tackle different types of complex queries and graph modalities with efficient systems. In this paper, we provide a holistic survey of CLQA with a detailed taxonomy...

10.48550/arxiv.2303.14617 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, Zhaocheng Zhu

Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to the transferable representations such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we make a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational...

10.48550/arxiv.2310.04562 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Zero-shot Logical Query Reasoning on any Knowledge Graph

Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, Zhaocheng Zhu

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, an inductive reasoning model that can zero-shot answer logical queries on any KG. The core idea...

10.48550/arxiv.2404.07198 preprint EN arXiv (Cornell University) 2024-04-10

The 1st International Workshop on Graph Foundation Models (GFM)

Haitao Mao, Jianan Zhao, Xiaoxin He, Zhikai Chen, Qian Huang, and 9 more

Foundation models such as GPT-4 for natural language processing (NLP), Flamingo for computer vision (CV), have set new benchmarks in AI by delivering state-of-the-art results across various tasks with minimal task-specific data. Despite their success, the application of these models to the graph domain is challenging due to the relational nature of graph-structured data. To address this gap, we propose the Graph Foundation Model (GFM) Workshop, the first workshop for GFMs, dedicated to exploring the adaptation and development of foundation models specifically...

10.1145/3589335.3641306 article EN 2024-05-12

Path-based reasoning in biomedical knowledge graphs

Yue Hu, Svitlana Oleshko, Samuele Firmani, Zhaocheng Zhu, Hui Cheng, and 6 more

Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. Representation-based learning techniques improve prediction accuracy by mapping nodes to low-dimensional embeddings, yet they often struggle with interpretability and scalability. We present BioPathNet, a novel graph neural network framework based on the Neural Bellman-Ford Network (NBFNet), addressing these...

10.1101/2024.06.17.599219 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2024-06-18

BioPathNet: Enhancing Link Prediction in Biomedical Knowledge Graphs through Path Representation Learning

Annalisa Marsico, Yue Hu, Svitlana Oleshko, Samuele Firmani, Zhaocheng Zhu, and 6 more

Understanding complex interactions in biomedical networks is crucial for advancements in biomedicine, but traditional link prediction (LP) methods are limited in capturing this complexity. Representation-based learning techniques improve prediction accuracy by mapping nodes to low-dimensional embeddings, yet they often struggle with interpretability and scalability. We present BioPathNet, a novel graph neural network framework based on the Neural Bellman-Ford Network (NBFNet),...

10.21203/rs.3.rs-5057842/v1 preprint EN Research Square (Research Square) 2024-09-18

Learning Representations for Reasoning: Generalizing Across Diverse Structures

Zhaocheng Zhu

Reasoning, the ability to logically draw conclusions from existing knowledge, is a hallmark of human. Together with perception, they constitute the two major themes of artificial intelligence. While deep learning has pushed the limit of perception beyond human-level performance, the progress in reasoning domains is way behind. One fundamental reason is that reasoning problems usually have flexible structures for both knowledge and queries, and many existing models only perform well on structures seen during training. Here we aim to push the boundary of reasoning models by...

10.48550/arxiv.2410.13018 preprint EN arXiv (Cornell University) 2024-10-16
