NFDI4DS | UHH-SEMS - Publication Details

Hongyu Guo

ORCID: 0000-0002-7663-2421

About

Contact & Profiles

OpenAlex author ID: A5103193854

Research Areas

Topic Modeling
Natural Language Processing Techniques
Domain Adaptation and Few-Shot Learning
Machine Learning and Data Classification
Energy Load and Power Forecasting
Machine Learning in Materials Science
Generative Adversarial Networks and Image Synthesis
Computational Drug Discovery Methods
Adversarial Robustness in Machine Learning
Advanced Text Analysis Techniques
Multimodal Machine Learning Applications
Advanced Graph Neural Networks
Imbalanced Data Classification Techniques
Text and Document Classification Technologies
Image Retrieval and Classification Techniques
Face and Expression Recognition
Data Quality and Management
Building Energy and Comfort Optimization
Music and Audio Processing
Advanced Image and Video Retrieval Techniques
Advanced Neural Network Applications
Time Series Analysis and Forecasting
Advanced MRI Techniques and Applications
Sentiment Analysis and Opinion Mining
Radiomics and Machine Learning in Medical Imaging

Neusoft (China)
2019-2025

National Research Council Canada
2013-2023

Tianjin Medical University
2023

Shenyang University of Technology
2011-2023

Institute of Computing Technology
2023

University of Saskatchewan
2020-2021

University of Ottawa
2004-2019

Beihang University
2019

University of Leeds
2019

Shanghai Ocean University
2012

Learning from imbalanced data sets with boosting and data generation

OPENALEX - Publications

Hongyu Guo, Herna L. Viktor

Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class, thus producing poor predictive accuracy over the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers against imbalanced data sets consisting of two classes. In the DataBoost-IM method, hard...

10.1145/1007730.1007736 article EN ACM SIGKDD Explorations Newsletter 2004-06-01

MixUp as Locally Linear Out-of-Manifold Regularization

Hongyu Guo, Yongyi Mao, Richong Zhang

MixUp (Zhang et al. 2017) is a recently proposed data-augmentation scheme, which linearly interpolates a random pair of training examples and correspondingly the one-hot representations of their labels. Training deep neural networks with such additional data is shown capable of significantly improving the predictive accuracy of the current art. The power of MixUp, however, is primarily established empirically, and its working and effectiveness have not been explained in any depth. In this paper, we develop an understanding for...

10.1609/aaai.v33i01.33013714 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
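
The interpolation described in the abstract above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names `mixup_pair` and `sample_lambda`, the default `alpha`, and the Beta(alpha, alpha) mixing weight (the choice made in the original Mixup proposal by Zhang et al.) are assumptions that do not appear on this page.

```python
import random


def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two examples and their one-hot labels.

    x1, x2: feature vectors (lists of floats) of equal length.
    y1, y2: one-hot label vectors of equal length.
    lam:    mixing weight in [0, 1]; lam = 1 returns the first example.
    """
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha=1.0, rng=random):
    # Assumption: the mixing weight is drawn from Beta(alpha, alpha).
    # The default alpha is a hypothetical value chosen for illustration.
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    lam = sample_lambda()
    x, y = mixup_pair([0.0, 2.0], [1.0, 0.0], [4.0, 6.0], [0.0, 1.0], lam)
    print(lam, x, y)
```

Because the labels are mixed with the same weight as the inputs, the synthetic label remains a valid probability vector whenever both inputs are one-hot.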

Augmenting Data with Mixup for Sentence Classification: An Empirical Study

Hongyu Guo, Yongyi Mao, Richong Zhang

Mixup, a recently proposed data augmentation method through linearly interpolating the inputs and modeling targets of random samples, has demonstrated its capability of significantly improving the predictive accuracy of state-of-the-art networks for image classification. However, how this technique can be applied to, and what its effectiveness is on, natural language processing (NLP) tasks have not been investigated. In this paper, we propose two strategies for the adaption of Mixup on sentence classification: one performs...

10.48550/arxiv.1905.08941 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Pre-training Molecular Graph Representation with 3D Geometry

Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and 1 more

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL)...

10.48550/arxiv.2110.07728 preprint EN other-oa arXiv (Cornell University) 2021-01-01

A Graph to Graphs Framework for Retrosynthesis Prediction

Chence Shi, Minkai Xu, Hongyu Guo, Ming Zhang, Jian Tang

A fundamental problem in computational chemistry is to find a set of reactants to synthesize a target molecule, a.k.a. retrosynthesis prediction. Existing state-of-the-art methods rely on matching the target molecule with a large set of reaction templates, which are very computationally expensive and also suffer from the problem of coverage. In this paper, we propose a novel template-free approach called G2Gs by transforming a target molecular graph into a set of reactant molecular graphs. G2Gs first splits the target molecular graph into a set of synthons by identifying the reaction centers, and then translates the synthons to the final...

10.48550/arxiv.2003.12725 preprint EN other-oa arXiv (Cornell University) 2020-01-01

An Empirical Study on the Effect of Negation Words on Sentiment

Xiaodan Zhu, Hongyu Guo, Saif M. Mohammad, Svetlana Kiritchenko

Negation words, such as no and not, play a fundamental role in modifying the sentiment of textual expressions. We will refer to a negation word as the negator and the text span within the scope of the negator as the argument. Commonly used heuristics to estimate the sentiment of negated expressions rely simply on the sentiment of the argument (and not on the negator or the argument itself). We use a sentiment treebank to show that these existing heuristics are poor estimators of sentiment. We then modify these heuristics to be dependent on the negators and show that this improves prediction. Next, we evaluate a recently proposed composition model (Socher et al., 2013) that relies...

10.3115/v1/p14-1029 article EN cc-by 2014-01-01

Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification

Hongyu Guo

Data augmentation with Mixup (Zhang et al. 2018) has been shown to be an effective model regularizer for current art deep classification networks. It generates out-of-manifold samples through linearly interpolating inputs and their corresponding labels of random sample pairs. Despite its great successes, Mixup requires a convex combination of the inputs as well as the modeling targets of a sample pair, and thus significantly limits the space of its synthetic samples and consequently its regularization effect. To cope with this limitation, we propose “nonlinear...

10.1609/aaai.v34i04.5822 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Self-supervised Graph-level Representation Learning with Local and Global Structure

Minghao Xu, Hang Wang, Bingbing Ni, Hongyu Guo, Jian Tang

This paper studies unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks such as molecule properties prediction in drug and material discovery. Existing methods mainly focus on preserving the local similarity structure between different graph instances but fail to discover the global semantic structure of the entire data set. In this paper, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning....

10.48550/arxiv.2106.04113 preprint EN other-oa arXiv (Cornell University) 2021-01-01

The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition

Colin Cherry, Hongyu Guo

Named entity recognition (NER) systems trained on newswire perform very badly when tested on Twitter. Signals that were reliable in copy-edited text disappear almost entirely in Twitter’s informal chatter, requiring the construction of specialized models. Using well-understood techniques, we set out to improve Twitter NER performance when given a small set of annotated training tweets. To leverage unlabeled tweets, we build Brown clusters and word vectors, enabling generalizations across distributionally similar...

10.3115/v1/n15-1075 article EN cc-by Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2015-01-01

Syntax Encoding with Application in Authorship Attribution

Richong Zhang, Zhiyuan Hu, Hongyu Guo, Yongyi Mao

We propose a novel strategy to encode the syntax parse tree of a sentence into a learnable distributed representation. The proposed syntax encoding scheme is provably information-lossless. Specifically, an embedding vector is constructed for each word in the sentence, encoding the path in the syntax tree corresponding to the word. The one-to-one correspondence between these "syntax-embedding" vectors and the words (hence their embedding vectors) in the sentence makes it easy to integrate such a representation with all word-level NLP models. We empirically show the benefits of the syntax embeddings on...

10.18653/v1/d18-1294 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

Development and evaluation of a deep learning model for multi-frequency Gibbs artifact elimination

Lisong Dai Dan Wang Xin Yu Mao Zhenzhuang Miao Lei Lu and 7 more

Gibbs artifacts frequently occur as a result of truncation in the frequency domain (k-space). Gibbs artifacts can degrade image quality and may be misinterpreted as syrinx, thereby complicating diagnosis. This study aimed to develop and evaluate a robust deep learning (DL) model that eliminates multi-frequency Gibbs artifacts. We retrospectively collected 290,940 magnetic resonance imaging (MRI) images from 4,936 scans, encompassing 5 anatomical regions and 67 MRI sequences, to develop a DL model for Gibbs artifact removal. The model was trained using...

10.21037/qims-24-1344 article EN Quantitative Imaging in Medicine and Surgery 2025-01-24

A text-guided protein design framework

Shengchao Liu, Yanjing Li, Zhuoxinran Li, Anthony Gitter, Yutao Zhu, and 8 more

10.1038/s42256-025-01011-z article EN Nature Machine Intelligence 2025-03-27

DAG-Structured Long Short-Term Memory for Semantic Compositionality

Xiaodan Zhu, Parinaz Sobhani, Hongyu Guo

Recurrent neural networks, particularly long short-term memory (LSTM), have recently been shown to be very effective in a wide range of sequence modeling problems, core to which is the effective learning of distributed representations for subsequences as well as the sequences they form. An assumption in almost all the previous models, however, posits that the learned representation (e.g., a distributed representation for a sentence) is fully compositional from the atomic components (e.g., representations for words), while non-compositionality is a basic phenomenon in human languages. In this paper, we...

10.18653/v1/n16-1106 article EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

Dynamic Graph Convolutional Networks for Entity Linking

Junshuang Wu, Richong Zhang, Yongyi Mao, Hongyu Guo, Masoumeh Soflaei, and 1 more

Entity linking, which maps named entity mentions in a document into the proper entities in a given knowledge graph, has been shown to be able to significantly benefit from modeling the entity relatedness through Graph Convolutional Networks (GCN). Nevertheless, existing GCN entity linking models fail to take into account the fact that the structured graph for a set of entities not only depends on the contextual information of the given document but also adaptively changes on different aggregation layers of the GCN, resulting in insufficiency in terms of capturing the structural information among...

10.1145/3366423.3380192 article EN 2020-04-20

Accelerated Continuous Conditional Random Fields For Load Forecasting

Hongyu Guo

Increasingly, aiming to contain their rapidly growing energy expenditures, commercial buildings are equipped to respond to the utility's demand and price signals. Such smart energy consumption, however, heavily relies on accurate short-term energy load forecasting, such as hourly predictions for the next n (n ≥ 2) hours. To attain sufficient accuracy for these predictions, it is important to exploit the relationships among the estimated outputs. This paper treats such a multi-steps ahead regression task as a sequence labeling (regression)...

10.1109/tkde.2015.2399311 article EN IEEE Transactions on Knowledge and Data Engineering 2015-02-11

Representation Based Translation Evaluation Metrics

Boxing Chen, Hongyu Guo

Boxing Chen, Hongyu Guo. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2025 article EN cc-by 2015-01-01

MixUp as Directional Adversarial Training

Guillaume Archambault, Yongyi Mao, Hongyu Guo, Richong Zhang

In this work, we explain the working mechanism of MixUp in terms of adversarial training. We introduce a new class of adversarial training schemes, which we refer to as directional adversarial training, or DAT. In a nutshell, a DAT scheme perturbs a training example in the direction of another example but keeps its original label as the training target. We prove that MixUp is equivalent to a special subclass of DAT, in that it has the same expected loss function and corresponds to the same optimization problem asymptotically. This understanding not only serves to explain the effectiveness of MixUp, but also reveals a more general family...

10.48550/arxiv.1906.06875 preprint EN other-oa arXiv (Cornell University) 2019-01-01
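
The perturbation step described in this abstract (move an example toward another example, keep the original label) can be illustrated with a short Python sketch. It is a hedged illustration only: the function name `dat_perturb`, the scalar step size `lam`, and the plain-list representation are assumptions, and the paper's actual family of schemes may be parameterized differently.

```python
def dat_perturb(x, x_other, y, lam):
    """Perturb x in the direction of x_other; the label y is left unchanged.

    x, x_other: feature vectors (lists of floats) of equal length.
    y:          label of x, returned as-is as the training target.
    lam:        hypothetical step size in [0, 1]; 0 leaves x untouched.
    """
    x_new = [a + lam * (b - a) for a, b in zip(x, x_other)]
    return x_new, y


if __name__ == "__main__":
    x_new, target = dat_perturb([0.0, 2.0], [4.0, 6.0], [1.0, 0.0], 0.5)
    print(x_new, target)
```

Only the input moves here; the target stays the label of the first example, whereas MixUp interpolates the labels as well.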

Modeling Noisy Hierarchical Types in Fine-Grained Entity Typing: A Content-Based Weighting Approach

Junshuang Wu, Richong Zhang, Yongyi Mao, Hongyu Guo, Jinpeng Huai

Fine-grained entity typing (FET), which annotates the entities in a sentence with a set of finely specified type labels, often serves as the first and critical step towards many natural language processing tasks. Although great progress has been made, current FET methods have difficulty coping with the noisy type labels which naturally come with the data acquisition processes. Existing FET approaches either pre-process to clean the noise or simply focus on one of the noisy types, sidestepping the fact that those noises are related and content dependent. In this...

10.24963/ijcai.2019/731 article EN 2019-07-28

Mining the plasma-proteome associated genes in patients with gastro-esophageal cancers for biomarker discovery

Frederick S. Vizeacoumar, Hongyu Guo, Lynn Dwernychuk, Adnan Zaidi, Andrew Freywald, and 3 more

Gastro-esophageal (GE) cancers are one of the major causes of cancer-related death in the world. There is a need for novel biomarkers in the management of GE cancers, to yield predictive response to the available therapies. Our study aims to identify leading genes that are differentially regulated in patients with these cancers. We explored the expression data for those genes whose protein products can be detected in the plasma using The Cancer Genome Atlas. Our work predicted several candidates as potential biomarkers for distinct stages of GE cancers, including previously...

10.1038/s41598-021-87037-w article EN cc-by Scientific Reports 2021-04-07

Applications of radiomics-based analysis pipeline for predicting epidermal growth factor receptor mutation status

Zefeng Liu, Tianyou Zhang, Liying Lin, Fenghua Long, Hongyu Guo, and 1 more

This study aimed to develop a pipeline for selecting the best feature engineering-based radiomic path to predict epidermal growth factor receptor (EGFR) mutant lung adenocarcinoma in 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT). The study enrolled 115 lung adenocarcinoma patients with EGFR mutation status between June 2016 and September 2017. We extracted radiomics features by delineating regions-of-interest around the entire tumor in 18F-FDG PET/CT images. The feature engineering-based radiomic paths were built by combining...

10.1186/s12938-022-01049-9 article EN cc-by BioMedical Engineering OnLine 2023-02-21
