- Topic Modeling
- Natural Language Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Data Classification
- Energy Load and Power Forecasting
- Machine Learning in Materials Science
- Generative Adversarial Networks and Image Synthesis
- Computational Drug Discovery Methods
- Adversarial Robustness in Machine Learning
- Advanced Text Analysis Techniques
- Multimodal Machine Learning Applications
- Advanced Graph Neural Networks
- Imbalanced Data Classification Techniques
- Text and Document Classification Technologies
- Image Retrieval and Classification Techniques
- Face and Expression Recognition
- Data Quality and Management
- Building Energy and Comfort Optimization
- Music and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Time Series Analysis and Forecasting
- Advanced MRI Techniques and Applications
- Sentiment Analysis and Opinion Mining
- Radiomics and Machine Learning in Medical Imaging
Neusoft (China)
2019-2025
National Research Council Canada
2013-2023
Tianjin Medical University
2023
Shenyang University of Technology
2011-2023
Institute of Computing Technology
2023
University of Saskatchewan
2020-2021
University of Ottawa
2004-2019
Beihang University
2019
University of Leeds
2019
Shanghai Ocean University
2012
Learning from imbalanced data sets, where the number of examples of one (majority) class is much higher than the others, presents an important challenge to the machine learning community. Traditional machine learning algorithms may be biased towards the majority class, thus producing poor predictive accuracy over the minority class. In this paper, we describe a new approach that combines boosting, an ensemble-based learning algorithm, with data generation to improve the predictive power of classifiers against imbalanced data sets consisting of two classes. In the DataBoost-IM method, hard...
MixUp (Zhang et al. 2017) is a recently proposed data augmentation scheme, which linearly interpolates a random pair of training examples and correspondingly the one-hot representations of their labels. Training deep neural networks with such additional data is shown capable of significantly improving the predictive accuracy of the current art. The power of MixUp, however, is primarily established empirically, and its working and effectiveness have not been explained in any depth. In this paper, we develop an understanding for...
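The interpolation described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `mixup_pair` and `mixup_batch`, the `alpha` parameter, and the choice to draw the mixing weight from a Beta(alpha, alpha) distribution are assumptions that do not appear in the text above.

```python
import numpy as np

def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two examples and their one-hot labels.

    lam is the mixing weight in [0, 1]; lam = 1 returns the first example.
    """
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def mixup_batch(inputs, onehot_labels, alpha=1.0, rng=None):
    """Mix every example in a batch with a randomly chosen partner.

    Assumption: one weight per batch, drawn from Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(inputs))  # random partner for each example
    mixed_x, mixed_y = mixup_pair(
        inputs, onehot_labels, inputs[perm], onehot_labels[perm], lam
    )
    return mixed_x, mixed_y, lam
```

Because the labels are mixed with the same weight as the inputs, each synthetic label remains a valid probability vector (its entries still sum to one).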
Mixup, a recently proposed data augmentation method through linearly interpolating inputs and modeling targets of random samples, has demonstrated its capability of significantly improving the predictive accuracy of the state-of-the-art networks for image classification. However, how this technique can be applied to, and what its effectiveness is on, natural language processing (NLP) tasks have not been investigated. In this paper, we propose two strategies for the adaption of Mixup on sentence classification: one performs...
Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL)...
A fundamental problem in computational chemistry is to find a set of reactants to synthesize a target molecule, a.k.a. retrosynthesis prediction. Existing state-of-the-art methods rely on matching the target molecule with a large set of reaction templates, which are very computationally expensive and also suffer from the problem of coverage. In this paper, we propose a novel template-free approach called G2Gs by transforming a target molecular graph into a set of reactant graphs. G2Gs first splits the target molecular graph into a set of synthons by identifying the reaction centers, and then translates the synthons to the final...
Negation words, such as no and not, play a fundamental role in modifying the sentiment of textual expressions. We will refer to a negation word as the negator and the text span within the scope of the negator as the argument. Commonly used heuristics to estimate the sentiment of negated expressions rely simply on the sentiment of the argument (and not on the negator or the argument itself). We use a sentiment treebank to show that these existing heuristics are poor estimators of sentiment. We then modify these heuristics to be dependent on the negators and show that this improves prediction. Next, we evaluate a recently proposed composition model (Socher et al., 2013) that relies...
Data augmentation with Mixup (Zhang et al. 2018) has been shown to be an effective model regularizer for current art deep classification networks. It generates out-of-manifold samples through linearly interpolating inputs and their corresponding labels of random sample pairs. Despite its great successes, Mixup requires convex combination of the inputs as well as the modeling targets of a sample pair, and thus significantly limits the space of its synthetic samples and consequently its regularization effect. To cope with this limitation, we propose “nonlinear...
This paper studies unsupervised/self-supervised whole-graph representation learning, which is critical in many tasks such as molecule properties prediction in drug and material discovery. Existing methods mainly focus on preserving the local similarity structure between different graph instances but fail to discover the global semantic structure of the entire data set. In this paper, we propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning....
Named entity recognition (NER) systems trained on newswire perform very badly when tested on Twitter. Signals that were reliable in copy-edited text disappear almost entirely in Twitter’s informal chatter, requiring the construction of specialized models. Using well-understood techniques, we set out to improve Twitter NER performance when given a small set of annotated training tweets. To leverage unlabeled tweets, we build Brown clusters and word vectors, enabling generalizations across distributionally similar...
We propose a novel strategy to encode the syntax parse tree of a sentence into a learnable distributed representation. The proposed syntax encoding scheme is provably information-lossless. Specifically, an embedding vector is constructed for each word in the sentence, encoding the path in the syntax tree corresponding to the word. The one-to-one correspondence between these "syntax-embedding" vectors and the words (hence their embedding vectors) in the sentence makes it easy to integrate such a representation with all word-level NLP models. We empirically show the benefits of the syntax embeddings on...
Gibbs artifacts frequently occur as a result of truncation in the frequency domain (k-space). Gibbs artifacts can degrade image quality and may be misinterpreted as syrinx, thereby complicating diagnosis. This study aimed to develop and evaluate a robust deep learning (DL) model that eliminates multi-frequency Gibbs artifacts. We retrospectively collected 290,940 magnetic resonance imaging (MRI) images from 4,936 scans, encompassing 5 anatomical regions and 67 MRI sequences, to develop a DL model for Gibbs artifact removal. The model was trained using...
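How truncation in the frequency domain produces ringing can be seen on a one-dimensional toy signal. The sketch below is only an illustration of the artifact itself, not of the deep learning model in the abstract: the helper `truncate_kspace`, the 256-sample step profile, and the cutoff of 16 frequencies are hypothetical choices made for the example.

```python
import numpy as np

def truncate_kspace(signal, keep):
    """Keep only the DC term and the `keep` lowest frequencies on each side.

    This mimics sampling a limited central window of k-space.
    """
    spectrum = np.fft.fftshift(np.fft.fft(signal))  # DC moved to the center
    center = len(signal) // 2
    mask = np.zeros(len(signal), dtype=bool)
    mask[center - keep:center + keep + 1] = True
    spectrum[~mask] = 0.0                           # discard high frequencies
    return np.fft.ifft(np.fft.ifftshift(spectrum)).real

# A sharp-edged profile, standing in for a tissue boundary.
n = 256
edge = np.zeros(n)
edge[n // 4: 3 * n // 4] = 1.0

recon = truncate_kspace(edge, keep=16)
overshoot = recon.max() - edge.max()   # ringing above the true plateau
undershoot = edge.min() - recon.min()  # ringing below the true baseline
```

The reconstruction oscillates near each edge, rising above the plateau and dipping below the baseline even though the original profile is perfectly flat on both sides.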
Recurrent neural networks, particularly long short-term memory (LSTM), have recently been shown to be very effective in a wide range of sequence modeling problems, core to which is effective learning of distributed representation for subsequences as well as the sequences they form. An assumption in almost all the previous models, however, posits that the learned representation (e.g., a distributed representation for a sentence) is fully compositional from the atomic components (e.g., representations for words), while non-compositionality is a basic phenomenon in human languages. In this paper, we...
Entity linking, which maps named entity mentions in a document into the proper entities in a given knowledge graph, has been shown to be able to significantly benefit from modeling the entity relatedness through Graph Convolutional Networks (GCN). Nevertheless, existing GCN entity linking models fail to take into account the fact that the structured graph for a set of entities not only depends on the contextual information of the given document but also adaptively changes on different aggregation layers of the GCN, resulting in insufficiency in terms of capturing the structural information among...
Increasingly, aiming to contain their rapidly growing energy expenditures, commercial buildings are equipped to respond to the utility's demand and price signals. Such smart energy consumption, however, heavily relies on accurate short-term energy load forecasting, such as hourly predictions for the next n (n ≥ 2) hours. To attain sufficient accuracy for these predictions, it is important to exploit the relationships among the estimated outputs. This paper treats the multi-steps ahead regression task as a sequence labeling (regression)...
Boxing Chen, Hongyu Guo. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.
In this work, we explain the working mechanism of MixUp in terms of adversarial training. We introduce a new class of adversarial training schemes, which we refer to as directional adversarial training, or DAT. In a nutshell, a DAT scheme perturbs a training example in the direction of another example but keeps its original label as the training target. We prove that MixUp is equivalent to a special subclass of DAT, in that it has the same expected loss function and corresponds to the same optimization problem asymptotically. This understanding not only serves to explain the effectiveness of MixUp, but also reveals a more general family...
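The perturbation step named in this abstract can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's formulation: the function name `dat_perturb` and the step-size parameter `step` are hypothetical, and the abstract does not say how the step size or the partner example is chosen.

```python
import numpy as np

def dat_perturb(x, y, x_other, step):
    """Move x a fraction `step` of the way towards x_other.

    The label y is returned unchanged: the perturbed example keeps
    its original training target.
    """
    x_perturbed = x + step * (x_other - x)
    return x_perturbed, y

x = np.array([0.0, 0.0])
y = np.array([1.0, 0.0])          # one-hot label of x
x_other = np.array([2.0, 4.0])    # a second training example

x_perturbed, y_kept = dat_perturb(x, y, x_other, step=0.25)
```

Only the input moves here; the target stays that of the original example.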
Fine-grained entity typing (FET), which annotates the entities in a sentence with a set of finely specified type labels, often serves as the first and critical step towards many natural language processing tasks. Although great progress has been made, current FET methods have difficulty coping with the noisy labels that naturally come with the data acquisition processes. Existing FET approaches either pre-process to clean the noise or simply focus on one type of noise, sidestepping the fact that those noises are related and content dependent. In this...
Gastro-esophageal (GE) cancers are one of the major causes of cancer-related death in the world. There is a need for novel biomarkers in the management of GE cancers, to yield predictive response to the available therapies. Our study aims to identify leading genes that are differentially regulated in patients with these cancers. We explored the expression data of those genes whose protein products can be detected in plasma using the Cancer Genome Atlas. Our work predicted several candidates as potential biomarkers for distinct stages of GE cancers, including previously...
This study aimed to develop a pipeline for selecting the best feature engineering-based radiomic path to predict epidermal growth factor receptor (EGFR) mutant lung adenocarcinoma in 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography (PET/CT). The study enrolled 115 lung adenocarcinoma patients with EGFR mutation status from June 2016 to September 2017. We extracted radiomics features by delineating regions-of-interest around the entire tumor in 18F-FDG PET/CT images. The feature engineering-based radiomic paths were built by combining...