- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Genomic variations and chromosomal abnormalities
- Prenatal Screening and Diagnostics
- Multimodal Machine Learning Applications
- Text and Document Classification Technologies
- Speech and dialogue systems
- Data Quality and Management
- Speech Recognition and Synthesis
- Security and Verification in Computing
- Data Mining Algorithms and Applications
- Gene expression and cancer classification
- Advanced Malware Detection Techniques
- Web Data Mining and Analysis
- Semantic Web and Ontologies
- Recommender Systems and Techniques
- Blockchain Technology Applications and Security
- Cancer Genomics and Diagnostics
- Epigenetics and DNA Methylation
- Cloud Computing and Resource Management
- Service-Oriented Architecture and Web Services
- Computational and Text Analysis Methods
South China Normal University
2019-2025
University of Hong Kong
2016-2023
Hong Kong University of Science and Technology
2016-2023
Key Laboratory of Guangdong Province
2022
Guangdong University of Foreign Studies
2019
National Police Academy
2016
Sun Yat-sen University
2013
Knowledge representation learning aims at modeling knowledge graphs by encoding entities and relations into a low-dimensional space. Most traditional works on knowledge embedding need negative sampling to minimize a margin-based ranking loss. However, those works construct negative samples through a random mode, and such samples are often too trivial to fit the model efficiently. In this paper, we propose a novel knowledge representation learning framework based on Generative Adversarial Networks (GAN). In this GAN-based framework, we take advantage of a generator to obtain...
It is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. Personalized dialogue systems trained on a small dataset are likely to overfit and make it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users as a source domain and an individual user as a target domain, and to perform transfer learning from the source domain to the target domain. By following this idea, we propose a PErsonalized Task-oriented diALogue (PETAL) system, a transfer reinforcement learning framework based on POMDP,...
Heng Wang, Shuangyin Li, Rong Pan, Mingzhi Mao. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Because of its efficiency, word embedding has been widely used in many natural language processing and text modeling tasks. It aims to represent each word by a vector such that the geometry between these vectors can capture the semantic correlations between words. An ambiguous word can often have diverse meanings in different contexts, a quality which is called polysemy. The bulk of studies aimed to generate only one single embedding for each word, whereas a few studies made a small number of embeddings to present the different meanings of each word. However, it is hard to determine the exact...
The financial property of Ethereum means that smart contract attacks frequently bring about tremendous economic loss. A method for the effective detection of vulnerabilities in smart contracts is imperative. Existing efforts for contract security analysis rely heavily on rigid rules defined by experts, which are labor-intensive and non-scalable. There is still a lack of effort that considers combining expert-defined security patterns with deep learning. This paper proposes EtherGIS, a vulnerability detection framework that utilizes graph neural networks...
The Unmanned Aerial Vehicle (UAV) delivery service is being increasingly used in logistics. However, it is challenging for a UAV to precisely identify the position for parcel delivery if it is aided only by GPS, especially in some complex environments with weak signals and high interference. For this issue, we present a knowledge distillation empowered edge intelligence architecture, KeepEdge, to achieve visual information-assisted positioning services. Specifically, we integrate deep neural networks (DNN) into...
Entity resolution, the task of identifying and consolidating records that pertain to the same real-world entity, plays a pivotal role in various sectors such as e-commerce, healthcare, and law enforcement. The emergence of Large Language Models (LLMs) like GPT-4 has introduced a new dimension to this task, leveraging their advanced linguistic capabilities. This paper explores the potential of LLMs in the entity resolution process, shedding light on both their advantages and the computational complexities associated with large-scale...
Recent advances in large language models (LLMs) have showcased exceptional performance in long-context tasks, while facing significant inference efficiency challenges with limited GPU memory. Existing solutions first proposed the sliding-window approach to accumulate a set of historical key-value (KV) pairs for reuse, then further improvements selectively retain its subsets at each step. However, due to the sparse attention distribution across a long context, it is hard to identify and recall...
Web service discovery is a fundamental task in service-oriented architectures which searches for suitable web services based on users' goals and preferences. In this paper, we present a novel service discovery approach that can support user queries with various-size-grained text elements. Compared with existing approaches that only support semantics matchmaking on a single textual granularity (either word level or paragraph level), our approach enables the service requester to search for services with any type of query content with high performance, including...
In a document, the topic distribution of a sentence depends on both the topics of its neighboring sentences and its own content, and it is usually affected by the topics of the neighboring sentences with different weights. The neighboring sentences of a sentence include the preceding sentences and the subsequent sentences. Meanwhile, it is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a bi-Directional Recurrent Attentional Topic Model (bi-RATM) for document embedding. The bi-RATM not only takes advantage of the sequential...
In recent years, recurrent neural networks have been widely used for various text classification tasks. However, most of the recurrent architectures will not assign a class label to a text until they read the last word, while human beings are able to determine the class before reading the whole text. In this paper, we propose a Length Adaptive Recurrent Model (LARM) which can automatically determine the minimum text length that is necessary to perform the classification. With three parts including Reader, Predictor and Agent, our model is designed to read the text word by word and...
The COVID-19 pandemic has had a significant impact on the world, highlighting the importance of the accurate prediction of infection numbers. Given that the transmission of SARS-CoV-2 is influenced by temporal and spatial factors, numerous researchers have employed neural networks to address this issue. Accordingly, we propose a whale optimization algorithm-bidirectional long short-term memory (WOA-BILSTM) model for predicting cumulative confirmed cases. In the model, we initially input regional epidemic data,...
In the past two decades, there has been a huge amount of document data with rich tag information during the evolution of the Internet, which can be called semi-structured data. These semi-structured data contain both unstructured features (e.g., plain text) and metadata, such as tags in HTML files or author and venue information in research articles. It is of great interest to model this kind of data. Most previous works focused on modeling the unstructured data. Some other methods have been proposed to model the unstructured data with specific tags. To build a general model for semi-structured documents remains an important problem in terms...
It is difficult to train a personalized task-oriented dialogue system because the data collected from each individual is often insufficient. Personalized dialogue systems trained on a small dataset can overfit and make it difficult to adapt to different user needs. One way to solve this problem is to consider a collection of multiple users' data as a source domain and an individual user's data as a target domain, and to perform transfer learning from the source to the target domain. By following this idea, we propose "PETAL" (PErsonalized Task-oriented diALogue), a transfer-learning framework based on POMDP...
In a document, the topic distribution of a sentence depends on both the topics of the preceding sentences and its own content, and it is usually affected by the topics of the preceding sentences with different weights. It is natural that a document can be treated as a sequence of sentences. Most existing works for Bayesian document modeling do not take these points into consideration. To fill this gap, we propose a Recurrent Attentional Topic Model (RATM) for document embedding. The RATM not only takes advantage of the sequential orders among sentences but also uses the attention mechanism to model...