- Natural Language Processing Techniques
- Topic Modeling
- Semantic Web and Ontologies
- Advanced Text Analysis Techniques
- Service-Oriented Architecture and Web Services
- Multimodal Machine Learning Applications
- Biomedical Text Mining and Ontologies
- Web Data Mining and Analysis
- Advanced Image and Video Retrieval Techniques
- Spam and Phishing Detection
- Advanced Graph Neural Networks
- Underwater Acoustics Research
- Advanced Computational Techniques and Applications
- Sentiment Analysis and Opinion Mining
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Data Mining Algorithms and Applications
- Complex Network Analysis Techniques
- Data Quality and Management
- Adversarial Robustness in Machine Learning
- Logic, Reasoning, and Knowledge
- Underwater Vehicles and Communication Systems
- Image Retrieval and Classification Techniques
- Multi-Agent Systems and Negotiation
- Advanced Database Systems and Queries
The University of Western Australia
2016-2025
University of Technology Sydney
2019-2025
China Foreign Affairs University
2025
Shanghai University
2009-2025
Harbin Institute of Technology
2016-2025
Tongji University
2022-2024
City University of Macau
2023-2024
Northwest Institute of Eco-Environment and Resources
2024
Chinese Academy of Sciences
2017-2024
Imperial College London
2024
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition...
We present a technique for adding global context to deep convolutional networks for semantic segmentation. The approach is simple, using the average feature for a layer to augment the features at each location. In addition, we study several idiosyncrasies of training, significantly increasing the performance of baseline networks (e.g. from FCN). When we add our proposed global feature, and a technique for learning normalization parameters, accuracy increases consistently even over our improved versions of the baselines. Our proposed approach, ParseNet, achieves...
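The core idea in the abstract above (average a layer's features and use the result to augment the features at each location) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `add_global_context`, the `(C, H, W)` layout, the choice to concatenate along the channel axis, and the fixed `scale` argument (a stand-in for a learned normalization parameter) are all hypothetical.

```python
import numpy as np


def add_global_context(features, scale=1.0, eps=1e-12):
    """Append a globally averaged feature to every spatial location.

    features: array of shape (C, H, W) from one layer (assumed layout).
    Returns an array of shape (2 * C, H, W): the normalized local features
    followed by the normalized global feature repeated at each location.
    """
    c, h, w = features.shape

    # Average each channel over all spatial positions -> shape (C, 1, 1).
    global_feat = features.mean(axis=(1, 2), keepdims=True)

    def l2_normalize(x):
        # L2-normalize along the channel axis so that local and global
        # features have comparable magnitude before they are combined.
        # `scale` is a fixed placeholder for a learned parameter (assumption).
        norm = np.sqrt((x ** 2).sum(axis=0, keepdims=True))
        return scale * x / (norm + eps)

    local = l2_normalize(features)
    # Repeat the single global vector at every (h, w) position.
    context = np.broadcast_to(l2_normalize(global_feat), (c, h, w))
    return np.concatenate([local, context], axis=0)


# Usage: a toy 2-channel, 3x4 feature map.
features = np.arange(24, dtype=float).reshape(2, 3, 4)
augmented = add_global_context(features)
```

Normalizing both branches before combining them is one simple way to keep the global feature from being drowned out by (or dominating) the local features; the specific normalization used here is an assumption for illustration.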
Automatic generation of video captions is a fundamental challenge in computer vision. Recent techniques typically employ a combination of Convolutional Neural Networks (CNNs) and Recursive Neural Networks (RNNs) for video captioning. These methods mainly focus on tailoring sequence learning through RNNs for better caption generation, whereas off-the-shelf visual features are borrowed from CNNs. We argue that careful designing of visual features for this task is equally important, and present a visual feature encoding technique to generate semantically rich...
Wei Liu, Xiyan Fu, Yue Zhang, Wenming Xiao. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
We propose a new decision tree algorithm, Class Confidence Proportion Decision Tree (CCPDT), which is robust and insensitive to the size of classes and generates rules which are statistically significant. In order to make decision trees robust, we begin by expressing Information Gain, the metric used in C4.5, in terms of the confidence of a rule. This allows us to immediately explain why Information Gain, like confidence, results in rules which are biased towards the majority class. To overcome this bias, we introduce a new measure, Class Confidence Proportion (CCP), which forms the basis of CCPDT. To generate rules which are statistically significant...
The idea of a decentralised, self-organising service-oriented architecture seems to be more and more plausible than the traditional registry-based ones in view of the success of the web and the reluctance in taking up web service technologies. Automatically clustering Web Service Description Language (WSDL) files on the web into functionally similar homogeneous service groups can be seen as a bootstrapping step for creating a service search engine and, at the same time, reducing the search space for service discovery. This paper proposes techniques to automatically gather, discover...
Wei Liu, Tongge Xu, Qinghua Xu, Jiayu Song, Yueran Zu. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Learning a user’s preference from check-in data is important for POI recommendation. Yet, a user usually has visited some POIs while most POIs are unvisited (i.e., negative samples). To leverage these “no-behavior” POIs, a typical approach is pairwise ranking, which constructs ranking pairs for the user and POIs. Although this approach is generally effective, the negative samples in ranking pairs are obtained randomly, which may fail to leverage “critical” negative samples in the model training. On the other hand, previous studies also utilized geographical features to improve recommendation...
Controllable Image Captioning (CIC), generating image descriptions following designated control signals, has received unprecedented attention over the last few years. To emulate the human ability in controlling caption generation, current CIC studies focus exclusively on control signals concerning objective properties, such as contents of interest or descriptive patterns. However, we argue that almost all existing objective control signals have overlooked two indispensable characteristics of an ideal control signal: 1) Event-compatible: visual...
Several approaches for embedding a sentence into a vector space have been developed. However, it is unclear to what extent the sentence's position in the vector space reflects its semantic meaning, rather than other factors such as syntactic structure. Depending on the model used for the embeddings this will vary; different models are suited for different down-stream applications. For applications such as machine translation and automated summarization, it is highly desirable to have semantic meaning encoded in the embedding. We consider this to be the quality of semantic localization for the model: how...
Multimodal sentiment analysis combines information available from visual, textual, and acoustic representations for sentiment prediction. The recent multimodal fusion schemes combine multiple modalities as a tensor and obtain either the common information, by utilizing neural networks, or the unique information, by modeling a low-rank representation of the tensor. However, both of these are essential, as they render the inter-modal and intra-modal relationships of the data. In this research, we first propose a novel deep architecture to extract multi-mode...
Dense captioning methods generally detect events in videos first and then generate captions for the individual events. Events are localized solely based on the visual cues while ignoring the associated linguistic information and context. Whereas end-to-end learning may implicitly take guidance from language, these methods still fall short of the power of explicit modeling. In this paper, we propose a Visual-Semantic...