- Topic Modeling
- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Target Tracking and Data Fusion in Sensor Networks
- Network Traffic and Congestion Control
- Advanced Text Analysis Techniques
- Video Analysis and Summarization
- Advanced Neural Network Applications
- Inertial Sensor and Navigation
- Domain Adaptation and Few-Shot Learning
- Advanced Graph Neural Networks
- Fault Detection and Control Systems
- Generative Adversarial Networks and Image Synthesis
- Network Security and Intrusion Detection
- Image Processing and 3D Reconstruction
- Wireless Networks and Protocols
- Wireless Signal Modulation Classification
- Digital Media Forensic Detection
- Hate Speech and Cyberbullying Detection
- Text Readability and Simplification
- Time Series Analysis and Forecasting
- Bayesian Modeling and Causal Inference
- Mycobacterium Research and Diagnosis
- Innovative Educational Techniques
University of Macau
2024-2025
China Waterborne Transport Research Institute
2022-2024
Statistics New Zealand
2022-2024
City University of Macau
2024
Australian Institute of Business
2022-2023
University of Technology Sydney
2022-2023
University of Chinese Academy of Sciences
2023
University of Electronic Science and Technology of China
2022-2023
Aerospace Information Research Institute
2023
Chinese Academy of Sciences
2023
Yucheng Zhou, Xiubo Geng, Tao Shen, Wenqiang Zhang, Daxin Jiang. Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
Generating new events given context with correlated ones plays a crucial role in many event-centric reasoning tasks. Existing works either limit their scope to specific scenarios or overlook event-level correlations. In this paper, we propose to pre-train a general Correlation-aware context-to-Event Transformer (ClarET) for reasoning. To achieve this, three novel objectives, i.e., whole event recovering, contrastive event-correlation encoding and prompt-based locating, which highlight...
Ship detection from synthetic aperture radar (SAR) images has become a major research field in recent years. It plays a role in monitoring the ocean, marine rescue activities, and safety warnings. However, there are still some factors that restrict further improvements in detection performance, e.g., multi-scale ship transformation and defocusing caused by motion. To resolve these issues, in this paper, a Doppler feature matrix fused with a multi-layer pyramid network (D-MFPN) is proposed for SAR...
With the integration of multimodal large language models (MLLMs) into robotic systems and various AI applications, embedding emotional intelligence (EI) capabilities into these models is essential for enabling robots to effectively address human needs and interact seamlessly in real-world scenarios. Existing static, text-based, or text-image benchmarks overlook the multimodal complexities of interactions and fail to capture the dynamic nature of expressions, making them inadequate for evaluating MLLMs' EI. Based on established...
In this paper, we introduce DC (Decouple)-ControlNet, a highly flexible and precisely controllable framework for multi-condition image generation. The core idea behind DC-ControlNet is to decouple control conditions, transforming global control into a hierarchical system that integrates distinct elements, contents, and layouts. This enables users to mix these individual conditions with greater flexibility, leading to more efficient and accurate generation control. Previous ControlNet-based models rely solely on...
Commit messages are natural language descriptions of code changes, which are important for software evolution tasks such as understanding and maintenance. However, previous methods are trained on the entire dataset without considering the fact that a portion of commits adhere to good practice (i.e., good-practice commits), while the rest do not. On the basis of our empirical study, we discover that training on good-practice commits significantly contributes to commit message generation. Motivated by this finding, we propose a novel knowledge-aware denoising...
A neural ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind due to weak negative mining during contrastive learning. Compared to retrievers boosted by self-adversarial (i.e., in-distribution) negative mining, the ranker's heavy structure suffers from query-document combinatorial explosions, so it can only resort to negatives sampled by the fast yet out-of-distribution retriever. Thereby, moderate negatives compose ineffective contrastive learning samples, becoming the main...
Script reasoning infers subsequent events from a given event chain, which involves the ability to understand relations between events. A human-labeled script dataset is usually of small size with limited relations, which highlights the necessity to leverage external eventuality knowledge graphs (KG) consisting of numerous triple facts that describe inferential relations between events. Existing methods adopt a retrieval and integration paradigm that focuses merely on graph triples that overlap with the script, but ignore much more...
Long document retrieval aims to fetch query-relevant documents from a large-scale collection, where knowledge distillation has become the de facto way to improve a retriever by mimicking a heterogeneous yet powerful cross-encoder. However, in contrast to passages or sentences, retrieval on long documents suffers from the scope hypothesis that a long document may cover multiple topics. This maximizes their structure heterogeneity and poses a granular-mismatch issue, leading to an inferior distillation efficacy. In this work, we propose a new learning framework,...
Labelling image-sentence pairs is expensive, and some unsupervised image captioning methods show promising results on caption generation. However, the generated captions are not very relevant to the images due to excessive dependence on the corpus. To overcome that drawback, we focus on the correspondence between image and sentence to obtain a better mapping relation. In this paper, we present a novel triple sequence generative adversarial net including two generators and a discriminator. One generator is used to generate...
Event correlation reasoning infers whether a natural language paragraph containing multiple events conforms to human common sense. For example, "Andrew was very drowsy, so he took a long nap, and now he is alert" sounds reasonable. In contrast, "Andrew was very drowsy, so he stayed up a long time, and now he is alert" does not comply with human common sense. Such a capability is essential for many downstream tasks, such as script reasoning, abductive reasoning, narrative incoherence, story cloze test, etc. However, conducting event correlation reasoning is challenging due to a lack of large amounts of diverse event-based...
Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware encoder to mine potential visual content relevant to style. Moreover, we present a triplet contrast objective to distinguish whether an image, style, and caption are matched. To provide positive and negative samples for contrastive learning, three retrieval...
Text-guided image inpainting (TGII) aims to restore missing regions based on a given text in a damaged image. Existing methods rely on a strong vision encoder and a cross-modal fusion model to integrate features. However, these methods allocate most of the computation to visual encoding, while spending little on modeling modality interactions. Moreover, they perform fusion on deep features, which ignores fine-grained alignment between text and image. Recently, vision-language pre-trained models (VLPM), encapsulating rich cross-modal knowledge, have advanced...
Image-guided story ending generation (IgSEG) is to generate a story ending based on given plots and an image. Existing methods focus on cross-modal feature fusion but overlook reasoning and mining implicit information from the plots and image. To tackle this drawback, we propose a multimodal event transformer, an event-based reasoning framework for IgSEG. Specifically, we construct visual and semantic event graphs from the plots and image, and leverage them to reason and mine implicit information in a single modality. Next, we connect the graphs and utilize cross-modal fusion to integrate different-modality features. In addition, an injector is used to adaptively pass...
China’s five-level education and teaching research system (ETRS) has been instrumental in advancing Chinese basic education. It includes the central-, provincial-, municipal-, and county-level institutions and school-based offices, which jointly contribute to the enhancement of teaching quality and teacher professional development. There is close collaboration as well as a clear division of responsibility among these institutions. This article expounds on ETRS’s functions and characteristics, shedding light on its significance...
In software evolution, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing functionalities. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties in code change, particularly at the repository level. To overcome these challenges, we empirically study the reason why LLMs mostly fail to resolve GitHub issues and analyze some impact factors. Motivated by the empirical findings,...
In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual In-Context Learning (VICL) method comprising Demonstration Retrieval, Intent-Oriented Image Summarization, and Composition. Our approach retrieves images via the "Retrieval & Rerank" paradigm, summarises images with task intent and task-specific visual parsing, and composes language-based demonstrations that...
To address the shortcomings of existing methods, such as low recognition accuracy and poor anti-interference performance under low signal-to-noise ratios, this paper proposes the RFSE-ResNeXt (Residual-fusion squeeze–excitation aggregated residual for networks, RFSE-ResNeXt) network. In this paper, we improve the structure of the network based on ResNeXt and then introduce compressed excitation to improve the generalization ability of the network. The improvement leads to good performance in the overall network; meanwhile, it alleviates the confusion phenomenon when the network faces...
Sketch storytelling aims to generate a story for a given sketch. Although image captioning based on deep learning has made great progress, describing a sketch in a story style is still a challenge. The reason is that there is currently no paired sketch-story data, which is expensive to acquire. Therefore, it is necessary to train a model without using any paired data. To address these issues, we replace natural images in the caption dataset with pseudo sketches of the corresponding objects, from which we can obtain sketch-caption and sketch-image data. Because sketches are not...
One of the fundamental issues in asynchronous transfer mode (ATM) networks is the congestion problem of information flow. Due to the complexity and variability of ATM, it is difficult to accurately describe the characteristics of source traffic. This paper presents a traffic controller for solving the congestion problem by using Q-learning in conjunction with simulated annealing. Instead of relying on a mathematical model of the source traffic, the controller is designed to learn an optimal policy by directly interacting with the unknown environment. Simulated annealing is a powerful way to solve hard...