- Natural Language Processing Techniques
- Topic Modeling
- Multimodal Machine Learning Applications
- Mathematics, Computing, and Information Processing
- Domain Adaptation and Few-Shot Learning
- Handwritten Text Recognition Techniques
- Advanced Graph Neural Networks
- Intelligent Tutoring Systems and Adaptive Learning
- Semantic Web and Ontologies
- Text Readability and Simplification
- Speech Recognition and Synthesis
- Advanced Neural Network Applications
- Software Engineering Research
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Optical Measurement and Interference Techniques
- Data Visualization and Analytics
- Video Analysis and Summarization
- Ferroelectric and Negative Capacitance Devices
- Anomaly Detection Techniques and Applications
- Space Satellite Systems and Control
- Digital and Cyber Forensics
- Advanced Image Fusion Techniques
- Machine Learning and Data Classification
- Power Systems and Renewable Energy
Tongji University
2025
Hong Kong University of Science and Technology
2022-2024
University of Hong Kong
2022-2024
Zhuhai Institute of Advanced Technology
2024
Beijing Institute of Technology
2024
Tiangong University
2023
Xinjiang University
2022
University of Electronic Science and Technology of China
2019-2020
Singapore Management University
2020
Xidian University
2019
The design of automatic solvers for arithmetic math word problems has attracted considerable attention in recent years, and a large number of datasets and methods have been published. Among them, Math23K is the largest data corpus and is very helpful for evaluating the generality and robustness of a proposed solution. The best performer on Math23K is a seq2seq model based on LSTM that generates the math expression. However, the model suffers from performance degradation in a large space of target expressions. In this paper, we propose a template-based solution based on a recursive neural...
While recent tree-based neural models have demonstrated promising results in generating solution expressions for math word problems (MWPs), most of these models do not capture the relationships and order information among the quantities well. This results in poor quantity representations and incorrect solution expressions. In this paper, we propose Graph2Tree, a novel deep learning architecture that combines the merits of a graph-based encoder and a tree-based decoder to generate better solution expressions. Included in our Graph2Tree framework are two graphs, namely the Quantity...
Several deep learning models have been proposed for solving math word problems (MWPs) automatically. Although these models have the ability to capture features without manual efforts, their approaches to capturing features are not specifically designed for MWPs. To utilize the merits of deep learning models with simultaneous consideration of MWPs’ specific features, we propose a group attention mechanism to extract global features, quantity-related features, quantity-pair features, and question-related features in MWPs respectively. The experimental results show that the proposed approach performs...
Math word problem (MWP) solving faces a dilemma in number representation learning. In order to avoid the number representation issue and reduce the search space of feasible solutions, existing works on MWP solving usually replace real numbers with symbolic placeholders to focus on logic reasoning. However, different from common symbolic reasoning tasks like program synthesis and knowledge graph reasoning, MWP solving has extra requirements in numerical reasoning. In other words, instead of the number value itself, it is the reusable numerical property that matters more in numerical reasoning. Therefore, we...
Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.
Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignments at the object level. In this paper, we propose to learn multi-grained vision language alignments by a unified pre-training framework that learns multi-grained aligning and multi-grained localization simultaneously. Based on it, we present X2-VLM, an all-in-one model with a flexible modular architecture, in which we further unify image-text pre-training and video-text pre-training in one model. X2-VLM is able to learn unlimited visual...
Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem, where...
Math word problem (MWP) solving is challenging due to the limitation in training data, where only one “standard” solution is available. MWP models often simply fit this solution rather than truly understand or solve the problem. The generalization of the models (to diverse scenarios) is thus limited. To address this problem, this paper proposes a novel approach, TSN-MD, which leverages a teacher network to integrate the knowledge of equivalent solution expressions and then regularize the learning behavior of the student network. In addition, we introduce a multiple-decoder...
In change detection tasks, seasonal variations in spectral characteristics and surface cover can negatively impact performance when comparing image pairs from different seasons. Many existing methods do not specifically address the performance degradation caused by seasonal errors. To tackle this issue, the Dual-Branch Seasonal Error Elimination Change Detection Framework using Target Image Feature Fusion Generator (DBSEE-CDF) is introduced. Specifically, the approach utilizes a Target Image Feature Fusion Generator (TIFFG), which incorporates spatial and channel...
Video question answering (VideoQA) has emerged as a popular research topic in recent years. Enormous efforts have been devoted to developing more effective fusion strategies and better intra-modal feature preparation. To explore these issues further, we identify two key problems. (1) Current works take almost no account of introducing the action of interest in video representation. Additionally, there exists insufficient labeling data on where the action of interest is in many datasets. However, questions in VideoQA are...
Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform best in one type of task, namely language, vision, or vision-language. It is still an open question whether it is possible to construct a general foundation model performing best for all the understanding tasks. In this paper, we propose a new method for training a general foundation model, X-FM (the X-Foundation Model). X-FM has one language encoder, one vision encoder, and one fusion encoder, as well as a new training method. The training method includes two...
Math word problem (MWP) solving is an important task in question answering which requires human-like reasoning ability. Analogical reasoning has long been used in mathematical education, as it enables students to apply common relational structures of mathematical situations to solve new problems. In this paper, we propose to build a novel MWP solver by leveraging analogical MWPs, which advance the solver's generalization ability across different kinds of MWPs. The key idea, named analogy identification, is to associate analogical MWP pairs in a latent...
Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself. Specifically, we propose the following...
Current math word problem (MWP) solvers are usually Seq2Seq models trained on (one-problem; one-solution) pairs, each of which is made of a problem description and a solution showing the reasoning flow to get the correct answer. However, one MWP naturally has multiple solution equations. Training an MWP solver with (one-problem; one-solution) pairs excludes other correct solutions, and thus limits the generalizability of the MWP solver. One feasible solution to this limitation is to augment multiple solutions to a given problem. However, it is difficult to collect diverse and accurate augmented solutions through human efforts. In this paper, we...
Large foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, more and more large foundation models have become publicly available. However, most of those models exhibit a major deficiency in specialized-task applications, where the step of finetuning is still required for obtaining satisfactory performance. As the number of available models and specialized tasks keeps growing, the job of general finetuning becomes highly nontrivial. In this paper,...
LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite various techniques to mitigate forgetting, they are often at odds with RLHF performance, leading to a trade-off between reward maximization and forgetting...
The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel challenge of defending MLLMs against such attacks. We discovered that images act as a "foreign language" that is not considered during alignment, which can make MLLMs prone to producing harmful responses. Unfortunately, unlike the discrete tokens considered in text-based LLMs, the continuous nature of image signals presents significant alignment...
With increasing consumption of primary energy and deterioration of the global environment, clean energy sources with large reserves, such as natural gas, have gradually gained a higher proportion in the energy structure. Monitoring and predicting energy consumption data play a crucial role in reducing energy waste and improving energy supply efficiency. However, owing to factors such as high monitoring device costs, safety risks associated with device installation, and the low efficiency of manual meter reading, monitoring natural gas consumption at the household level is challenging. Moreover, there is a lack of methods for...
Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another source is human labeling. Descriptions in datasets such as COCO are generally very short and lack details. Although detailed image descriptions can be annotated by humans, the high annotation cost limits...