Jipeng Zhang

ORCID: 0000-0002-3269-6992
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Mathematics, Computing, and Information Processing
  • Domain Adaptation and Few-Shot Learning
  • Handwritten Text Recognition Techniques
  • Advanced Graph Neural Networks
  • Intelligent Tutoring Systems and Adaptive Learning
  • Semantic Web and Ontologies
  • Text Readability and Simplification
  • Speech Recognition and Synthesis
  • Advanced Neural Network Applications
  • Software Engineering Research
  • Image Retrieval and Classification Techniques
  • Advanced Image and Video Retrieval Techniques
  • Optical Measurement and Interference Techniques
  • Data Visualization and Analytics
  • Video Analysis and Summarization
  • Ferroelectric and Negative Capacitance Devices
  • Anomaly Detection Techniques and Applications
  • Space Satellite Systems and Control
  • Digital and Cyber Forensics
  • Advanced Image Fusion Techniques
  • Machine Learning and Data Classification
  • Power Systems and Renewable Energy

Tongji University
2025

Hong Kong University of Science and Technology
2022-2024

University of Hong Kong
2022-2024

Zhuhai Institute of Advanced Technology
2024

Beijing Institute of Technology
2024

Tiangong University
2023

Xinjiang University
2022

University of Electronic Science and Technology of China
2019-2020

Singapore Management University
2020

Xidian University
2019

The design of automatic solvers to arithmetic math word problems has attracted considerable attention in recent years and a large number of datasets and methods have been published. Among them, Math23K is the largest data corpus that is very helpful to evaluate the generality and robustness of a proposed solution. The best performer in Math23K is a seq2seq model based on LSTM to generate the math expression. However, the model suffers from performance degradation in a large space of target expressions. In this paper, we propose a template-based solution based on recursive neural...

10.1609/aaai.v33i01.33017144 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

While the recent tree-based neural models have demonstrated promising results in generating the solution expression for the math word problem (MWP), most of these models do not capture the relationships and order information among the quantities well. This results in poor quantity representations and incorrect solution expressions. In this paper, we propose Graph2Tree, a novel deep learning architecture that combines the merits of the graph-based encoder and tree-based decoder to generate better solution expressions. Included in our Graph2Tree framework are two graphs, namely the Quantity...

10.18653/v1/2020.acl-main.362 article EN cc-by 2020-01-01

Several deep learning models have been proposed for solving math word problems (MWPs) automatically. Although these models have the ability to capture features without manual efforts, their approaches to capturing features are not specifically designed for MWPs. To utilize the merits of deep learning models with simultaneous consideration of MWPs’ specific features, we propose a group attention mechanism to extract global features, quantity-related features, quantity-pair features and question-related features in MWPs respectively. The experimental results show that the proposed approach performs...

10.18653/v1/p19-1619 article EN cc-by 2019-01-01

Math word problem (MWP) solving faces a dilemma in number representation learning. In order to avoid the number representation issue and reduce the search space of feasible solutions, existing works striving for MWP solving usually replace real numbers with symbolic placeholders to focus on logic reasoning. However, different from common symbolic reasoning tasks like program synthesis and knowledge graph reasoning, MWP solving has extra requirements in numerical reasoning. In other words, instead of the number value itself, it is the reusable numerical property that matters more in numerical reasoning. Therefore, we...

10.18653/v1/2022.findings-naacl.74 article EN cc-by Findings of the Association for Computational Linguistics: NAACL 2022 2022-01-01

Renjie Pi, Jiahui Gao, Shizhe Diao, Rui Pan, Hanze Dong, Jipeng Zhang, Lewei Yao, Jianhua Han, Hang Xu, Lingpeng Kong, Tong Zhang. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023.

10.18653/v1/2023.emnlp-main.876 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignments at the object level. In this paper, we propose to learn multi-grained vision language alignments by a unified pre-training framework that learns multi-grained aligning and multi-grained localization simultaneously. Based on it, we present X2-VLM, an all-in-one model with a flexible modular architecture, in which we further unify image-text pre-training and video-text pre-training in one model. X2-VLM is able to learn unlimited visual...

10.1109/tpami.2023.3339661 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-13

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem, where...

10.48550/arxiv.2304.06767 preprint EN other-oa arXiv (Cornell University) 2023-01-01

10.18653/v1/2024.emnlp-main.895 article EN Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing 2024-01-01

10.1109/cvpr52733.2024.02561 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Math word problem (MWP) is challenging due to the limitation in training data where only one “standard” solution is available. MWP models often simply fit this solution rather than truly understand or solve the problem. The generalization of MWP models (to diverse scenarios) is thus limited. To address this problem, this paper proposes a novel approach, TSN-MD, by leveraging the teacher network to integrate the knowledge of equivalent solution expressions and then to regularize the learning behavior of the student network. In addition, we introduce the multiple-decoder...

10.24963/ijcai.2020/555 article EN 2020-07-01

10.2139/ssrn.5089661 preprint EN 2025-01-01

In change detection tasks, seasonal variations in spectral characteristics and surface cover can negatively impact performance when comparing image pairs from different seasons. Many existing change detection methods do not specifically address the performance degradation caused by seasonal errors. To tackle this issue, the Dual-Branch Seasonal Error Elimination Change Detection Framework using Target Image Feature Fusion Generator (DBSEE-CDF) is introduced. Specifically, the approach utilizes the Target Image Feature Fusion Generator (TIFFG), which incorporates spatial and channel...

10.3390/rs17030523 article EN cc-by Remote Sensing 2025-02-03

Video question answering (VideoQA) has emerged as a popular research topic in recent years. Enormous efforts have been devoted to developing more effective fusion strategies and better intra-modal feature preparation. To explore these issues further, we identify two key problems. (1) Current works take almost no account of introducing action of interest in video representation. Additionally, there exists insufficient labeling data on where the action of interest is in many datasets. However, questions in VideoQA are...

10.1109/tcsvt.2020.3048440 article EN IEEE Transactions on Circuits and Systems for Video Technology 2020-12-31

Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform the best in one type of tasks, namely language, vision, or vision-language. It is still an open question whether it is possible to construct a general foundation model performing the best for all the understanding tasks. In this paper, we propose a new method for training the general foundation model, X-FM (the X-Foundation Model). X-FM has one language encoder, one vision encoder, and one fusion encoder, as well as a new training method. The training method includes two...

10.18653/v1/2023.findings-emnlp.40 article EN cc-by 2023-01-01

Math word problem (MWP) solving is an important task in question answering which requires human-like reasoning ability. Analogical reasoning has long been used in mathematical education, as it enables students to apply common relational structures of mathematical situations to solve new problems. In this paper, we propose to build a novel MWP solver by leveraging analogical MWPs, which advance the solver's generalization ability across different kinds of MWPs. The key idea, named analogy identification, is to associate the analogical MWP pairs in a latent...

10.18653/v1/2022.emnlp-main.643 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

Multimodal Large Language Models (MLLMs) excel in generating responses based on visual inputs. However, they often suffer from a bias towards generating responses similar to their pretraining corpus, overshadowing the importance of visual information. We treat this bias as a "preference" for pretraining statistics, which hinders the model's grounding in visual input. To mitigate this issue, we propose Bootstrapped Preference Optimization (BPO), which conducts preference learning with datasets containing negative responses bootstrapped from the model itself. Specifically, following...

10.48550/arxiv.2403.08730 preprint EN arXiv (Cornell University) 2024-03-13

Current math word problem (MWP) solvers are usually Seq2Seq models trained by the (one-problem; one-solution) pairs, each of which is made of a problem description and a solution showing the reasoning flow to get the correct answer. However, one MWP problem naturally has multiple solution equations. The training of an MWP solver with (one-problem; one-solution) pairs excludes other correct solutions, and thus limits the generalizability of the MWP solver. One feasible solution to this limitation is to augment multiple solutions to a given problem. However, it is difficult to collect diverse and accurate augment solutions through human efforts. In this paper, we...

10.1609/aaai.v37i11.26548 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Large foundation models have demonstrated a great ability to achieve general human-level intelligence far beyond traditional approaches. As the technique keeps attracting attention from the AI community, more and more large foundation models have become publicly available. However, most of those models exhibit a major deficiency in specialized-task applications, where the step of finetuning is still required for obtaining satisfactory performance. As the number of available models and specialized tasks keeps growing, the job of general finetuning becomes highly nontrivial. In this paper,...

10.48550/arxiv.2306.12420 preprint EN other-oa arXiv (Cornell University) 2023-01-01

LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting, which is also known as the alignment tax. To empirically verify this hypothesis, we conducted experiments with existing RLHF algorithms using OpenLLaMA-3B, which revealed a pronounced alignment tax in NLP tasks. On the other hand, despite various techniques to mitigate forgetting, they are often at odds with RLHF performance, leading to a trade-off between reward maximization and forgetting...

10.48550/arxiv.2309.06256 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The deployment of multimodal large language models (MLLMs) has brought forth a unique vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel challenge of defending MLLMs against such attacks. We discovered that images act as a "foreign language" that is not considered during alignment, which can make MLLMs prone to producing harmful responses. Unfortunately, unlike the discrete tokens considered in text-based LLMs, the continuous nature of image signals presents significant alignment...

10.48550/arxiv.2401.02906 preprint EN other-oa arXiv (Cornell University) 2024-01-01

With increasing consumption of primary energy and deterioration of the global environment, clean energy sources with large reserves, such as natural gas, have gradually gained a higher proportion of the energy structure. Monitoring and predicting energy consumption data play a crucial role in reducing energy waste and improving energy supply efficiency. However, owing to factors such as high monitoring device costs, safety risks associated with device installation, and low efficiency of manual meter reading, monitoring gas consumption data at the household level is challenging. Moreover, there is a lack of methods for...

10.3390/buildings14030627 article EN cc-by Buildings 2024-02-27

Image description datasets play a crucial role in the advancement of various applications such as image understanding, text-to-image generation, and text-image retrieval. Currently, image description datasets primarily originate from two sources. One source is the scraping of image-text pairs from the web. Despite their abundance, these descriptions are often of low quality and noisy. Another is through human labeling. Datasets such as COCO are generally very short and lack details. Although detailed image descriptions can be annotated by humans, the high annotation cost limits...

10.48550/arxiv.2406.07502 preprint EN arXiv (Cornell University) 2024-06-11