- Multimodal Machine Learning Applications
- Natural Language Processing Techniques
- Topic Modeling
- Sentiment Analysis and Opinion Mining
- Intelligent Tutoring Systems and Adaptive Learning
- Video Analysis and Summarization
- Explainable Artificial Intelligence (XAI)
- Software Testing and Debugging Techniques
- Education and Technology Integration
- Advanced Text Analysis Techniques
- Forensic Anthropology and Bioarchaeology Studies
- Forensic and Genetic Research
- Human Pose and Action Recognition
- Text and Document Classification Technologies
- Privacy-Preserving Technologies in Data
- Face Recognition and Analysis
- Advanced Biosensing and Bioanalysis Techniques
- COVID-19 Diagnosis Using AI
- Molecular Sensors and Ion Detection
- Scientific Computing and Data Management
- Sulfur Compounds in Biology
- Advanced Fiber Optic Sensors
- Optical Network Technologies
- Advanced Photonic Communication Systems
- Software Engineering Research
Shandong University of Science and Technology
2024-2025
Shandong University of Technology
2025
Shandong University
2021-2024
Despite the commendable achievements of existing work, prevailing multimodal sarcasm detection studies rely more on textual content than on visual information. This unavoidably induces spurious correlations between words and labels, thereby significantly hindering the models' generalization capability. To address this problem, we define the task of out-of-distribution (OOD) detection, which aims to evaluate generalizability when the word distribution differs between the training and testing settings. Moreover,...
Multimodal Sarcasm Explanation (MuSE) is a new yet challenging task, which aims to generate a natural language sentence for a multimodal social post (an image as well as its caption) to explain why it contains sarcasm. Although the existing pioneer study has achieved great success with the BART backbone, it overlooks the gap between the visual feature space and the decoder semantic space, the object-level metadata of the image, and potential external knowledge. To solve these limitations, in this work, we propose a novel mulTi-source...
The G‐quadruplex (G4) is an important diagnostic and therapeutic target in cancers, but the development of theranostic probes for subcellular G4s remains challenging. In this work, we report three G4‐targeted probes, obtained by conjugating a pyridostatin‐derived G4 ligand to G4‐specific iridium(III) complexes with desirable photophysical properties. These probes showed specifically enhanced luminescence in the mitochondria of triple-negative breast cancer (TNBC) cells. Of note, complex 3 exhibited NIR emission ability...
The query-oriented micro-video summarization task aims to generate a concise sentence with two properties: (a) summarizing the main semantics of the micro-video and (b) being expressed in the form of search queries to facilitate retrieval. Despite its enormous application value in the retrieval area, this direction has barely been explored. Previous studies mostly focus on content summarization for traditional long videos. Directly applying these methods is prone to yield unsatisfactory results because of the unique features of micro-videos and queries: diverse...
When talking to dialog robots, users have to first activate the robot from standby mode with special wake words, such as "Hey Siri", which is apparently not user-friendly. The latest generation of robots has been equipped with advanced sensors, like cameras, enabling multimodal activation. In this work, we work towards waking the robot without wake words. To accomplish this task, we present a Multimodal Activation Scheme (MAS), consisting of two key components: audio-visual consistency detection and semantic intention inference....
The prevalence of mental disorders has become a significant issue, leading to increased focus on Emotional Support Conversation as an effective supplement for mental health support. Existing methods have achieved compelling results; however, they still face three challenges: 1) the variability of emotions, 2) the practicality of responses, and 3) intricate strategy modeling. To address these challenges, we propose a novel knowledge-enhanced Memory mODEl for emotional suppoRt coNversation (MODERN). Specifically,...
We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of free-form answers generated by large vision-language models (LVLMs). The metric first identifies sub-sentences containing descriptive statements that need to be verified, then extracts a comprehensive list of atomic facts from these sub-sentences, and finally conducts consistency verification between the atomic facts and the input image. Meta-evaluation demonstrates that our metric highly correlates...
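The three-step pipeline described above (identify descriptive sub-sentences, extract atomic facts, verify against the image) can be sketched as follows. This is a toy illustration only: the splitter, fact extractor, and verifier below are hypothetical stubs, whereas the actual metric delegates these steps to LLMs and a visual entailment model.

```python
# Minimal sketch of a FAITHSCORE-style pipeline. All three steps are
# toy stand-ins for the model-based components used by the real metric.

def split_descriptive_subsentences(answer: str) -> list[str]:
    # Step 1: keep sub-sentences that describe image content
    # (toy heuristic: treat every clause as descriptive).
    return [s.strip() for s in answer.split(".") if s.strip()]

def extract_atomic_facts(subsentence: str) -> list[str]:
    # Step 2: break a sub-sentence into atomic facts
    # (toy heuristic: split on "and").
    return [f.strip() for f in subsentence.split(" and ") if f.strip()]

def verify_fact(fact: str, image_facts: set[str]) -> bool:
    # Step 3: check a fact against the image; a real system would use
    # a visual entailment model instead of set membership.
    return fact.lower() in image_facts

def faithscore(answer: str, image_facts: set[str]) -> float:
    facts = [
        fact
        for sub in split_descriptive_subsentences(answer)
        for fact in extract_atomic_facts(sub)
    ]
    if not facts:
        return 1.0  # nothing to verify
    verified = sum(verify_fact(f, image_facts) for f in facts)
    return verified / len(facts)

image_facts = {"a dog is on the grass", "the dog is brown"}
answer = "A dog is on the grass and the dog is brown. The dog wears a hat"
print(faithscore(answer, image_facts))  # 2 of 3 atomic facts verified
```

The final score is simply the fraction of atomic facts supported by the image, which is what makes the metric fine-grained and reference-free.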
In line with the latest research, the task of identifying helpful reviews from a vast pool of user-generated textual and visual data has become a prominent area of study. Effective modal representations are expected to possess two key attributes: consistency and differentiation. Current methods designed for Multimodal Review Helpfulness Prediction (MRHP) face limitations in capturing distinctive information due to their reliance on uniform multimodal annotation. The process of adding varied annotations is not...
Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro. Although fine-tuning on intermediate steps (i.e., rationales) elicits some reasoning skills, the resulting models still fall short in visual comprehension due to inadequate visual-centric supervision, which leads to inaccurate interpretation of math figures. To address this issue, we propose a...
The evenness of the trajectory of the optical fiber grinding track has a substantial effect on the quality of the transmitted signals. To boost this quality, we present a blueprint for the track, together with recently introduced equipment for shaping the end-face of the fiber. This model establishes a correlation between the track and the rotational speed of the motor, enabling control of the connector by manipulating the motor's rotation speed. Three speed-control methods, namely the Proportional-Integral-Derivative (PID) controller, the fuzzy algorithm,...
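As a point of reference for the speed-control methods mentioned, below is a minimal discrete PID controller regulating the speed of a toy first-order motor model. The gains, time step, and plant model are illustrative assumptions for the sketch, not values from the paper.

```python
# Minimal discrete PID speed controller driving a toy first-order motor.
# Gains and plant dynamics are illustrative assumptions.

class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint: float, measured: float) -> float:
        # Classic PID law: u = kp*e + ki*integral(e) + kd*de/dt
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def simulate(setpoint_rpm: float = 1500.0, steps: int = 2000, dt: float = 0.01) -> float:
    """Run the closed loop and return the final motor speed (rpm)."""
    pid = PID(kp=2.0, ki=5.0, kd=0.05, dt=dt)
    speed = 0.0
    for _ in range(steps):
        drive = pid.update(setpoint_rpm, speed)
        # Toy first-order plant: speed relaxes toward the drive signal.
        speed += (drive - speed) * dt
    return speed

print(simulate())  # settles at the 1500 rpm setpoint
```

The integral term drives the steady-state speed error to zero, which is why the loop settles exactly on the setpoint rather than slightly below it.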
Supervised fine-tuning enhances the problem-solving abilities of language models across various mathematical reasoning tasks. To maximize such benefits, existing research focuses on broadening the training set with data augmentation techniques, which is effective for standard single-round question-answering settings. Our work introduces a novel technique aimed at cultivating a deeper understanding of the problems at hand, enhancing performance not only in standard settings but also in more complex scenarios that...
Multimodal Large Language Models (MLLMs) have demonstrated impressive abilities across various tasks, including visual question answering and chart comprehension, yet existing benchmarks for chart-related tasks fall short in capturing the complexity of real-world multi-chart scenarios. Current benchmarks primarily focus on single-chart tasks, neglecting the multi-hop reasoning required to extract and integrate information from multiple charts, which is essential in practical applications. To fill this gap, we introduce...
Large Language Models (LLMs) excel in code generation yet struggle with modern AI software engineering tasks. Unlike traditional function-level or file-level coding tasks, AI software engineering requires not only basic coding proficiency but also advanced skills in managing and interacting with code repositories. However, existing methods often overlook the need for repository-level code understanding, which is crucial for accurately grasping the broader context and developing effective solutions. On this basis, we present RepoGraph, a plug-in module...
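To illustrate the general idea of repository-level structure (without claiming to reproduce RepoGraph's actual design), the following sketch parses Python sources with the standard ast module and links each function definition to its in-repo call sites.

```python
# Toy sketch of repository-level structure extraction: record which
# functions each file defines and which names each function calls, then
# connect definitions to callers. Illustrative only; not RepoGraph itself.
import ast
from collections import defaultdict

def build_repo_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each defined function name to the set of functions that call it."""
    defined = set()
    calls = defaultdict(set)  # caller name -> callee names
    for path, code in sources.items():
        tree = ast.parse(code)
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
                for sub in ast.walk(node):
                    if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                        calls[node.name].add(sub.func.id)
    # Invert caller->callee edges, keeping only in-repo targets.
    graph = {name: set() for name in defined}
    for caller, callees in calls.items():
        for callee in callees:
            if callee in graph:
                graph[callee].add(caller)
    return graph

repo = {
    "util.py": "def parse(x):\n    return x\n",
    "main.py": "def run(data):\n    return parse(data)\n",
}
print(build_repo_graph(repo))  # parse is called by run
```

A graph like this lets a coding agent answer "who uses this function?" across file boundaries, which is the kind of repository-level context the passage argues is missing from function-level approaches.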
Generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) trained on massive web corpora can memorize and disclose individuals' confidential and private data, raising legal and ethical concerns. While many previous works have addressed this issue in LLMs via machine unlearning, it remains largely unexplored for MLLMs. To tackle this challenge, we introduce the Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. MLLMU-Bench...
Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions. Despite its success in high-resource languages, its application to lower-resource ones faces challenges due to the imbalanced foundational abilities of LLMs across different languages, stemming from the uneven distribution of their pre-training data. To tackle this issue, we propose pivot language guided generation (PLUG), an approach that utilizes a pivot language, primarily English, to enhance...