- Multimodal Machine Learning Applications
- Topic Modeling
- Online Learning and Analytics
- Technology-Enhanced Education Studies
- Natural Language Processing Techniques
- Network Traffic and Congestion Control
- Speech and dialogue systems
- Transportation Planning and Optimization
- Evacuation and Crowd Dynamics
- Student Assessment and Feedback
- Internet Traffic Analysis and Secure E-voting
- Domain Adaptation and Few-Shot Learning
- Education and Critical Thinking Development
- Software-Defined Networks and 5G
- Handwritten Text Recognition Techniques
- Text and Document Classification Technologies
- Traffic control and management
- Digital Rights Management and Security
- Educational Assessment and Pedagogy
Shanghai Jiao Tong University
2022-2024
Harbin Institute of Technology
2021-2024
In this paper, we propose a simple few-shot domain adaptation paradigm for reading comprehension. We first identify the lottery subnetwork structure within the Transformer-based source domain model via gradual magnitude pruning. Then, we only fine-tune the lottery subnetwork, a small fraction of the whole parameters, on the annotated target domain data for adaptation. To obtain more adaptable subnetworks, we introduce self-attention attribution to weigh parameters, beyond simply pruning the smallest magnitude parameters, which can be seen as combining structured and...
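The two steps named in the abstract above (gradual magnitude pruning, then fine-tuning only the surviving subnetwork) can be sketched as follows. This is a minimal illustration on a flat list of weights, not the paper's implementation: the function names, the linear sparsity schedule, and the toy weights are all assumptions, and the self-attention attribution weighting is not modeled.

```python
def prune_to_sparsity(weights, mask, sparsity):
    """Zero the mask for the smallest-magnitude weights until `sparsity` is reached."""
    n_prune = int(round(sparsity * len(weights)))
    # Already-pruned positions (mask 0) sort first, so they are never revived.
    order = sorted(range(len(weights)), key=lambda i: (mask[i], abs(weights[i])))
    new_mask = list(mask)
    for i in order[:n_prune]:
        new_mask[i] = 0
    return new_mask


def gradual_magnitude_pruning(weights, final_sparsity, num_steps, train_step=None):
    """Raise sparsity step by step; optionally train between pruning steps.

    The linear schedule is an assumption made for this sketch.
    `train_step(weights, mask)` is a hypothetical callback returning new weights.
    """
    weights = list(weights)
    mask = [1] * len(weights)
    for step in range(1, num_steps + 1):
        sparsity = final_sparsity * step / num_steps
        mask = prune_to_sparsity(weights, mask, sparsity)
        weights = [w * m for w, m in zip(weights, mask)]
        if train_step is not None:
            weights = train_step(weights, mask)
    return weights, mask


def masked_update(weights, grads, mask, lr=0.1):
    """One gradient step that touches only the surviving subnetwork."""
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]


# Toy example: ten hypothetical weights pruned to 50% sparsity in five steps.
toy_weights = [0.5, -0.1, 0.3, -0.8, 0.05, 0.2, -0.4, 0.9, 0.01, -0.6]
pruned, mask = gradual_magnitude_pruning(toy_weights, final_sparsity=0.5, num_steps=5)
updated = masked_update(pruned, grads=[1.0] * len(pruned), mask=mask)
```

In this sketch the mask plays the role of the subnetwork: pruned positions stay at zero during `masked_update`, so only the retained fraction of parameters is adapted.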
Developing and rehearsing crowd evacuation plans in gathering situations can improve evacuation efficiency and reduce safety accidents. However, pedestrians create resource conflicts with other pedestrians competing for routes during evacuation. Inspired by cellular automata and game theory, this paper proposes a crowd evacuation model that integrates game theory to solve the conflicts among pedestrians in the evacuation process. In the model construction, we construct the basic model using a cellular automaton, formulate the game rule for pedestrians' conflict according to the prisoner's dilemma, and integrate the update strategy into...
There are substantial instructional videos on the Internet, which provide us tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level, lacking experiential guidelines at the task level, which can lead to beginners struggling to learn new tasks due to the lack of relevant experience. Moreover, the specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to our...
Scene text detection is a challenging topic in computer vision, characterized by complex illumination, irregular shape, and arbitrary size. While recent advancements have been made in scene text detection, it remains difficult to simultaneously distinguish nearby text and accommodate irregularly shaped text. Therefore, this paper introduces HPNet, an enhanced scene text detector, based on the segmentation method that predicts two-scale results. To improve shape robustness, Hybrid Attentional Feature Fusion (HAFF)...
The colossal parameters and computational overhead of Large Language Models (LLMs) challenge their real-world applications. Network pruning, which targets unstructured or structured sparsity by removing redundant parameters, has recently been explored for LLM acceleration. Existing LLM pruning works focus on unstructured pruning, which typically requires special hardware support for a practical speed-up. In contrast, structured pruning can reduce latency on general devices. However, it remains a challenge to perform structured pruning efficiently and maintain performance, especially...
Research suggests "write-to-learn" tasks improve learning outcomes, yet constructed-response methods of formative assessment become unwieldy with large class sizes. This study evaluates natural language processing algorithms to assist this aim. Six short-answer tasks completed by 1,935 students were scored by several human raters, using a detailed rubric, and by an algorithm. Results indicate substantial inter-rater agreement using quadratic weighted kappa for rater pairs (each QWK > 0.74) and group consensus...
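For readers unfamiliar with the agreement statistic cited above, quadratic weighted kappa (QWK) can be computed as in the sketch below. It uses the standard definition (one minus the ratio of weighted observed disagreement to weighted chance-expected disagreement). The function name, the 0 to 3 score scale, and the example ratings are hypothetical and are not taken from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Cohen's kappa with quadratic weights for two raters' integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(rater_a)

    # Observed confusion matrix: observed[i][j] counts items scored i by A and j by B.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1

    # Marginal score histograms for each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement penalty
            expected = hist_a[i] * hist_b[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one identical score throughout; kappa is formally
        # undefined, and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical rubric scores (0-3) from two raters on eight answers.
scores_a = [0, 1, 2, 3, 2, 1, 0, 3]
scores_b = [0, 1, 2, 3, 1, 1, 0, 2]
qwk = quadratic_weighted_kappa(scores_a, scores_b, min_score=0, max_score=3)
```

Because the penalty grows with the square of the score difference, a one-point disagreement costs far less than a three-point one, which suits ordinal rubric scores.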
Despite achieving remarkable performance on various vision-language tasks, Transformer-based Vision-Language Models (VLMs) suffer from redundancy in inputs and parameters, significantly hampering their efficiency in real-world applications. Moreover, the degree of redundancy in token representations and model parameters, such as attention heads, varies for different inputs. In light of these challenges, we propose SmartTrim, an adaptive acceleration framework for VLMs, which adjusts the computational overhead per instance. Specifically,...
Research into the area of multiparty dialog has grown considerably over recent years. We present the Molweni dataset, a machine reading comprehension (MRC) dataset with discourse structure built over multiparty dialog. Molweni's source samples come from the Ubuntu Chat Corpus, including 10,000 dialogs comprising 88,303 utterances. We annotate 30,066 questions on this corpus, including both answerable and unanswerable questions. Molweni also uniquely contributes discourse dependency annotations in a modified Segmented Discourse Representation Theory...