- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Topic Modeling
- Recommender Systems and Techniques
- Online Learning and Analytics
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Image Retrieval and Classification Techniques
- Advanced Graph Neural Networks
- Advanced Vision and Imaging
- Intelligent Tutoring Systems and Adaptive Learning
- Video Surveillance and Tracking Methods
- Expert finding and Q&A systems
- Educational Technology and Assessment
- Advanced Malware Detection Techniques
- Caching and Content Delivery
- Online and Blended Learning
- Advanced Text Analysis Techniques
- Artificial Intelligence in Games
- Artificial Intelligence in Law
- Gait Recognition and Analysis
- Face recognition and analysis
- Legal Education and Practice Innovations
Hunan University
2018-2025
Xiamen University
2011-2018
Xiamen University of Technology
2012-2017
Huazhong University of Science and Technology
2009
Due to the prevalence of group activities in people's daily life, recommending content to a group of users becomes an important task in many information systems. A fundamental problem in group recommendation is how to aggregate the preferences of group members to infer the decision of a group. Toward this end, we contribute a novel solution, namely AGREE (short for "Attentive Group REcommEndation"), to address the preference aggregation problem by learning the aggregation strategy from data, which is based on the recent developments of attention network and neural collaborative...
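The idea of learning member weights with attention and then aggregating preferences can be illustrated with a minimal sketch. This is not the AGREE model itself: the dot-product scoring, the function name `attentive_group_embedding`, and the toy vectors are assumptions made purely for illustration, whereas the abstract describes weights learned from data by a neural network.

```python
import math


def attentive_group_embedding(member_vecs, item_vec):
    """Aggregate member vectors into one group vector using softmax attention.

    Assumption: each member's attention score is the dot product between the
    member vector and the item vector. A trained model would learn this score.
    """
    # One relevance score per member for the candidate item.
    scores = [sum(m * i for m, i in zip(vec, item_vec)) for vec in member_vecs]

    # Numerically stable softmax turns scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The group representation is the attention-weighted sum of member vectors.
    dim = len(item_vec)
    group = [
        sum(w * vec[d] for w, vec in zip(weights, member_vecs))
        for d in range(dim)
    ]
    return weights, group


# Hypothetical two-member group and one candidate item.
weights, group = attentive_group_embedding([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In this toy case the first member aligns with the item, so it receives the larger weight and dominates the group vector.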
Existing recommender algorithms mainly focused on recommending individual items by utilizing user-item interactions. However, little attention has been paid to recommending user generated lists (e.g., playlists and booklists). On one hand, user generated lists contain rich signal about item co-occurrence, as items within a list are usually gathered based on a specific theme. On the other hand, a user's preference over a list also indicates her preference over the items within the list. We believe that 1) if the relevance signal within lists can be properly leveraged, an enhanced recommendation for...
Image-text matching is a vital yet challenging task in the field of multimedia analysis. Over the past decades, great efforts have been made to bridge the semantic gap between the visual and textual modalities. Despite its significance and value, most prior work is still confronted with the multi-view description challenge, i.e., how to align an image to multiple textual descriptions with semantic diversity. Toward this end, we present a novel context-aware multi-view summarization network to summarize context-enhanced visual region information from multiple views. To be more...
With the proliferation of social networks, group activities have become an essential ingredient of our daily life. A growing number of users share their group activities online and invite their friends to join in. This imposes the need for an in-depth study on the group recommendation task, i.e., recommending items to a group of users. Despite its value and significance, group recommendation remains an unsolved problem because 1) the weights of group members are crucial to the performance of group recommendation but are rarely learnt from data; 2) social followee information is beneficial for understanding users' preferences but is rarely considered; 3)...
Over the last decade, the renaissance of Web technologies has transformed the online world into an application (App) driven society. While the abundant Apps have provided great convenience, their sheer number also leads to severe information overload, making it difficult for users to identify desired Apps. To alleviate the information overloading issue, recommender systems have been proposed and deployed for the App domain. However, existing work on App recommendation has largely focused on one single platform (e.g., smartphones), while ignoring...
Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims to rank a video moment from pre-segmented video moment candidates that best matches the query sentence. Pioneering work typically learns the representations of the textual and visual content separately and then obtains the interactions or alignments between different modalities. However, the task of cross-modal video moment retrieval is not yet thoroughly addressed, as it needs to further identify the fine-grained differences of video moment candidates with high repeatability and similarity. Moreover, the relation among objects in both video and sentence...
Thanks to the recent advance in multimedia techniques, increasing research attention has been paid to the virtual try-on task, especially with 2D image modeling. The traditional try-on task aims to align the target clothing item naturally to the given person's body and hence present a try-on look of the person. However, in practice, people may also be interested in their try-on looks with different poses. Therefore, in this work, we introduce a new try-on setting, which enables changes of both the clothing item and the person's pose. Towards this end, we propose a pose-guided virtual try-on scheme based on generative...
The fine-grained attribute descriptions can significantly supplement the valuable semantic information for the person image, which is vital to the success of the person re-identification (ReID) task. However, current ReID algorithms typically fail to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use...
We study learning outcome prediction for online courses. Whereas prior work has focused on semester-long courses with frequent student assessments, we focus on short-courses that have single outcomes assigned by instructors at the end. The lack of performance data and generally small enrollments make the behavior of learners, captured as they interact with course content and with one another in Social Learning Networks (SLN), essential for prediction. Our method defines several (machine) learning features based on the processing...
In this article, we tackle the cross-modal video moment localization issue, namely, localizing the most relevant video moment in an untrimmed video given a sentence as the query. The majority of existing methods focus on generating video moment candidates with the help of multi-scale sliding window segmentation. They hence inevitably suffer from numerous candidates, which result in a less effective retrieval process. In addition, spatial scene tracking is crucial for realizing the video moment localization process, but it is rarely considered in traditional techniques. To this end,...
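Multi-scale sliding window segmentation, the candidate generation step that the abstract attributes to existing methods, can be sketched as follows. The function name, the window sizes, and the 50% overlap are illustrative assumptions, not values taken from the paper.

```python
def sliding_window_candidates(duration, window_sizes, overlap=0.5):
    """Enumerate (start, end) moment candidates over a video of given duration.

    Assumption: every window scale uses the same fractional overlap, so the
    stride for a window of length `size` is `size * (1 - overlap)`.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")

    candidates = []
    for size in window_sizes:
        if size > duration:
            continue  # a window longer than the video yields no candidate
        stride = size * (1.0 - overlap)
        start = 0.0
        # Small tolerance so a window ending exactly at `duration` is kept.
        while start + size <= duration + 1e-9:
            candidates.append((start, start + size))
            start += stride
    return candidates


# Hypothetical 10-second video segmented at two scales.
moments = sliding_window_candidates(10.0, [4.0, 8.0])
```

Even this small example produces five candidates for a 10-second clip, and the count grows quickly with more scales and longer videos, which is the drawback the abstract points out.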
In this paper, we propose AutoVMR, a novel multimodal large language model framework that employs an autonomous event generation and localization approach for video moment retrieval. AutoVMR utilizes an autoregressive architecture, accepting input with a fixed prompt template, to generate descriptions of video segments along with their corresponding start and end times. Additionally, we introduce an intersection over union-based reward trained using the reinforcement learning from human feedback method, which is...
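The intersection over union between a predicted and a reference time span, the quantity on which the reward mentioned above is based, can be computed as in the sketch below. How AutoVMR turns this value into a trained reward is not stated in the truncated abstract, so the function `temporal_iou` only shows the standard temporal IoU and is an illustrative assumption rather than the paper's reward.

```python
def temporal_iou(pred, target):
    """Return the IoU of two (start, end) time spans given in seconds."""
    pred_start, pred_end = pred
    target_start, target_end = target

    # Overlap length is zero when the spans do not intersect.
    intersection = max(0.0, min(pred_end, target_end) - max(pred_start, target_start))
    union = (pred_end - pred_start) + (target_end - target_start) - intersection

    # Guard against degenerate zero-length spans.
    return intersection / union if union > 0 else 0.0


# Hypothetical predicted and reference moments.
score = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

Here the spans share 5 seconds out of a 15-second union, so the score is one third.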
The newly emerging language-based video moment retrieval task aims at retrieving a target video moment from an untrimmed video given a natural language query. It is more applicable in reality since it is able to accurately localize a specific video moment, as compared to traditional whole video retrieval. In this work, we propose a novel solution to thoroughly investigate the language-based video moment retrieval issue under the adversarial learning paradigm. The key of our solution is to formulate the language-based video moment retrieval task as an adversarial learning problem with two tightly connected components. Specifically, reinforcement learning is employed as a generator...
Retrieving video moments from an untrimmed video given a natural language query is a challenging task in both academia and industry. Although much effort has been made to address this issue, traditional video moment ranking methods are unable to generate reasonable video moment candidates, and video moment localization approaches are not applicable to the large-scale retrieval scenario. How to combine ranking and localization into a unified framework to overcome their drawbacks and reinforce each other is rarely considered. Toward this end, we contribute a novel solution to thoroughly...
In this article, we tackle the math word problem, namely, automatically answering a mathematical problem according to its textual description. Although recent methods have demonstrated promising results, most of these methods are based on a template-based generation scheme, which results in limited generalization capability. To this end, we propose a novel human-like analogical learning method in a recall and learn manner. Our proposed framework is composed of the modules of memory, representation, analogy, and reasoning,...
As a natural extension of image-based cross-modal recipe retrieval, retrieving a specific video given a recipe as the query is seldom explored. There are various temporal and spatial elements hidden in cooking videos. In addition, current cross-modal retrieval approaches mostly emphasize the understanding of textual and visual content independently. Such methods overlook the interaction between textual and visual content. In this work, we innovatively propose a new problem of video-based cross-modal recipe retrieval and thoroughly investigate this issue under the attention paradigm. In particular,...
The task of cross-modal image retrieval has recently attracted considerable research attention. In real-world scenarios, keyword-based queries issued by users are usually short and have broad semantics. Therefore, semantic diversity is as important as retrieval accuracy in such user-oriented services, which improves user experience. However, most typical cross-modal image retrieval methods based on single point query embedding inevitably result in low semantic diversity, while existing diverse retrieval approaches frequently lead to low accuracy due to a lack...
A social learning network (SLN) emerges when users exchange information on educational topics with structured interactions. The recent proliferation of massively scaled online (human) learning, such as massive open online courses (MOOCs), has presented a plethora of research challenges surrounding SLN. In this paper, we ask: how efficient are these networks? We propose a method in which the SLN efficiency is determined by comparing user benefit in the observed network to a benchmark of maximum utility achievable through...
Beauty product retrieval is a challenging task due to the severe image variation issue in real-world scenes. In this work, to mitigate the data variation problem, we contribute a background-agnostic feature extractor, which is trained by a self-supervised salient object detection method. In particular, we first propose a foreground augmentation technique to acquire data with its mask. Next, a feature extractor with an attention pooling layer is proposed to learn representations by performing salient object detection in a self-supervised manner. Finally, we ensemble features of multiple models to perform...