Da Cao

ORCID: 0000-0002-2611-2559
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Advanced Image and Video Retrieval Techniques
  • Topic Modeling
  • Recommender Systems and Techniques
  • Online Learning and Analytics
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Natural Language Processing Techniques
  • Image Retrieval and Classification Techniques
  • Advanced Graph Neural Networks
  • Advanced Vision and Imaging
  • Intelligent Tutoring Systems and Adaptive Learning
  • Video Surveillance and Tracking Methods
  • Expert Finding and Q&A Systems
  • Educational Technology and Assessment
  • Advanced Malware Detection Techniques
  • Caching and Content Delivery
  • Online and Blended Learning
  • Advanced Text Analysis Techniques
  • Artificial Intelligence in Games
  • Artificial Intelligence in Law
  • Gait Recognition and Analysis
  • Face Recognition and Analysis
  • Legal Education and Practice Innovations

Hunan University
2018-2025

Xiamen University
2011-2018

Xiamen University of Technology
2012-2017

Huazhong University of Science and Technology
2009

Due to the prevalence of group activities in people's daily life, recommending content to a group of users becomes an important task in many information systems. A fundamental problem in group recommendation is how to aggregate the preferences of group members to infer the decision of a group. Toward this end, we contribute a novel solution, namely AGREE (short for "Attentive Group REcommEndation"), to address the preference aggregation problem by learning the aggregation strategy from data, which is based on the recent developments of attention network and neural collaborative...

10.1145/3209978.3209998 article EN 2018-06-27
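
The abstract above describes learning how to weight group members with an attention network. A minimal sketch of the aggregation step alone follows, assuming the per-member attention scores are already available; in the paper they would come from a learned network, which is not reproduced here. The names `softmax` and `aggregate_group` are illustrative, not taken from the authors' code.

```python
import math

def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_group(member_embeddings, attention_scores):
    """Attention-weighted sum of member embeddings (hypothetical helper).

    member_embeddings: list of equal-length float lists, one per member.
    attention_scores: one raw score per member; assumed given here,
    whereas a real model would compute them from learned parameters.
    """
    weights = softmax(attention_scores)
    dim = len(member_embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, member_embeddings))
        for d in range(dim)
    ]

# Example: the second member receives a higher score, so the group
# representation is pulled toward that member's embedding.
group_vector = aggregate_group([[1.0, 0.0], [0.0, 1.0]], [0.0, 2.0])
```

With equal scores the result reduces to a plain average of the member embeddings, which is the fixed strategy that a learned weighting is meant to improve on.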

Existing recommender algorithms have mainly focused on recommending individual items by utilizing user-item interactions. However, little attention has been paid to recommending user generated lists (e.g., playlists and booklists). On the one hand, lists contain rich signal about item co-occurrence, as the items within a list are usually gathered based on a specific theme. On the other hand, a user's preference over a list also indicates her preference over the items within the list. We believe that 1) if the relevance signal within lists can be properly leveraged, an enhanced recommendation for...

10.1145/3077136.3080779 article EN Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval 2017-07-28

Image-text matching is a vital yet challenging task in the field of multimedia analysis. Over the past decades, great efforts have been made to bridge the semantic gap between the visual and textual modalities. Despite its significance and value, most prior work is still confronted with a multi-view description challenge, i.e., how to align an image to multiple textual descriptions with semantic diversity. Toward this end, we present a novel context-aware multi-view summarization network to summarize context-enhanced visual region information from multiple views. To be more...

10.1145/3394171.3413961 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

With the proliferation of social networks, group activities have become an essential ingredient of our daily life. A growing number of users share their group activities online and invite their friends to join in. This imposes the need for an in-depth study on the group recommendation task, i.e., recommending items to a group of users. Despite its value and significance, group recommendation remains an unsolved problem because 1) the weights of group members are crucial to the performance but are rarely learnt from data; 2) social followee information is beneficial for understanding users' preferences but is rarely considered; 3)...

10.1109/tkde.2019.2936475 article EN IEEE Transactions on Knowledge and Data Engineering 2019-08-22

Over the last decade, the renaissance of Web technologies has transformed the online world into an application (App) driven society. While the abundant Apps have provided great convenience, their sheer number also leads to severe information overload, making it difficult for users to identify desired Apps. To alleviate the information overloading issue, recommender systems have been proposed and deployed for the App domain. However, existing work on App recommendation has largely focused on one single platform (e.g., smartphones), while it ignores...

10.1145/3017429 article EN ACM Transactions on Information Systems 2017-07-11

Given an untrimmed video and a query sentence, cross-modal video moment retrieval aims to rank a video moment from pre-segmented video moment candidates that best matches the query sentence. Pioneering work typically learns the representations of the textual and visual content separately and then obtains the interactions or alignments between different modalities. However, the task of cross-modal video moment retrieval is not yet thoroughly addressed, as it needs to further identify the fine-grained differences of video moment candidates with high repeatability and similarity. Moreover, the relation among objects in both video and sentence...

10.1109/cvpr46437.2021.00225 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Thanks to the recent advance in multimedia techniques, increasing research attention has been paid to the virtual try-on task, especially with 2D image modeling. The traditional try-on task aims to align the target clothing item naturally to the given person's body and hence present a try-on look of the person. However, in practice, people may also be interested in their try-on looks with different poses. Therefore, in this work, we introduce a new try-on setting, which enables the changes of both the clothing item and the person's pose. Towards this end, we propose a pose-guided virtual try-on scheme based on the generative...

10.1145/3343031.3350946 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

The fine-grained attribute descriptions can significantly supplement the valuable semantic information for the person image, which is vital to the success of the person re-identification (ReID) task. However, current ReID algorithms typically fail to effectively leverage the rich contextual information available, primarily due to their reliance on simplistic and coarse utilization of image attributes. Recent advances in artificial intelligence generated content have made it possible to automatically generate plentiful fine-grained attribute descriptions and make full use...

10.1609/aaai.v38i7.28524 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

We study learning outcome prediction for online courses. Whereas prior work has focused on semester-long courses with frequent student assessments, we focus on short-courses that have single outcomes assigned by instructors at the end. The lack of performance data and generally small enrollments makes the behavior of learners, captured as they interact with course content and with one another in Social Learning Networks (SLN), essential for prediction. Our method defines several (machine) learning features based on the processing...

10.1109/tlt.2018.2793193 article EN IEEE Transactions on Learning Technologies 2018-01-15

In this article, we tackle the cross-modal video moment localization issue, namely, localizing the most relevant video moment in an untrimmed video given a sentence as the query. The majority of existing methods focus on generating video moment candidates with the help of multi-scale sliding window segmentation. They hence inevitably suffer from numerous candidates, which result in a less effective retrieval process. In addition, the spatial scene tracking is crucial for realizing the video moment localization process, but it is rarely considered in traditional techniques. To this end,...

10.1145/3394171.3413840 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12
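
The abstract above notes that multi-scale sliding window segmentation produces numerous candidates. The sketch below illustrates that baseline candidate generation, not the paper's proposed method; the function name, the window sizes, and the 50% overlap default are assumptions chosen for illustration.

```python
def sliding_window_candidates(duration, window_sizes, stride_ratio=0.5):
    """Enumerate (start, end) moment candidates over a video.

    duration: video length in seconds.
    window_sizes: window lengths in seconds, one per scale (assumed values).
    stride_ratio: stride as a fraction of the window length (assumed 0.5).
    """
    if stride_ratio <= 0:
        raise ValueError("stride_ratio must be positive")
    candidates = []
    for w in window_sizes:
        if w > duration:
            continue  # a window longer than the video yields no candidate
        stride = w * stride_ratio
        start = 0.0
        while start + w <= duration:
            candidates.append((start, start + w))
            start += stride
    return candidates

# Example: a 10-second video segmented at two scales.
moments = sliding_window_candidates(10.0, [4.0, 8.0])
```

Each added scale or smaller stride increases the candidate count, which is the cost the abstract refers to.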

In this paper, we propose AutoVMR, a novel multimodal large language model framework that employs an autonomous event generation and localization approach for video moment retrieval. AutoVMR utilizes an autoregressive architecture, accepting input with a fixed prompt template, to generate descriptions of video segments along with their corresponding start and end times. Additionally, we introduce an intersection over union-based reward model trained using the reinforcement learning from human feedback method, which is...

10.2139/ssrn.5084339 preprint EN 2025-01-01
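
The abstract above mentions an intersection over union-based reward for predicted start and end times. As a minimal sketch of the underlying quantity only, temporal intersection over union between two segments can be computed as follows; how the paper turns it into a reward is not stated in the excerpt, and the function name is illustrative.

```python
def temporal_iou(pred, target):
    """Intersection over union of two (start, end) segments in seconds."""
    pred_start, pred_end = pred
    target_start, target_end = target
    # Overlap is zero when the segments do not intersect.
    intersection = max(0.0, min(pred_end, target_end) - max(pred_start, target_start))
    union = (pred_end - pred_start) + (target_end - target_start) - intersection
    return intersection / union if union > 0 else 0.0

# Example: a predicted segment that overlaps half of the target.
score = temporal_iou((0.0, 10.0), (5.0, 15.0))
```

The value lies in [0, 1], with 1 for identical segments and 0 for disjoint ones.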

The newly emerging language-based video moment retrieval task aims at retrieving a target video moment from an untrimmed video given a natural language sentence as the query. It is more applicable in reality since it is able to accurately localize a specific video moment, as compared to the traditional whole video retrieval. In this work, we propose a novel solution to thoroughly investigate the language-based video moment retrieval issue under the adversarial learning paradigm. The key of our solution is to formulate the language-based video moment retrieval task as an adversarial learning problem with two tightly connected components. Specifically, reinforcement learning is employed as a generator...

10.1145/3478025 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-02-16

Retrieving video moments from an untrimmed video given a natural language sentence as the query is a challenging task in both academia and industry. Although much effort has been made to address this issue, traditional video moment ranking methods are unable to generate reasonable video moment candidates, and video moment localization approaches are not applicable to the large-scale retrieval scenario. How to combine ranking and localization into a unified framework to overcome their drawbacks and reinforce each other is rarely considered. Toward this end, we contribute a novel solution to thoroughly...

10.1145/3394171.3413841 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

In this article, we tackle the math word problem, namely, automatically answering a mathematical problem according to its textual description. Although recent methods have demonstrated their promising results, most of these methods are based on a template-based generation scheme, which results in limited generalization capability. To this end, we propose a novel human-like analogical learning method in a recall and learn manner. Our proposed framework is composed of the modules of memory, representation, analogy, reasoning,...

10.18653/v1/2021.findings-emnlp.68 preprint EN cc-by 2021-01-01

As a natural extension of image-based cross-modal recipe retrieval, retrieving a specific video given a recipe as the query is seldom explored. There are various temporal and spatial elements hidden in cooking videos. In addition, current retrieval approaches mostly emphasize the understanding of textual and visual content independently. Such methods overlook the interaction between textual and visual content. In this work, we innovatively propose a new problem of video-based cross-modal recipe retrieval and thoroughly investigate the issue under the attention paradigm. In particular,...

10.1145/3343031.3351067 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

The task of cross-modal image retrieval has recently attracted considerable research attention. In real-world scenarios, keyword-based queries issued by users are usually short and have broad semantics. Therefore, semantic diversity is as important as retrieval accuracy in such user-oriented services, which improves user experience. However, most typical cross-modal image retrieval methods based on single point query embedding inevitably result in low semantic diversity, while existing diverse retrieval approaches frequently lead to low accuracy due to a lack...

10.1109/tnnls.2022.3168431 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-04-28

A social learning network (SLN) emerges when users exchange information on educational topics with structured interactions. The recent proliferation of massively scaled online (human) learning, such as massive open online courses (MOOCs), has presented a plethora of research challenges surrounding SLN. In this paper, we ask: how efficient are these networks? We propose a method in which the SLN efficiency is determined by comparing the user benefit in the observed network to a benchmark of maximum utility achievable through...

10.1109/tnet.2018.2859325 article EN IEEE/ACM Transactions on Networking 2018-08-16

Beauty product retrieval is a challenging task due to the severe image variation issue in real-world scenes. In this work, to mitigate the data variation problem, we contribute a background-agnostic feature extractor, which is trained by a self-supervised salient object detection method. In particular, we first propose a foreground augmentation technique to acquire the foreground with its mask. Next, a feature extractor with an attention pooling layer is proposed to learn representations in a self-supervised manner. Finally, we ensemble the features of multiple models to perform...

10.1145/3343031.3356059 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15