- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Machine Learning and Data Classification
- Anomaly Detection Techniques and Applications
- AI in cancer detection
- Gait Recognition and Analysis
- COVID-19 diagnosis using AI
- Brain Tumor Detection and Classification
- Microplastics and Plastic Pollution
- Advanced Image and Video Retrieval Techniques
- Software Engineering Research
- Caching and Content Delivery
- Network Packet Processing and Optimization
- Artificial Intelligence in Healthcare
- Cloud Computing and Resource Management
- IoT and Edge/Fog Computing
- Network Security and Intrusion Detection
- Privacy-Preserving Technologies in Data
- Natural Language Processing Techniques
- Nanoparticles: synthesis and applications
- Topic Modeling
- Medical Imaging and Analysis
Manchester University
2024
Shanghai Jiao Tong University
2015-2023
Salesforce (United States)
2019-2023
Singapore-HUJ Alliance for Research and Enterprise
2022
National University of Defense Technology
2020-2022
National University of Singapore
2016-2020
Fudan University
2018-2020
Xi'an Jiaotong University
2018
Yunnan University
2016
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes semantic structures of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in...
Semi-supervised learning has been an effective paradigm for leveraging unlabeled data to reduce the reliance on labeled data. We propose CoMatch, a new semi-supervised learning method that unifies dominant approaches and addresses their limitations. CoMatch jointly learns two representations of the training data: their class probabilities and low-dimensional embeddings. The two representations interact with each other to jointly evolve. The embeddings impose a smoothness constraint on the class probabilities to improve the pseudo-labels, whereas the pseudo-labels regularize the structure...
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks, lacking the flexibility to operate in the optimal architecture for a specific task. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some tasks and hence result in substantial performance degradation....
Data heterogeneity across clients in federated learning (FL) settings is a widely acknowledged challenge. In response, personalized federated learning (PFL) emerged as a framework to curate local models for clients' tasks. In PFL, a common strategy is to develop local and global models jointly: the global model (for generalization) informs the local models, and the local models (for personalization) are aggregated to update the global model. A key observation is that if we can improve the generalization ability of local models, then we can improve the generalization of global models, which in turn builds better personalized models. In this work, we consider class imbalance, an...
Recent successes in human action recognition with deep learning methods mostly adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process. In this work, we propose an unsupervised learning framework, which exploits unlabeled data to learn video representations. Different from previous works in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using a video representation from a source view. By...
We introduce Reward-Guided Speculative Decoding (RSD), a novel framework aimed at improving the efficiency of inference in large language models (LLMs). RSD synergistically combines a lightweight draft model with a more powerful target model, incorporating a controlled bias to prioritize high-reward outputs, in contrast to existing speculative decoding methods that enforce strict unbiasedness. RSD employs a process reward model to evaluate intermediate decoding steps and dynamically decide whether to invoke the target model, optimizing the trade-off...
Training deep learning based video classifiers for action recognition requires a large amount of labeled videos. The labeling process is labor-intensive and time-consuming. On the other hand, weakly-labeled images are uploaded to the Internet by users every day. To harness this rich and highly diverse set of Web images, a scalable approach is to crawl these images to train a deep learning based classifier, such as a Convolutional Neural Network (CNN). However, due to the domain shift problem, the performance of classifiers trained on Web images tends to degrade when directly deployed to videos. One way...
Dielectric elastomer actuators (DEAs) have been widely employed as artificial muscles in soft robots. Due to material viscoelasticity and nonlinear electromechanical coupling, it is challenging to accurately model a viscoelastic DEA, especially when the actuator is of a complex or irregular configuration. Control of DEAs is thus challenging but significant. In this letter, we propose a model-free method for control of DEAs, based on deep reinforcement learning. We perform dynamic feedback control by considering time-dependent...
Self-supervised feature representations have been shown to be useful for supervised classification, few-shot learning, and adversarial robustness. We show that features obtained using self-supervised learning are comparable to, or better than, supervised learning for domain generalization in computer vision. We introduce a new self-supervised pretext task of predicting responses to Gabor filter banks and demonstrate that multi-task learning of compatible pretext tasks improves domain generalization performance as compared to training individual tasks alone. Features learnt through...
Recent advancements in multimodal pre-training methods have shown promising efficacy in 3D representation learning by aligning multimodal features across 3D shapes, their 2D counterparts, and language descriptions. However, the methods used by existing multimodal pre-training frameworks to gather multimodal data for 3D applications lack scalability and comprehensiveness, potentially constraining the full potential of multimodal learning. The main bottleneck lies in the language modality's scalability and comprehensiveness. To address this, we introduce ULIP-2, a tri-modal pre-training framework that leverages...
The recent development of commodity 360° cameras has enabled a single video to capture an entire scene, which offers promising potential in surveillance scenarios. However, research in omnidirectional video analysis has lagged behind the hardware advances. In this work, we address the important problem of action recognition in top-view 360° videos. Due to the wide field-of-view, 360° videos usually capture multiple people performing actions at the same time. Furthermore, the appearances of people are deformed. The proposed framework first transforms...
Training deep object detectors requires a significant amount of human-annotated images with accurate object labels and bounding box coordinates, which are extremely expensive to acquire. Noisy annotations are much more easily accessible, but they could be detrimental for learning. We address the challenging problem of training object detectors with noisy annotations, where the noise contains a mixture of label noise and bounding box noise. We propose a learning framework which jointly optimizes object labels, bounding box coordinates, and model parameters by performing alternating noise correction and model training. To...