- Privacy-Preserving Technologies in Data
- Human Pose and Action Recognition
- Advanced Image and Video Retrieval Techniques
- Topic Modeling
- Internet Traffic Analysis and Secure E-voting
- Cryptography and Data Security
- COVID-19 Diagnosis Using AI
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Mobile Crowdsensing and Crowdsourcing
- Stochastic Gradient Optimization Techniques
- IoT and Edge/Fog Computing
- Artificial Intelligence in Healthcare and Education
- Domain Adaptation and Few-Shot Learning
- Image Retrieval and Classification Techniques
- Recommender Systems and Techniques
- Green IT and Sustainability
- Emotion and Mood Recognition
- Personal Information Management and User Behavior
- Gait Recognition and Analysis
- Team Dynamics and Performance
- Adversarial Robustness in Machine Learning
- Parallel Computing and Optimization Techniques
- Speech Recognition and Synthesis
- Natural Language Processing Techniques
University of Cambridge, 2025
Huawei Technologies (China), 2024
Beijing University of Posts and Telecommunications, 2010-2024
Tsinghua University, 2024
Shanghai Jiao Tong University, 2024
Park Plaza Hospital, 2024
Beijing Academy of Artificial Intelligence, 2022
University of Virginia, 2022
Dalian University, 2022
Dalian University of Technology, 2022
State-of-the-art approaches in the previous Emotion Recognition in the Wild challenges are usually built on prevailing Convolutional Neural Networks (CNNs). Although there is clear evidence that CNNs with increased depth or width can bring improved prediction accuracy, existing top performers provide supervision only at the output feature layer, resulting in insufficient training of deep CNN models. In this paper, we present a new learning method named Supervised Scoring Ensemble (SSE) to advance this challenge with CNNs...
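The contrast drawn above, output-only supervision versus supervising intermediate layers as well, can be made concrete with auxiliary classifier heads. The following is a minimal PyTorch sketch of that general idea; the layer sizes, head placement, and the 0.3 loss weight are illustrative assumptions, not the paper's exact SSE design.

```python
# Minimal sketch of intermediate (auxiliary) supervision in PyTorch.
# Dimensions and the auxiliary loss weight are illustrative assumptions,
# not the exact SSE architecture from the paper.
import torch
import torch.nn as nn

class AuxSupervisedCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        # Auxiliary head gives direct supervision to an intermediate layer.
        self.aux_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))
        self.main_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return self.aux_head(f1), self.main_head(f2)

model = AuxSupervisedCNN()
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 3, 48, 48)          # a batch of face crops
y = torch.randint(0, 7, (8,))          # emotion labels
aux_logits, main_logits = model(x)
# Supervision at both depths; 0.3 is an assumed auxiliary weight.
loss = criterion(main_logits, y) + 0.3 * criterion(aux_logits, y)
loss.backward()
```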
In this paper, we present HoloNet, a well-designed Convolutional Neural Network (CNN) architecture underlying our submissions to the video-based sub-challenge of the Emotion Recognition in the Wild (EmotiW) 2016 challenge. In contrast to previous related methods, which usually adopt relatively simple and shallow neural network architectures to address the emotion recognition task, HoloNet rests on three critical design considerations. (1) To reduce redundant filters and enhance non-saturated non-linearity in the lower...
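The first design point is truncated here. Concatenated ReLU (CReLU) is one published technique aimed at exactly this goal, reducing filter redundancy in lower convolutional layers while preserving non-saturated non-linearity, so a minimal sketch is given below; treating CReLU as HoloNet's actual choice is an assumption based on this excerpt.

```python
# Sketch of a Concatenated ReLU (CReLU) block, one known way to cut
# filter redundancy in lower conv layers. Whether this matches HoloNet's
# exact design is an assumption; the abstract is truncated.
import torch
import torch.nn as nn

class CReLUConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Only half the filters are learned; negations supply the rest.
        self.conv = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1)

    def forward(self, x):
        y = self.conv(x)
        # Concatenating y and -y before ReLU keeps both phases of the
        # response while halving the learned parameters.
        return torch.relu(torch.cat([y, -y], dim=1))

block = CReLUConv(3, 64)
print(block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```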
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine learning lifecycle, from training to deployment. However, the substantial advances in versatility and performance these models offer come at a significant cost in hardware resources. To support the growth of these models in a scalable and environmentally sustainable way, there has been considerable focus on developing resource-efficient strategies. This survey...
In the current AI era, mobile devices such as smartphones are tasked with executing a myriad of deep neural networks (DNNs) locally. This presents a complex landscape, as these models are highly fragmented in terms of architecture, operators, and implementations. Such fragmentation poses significant challenges to the co-optimization of hardware, systems, and algorithms for efficient and scalable mobile AI.
Visual question answering (VQA) requires joint comprehension of images and natural language questions, where many questions cannot be directly or clearly answered from the visual content alone but require reasoning over structured human knowledge with confirmation from the visual content. This paper proposes a Visual Knowledge Memory Network (VKMN) to address this issue, which seamlessly incorporates structured human knowledge and deep visual features into memory networks in an end-to-end learning framework. Compared with existing methods for leveraging external knowledge to support VQA, VKMN stresses...
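The generic mechanism behind knowledge-augmented memory networks of this kind is a soft attention read over stored key-value pairs. The sketch below illustrates that read step only; VKMN's exact formulation, and the names used here, are assumptions of this illustration.

```python
# Minimal sketch of a key-value memory read, the generic mechanism used by
# knowledge-augmented memory networks; VKMN's exact design may differ.
import torch
import torch.nn.functional as F

def memory_read(query, keys, values):
    """query: (d,); keys/values: (n, d). Returns attention-weighted value."""
    scores = keys @ query                  # similarity of query to each key
    weights = F.softmax(scores, dim=0)     # soft address over memory slots
    return weights @ values                # weighted sum of stored values

d, n = 128, 50
query = torch.randn(d)                     # fused image+question feature
keys = torch.randn(n, d)                   # embedded knowledge-entry keys
values = torch.randn(n, d)                 # corresponding answer evidence
print(memory_read(query, keys, values).shape)  # torch.Size([128])
```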
Transformer-based pre-trained models have revolutionized NLP with their superior performance and generality. Fine-tuning them for downstream tasks often requires private data, for which federated learning is the de-facto approach (i.e., FedNLP). However, our measurements show that FedNLP is prohibitively slow due to large model sizes and the resultant high network/computation cost. Towards practical FedNLP, we identify adapters, small bottleneck modules inserted at a variety of model layers, as the key building blocks. A challenge...
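A bottleneck adapter of the kind named above is a small down-project/up-project module with a residual connection; with the backbone frozen, only these few parameters need to be trained and exchanged in FL. The hidden and bottleneck dimensions below are illustrative assumptions.

```python
# Sketch of a bottleneck adapter: a small module inserted into a
# transformer layer; only the adapter is trained and exchanged in FL.
# Dimensions (768 hidden, 32 bottleneck) are illustrative assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden=768, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, hidden)    # project back up

    def forward(self, x):
        # Residual connection keeps the frozen backbone's signal intact.
        return x + self.up(torch.relu(self.down(x)))

# Training only adapters shrinks the FL payload dramatically:
adapter = Adapter()
trainable = sum(p.numel() for p in adapter.parameters())
print(f"{trainable:,} trainable parameters per adapter")  # ~50k, vs ~110M
                                                          # for a full BERT-base
```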
Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from clients; yet mobile users often lack the willingness or knowledge to label their data. Such an inadequacy of data labels is known as the few-shot scenario, and it becomes the key blocker for practical FedNLP. For the first time, this work investigates...
Large Language Models (LLMs) are transforming the landscape of mobile intelligence. Federated Learning (FL), a method to preserve user data privacy, is often employed in fine-tuning LLMs for downstream tasks, an approach known as FedLLM. Though recent efforts have addressed the network issue induced by vast model size, they have not practically mitigated vital challenges concerning its integration with mobile devices, such as significant memory consumption and sluggish convergence. In response to these challenges, this...
Forgetting is inevitable in human memory. Recently, multimodal embedding models have been proposed to vectorize reality into a unified embedding space. The generated embeddings can be easily retrieved to help mobile users remember and recall information when needed. However, as the model's capacity increases, its resource consumption also rises. The resulting slow throughput and significant computational requirements hinder its deployment on mobile devices. In this paper, we present Reminisce, the first...
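The retrieval step described above, finding stored embeddings closest to a query embedding, is typically a nearest-neighbor search; a minimal sketch follows. This is purely illustrative, since the excerpt does not describe Reminisce's actual pipeline.

```python
# Sketch of embedding-based recall: store unified multimodal embeddings
# and retrieve the closest memories for a query. Purely illustrative;
# Reminisce's actual pipeline is not described in the excerpt.
import numpy as np

def recall(query_emb, memory_embs, top_k=3):
    """Cosine-similarity search over stored embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    scores = m @ q
    return np.argsort(-scores)[:top_k]     # indices of best-matching memories

memories = np.random.randn(1000, 512)      # embeddings of photos, notes, etc.
query = np.random.randn(512)               # embedding of a user's question
print(recall(query, memories))
```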
Highly efficient catalysts for both the oxygen reduction reaction (ORR) and the oxygen evolution reaction (OER) are key to the commercialization of rechargeable zinc-air batteries (ZABs). In this work, a catalyst with uniform nanospherical morphology was prepared from cobalt nitrate, acetylacetone, and hydrazine hydrate. The final catalyst exhibits high ORR and OER performance, with an ORR half-wave potential of 0.911 V [vs reversible hydrogen electrode (RHE)] and a low OER potential of 1.57 V (vs RHE) at 10 mA cm-2 in 0.1 M KOH solution. Notably, the ZAB based on...
Communication overhead is a significant bottleneck in federated learning (FL), and it has been exacerbated by the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into the FL communication protocol. To overcome the limitations of RDMA in wide-area networks (WANs), FedRDMA divides the updated model into chunks and designs a series of optimization techniques to improve the efficiency and robustness of RDMA-based communication. We implement FedRDMA atop an industrial...
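The chunking idea itself is simple to state in code: serialize the model update and split it into fixed-size pieces before transmission. The sketch below shows only that step; the chunk size is an assumption, and FedRDMA's actual RDMA transport and optimizations are not shown.

```python
# Sketch of the chunking idea: split a serialized model update into
# fixed-size chunks before transmission. The 1 MiB chunk size is an
# assumption; FedRDMA's actual RDMA plumbing is not shown.
import io
import torch

def chunk_update(state_dict, chunk_bytes=1 << 20):
    buf = io.BytesIO()
    torch.save(state_dict, buf)            # serialize the model update
    blob = buf.getvalue()
    return [blob[i:i + chunk_bytes] for i in range(0, len(blob), chunk_bytes)]

update = {"layer.weight": torch.randn(1024, 1024)}   # ~4 MiB of weights
chunks = chunk_update(update)
print(len(chunks), "chunks of <= 1 MiB each")
```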
Local feature-based symmetry detection algorithms can simultaneously consider symmetries over all locations, scales, and orientations, and achieve state-of-the-art performance. This paper demonstrates the limitations of these algorithms when dealing with background clutter, low contrast, and smooth surfaces, and presents an adaptive feature point detection algorithm to overcome those limitations. Quantitative evaluations and subjective comparisons against reflection symmetry detection baselines on the image dataset released by the "Symmetry Detection from Real...
Transformer-based pre-trained models have emerged as the predominant solution for natural language processing (NLP). Fine-tuning such models for downstream tasks often requires a considerable amount of labeled private data. In practice, the data is distributed across heterogeneous mobile devices and may be prohibited from being uploaded. Moreover, well-curated labeled data is scarce, presenting an additional challenge. To address these challenges, we first introduce a data generator for federated few-shot learning tasks, which...
Recent advancements in integrating large language models (LLMs) with application programming interfaces (APIs) have gained significant interest in both academia and industry. These API-based agents, leveraging the strong autonomy and planning capabilities of LLMs, can efficiently solve problems requiring multi-step actions. However, their ability to handle multi-dimensional difficulty levels, diverse task types, and real-world demands through APIs remains unknown. In this paper, we introduce...
We are witnessing the emergence of ubiquitous learning, where each device (smartphones, wearables, IoT devices, etc.) can learn from its environment either alone or collaboratively. Such a new paradigm is enabled by deep learning techniques, more specifically, on-device training. Despite its popularity in the machine learning community, unfortunately, there is no systematic understanding of a critical question: how much does it cost to train typical deep learning models on commodity end devices? Therefore, this work performs...
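The cost question above is, at its core, a measurement problem: per-step wall-clock time, energy, and memory. The sketch below shows the simplest version of such a measurement; the model and batch are placeholders standing in for "typical models on commodity end devices", not the study's actual workloads.

```python
# Sketch of a per-iteration training cost measurement: wall-clock time for
# one forward/backward/update step. The tiny model and batch are
# placeholders, not the study's actual workloads.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 512), torch.randint(0, 10, (32,))

start = time.perf_counter()
loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
elapsed = time.perf_counter() - start
print(f"one training step: {elapsed * 1e3:.1f} ms")
```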
Large language models (LLMs) have achieved remarkable performance in various downstream tasks. However, training LLMs is computationally expensive and requires a large amount of memory. To address this issue, backpropagation-free (BP-free) training has been proposed as a promising approach to reduce the computational and memory costs of training LLMs. In this survey, we provide a comprehensive overview of BP-free training for LLMs. We first outline three mainstream BP-free methods. Subsequently, we introduce their optimizations for LLMs. The goal of this survey...
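One mainstream BP-free family is zeroth-order (ZO) optimization, which estimates a gradient from paired forward passes under a random perturbation instead of backpropagating. The sketch below shows a basic two-point ZO-SGD step on a toy quadratic; the loss, scales, and loop are illustrative assumptions, and which three methods the survey covers is not specified in the excerpt.

```python
# Sketch of zeroth-order (ZO) training, one mainstream BP-free approach:
# estimate the gradient from two forward passes under a random perturbation.
# The toy quadratic loss and hyperparameters are illustrative assumptions.
import torch

def zo_step(params, loss_fn, lr=1e-2, eps=1e-3):
    """Perturb parameters, estimate the gradient from two losses, update."""
    z = [torch.randn_like(p) for p in params]
    with torch.no_grad():
        for p, zi in zip(params, z): p += eps * zi
        loss_plus = loss_fn()                       # forward pass 1
        for p, zi in zip(params, z): p -= 2 * eps * zi
        loss_minus = loss_fn()                      # forward pass 2
        g = (loss_plus - loss_minus) / (2 * eps)    # projected gradient
        for p, zi in zip(params, z):
            p += eps * zi                           # restore parameters
            p -= lr * g * zi                        # ZO-SGD update
    return loss_plus

w = torch.randn(10)
target = torch.ones(10)
for _ in range(500):
    zo_step([w], lambda: ((w - target) ** 2).sum())
print(((w - target) ** 2).sum())  # loss shrinks with no backward pass
```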
Deploying large language model (LLM) inference on mobile devices is cost-efficient for companies and well addresses the privacy concerns of users. However, limited computation capacity and memory constraints hinder practical deployment. Prior work strives to expand model size for better accuracy, while there is a lack of systematic understanding of the "small" sub-10-billion-parameter LLMs that are already feasible on current commodity devices. To reveal the landscape of LLMs on mobile devices, we conducted a comprehensive...
In today's landscape, smartphones have evolved into hubs hosting a multitude of deep learning models intended for local execution. A key realization driving this work is the notable fragmentation among these models, characterized by varied architectures, operators, and implementations. This fragmentation imposes a significant burden on the comprehensive optimization of hardware, system settings, and algorithms. Buoyed by recent strides in large foundation models, this work introduces a pioneering paradigm for mobile AI: collaborative management...
There has been increasing attention on electroencephalograph (EEG) based personal identification over the last decade. Most existing methods address this problem by Euclidean-metric nearest neighbor (NN) search. However, under various recording conditions, a simple Euclidean distance cannot model the similarity relations between EEG signals precisely. To overcome this drawback, a local Large Margin Nearest Neighbor learning method (L-LMNN) for EEG-based identification is proposed in this paper. For each sample, a separate metric is learned, making intra-class samples...
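A learned metric of the kind LMNN produces is a Mahalanobis distance, d_M(x, y) = sqrt((x - y)^T M (x - y)) with M positive semi-definite, which replaces the plain Euclidean distance in NN search. The sketch below shows a single learned metric for illustration; in the local variant described, each sample would get its own M, and the values here are random stand-ins rather than a trained result.

```python
# Sketch of a learned (Mahalanobis) metric of the kind LMNN produces.
# A single metric is shown for illustration; the paper's local variant
# learns one per sample. L here is a random stand-in, not a trained result.
import numpy as np

def mahalanobis(x, y, M):
    d = x - y
    return np.sqrt(d @ M @ d)

rng = np.random.default_rng(0)
L = rng.normal(size=(16, 16))       # learned linear transform (stand-in)
M = L.T @ L                         # positive semi-definite by construction
x, y = rng.normal(size=16), rng.normal(size=16)
print(mahalanobis(x, y, M))         # learned distance used for NN search
print(np.linalg.norm(x - y))        # plain Euclidean baseline
```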
Electrochemical surface-enhanced Raman scattering (EC-SERS) spectroscopy is an ultrasensitive spectro-electrochemistry technique that provides mechanistic and dynamic information on electrochemical interfaces at the molecular level. However, plasmon-mediated photocatalysis hinders observation of the intrinsic behavior of molecules at interfaces. This work aimed to develop a facile method for constructing a reliable EC-SERS substrate that can be used to study interfacial dynamics. Herein, a novel Ag-WO3−x electrochromic heterostructure...