- Human Pose and Action Recognition
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Generative Adversarial Networks and Image Synthesis
- Gait Recognition and Analysis
- Advanced Neural Network Applications
- Advanced Malware Detection Techniques
- Face recognition and analysis
- Anomaly Detection Techniques and Applications
- Spam and Phishing Detection
- ECG Monitoring and Analysis
- Phonocardiography and Auscultation Techniques
- Advanced Vision and Imaging
- EEG and Brain-Computer Interfaces
- Web Data Mining and Analysis
- AI in cancer detection
- Machine Learning and Data Classification
- Text and Document Classification Technologies
- Image Processing and 3D Reconstruction
- User Authentication and Security Systems
- Action Observation and Synchronization
- Machine Learning and ELM
- Handwritten Text Recognition Techniques
- Human Motion and Animation
University of California, Berkeley
2022-2023
Berkeley College
2022-2023
University of Moratuwa
2017-2021
Data61
2019-2020
Commonwealth Scientific and Industrial Research Organisation
2019-2020
Inception Institute of Artificial Intelligence
2019-2020
Capsule Network is a promising concept in deep learning, yet its true potential is not fully realized thus far, providing sub-par performance on several key benchmark datasets with complex data. Drawing intuition from the success achieved by Convolutional Neural Networks (CNNs) by going deeper, we introduce DeepCaps, a deep capsule network architecture which uses a novel 3D convolution based dynamic routing algorithm. With DeepCaps, we surpass the state-of-the-art results in the capsule network domain on CIFAR10, SVHN and Fashion MNIST,...
We present an approach to reconstruct humans and track them over time. At the core of our approach, we propose a fully "transformerized" version of a network for human mesh recovery. This network, HMR 2.0, advances the state of the art and shows the capability to analyze unusual poses that have in the past been difficult to reconstruct from single images. To analyze video, we use 3D reconstructions from HMR 2.0 as input to a tracking system that operates in 3D. This enables us to deal with multiple people and maintain identities through occlusion events. Our complete approach, 4DHumans,...
Humans can continuously learn new knowledge as their experience grows. In contrast, previous learning in deep neural networks can quickly fade out when they are trained on a new task. In this paper, we hypothesize this problem can be avoided by learning a set of generalized parameters that are neither specific to old nor new tasks. In this pursuit, we introduce a novel meta-learning approach that seeks to maintain an equilibrium between all the encountered tasks. This is ensured by a new meta-update rule which avoids catastrophic forgetting. In comparison to previous meta-learning techniques,...
In a real-world setting, object instances from new classes can be continuously encountered by object detectors. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A few efforts have been reported to address this limitation, all of which apply variants of knowledge distillation to avoid catastrophic forgetting. We note that although distillation helps to retain previous learning, it obstructs fast adaptability to new tasks, which is a critical requirement for incremental learning....
Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of a substantial amount of labeled training data. This is due to the difficulty in generating large amounts of labeled data for such languages and the inability of deep learning techniques to properly learn from a small number of training samples. We solve this problem by introducing a technique of generating new training samples from the existing samples, with realistic augmentations which reflect actual variations that are present in human handwriting, by adding random...
The real world contains an overwhelmingly large number of object classes, learning all of which at once is infeasible. Few-shot learning is a promising learning paradigm due to its ability to learn out of order distributions quickly with only a few samples. Recent works [7, 41] show that simply learning a good feature embedding can outperform more sophisticated meta-learning and metric learning algorithms for few-shot learning. In this paper, we propose a simple approach to improve the representation capacity of deep neural networks for few-shot learning tasks. We follow...
We present an approach for tracking people in monocular videos by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust manner. This lifting includes information about the 3D pose of the person, their location in 3D space, and their 3D appearance. As we track a person, we collect 3D observations over time in a tracklet representation. Given the 3D nature of our observations, we build temporal models for each one of the previous attributes. We use these models to predict the future state of the tracklet, including 3D appearance, 3D location, and 3D pose. For...
In this work we study the benefits of using tracking and 3D poses for action recognition. To achieve this, we take the Lagrangian view on analysing actions over a trajectory of human motion rather than at a fixed point in space. Taking this stand allows us to use the tracklets of people to predict their actions. In this spirit, we first show the benefits of using 3D pose to infer actions and study person-person interactions. Subsequently, we propose a Lagrangian Action Recognition model by fusing 3D pose and contextualized appearance over tracklets. To this end, our method achieves state-of-the-art...
Providing secure access to smart devices such as smartphones, wearables and various other IoT devices is becoming increasingly important, especially as these devices store a range of sensitive personal information. Breathing acoustics-based authentication offers a highly usable and possibly a secondary authentication mechanism for such access. Executing sophisticated machine learning pipelines for such authentication on such devices remains an open problem, given their resource limitations in terms of storage, memory and computational power. To investigate this challenge, we...
We empirically study autoregressive pre-training from videos. To perform our study, we construct a series of autoregressive video models, called Toto. We treat videos as sequences of visual tokens and train transformer models to autoregressively predict future tokens. Our models are pre-trained on a diverse dataset of videos and images comprising over 1 trillion visual tokens. We explore different architectural, training, and inference design choices. We evaluate the learned visual representations on a range of downstream tasks including image recognition, video classification,...
We introduce a simple framework for predicting the behavior of an agent in multi-agent settings. In contrast to autoregressive (AR) tasks, such as language processing, our focus is on scenarios with multiple agents whose interactions are shaped by physical constraints and internal motivations. To this end, we propose Poly-Autoregressive (PAR) modeling, which forecasts an ego agent's future behavior by reasoning about the ego agent's state history and the past and current states of other interacting agents. At its core, PAR represents...
This paper explores Masked Autoencoders (MAE) with Gaussian Splatting. While reconstructive self-supervised learning frameworks such as MAE learn good semantic abstractions, they are not trained for explicit spatial awareness. Our approach, named Gaussian Masked Autoencoder, or GMAE, aims to learn semantic abstractions and spatial understanding jointly. Like MAE, it reconstructs the image end-to-end in the pixel space, but beyond MAE, it also introduces an intermediate, 3D Gaussian-based representation and renders images via splatting. We show...
Counterfeit apps impersonate existing popular apps in attempts to misguide users. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation. In this paper, we propose a novel approach of combining content embeddings and style embeddings generated from pre-trained convolutional neural networks to detect counterfeit apps. We present an analysis of approximately 1.2 million apps from Google Play Store and identify a set of potential counterfeits for the top-10,000 apps. Under conservative...
In a conventional supervised learning setting, a machine learning model has access to examples of all object classes that are desired to be recognized during the inference stage. This results in a fixed model that lacks the flexibility to adapt to new learning tasks. In practical settings, learning tasks often arrive in a sequence and the models must continually learn to increment their previously acquired knowledge. Existing incremental learning approaches fall well below the state-of-the-art cumulative models that use all training classes at once. In this paper, we propose a random path selection...
Recent advances in the Internet of Things (IoT) are reforming the health care industry by providing higher communication efficiency, lower costs, and mobility. Among the many IoT applications, wireless body area networks (BANs) are a remarkable solution for caring for a rapidly growing aged population. Predictive transmit power control schemes improve BAN communications' reliability and energy efficiency through long-term optimal radio resources allocation that supports consistent pervasive healthcare services....
Activity recognition in videos, in a deep-learning setting or otherwise, uses both static and pre-computed motion components. The method of combining the two components, while keeping the burden on the deep network less, still remains uninvestigated. Moreover, it is not clear what the level of contribution of individual components is, and how to control the contribution. In this paper, we use a combination of convolutional-neural-network-generated static features and motion features in the form of motion tubes. We propose three schemas for combining static and motion components: based on variance...
We present a novel approach for tracking multiple people in video. Unlike past approaches which employ 2D representations, we focus on using 3D representations of people, located in three-dimensional space. To this end, we develop a method, Human Mesh and Appearance Recovery (HMAR), which in addition to extracting the 3D geometry of the person as a SMPL mesh, also extracts appearance as a texture map on the triangles of the mesh. This serves as a 3D representation for appearance that is robust to viewpoint and pose changes. Given a video clip, we first detect bounding boxes...
Counterfeit apps impersonate existing popular apps in attempts to misguide users to install them for various reasons such as collecting personal information, spreading malware, or simply to increase their advertisement revenue. Many counterfeits can be identified once installed; however, even a tech-savvy user may struggle to detect them before installation, as app icons and descriptions can be quite similar to the original app. To this end, this paper proposes to leverage the recent advances in deep learning methods to create image and text...