- Advanced Neural Network Applications
- Advanced Vision and Imaging
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Robotics and Sensor-Based Localization
- Advanced Image and Video Retrieval Techniques
- Anomaly Detection Techniques and Applications
- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- CCD and CMOS Imaging Sensors
- AI in cancer detection
- 3D Shape Modeling and Analysis
- Autonomous Vehicle Technology and Safety
- Visual Attention and Saliency Detection
- Remote Sensing and LiDAR Applications
- Video Analysis and Summarization
- Machine Learning and Data Classification
- Hand Gesture Recognition Systems
- Gait Recognition and Analysis
- Advanced Image Processing Techniques
- Image Enhancement Techniques
- Infrastructure Maintenance and Monitoring
- Medical Image Segmentation Techniques
- Machine Learning and ELM
- Optical measurement and interference techniques
University of Moratuwa
2015-2025
Stony Brook University
2021
The University of Sydney
2019
Western University
2004-2008
Capsule Network is a promising concept in deep learning, yet its true potential is not fully realized thus far, providing sub-par performance on several key benchmark datasets with complex data. Drawing intuition from the success achieved by Convolutional Neural Networks (CNNs) by going deeper, we introduce DeepCaps, a deep capsule network architecture which uses a novel 3D convolution based dynamic routing algorithm. With DeepCaps, we surpass the state-of-the-art results in the capsule network domain on CIFAR10, SVHN and Fashion MNIST,...
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object classification, segmentation and detection is often laborious owing to the irregular structure of point clouds. Self-supervised learning, which operates without any human labeling, is a promising approach to address this issue. We observe in the real world that humans are capable of mapping the visual concepts learnt from 2D images to understand the 3D world. Encouraged by this insight, we propose CrossPoint, a simple cross-modal contrastive...
Human activity recognition finds many applications in areas such as surveillance and sports. Such a system classifies a spatio-temporal feature descriptor of a human figure in a video, based on training examples. However, classifiers face the constraints of long training time and large feature vector size. Our method, due to the use of a Support Vector Machine (SVM) classifier on an existing feature descriptor, resolves these problems in activity recognition. Comparison of our method with existing classifiers using two standard datasets shows that our method is much superior in terms of computational time, and either it...
Many localized languages struggle to reap the benefits of recent advancements in character recognition systems due to the lack of a substantial amount of labeled training data. This is due to the difficulty in generating large amounts of labeled data for such languages and the inability of deep learning techniques to properly learn from a small number of training samples. We solve this problem by introducing a technique of generating new training samples from the existing samples, with realistic augmentations which reflect actual variations that are present in human handwriting, by adding random...
Simultaneous localization and map-building (SLAM) continues to draw considerable attention in the robotics community due to the advantages it can offer in building autonomous robots. It examines the ability of an autonomous robot starting in an unknown environment to incrementally build a map and simultaneously localize itself within this map. Recent advances in computer vision have contributed a whole class of solutions for the challenge of SLAM. This paper surveys contemporary progress in SLAM algorithms, especially those using computer vision as the main sensing...
In this paper, we introduce a novel road marking benchmark dataset for road marking detection, addressing the limitations in the existing publicly available datasets such as lack of challenging scenarios, prominence given to lane markings, unavailability of an evaluation script, annotation formats and lower resolutions. Our dataset consists of 2887 total images with 4706 road marking instances belonging to 11 classes. The images have a high resolution of 1920 × 1080 and capture a wide range of traffic, lighting and weather conditions. We provide annotations...
We present Seg-TTO, a novel framework for zero-shot, open-vocabulary semantic segmentation (OVSS), designed to excel in specialized domain tasks. While current open-vocabulary approaches show impressive performance on standard segmentation benchmarks under zero-shot settings, they fall short of supervised counterparts on highly domain-specific datasets. We focus on segmentation-specific test-time optimization to address this gap. Segmentation requires an understanding of multiple concepts within a single image while...
Splatting-based 3D reconstruction methods have gained popularity with the advent of 3D Gaussian Splatting, efficiently synthesizing high-quality novel views. These methods commonly resort to using exponential family functions, such as the Gaussian function, as reconstruction kernels due to their anisotropic nature, ease of projection, and differentiability in rasterization. However, the field remains restricted to variations within the exponential family, leaving generalized reconstruction kernels largely underexplored, partly due to the lack of easy integrability in 2D projections. In this light, we...
Recent work done on lane detection has been able to detect lanes accurately in complex scenarios, yet many fail to deliver real-time performance, specifically with limited computational resources. In this work, we propose SwiftLane: a simple and light-weight, end-to-end deep learning based framework, coupled with the row-wise classification formulation for fast and efficient lane detection. This framework is supplemented with a false positive suppression algorithm and a curve fitting technique to further increase the accuracy....
Robust feature tracking is a requirement for many computer vision tasks such as indoor robot navigation. However, indoor scenes are characterized by poorly localizable features. As a result, indoor feature tracking without artificial markers is challenging and remains an attractive problem. We propose to solve this problem by constraining the locations of a large number of nondistinctive features by several planar homographies which are strategically computed using distinctive features. We experimentally show the need for multiple illumination-invariant...
We investigate the problem of automatic action recognition and classification of videos. In this paper, we present a convolutional neural network architecture, which takes both motion and static information as inputs in a single stream. We show that the network is able to treat motion and static information as different feature maps and extract features off them, although stacked together. We trained and tested our network on the Youtube dataset. Our results surpass state-of-the-art hand-engineered methods. Furthermore, we also studied and compared the effect of providing these inputs to the network, in the task...
Recent work done on traffic sign and traffic light detection focuses on improving detection accuracy in complex scenarios, yet many fail to deliver real-time performance, specifically with limited computational resources. In this work, we propose a simple deep learning based end-to-end detection framework, which effectively tackles challenges inherent to traffic sign and traffic light detection such as small size, large number of classes and complex road scenarios. We optimize the detection models using TensorRT and integrate with Robot Operating System to deploy on an Nvidia Jetson AGX Xavier as our...
Activity recognition in videos in a deep-learning setting, or otherwise, uses both static and pre-computed motion components. The method of combining the two components, while keeping the burden on the deep network less, still remains uninvestigated. Moreover, it is not clear what the level of contribution of individual components is, and how to control the contribution. In this paper, we use a combination of convolutional-neural-network-generated static features and motion features in the form of motion tubes. We propose three schemas for combining static and motion components: based on variance...
A typical Automatic Number Plate Recognition (ANPR) system uses high resolution cameras to acquire good quality images of the vehicles passing through. In these images, license plates are localized, and characters are segmented and recognized to determine the identity of the vehicles. However, the steps in this workflow will fail to produce the expected results in low resolution images and in a less constrained environment. Thus, in this work, several improvements are made to the ANPR workflow by incorporating intelligent heuristics, image processing techniques and domain knowledge...
Automatic stroke recognition of badminton video footage plays an important role in the process of analyzing the players and building up statistics. Yet recognizing activities from broadcast videos is a challenging task due to person dependent body postures and blurring of fast moving body parts. We propose a robust and accurate approach for badminton stroke recognition using dense trajectories and trajectory aligned HOG features which are calculated inside local bounding boxes around players. A four-class SVM classifier is then used to classify badminton strokes...
Detection of straight lines in an image is a fundamental requirement for many applications in computer vision. We formulate the line detection task as an energy minimization problem. This formulation helps to detect lines in a global manner, in contrast to the local methods used in conventional algorithms. As a result, the proposed algorithm can handle virtually co-located lines, slightly curved lines and edge linking in a unified manner. In addition, due to its global nature, the algorithm is not deceived by noise giving rise to spurious line segments. Therefore, the algorithm can robustly detect...
A single-chip FPGA implementation of a vision core is an efficient way to design fast and compact embedded vision systems from the PCB level. The scope of the research is to design a novel FPGA-based parallel architecture for embedded vision entirely with on-chip FPGA resources. We designed it by utilizing block-RAMs and IO interfaces on the FPGA. As a result, the system is fast, compact, and flexible. We evaluated this architecture for several mid-level neighborhood algorithms using a Xilinx Virtex-2 Pro (XC2VP30) FPGA. Our algorithm uses a 100 MHz clock which supports image processing...
In elephant management and conservation, it is vital to have non-invasive methods to track elephants. Image based recognition is a non-invasive mechanism for tracking, albeit with the inefficiency of the manual method due to difficulties in handling a large amount of data from multiple sources. To mitigate the drawbacks of the manual method, we proposed a computer vision based, automated, elephant recognition mechanism, which mainly relies on appearance based recognition algorithms. We tested the feasibility of the system by running it on a web interface, which can facilitate researchers and conservationists all...
Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks over the years. However, this comes at the cost of heavy computation and memory intensive network designs, suggesting potential improvements in efficiency. Convolutional layers of CNNs partly account for such an inefficiency, as they are known to learn redundant features. In this work, we exploit this redundancy, observing it as the correlation between convolutional filters of a layer, and propose an alternative approach to reproduce...
Transformers combined with convolutional encoders have been recently used for hand gesture recognition (HGR) using micro-Doppler signatures. In this letter, we propose a vision-transformer-based architecture for HGR with multiantenna continuous-wave Doppler radar receivers. The proposed architecture consists of three modules: 1) a convolutional encoder–decoder, 2) an attention module with transformer layers, and 3) a multilayer perceptron. The novel convolutional decoder helps to feed patches with larger sizes to the attention module for improved feature extraction. Experimental...