- Human Pose and Action Recognition
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Adversarial Robustness in Machine Learning
- Hand Gesture Recognition Systems
- Advanced Neural Network Applications
- Terrorism, Counterterrorism, and Political Violence
- Advanced Vision and Imaging
- Visual Attention and Saliency Detection
- Gait Recognition and Analysis
- Human-Automation Interaction and Safety
- Diabetic Foot Ulcer Assessment and Management
- Remote-Sensing Image Classification
- 3D Surveying and Cultural Heritage
- Higher Education and Teaching Methods
- Direction-of-Arrival Estimation Techniques
- Face Recognition and Analysis
- 3D Shape Modeling and Analysis
- Laser and Thermal Forming Techniques
- Hearing Impairment and Communication
- Speech and Audio Processing
- Radiomics and Machine Learning in Medical Imaging
Australian Regenerative Medicine Institute
2025
Monash University
2025
Apple (United States)
2024
Institute of Software
2023
Chinese Academy of Sciences
2023
ATUM (United States)
2022
Menlo School
2022
Doshisha University
2022
Dalian University of Technology
2022
National Administration of Surveying, Mapping and Geoinformation of China
2022
Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of a sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR), i.e., isolated SLR that recognizes word by word and continuous SLR that translates entire sentences. Existing continuous SLR methods typically utilize isolated SLRs as building blocks, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence...
Weakly-supervised temporal action localization (WS-TAL) is a promising but challenging task with only video-level categorical labels available during training. Without requiring boundary annotations in the training data, WS-TAL could possibly exploit automatically retrieved video tags as labels. However, such coarse supervision inevitably incurs confusion, especially in untrimmed videos containing multiple instances. To address this challenge, we propose the Contrast-based Localization EvaluAtioN...
Vision-based mobile robot navigation is a vibrant area of research with numerous algorithms having been developed, the vast majority of which either belong to scene-oriented simultaneous localization and mapping (SLAM) or fall into the category of robot-oriented lane-detection/trajectory tracking. These methods suffer from high computational cost and require stringent labelling and calibration efforts. To address these challenges, this paper proposes a lightweight framework based purely on uncalibrated...
The objective of Weakly-supervised Temporal Action Localization (WS-TAL) is to localize all action instances in an untrimmed video with only video-level supervision. Due to the lack of frame-level annotations during training, current WS-TAL methods rely on attention mechanisms to localize the foreground snippets or frames that contribute to the classification task. This strategy frequently confuses context with the actual action in the localization result. Separating action and context is a core problem for precise WS-TAL, but it is very challenging and has been...
Fast implementations of the SParse Iterative Covariance-based Estimation (SPICE) algorithm are presented for source localization in passive sonar applications. SPICE is a robust, user parameter-free, high-resolution, iterative, and globally convergent estimation algorithm for array processing. SPICE offers superior resolution and lower sidelobe levels, at the cost of higher computational complexity, compared to the conventional delay-and-sum beamforming method. It is shown in this paper that the computational complexity can be reduced by exploiting Toeplitz...
This paper presents a series of user parameter-free iterative Sparse Asymptotic Minimum Variance (SAMV) approaches for array processing applications based on the asymptotically minimum variance (AMV) criterion. With the assumption of abundant snapshots in the direction-of-arrival (DOA) estimation problem, the signal powers and noise variance are jointly estimated by the proposed AMV approach, which is later proved to coincide with the Maximum Likelihood (ML) estimator. We then propose power-based SAMV approaches, which are robust...
In recent years, convolutional neural network (CNN)-based methods have been widely applied to hyperspectral image (HSI) classification, mostly by mining the spectral variabilities. However, the spatial consistency in HSI is rarely discussed except as an extra channel. Very recently, the development of pixel pair features (PPF) for HSI classification has offered a new way of incorporating spatial information. In this paper, we first propose an improved PPF-style feature, the spatial pixel pair feature (SPPF), that better exploits both spatial/contextual...
For unmanned ground vehicle (UGV) off-line testing and performance evaluation, a massive amount of traffic scenario data is often required. The annotations in current sensory datasets typically include I) types of roadways, II) types of scene, and III) specific characteristics that are generally considered challenging for cognitive algorithms. While such annotations are helpful for manual selection of data, they are insufficient for comprehensive and quantitative measurement of per-roadway-segment complexity. To resolve these limitations, we propose a...
Research in human action recognition has accelerated significantly since the introduction of powerful machine learning tools such as Convolutional Neural Networks (CNNs). However, effective and efficient methods for the incorporation of temporal information into CNNs are still being actively explored in the recent literature. Motivated by the popular recurrent attention models in the research area of natural language processing, we propose the Attention-aware Temporal Weighted CNN (ATW CNN) for action recognition in videos, which embeds a visual...
Online gaming is consistently changing with the use of new technologies and is seen as making an impact on consumers’ sustainable lifestyles. Avatars have influenced players with low avatar identification to engage in physical learning activities through the massively multiplayer online (MMO) game genre. The fundamental purpose of the study is to classify the association with the consumer’s behavioural intention to exercise and to consume healthy food. This study incorporates three theories: social cognitive theory (SCT), determination...
Patient-specific quality assurance (PSQA) of volumetric modulated arc therapy (VMAT) to assure accurate treatment delivery is resource-intensive and time-consuming. Recently, machine learning has been increasingly investigated for predicting PSQA results. However, the classification performance of models at different criteria needs further improvement and clinical validation (CV), especially for predicting plans with low gamma passing rates (GPRs). In this study, we developed and validated a...
For visual-semantic embedding, existing methods normally treat the relevance between queries and candidates in a bipolar way: relevant or irrelevant. All “irrelevant” candidates are uniformly pushed away from the query by an equal margin in the embedding space, regardless of their varying proximity to the query. This practice disregards relatively discriminative information and could lead to suboptimal ranking in the retrieval results and a poorer user experience, especially in the long-tail scenario where a matching candidate may not...
The lack of automatic tools to identify giant pandas makes it hard to keep track of and manage pandas in wildlife conservation missions. In this paper, we introduce a new Giant Panda Identification (GPID) task, which aims to identify each individual panda based on an image. Though related to the human re-identification and animal classification problems, GPID is extraordinarily challenging due to subtle visual differences between pandas and cluttered global information. We propose a new benchmark dataset, iPanda-50, for GPID. iPanda-50 consists of 6,874 images...
Deep Neural Network classifiers are vulnerable to adversarial attacks, where an imperceptible perturbation could result in misclassification. However, the vulnerability of DNN-based image ranking systems remains under-explored. In this paper, we propose two attacks against deep ranking systems, i.e., Candidate Attack and Query Attack, that can raise or lower the rank of chosen candidates by adversarial perturbations....
Fast implementations of the sparse iterative covariance-based estimation (SPICE) algorithm are presented for source localization with a uniform linear array (ULA). SPICE is a robust, user parameter-free, high-resolution, iterative, and globally convergent estimation algorithm for array processing. SPICE offers superior resolution and lower sidelobe levels compared to the conventional delay-and-sum beamforming method; however, a traditional SPICE implementation has a higher computational complexity (which is exacerbated in higher-dimensional data). It is shown...
A practical yet under-explored problem often encountered by multimedia researchers is the recognition of imperfect testing data, where multiple sensing channels are deployed but interference or transmission distortion corrupts some of them. Typical cases of imperfect testing data include missing features and feature misalignments. To address these challenges, we choose the latent space model and introduce a new similarity learning canonical-correlation analysis (SLCCA) method to capture the semantic consensus between views....
Weakly-supervised Temporal Action Localization (W-TAL) aims at simultaneously classifying and locating all action instances with only video-level supervision. However, current W-TAL methods have two limitations. First, they ignore the difference in video representations between an action instance and its surrounding background when generating and scoring proposals. Second, the unique characteristics of RGB frames and optical flow are largely ignored when fusing these two modalities. To address these problems, Coherence Network...