- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Human Pose and Action Recognition
- Image Retrieval and Classification Techniques
- Text and Document Classification Technologies
- Generative Adversarial Networks and Image Synthesis
- Gait Recognition and Analysis
- Face and Expression Recognition
- Robotics and Sensor-Based Localization
- Visual Attention and Saliency Detection
- Music Technology and Sound Studies
- Speech and Dialogue Systems
- Image Processing Techniques and Applications
- Machine Learning and Data Classification
- Advanced Image Processing Techniques
- Hand Gesture Recognition Systems
- Music and Audio Processing
- Remote-Sensing Image Classification
- Face Recognition and Analysis
The University of Adelaide
2016-2025
Australian Centre for Robotic Vision
2016-2025
Westlake University
2023
Dublin City University
2023
Microsoft Research Asia (China)
2023
Chongqing Three Gorges University
2022
Southwest Jiaotong University
2018-2020
Vision Australia
2019
Australian National University
2011-2014
University of Wollongong
2014
Deep autoencoders have been extensively used for anomaly detection. Trained on normal data, the autoencoder is expected to produce higher reconstruction error for abnormal inputs than for normal ones, which is adopted as a criterion for identifying anomalies. However, this assumption does not always hold in practice. It has been observed that sometimes the autoencoder "generalizes" so well that it can also reconstruct anomalies well, leading to missed detection of anomalies. To mitigate this drawback of the autoencoder-based anomaly detector, we propose to augment the autoencoder with a memory module and develop an...
Much recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we investigate whether this direct approach succeeds due to, or despite, the fact that it avoids the explicit representation of high-level information. We propose a method of incorporating high-level concepts into the successful CNN-RNN...
This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the need for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which do not reflect the true complexity of language...
In object recognition, soft-assignment coding enjoys computational efficiency and conceptual simplicity. However, its classification performance is inferior to the newly developed sparse or local coding schemes. It would be highly desirable if its classification performance could become comparable to the state-of-the-art, leading to a coding scheme which perfectly combines computational efficiency and classification performance. To achieve this, we revisit soft-assignment coding from two key aspects: classification performance and probabilistic interpretation. For the first aspect, we argue that the inferiority of soft-assignment coding is due to its neglect of the underlying manifold...
Removing pixel-wise heterogeneous motion blur is challenging due to the ill-posed nature of the problem. The predominant solution is to estimate the blur kernel by adding a prior, but the extensive literature on the subject indicates the difficulty in identifying a prior which is suitably informative, and general. Rather than imposing a prior based on theory, we propose instead to learn one from the data. Learning a prior over the latent image would require modeling all possible image content. The critical observation underpinning our approach, however, is that...
For human pose estimation in monocular images, joint occlusions and overlapping of human bodies often result in deviated pose predictions. Under these circumstances, biologically implausible pose predictions may be produced. In contrast, human vision is able to predict poses by exploiting geometric constraints of joint inter-connectivity. To address the problem by incorporating priors about the structure of human bodies, we propose a novel structure-aware convolutional network to implicitly take such priors into account during training of the deep...
The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs...
This paper tackles the problem of training a deep convolutional neural network with both low-precision weights and low-bitwidth activations. Optimizing a low-precision network is very challenging since the training process can easily get trapped in poor local minima, which results in substantial accuracy loss. To mitigate this problem, we propose three simple-yet-effective approaches to improve the network training. First, we propose to use a two-stage optimization strategy to progressively find good local minima. Specifically, we propose to first optimize a net with quantized weights and then...
Classifying a visual concept merely from its associated online textual source, such as a Wikipedia article, is an attractive research topic in zero-shot learning because it alleviates the burden of manually collecting semantic attributes. Recent work has pursued this approach by exploring various ways of connecting the visual and text domains. In this paper, we revisit this idea by going further to consider one important factor: the textual representation is usually too noisy for the zero-shot learning application. This observation motivates us to design...
In this paper, we propose to train convolutional neural networks (CNNs) with both binarized weights and activations, leading to quantized models specifically for mobile devices with limited power capacity and computation resources. By assuming the same architecture as full-precision networks, previous works on quantizing CNNs seek to preserve the floating-point information using a set of discrete values, which we call value approximation. However, we take a novel "structure approximation" view of quantization: it is...
Recent studies in deepfake detection have yielded promising results when the training and testing face forgeries are from the same dataset. However, the problem remains challenging when one tries to generalize the detector to forgeries created by unseen methods. This work addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. Following this principle, we propose to enrich the "diversity" of forgeries by synthesizing augmented forgeries with a pool of forgery configurations and strengthen the "sensitivity" to the forgeries by enforcing...
Recently, CLIP has been applied to pixel-level zero-shot learning tasks via a two-stage scheme. The general idea is to first generate class-agnostic region proposals and then feed the cropped proposal regions to CLIP to utilize its image-level zero-shot classification capability. While effective, such a scheme requires two image encoders, one for proposal generation and one for CLIP, leading to a complicated pipeline and high computational cost. In this work, we pursue a simpler-and-efficient one-stage solution that directly extends CLIP's...
In clinical scenarios, multi-specialist consultation could significantly benefit the diagnosis, especially for intricate cases. This inspires us to explore a "multi-expert joint diagnosis" mechanism to upgrade the existing "single expert" framework commonly seen in the current literature. To this end, we propose METransformer, a method to realize this idea with a transformer-based backbone. The key design of our method is the introduction of multiple learnable "expert" tokens into both the transformer encoder and decoder. In the encoder,...
A number of recent studies have shown that a Deep Convolutional Neural Network (DCNN) pretrained on a large dataset can be adopted as a universal image descriptor, and that doing so leads to impressive performance at a range of image classification tasks. Most of these studies, if not all, adopt activations of the fully-connected layer of a DCNN as the image or region representation, and it is believed that convolutional layer activations are less discriminative. This paper, however, advocates that if used appropriately, convolutional layer activations constitute a powerful image representation. This is achieved by...
Recognizing how objects interact with each other is a crucial task in visual recognition. If we define the context of the interaction to be the objects involved, then most current methods can be categorized as either: (i) training a single classifier on the combination of the interaction and its context; or (ii) aiming to recognize the interaction independently of its explicit context. Both methods suffer limitations: the former scales poorly with the number of combinations and fails to generalize to unseen combinations, while the latter often leads to poor interaction recognition performance due to the difficulty...
Humans are capable of learning a new fine-grained concept with very little supervision, e.g., a few exemplary images for a species of bird, yet our best deep learning systems need hundreds or thousands of labeled examples. In this paper, we try to reduce this gap by studying the fine-grained image recognition problem in a challenging few-shot learning setting, termed few-shot fine-grained recognition (FSFG). The task of FSFG requires the learning systems to build classifiers for novel fine-grained categories from few examples (only one or fewer than five). To solve this problem, we propose an end-to-end trainable deep network...
Encouraged by the success of convolutional neural networks (CNNs) in image classification, much effort has recently been spent on applying CNNs to video-based action recognition problems. One challenge is that a video contains a varying number of frames, which is incompatible with the standard input format of CNNs. Existing methods handle this issue either by directly sampling a fixed number of frames or by bypassing the issue through a 3D convolutional layer, which conducts convolution in the spatial-temporal domain. In this paper, we propose a novel network structure, which allows...
Over recent years, emerging interest has occurred in integrating computer vision technology into the retail industry. Automatic checkout (ACO) is one of the critical problems in this area, which aims to automatically generate the shopping list from images of the products to purchase. The main challenge of this problem comes from the large scale and the fine-grained nature of the product categories, as well as the difficulty of collecting training images that reflect realistic checkout scenarios due to the continuous update of the products. Despite its significant practical...
The progress of antitumor immunotherapy is usually limited by tumor-associated macrophages (TAMs) that account for the highest proportion of immunosuppressive cells in the tumor microenvironment, and the TAMs can also be reversed by modulating the M2-like phenotype. Herein, a biomimetic polymer magnetic nanocarrier is developed for selectively targeting and polarizing TAMs, potentiating immunotherapy of breast cancer. This PLGA-ION-R837 @ M (PIR @ M) is achieved, first, by the fabrication of magnetic polymer nanoparticles (NPs) encapsulating Fe3O4 NPs...
Clinical chemotherapy confronts a challenge resulting from cancer-related multidrug resistance (MDR), which can directly lead to treatment failure. To address it, an innovative approach is proposed to construct a light-activated reactive oxygen species (ROS)-responsive nanoplatform based on a protoporphyrin (PpIX)-conjugated and dual chemotherapeutics-loaded polymer micelle. This system combines chemotherapy and photodynamic therapy (PDT) to defeat the MDR of tumors. Such an intelligent nanocarrier can prolong the circulation...
Identifying regions of interest in an image has long been of great importance in a wide range of tasks, including place recognition. In this letter, we propose a novel attention mechanism with flexible context, which can be incorporated into an existing feedforward network architecture to learn image representations for long-term place recognition. In particular, in order to focus on regions that contribute positively to place recognition, we introduce a multiscale context-flexible network to estimate the importance of each spatial region in the feature map. Our model is trained end-to-end...
Weakly supervised anomaly detection aims at learning an anomaly detector from a limited amount of labeled data and abundant unlabeled data. Recent works build deep neural networks for anomaly detection by discriminatively mapping the normal samples and abnormal samples to different regions in the feature space or fitting different distributions. However, due to the limited number of annotated anomaly samples, directly training networks with the discriminative loss may not be sufficient. To overcome this issue, this article proposes a novel strategy to transform the input data into a more meaningful...