Anton van den Hengel

ORCID: 0000-0003-3027-8364
Research Areas
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Video Surveillance and Tracking Methods
  • Advanced Vision and Imaging
  • Human Pose and Action Recognition
  • Anomaly Detection Techniques and Applications
  • Advanced Neural Network Applications
  • Sparse and Compressive Sensing Techniques
  • Robotics and Sensor-Based Localization
  • Image and Object Detection Techniques
  • Advanced Image Processing Techniques
  • Topic Modeling
  • Image Processing Techniques and Applications
  • Face and Expression Recognition
  • Image Retrieval and Classification Techniques
  • Machine Learning and Algorithms
  • Optical measurement and interference techniques
  • 3D Surveying and Cultural Heritage
  • Machine Learning and Data Classification
  • Generative Adversarial Networks and Image Synthesis
  • Image and Signal Denoising Methods
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • Network Security and Intrusion Detection

Australian Centre for Robotic Vision
2016-2025

The University of Adelaide
2016-2025

Amazon (Germany)
2022-2024

Rochester Institute of Technology
2020

Vision Australia
2017

Australian Research Council
2015

Humans inevitably develop a sense of the relationships between objects, some of which are based on their appearance. Some pairs of objects might be seen as being alternatives to each other (such as two pairs of jeans), while others may be seen as being complementary (such as a pair of jeans and a matching shirt). This information guides many of the choices that people make, from buying clothes to their interactions with each other. We seek here to model this human sense of the relationships between objects based on their appearance. Our approach is not based on fine-grained modeling of user annotations but rather on capturing the largest dataset...

10.1145/2766462.2767755 article EN Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval 2015-08-04

Deep autoencoders have been extensively used for anomaly detection. Trained on normal data, the autoencoder is expected to produce a higher reconstruction error for abnormal inputs than for normal ones, which is adopted as a criterion for identifying anomalies. However, this assumption does not always hold in practice. It has been observed that sometimes the autoencoder "generalizes" so well that it can also reconstruct anomalies well, leading to missed detection of anomalies. To mitigate this drawback of autoencoder-based anomaly detectors, we propose to augment the autoencoder with a memory module and develop an...

10.1109/iccv.2019.00179 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
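
The reconstruction-error criterion described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the "reconstructor" here is a hypothetical stand-in that returns the nearest stored normal sample, not a trained deep autoencoder and not the memory module the paper proposes. The function names and the threshold value are likewise made up for the example.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def make_nearest_sample_reconstructor(
    normal_data: List[Vector],
) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for an autoencoder trained on normal data.

    It "reconstructs" an input as the closest normal training sample, so
    inputs far from the normal data are reconstructed poorly.
    """

    def reconstruct(x: Vector) -> Vector:
        return min(normal_data, key=lambda p: reconstruction_error(x, p))

    return reconstruct


def is_anomaly(
    x: Vector, reconstruct: Callable[[Vector], Vector], threshold: float
) -> bool:
    """Flag x as anomalous when its reconstruction error exceeds the threshold."""
    return reconstruction_error(x, reconstruct(x)) > threshold


# Toy usage: three "normal" points and an assumed threshold of 0.05.
normal = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
reconstruct = make_nearest_sample_reconstructor(normal)
near_normal_flag = is_anomaly((0.1, 0.9), reconstruct, threshold=0.05)
far_away_flag = is_anomaly((5.0, 5.0), reconstruct, threshold=0.05)
```

The failure mode the abstract mentions corresponds to a reconstructor that also reproduces abnormal inputs accurately, in which case the error stays below the threshold and the anomaly is missed.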

A robot that can carry out a natural-language instruction has been a dream since before the Jetsons cartoon series imagined a life of leisure mediated by a fleet of attentive robot helpers. It is a dream that remains stubbornly distant. However, recent advances in vision and language methods have made incredible progress in closely related areas. This is significant because a robot interpreting a natural-language navigation instruction on the basis of what it sees is carrying out a vision and language process that is similar to Visual Question Answering. Both tasks can be interpreted as visually grounded...

10.1109/cvpr.2018.00387 article EN 2018-06-01

Recent advances in semantic image segmentation have mostly been achieved by training deep convolutional neural networks (CNNs). We show how to improve semantic segmentation through the use of contextual information; specifically, we explore 'patch-patch' context between image regions, and 'patch-background' context. For learning from the patch-patch context, we formulate Conditional Random Fields (CRFs) with CNN-based pairwise potential functions to capture semantic correlations between neighboring patches. Efficient piecewise training of the proposed...

10.1109/cvpr.2016.348 article EN 2016-06-01

Visual object tracking is a significant computer vision task which can be applied to many domains, such as visual surveillance, human computer interaction, and video compression. Despite extensive research on this topic, it still suffers from difficulties in handling complex object appearance changes caused by factors such as illumination variation, partial occlusion, shape deformation, and camera motion. Therefore, effective modeling of the 2D appearance of tracked objects is a key issue for the success of a visual tracker. In the literature, researchers...

10.1145/2508037.2508039 article EN ACM Transactions on Intelligent Systems and Technology 2013-09-01

Predicting the depth (or surface normal) of a scene from single monocular color images is a challenging task. This paper tackles this challenging and essentially underdetermined problem by regression on deep convolutional neural network (DCNN) features, combined with a post-processing refining step using conditional random fields (CRF). Our framework works at two levels, the super-pixel level and the pixel level. First, we design a DCNN model to learn the mapping from multi-scale image patches to depth or surface normal values at the super-pixel level. Second, the estimated...

10.1109/cvpr.2015.7298715 article EN 2015-06-01

Much recent progress in Vision-to-Language (V2L) problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we investigate whether this direct approach succeeds due to, or despite, the fact that it avoids the explicit representation of high-level information. We propose a method of incorporating high-level concepts into the successful CNN-RNN...

10.1109/cvpr.2016.29 article EN 2016-06-01

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is to require joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which do not reflect the true complexity of language...

10.1109/cvpr.2017.344 article EN 2017-07-01

Supervised hashing aims to map the original features to compact binary codes that are able to preserve label-based similarity in the Hamming space. Non-linear hash functions have demonstrated their advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing, which achieve encouraging retrieval performance at the price of slow evaluation and training time. Here we propose to use boosted decision trees for achieving non-linearity in hashing, which are fast to train and evaluate, hence...

10.1109/cvpr.2014.253 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01
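
To make the idea of retrieval in Hamming space from the abstract above concrete, here is a minimal sketch of ranking stored items by the Hamming distance between compact binary codes. It assumes the codes have already been produced by some hash function. How those codes are learned (for example with boosted decision trees, as the paper proposes) is not shown, and the item names and code values are invented for illustration.

```python
from typing import Dict, List


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query: int, database: Dict[str, int]) -> List[str]:
    """Return item names sorted by Hamming distance to the query code.

    Ties are broken by item name so the ordering is deterministic.
    """
    return sorted(
        database,
        key=lambda name: (hamming_distance(query, database[name]), name),
    )


# Toy usage with hypothetical 4-bit codes.
codes = {"a": 0b1011, "b": 0b0011, "c": 0b0100}
ranking = rank_by_hamming(0b1010, codes)
```

Because the comparison reduces to an XOR and a bit count, lookups of this kind stay cheap even for large databases, which is one reason compact binary codes are attractive for retrieval.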

Visual Question Answering (VQA) has attracted much attention in both computer vision and natural language processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of such questions that require no external information to answer is interesting, but very limited. It excludes questions which require common sense, or basic...

10.1109/tpami.2017.2754246 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2017-09-19

Removing pixel-wise heterogeneous motion blur is challenging due to the ill-posed nature of the problem. The predominant solution is to estimate the blur kernel by adding a prior, but the extensive literature on the subject indicates the difficulty in identifying a prior which is suitably informative, and general. Rather than imposing a prior based on theory, we propose instead to learn one from the data. Learning a prior over the latent image would require modeling all possible image content. The critical observation underpinning our approach, however, is that...

10.1109/cvpr.2017.405 preprint EN 2017-07-01

Deep Learning has had a transformative impact on Computer Vision, but for all of the success there is also a significant cost. This is that the models and procedures used are so complex and intertwined that it is often impossible to distinguish the impact of the individual design and engineering choices each model embodies. This ambiguity diverts progress in the field, and leads to a situation where developing a state-of-the-art model is as much an art as a science. As a step towards addressing this problem we present a massive exploration of the effects of the myriad architectural...

10.1109/cvpr.2018.00444 article EN 2018-06-01

We propose an effective structured learning based approach to the problem of person re-identification which outperforms the current state-of-the-art on most benchmark data sets evaluated. Our framework is built on the basis of multiple low-level hand-crafted and high-level visual features. We then formulate two optimization algorithms, which directly optimize evaluation measures commonly used in person re-identification, also known as the Cumulative Matching Characteristic (CMC) curve. Our new approach is practical to many real-world...

10.1109/cvpr.2015.7298794 article EN 2015-06-01
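
The Cumulative Matching Characteristic (CMC) curve named in the abstract above is a standard re-identification measure: for each rank k, it reports the fraction of queries whose correct gallery match appears within the top k results. The sketch below computes it from per-query ranks. It illustrates the evaluation measure only, not the paper's optimization algorithms, and the function names and toy distances are assumptions made for the example.

```python
from typing import Dict, List


def rank_of_match(distances: Dict[str, float], true_id: str) -> int:
    """1-based rank of the correct gallery identity, smallest distance first."""
    ordered = sorted(distances, key=lambda gallery_id: distances[gallery_id])
    return ordered.index(true_id) + 1


def cmc_curve(ranks: List[int], max_rank: int) -> List[float]:
    """CMC values for k = 1..max_rank.

    ranks[i] is the 1-based rank at which query i's correct match was found.
    Entry k-1 of the result is the fraction of queries with rank <= k.
    """
    if not ranks:
        raise ValueError("at least one query rank is required")
    total = len(ranks)
    return [
        sum(1 for r in ranks if r <= k) / total
        for k in range(1, max_rank + 1)
    ]


# Toy usage: one query ranked against a hypothetical three-person gallery,
# then a CMC curve over four queries with known ranks.
example_rank = rank_of_match({"p1": 0.9, "p2": 0.2, "p3": 0.5}, true_id="p3")
curve = cmc_curve([1, 3, 2, 1], max_rank=3)
```

The first entry of the curve is the rank-1 matching rate, which is the figure most often quoted when re-identification methods are compared.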

Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We...

10.1109/tpami.2017.2708709 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2017-05-26

We propose a method for visual question answering which combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. This allows more complex questions to be answered using the predominant neural network-based approach than has previously been possible. It particularly allows questions to be asked about the contents of an image, even when the image itself does not contain the whole answer. The method constructs a textual representation of the semantic content of an image, and merges it with textual information sourced from a knowledge base,...

10.1109/cvpr.2016.500 preprint EN 2016-06-01

Although deep learning has been applied to successfully address many data mining problems, relatively limited work has been done on deep learning for anomaly detection. Existing deep anomaly detection methods, which focus on learning new feature representations to enable downstream anomaly detection methods, perform indirect optimization of anomaly scores, leading to data-inefficient learning and suboptimal anomaly scoring. Also, they are typically designed as unsupervised learning due to the lack of large-scale labeled anomaly data. As a result, it is difficult for them to leverage prior knowledge (e.g., a few labeled anomalies) when such...

10.1145/3292500.3330871 article EN 2019-07-25

Compressive Sensing has become one of the standard methods of face recognition within the literature. We show, however, that the sparsity assumption which underpins much of this work is not supported by the data. This lack of sparsity in the data means that the compressive sensing approach cannot be guaranteed to recover the exact signal, and therefore that sparse approximations may not deliver the robustness or performance desired. In this vein we show that a simple ℓ...

10.1109/cvpr.2011.5995556 article EN 2011-06-01

The task in referring expression comprehension is to localize the object instance in an image described by a referring expression phrased in natural language. As a language-to-vision matching task, the key to this problem is to learn a discriminative object feature that can adapt to the expression used. To avoid ambiguity, the expression normally tends to describe not only the properties of the referent itself, but also its relationships to its neighbourhood. To capture and exploit this important information we propose a graph-based, language-guided attention mechanism. Being composed of node attention component...

10.1109/cvpr.2019.00206 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

The trend towards increasingly deep neural networks has been driven by a general observation that increasing depth increases the performance of a network. Recently, however, evidence has been amassing that simply increasing depth may not be the best way to increase performance, particularly given other limitations. Investigations into deep residual networks have also suggested that they may not in fact be operating as a single deep network, but rather as an ensemble of many relatively shallow networks. We examine these issues, and in doing so arrive at a new interpretation...

10.48550/arxiv.1611.10080 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Ghosting artifacts caused by moving objects or misalignments are a key challenge in high dynamic range (HDR) imaging for dynamic scenes. Previous methods first register the input low dynamic range (LDR) images using optical flow before merging them, which is error-prone and causes ghosts in the results. A very recent work tries to bypass optical flow via a deep network with skip-connections, but it still suffers from ghosting artifacts under severe movement. To avoid the ghosting from the source, we propose a novel attention-guided end-to-end deep neural network (AHDRNet) to produce...

10.1109/cvpr.2019.00185 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Large amounts of available training data and increasing computing power have led to the recent success of deep convolutional neural networks (CNN) on a large number of applications. In this paper, we propose an effective semantic pixel labelling using CNN features, hand-crafted features and Conditional Random Fields (CRFs). Both CNN and hand-crafted features are applied to dense image patches to produce per-pixel class probabilities. The CRF infers a labelling that smooths regions while respecting the edges present in the imagery. The method is applied to the ISPRS 2D...

10.1109/cvprw.2015.7301381 article EN 2015-06-01