NFDI4DS | UHH-SEMS - Publication Details

Alexander G. Hauptmann

ORCID: 0000-0003-2123-0684

About

Contact & Profiles

A5103099928

Research Areas

Human Pose and Action Recognition
Multimodal Machine Learning Applications
Video Surveillance and Tracking Methods
Anomaly Detection Techniques and Applications
Video Analysis and Summarization
Advanced Image and Video Retrieval Techniques
Image Retrieval and Classification Techniques
Gait Recognition and Analysis
Hand Gesture Recognition Systems
Domain Adaptation and Few-Shot Learning
Music and Audio Processing
Natural Language Processing Techniques
Advanced Neural Network Applications
Speech and dialogue systems
Advanced Vision and Imaging
Topic Modeling
Face recognition and analysis
Fire Detection and Safety Systems
Face and Expression Recognition
Text and Document Classification Technologies
Autonomous Vehicle Technology and Safety
Human Motion and Animation
Speech and Audio Processing
Machine Learning and Algorithms
Generative Adversarial Networks and Image Synthesis

Carnegie Mellon University
2016-2025

Meta (Israel)
2021

Google (United States)
2020

Association for Computing Machinery
2019

MSIGHT Technologies (China)
2017

Microsoft Research Asia (China)
2012

Laboratoire d'Informatique de Paris-Nord
1988

Infrared Patch-Image Model for Small Target Detection in a Single Image

OPENALEX - Publications

Chenqiang Gao Deyu Meng Yi Yang Yongtao Wang Xiaofang Zhou and 1 more

The robust detection of small targets is one of the key techniques in infrared search and tracking applications. A novel small target detection method in a single infrared image is proposed in this paper. Initially, the traditional infrared image model is generalized to a new infrared patch-image model using local patch construction. Then, because of the non-local self-correlation property of the infrared background image, based on the new model small target detection is formulated as an optimization problem of recovering low-rank and sparse matrices, which is effectively solved using stable principal component pursuit. Finally, a simple...

10.1109/tip.2013.2281420 article EN IEEE Transactions on Image Processing 2013-09-11

Contrastive Adaptation Network for Unsupervised Domain Adaptation

Guoliang Kang Lu Jiang Yi Yang Alexander G. Hauptmann

Unsupervised Domain Adaptation (UDA) makes predictions for the target domain data while manual annotations are only available in the source domain. Previous methods minimize the domain discrepancy neglecting the class information, which may lead to misalignment and poor generalization performance. To address this issue, this paper proposes Contrastive Adaptation Network (CAN) optimizing a new metric which explicitly models the intra-class domain discrepancy and the inter-class domain discrepancy. We design an alternating update strategy for training CAN in an end-to-end manner....

10.1109/cvpr.2019.00503 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Evaluating bag-of-visual-words representations in scene classification

Jun Yang Yu‐Gang Jiang Alexander G. Hauptmann Chong‐Wah Ngo

Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words" and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to the classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these...

10.1145/1290082.1290111 article EN 2007-09-24

Cross-domain video concept detection using adaptive svms

Jun Yang Rong Yan Alexander G. Hauptmann

Many multimedia applications can benefit from techniques for adapting existing classifiers to data with different distributions. One example is cross-domain video concept detection, which aims to adapt concept classifiers across various video domains. In this paper, we explore two key problems for classifier adaptation: (1) how to transform existing classifier(s) into an effective classifier for a new dataset that only has a limited number of labeled examples, and (2) how to select the best existing classifier(s) for adaptation. For the first problem, we propose Adaptive Support Vector...

10.1145/1291233.1291276 article EN Proceedings of the 15th ACM International Conference on Multimedia 2007-09-29

Self-Paced Curriculum Learning

Lu Jiang Deyu Meng Qian Zhao Shiguang Shan Alexander G. Hauptmann

Curriculum learning (CL) or self-paced learning (SPL) represents a recently proposed learning regime inspired by the learning process of humans and animals that gradually proceeds from easy to more complex samples in training. The two methods share a similar conceptual learning paradigm, but differ in specific learning schemes. In CL, the curriculum is predetermined by prior knowledge, and remains fixed thereafter. Therefore, this type of method heavily relies on the quality of prior knowledge while ignoring feedback about the learner. In SPL, the curriculum is dynamically determined to adjust...

10.1609/aaai.v29i1.9608 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2015-02-21

A discriminative CNN video representation for event detection

Zhongwen Xu Yi Yang Alexander G. Hauptmann

In this paper, we propose a discriminative video representation for event detection over a large scale video dataset when only limited hardware resources are available. The focus of this paper is to effectively leverage deep Convolutional Neural Networks (CNNs) to advance event detection, where only frame level static descriptors can be extracted by the existing CNN toolkits. This paper makes two contributions to the inference of CNN video representation. First, while average pooling and max pooling have long been the standard approaches to aggregating...

10.1109/cvpr.2015.7298789 preprint EN 2015-06-01

DecideNet: Counting Varying Density Crowds Through Attention Guided Detection and Density Estimation

Jiang Liu Chenqiang Gao Deyu Meng Alexander G. Hauptmann

In real-world crowd counting applications, the crowd densities vary greatly in spatial and temporal domains. A detection based counting method will estimate crowds accurately in low density scenes, while its reliability in congested areas is downgraded. A regression based approach, on the other hand, captures the general density information in crowded regions. Without knowing the location of each person, it tends to overestimate the count in low density areas. Thus, exclusively using either one of them is not sufficient to handle all kinds of scenes with varying densities....

10.1109/cvpr.2018.00545 preprint EN 2018-06-01

DevNet: A Deep Event Network for multimedia event detection and evidence recounting

Chuang Gan Naiyan Wang Yi Yang Dit-Yan Yeung Alexander G. Hauptmann

In this paper, we focus on complex event detection in internet videos while also providing the key evidences of the detection results. Convolutional Neural Networks (CNNs) have achieved promising performance in image classification and action recognition tasks. However, it remains an open problem how to use CNNs for video event detection and recounting, mainly due to the complexity and diversity of video events. In this work, we propose a flexible deep CNN infrastructure, namely Deep Event Network (DevNet), that simultaneously detects pre-defined events...

10.1109/cvpr.2015.7298872 article EN 2015-06-01

An Adaptive Semisupervised Feature Analysis for Video Semantic Recognition

Minnan Luo Xiaojun Chang Liqiang Nie Yi Yang Alexander G. Hauptmann and 1 more

Video semantic recognition usually suffers from the curse of dimensionality and the absence of enough high-quality labeled instances, thus semisupervised feature selection gains increasing attention for its efficiency and comprehensibility. Most of the previous methods assume that videos with close distance (neighbors) have similar labels and characterize the intrinsic local structure through a predetermined graph of both labeled and unlabeled data. However, besides the parameter tuning problem underlying the construction of the graph, the affinity...

10.1109/tcyb.2017.2647904 article EN IEEE Transactions on Cybernetics 2017-02-20

MoSIFT: Recognizing Human Actions in Surveillance Videos

Mingyu Chen Alexander G. Hauptmann

The goal of this paper is to build robust human action recognition for real world surveillance videos. Local spatio-temporal features around interest points provide compact but descriptive representations for video analysis and motion recognition. Current approaches tend to extend spatial descriptions by adding a temporal component for the appearance descriptor, which only implicitly captures motion information. We propose an algorithm called MoSIFT, which detects interest points and encodes not only their local appearance but also explicitly models...

10.1184/r1/6607523.v1 article EN 2009-01-01

Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition

Zhenzhong Lan Ming Lin Xuanchong Li Alexander G. Hauptmann Bhiksha Raj

Most state-of-the-art action feature extractors involve differential operators, which act as highpass filters and tend to attenuate low frequency action information. This attenuation introduces bias to the resulting features and generates ill-conditioned feature matrices. The Gaussian Pyramid has been used as a feature enhancing technique that encodes scale-invariant characteristics into the feature space in an attempt to deal with this attenuation. However, at the core of the Gaussian Pyramid is a convolutional smoothing operation, which makes it incapable of generating...

10.1109/cvpr.2015.7298616 article EN 2015-06-01

Bi-Level Semantic Representation Analysis for Multimedia Event Detection

Xiaojun Chang Zhigang Ma Yi Yang Zhiqiang Zeng Alexander G. Hauptmann

Multimedia event detection has been one of the major endeavors in video event analysis. A variety of approaches have been proposed recently to tackle this problem. Among others, using semantic representation has been accredited for its promising performance and desirable ability for human-understandable reasoning. To generate semantic representation, we usually utilize several external image/video archives and apply the concept detectors trained on them to the event videos. Due to the intrinsic difference of these archives, the resulting representation presumably has different...

10.1109/tcyb.2016.2539546 article EN publisher-specific-oa IEEE Transactions on Cybernetics 2016-03-28

Adaptive Unsupervised Feature Selection With Structure Regularization

Minnan Luo Feiping Nie Xiaojun Chang Yi Yang Alexander G. Hauptmann and 1 more

Feature selection is one of the most important dimension reduction techniques for its efficiency and interpretation. Since practical data in large scale are usually collected without labels, and labeling these data is dramatically expensive and time-consuming, unsupervised feature selection has become a ubiquitous and challenging problem. Without label information, the fundamental problem of unsupervised feature selection lies in how to characterize the geometry structure of the original feature space and produce a faithful feature subset, which preserves the intrinsic structure accurately. In this paper,...

10.1109/tnnls.2017.2650978 article EN IEEE Transactions on Neural Networks and Learning Systems 2017-01-27

Uncovering the Temporal Context for Video Question Answering

Linchao Zhu Zhongwen Xu Yi Yang Alexander G. Hauptmann

10.1007/s11263-017-1033-7 article EN International Journal of Computer Vision 2017-07-13

Simultaneous Bearing Fault Recognition and Remaining Useful Life Prediction Using Joint-Loss Convolutional Neural Network

Ruonan Liu Boyuan Yang Alexander G. Hauptmann

Fault diagnosis and remaining useful life (RUL) prediction are always two major issues in modern industrial systems, which are usually regarded as two separated tasks to make the problem easier but ignore the fact that there is certain information of these two tasks that can be shared to improve the performance. Therefore, to capture common features between different relative problems, a joint-loss convolutional neural network (JL-CNN) architecture is proposed in this paper, which can implement bearing fault recognition and RUL prediction in parallel by sharing...

10.1109/tii.2019.2915536 article EN IEEE Transactions on Industrial Informatics 2019-05-08

Learning Spatial Awareness to Improve Crowd Counting

Zhi-Qi Cheng Jun-Xiu Li Qi Dai Xiao Wu Alexander G. Hauptmann

The aim of crowd counting is to estimate the number of people in images by leveraging the annotation of center positions for pedestrians' heads. Promising progress has been made with the prevalence of deep Convolutional Neural Networks. Existing methods widely employ the Euclidean distance (i.e., L2 loss) to optimize the model, which, however, has two main drawbacks: (1) the loss has difficulty in learning the spatial awareness...

10.1109/iccv.2019.00625 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Zhi-Qi Cheng Qi Dai Hong Li Jingkuan Song Xiao Wu and 1 more

Previous work generally believes that improving the spatial invariance of convolutional networks is the key to object counting. However, after verifying several mainstream counting networks, we surprisingly found that too strict pixel-level spatial invariance would cause overfit noise in the density map generation. In this paper, we try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map. The purpose of this is to allow the feature extraction process to potentially stimulate the density map generation process to overcome...

10.1109/cvpr52688.2022.01902 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

KAT: A Knowledge Augmented Transformer for Vision-and-Language

Liangke Gui Borui Wang Qiuyuan Huang Alexander G. Hauptmann Yonatan Bisk and 1 more

Liangke Gui, Borui Wang, Qiuyuan Huang, Alexander Hauptmann, Yonatan Bisk, Jianfeng Gao. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.70 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

MAGVIT: Masked Generative Video Transformer

Lijun Yu Yong Cheng Kihyuk Sohn José Lezama Han Zhang and 6 more

We introduce the MAsked Generative VIdeo Transformer, MAGVIT, to tackle various video synthesis tasks with a single model. We introduce a 3D tokenizer to quantize a video into spatial-temporal visual tokens and propose an embedding method for masked video token modeling to facilitate multi-task learning. We conduct extensive experiments to demonstrate the quality, efficiency, and flexibility of MAGVIT. Our experiments show that (i) MAGVIT performs favorably against state-of-the-art approaches and establishes the best-published FVD on three video generation...

10.1109/cvpr52729.2023.01008 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Multi-Feature Fusion via Hierarchical Regression for Multimedia Analysis

Yi Yang Jingkuan Song Zi Huang Zhigang Ma Nicu Sebe and 1 more

Multimedia data are usually represented by multiple features. In this paper, we propose a new algorithm, namely Multi-feature Learning via Hierarchical Regression for multimedia semantics understanding, where two issues are considered. First, labeling a large amount of training data is labor-intensive. It is meaningful to effectively leverage unlabeled data to facilitate multimedia semantics understanding. Second, given that multimedia data can be represented by multiple features, it is advantageous to develop an algorithm which combines evidence obtained from different features...

10.1109/tmm.2012.2234731 article EN IEEE Transactions on Multimedia 2013-03-13

Action recognition via local descriptors and holistic features

Xinghua Sun Mingyu Chen Alexander G. Hauptmann

In this paper we propose a unified action recognition framework fusing local descriptors and holistic features. The motivation is that the local descriptors and holistic features emphasize different aspects of actions and are suitable for different types of action databases. The proposed unified framework is based on frame differencing, bag-of-words and feature fusion. We extract two kinds of local descriptors, i.e. 2D and 3D SIFT feature descriptors, both based on 2D SIFT interest points. We apply Zernike moments to extract two kinds of holistic features, one based on single frames and the other based on the motion energy image. We perform action recognition experiments on the KTH and Weizmann databases, using...

10.1109/cvprw.2009.5204255 article EN IEEE Computer Society Conference on Computer Vision and Pattern Recognition workshops 2009-06-01

Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval

En Yu Jiande Sun Jing Li Xiaojun Chang Xian‐Hua Han and 1 more

In order to exploit the abundant potential information of the unlabeled data and contribute to analyzing the correlation among heterogeneous data, we propose a semi-supervised model named adaptive semi-supervised feature selection for cross-modal retrieval. First, we utilize semantic regression to strengthen the neighboring relationship between data with the same semantic. The correlation between heterogeneous data can then be optimized via keeping the pairwise closeness when learning the common latent space. Second, we adopt the graph-based constraint to predict accurate labels for unlabeled data, and it can also keep...

10.1109/tmm.2018.2877127 article EN IEEE Transactions on Multimedia 2018-10-22
