- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Topic Modeling
- Face and Expression Recognition
- Data Stream Mining Techniques
- Advanced Bandit Algorithms Research
- Machine Learning and Algorithms
- Machine Learning and Data Classification
- Advanced Neural Network Applications
- Natural Language Processing Techniques
- Machine Learning and ELM
- Speech and dialogue systems
- Text and Document Classification Technologies
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Face recognition and analysis
- Video Analysis and Summarization
- Spam and Phishing Detection
- Anomaly Detection Techniques and Applications
- Sparse and Compressive Sensing Techniques
- Software Engineering Research
- Recommender Systems and Techniques
- Complex Network Analysis Techniques
Singapore Management University
2015-2024
Salesforce (United States)
2019-2023
A*STAR Graduate Academy
2023
Institute for Infocomm Research
2023
Nanyang Technological University
2009-2022
Singapore University of Technology and Design
2022
National University of Singapore
2011-2020
Agency for Science, Technology and Research
2020
Hong Kong University of Science and Technology
2020
University of Hong Kong
2020
Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other...
Learning effective feature representations and similarity measures are crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts for decades, it remains one of the most challenging open problems that considerably hinders the successes of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" issue that exists between low-level image pixels captured by machines and high-level semantic concepts perceived by humans. Among various...
Pre-trained models for Natural Languages (NL) like BERT and GPT have recently been shown to transfer well to Programming Languages (PL) and largely benefit a broad set of code-related tasks. Despite their success, most current methods either rely on an encoder-only (or decoder-only) pre-training that is suboptimal for generation (resp. understanding) tasks, or process the code snippet in the same way as NL, neglecting the special characteristics of PL such as token types. We present CodeT5, a unified pre-trained encoder-decoder...
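As a quick illustration of how such an encoder-decoder code model can be used, the sketch below loads a CodeT5 checkpoint with Hugging Face Transformers and runs span infilling on a code snippet. The checkpoint name and the sentinel-token behaviour are assumptions based on the public release, not part of the abstract above.

```python
# Minimal sketch (assumed checkpoint name "Salesforce/codet5-base"):
# load a CodeT5 encoder-decoder and fill in a masked span of code.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# Mask part of the code with a sentinel token and let the model generate the span.
code = "def greet(user): print(f'hello <extra_id_0>!')"
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```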
Large-scale vision and language representation learning has shown promising improvements on various vision-language tasks. Most existing methods employ a transformer-based multimodal encoder to jointly model visual tokens (region-based image features) and word tokens. Because the visual tokens and word tokens are unaligned, it is challenging for the multimodal encoder to learn image-text interactions. In this paper, we introduce a contrastive loss to ALign the image and text representations BEfore Fusing (ALBEF) them through cross-modal attention, which enables more...
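To make the "align before fuse" idea concrete, the following is a rough sketch of an image-text contrastive objective of the kind used to align unimodal representations before cross-modal fusion. The projection shapes and temperature are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative image-text contrastive (ITC) loss: pull matched image/text pairs
# together and push apart mismatched pairs within a batch (InfoNCE in both directions).
import torch
import torch.nn.functional as F

def itc_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: (batch, dim) projected features from the unimodal encoders
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature       # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)           # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)       # text -> image direction
    return (loss_i2t + loss_t2i) / 2

# toy usage with random features
print(itc_loss(torch.randn(8, 256), torch.randn(8, 256)).item())
```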
The cost of vision-and-language pre-training has become increasingly prohibitive due to end-to-end training of large-scale models. This paper proposes BLIP-2, a generic and efficient pre-training strategy that bootstraps vision-language pre-training from off-the-shelf frozen pre-trained image encoders and frozen large language models. BLIP-2 bridges the modality gap with a lightweight Querying Transformer, which is pre-trained in two stages. The first stage bootstraps vision-language representation learning from a frozen image encoder. The second stage bootstraps vision-to-language generative learning from a frozen language model. BLIP-2 achieves...
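The sketch below is a heavily simplified illustration of the "learned queries + cross-attention" idea behind a Querying Transformer: a small, fixed set of query vectors attends to frozen image features and yields a handful of tokens that can be projected into a language model's embedding space. Layer counts, dimensions, and the single-block design are assumptions for illustration only.

```python
# Toy Q-Former-style module: learned queries cross-attend to frozen image features.
import torch
import torch.nn as nn

class TinyQFormer(nn.Module):
    def __init__(self, num_queries=32, dim=768, llm_dim=2048):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
        self.to_llm = nn.Linear(dim, llm_dim)   # projects query outputs toward the LLM's input space

    def forward(self, frozen_image_feats):       # (batch, num_patches, dim), encoder kept frozen
        q = self.queries.expand(frozen_image_feats.size(0), -1, -1)
        q, _ = self.cross_attn(q, frozen_image_feats, frozen_image_feats)
        q = q + self.ffn(q)
        return self.to_llm(q)                    # (batch, num_queries, llm_dim)

feats = torch.randn(2, 257, 768)                 # e.g. ViT patch features
print(TinyQFormer()(feats).shape)
```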
Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks. However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively...
This paper presents a new method for detecting salient objects in images using convolutional neural networks (CNNs). The proposed network, named PAGE-Net, offers two key contributions. The first is the exploitation of an essential pyramid attention structure for salient object detection. It enables the network to concentrate more on salient regions while considering multi-scale saliency information. Such a stacked attention design provides a powerful tool to efficiently improve the representation ability of the corresponding network layer with an enlarged...
With the popularity of smartphones and mobile devices, mobile application (a.k.a. "app") markets have been growing exponentially in terms of the number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective review analytics tools. To facilitate app developers in discovering the most "informative" user reviews from a large and rapidly increasing pool of reviews, we present "AR-Miner", a novel computational framework for App Review Mining, which...
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes semantic structures of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in...
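A rough sketch of the prototype idea follows: cluster the current embeddings to obtain prototypes, then contrast each sample against its own cluster's prototype versus the others. The k-means backend, number of prototypes, and temperature are assumptions; the published method uses an EM-style formulation around this intuition.

```python
# Sketch: prototypes from k-means over embeddings, plus a prototype-level contrastive loss.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def proto_contrastive_loss(embeddings, num_prototypes=10, temperature=0.1):
    emb = F.normalize(embeddings, dim=-1)
    km = KMeans(n_clusters=num_prototypes, n_init=10).fit(emb.detach().cpu().numpy())
    prototypes = F.normalize(torch.tensor(km.cluster_centers_, dtype=emb.dtype), dim=-1)
    assignments = torch.tensor(km.labels_, dtype=torch.long)
    logits = emb @ prototypes.t() / temperature   # similarity of each sample to every prototype
    return F.cross_entropy(logits, assignments)   # pull each sample toward its own prototype

print(proto_contrastive_loss(torch.randn(64, 32)).item())
```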
Deep neural networks are known to be annotation-hungry. Numerous efforts have been devoted to reducing the annotation cost when learning with deep networks. Two prominent directions include learning with noisy labels and semi-supervised learning by exploiting unlabeled data. In this work, we propose DivideMix, a novel framework for learning with noisy labels by leveraging semi-supervised learning techniques. In particular, DivideMix models the per-sample loss distribution with a mixture model to dynamically divide the training data into a labeled set with clean samples and an unlabeled set with noisy samples, and trains the model on both the labeled and unlabeled data in...
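The clean/noisy split step described above can be illustrated with a small sketch: fit a two-component Gaussian mixture to per-sample training losses and treat samples likely to belong to the low-loss component as "clean". The threshold value and the use of scikit-learn's GaussianMixture are assumptions for illustration.

```python
# Sketch: divide training samples into clean/noisy sets via a 2-component GMM on losses.
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_losses, threshold=0.5):
    losses = np.asarray(per_sample_losses).reshape(-1, 1)
    losses = (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)  # normalize to [0, 1]
    gmm = GaussianMixture(n_components=2, max_iter=100, reg_covar=1e-3).fit(losses)
    clean_component = gmm.means_.argmin()                 # component with the smaller mean loss
    prob_clean = gmm.predict_proba(losses)[:, clean_component]
    return prob_clean > threshold, prob_clean             # boolean "clean" mask and probabilities

toy_losses = np.concatenate([np.random.rand(90) * 0.3, 0.7 + np.random.rand(10) * 0.3])
mask, probs = split_clean_noisy(toy_losses)
print(mask.sum(), "samples flagged as clean")
```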
The goal of active learning is to select the most informative examples for manual labeling. Most previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient since the classification model has to be retrained for every labeled example. In this paper, we present a framework for "batch mode active learning" that applies the Fisher information matrix to select a number of informative examples simultaneously. The key computational challenge is how to efficiently identify the subset of unlabeled examples that can result in the largest reduction...
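As a simplified illustration of Fisher-information-based batch selection (not the paper's exact optimization), the sketch below greedily picks unlabeled examples whose inclusion most reduces trace(I_batch^{-1} I_pool) under a logistic model, where I(.) = sum_i p_i(1 - p_i) x_i x_i^T. The regularizer and greedy strategy are assumptions.

```python
# Greedy batch selection guided by the Fisher information matrix of a logistic model.
import numpy as np

def fisher_info(X, w, delta=1e-3):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    weights = p * (1 - p)
    return (X * weights[:, None]).T @ X + delta * np.eye(X.shape[1])

def select_batch(X_pool, w, batch_size=5):
    I_pool = fisher_info(X_pool, w)
    chosen = []
    for _ in range(batch_size):
        best_idx, best_score = None, np.inf
        for i in range(len(X_pool)):
            if i in chosen:
                continue
            I_batch = fisher_info(X_pool[chosen + [i]], w)
            score = np.trace(np.linalg.solve(I_batch, I_pool))  # tr(I_batch^{-1} I_pool)
            if score < best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen

X = np.random.randn(50, 4)
print(select_batch(X, w=np.zeros(4)))
```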
Most modern trackers typically employ a bounding box given in the first frame to track visual objects, where their tracking results are often sensitive to the initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential...
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternatively passes dynamic information between and across the vision and language modalities. It can robustly capture the high-level interactions between the language and vision domains, and thus significantly improves the performance of visual question answering. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can modulate the intra-modality attention of the current modality, which is vital...
Relevant Component Analysis (RCA) has been proposed for learning distance metrics with contextual constraints for image retrieval. However, RCA has two important disadvantages. One is the lack of exploiting negative constraints, which can also be informative, and the other is its incapability of capturing complex nonlinear relationships between data instances and the contextual information. In this paper, we propose two algorithms to overcome these disadvantages, i.e., Discriminative Component Analysis (DCA) and Kernel DCA. Compared with other complicated methods for distance metric...
Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of the training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire...
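The sketch below illustrates an online-feature-selection-style update: a perceptron-like online learner that, after each gradient step, keeps only the B largest-magnitude weights so that at most B features are ever active. The exact update rule and scaling in the published algorithm differ; this only demonstrates the truncation idea.

```python
# Online learning with a hard budget on the number of active features.
import numpy as np

def truncate(w, num_features):
    if np.count_nonzero(w) > num_features:
        keep = np.argsort(np.abs(w))[-num_features:]   # indices of the B largest weights
        mask = np.zeros_like(w)
        mask[keep] = 1.0
        w = w * mask
    return w

def online_feature_selection(stream, num_features=10, lr=0.1):
    w = None
    for x, y in stream:                                 # y in {-1, +1}
        if w is None:
            w = np.zeros_like(x)
        if y * (w @ x) <= 1:                            # hinge-loss violation triggers an update
            w = w + lr * y * x
            w = truncate(w, num_features)
        yield w

X = np.random.randn(200, 100)
y = np.sign(X[:, 0] + 0.1 * np.random.randn(200))
*_, w_final = online_feature_selection(zip(X, y), num_features=10)
print(np.count_nonzero(w_final), "features active")
```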
Deep Neural Networks (DNNs) are typically trained by backpropagation in a batch setting, requiring the entire training data to be made available prior to the learning task. This is not scalable for many real-world scenarios where new data arrives sequentially in a stream. We aim to address an open challenge of "Online Deep Learning" (ODL) for learning DNNs on the fly in an online setting. Unlike traditional online learning that often optimizes some convex objective function with respect to a shallow model (e.g., a linear/kernel-based hypothesis), ODL is more...
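One way to learn a DNN on the fly, in the spirit of the hedging idea used for online deep learning, is sketched below: attach a classifier to every hidden layer, predict with a weighted vote over these per-depth classifiers, and discount the weight of classifiers that incur loss on each incoming example. Layer sizes, the discount rate, and the missing optimizer step are illustrative assumptions.

```python
# Sketch: an MLP with per-depth output heads whose ensemble weights are hedged online.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HedgedMLP(nn.Module):
    def __init__(self, in_dim=20, hidden=64, depth=4, num_classes=2, beta=0.99):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Linear(in_dim if i == 0 else hidden, hidden) for i in range(depth)])
        self.heads = nn.ModuleList([nn.Linear(hidden, num_classes) for _ in range(depth)])
        self.register_buffer("alpha", torch.full((depth,), 1.0 / depth))  # depth-ensemble weights
        self.beta = beta

    def forward(self, x):
        outs = []
        for block, head in zip(self.blocks, self.heads):
            x = torch.relu(block(x))
            outs.append(head(x))                 # one prediction per depth
        return outs

    def predict(self, x):
        outs = self.forward(x)
        return sum(a * F.softmax(o, dim=-1) for a, o in zip(self.alpha, outs))

    def hedge_update(self, per_head_losses):
        # shrink the weight of heads with larger loss, then renormalize
        self.alpha *= self.beta ** torch.tensor(per_head_losses)
        self.alpha /= self.alpha.sum()

model = HedgedMLP()
x, y = torch.randn(1, 20), torch.tensor([1])
losses = [F.cross_entropy(o, y) for o in model(x)]
sum(a * l for a, l in zip(model.alpha, losses)).backward()   # gradients for all depths jointly
model.hedge_update([l.item() for l in losses])                # reweight the depth ensemble
```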
Learning graphs from data automatically has shown encouraging performance on clustering and semi-supervised learning tasks. However, real data are often corrupted, which may cause the learned graph to be inexact or unreliable. In this paper, we propose a novel robust graph learning scheme to learn reliable graphs from real-world noisy data by adaptively removing noise and errors in the raw data. We show that our proposed model can also be viewed as a robust version of manifold regularized robust PCA, where the quality of the graph plays a critical role. The proposed model is able to boost...
Malicious URL, a.k.a. malicious website, is a common and serious threat to cybersecurity. Malicious URLs host unsolicited content (spam, phishing, drive-by exploits, etc.) and lure unsuspecting users into becoming victims of scams (monetary loss, theft of private information, malware installation), causing losses of billions of dollars every year. It is imperative to detect and act on such threats in a timely manner. Traditionally, this detection is done mostly through the usage of blacklists. However, blacklists cannot be exhaustive, and lack...
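As a toy illustration of the machine-learning alternative to blacklists, the sketch below extracts simple lexical features from URL strings and trains a linear classifier. The feature set, the toy URLs, and the model choice are assumptions for demonstration only, not the survey's recommendation.

```python
# Toy lexical-feature baseline for URL classification (illustrative data and features).
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

def lexical_features(url):
    return np.array([
        len(url),                                           # overall URL length
        url.count('.'),                                     # number of dots (subdomain depth)
        url.count('-'),
        url.count('@'),                                     # '@' is often used to obscure the real host
        sum(ch.isdigit() for ch in url),
        int(bool(re.search(r'\d+\.\d+\.\d+\.\d+', url))),   # raw IP address in the URL
        int('https' not in url.split('//')[0]),             # scheme is not https
    ], dtype=float)

# toy training data (1 = malicious, 0 = benign); a real system would use large labeled feeds
urls = ["http://paypa1-login.example-verify.com/update@192.168.0.1",
        "https://www.wikipedia.org/wiki/URL",
        "http://203.0.113.7/free-gift/claim-now",
        "https://github.com/salesforce"]
labels = [1, 0, 1, 0]
clf = LogisticRegression().fit([lexical_features(u) for u in urls], labels)
print(clf.predict([lexical_features("http://198.51.100.2/acc0unt-verify@login")]))
```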
This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks. By elaborately annotating three popular video segmentation datasets (DAVIS, Youtube-Objects and SegTrack V2) with dynamic eye-tracking data in the UVOS setting, for the first time we quantitatively verified the high consistency of visual attention behavior among human observers, and found strong correlation between human attention and explicit primary object judgements during dynamic, task-driven viewing. Such novel...
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide...