Luhong Liang

ORCID: 0009-0001-4190-006X
Research Areas
  • Advanced Neural Network Applications
  • Video Surveillance and Tracking Methods
  • Advanced Image Processing Techniques
  • Video Coding and Compression Technologies
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Advanced Data Compression Techniques
  • Speech and Audio Processing
  • Image and Video Quality Assessment
  • Music and Audio Processing
  • Advanced Memory and Neural Computing
  • Face and Expression Recognition
  • Neural Networks and Applications
  • Image Enhancement Techniques
  • CCD and CMOS Imaging Sensors
  • Advanced Image Fusion Techniques
  • Image Retrieval and Classification Techniques
  • Image and Signal Denoising Methods
  • Image Processing Techniques and Applications
  • Embedded Systems Design Techniques
  • Blind Source Separation Techniques
  • Face Recognition and Analysis
  • Multimodal Machine Learning Applications
  • Digital Media Forensic Detection
  • Ferroelectric and Negative Capacitance Devices

Hong Kong University of Science and Technology
2023-2025

University of Hong Kong
2023-2024

Hong Kong Science and Technology Parks Corporation
2022-2023

Applied Science and Technology Research Institute
2013-2020

Beijing Institute of Technology
2012

Institute of Computing Technology
2009-2012

Chinese Academy of Sciences
2008-2012

Tsinghua University
2002-2007

Intel (United States)
2002-2003

Microcom (United States)
2002

The use of visual features in audio-visual speech recognition (AVSR) is justified by both the speech generation mechanism, which is essentially bimodal in audio and visual representation, and by the need for features that are invariant to acoustic noise perturbation. As a result, current AVSR systems demonstrate significant accuracy improvements in environments affected by acoustic noise. In this paper, we describe the use of two statistical models for audio-visual integration, the coupled HMM (CHMM) and the factorial HMM (FHMM), and compare the performance of these models with the existing models used in speaker...

10.1155/s1110865702206083 article EN cc-by EURASIP Journal on Advances in Signal Processing 2002-11-28

In recent years several speech recognition systems that use visual together with audio information showed a significant increase in performance over the standard speech recognition systems. The use of visual features is justified by both the bimodality of speech generation and the need for features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled-HMM allow us to model the state asynchrony of the audio and visual observation sequences while still...

10.1109/icassp.2002.5745027 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2002-05-01

This paper presents a novel tree classifier for complex object detection tasks together with a general framework for real-time object tracking in videos using the tree classifier. A boosted training algorithm with a clustering-and-splitting step is employed to construct branches in the nodes recursively, if and only if it improves the discriminative power compared to a single monolithic node classifier and has lower computational complexity. A mouth tracking system that integrates the tree classifier under the proposed framework is built and tested on the XM2FDB database. Experimental results show...

10.1109/icme.2003.1221607 article EN 2003-01-01

This paper proposes an ultra-low power, mixed-bit-width sparse convolutional neural network (CNN) accelerator to accelerate ventricular arrhythmia (VA) detection. The chip achieves 50% sparsity in a quantized 1D CNN using a sparse processing element (SPE) architecture. Measurement on the prototype chip in a TSMC 40nm CMOS low-power (LP) process for the VA classification task demonstrates that it consumes 10.60 $\mu$W of power while achieving a performance of 150 GOPS and a diagnostic accuracy of 99.95%. The computation density...

10.1145/3658617.3698479 article EN Proceedings of the 30th Asia and South Pacific Design Automation Conference 2025-01-20

This paper presents a fast method for detecting multi-view cars in real-world scenes. Cars are artificial objects with various appearance changes, but they have relatively consistent characteristics in structure that consist of some basic local elements. Inspired by this, we propose a novel set of image strip features to describe the appearances of those elements. The new features represent various types of lines and arcs with edge-like and ridge-like patterns, which significantly enrich the simple features such as Haar-like features and edgelet features. They can...

10.1109/cvpr.2009.5206642 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01

Multi-modal computing (M2C) has recently exhibited impressive accuracy improvements in numerous autonomous artificial intelligence of things (AIoT) systems. However, this accuracy gain is often tethered to an incredible increase in energy consumption. Particularly, various highly-developed modality sensors devour most of the energy budget, which would make the deployment of M2C for real-world AIoT applications a difficult challenge.

10.1145/3579371.3589066 article EN 2023-06-16

In recent years several speech recognition systems that use visual together with audio information showed a significant increase in performance over the standard speech recognition systems. The use of visual features is justified by both the bimodality of speech generation and the need for features that are invariant to acoustic noise perturbation. The audio-visual speech recognition system presented in this paper introduces a novel audio-visual fusion technique that uses a coupled hidden Markov model (HMM). The statistical properties of the coupled-HMM allow us to model the state asynchrony of the audio and visual observation sequences while...

10.1109/icassp.2002.1006167 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2002-01-01

In this paper, a new scheme is presented to improve the coding efficiency of sequences captured by stationary cameras (or namely, static cameras) for video surveillance applications. We introduce two novel kinds of frames (namely background frame and difference frame) for input frames to represent the foreground/background without object detection, tracking or segmentation. The background frame is built using a background modeling procedure and periodically updated while encoding. The difference frame is calculated using the input frame and the background frame. A sequence structure is proposed...

10.1117/12.863522 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2010-05-27

The increase in the number of multimedia applications that require robust speech recognition systems determined a large interest in the study of audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified by both the audio and visual modality of speech generation and the need for features that are invariant to acoustic noise perturbation. The speaker independent audio-visual continuous speech recognition system presented in this paper relies on a set of visual features obtained from the accurate detection and tracking of the mouth region. Further, the visual and acoustic observation sequences are integrated using a coupled hidden Markov (CHMM) model...

10.1109/icme.2002.1035365 article EN 2003-06-25

In this paper, we present a pose based approach for locating and recognizing human actions in videos. In our method, human poses are detected and represented based on a deformable part model. To our knowledge, this is the first work exploring the effectiveness of deformable part models in combining human detection and pose estimation into action recognition. Compared with previous methods, ours has three main advantages. First, our method does not rely on any assumption on video preprocessing quality, such as satisfactory foreground segmentation or reliable...

10.1109/cvpr.2011.5995648 article EN 2011-06-01

With the increase in the computational complexity of recent computers, audio-visual speech recognition (AVSR) became an attractive research topic that can lead to a robust solution for speech recognition in noisy environments. In the audio-visual continuous speech recognition system presented in this paper, the audio and visual observation sequences are integrated using a coupled hidden Markov model (CHMM). The statistical properties of the CHMM can describe the asynchrony of the audio and visual features while preserving their natural correlation over time. The experimental results show that the current system tested...

10.21437/icslp.2002-123 article EN 7th International Conference on Spoken Language Processing (ICSLP 2002) 2002-09-16

No-reference measurement of blurring artifacts in images is a challenging problem in the image quality assessment field. One of the difficulties is that the inherently blurry regions in some natural images may disturb the evaluation of blurring artifacts. In this paper, we study the image gradients along local image structures and propose a new perceptual blur metric to deal with the above problem. The gradient profile sharpness of an image edge is efficiently calculated along the horizontal or vertical direction. Then the sharpness distribution histogram rectified by just noticeable...

10.1109/icip.2009.5413545 article EN 2009-11-01

A face detection algorithm integrating template matching and support vector machines (SVM) is presented. Two types of templates, eyes-in-whole and the face itself, are used for coarse filtering, and the SVM classifier is used for classification. A bootstrap method is used to collect non-face samples for SVM training under a template matching constrained subspace, which greatly reduces the complexity of training the SVM. Comparative experimental results demonstrate its effectiveness.

10.1109/icip.2001.959218 article EN 2002-11-13

Co-occurrence histograms of oriented gradients (CoHOG) are powerful descriptors in object detection. In this paper, we propose to utilize a very large pool of CoHOG features with variable-location and variable-size blocks to capture salient characteristics of the object structure. We consider a CoHOG feature as a block with a special pattern described by the offset. A boosting algorithm is further introduced to select the appropriate locations and offsets to construct an efficient and accurate cascade classifier. Experimental results on public...

10.1109/icip.2010.5651963 article EN 2010-09-01

Convolutional neural networks (CNNs) have been implemented with custom hardware on edge devices since their algorithms were successful in many artificial intelligence applications. Although lots of unstructured pruning and mix-bit quantization algorithms have been proposed to successfully compress CNNs, there are few accelerators which can support both sparse and mix-bit CNNs. Besides, sparse matrix computation consumes lots of resources such as registers or BRAM to fetch the needed input activations into the processing element (PE). This brief...

10.1109/tcsii.2023.3257298 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2023-03-15

Utilizing the special properties of surveillance video to improve coding efficiency still has much room, although there have been three typical paradigms of methods: object-oriented, background-prediction-based and background-difference-based methods. However, due to inaccurate foreground segmentation, low-quality or unclear background frames, and the potential "foreground pollution" phenomenon, there is still room for improvement. To address this problem, this paper proposes a macro-block-level selective background difference coding method...

10.1109/icme.2012.136 article EN 2012-07-01

Low-complexity and high-performance surveillance video transcoding methods play an important role for a wide range of transmission and storage applications. Towards this end, the special characteristics of surveillance video should be utilized for transcoding. In this paper, we propose a fast and performance-maintained transcoding method. This method firstly divides the macro blocks (MBs) into foreground MBs, border MBs and background MBs. Statistics show that the three categories of MBs have different distributions of prediction modes, motion vectors and reference...

10.1109/icme.2012.65 article EN 2012-07-01

Real-world video surveillance applications require storing videos without neglecting any part of scenarios for weeks or months. To reduce the storage cost, the high bit-rate videos from cameras should be transcoded into a more efficient compressed format with as little quality loss as possible. In this paper, we propose a background model based method to improve the transcoding efficiency for surveillance videos captured by stationary cameras, and objectively measure it. The background model is trained from pre-decoded I frames, then used to transcode the source...

10.1109/pcs.2010.5702583 article EN 2010-12-01

This paper presents a set of effective and efficient features, namely strip features, for detecting objects in real-scene images. Although shapes of a specific class usually have large intraclass variance, some basic local shape elements are relatively stable. Based on this observation, we propose a set of strip features to describe the appearances of those elements. Strip features capture object shapes with edgelike and ridgelike strip patterns, which significantly enrich the simple features such as Haar-like features and edgelet features. The proposed features can be efficiently...

10.1109/tsmcb.2012.2235066 article EN IEEE Transactions on Cybernetics 2013-01-18

As deep learning neural networks (DNNs) advance and increase in computational complexity, particularly in terms of memory cost, it becomes difficult to implement DNNs in fixed-point and memory-sparse environments (e.g. integrated circuits in consumer electronics). Thus, the training of DNNs must be reformulated to balance the hardware costs needed to represent the often millions of parameter weights in such machine learning models. This paper proposes a novel optimization approach that simultaneously minimizes the complexity (total memory)...

10.1109/iscas.2016.7538957 article EN 2016 IEEE International Symposium on Circuits and Systems (ISCAS) 2016-05-01