Ching Chuen Jong

ORCID: 0000-0003-1178-9062
Research Areas
  • Embedded Systems Design Techniques
  • Digital Filter Design and Implementation
  • Low-power high-performance VLSI design
  • Numerical Methods and Algorithms
  • Interconnection Networks and Systems
  • VLSI and FPGA Design Techniques
  • Parallel Computing and Optimization Techniques
  • Analog and Mixed-Signal Circuit Design
  • VLSI and Analog Circuit Testing
  • 3D IC and TSV technologies
  • Advanced Data Compression Techniques
  • Electronic Packaging and Soldering Technologies
  • Nanofabrication and Lithography Techniques
  • Image and Signal Denoising Methods
  • Cryptography and Residue Arithmetic
  • Advanced Adaptive Filtering Techniques
  • Manufacturing Process and Optimization
  • Coding theory and cryptography
  • Model Reduction and Neural Networks
  • Advanced Image Fusion Techniques
  • Formal Methods in Verification
  • Image Enhancement Techniques
  • Advancements in Semiconductor Devices and Circuit Design
  • Advanced Vision and Imaging
  • Industrial Vision Systems and Defect Detection

Nanyang Technological University
2010-2022

Agency for Science, Technology and Research
2008-2021

Institute of Microelectronics
2006-2021

Singapore Science Park
2011

University of Southampton
1989

In this paper, a new efficient algorithm is proposed for the synthesis of low-complexity finite-impulse response (FIR) filters with resource sharing. The original problem statement based on the minimization of signed-power-of-two (SPT) terms has been reformulated to account for sharable adders. The common SPT (CSPT) terms that were considered in our algorithm address the optimization of the reusability of adders for two major types of common subexpressions, together with the adders that are needed for the spare SPT terms. The coefficient set is synthesized in stages. In the first stage, CSPT...

10.1109/tcad.2007.895615 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 2007-09-24

In this paper, a new algorithm, called the contention resolution algorithm for weight-two subexpressions (CRA-2), based on an ingenious graph synthesis approach, has been developed for the common subexpression elimination of the multiplication block of digital filter structures. CRA-2 provides the leeway to break away from a local minimum and the flexibility of varying optimization options through the admissibility graph. It manages two-bit common subexpressions and aims at achieving the minimal logic depth as the primary goal. The performances of our proposed...

10.1109/tcsii.2005.851776 article EN IEEE Transactions on Circuits and Systems II Analog and Digital Signal Processing 2005-10-01

High density three dimensional (3D) interconnects formed by high aspect ratio through silicon vias (TSVs) and fine pitch solder microbumps are presented in this paper. The aspect ratio of the TSV is larger than 10 and the TSV is filled with Cu without voids; there are electroless nickel immersion gold (ENIG) pads on top of the TSV as the under bump metallurgy (UBM) layer. On the Si chip, Cu/Sn microbumps with 16 µm diameter and 25 µm pitch are fabricated. After singulating the chip and the carrier, they are joined together, and the interconnection between them is formed by the micro bumps and the TSV.

10.1109/ectc.2009.5074039 article EN 2009-05-01

The tables-and-additions methods for accurate computation of elementary functions are fast in speed but require large memory. A memory-efficient method named the integrated Add-Table Lookup-Add (iATA) is proposed in this paper. In iATA, the mathematical formulation for computing elementary functions is derived without using the central difference to save memory. Three additional techniques, specifically the carry select technique, symmetry property exploitation and unequal partitioning of the input with the aid of error analysis, are used in iATA to further reduce...

10.1109/tc.2012.43 article EN IEEE Transactions on Computers 2012-02-07

In this paper, we present a novel memory-efficient high-throughput scalable architecture for multi-level 2-D DWT. We studied the existing DWT architectures and observed that the data scanning method has a significant impact on the memory efficiency of the architecture. We propose a parallel stripe-based scanning method based on the analysis of the dependency graph of the lifting scheme. With the new scanning method for 2-D DWT, a highly efficient pipelined architecture is developed. The proposed architecture requires no frame memory and a temporal memory of size only 3N + 682 for the 3-level decomposition of an image of N × N pixels...

10.1109/tsp.2013.2274640 article EN IEEE Transactions on Signal Processing 2013-07-24

This paper presents a novel method named the Unified Mitchell-based Approximation (UMA) to obtain an optimized logarithmic conversion circuit for any desired accuracy of up to 14 bits. UMA is the first method that is able to obtain an optimized circuit when a specific accuracy is required. In this work, we studied and analyzed five design parameters and their impact on the hardware merits. We formulate a model of error correction in performance evaluation. Given an accuracy requirement, the proposed method explores the design space of the parameters. As the design space is theoretically huge, we propose constraints on the range...

10.1109/tc.2014.2329683 article EN IEEE Transactions on Computers 2014-01-01
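
The entry above builds on Mitchell's logarithm approximation. As a rough illustration only, the sketch below shows the plain, uncorrected Mitchell approximation, in which log2(1 + m) is replaced by m; it is not the UMA circuit, and the function names `mitchell_log2` and `approximation_error` are hypothetical.

```python
import math


def mitchell_log2(x):
    """Plain Mitchell approximation of log2(x) for a positive integer x."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1           # position of the leading one (integer part)
    m = (x - (1 << k)) / (1 << k)    # remaining bits as a fraction, 0 <= m < 1
    return k + m                     # log2(1 + m) is approximated by m


def approximation_error(x):
    """Difference between the exact log2(x) and the Mitchell approximation."""
    return math.log2(x) - mitchell_log2(x)
```

Without any correction the error stays below about 0.0861, which is why Mitchell-based designs typically add correction terms to reach a target accuracy.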

In this brief, we propose a new parallel lifting-based 2-D DWT architecture with high memory efficiency and short critical path. The memory efficiency is achieved with a novel scanning method that enables a tradeoff of external memory bandwidth and on-chip memory. Based on the data flow graph of the flipped lifting algorithm, the processing units (PUs) are developed for maximally utilizing the inherent parallelism. With S number of PUs, the throughput can be scaled while keeping the latency constant. Compared with the best existing architecture, the proposed architecture requires...

10.1109/tcsii.2013.2268335 article EN IEEE Transactions on Circuits & Systems II Express Briefs 2013-07-04
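
For readers unfamiliar with lifting, the following is a minimal software sketch of one level of a 1-D lifting DWT and its inverse. It assumes the reversible integer 5/3 filter with mirrored borders, which the entry above does not specify, and it does not model the flipped structure, the scanning method, or the processing units; `lift_53_forward` and `lift_53_inverse` are hypothetical names.

```python
def lift_53_forward(x):
    """One level of the integer 5/3 lifting DWT on an even-length list of ints."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("length must be even and at least 2")
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: detail = odd sample minus the floor average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]   # mirror at the right edge
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update step: smooth = even sample plus a rounded quarter of the neighbouring details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]                 # mirror at the left edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d


def lift_53_inverse(s, d):
    """Undo lift_53_forward by running the two lifting steps in reverse order."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[i]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.extend([even[i], d[i] + ((even[i] + right) >> 1)])
    return x
```

Because each lifting step only adds a function of the other half of the samples, the inverse subtracts the same quantity, so reconstruction is exact in integer arithmetic.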

Developments of ultra fine pitch and high density solder microbumps for advanced 3D stacking technologies are discussed in this paper. CuSn solder microbumps with 25 μm pitch are fabricated at wafer level by the electroplating method, and the total thickness of the plated Cu and Sn is 10 μm. After plating, the micro bumps on the Si chip are reflowed at 265°C, and the variation of bump height measured within a die is less than 5%. The under bump metallurgy (UBM) layer used on the Si carrier is electroless plated nickel immersion gold (ENIG) with a thickness of 5 μm. Assembly is conducted with the FC150 flip...

10.1109/eptc.2008.4763465 article EN 2008-12-01

Multiple constant multiplications (MCM) have been a core operation in many digital signal processing applications. In this paper, an efficient generalized contention resolution algorithm (CRA) is proposed to eliminate three broad categories of reusable common subexpressions in MCM. The idea is to revert a precedential decision that is suboptimal by a localized cost function evaluation when there is a conflict between two competitive subexpressions. The derivatives of the basic CRA are versatile in that they are capable of satisfying...

10.1109/tcsi.2007.913707 article EN IEEE Transactions on Circuits and Systems I Regular Papers 2008-03-01

A novel approach to designing a serial-serial hybrid multiplier is proposed for applications with a high data sampling rate (≥4 GHz). The conventional way of partial product formation is revamped. Our technique effectively forms the entire partial product matrix in just n cycles for an n × n multiplication instead of at least 2n cycles in the conventional serial-serial multipliers. It achieves a high bit sampling rate by replacing conventional full adders and 5:3 counters with asynchronous 1's counters so that...

10.1109/tvlsi.2010.2060374 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2010-09-07

Guided image filtering has been applied widely in recent years as a solution to the ever-increasing demand for high-performance filtering, especially for real-time image/video processing. The lately proposed gradient domain guided image filter (GDGIF) is one of the typical works focusing on improving the filtering quality of the original guided image filter (GIF), dealing with the halo-artifacts problem in edge-preserving smoothing. However, due to the involvement of global pixel values in the computation, high computation complexity, and additional complex...

10.1109/tcsvt.2018.2852336 article EN IEEE Transactions on Circuits and Systems for Video Technology 2018-07-02

A contention resolution algorithm (CRA) is proposed for the common subexpression elimination of the multiplier block of the digital filter structure. CRA synthesizes common subexpressions of any Hamming weight to achieve an overall minimization with the emphasis that every logic depth increment must be accompanied by a reduction in complexity. A new data structure, called the admissibility graph, is introduced to represent succinctly the set of coefficients; the admissible subexpressions are progressively labeled on the graph as either precedence or edges (or...

10.1109/icassp.2004.1327066 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2004-09-28

This paper presents an approach based on the curve fitting method for the design of non-iterative divider circuits with accuracy and area-delay product (ADP) trade-offs. The curved surfaces representing the quotient are partitioned into several regions, each of which is then approximated by a square/triangular plane. The planes are obtained using optimization. The proposed architecture implementing the approach contains only simple arithmetic operations and a look-up table. Several designs with different accuracies and ADPs are obtained. achieved in...

10.1109/newcas.2015.7182097 article EN 2015-06-01

A novel non-iterative circuit for computing division based on logarithm is proposed in the paper. Mitchell-based methods are used for the logarithmic and antilogarithmic conversions. Merging the conversion stages in the implementation is not possible if existing algorithms are used. Thus, the critical path has at least two carry propagate adders (CPAs). This work introduces a new algorithm to merge the conversion stages into a single one and remove one of the CPAs. Compared with the best division computation method in a 3-D graphic system, the proposed design achieves improvements by 45.4%...

10.1109/iscas.2013.6572317 article EN 2013 IEEE International Symposium on Circuits and Systems (ISCAS) 2013-05-01

In this paper, VLSI array architectures for matrix inversion are studied. A new binary-coded z-path (Bi-z) CORDIC is developed and implemented to compute the operations required in matrix inversion using Givens rotation (GR) based QR decomposition. The Bi-z CORDIC allows both the GR vectoring and rotation mode, as well as division and multiplication, to be executed in a single unified processing element (PE). Hence, a 2D (2 dimensional) array consisting of PEs with different functionalities can be folded into a 1D array to reduce hardware complexity. The Bi-z CORDIC also eliminates...

10.1016/j.mejo.2011.10.009 article EN Microelectronics Journal 2011-12-11
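
As background for the entry above, the sketch below shows the conventional CORDIC vectoring mode in floating point: shift-and-add micro-rotations drive y toward zero, leaving the vector magnitude (needed for a Givens rotation) and its angle. It is not the Bi-z CORDIC itself; the function name `cordic_vectoring` and the restriction to x >= 0 are assumptions made for illustration.

```python
import math


def cordic_vectoring(x, y, iterations=24):
    """Conventional CORDIC vectoring: return (magnitude, angle) of the vector (x, y)."""
    if x < 0:
        raise ValueError("this sketch assumes x >= 0")
    z = 0.0       # accumulated rotation angle
    gain = 1.0    # CORDIC scale factor accumulated over the micro-rotations
    for i in range(iterations):
        step = 2.0 ** -i
        d = -1.0 if y > 0 else 1.0          # rotate toward the x-axis
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, z                       # remove the scale factor from x
```

In hardware the multiplications by `step` become right shifts and the `atan` values come from a small table; the floating-point form here is only meant to show the iteration.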

The development of ultrafine-pitch microbumps and the thermal compression bonding (TCB) process for advanced 3-D stacking technology are discussed in this paper. Microbumps, consisting of Cu pillars and thin Sn caps with a pitch of 25 μm, are fabricated on an Si chip by the electroplating method. Total thickness of the Cu pillar and Sn cap is 10 μm. Electroless nickel immersion gold pads with a total thickness of 4 μm are fabricated on the Si carrier. TCB of the Si chip and the Si carrier is conducted on the FC150 flip-chip bonder, and good joining with higher than 10-MPa die shear strength is achieved. After...

10.1109/tcpmt.2012.2203130 article EN IEEE Transactions on Components Packaging and Manufacturing Technology 2012-09-18

A modified reduced adder graph (MRAG) algorithm and its hybrid version are proposed for efficient digital filter implementation. Several improvements are made to fully exploit the optimal part of the n-dimensional reduced adder graph (RAG-n) algorithm. Simulation results demonstrate that MRAG is capable of generating lower cost solutions.

10.1049/el:20057392 article EN Electronics Letters 2005-01-01

Research work done has shown that power consumption in digital integrated circuits can be effectively reduced by reducing the switching activity occurring on the functional modules. High-level synthesis of digital integrated circuits for low power often optimizes the switching activity during two main processes, operation scheduling and module binding, which are usually performed one control step at a time in separated stages. As the two processes are strongly interdependent, separate optimization in a step-by-step manner frequently leads to sub-optimal solutions. In...

10.1016/j.mejo.2007.03.001 article EN Microelectronics Journal 2007-04-01

A new Hamming weight pyramid (HWP) that resembles the Pascal triangle is proposed to succinctly compress information about the distribution of Hamming weight in canonical signed digit (CSD) represented numbers in a visually appealing manner for analysis and synthesis. Many interesting properties are discovered in this regularly structured HWP. These lead to a novel and elegant way to convert decimal numbers to CSD without their binary equivalence, which is an ineluctable intermediate process in conventional CSD conversion algorithms.

10.1109/iscas.2004.1329497 article EN 2004-11-30
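
For context, the sketch below shows a conventional binary-based CSD conversion, the kind of intermediate process the entry above refers to, together with the Hamming weight (number of nonzero digits) of the result. It is not the HWP method; `to_csd` and `hamming_weight` are hypothetical names.

```python
def to_csd(value):
    """Return the CSD digits (each -1, 0 or +1) of a non-negative integer, LSB first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while value:
        if value & 1:
            # Choose +1 when value % 4 == 1 and -1 when value % 4 == 3,
            # so that the next digit is always zero (no adjacent nonzeros).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def hamming_weight(digits):
    """Number of nonzero digits in a signed-digit representation."""
    return sum(1 for d in digits if d != 0)
```

For example, 7 is 111 in binary (weight 3) but becomes 8 - 1 in CSD (weight 2), which is why CSD coefficients usually need fewer adders in multiplierless filters.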

This paper presents an architecture for matrix multiplication implemented on reconfigurable hardware with partially reconfigurable feature. The proposed design significantly reduces the size and achieves the minimum computation cycles for n × n matrix multiplication. Compared with the linear array design (Jang et al., 2002), the area of our design is reduced by 72%-81% while the AT metrics (product of area and latency) are reduced by 40%-58% for matrix sizes between 3 × 3 and 48 × 48. The versatility is demonstrated in different parameterisable instantiation to cater for implementations with various time...

10.1109/dsd.2004.72 article EN Digital Systems Design 2004-08-31

In this paper, a low-cost 256-point FFT processor design is presented for portable speech and audio applications. After an intensive review of existing FFT architectures, a single-butterfly architecture is chosen to obtain low cost. In this architecture, a two-multiplier three-adder pipelined butterfly unit is proposed to calculate the butterflies at different levels, recursively. Compared with other butterfly units, the proposed structure obtains the best tradeoff between hardware cost and processing throughput. The supply voltage scaling...

10.1109/isicir.2007.4441801 article EN 2007 International Symposium on Integrated Circuits 2007-09-01

This paper presents a new approach to serial/parallel multiplier design by using parallel 1's counters to accumulate the binary partial product bits. The 1's in each column of the partial product matrix due to the serially input operands are accumulated using a serial T-flip flop (TFF) counter. Consequently, the column height is reduced from N to ⌊log₂ N⌋ + 1. The logarithmic reduction results in a very small carry save adder (CSA) array or tree required...

10.1109/apccas.2008.4745989 article EN 2008-11-01

Modulo 2^n + 1 squaring has been used in various applications like cryptography and Fermat number transform. Arithmetic modulo 2^n + 1 is also known to be the most time critical among the three residue channels in the prevalent {2^n - 1, 2^n, 2^n + 1} based residue number system (RNS). In order to speed up the modulo 2^n + 1 operation, the diminished-1 representation is widely employed. However, the use of the diminished-1 representation results in area overhead and increased execution delay. In this paper, we...

10.1109/apccas.2008.4746265 article EN 2008-11-01
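
The arithmetic behind the entry above can be illustrated with a small software reference model. The sketch assumes only the standard identity 2^n ≡ -1 (mod 2^n + 1) and the usual definition of the diminished-1 code (a nonzero residue x stored as x - 1 in n bits); it does not reproduce the paper's squarer, and the function names are hypothetical.

```python
def square_mod_fermat(x, n):
    """Return x*x mod (2**n + 1) using the fold 2**n ≡ -1 instead of a division."""
    modulus = (1 << n) + 1
    if not 0 <= x < modulus:
        raise ValueError("x must be a residue modulo 2**n + 1")
    product = x * x                       # at most 2**(2n)
    low = product & ((1 << n) - 1)        # bits of weight below 2**n
    high = product >> n                   # bits of weight 2**n, which count as -1 each
    diff = low - high
    return diff + modulus if diff < 0 else diff


def to_diminished_one(x, n):
    """Diminished-1 code of a nonzero residue: x - 1, which always fits in n bits."""
    if not 1 <= x <= (1 << n):
        raise ValueError("only nonzero residues have a diminished-1 code")
    return x - 1


def from_diminished_one(code):
    """Recover the residue from its diminished-1 code."""
    return code + 1
```

The value 2^n itself needs n + 1 bits in ordinary binary, whereas its diminished-1 code 2^n - 1 fits in n bits; this is the usual motivation for the representation, with zero handled separately.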

Architectural synthesis for low power design is a complex optimization problem due to the interdependence of power, delay and area. In order to obtain an optimal architecture where both power and area are efficient, the full design space of module selection must be explored. In this paper we formulate the problem as a multi-objective optimization problem and propose a branch and bound approach to explore the large design space of module selection. Experiments show that the approach can produce far better results than traditional architectural synthesizers, as power, delay and area are all globally optimized simultaneously.

10.1109/iscas.1997.621420 article EN 2002-11-22