P. Banerjee

ORCID: 0009-0005-0764-8146
Research Areas
  • Parallel Computing and Optimization Techniques
  • Interconnection Networks and Systems
  • Embedded Systems Design Techniques
  • Distributed systems and fault tolerance
  • Advanced Data Storage Technologies
  • Advanced Frequency and Time Standards
  • Distributed and Parallel Computing Systems
  • Network Time Synchronization Technologies
  • Radiation Effects in Electronics
  • VLSI and Analog Circuit Testing
  • VLSI and FPGA Design Techniques
  • Advancements in PLL and VCO Technologies
  • Optimization and Search Problems
  • Satellite Communication Systems
  • Formal Methods in Verification
  • Logic, programming, and type systems
  • GNSS positioning and interference
  • Power Line Communications and Noise
  • Inertial Sensor and Navigation
  • Sensor Technology and Measurement Systems
  • Smart Grid Security and Resilience
  • Algorithms and Data Compression
  • Advanced Graph Theory Research
  • Real-time simulation and control systems
  • Numerical Methods and Algorithms

IBM (India)
2023

CSIR National Physical Laboratory of India
1977-2022

University of Burdwan
1977-2022

Amity University
2016

Council of Scientific and Industrial Research
2011

Hewlett-Packard (United States)
2010

University of Illinois Chicago
2006

University of Illinois Urbana-Champaign
1988-2005

Northwestern University
1997-2005

Evanston Hospital
2005

An approach to the problem of automatic data partitioning is introduced. The notion of constraints on data distribution is presented, and it is shown how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. Results of a study performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks to determine the applicability of the approach to real programs are presented. The results...

10.1109/71.127259 article EN IEEE Transactions on Parallel and Distributed Systems 1992-03-01

The design of a fault-tolerant hypercube multiprocessor architecture is discussed. The authors propose the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three parallel applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. Extensive studies have been done of error coverage...

10.1109/12.57055 article EN IEEE Transactions on Computers 1990-01-01
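
The abstract above names matrix multiplication as one application protected by algorithm-based error detection. As a rough illustration only, the sketch below uses the classic checksum-matrix encoding for a single-process matrix product; the function names are hypothetical, and nothing here reproduces the paper's system-level mechanisms or its distribution of work across hypercube nodes. Integer data is assumed so that checksums compare exactly.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def column_checksum(a):
    """Append one extra row holding the sum of each column of `a`."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Append to every row of `b` one extra entry holding that row's sum."""
    return [row + [sum(row)] for row in b]


def locate_fault(c_full):
    """Check a full-checksum product.

    Returns None when every checksum is consistent, or (row, col) of the
    single corrupted data element. Assumes at most one faulty element.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("fault pattern not covered by this single-error sketch")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
# The product of a column-checksum matrix and a row-checksum matrix
# carries both row and column checksums.
c_full = matmul(column_checksum(a), row_checksum(b))
clean = locate_fault(c_full)

c_full[1][0] += 5          # inject a fault into one data element
faulty = locate_fault(c_full)
```

The inconsistent row and the inconsistent column intersect at the corrupted element, which is what makes location (and not only detection) possible in this encoding.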

Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically-scheduled superscalar processor. Chimaera is capable of performing 9-input/1-output operations on integer data. We discuss a C compiler that automatically maps computations for execution in the RFU. ... of: (1)...

10.1109/isca.2000.854393 article EN 2002-11-07

This paper presents a new integrated compiler framework for improving the cache performance of scientific applications. In addition to applying loop transformations, the method includes data layout optimizations, i.e., those that change the memory layouts of data structures (arrays in this case). A key characteristic of this approach is that loop transformations are used to improve temporal locality while data layout optimizations are used to improve spatial locality. This optimization framework was used with sixteen loop nests from several benchmarks and math libraries, and the measured...

10.5555/290940.290999 article EN 1998-11-01
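
To make the distinction between loop transformations and data layout optimizations concrete, the following sketch computes the memory stride seen by the innermost loop of a two-level loop nest under row-major and column-major layouts. It is an assumed, simplified model (2-D arrays, element-granularity offsets, hypothetical function names), not the compiler framework described in the abstract.

```python
def linear_offset(index, shape, layout):
    """Element offset of index (i, j) in a 2-D array with the given layout."""
    i, j = index
    rows, cols = shape
    if layout == "row":       # row-major: elements of a row are contiguous
        return i * cols + j
    if layout == "col":       # column-major: elements of a column are contiguous
        return j * rows + i
    raise ValueError(f"unknown layout: {layout}")


def innermost_stride(access, shape, layout):
    """Offset distance between two consecutive innermost-loop iterations.

    `access` maps loop indices (outer, inner) to an array index (i, j).
    """
    first = linear_offset(access(0, 0), shape, layout)
    second = linear_offset(access(0, 1), shape, layout)
    return second - first


shape = (4, 8)

# Original nest: the inner loop walks down a column, i.e. it touches A[inner][outer].
column_walk = lambda outer, inner: (inner, outer)
stride_before = innermost_stride(column_walk, shape, "row")

# Fix 1 (data layout change): keep the loops, store the array column-major.
stride_layout = innermost_stride(column_walk, shape, "col")

# Fix 2 (loop interchange): keep the layout, make the inner loop walk along a row.
row_walk = lambda outer, inner: (outer, inner)
stride_interchange = innermost_stride(row_walk, shape, "row")
```

A unit stride means consecutive iterations touch adjacent elements, so they tend to share cache lines. When data dependences forbid the loop interchange, changing the layout remains available, which is one reason to consider both kinds of optimization together.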

Appropriate data distribution has been found to be critical for obtaining good performance on Distributed Memory Multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. A...

10.1109/fmpc.1995.380436 article EN 2002-11-19
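
The sketch below shows what a redistribution routine has to compute: for each pair of source and destination processors, the array elements that must move. It is a brute-force enumeration under an assumed block-cyclic ownership rule with hypothetical function names. It is not the PITFALLS representation, whose details the truncated abstract does not give.

```python
from collections import defaultdict


def owner(i, block, nprocs):
    """Processor owning element i under a block-cyclic distribution."""
    return (i // block) % nprocs


def redistribution_plan(n, src_block, src_procs, dst_block, dst_procs):
    """Map (source, destination) processor pairs to the indices they exchange."""
    plan = defaultdict(list)
    for i in range(n):
        src = owner(i, src_block, src_procs)
        dst = owner(i, dst_block, dst_procs)
        if src != dst:        # elements that stay on the same processor need no message
            plan[(src, dst)].append(i)
    return dict(plan)


# Eight elements on two processors: cyclic (block size 1) to block (block size 4).
plan = redistribution_plan(8, src_block=1, src_procs=2, dst_block=4, dst_procs=2)
```

This enumeration costs time proportional to the array size on every processor; a compact description of regular distributions would let the same send sets be derived without visiting each element.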

Simulated annealing, a methodology for solving combinatorial optimization problems, is a very computationally expensive algorithm and, as such, numerous researchers have undertaken efforts to parallelize it. In this paper, we investigate three of these parallel simulated annealing strategies when applied to standard cell placement, specifically the TimberWolfSC placement tool. We examined the parallel moves strategy, as well as two new approaches to parallel cell placement: multiple Markov chains and speculative computation. These...

10.1109/43.602476 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 1997-04-01
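
As an illustration of the multiple Markov chains idea named above, this sketch runs several independent annealing chains from a shared starting placement and periodically keeps the best result. The chains are executed one after another here, whereas a parallel implementation would give each chain its own processor. The toy cost function (cells in a single row, two-pin nets), the cooling schedule, and all names and parameter values are assumptions for illustration and are unrelated to TimberWolfSC.

```python
import math
import random


def wirelength(order, nets):
    """Total distance between connected cells placed in a single row."""
    pos = {cell: slot for slot, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)


def anneal_chain(order, nets, temp, steps, rng):
    """Run one Markov chain at a fixed temperature; return the best state seen."""
    current = list(order)
    cost = wirelength(current, nets)
    best, best_cost = list(current), cost
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]      # propose a swap
        delta = wirelength(current, nets) - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost += delta                                     # accept the move
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[i], current[j] = current[j], current[i]  # reject: undo the swap
    return best, best_cost


def multiple_markov_chains(order, nets, chains=4, rounds=20, steps=50,
                           temp=5.0, cooling=0.8, seed=0):
    """Restart every chain from the best placement found so far after each round."""
    rngs = [random.Random(seed + c) for c in range(chains)]
    best, best_cost = list(order), wirelength(order, nets)
    for _ in range(rounds):
        results = [anneal_chain(best, nets, temp, steps, rng) for rng in rngs]
        candidate, candidate_cost = min(results, key=lambda r: r[1])
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
        temp *= cooling
    return best, best_cost


cells = list(range(8))
nets = [(0, 7), (1, 6), (2, 5), (3, 4), (0, 4), (7, 3)]
initial_cost = wirelength(cells, nets)
placement, final_cost = multiple_markov_chains(cells, nets)
```

The exchange step is the only point where chains communicate, which is why this strategy maps naturally onto message-passing machines.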

This paper presents a new integrated compiler framework for improving the cache performance of scientific applications. In addition to applying loop transformations, the method includes data layout optimizations, i.e., those that change the memory layouts of data structures (arrays in this case). A key characteristic of this approach is that loop transformations are used to improve temporal locality while data layout optimizations are used to improve spatial locality. This optimization framework was used with sixteen loop nests from several benchmarks and math libraries, and the measured...

10.1109/micro.1998.742790 article EN 2002-11-27

This paper describes a behavioral synthesis tool called AccelFPGA which reads in high-level descriptions of digital signal processing (DSP) applications written in MATLAB, and automatically generates synthesizable register transfer level (RTL) models and simulation testbenches in VHDL or Verilog. The RTL models can be synthesized using commercial logic synthesis tools and place and route tools onto field-programmable gate arrays (FPGAs). This paper describes how powerful directives are used to provide architectural tradeoffs for the DSP designer....

10.1109/tvlsi.2004.824301 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2004-03-01

An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how a compiler can infer those constraints by looking at the data reference patterns in the source code of the program. We show how these constraints may be combined by the compiler to obtain...

10.1109/dmcc.1991.633082 article EN 2005-08-24

Several issues concerning the design of an I/O (input/output) system for a multiprocessor such as the hypercube are examined. A methodology is proposed for connecting I/O processors to the hypercube for efficient I/O access. The effect of I/O communication on the hypercube network is analyzed. Different disk organizations that can be employed within the I/O system are evaluated to see which organization has better performance. It is observed that parallelism in serving an I/O request plays the dominant role for a scientific workload. The problem of mapping specific data structures such as matrices onto the disks...

10.1109/71.80142 article EN IEEE Transactions on Parallel and Distributed Systems 1990-04-01

A hyperplane based approach for optimizing spatial locality in loop nests. Authors: M. Kandemir (EECS Dept., Syracuse University, Syracuse, NY), A. Choudhary (ECE Dept., Northwestern University, Evanston, IL), N. Shenoy, P. Banerjee, J. Ramanujam (Louisiana State University, Baton Rouge, LA). ICS '98: Proceedings of the 12th International Conference on Supercomputing, July 1998, pages 69–76. https://doi.org/10.1145/277830.277849. Online: 13 July 1998.

10.1145/277830.277849 article EN 1998-07-13

Field Programmable Gate Arrays (FPGAs) have been recently used as an effective platform for implementing many image/signal processing applications. MATLAB is one of the most popular languages used to model image/signal processing applications. We present the MATCH compiler that takes MATLAB as input and produces a hardware description in RTL VHDL, which can be mapped to an FPGA using commercial CAD tools. This dramatically reduces the time to implement an application on an FPGA. We present results on some image and signal processing algorithms for which hardware was synthesized using our compiler for the Xilinx XC4028 FPGA with an external memory. We also...

10.1109/icvd.2001.902676 article EN 2002-11-13

The design of two reconfiguration strategies for hypercube multicomputer architectures under failures is discussed. The first scheme uses spare processors attached to certain processors in the hypercube by means of a novel embedding technique. The second approach places spare processors between specific links in the hypercube. Both schemes involve mapping the logical (virtual) hypercube onto the set of physical processors in the final reconfigured hypercube and hence suffer some performance degradation.

10.1109/ftcs.1990.89368 article EN 2002-12-04

The problem of designing fault-tolerant disk arrays that are not susceptible to 100% load increases on the functional disks when one of the disks in the system fails is addressed. A technique that combines the advantages of parity schemes and traditional dual copy methods, and offers a wide variety of options for providing fault-tolerance, is proposed. A theoretical framework for solving the problem is presented and a number of constructive techniques are proposed. By utilizing the same amount of hardware as earlier schemes but with a better data organization and different reconstruction...

10.1109/ftcs.1991.146692 article EN 2002-12-10

The introduction of advanced FPGA architectures, with built-in DSP support, has given DSP designers a new hardware alternative. By exploiting its inherent parallelism, it is expected that FPGAs can outperform DSP processors. This paper describes the process and considerations for automatically translating binaries targeted for general DSP processors into Register Transfer Level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 DSP processor architecture is chosen as the DSP platform,...

10.1145/996566.996678 article EN 2004-06-07

A discussion is presented of a fault-tolerant hypercube multiprocessor architecture which uses a novel algorithm-based fault-detection approach for identifying faulty processors. The scheme involves the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube. The authors have implemented system-level fault-detection mechanisms for various parallel applications on a 16-processor Intel iPSC hypercube multiprocessor. They report results on two applications: matrix multiplication and fast Fourier transform. They have performed...

10.1109/ftcs.1988.5344 article EN 1988-01-01

The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and communication behavior of these parallel programs were investigated. The various time interval distributions were modeled by statistical functions which were verified by nonlinear...

10.1109/71.149963 article EN IEEE Transactions on Parallel and Distributed Systems 1992-07-01

A scheme for dealing with roundoff errors in algorithm-based fault tolerance methods, which complicate the check phases of the algorithm, is presented. The method is based on error analysis incorporating some simplifications which result in easier derivation of error bounds and more useful expressions in cases where the theoretical error bound may be too wide to be of much use as a check in the check phase. The methods are used to derive error bounds for three applications, and it is shown that fault-tolerant encodings of these applications using the bounds derived by the authors achieve high error coverage with no false alarms...

10.1109/ftcs.1993.627332 article EN 2002-12-30
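
The difficulty described above is that, in floating point, a checksum rarely matches exactly even when no fault occurred, so the check phase needs a tolerance. The sketch below checks a matrix-vector product against its checksum using a textbook first-order roundoff estimate scaled by an assumed safety factor. This tolerance is an illustrative stand-in and not the bounds derived in the paper; the function names are hypothetical.

```python
import sys


def checked_matvec(a, x, safety=4.0):
    """Compute y = A x together with a checksum and a roundoff tolerance."""
    y = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

    # Checksum: (column sums of A) . x equals sum(y) in exact arithmetic.
    col_sums = [sum(col) for col in zip(*a)]
    expected = sum(c * x_j for c, x_j in zip(col_sums, x))

    # Assumed tolerance: safety * n * machine epsilon * sum of |a_ij * x_j|.
    n = max(len(a), len(x))
    magnitude = sum(abs(a_ij * x_j) for row in a for a_ij, x_j in zip(row, x))
    tolerance = safety * n * sys.float_info.epsilon * magnitude
    return y, expected, tolerance


def passes_check(y, expected, tolerance):
    """True when the result agrees with its checksum up to the tolerance."""
    return abs(sum(y) - expected) <= tolerance


a = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
y, expected, tolerance = checked_matvec(a, x)
ok_clean = passes_check(y, expected, tolerance)

y_faulty = [y[0] + 1e-3, y[1]]      # inject a small error into one result element
ok_faulty = passes_check(y_faulty, expected, tolerance)
```

A tolerance that is too tight raises false alarms on ordinary roundoff, while one that is too loose lets small faults through; that trade-off is what makes the choice of bound matter.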

Efficient high level design tools that can map behavioral descriptions to FPGA architectures are one of the key requirements to fully leverage FPGAs for high throughput computations and meet time-to-market pressures. We present a compiler that takes as input algorithms described in MATLAB and generates RTL VHDL. The RTL VHDL can then be mapped to FPGAs using existing commercial tools. The input application is mapped to multiple FPGAs by parallelizing the application and embedding communication and synchronization primitives automatically. Our compiler infers the minimum number of bits...

10.1109/iccad.2001.968639 article EN 2002-11-13

We suggest a new approach aimed at reducing the effort required to program distributed-memory multicomputers. The key idea in our approach is to automatically convert a program written in a library-based programming language (MATLAB) to a parallel program based on the ScaLAPACK library. In the process of performing this conversion, we apply compiler optimizations that simultaneously exploit task and data parallelism. As our results show, our approach is feasible and practical, and our optimization provides significant performance benefits.

10.1109/ipps.1996.508120 article EN Proceedings of the International Conference on Parallel Processing 2002-12-23

Applications that require digital signal processing (DSP) functions are typically mapped onto general purpose DSP processors. With the introduction of advanced FPGA architectures with built-in DSP support, a new hardware alternative is available for DSP designers. By exploiting its inherent parallelism, it is expected that FPGAs can outperform DSP processors. However, the migration of assembly code to hardware is a very arduous process. This paper describes the process and considerations for automatically translating software binary codes targeted...

10.1109/fccm.2004.44 article EN 2004-12-23

The role of HP Labs, the central research arm of Hewlett-Packard, is to look beyond the roadmap of currently offered products and services to solve some of the exciting technical problems that will be critical to customers. The paper mentions active collaboration with academic, commercial, and government partners that augments and accelerates knowledge creation and technology transfer.

10.1109/mc.2010.322 article EN Computer 2010-11-01

10.1006/jpdc.1993.1003 article EN Journal of Parallel and Distributed Computing 1993-01-01

The authors present the architecture, methodology and performance evaluation of a message-passing coprocessor (MPC) which can accelerate message communication in a distributed memory multicomputer (i.e. the iPSC/2 hypercube). The MPC is a microprogrammable processor which offloads from the CPU the burden of message passing and speeds up software processing by directly executing message passing instructions in microcode. It supports process scheduling, message buffer management, and fast message copying. The most unique feature of the MPC is that it performs caching for expected...

10.1109/superc.1990.130092 article EN Proceedings - Supercomputing 2002-12-04

High-level synthesis compilers often produce reoccurring patterns in intermediate CDFGs during translation. By identifying large patterns, one may reduce area and communication overhead by efficiently reusing hardware for multiple operations. This paper presents an algorithm for dynamically generating templates of reoccurring patterns for resource sharing in CDFGs. Results show a 40-80% reduction using small, incremental template growth, with variations within a 5% margin among varying look-ahead depths.

10.1109/vlsid.2006.75 article EN 2006-01-01