NFDI4DS | UHH-SEMS - Publication Details

Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers

OPENALEX - Publications

Manish Gupta P. Banerjee

An approach to the problem of automatic data partitioning is introduced. The notion of constraints on data distribution is presented, and it is shown how, based on performance considerations, a compiler identifies constraints to be imposed on the distribution of various data structures. These constraints are then combined by the compiler to obtain a complete and consistent picture of the data distribution scheme, one that offers good performance in terms of the overall execution time. Results of a study performed on Fortran programs taken from the Linpack and Eispack libraries and the Perfect Benchmarks, to determine the applicability of the approach to real programs, are presented. The results...

10.1109/71.127259 article EN IEEE Transactions on Parallel and Distributed Systems 1992-03-01
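
A distribution constraint can be illustrated with a shift-by-one reference pattern such as A(i) = B(i-1). The minimal Python sketch below counts how many iterations need a remote element under BLOCK and CYCLIC distributions. It assumes the owner-computes rule and that A and B share one distribution. It is not the authors' algorithm, and the function names are hypothetical. The point is only that a nearest-neighbour reference pattern strongly favours BLOCK, which is the kind of performance-based preference a compiler could record as a constraint.

```python
from typing import Callable

Owner = Callable[[int, int, int], int]  # (index, n, processors) -> owning processor

def block_owner(i: int, n: int, p: int) -> int:
    """Processor that owns element i when n elements are BLOCK-distributed over p processors."""
    block = -(-n // p)  # ceiling division: elements per processor
    return i // block

def cyclic_owner(i: int, n: int, p: int) -> int:
    """Processor that owns element i under a CYCLIC (round-robin) distribution."""
    return i % p

def nonlocal_refs(owner: Owner, n: int, p: int, offset: int) -> int:
    """Count iterations of A[i] = B[i - offset] that need a remote B element.

    Assumes A and B share one distribution and that the owner of A[i]
    executes iteration i (owner-computes rule).
    """
    return sum(1 for i in range(offset, n) if owner(i, n, p) != owner(i - offset, n, p))

if __name__ == "__main__":
    for name, owner in [("BLOCK", block_owner), ("CYCLIC", cyclic_owner)]:
        print(name, nonlocal_refs(owner, n=16, p=4, offset=1))
```

With 16 elements on 4 processors, BLOCK needs 3 remote references (one per block boundary) and CYCLIC needs 15 (every iteration).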

Algorithm-based fault tolerance on a hypercube multiprocessor

P. Banerjee J.T. Rahmeh Craig Stunkel Vivek Nair Kaushik Roy and 2 more

The design of a fault-tolerant hypercube multiprocessor architecture is discussed. The authors propose the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube, using a novel scheme of algorithm-based error detection. System-level error detection mechanisms have been implemented for three applications on a 16-processor Intel iPSC hypercube multiprocessor: matrix multiplication, Gaussian elimination, and fast Fourier transform. Schemes for other applications are under development. Extensive studies have been done on error coverage...

10.1109/12.57055 article EN IEEE Transactions on Computers 1990-01-01
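
Algorithm-based error detection for matrix multiplication is classically done with the row/column checksum encoding of Huang and Abraham. The minimal Python sketch below shows how a single faulty element can be both detected and located from the data alone. It assumes at most one fault and uses a fixed absolute tolerance `tol`. It is not necessarily the exact encoding used on the iPSC, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def with_column_checksum(a: Matrix) -> Matrix:
    """Append a row containing the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b: Matrix) -> Matrix:
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def locate_error(c: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product matrix, assuming at most one faulty element.

    Returns (row, col) at the intersection of the first failing row check and
    the first failing column check, or None if no data element is implicated.
    """
    n_rows, n_cols = len(c) - 1, len(c[0]) - 1  # last row/column hold checksums
    bad_rows = [i for i in range(n_rows)
                if abs(sum(c[i][:n_cols]) - c[i][n_cols]) > tol]
    bad_cols = [j for j in range(n_cols)
                if abs(sum(c[i][j] for i in range(n_rows)) - c[n_rows][j]) > tol]
    if bad_rows and bad_cols:
        return bad_rows[0], bad_cols[0]
    return None

if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = matmul(with_column_checksum(a), with_row_checksum(b))
    print(locate_error(c))   # None: all checks pass
    c[1][0] += 5.0           # inject a fault into one data element
    print(locate_error(c))   # (1, 0)
```

In floating point, a fixed `tol` is a simplification. Choosing the threshold so that roundoff does not trigger false alarms is a problem in its own right.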

CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit

Z.A. Ye Andreas Moshovos Scott Hauck P. Banerjee

Reconfigurable hardware has the potential for significant performance improvements by providing support for application-specific operations. We report our experience with Chimaera, a prototype system that integrates a small and fast reconfigurable functional unit (RFU) into the pipeline of an aggressive, dynamically-scheduled superscalar processor. Chimaera is capable of performing 9-input/1-output operations on integer data. We discuss a C compiler that automatically maps computations for execution in the RFU. ... of: (1)...

10.1109/isca.2000.854393 article EN 2002-11-07

Improving locality using loop and data transformations in an integrated framework

Mahmut Kandemir Alok Choudhary J. Ramanujam P. Banerjee

This paper presents a new integrated compiler framework for improving the cache performance of scientific applications. In addition to applying loop transformations, the method includes data layout optimizations, i.e., those that change the memory layouts of data structures (arrays in this case). A key characteristic of this approach is that loop transformations are used to improve temporal locality, while data layout optimizations are used to improve spatial locality. This optimization framework was used with sixteen loop nests from several benchmarks and math libraries, and the performance was measured...

10.5555/290940.290999 article EN 1998-11-01
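
The spatial-locality side of the abstract can be illustrated by counting unit-stride steps in the address trace of a 2-D loop nest. Unit stride is used here as a rough proxy for spatial locality. The minimal Python sketch below shows that a strided traversal can be made unit-stride either by interchanging the loops or by changing the array layout. It is an illustration under that simplifying assumption, not the paper's framework or cost model, and the names are hypothetical.

```python
from typing import Iterator, List, Tuple

def offset(i: int, j: int, n: int, m: int, layout: str) -> int:
    """Linear memory offset of A[i][j] in an n x m array."""
    if layout == "row":      # row-major (C convention)
        return i * m + j
    if layout == "col":      # column-major (Fortran convention)
        return j * n + i
    raise ValueError(f"unknown layout {layout!r}")

def iterations(n: int, m: int, loop_order: str) -> Iterator[Tuple[int, int]]:
    """(i, j) pairs in the order a doubly nested loop visits them."""
    if loop_order == "ij":   # i outer, j inner
        return ((i, j) for i in range(n) for j in range(m))
    if loop_order == "ji":   # j outer, i inner
        return ((i, j) for j in range(m) for i in range(n))
    raise ValueError(f"unknown loop order {loop_order!r}")

def unit_stride_fraction(n: int, m: int, loop_order: str, layout: str) -> float:
    """Fraction of consecutive accesses that move to the adjacent memory word."""
    trace: List[int] = [offset(i, j, n, m, layout) for i, j in iterations(n, m, loop_order)]
    steps = [b - a for a, b in zip(trace, trace[1:])]
    return sum(1 for s in steps if s == 1) / len(steps)

if __name__ == "__main__":
    for order in ("ij", "ji"):
        for layout in ("row", "col"):
            print(order, layout, unit_stride_fraction(4, 4, order, layout))
```

A j-outer nest over a row-major array never touches adjacent words. Interchanging the loops (a loop transformation) or switching the array to column-major (a data transformation) makes the same traversal fully unit-stride.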

Automatic generation of efficient array redistribution routines for distributed memory multicomputers

S. Ramaswamy P. Banerjee

Appropriate data distribution has been found to be critical for obtaining good performance on distributed memory multicomputers like the CM-5, Intel Paragon and IBM SP-1. It has also been found that some programs need to change their distributions during execution for better performance (redistribution). This work focuses on automatically generating efficient routines for redistribution. We present a new mathematical representation for regular distributions called PITFALLS and then discuss algorithms for redistribution based on this representation. A...

10.1109/fmpc.1995.380436 article EN 2002-11-19
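
Redistribution between two regular distributions must determine which elements each source processor sends to each destination processor. The minimal Python sketch below computes these communication sets for CYCLIC(b) distributions by brute-force enumeration of every element. This O(n) scan is exactly what a closed-form representation such as PITFALLS is meant to avoid, and the sketch does not implement PITFALLS. It assumes BLOCK is expressed as CYCLIC(ceil(n/p)), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def cyclic_owner(i: int, block: int, nprocs: int) -> int:
    """Owner of global index i under a CYCLIC(block) distribution."""
    return (i // block) % nprocs

def communication_sets(n: int, src_block: int, src_procs: int,
                       dst_block: int, dst_procs: int) -> Dict[Tuple[int, int], List[int]]:
    """Map (source proc, destination proc) -> global indices that must move.

    Entries with src == dst are local copies when both distributions
    use the same set of processors.
    """
    sets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i in range(n):
        src = cyclic_owner(i, src_block, src_procs)
        dst = cyclic_owner(i, dst_block, dst_procs)
        sets[(src, dst)].append(i)
    return dict(sets)

if __name__ == "__main__":
    # BLOCK over 2 processors is CYCLIC(4) for 8 elements; redistribute to CYCLIC(1).
    for (src, dst), idx in sorted(communication_sets(8, 4, 2, 1, 2).items()):
        print(f"P{src} -> P{dst}: {idx}")
```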

An evaluation of parallel simulated annealing strategies with application to standard cell placement

John A. Chandy Sungho Kim B. Ramkumar Steven Parkes P. Banerjee

Simulated annealing, a methodology for solving combinatorial optimization problems, is a very computationally expensive algorithm, and, as such, numerous researchers have undertaken efforts to parallelize it. In this paper, we investigate three of these parallel simulated annealing strategies when applied to standard cell placement, specifically the TimberWolfSC placement tool. We examined a parallel moves strategy, as well as two new approaches to parallel cell placement: multiple Markov chains and speculative computation. These...

10.1109/43.602476 article EN IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 1997-04-01
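
The parallel strategies above differ in how they spread the work of the basic annealing loop across processors. The minimal, sequential Python sketch below shows that loop for a toy placement problem. It assumes cells sit in 1-D slots, every net connects two cells, moves are pairwise swaps, cooling is geometric, and `t0`, `cooling`, `moves_per_t` and `t_min` are illustrative values. None of this is TimberWolfSC's actual cost function or schedule.

```python
import math
import random
from typing import List, Tuple

Net = Tuple[int, int]  # simplification: every net connects exactly two cells

def wirelength(pos: List[int], nets: List[Net]) -> int:
    """Total 1-D wirelength: sum of |slot(a) - slot(b)| over all nets."""
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def anneal(n_cells: int, nets: List[Net], t0: float = 10.0, cooling: float = 0.95,
           moves_per_t: int = 100, t_min: float = 0.01, seed: int = 0) -> Tuple[List[int], int]:
    """Sequential simulated annealing over cell-to-slot assignments; returns the best found."""
    rng = random.Random(seed)
    pos = list(range(n_cells))                 # pos[c] is the slot of cell c
    cost = wirelength(pos, nets)
    best_pos, best_cost = pos[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            a, b = rng.sample(range(n_cells), 2)
            pos[a], pos[b] = pos[b], pos[a]    # propose: swap two cells
            new_cost = wirelength(pos, nets)
            delta = new_cost - cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost                # accept (Metropolis rule)
                if cost < best_cost:
                    best_pos, best_cost = pos[:], cost
            else:
                pos[a], pos[b] = pos[b], pos[a]  # reject: undo the swap
        t *= cooling                           # geometric cooling schedule
    return best_pos, best_cost

if __name__ == "__main__":
    chain = [(0, 3), (3, 1), (1, 5), (5, 2), (2, 4)]  # a 6-cell chain in scrambled order
    print(wirelength(list(range(6)), chain))         # 14 for the initial placement
    print(anneal(6, chain))
```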

Improving locality using loop and data transformations in an integrated framework

M. Kandemir Alok Choudhary J. Ramanujam P. Banerjee

This paper presents a new integrated compiler framework for improving the cache performance of scientific applications. In addition to applying loop transformations, the method includes data layout optimizations, i.e., those that change the memory layouts of data structures (arrays in this case). A key characteristic of this approach is that loop transformations are used to improve temporal locality, while data layout optimizations are used to improve spatial locality. This optimization framework was used with sixteen loop nests from several benchmarks and math libraries, and the performance was measured...

10.1109/micro.1998.742790 article EN 2002-11-27

Overview of a compiler for synthesizing MATLAB programs onto FPGAs

P. Banerjee M. Haldar Amiya Nayak V. Kim Vishal Saxena and 7 more

This paper describes a behavioral synthesis tool called AccelFPGA, which reads in high-level descriptions of digital signal processing (DSP) applications written in MATLAB and automatically generates synthesizable register transfer level (RTL) models and simulation testbenches in VHDL or Verilog. The RTL models can be synthesized using commercial logic synthesis tools and mapped onto field-programmable gate arrays (FPGAs) with place-and-route tools. The paper describes how powerful directives are used to provide architectural tradeoffs for the DSP designer...

10.1109/tvlsi.2004.824301 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2004-03-01

Automatic Data Partitioning on Distributed Memory Multiprocessors

Manish Gupta P. Banerjee

Abstract: An important problem facing numerous research projects on parallelizing compilers for distributed memory machines is that of automatically determining a suitable data partitioning scheme for a program. Most of the current projects leave this tedious problem almost entirely to the user. In this paper, we present a novel approach to the problem of automatic data partitioning. We introduce the notion of constraints on data distribution, and show how a compiler can infer those constraints by looking at the data reference patterns in the source code of the program, and how these constraints may be combined to obtain...

10.1109/dmcc.1991.633082 article EN 2005-08-24

Design, analysis, and simulation of I/O architectures for hypercube multiprocessors

A. L. Narasimha Reddy P. Banerjee

Several issues concerning the design of an I/O (input/output) system for a multiprocessor such as a hypercube are examined. A methodology is proposed for connecting I/O processors to the hypercube for efficient I/O access. The effect of I/O communication on the network is analyzed. Different disk organizations that can be employed within the I/O system are evaluated to see which organization has better performance. It is observed that parallelism in serving an I/O request plays a dominant role in a scientific workload. The problem of mapping specific data structures such as matrices onto disks...

10.1109/71.80142 article EN IEEE Transactions on Parallel and Distributed Systems 1990-04-01

A hyperplane based approach for optimizing spatial locality in loop nests

M. Kandemir Alok Choudhary Niraj Shenoy P. Banerjee J. Ramanujam

ICS '98: Proceedings of the 12th International Conference on Supercomputing, July 1998, pages 69–76. Affiliations: M. Kandemir, EECS Dept., Syracuse University, Syracuse, NY; A. Choudhary, ECE Dept., Northwestern University, Evanston, IL; J. Ramanujam, Louisiana State University, Baton Rouge, LA. Published online 13 July 1998.

10.1145/277830.277849 article EN 1998-07-13

FPGA hardware synthesis from MATLAB

M. Haldar Amiya Nayak Niraj Shenoy Alok Choudhary P. Banerjee

Field Programmable Gate Arrays (FPGAs) have recently been used as an effective platform for implementing many image/signal processing applications. MATLAB is one of the most popular languages for modeling such applications. We present the MATCH compiler, which takes MATLAB as input and produces a hardware description in RTL VHDL that can be mapped to an FPGA using commercial CAD tools. This dramatically reduces the time needed to implement an application on an FPGA. We present results for some image and signal processing algorithms whose hardware was synthesized by our compiler for the Xilinx XC4028 FPGA with an external memory. We also...

10.1109/icvd.2001.902676 article EN 2002-11-13

Strategies for reconfiguring hypercubes under faults

P. Banerjee

The design of two reconfiguration strategies for hypercube multicomputer architectures under failures is discussed. The first scheme uses spare processors attached to certain processors in the hypercube by means of a novel embedding technique. The second approach places spare processors between specific links of the hypercube. Both schemes involve mapping logical (virtual) processors onto the set of physical processors in the final reconfigured hypercube, and hence suffer some performance degradation.

10.1109/ftcs.1990.89368 article EN 2002-12-04

Gracefully degradable disk arrays

A. L. Narasimha Reddy P. Banerjee

The problem of designing fault-tolerant disk arrays that are not susceptible to 100% load increases on the functional disks when one disk in the system fails is addressed. A technique that combines the advantages of parity schemes and traditional dual copy methods, and offers a wide variety of options for providing fault tolerance, is proposed. A theoretical framework for solving the problem is presented, and a number of constructive techniques are proposed. By utilizing the same amount of hardware as earlier schemes but with better data organization and different reconstruction...

10.1109/ftcs.1991.146692 article EN 2002-12-10
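
The parity side of the tradeoff above rests on XOR reconstruction. The minimal Python sketch below uses generic RAID-style parity, not the paper's combined parity/dual-copy organization. It shows that rebuilding a lost block requires reading every surviving block in its parity group, which is where the extra load on the functional disks comes from. The function names are hypothetical, and all blocks are assumed to have equal length.

```python
from functools import reduce
from typing import List

def parity(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def reconstruct(surviving: List[bytes], parity_block: bytes) -> bytes:
    """Rebuild the single missing data block of a parity group.

    Every surviving data block (one per functional disk) must be read.
    """
    return parity(surviving + [parity_block])

if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]  # one block per data disk
    p = parity(data)                               # stored on the parity disk
    print(reconstruct([data[0], data[2]], p) == data[1])  # disk 1 has failed
```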

Automatic translation of software binaries onto FPGAs

Gaurav Mittal David Zaretsky Xiaoyong Tang P. Banerjee

The introduction of advanced FPGA architectures with built-in DSP support has given designers a new hardware alternative. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. This paper describes the process and considerations for automatically translating binaries targeted for general-purpose processors into Register Transfer Level (RTL) VHDL or Verilog code to be mapped onto commercial FPGAs. The Texas Instruments C6000 processor architecture was chosen as the platform,...

10.1145/996566.996678 article EN 2004-06-07

An evaluation of system-level fault tolerance on the Intel hypercube multiprocessor

P. Banerjee J.T. Rahmeh Craig Stunkel Vivek Nair Kaushik Roy and 1 more

A fault-tolerant hypercube multiprocessor architecture is discussed that uses a novel algorithm-based fault-detection approach for identifying faulty processors. The scheme involves the detection and location of faulty processors concurrently with the actual execution of parallel applications on the hypercube. The authors have implemented system-level fault-detection mechanisms for various applications on a 16-processor Intel iPSC hypercube multiprocessor. They report results for two applications: matrix multiplication and fast Fourier transform. ... performed...

10.1109/ftcs.1988.5344 article EN 1988-01-01

Performance measurement and trace driven simulation of parallel CAD and numeric applications on a hypercube multicomputer

Jiun-Ming Hsu P. Banerjee

The performance evaluation, workload characterization, and trace-driven simulation of a hypercube multicomputer running realistic workloads are presented. Eleven representative parallel applications were selected as benchmarks. Software monitoring techniques were then used to collect execution traces. Based on the measurement results, both the computation and the communication behavior of these programs were investigated. The various time interval distributions were modeled by statistical functions, which were verified by nonlinear...

10.1109/71.149963 article EN IEEE Transactions on Parallel and Distributed Systems 1992-07-01

Tolerance determination for algorithm-based checks using simplified error analysis techniques

A. Roy-Chowdhury P. Banerjee

A scheme is presented for dealing with roundoff errors in algorithm-based fault tolerance methods, which complicate the check phases of the algorithm. The method is based on error analysis, incorporating some simplifications that result in easier derivation of bounds and more useful expressions in cases where the theoretical bound may be too wide to be of much use in a check phase. The simplified expressions are used to derive tolerances for three applications, and it is shown that fault-tolerant encodings of these applications using the authors' tolerances achieve high coverage with no false alarms...

10.1109/ftcs.1993.627332 article EN 2002-12-30

A system for synthesizing optimized FPGA hardware from Matlab®

M. Haldar Amiya Nayak Alok Choudhary P. Banerjee

Efficient high-level design tools that can map behavioral descriptions to FPGA architectures are one of the key requirements for fully leveraging FPGAs for high-throughput computations and meeting time-to-market pressures. We present a compiler that takes as input algorithms described in MATLAB and generates RTL VHDL. The RTL VHDL can then be mapped to FPGAs using existing commercial tools. The input application is mapped to multiple FPGAs by parallelizing the application and embedding communication and synchronization primitives automatically. Our compiler infers the minimum number of bits...

10.1109/iccad.2001.968639 article EN 2002-11-13
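
Inferring the minimum number of bits for each signal is commonly done by propagating value ranges through the computation. The minimal Python sketch below shows interval-based range analysis for addition and multiplication and the resulting bit widths. Whether the compiler above uses intervals in exactly this way is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

Range = Tuple[int, int]  # inclusive [lo, hi] value range of an integer signal

def bits_for_range(r: Range) -> int:
    """Fewest bits holding every value in r: unsigned if lo >= 0, else two's complement."""
    lo, hi = r
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        return max(1, hi.bit_length())
    width = 1
    while not (-(1 << (width - 1)) <= lo and hi <= (1 << (width - 1)) - 1):
        width += 1
    return width

def add_range(a: Range, b: Range) -> Range:
    """Range of x + y for x in a, y in b."""
    return (a[0] + b[0], a[1] + b[1])

def mul_range(a: Range, b: Range) -> Range:
    """Range of x * y: the extremes occur at the interval endpoints."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))

if __name__ == "__main__":
    pixel = (0, 255)                # e.g. an 8-bit unsigned input
    s = add_range(pixel, pixel)     # (0, 510)
    p = mul_range(pixel, (-4, 3))   # (-1020, 765)
    print(bits_for_range(s), bits_for_range(p))  # 9 11
```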

Compiling MATLAB programs to ScaLAPACK: exploiting task and data parallelism

S. Ramaswamy Eugene W. Hodges P. Banerjee

We suggest a new approach aimed at reducing the effort required to program distributed-memory multicomputers. The key idea in our approach is to automatically convert a program written in a library-based programming language (MATLAB) into a parallel program based on the ScaLAPACK parallel library. In the process of performing this conversion, we apply compiler optimizations that simultaneously exploit task and data parallelism. As our results show, the approach is feasible and practical, and our optimization provides significant performance benefits.

10.1109/ipps.1996.508120 article EN Proceedings of the International Conference on Parallel Processing 2002-12-23

Overview of the FREEDOM Compiler for Mapping DSP Software to FPGAs

David Zaretsky Gaurav Mittal Xiaoyong Tang P. Banerjee

Applications that require digital signal processing (DSP) functions are typically mapped onto general-purpose DSP processors. With the introduction of advanced FPGA architectures with built-in DSP support, a new hardware alternative is available for DSP designers. By exploiting their inherent parallelism, FPGAs are expected to outperform DSP processors. However, migrating assembly code to FPGAs is a very arduous process. This paper describes the process and considerations for automatically translating software binary codes targeted...

10.1109/fccm.2004.44 article EN 2004-12-23

Open Innovation at HP Labs

P. Banerjee Ralf Friedrich Lueny Morell

The role of HP Labs, the central research arm of Hewlett-Packard, is to look beyond the roadmap of currently offered products and services and to solve some of the exciting technical problems that will be critical to customers. The paper describes how active collaboration with academic, commercial, and government partners augments and accelerates knowledge creation and technology transfer.

10.1109/mc.2010.322 article EN Computer 2010-11-01

Design and Evaluation of Gracefully Degradable Disk Arrays

A. L. Narasimha Reddy John A. Chandy P. Banerjee

10.1006/jpdc.1993.1003 article EN Journal of Parallel and Distributed Computing 1993-01-01

A message passing coprocessor for distributed memory multicomputers

Jiun-Ming Hsu P. Banerjee

The authors present the architecture, methodology, and performance evaluation of a message-passing coprocessor (MPC) that can accelerate message communication in a distributed memory multicomputer (i.e., the iPSC/2 hypercube). The MPC is a microprogrammable processor that offloads the message-passing burden from the CPU and speeds up software message processing by directly executing message-passing instructions in microcode. It supports process scheduling, buffer management, and fast message copying. The most distinctive feature of the MPC is that it performs caching for expected...

10.1109/superc.1990.130092 article EN Proceedings - Supercomputing 2002-12-04

Dynamic template generation for resource sharing in control and data flow graphs

David Zaretsky Gaurav Mittal Robert P. Dick P. Banerjee

High-level synthesis compilers often produce recurring patterns in intermediate CDFGs during translation. By identifying large patterns, one may reduce area and communication overhead by efficiently reusing hardware for multiple operations. This paper presents an algorithm for dynamically generating templates for resource sharing in CDFGs. Results show a 40-80% reduction in area using small, incremental template growth, with variations within a 5% margin among varying look-ahead depths.

10.1109/vlsid.2006.75 article EN 2006-01-01
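
A natural first step toward template generation is to find which small operation patterns recur along data-flow edges. Hardware can only be reused across instances that do not overlap. The minimal Python sketch below counts two-node (producer, consumer) patterns in a toy CDFG and greedily picks disjoint instances of one pattern. It does not implement the paper's incremental template growth or look-ahead, and the graph encoding and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]       # data-flow edge: (producer node, consumer node)
Pattern = Tuple[str, str]    # (producer operation, consumer operation)

def edge_patterns(ops: Dict[str, str], edges: List[Edge]) -> Counter:
    """Count how often each (producer op, consumer op) pair occurs along data-flow edges."""
    return Counter((ops[src], ops[dst]) for src, dst in edges)

def disjoint_matches(ops: Dict[str, str], edges: List[Edge], pattern: Pattern) -> List[Edge]:
    """Greedily pick instances of a two-node pattern that share no nodes."""
    used: set = set()
    matches: List[Edge] = []
    for src, dst in edges:
        if (ops[src], ops[dst]) == pattern and src not in used and dst not in used:
            matches.append((src, dst))
            used.update((src, dst))
    return matches

if __name__ == "__main__":
    ops = {"m1": "mul", "a1": "add", "m2": "mul", "a2": "add", "s1": "sub"}
    edges = [("m1", "a1"), ("m2", "a2"), ("a1", "s1"), ("a2", "s1")]
    print(edge_patterns(ops, edges))
    print(disjoint_matches(ops, edges, ("mul", "add")))  # two reusable instances
    print(disjoint_matches(ops, edges, ("add", "sub")))  # only one: both share s1
```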