Peter M. Kogge

ORCID: 0000-0002-3329-547X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Advanced Data Storage Technologies
  • Interconnection Networks and Systems
  • Distributed and Parallel Computing Systems
  • Quantum-Dot Cellular Automata
  • Embedded Systems Design Techniques
  • Advanced Memory and Neural Computing
  • Low-power high-performance VLSI design
  • Quantum and electron transport phenomena
  • Cloud Computing and Resource Management
  • Distributed systems and fault tolerance
  • Advancements in Semiconductor Devices and Circuit Design
  • Cellular Automata and Applications
  • Semiconductor materials and devices
  • Graph Theory and Algorithms
  • Ferroelectric and Negative Capacitance Devices
  • Logic, programming, and type systems
  • Algorithms and Data Compression
  • Scientific Computing and Data Management
  • Data Management and Algorithms
  • Big Data and Business Intelligence
  • Quantum Computing Algorithms and Architecture
  • Computability, Logic, AI Algorithms
  • Quantum Information and Cryptography
  • Software System Performance and Reliability

University of Notre Dame
2015-2024

The Aerospace Corporation
2021

DELL (United States)
2016

Notre Dame University
2007

Sandia National Laboratories
2004

Rho (United States)
1994-2002

IBM (United States)
1973-1992

Stanford University
1973

An mth-order recurrence problem is defined as the computation of the series $x_1, x_2, \ldots, x_N$, where $x_i = f(x_{i-1}, \ldots, x_{i-m})$ for some function $f$. This paper uses...

10.1109/tc.1973.5009159 article EN IEEE Transactions on Computers 1973-08-01
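
As an illustration of the definition in the abstract above, an mth-order recurrence $x_i = f(x_{i-1}, \ldots, x_{i-m})$ can be evaluated one term at a time. The sketch below is a plain sequential evaluation, not the method of the paper; the names `solve_recurrence` and `initial` are hypothetical, and it assumes the m starting values are supplied oldest first.

```python
from typing import Callable, List, Sequence


def solve_recurrence(
    f: Callable[..., float], initial: Sequence[float], n: int
) -> List[float]:
    """Return [x_1, ..., x_n] for x_i = f(x_{i-1}, ..., x_{i-m}).

    `initial` holds the m starting values x_{1-m}, ..., x_0, oldest first
    (an assumption of this sketch; m must be at least 1).
    """
    m = len(initial)
    if m < 1:
        raise ValueError("need at least one initial value")
    history = list(initial)
    result = []
    for _ in range(n):
        # Pass the m most recent values, newest first: x_{i-1}, ..., x_{i-m}.
        x_i = f(*reversed(history[-m:]))
        history.append(x_i)
        result.append(x_i)
    return result


# Second-order example (m = 2): x_i = x_{i-1} + x_{i-2}, with x_{-1} = 0, x_0 = 1.
fib = solve_recurrence(lambda a, b: a + b, [0, 1], 5)

# First-order example (m = 1): x_i = 2 * x_{i-1} + 1, with x_0 = 0.
doubling = solve_recurrence(lambda a: 2 * a + 1, [0], 3)
```

Each step depends on the previous m results, so this loop takes N dependent steps; that serial dependence is what makes the problem interesting for parallel machines.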

Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Authors: Mary Hall (USC Information Sciences Institute, Marina del Rey, CA), Peter Kogge (University of Notre Dame, IN), Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Jay Brockman, Apoorv Srivastava, William Athas, Vincent Freeh, Jaewook Shin, Joonseok Park. SC '99: Proceedings of the 1999 ACM/IEEE Conference on Supercomputing, January...

10.1145/331532.331589 article EN 1999-01-01

The EXECUBE chip is a new single part type building block for MPP systems that scales seamlessly from a few chips (with a few hundred mips) to thousands of chips with petaop potential. Further, the architecture supports directly both SIMD and MIMD modes of processing, permitting not only the best of current parallel computing but also possible more conventional designs. This paper discusses the overall chip, the computational model it represents, some comparisons against the state of the art, how it might be used for real applications,...

10.1109/icpp.1994.108 article EN 1994-08-01

Register files (RF) represent a substantial portion of the energy budget in modern processors, and are growing rapidly with the trend towards wider instruction issue. The actual access costs depend greatly on the register file circuitry used. This paper compares various RF techniques for their efficiencies, as a function of architectural parameters such as the number of registers and ports. The Port Priority Selection technique was found to be the most efficient. The dependence upon technology scaling is also studied. However,...

10.1145/280756.280943 article EN 1998-01-01

In recent years, reducing power has become an important design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phases of microprocessor development, in particular, the stage of defining a chip microarchitecture. We investigate power-optimization techniques for superscalar microprocessors at the microarchitecture level that do not compromise performance. First, major targets for power reduction are identified within the microarchitecture, where power is heavily consumed or will be...

10.1109/12.910816 article EN IEEE Transactions on Computers 2001-03-01

The quantum cellular automata (QCA) is currently being investigated as an alternative to CMOS VLSI. While some simple logical circuits and devices have been studied, little if any work has been done in considering the architecture for systems of QCA devices. This work discusses progress in one of the first such efforts. Namely, the design of dataflow components for a microprocessor designed exclusively in QCA is discussed. Problems associated with the initial designs are enumerated, and solutions to these problems (usually stemming from...

10.1002/1097-007x(200101/02)29:1<49::aid-cta132>3.0.co;2-1 article EN International Journal of Circuit Theory and Applications 2001-01-01

An mth-order recurrence problem is defined as the computation of the sequence $x_1, \cdots, x_N$, where $x_i = f(a_i, x_{i-1}, \cdots, x_{i-m})$ and $a_i$ is some vector of parameters. This paper investigates general algorithms for solving...

10.1147/rd.182.0138 article EN IBM Journal of Research and Development 1974-03-01

We now have 20 years of data under our belt about the performance of supercomputers against at least a single floating-point benchmark from dense linear algebra. Until 2004, a single model of parallel programming, bulk synchronous using the MPI model, was sufficient to permit translation into reasonable parallel programs for more complex applications. Starting in 2004, however, a confluence of events changed forever the architectural landscape that underpinned MPI. The first half of this article goes into the underlying reasons for these changes,...

10.1109/mcse.2013.95 article EN Computing in Science & Engineering 2013-10-16

This paper compares the system performance evaluation cooperative (SPEC) Integer and Floating-Point suites to a set of real-world applications for high-performance computing at Sandia National Laboratories. These applications focus on the high-end scientific and engineering domains; however, the techniques presented in this paper are applicable to any application domain. The applications are compared in terms of three memory properties: 1) temporal locality (or reuse over time), 2) spatial locality (or the use of data "near" data that has already been accessed), and 3)...

10.1109/tc.2007.1039 article EN IEEE Transactions on Computers 2007-06-01

Supercomputers are now running our search engines and social networks. Modern supercomputers are based on groups of tightly interconnected microprocessors. In recent years, supercomputers have shaped our daily lives more directly.

10.1109/mspec.2011.5693074 article EN IEEE Spectrum 2011-01-21

There is growing evidence that current architectures do not well handle cache-unfriendly applications such as sparse math operations, data analytics, and graph algorithms. This is due, in part, to the irregular memory access patterns demonstrated by these applications, and to how remote memory accesses are handled. This paper introduces a new, highly-scalable PGAS memory-centric system architecture where migrating threads travel to the data they access. Scaling both memory capacities and the number of cores can be largely invisible...

10.1109/ia3.2016.007 article EN 2016-11-01

There is growing evidence that current architectures do not well handle cache-unfriendly applications such as sparse math operations, data analytics, and graph algorithms. This is due, in part, to the irregular memory access patterns demonstrated by these applications, and to how remote memory accesses are handled. This paper introduces a new, highly-scalable PGAS memory-centric system architecture where migrating threads travel to the data they access. Scaling both memory capacities and the number of cores can be largely invisible...

10.5555/3018843.3018845 article EN Irregular Applications: Architectures and Algorithms 2016-11-13

Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata (QCA). While the first such devices have been fabricated, little is known about how to design complete systems of them. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. Two theoretical QCA...

10.1145/337292.337398 article EN Proceedings of the 40th conference on Design automation - DAC '03 2000-01-01

Pipelining is a technique that has long since been considered fundamental by computer architects. However, the world of nanoelectronics is pushing the idea of pipelining to new and lower levels, particularly the device level. How this affects circuits and the relationship between their timing, architecture, and design will be studied in the context of an inherently self-latching nanotechnology termed Quantum Cellular Automata (QCA). Results indicate that this offers the potential for “free” multi-threading and “processing-in-wire”. All...

10.1145/379240.379261 article EN 2001-01-01

This paper presents the Quantum-Dot Cellular Automata (QCA) physical design problem, in the context of the VLSI physical design problem. The problem is divided into three subproblems: partitioning, placement, and routing of QCA circuits. The paper presents an ILP formulation and a heuristic solution to the partitioning problem, and compares the two sets of results. Additionally, we compare a human-generated circuit to the ILP and Heuristic solutions. The results demonstrate that the heuristic is a practical method of reducing run time while providing a result that is close to the optimal for a given circuit.

10.1145/996566.996671 article EN 2004-06-07

In recent years reducing power has become a critical design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phase of microprocessor development. We propose a methodology for power-optimization at the micro-architectural level. First, major targets for power reduction are identified within the superscalar microarchitecture, then an optimization of the micro-architecture is performed that generates a set of energy-efficient configurations forming a convex hull in the power-performance space....

10.1145/344166.344522 article EN 2000-01-01

A new 5 V 0.8 μm CMOS technology merges 100 K custom circuits and 4.5 Mb of DRAM onto a single die that supports both high density memory and significant computing logic. One of the first chips built with this technology implements a unique Processor-In-Memory (PIM) computer architecture termed EXECUBE that has 8 separate 25 MHz CPU macros and 16 separate 32 K × 9 b DRAM macros on a single die. These macros are organized together to provide a single part type for scaleable massively parallel processing applications, particularly embedded ones where...

10.1109/arvlsi.1995.515607 article EN 2002-11-19

The TOP500 is a treasure trove of information on the leading edge of high performance computing. It was used in the 2008 DARPA Exascale technology report to isolate out the effects of architecture and technology on high performance computing, and to lay the groundwork to project how current systems might mature through the coming years. Two particular classes of architectures were identified: "heavy-weight" (based on high end commodity microprocessors) and "lightweight" (primarily BlueGene variants), and projections were made of performance, concurrency, memory capacity, and power....

10.1145/2063384.2063421 article EN 2011-11-08

This paper introduces an architecture for quantum-dot cellular automata circuits with the potential for high throughput and low power dissipation. The combination of regions with Bennett clocking and memory storage combines the advantage of reversible computing with pipelining. Two case studies are initially presented to evaluate the proposed pipelined architecture in terms of throughput and power consumption due to information dissipation. A general model for assessing power consumption is also proposed. The model shows that the advantages possible by using a Bennett clocking scheme depend on circuit topology, thus...

10.1109/tnano.2011.2147796 article EN IEEE Transactions on Nanotechnology 2011-05-09

Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata. While the first such devices have been fabricated, little is known about how to design complete systems of them. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. The projections are striking: projected 10 1...

10.1109/glsv.1999.757390 article EN 2003-01-20

This paper is a summary of a proposal submitted to the NSF 100 Tera Flops Point Design Study. Its main thesis is that the use of Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes a strawman organization of one potential PIM chip, along with how multiple such chips might be organized into a real system, what the software supporting such a system might look like, and several applications that we will be attempting to place onto such a system.

10.1109/fmpc.1996.558065 article EN 2002-12-23