- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Distributed and Parallel Computing Systems
- Quantum-Dot Cellular Automata
- Embedded Systems Design Techniques
- Advanced Memory and Neural Computing
- Low-power high-performance VLSI design
- Quantum and electron transport phenomena
- Cloud Computing and Resource Management
- Distributed systems and fault tolerance
- Advancements in Semiconductor Devices and Circuit Design
- Cellular Automata and Applications
- Semiconductor materials and devices
- Graph Theory and Algorithms
- Ferroelectric and Negative Capacitance Devices
- Logic, programming, and type systems
- Algorithms and Data Compression
- Scientific Computing and Data Management
- Data Management and Algorithms
- Big Data and Business Intelligence
- Quantum Computing Algorithms and Architecture
- Computability, Logic, AI Algorithms
- Quantum Information and Cryptography
- Software System Performance and Reliability
University of Notre Dame
2015-2024
The Aerospace Corporation
2021
DELL (United States)
2016
Notre Dame University
2007
Sandia National Laboratories
2004
Rho (United States)
1994-2002
IBM (United States)
1973-1992
Stanford University
1973
An mth-order recurrence problem is defined as the computation of the series $x_1, x_2, \ldots, x_N$, where $x_i = f(x_{i-1}, \ldots, x_{i-m})$ for some function $f$. This paper uses...
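As a minimal illustration of the definition above (not the method of the paper, whose abstract is truncated here), the following Python sketch evaluates an mth-order recurrence sequentially. The helper name `solve_recurrence` and the Fibonacci example are assumptions chosen only for illustration.

```python
from typing import Callable, List, Sequence


def solve_recurrence(f: Callable[..., float],
                     initial: Sequence[float],
                     n: int) -> List[float]:
    """Compute x_1..x_n where x_i = f(x_{i-1}, ..., x_{i-m}).

    The order m is taken from the number of initial values supplied
    (a hypothetical convention for this sketch).
    """
    m = len(initial)
    if m < 1:
        raise ValueError("at least one initial value is required")
    xs = list(initial[:n])
    for _ in range(len(xs), n):
        # Pass the m most recent values, newest first: x_{i-1}, ..., x_{i-m}.
        xs.append(f(*reversed(xs[-m:])))
    return xs


# Second-order example (m = 2): the Fibonacci recurrence x_i = x_{i-1} + x_{i-2}.
fib = solve_recurrence(lambda prev1, prev2: prev1 + prev2, [1, 1], 10)
print(fib)
```

Each term depends on the m terms before it, so this direct evaluation takes N sequential steps; that dependency chain is what makes recurrences a natural target for parallel algorithms.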
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Authors: Mary Hall (USC Information Sciences Institute, Marina del Rey, CA), Peter Kogge (University of Notre Dame, IN), Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Jay Brockman, Apoorv Srivastava, William Athas, Vincent Freeh, Jaewook Shin, Joonseok Park. SC '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing, January...
The EXECUBE chip is a new single part type building block for MPP systems that scales seamlessly from a few chips (with a few hundred mips) to thousands of chips with petaop potential. Further, the architecture directly supports both SIMD and MIMD modes of processing, permitting not only the best of current parallel computing but also possible more conventional designs. This paper discusses the overall chip, the computational model it represents, some comparisons against the state of the art, how it might be used in real applications,...
Register files (RF) represent a substantial portion of the energy budget in modern processors, and are growing rapidly with the trend towards wider instruction issue. The actual access costs depend greatly on the register file circuitry used. This paper compares various RF techniques for their efficiencies, as a function of architectural parameters such as the number of registers and ports. The Port Priority Selection technique was found to be the most efficient. The dependence upon technology scaling is also studied. However,...
In recent years, reducing power has become an important design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phases of microprocessor development, in particular, the stage of defining a chip microarchitecture. We investigate power-optimization techniques for superscalar microprocessors at the microarchitecture level that do not compromise performance. First, major targets for power reduction are identified within the microarchitecture, where power is heavily consumed or will be...
Quantum cellular automata (QCA) is currently being investigated as an alternative to CMOS VLSI. While some simple logical circuits and devices have been studied, little if any work has been done in considering the architecture for systems of QCA devices. This paper discusses the progress of one of the first such efforts. Namely, the design of the dataflow components of a microprocessor designed exclusively in QCA is discussed. Problems associated with initial designs are enumerated, and solutions to these problems (usually stemming from...
An mth-order recurrence problem is defined as the computation of the sequence $x_1, \ldots, x_N$, where $x_i = f(a_i, x_{i-1}, \ldots, x_{i-m})$ and $a_i$ is some vector of parameters. This paper investigates general algorithms for solving...
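To make the role of the per-step parameters concrete, the sketch below solves a first-order linear recurrence, x_i = a_i * x_{i-1} + b_i, by composing affine maps in a doubling pattern. This is one well-known parallel-friendly approach and is shown only as an illustration; the truncated abstract does not say which algorithms the paper develops. The pair (a_i, b_i) as the parameter vector and all function names are assumptions.

```python
from typing import List, Tuple

Affine = Tuple[float, float]  # (a, b) represents the map x -> a * x + b


def compose(later: Affine, earlier: Affine) -> Affine:
    """Return the single map equivalent to applying `earlier`, then `later`."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)


def scan_doubling(params: List[Affine]) -> List[Affine]:
    """Inclusive prefix composition in about log2(N) rounds."""
    acc = list(params)
    step = 1
    while step < len(acc):
        # Every entry in a round is computed from the previous round only,
        # so the updates are independent and could run in parallel.
        acc = [compose(acc[i], acc[i - step]) if i >= step else acc[i]
               for i in range(len(acc))]
        step *= 2
    return acc


def solve_linear(params: List[Affine], x0: float) -> List[float]:
    """x_i = a_i * x_{i-1} + b_i, obtained by applying each prefix map to x0."""
    return [a * x0 + b for a, b in scan_doubling(params)]


def solve_sequential(params: List[Affine], x0: float) -> List[float]:
    """Reference implementation: direct left-to-right evaluation."""
    out, x = [], x0
    for a, b in params:
        x = a * x + b
        out.append(x)
    return out


params = [(2, 1), (3, 0), (1, 4), (2, -1)]
print(solve_linear(params, 1))
print(solve_sequential(params, 1))
```

The doubling scheme works here because composition of affine maps is associative; recurrences whose step function lacks such a structure cannot be regrouped this way.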
We now have 20 years of data under our belt about the performance of supercomputers against at least a single floating-point benchmark from dense linear algebra. Until 2004, a single model of parallel programming, bulk synchronous using the MPI model, was sufficient to permit translation into reasonable parallel programs for more complex applications. Starting in 2004, however, a confluence of events changed forever the architectural landscape that underpinned MPI. The first half of this article goes into the underlying reasons for these changes,...
This paper compares the System Performance Evaluation Cooperative (SPEC) Integer and Floating-Point suites to a set of real-world applications for high-performance computing at Sandia National Laboratories. These applications focus on high-end scientific and engineering domains; however, the techniques presented in this paper are applicable to any application domain. The applications are compared in terms of three memory properties: 1) temporal locality (or reuse over time), 2) spatial locality (or the use of data "near" data that has already been accessed), 3)...
Supercomputers are now running our search engines and social networks. Modern supercomputers are based on groups of tightly interconnected microprocessors. In recent years, supercomputers have shaped our daily lives more directly.
There is growing evidence that current architectures do not handle cache-unfriendly applications well, such as sparse math operations, data analytics, and graph algorithms. This is due, in part, to the irregular memory access patterns demonstrated by these applications, and to how remote memory accesses are handled. This paper introduces a new, highly-scalable PGAS memory-centric system architecture where migrating threads travel to the data they access. Scaling both memory capacities and the number of cores can be largely invisible...
Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata (QCA). While the first such devices have been fabricated, little is known about how to design complete systems of them. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. T o theoretical QCA...
Pipelining is a technique that has long been considered fundamental by computer architects. However, the world of nanoelectronics is pushing the idea of pipelining to new and lower levels, particularly the device level. How this affects circuits and the relationship between their timing, architecture, and design will be studied in the context of an inherently self-latching nanotechnology termed Quantum Cellular Automata (QCA). Results indicate that QCA offers the potential for “free” multi-threading and “processing-in-wire”. All...
This paper presents the Quantum-Dot Cellular Automata (QCA) physical design problem, in the context of the VLSI physical design problem. The problem is divided into three subproblems: partitioning, placement, and routing of QCA circuits. This paper presents an ILP formulation and a heuristic solution to the partitioning problem and compares the two sets of results. Additionally, we compare a human-generated circuit to the ILP and heuristic solutions. The results demonstrate that the heuristic is a practical method of reducing run time while providing a result that is close to optimal for a given circuit.
In recent years, reducing power has become a critical design goal for high-performance microprocessors. This work attempts to bring the power issue to the earliest phase of microprocessor development. We propose a methodology for power-optimization at the micro-architectural level. First, major targets for power reduction are identified within the superscalar microarchitecture, then an optimization of the micro-architecture is performed that generates a set of energy-efficient configurations forming a convex hull in the power-performance space....
A new 5 V 0.8 μm CMOS technology merges 100 K custom circuits and 4.5 Mb of DRAM onto a single die that supports both high density memory and significant computing logic. One of the first chips built with this technology implements a unique Processor-In-Memory (PIM) computer architecture termed EXECUBE and has 8 separate 25 MHz CPU macros and 16 separate 32 K × 9 b DRAM macros on a single die. These macros are organized together to provide a single part type for scaleable massively parallel processing applications, particularly embedded ones where...
The TOP500 is a treasure trove of information on the leading edge of high performance computing. It was used in the 2008 DARPA Exascale technology report to isolate out the effects of architecture and technology on high performance computing, and to lay the groundwork to project how current systems might mature through the coming years. Two particular classes of architectures were identified: "heavy-weight" (based on high end commodity microprocessors) and "lightweight" (primarily BlueGene variants), and projections were made for performance, concurrency, memory capacity, and power....
This paper introduces an architecture for quantum-dot cellular automata circuits with the potential for high throughput and low power dissipation. The combination of regions with Bennett clocking and memory storage combines the advantage of reversible computing with pipelining. Two case studies are initially presented to evaluate the proposed pipelined architecture in terms of throughput and power consumption due to information dissipation. A general model for assessing power consumption is also proposed. This paper shows that the advantages possible by using a Bennett clocking scheme depend on circuit topology, thus...
Despite the seemingly endless upwards spiral of modern VLSI technology, many experts are predicting a hard wall for CMOS in about a decade. Given this, researchers continue to look at alternative technologies, one of which is based on quantum dots, called quantum cellular automata. While the first such devices have been fabricated, little is known about how to design complete systems. This paper summarizes one of the first such studies, namely an attempt to design a complete, albeit simple, CPU in the technology. The projections are striking: projected 10 1...
This paper is a summary of a proposal submitted to the NSF 100 Tera Flops Point Design Study. Its main thesis is that the use of Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes a strawman organization of one potential PIM chip, along with how multiple such chips might be organized into a real system, what the software supporting such a system might look like, and several applications that we will be attempting to place onto such a system.