Michael Schulte

ORCID: 0000-0003-1305-406X
Research Areas
  • Parallel Computing and Optimization Techniques
  • Numerical Methods and Algorithms
  • Low-power high-performance VLSI design
  • Embedded Systems Design Techniques
  • Interconnection Networks and Systems
  • Digital Filter Design and Implementation
  • Analog and Mixed-Signal Circuit Design
  • Advanced Data Storage Technologies
  • Distributed and Parallel Computing Systems
  • Radiation Effects in Electronics
  • Advanced Wireless Communication Techniques
  • Cryptography and Residue Arithmetic
  • Particle Detector Development and Performance
  • Particle physics theoretical and experimental studies
  • Algorithms and Data Compression
  • Computational Physics and Python Applications
  • Real-Time Systems Scheduling
  • Optical measurement and interference techniques
  • Wireless Communication Networks Research
  • Coding theory and cryptography
  • VLSI and Analog Circuit Testing
  • 3D IC and TSV technologies
  • Advanced Memory and Neural Computing
  • Polynomial and algebraic computation
  • Surface Roughness and Optical Measurements

Advanced Micro Devices (Canada)
2002-2024

Advanced Micro Devices (United States)
2010-2023

University of Wisconsin–Madison
2005-2019

IEEE Computer Society
2013

Universidad de Málaga
2007-2010

TU Dortmund University
2010

Bremen Institute for Applied Beam Technology
2007

Madison Group (United States)
2007

University of Wisconsin System
2004-2006

Lehigh University
1998-2005

The set-top and portable device market continues to grow, as does the demand for more performance under increasing cost, power, and thermal constraints. The integration of Graphics Processing Units (GPUs) into these devices and the emergence of general-purpose computations on graphics hardware enable a new set of highly parallel applications. In this paper, we propose and make the case for a GPU multitasking technique called spatial multitasking. Traditional techniques, such as cooperative and preemptive multitasking, partition time...

10.1109/hpca.2012.6168946 article EN 2012-02-01

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. We present two novel designs for fixed-point decimal multiplication that utilize carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses carry-save addition in the iterative portion of the design is presented. Then, a second design is proposed with several notable improvements, including fast generation of multiples that do not need to be stored,...

10.1109/asap.2003.1212858 article EN 2004-03-01

This article provides an overview of AMD's vision for exascale computing, and in particular, how heterogeneity will play a central role in realizing this vision. Exascale computing requires high levels of performance capabilities while staying within stringent power budgets. Using hardware optimized for specific functions is much more energy efficient than implementing those functions with general-purpose cores. However, there is a strong desire for supercomputer customers not to have to pay for custom components designed only...

10.1109/mm.2015.71 article EN IEEE Micro 2015-07-01

State-of-the-art graphic processing units (GPUs) provide very high memory bandwidth, but the performance of many general-purpose GPU (GPGPU) workloads is still bounded by memory bandwidth. Although compression techniques have been adopted by commercial GPUs, they are only used for compressing texture and color data, not data for GPGPU workloads. Furthermore, the microarchitectural details of compression are proprietary and its benefits have not been previously published. In this paper, we first investigate the required changes to support lossless...

10.1145/2370816.2370864 article EN 2012-09-19

The challenges to push computing to exaflop levels are difficult given the desired targets for memory capacity, memory bandwidth, power efficiency, reliability, and cost. This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The ENA consists of an Exascale Heterogeneous Processor (EHP) coupled with an advanced memory system. The EHP provides a high-performance accelerated processing unit...

10.1109/hpca.2017.42 article EN 2017-02-01

10.1023/a:1008004523235 article EN The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology 1999-01-01

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that utilizes a simple recoding scheme to produce signed-magnitude representations of the operands, thereby greatly simplifying the process of generating partial products for each multiplier digit. The partial products are generated using a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit...

10.1109/arith.2005.15 article EN 2005-07-27

There is increasing interest in hardware support for decimal arithmetic as a result of recent growth in commercial, financial, and Internet-based applications. Consequently, new specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for Floating-Point Arithmetic. This paper introduces and analyzes three techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Two of the techniques speculate BCD correction values and correct intermediate results while adding the input operands. The first speculates over...

10.1109/tc.2005.129 article EN IEEE Transactions on Computers 2005-06-22
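
The abstract above mentions speculating BCD correction values while adding operands. As a rough illustration only, the following Python sketch shows the textbook form of that idea: add 6 to each digit sum up front, then undo the correction when no decimal carry results. It folds several operands through a two-operand adder and is not one of the three techniques from the paper. The names `to_bcd`, `from_bcd`, `bcd_add`, and `bcd_sum` are hypothetical.

```python
def to_bcd(value, digits):
    """Pack a non-negative integer into BCD, one decimal digit per 4-bit nibble."""
    packed = 0
    for d in range(digits):
        packed |= (value % 10) << (4 * d)
        value //= 10
    return packed


def from_bcd(packed, digits):
    """Unpack a BCD word back into a Python integer."""
    return sum(((packed >> (4 * d)) & 0xF) * 10 ** d for d in range(digits))


def bcd_add(x, y, digits):
    """Add two packed BCD words digit by digit with a speculative +6 correction."""
    result, carry = 0, 0
    for d in range(digits):
        a = (x >> (4 * d)) & 0xF
        b = (y >> (4 * d)) & 0xF
        s = a + b + carry + 6      # speculate that this digit overflows past 9
        if s >= 16:                # speculation was right: a decimal carry occurs
            carry, s = 1, s - 16
        else:                      # speculation was wrong: remove the correction
            carry, s = 0, s - 6
        result |= s << (4 * d)
    return result, carry


def bcd_sum(operands, digits):
    """Sum several BCD operands (assumption: carries out of the top digit are dropped)."""
    total = 0
    for op in operands:
        total, _ = bcd_add(total, op, digits)
    return total
```

For example, `from_bcd(bcd_sum([to_bcd(v, 4) for v in (1234, 5678, 999)], 4), 4)` sums three four-digit operands.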

State-of-the-art graphic processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, it is often limited by a power constraint. Although throughput can be increased with more cores for some applications, it cannot for others because the parallelism and/or the bandwidth of on-chip interconnects/caches and off-chip memory are limited. In this paper, first, we demonstrate that...

10.1109/pact.2011.17 article EN International Conference on Parallel Architectures and Compilation Techniques 2011-10-01

Sudden variations in current (large di/dt) can lead to significant power supply voltage droops and timing errors in modern microprocessors. Several papers discuss the complexity involved with developing test programs, also known as stress marks, to stress the system. Authors of these papers produced tools and methodologies to generate stress marks automatically using techniques such as integer linear programming or genetic algorithms. However, nearly all of the previous work took place in the context of single-core systems, and results were collected...

10.1109/micro.2012.28 article EN 2012-12-01

High-throughput and low-latency sorting is a key requirement in many applications that deal with large amounts of data. This paper presents efficient techniques for designing high-throughput, low-latency sorting units. Our architectures utilize modular design techniques to hierarchically construct sorting units from smaller building blocks. The units are optimized for situations in which only the M largest numbers of N inputs are needed, because this situation commonly occurs in scientific computing, data mining, network processing, digital signal...

10.1109/tc.2012.108 article EN IEEE Transactions on Computers 2012-05-30

The peak compute performance of GPUs has been increased by integrating more resources and operating them at higher frequency. However, such approaches significantly increase the power consumption of GPUs, limiting further performance increases due to the power constraint. Facing such a challenge, we propose three techniques to improve efficiency in this paper. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an...

10.1109/hpca.2013.6522330 article EN 2013-02-01

With technology scaling, manufacturers are integrating both CPU and GPU cores in a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the power budget shared between the CPU and the GPU must be effectively utilized. At the same time, in an SCHP, the CPU and the GPU must each satisfy its own power constraint. Furthermore, the power allocated to each impacts performance. In this paper, using a detailed cycle-level simulator, we first demonstrate that joint optimization of workload partitioning and power budget partitioning can provide 13%...

10.1145/2370816.2370873 article EN 2012-09-19

Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products are rounded to avoid growth in word size. The power dissipation and area can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of truncated multipliers. Simulations...

10.1109/lpd.1999.750404 article EN 1999-01-01
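
The truncated multiplication idea in the abstract above can be sketched in a few lines of Python. This is a behavioral model under stated assumptions, not the paper's hardware design: operands are unsigned n-bit integers, k is the number of columns kept below the n-bit result, and `correction` is a hypothetical constant standing in for the estimate of the carries from the omitted columns. All function names are illustrative.

```python
def partial_product_sum(a, b, n, first_col):
    """Sum the bits a_i * b_j of the n x n partial product matrix whose column
    index i + j is at least first_col; lower columns are left out."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= first_col and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total


def truncated_multiply(a, b, n, k, correction=0):
    """Approximate the rounded upper n bits of a * b.

    Only the upper n + k columns of the matrix are summed. `correction` is an
    assumed constant, in units of the lowest kept column, that estimates the
    carries the omitted columns would have produced.
    """
    first_col = n - k
    total = partial_product_sum(a, b, n, first_col)
    total += correction << first_col   # estimate of the omitted carries
    total += 1 << (n - 1)              # rounding constant: half an output LSB
    return total >> n


def rounded_multiply(a, b, n):
    """Reference: full 2n-bit product rounded to its upper n bits."""
    return (a * b + (1 << (n - 1))) >> n
```

With k equal to n, no columns are omitted and the result matches `rounded_multiply`. With n = 8 and k = 3 and no correction, the omitted columns sum to less than one output LSB, so the result is at most one below the reference.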

Column compression multipliers are frequently used in high-performance computer systems due to their short worst case delay. This paper examines the area, delay, and power characteristics of Dadda (1965) and Wallace (1964) column compression multipliers in deep submicron technology. Our analysis shows that [...] multipliers have slightly more area and approximately the same delay as [...] multipliers. It also shows the importance of considering parasitic capacitances when determining [...] of multipliers, since parasitics can increase multiplier [...] by over 60%. As size...

10.1109/arith.2001.930101 article EN 2002-11-13

Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first offering low latency and high throughput. The design is based on a previously published fixed-point multiplier which uses alternate digit encodings to reduce area and delay....

10.1109/iccd.2007.4601916 article EN 2007-10-01

The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a conflicting demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backward compatibility for x86-compatible microprocessors, which includes support for x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents...

10.1109/tc.2008.203 article EN IEEE Transactions on Computers 2008-11-06

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of two decimal floating-point multipliers: one whose partial product accumulation strategy employs decimal carry-save addition and one that employs binary carry-save addition. The multiplier based on decimal carry-save addition favors a nonpipelined iterative implementation. The multiplier utilizing binary carry-save addition allows for an efficient pipelined implementation when latency...

10.1109/tc.2008.218 article EN IEEE Transactions on Computers 2008-12-18

Per-core voltage domains can improve performance under a power constraint. Most commercial processors, however, only have a single voltage domain for all processor cores. This is because splitting the voltage domain into per-core voltage domains and powering them with multiple off-chip voltage regulators (VRs) incur a high cost for the platform and package designs. Although using on-chip switching VRs can be an alternative solution, integrating high-quality inductors with cores has been a technical challenge. In this paper, we propose a cost-effective power delivery...

10.1109/tvlsi.2013.2257900 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2013-05-29

Due to the rapid growth in financial, commercial, and Internet-based applications, there is an increasing desire to allow computers to operate on both binary and decimal floating-point numbers. Consequently, specifications for decimal floating-point support are being added to the IEEE-754 Standard for Floating-Point Arithmetic. In this paper, we present the design and implementation of a decimal floating-point adder that is compliant with the current draft revision of this standard. The adder supports operations on 64-bit (16-digit) decimal floating-point operands. We provide synthesis results indicating...

10.1109/isvlsi.2004.1339563 article EN IEEE Computer Society Annual Symposium on VLSI 2004-10-04

Decimal arithmetic is often used in commercial, financial, and Internet-based applications. Due to the growing importance of decimal floating-point (DFP) arithmetic, the IEEE 754-2008 Standard for Floating-Point Arithmetic (IEEE 754-2008) includes specifications for DFP arithmetic. IBM recently announced adding DFP instructions to their POWER6, z9, and z10 microprocessor architectures. As processor support for DFP arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common DFP arithmetic operations. This paper...

10.1109/tc.2008.147 article EN IEEE Transactions on Computers 2008-08-19

Media processing applications typically involve large amounts of data-level parallelism and operate on low-precision operands. This paper presents multiplier architectures for multimedia processing and compares them to conventional general-purpose multipliers in terms of area and delay. The proposed architectures support subword parallelism and additional features, which enhance their performance in multimedia applications, yet require only slightly more area and delay than multipliers for general-purpose processing.

10.1109/acssc.2003.1292369 article EN 2004-07-08

Barrel shifters are often utilized by embedded digital signal processors and general-purpose processors to manipulate data. This paper examines design alternatives for barrel shifters that perform the following functions: shift right logical, shift right arithmetic, rotate right, shift left logical, shift left arithmetic, and rotate left. Four different barrel shifter designs are presented and compared in terms of area and delay for a variety of operand sizes. This paper also examines techniques for detecting results that overflow and results of zero in parallel with the shift or rotate operation. Several Java programs are developed to generate structural VHDL...

10.1117/12.452034 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2002-12-01
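
The six shift and rotate functions listed in the abstract above can be modeled behaviorally in Python as a logarithmic barrel shifter, where stage s moves the word by 2^s positions when bit s of the shift amount is set. This is a generic sketch, not one of the four designs compared in the paper. The function name `barrel_shift` and the operation labels are hypothetical, the word width n is assumed to be a power of two, and shift left arithmetic is modeled as producing the same result bits as shift left logical, with no overflow detection.

```python
def barrel_shift(x, amount, n, op):
    """Shift or rotate the n-bit word x by `amount` (0 <= amount < n).

    op is one of: "srl", "sra", "ror", "sll", "sla", "rol".
    """
    mask = (1 << n) - 1
    x &= mask
    sign = x >> (n - 1)                       # sign bit, used by "sra"
    for stage in range((n - 1).bit_length()):
        if not (amount >> stage) & 1:
            continue                          # this stage passes the word through
        d = 1 << stage                        # distance moved by this stage
        if op == "srl":
            x >>= d
        elif op == "sra":
            fill = (mask << (n - d)) & mask if sign else 0
            x = (x >> d) | fill               # replicate the sign into vacated bits
        elif op == "ror":
            x = ((x >> d) | (x << (n - d))) & mask
        elif op in ("sll", "sla"):
            x = (x << d) & mask
        elif op == "rol":
            x = ((x << d) | (x >> (n - d))) & mask
        else:
            raise ValueError(f"unknown operation: {op}")
    return x
```

For instance, `barrel_shift(0b10010110, 3, 8, "ror")` rotates an 8-bit word right by three positions using the first two stages.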