NFDI4DS | UHH-SEMS - Publication Details

Michael Schulte

ORCID: 0000-0003-1305-406X

About

Contact & Profiles

A5103196891

Research Areas

Parallel Computing and Optimization Techniques
Numerical Methods and Algorithms
Low-power high-performance VLSI design
Embedded Systems Design Techniques
Interconnection Networks and Systems
Digital Filter Design and Implementation
Analog and Mixed-Signal Circuit Design
Advanced Data Storage Technologies
Distributed and Parallel Computing Systems
Radiation Effects in Electronics
Advanced Wireless Communication Techniques
Cryptography and Residue Arithmetic
Particle Detector Development and Performance
Particle physics theoretical and experimental studies
Algorithms and Data Compression
Computational Physics and Python Applications
Real-Time Systems Scheduling
Optical measurement and interference techniques
Wireless Communication Networks Research
Coding theory and cryptography
VLSI and Analog Circuit Testing
3D IC and TSV technologies
Advanced Memory and Neural Computing
Polynomial and algebraic computation
Surface Roughness and Optical Measurements

Advanced Micro Devices (Canada)
2002-2024

Advanced Micro Devices (United States)
2010-2023

University of Wisconsin–Madison
2005-2019

IEEE Computer Society
2013

Universidad de Málaga
2007-2010

TU Dortmund University
2010

Bremen Institute for Applied Beam Technology
2007

Madison Group (United States)
2007

University of Wisconsin System
2004-2006

Lehigh University
1998-2005

The case for GPGPU spatial multitasking

OPENALEX - Publications

Jacob T. Adriaens Katherine Compton Nam Sung Kim Michael Schulte

The set-top and portable device market continues to grow, as does the demand for more performance under increasing cost, power, and thermal constraints. The integration of Graphics Processing Units (GPUs) into these devices and the emergence of general-purpose computations on graphics hardware enable a new set of highly parallel applications. In this paper, we propose and make the case for a GPU multitasking technique called spatial multitasking. Traditional multitasking techniques, such as cooperative and preemptive multitasking, partition time...

10.1109/hpca.2012.6168946 article EN 2012-02-01

Decimal multiplication via carry-save addition

M.A. Erle Michael Schulte

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. We present two novel designs for fixed-point decimal multiplication that utilize decimal carry-save addition to reduce the critical path delay. First, a multiplier that stores a reduced number of multiplicand multiples and uses carry-save addition in the iterative portion of the design is presented. Then, a second multiplier design is proposed with several notable improvements, including fast generation of multiplicand multiples that do not need to be stored,...

10.1109/asap.2003.1212858 article EN 2004-03-01

An Overview of Reconfigurable Hardware in Embedded Systems

Philip Garcia Katherine Compton Michael Schulte Emily Blem Wenyin Fu

10.1155/es/2006/56320 article EN EURASIP Journal on Embedded Systems 2006-01-01

Achieving Exascale Capabilities through Heterogeneous Computing

Michael Schulte Mike Ignatowski Gabriel H. Loh Bradford M. Beckmann William C. Brantley and 5 more

This article provides an overview of AMD's vision for exascale computing, and in particular, how heterogeneity will play a central role in realizing this vision. Exascale computing requires high levels of performance capabilities while staying within stringent power budgets. Using hardware optimized for specific functions is much more energy efficient than implementing those functions with general-purpose cores. However, there is a strong desire for supercomputer customers to not have to pay for custom components designed only...

10.1109/mm.2015.71 article EN IEEE Micro 2015-07-01

Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads

Vijay Sathish Michael Schulte Nam Sung Kim

State-of-the-art graphic processing units (GPUs) provide very high memory bandwidth, but the performance of many general-purpose GPU (GPGPU) workloads is still bounded by memory bandwidth. Although compression techniques have been adopted by commercial GPUs, they are only used for compressing texture and color data, not data for GPGPU workloads. Furthermore, the microarchitectural details of GPU compression are proprietary and its benefits have not been previously published. In this paper, we first investigate the required changes to support lossless...

10.1145/2370816.2370864 article EN 2012-09-19

Design and Analysis of an APU for Exascale Computing

Thiruvengadam Vijayaraghavan Arun Karunanithi Onur Kayıran Mitesh R. Meswani Indrani Paul and 13 more

The challenges to push computing to exaflop levels are difficult given desired targets for memory capacity, memory bandwidth, power efficiency, reliability, and cost. This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The ENA consists of an Exascale Heterogeneous Processor (EHP) coupled with an advanced memory system. The EHP provides a high-performance accelerated processing unit...

10.1109/hpca.2017.42 article EN 2017-02-01

James E. Stine Michael Schulte

10.1023/a:1008004523235 article EN The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology 1999-01-01

Decimal Multiplication with Efficient Partial Product Generation

M.A. Erle E. Schwarz Michael Schulte

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents a novel design for fixed-point decimal multiplication that utilizes a simple recoding scheme to produce signed-magnitude representations of the operands, thereby greatly simplifying the process of generating partial products for each multiplier digit. The partial products are generated using a digit-by-digit multiplier on a word-by-digit basis, first in a signed-digit...

10.1109/arith.2005.15 article EN 2005-07-27

High-Speed Multioperand Decimal Adders

R.D. Kenney Michael Schulte

There is increasing interest in hardware support for decimal arithmetic as a result of recent growth in commercial, financial, and Internet-based applications. Consequently, new specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for floating-point arithmetic. This paper introduces and analyzes three techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Two of the techniques speculate BCD correction values and correct intermediate results while adding the input operands. The first speculates over...

10.1109/tc.2005.129 article EN IEEE Transactions on Computers 2005-06-22

Improving Throughput of Power-Constrained GPUs Using Dynamic Voltage/Frequency and Core Scaling

Jungseob Lee Vijay Sathish Michael Schulte Katherine Compton Nam Sung Kim

State-of-the-art graphic processing units (GPUs) can offer very high computational throughput for highly parallel applications using hundreds of integrated cores. In general, the peak throughput of a GPU is proportional to the product of the number of cores and their frequency. However, the throughput is often limited by a power constraint. Although the throughput can be increased with more cores for some applications, it cannot for others because the parallelism of the applications and/or the bandwidth of on-chip interconnects/caches and off-chip memory are limited. In this paper, first, we demonstrate that...

10.1109/pact.2011.17 article EN International Conference on Parallel Architectures and Compilation Techniques 2011-10-01

AUDIT: Stress Testing the Automatic Way

Young-Taek Kim Lizy K. John Sanjay Pant Srilatha Manne Michael Schulte and 2 more

Sudden variations in current (large di/dt) can lead to significant power supply voltage droops and timing errors in modern microprocessors. Several papers discuss the complexity involved with developing test programs, also known as stress marks, to stress the system. Authors of these papers produced tools and methodologies to generate stress marks automatically using techniques such as integer linear programming or genetic algorithms. However, nearly all of the previous work took place in the context of single-core systems, and results were collected...

10.1109/micro.2012.28 article EN 2012-12-01

Modular Design of High-Throughput, Low-Latency Sorting Units

Amin Farmahini-Farahani Henry Duwe Michael Schulte Katherine Compton

High-throughput and low-latency sorting is a key requirement in many applications that deal with large amounts of data. This paper presents efficient techniques for designing high-throughput, low-latency sorting units. Our sorting architectures utilize modular design techniques that hierarchically construct large sorting units from smaller building blocks. The sorting units are optimized for situations in which only the M largest numbers from N inputs are needed, because this situation commonly occurs in scientific computing, data mining, network processing, digital signal...

10.1109/tc.2012.108 article EN IEEE Transactions on Computers 2012-05-30

Power-efficient computing for compute-intensive GPGPU applications

Syed Zohaib Gilani Nam Sung Kim Michael Schulte

The peak compute performance of GPUs has been increased by integrating more compute resources and operating them at higher frequency. However, such approaches significantly increase the power consumption of GPUs, limiting further performance increase due to the power constraint. Facing such a challenge, we propose three techniques to improve the power efficiency of GPUs in this paper. First, we observe that many GPGPU applications are integer-intensive. For such applications, we combine a pair of dependent integer instructions into a composite instruction that can be executed by an...

10.1109/hpca.2013.6522330 article EN 2013-02-01

Workload and power budget partitioning for single-chip heterogeneous processors

Hao Wang Vijay Sathish Ripudaman Singh Michael Schulte Nam Sung Kim

With technology scaling, manufacturers are integrating both CPU and GPU cores in a single chip to improve the throughput of emerging applications. To maximize the throughput of a single-chip heterogeneous processor (SCHP), the power budget shared between the CPU and GPU must be effectively utilized. At the same time, the CPU and GPU in an SCHP must each satisfy its own power constraint. Furthermore, the power allocated to each impacts its performance. In this paper, using a detailed cycle-level simulator, we first demonstrate that joint optimization of workload and power budget partitioning can provide 13%...

10.1145/2370816.2370873 article EN 2012-09-19

Reduced power dissipation through truncated multiplication

Michael Schulte James E. Stine J.G. Jansen

Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations...

10.1109/lpd.1999.750404 article EN 1999-01-01
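
The truncated multiplication idea summarized in the abstract above (skip the least significant columns of the partial-product matrix and add an estimate of their contribution) can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `truncated_multiply`, the parameters `n` and `k`, and the constant-correction estimate (each partial-product bit is assumed to be 1 with probability 1/4) are illustrative choices, not necessarily the scheme used in the paper, and the final rounding step is omitted.

```python
def truncated_multiply(a, b, n=8, k=6):
    """Approximate the product of two unsigned n-bit integers.

    Partial-product bits in the k least significant columns are never
    formed; a constant stands in for their contribution. The names and
    the correction rule are illustrative assumptions, not the paper's.
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    assert 0 <= k <= n

    # Sum only the partial-product bits a_i * b_j whose column i + j >= k.
    kept = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if (b >> j) & 1 and i + j >= k:
                kept += 1 << (i + j)

    # Column c (for c < n) holds c + 1 partial-product bits. Assuming each
    # bit is 1 with probability 1/4, the expected value of the dropped
    # columns is a constant that can be added in place of the real bits.
    correction = sum((c + 1) << c for c in range(k)) // 4

    return kept + correction
```

With `k=0` no columns are dropped and the function returns the exact product; larger `k` trades accuracy for fewer partial-product bits, which is the source of the area and power savings the abstract refers to.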

Analysis of column compression multipliers

K’Andrea Bickerstaff Earl E. Swartzlander Michael Schulte

Column compression multipliers are frequently used in high-performance computer systems due to their short worst case delay. This paper examines the area, delay, and power characteristics of Dadda (1965) and Wallace (1964) column compression multipliers in deep submicron technology. Our analysis shows that have slightly more area approximately same delay as multipliers. It also shows the importance of considering parasitic capacitances when determining the delay of column compression multipliers, since parasitics can increase the multiplier delay by over 60%. As size...

10.1109/arith.2001.930101 article EN 2002-11-13

A parallel IEEE P754 decimal floating-point multiplier

Brian Hickmann Andrew Krioukov Michael Schulte M.A. Erle

Decimal floating-point multiplication is important in many commercial applications including banking, tax calculation, currency conversion, and other financial areas. This paper presents a fully parallel decimal floating-point multiplier compliant with the recent draft of the IEEE P754 Standard for Floating-point Arithmetic (IEEE P754). The novelty of the design is that it is the first parallel decimal floating-point multiplier offering low latency and high throughput. The design is based on a previously published parallel fixed-point decimal multiplier which uses alternate decimal digit encodings to reduce area and delay....

10.1109/iccd.2007.4601916 article EN 2007-10-01

Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support

Dimitri Tan Carl E. Lemonds Michael Schulte

The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a conflicting demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backward compatibility for x86-compatible microprocessors, which includes support for x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents...

10.1109/tc.2008.203 article EN IEEE Transactions on Computers 2008-11-06

Decimal Floating-Point Multiplication

M.A. Erle Brian Hickmann Michael Schulte

Decimal multiplication is important in many commercial applications including financial analysis, banking, tax calculation, currency conversion, insurance, and accounting. This paper presents the design of two decimal floating-point multipliers: one whose partial product accumulation strategy employs decimal carry-save addition and one that employs binary carry-save addition. The multiplier based on decimal carry-save addition favors a nonpipelined iterative implementation. The multiplier utilizing binary carry-save addition allows for an efficient pipelined implementation when latency...

10.1109/tc.2008.218 article EN IEEE Transactions on Computers 2008-12-18

Low-Cost Per-Core Voltage Domain Support for Power-Constrained High-Performance Processors

Abhishek Sinkar Hamid Ghasemi Michael Schulte Ulya R. Karpuzcu Nam Sung Kim

Per-core voltage domains can improve performance under a power constraint. Most commercial processors, however, only have a single voltage domain for all processor cores. This is because splitting the single voltage domain into per-core voltage domains and powering them with multiple off-chip voltage regulators (VRs) incur a high cost for the platform and package designs. Although using on-chip switching VRs can be an alternative solution, integrating high-quality inductors with cores has been a technical challenge. In this paper, we propose a cost-effective power delivery...

10.1109/tvlsi.2013.2257900 article EN IEEE Transactions on Very Large Scale Integration (VLSI) Systems 2013-05-29

A 64-bit decimal floating-point adder

Jennifer Thompson N. Karra Michael Schulte

Due to the rapid growth in financial, commercial, and Internet-based applications, there is an increasing desire to allow computers to operate on both binary and decimal floating-point numbers. Consequently, specifications for decimal floating-point support are being added to the IEEE-754 Standard for Floating-Point Arithmetic. In this paper, we present the design and implementation of a decimal floating-point adder that is compliant with the current draft revision of this standard. The adder supports operations on 64-bit (16-digit) operands. We provide synthesis results indicating...

10.1109/isvlsi.2004.1339563 article EN IEEE Computer Society Annual Symposium on VLSI 2004-10-04

The Sandbridge SB3011 Platform

John Glossner Daniel Iancu Mayan Moudgill Gary Nacer Sanjay Jinturkar and 2 more

10.1155/2007/56467 article EN EURASIP Journal on Embedded Systems 2007-01-01

Hardware Designs for Decimal Floating-Point Addition and Related Operations

Liang‐Kai Wang Michael Schulte John D. Thompson Nandini Jairam

Decimal arithmetic is often used in commercial, financial, and Internet-based applications. Due to the growing importance of decimal floating-point (DFP) arithmetic, the IEEE 754-2008 Standard for Floating-Point Arithmetic (IEEE 754-2008) includes specifications for DFP arithmetic. IBM recently announced adding DFP instructions to their POWER6, z9, and z10 microprocessor architectures. As processor support for DFP arithmetic emerges, it is important to investigate efficient algorithms and hardware designs for common DFP arithmetic operations. This paper...

10.1109/tc.2008.147 article EN IEEE Transactions on Computers 2008-08-19

Multiplier architectures for media processing

S. Krithivasan Michael Schulte

Media processing applications typically involve large amounts of data-level parallelism and operate on low-precision operands. This paper presents multiplier architectures for multimedia processing and compares them to conventional general-purpose multiplier architectures in terms of area and delay. The proposed multiplier architectures support subword parallelism and additional features, which enhance their performance for multimedia applications, yet require only slightly more area and delay than conventional multipliers for general-purpose processing.

10.1109/acssc.2003.1292369 article EN 2004-07-08

Design alternatives for barrel shifters

Matthew Rudolf Pillmeier Michael Schulte E. George Walters

Barrel shifters are often utilized by embedded digital signal processors and general-purpose processors to manipulate data. This paper examines design alternatives for barrel shifters that perform the following functions: shift right logical, shift right arithmetic, rotate right, shift left logical, shift left arithmetic, and rotate left. Four different barrel shifter designs are presented and compared in terms of area and delay for a variety of operand sizes. This paper also examines techniques for detecting results that overflow and results of zero in parallel with the shift or rotate operation. Several Java programs are developed to generate structural VHDL...

10.1117/12.452034 article EN Proceedings of SPIE, the International Society for Optical Engineering/Proceedings of SPIE 2002-12-01
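
As a rough illustration of the barrel shifter functions listed in the abstract above, the following Python sketch models one common arrangement: a logarithmic rotator (one conditional stage per bit of the shift amount) whose output is masked to obtain logical and arithmetic shifts. This is an assumption made for illustration and is not claimed to match any of the four designs compared in the paper; the function names, the default width `n=32`, and the requirement that `n` be a power of two with `0 <= amount < n` are all hypothetical.

```python
def rotate_right(x, amount, n=32):
    """Rotate an n-bit value right using log2(n) conditional stages.

    Assumes n is a power of two and 0 <= amount < n.
    """
    mask = (1 << n) - 1
    x &= mask
    stage = 0
    while (1 << stage) < n:
        # Stage s rotates by 2**s when bit s of the shift amount is set,
        # mirroring one row of 2-to-1 multiplexers in hardware.
        if (amount >> stage) & 1:
            d = 1 << stage
            x = ((x >> d) | (x << (n - d))) & mask
        stage += 1
    return x


def rotate_left(x, amount, n=32):
    """Rotate left by reusing the right rotator with a complemented amount."""
    return rotate_right(x, (n - amount) % n, n)


def shift_right_logical(x, amount, n=32):
    """Rotate right, then clear the bits that wrapped around to the top."""
    keep = ((1 << n) - 1) >> amount
    return rotate_right(x, amount, n) & keep


def shift_right_arithmetic(x, amount, n=32):
    """Logical right shift with the vacated top bits filled by the sign bit."""
    mask = (1 << n) - 1
    fill = mask & ~(mask >> amount)
    sign = (x >> (n - 1)) & 1
    return shift_right_logical(x, amount, n) | (fill if sign else 0)


def shift_left_logical(x, amount, n=32):
    """Rotate left, then clear the bits that wrapped around to the bottom."""
    mask = (1 << n) - 1
    keep = (mask << amount) & mask
    return rotate_left(x, amount, n) & keep
```

Sharing a single rotator among all operations keeps the model small; whether that is the better hardware trade-off in area and delay is the kind of question the paper's comparison addresses.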
