- Parallel Computing and Optimization Techniques
- Embedded Systems Design Techniques
- Advanced Data Storage Technologies
- Formal Methods in Verification
- Interconnection Networks and Systems
- Low-power high-performance VLSI design
- Numerical Methods and Algorithms
- Logic, programming, and type systems
- Radiation Effects in Electronics
- Real-Time Systems Scheduling
- Distributed and Parallel Computing Systems
- Digital Filter Design and Implementation
- Advanced Authentication Protocols Security
- Advanced Neural Network Applications
- VLSI and Analog Circuit Testing
- Algorithms and Data Compression
- Distributed systems and fault tolerance
- Advanced Software Engineering Methodologies
- User Authentication and Security Systems
- Security and Verification in Computing
- Analog and Mixed-Signal Circuit Design
- Cloud Computing and Remote Desktop Technologies
- Scientific Computing and Data Management
- Architecture and Computational Design
- IoT and Edge/Fog Computing
- IBM (United States), 2005-2022
- Intel (United States), 2022
- Xilinx (United States), 2022
- The University of Tokyo, 2022
- Takeda (United States), 2022
- University of Massachusetts Amherst, 2022
- NEC (Japan), 2022
- Poughkeepsie Public Library District, 2015-2018
- IEEE Computer Society, 2013
- IBM (Germany), 2002-2012
We present the introduction of transactional memory into the next generation IBM System z CPU. We first describe the instruction-set architecture features, including requirements for enterprise-class software RAS. We then describe the implementation in the zEnterprise EC12 (zEC12) microprocessor generation, focusing on how transactional memory can be embedded into the existing cache design and multiprocessor shared-memory infrastructure. We explain the practical reasons behind our choices. The zEC12 system has been available since September 2012.
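The optimistic execute, abort, and fall-back pattern that transactional memory exposes to software can be sketched as follows. This is a software emulation in Python under stated assumptions, not the zEC12 instruction set or its cache-based implementation: `VersionedCell`, `transactional_add`, and `run_demo` are hypothetical names, and a version counter stands in for hardware conflict detection.

```python
import threading


class VersionedCell:
    """A shared value with a version counter used to detect conflicts (hypothetical)."""

    def __init__(self, value=0):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


def transactional_add(cell, delta, max_retries=3):
    """Try an optimistic update a few times, then fall back to plain locking."""
    for _ in range(max_retries):
        seen_version = cell.version      # "begin": remember the version we started from
        new_value = cell.value + delta   # speculative work done without holding the lock
        with cell.lock:                  # "commit" attempt
            if cell.version == seen_version:
                cell.value = new_value
                cell.version += 1
                return "committed"
        # Another writer committed first: discard the speculative result and retry.
    with cell.lock:                      # fallback path: do the whole update under the lock
        cell.value += delta
        cell.version += 1
    return "fallback"


def run_demo(threads=8, adds_per_thread=1000):
    """Increment one shared cell from several threads and return the final value."""
    cell = VersionedCell()

    def worker():
        for _ in range(adds_per_thread):
            transactional_add(cell, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return cell.value
```

Whichever path each update takes, no increment is lost, so `run_demo()` returns `threads * adds_per_thread`. The fallback path reflects a common design choice: a transaction that keeps aborting must still be able to make progress some other way.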
The IBM System z10™ microprocessor is currently the fastest running 64-bit CISC (complex instruction set computer) microprocessor. This microprocessor operates at 4.4 GHz and provides up to two times performance improvement compared with its predecessor, the System z9® microprocessor. In addition to its ultrahigh-frequency pipeline, the z10 microprocessor offers such enhancements as a sophisticated branch-prediction structure, a large second-level private cache, a data-prefetch engine, and a hardwired decimal floating-point arithmetic unit. The z10 microprocessor also implements new...
The IBM POWER6™ microprocessor core includes two accelerators for increasing the performance of specific workloads. The vector multimedia extension (VMX) provides a vector acceleration of graphic and scientific workloads. It provides single instructions that work on multiple data elements. The instructions separate a 128-bit vector into different components that are operated on concurrently. The decimal floating-point unit (DFU) provides acceleration of commercial workloads, more specifically, financial transactions. It provides a new number system that performs implicit rounding to decimal radix points, a feature...
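Why decimal arithmetic matters for financial workloads can be illustrated in software. The sketch below uses Python's standard `decimal` module, not the POWER6 DFU; the function names are hypothetical, and the round-half-even mode is an assumption chosen for the example rather than anything stated in the abstract.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def binary_sum_is_exact() -> bool:
    """Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly."""
    return 0.1 + 0.2 == 0.3


def decimal_sum_is_exact() -> bool:
    """Decimal arithmetic represents these amounts exactly."""
    return Decimal("0.1") + Decimal("0.2") == Decimal("0.3")


def round_to_cents(amount: str) -> Decimal:
    """Round a decimal amount to two fractional digits, ties going to the even digit."""
    return Decimal(amount).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
```

Here `binary_sum_is_exact()` is false while `decimal_sum_is_exact()` is true, and `round_to_cents("2.675")` yields `Decimal("2.68")` because the tie is resolved on the exact decimal value rather than on a binary approximation.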
Lossless data compression is highly desirable in enterprise and cloud environments for storage and memory cost savings and improved utilization of I/O and network. While the value provided by compression is recognized, its application in practice is often limited because it is a processor-intensive operation, resulting in low throughput and high elapsed time for compression-intense workloads. The IBM POWER9 and z15 systems overcome the shortcomings of existing approaches by including a novel on-chip integrated accelerator. The accelerator reduces cycles, traffic,...
The floating-point unit (FPU) in the synergistic processor element (SPE) of a CELL processor is a fully pipelined 4-way single-instruction multiple-data (SIMD) unit designed to accelerate media and data streaming with 128-bit operands. It supports 32-bit single-precision floating-point and 16-bit integer operands with two different latencies, six-cycle and seven-cycle, with 11 FO4 delay per stage. The FPU optimizes the performance of critical multiply-add operations. Since exact rounding, exceptions, and de-norm number handling are not important...
In this paper we describe a methodology to measure exactly the quality of fault-tolerant designs by combining fault-injection in high level design (HLD) descriptions with a formal verification approach. We utilize BDD based symbolic simulation to determine the coverage of online error-detection and -correction logic. We describe an easily portable approach, which can be applied to a wide variety of multi-GHz industrial...
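The idea of measuring detection coverage exactly, rather than by sampling, can be shown on a toy checker. The sketch below swaps the paper's BDD-based symbolic simulation for brute-force enumeration, which is only feasible at this tiny scale; the 8-bit width, the single even-parity bit, and all function names are assumptions made for illustration.

```python
from itertools import combinations

WIDTH = 8  # hypothetical data width; the check bit is stored at bit position WIDTH


def parity(word: int) -> int:
    """Return 1 if the word has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def detects(data: int, check: int) -> bool:
    """The toy error-detection logic: flag a fault when the parity does not match."""
    return parity(data) != check


def coverage(num_flips: int) -> float:
    """Exact fraction of num_flips-bit faults detected, over all data words."""
    detected = total = 0
    for word in range(1 << WIDTH):
        codeword = (parity(word) << WIDTH) | word
        # Inject every possible fault that flips exactly num_flips bits.
        for bits in combinations(range(WIDTH + 1), num_flips):
            faulty = codeword
            for b in bits:
                faulty ^= 1 << b
            data = faulty & ((1 << WIDTH) - 1)
            check = faulty >> WIDTH
            total += 1
            detected += detects(data, check)
    return detected / total
```

Because every data word and every fault location is enumerated, the result is exact: `coverage(1)` is 1.0 and `coverage(2)` is 0.0, matching the known property that single parity catches odd-weight faults only. Symbolic simulation reaches the same kind of exhaustive answer without enumerating inputs one by one.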
The floating-point unit in the synergistic processor element of the 1st generation multi-core CELL processor is described. The FPU supports 4-way SIMD single precision and integer operations and 2-way double precision operations. The design required high frequency, low latency, and power and area efficiency, with primary application to multimedia streaming workloads, such as 3D graphics. The FPU has 3 different latencies, optimizing the performance of critical FMA operations, which are executed with a 6-cycle latency at an 11FO4 cycle time. The FPU includes...
We describe the results of examining two large research and commercial systems for the ways that they use threads. We used three methods: analysis of macroscopic thread statistics, analysis of the microsecond spacing between thread events, and reading the implementation code. We identify ten different paradigms of thread usage: defer work, general pumps, slack processes, sleepers, one-shots, deadlock avoidance, rejuvenation, serializers, encapsulated fork, and exploiting parallelism. While some, like defer work, are well known, others have not been previously...
The IBM zEnterprise® 196 (z196) system, announced in the second quarter of 2010, is the latest generation of the IBM System z® mainframe. The system is designed with a new microprocessor and memory subsystems, which distinguishes it from its z10® predecessor. The system has up to 40% improvement in performance for traditional z/OS® workloads and carries up to 60% more capacity when compared with its z10 predecessor. The memory subsystem has four levels of cache hierarchy (L1 through L4) and constructs the L3 and L4 caches with embedded DRAM silicon technology, which achieves approximately three...
In this paper we describe a fully-automated methodology for formal verification of fused-multiply-add floating point units (FPU). Our methodology verifies an implementation FPU against a simple reference model derived from the processor's architectural specification, which may include all aspects of the IEEE specification including denormal operands and exceptions. Our strategy uses a combination of BDD- and SAT-based symbolic simulation. To make the verification task tractable, we use case-splitting, multiplier isolation, and automatic reduction...
The zEnterprise EC12 is the latest generation of IBM's System z Enterprise Class mainframe servers. The microprocessor operates at an ultra-high frequency of 5.5 GHz and incorporates many pipeline-optimization and instruction-processing techniques. It also supports innovative instruction-set architecture extensions for future software exploitation to acquire performance gains. This article highlights the various factors inside the zEC12 microprocessor for achieving the best possible computing performance.
The IBM z13™ system is the latest generation of the IBM z Systems™ mainframes. The z13 microprocessor improves upon the zEnterprise® EC12 (zEC12) processor with two vector execution units, higher instruction parallelism, and a simultaneous multithreaded (SMT) architecture that supports two concurrent threads. These advances yield performance gains in legacy online transaction processing and business analytics workloads. This system features an eight-core chip, a robust cache hierarchy, and a large multiprocessor design optimized...
The IBM Z microprocessor in the z14 system has been redesigned to improve performance, capacity, and security [1] over the previous z13 system [2]. The system contains up to 24 central processor (CP) and 4 system controller (SC) chips. Each CP, shown in die photo A (Fig. 2.2.7), operates at 5.2GHz and is comprised of 10 cores, 2 PCIe Gen3 interfaces, an IO bus controller (GX), 128MB of L3 embedded DRAM (eDRAM) cache, X-BUS interfaces connecting to other CP chips and one SC chip, and a redundant array of independent memory (RAIM) interface. Each core on the chip has 4MB...
IBM Telum is the next generation processor chip for IBM Z and LinuxONE systems. The design is focused on enterprise-class workloads, and it achieves over 40% per-socket performance growth compared to z15. Telum is the first server-class chip with a dedicated on-chip AI accelerator that enables clients to gain real-time insights from their data as it is being processed.
Cedar is the name for both a language and an environment in use in the Computer Science Laboratory at Xerox PARC since 1980. The Cedar language is a superset of Mesa, the major additions being garbage collection and runtime types. Neither the language nor the environment was originally intended to be portable, and for many years Cedar ran only on D-machines at PARC and a few other locations in Xerox. We recently re-implemented the language to make it portable across many different architectures. Our strategy was, first, to use machine-dependent C code as an intermediate language, second, to create a language-independent...
The IBM z14 is the latest update in the storied history of IBM mainframes. Reliability, availability, security, and scalability are the foundation of the mainframe line. System reliability and availability targets are in excess of 10 years, requiring rigorous chip characterization processes. In this paper, we discuss some of the many processes used to ensure that lifetime. An additional part is power management (PM). The 5.2-GHz high-power design of the central processor requires advanced on-die PM capabilities to adapt to intensive instruction...
The latest-generation IBM Z processor provides enhanced performance and compute capacity compared to its z13 predecessor. This paper describes some of the major improvements, which include an additional perceptron branch predictor, a completely redesigned translation engine that is tightly integrated into the core pipeline, and a level-1 cache directory and translation lookaside buffer design. Outside the central processing unit (CPU), the cache sizes have increased on each level, and the chip now contains 10 CPUs. The system topology has been...