- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Evolutionary Algorithms and Applications
- Distributed and Parallel Computing Systems
- Embedded Systems Design Techniques
- Metaheuristic Optimization Algorithms Research
- Distributed systems and fault tolerance
- Cloud Computing and Resource Management
- Botany, Ecology, and Taxonomy Studies
- Botany and Plant Ecology Studies
- Interconnection Networks and Systems
- Mediterranean and Iberian flora and fauna
- Low-power high-performance VLSI design
- Advanced Multi-Objective Optimization Algorithms
- Advanced Memory and Neural Computing
- Neural Networks and Applications
- Radiation Effects in Electronics
- Aerodynamics and Fluid Dynamics Research
- Mechanical Engineering and Vibrations Research
- Neural dynamics and brain function
- Advanced Database Systems and Queries
- Plant Ecology and Taxonomy Studies
- Solar and Space Plasma Dynamics
- Plant and animal studies
- Plant Diversity and Evolution
University of the West of England
2015-2024
University of Bath
2024
Franciscan Health
2024
Creative Technologies (United States)
2022-2023
The Francis Crick Institute
2022
University of Wisconsin–Madison
2005-2021
Environmental Protection Agency
2001-2021
Northern Research Station
2008-2021
University of Arizona
1997-2021
Carnegie Mellon University
2021
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8 µm, 0.35 µm, and 0.18 µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies...
A study of branch prediction strategies. Author: James E. Smith, Control Data Corporation, Arden Hills, Minnesota. ISCA '98: 25 years of the international symposia on Computer architecture (selected papers), August 1998, pages 202–215. https://doi.org/10.1145/285930.285980. Published online: 01 August 1998.
BACKGROUND AND PURPOSE. Peer norms influence the adoption of behavior changes to reduce risk for HIV (human immunodeficiency virus) infection. By experimentally intervening at a community level to modify norms, it may be possible to promote generalized reductions in risk practices within a population. METHODS. We trained persons reliably identified as popular opinion leaders among gay men in a small city to serve as behavior change endorsers to their peers. The opinion leaders acquired social skills for making these endorsements and complied...
The combination of evolutionary algorithms with local search was named "memetic algorithms" (MAs) (Moscato, 1989). These methods are inspired by models of natural systems that combine the evolutionary adaptation of a population with individual learning within the lifetimes of its members. Additionally, MAs are inspired by Richard Dawkins's concept of a meme, which represents a unit of cultural evolution that can exhibit local refinement (Dawkins, 1976). In the case of MAs, "memes" refer to the strategies (e.g., local refinement, perturbation, or constructive methods, etc.)...
A virtual machine can support individual processes or a complete system depending on the abstraction level where virtualization occurs. Some VMs support flexible hardware usage and software isolation, while others translate from one instruction set to another. Virtualizing a system or component (such as a processor, memory, or an I/O device) at a given abstraction level maps its interface and visible resources onto the interface and resources of an underlying, possibly different, real system. Consequently, the real system appears as a different virtual system or even as multiple virtual systems. Interjecting...
As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92...
The predictability of data values is studied at a fundamental level. Two basic predictor models are defined. Computational predictors perform an operation on previous values to yield predicted next values; the examples we study are stride value prediction (which adds a delta to a previous value) and last value prediction (which performs the trivial identity operation on the previous value). Context-based predictors match recent value history (context) with previous value history and predict values based entirely on previously observed patterns. To understand the potential of value prediction, we perform simulations with unbounded prediction tables that are immediately updated...
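The two computational predictors named in this abstract can be illustrated with a minimal Python sketch. It assumes unbounded per-instruction tables (plain dictionaries keyed by a hypothetical instruction address `pc`) that are updated immediately after each prediction, and a simple one-delta stride update; the class names, the `accuracy` helper, and the update policy are illustrative assumptions, not the paper's implementation.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.last = {}  # pc -> most recent value (unbounded table, an assumption)

    def predict(self, pc):
        return self.last.get(pc)  # None when the instruction has not been seen

    def update(self, pc, value):
        self.last[pc] = value


class StridePredictor:
    """Predicts last value plus the most recently observed delta."""

    def __init__(self):
        self.state = {}  # pc -> (last_value, stride)

    def predict(self, pc):
        if pc not in self.state:
            return None
        last, stride = self.state[pc]
        return last + stride

    def update(self, pc, value):
        if pc in self.state:
            last, _ = self.state[pc]
            self.state[pc] = (value, value - last)
        else:
            self.state[pc] = (value, 0)  # no delta known yet


def accuracy(predictor, trace):
    """Fraction of (pc, value) pairs predicted correctly, updating after each one."""
    correct = 0
    for pc, value in trace:
        if predictor.predict(pc) == value:
            correct += 1
        predictor.update(pc, value)  # immediate update
    return correct / len(trace)


# Hypothetical trace: one instruction producing an arithmetic sequence.
stride_trace = [(0x40, v) for v in (10, 14, 18, 22, 26)]
print(accuracy(LastValuePredictor(), stride_trace))
print(accuracy(StridePredictor(), stride_trace))
```

On a sequence with a constant nonzero delta, the stride predictor starts hitting once two values have been seen, while the last value predictor never hits; on a constant sequence both behave identically.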
We propose and evaluate a multi-thread memory scheduler that targets high performance CMPs. The proposed memory scheduler is based on concepts originally developed for network fair queuing scheduling algorithms. The memory scheduler provides quality of service (QoS) while improving system performance. On a four processor CMP running workloads containing a mix of applications with a range of memory bandwidth demands, the memory scheduler provides QoS to all of the threads in all of the workloads, improves system performance by an average of 14% (41% in the best case), and reduces the variance of the threads' target memory bandwidth utilization from .2 to .0058.
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar processors: converting an ostensibly sequential program into a more parallel one. The principles underlying this process, and the constraints that must be...
Traces are dynamic instruction sequences constructed and cached by hardware. A microarchitecture organized around traces is presented as a means for efficiently executing many instructions per cycle. Trace processors exploit both control flow and data flow hierarchy to overcome complexity and architectural limitations of conventional superscalar processors by (1) distributing execution resources based on trace boundaries and (2) applying control and data prediction at the trace level rather than individual branches or instructions. Three sets...
A new structure for implementing data cache prefetching is proposed and analyzed via simulation. The structure is based on a Global History Buffer that holds the most recent miss addresses in FIFO order. Linked lists within this global history buffer connect addresses that have some common property, e.g. they were all generated by the same load instruction. The Global History Buffer can be used for implementing a number of previously proposed prefetch methods, as well as new ones. Prefetching with the Global History Buffer has two significant advantages over conventional table prefetching methods. First, the use of a FIFO history buffer can improve...
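The structure described in this abstract, a FIFO of recent miss addresses threaded by linked lists that share a key, might be sketched in Python as follows. The names (`GlobalHistoryBuffer`, `index`, `stride_prefetch`), the use of the load PC as the key, and the constant-stride policy built on top are assumptions made for illustration; the abstract does not specify these details.

```python
class GlobalHistoryBuffer:
    """Circular FIFO of miss addresses with per-key linked lists (a sketch)."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size  # slot -> (address, position of previous entry with same key)
        self.head = 0                 # total number of misses inserted so far
        self.index = {}               # key (here: load PC) -> position of newest entry

    def _valid(self, pos):
        # A position is still in the buffer if it has not been overwritten.
        return pos is not None and self.head - self.size <= pos < self.head

    def insert(self, key, address):
        prev = self.index.get(key)
        self.entries[self.head % self.size] = (address, prev)
        self.index[key] = self.head
        self.head += 1

    def history(self, key):
        """Miss addresses recorded for key, newest first, by walking the links."""
        out = []
        pos = self.index.get(key)
        while self._valid(pos):
            address, pos = self.entries[pos % self.size]
            out.append(address)
        return out


def stride_prefetch(ghb, key, degree=2):
    """Assumed policy: prefetch ahead when the last two deltas for key agree."""
    h = ghb.history(key)
    if len(h) < 3:
        return []
    d1, d2 = h[0] - h[1], h[1] - h[2]
    if d1 != d2 or d1 == 0:
        return []
    return [h[0] + d1 * (i + 1) for i in range(degree)]


ghb = GlobalHistoryBuffer(size=4)
for pc, addr in [("pcA", 100), ("pcB", 500), ("pcA", 164), ("pcA", 228)]:
    ghb.insert(pc, addr)
print(ghb.history("pcA"))
print(stride_prefetch(ghb, "pcA"))
```

Because old entries are overwritten in FIFO order, a link that points to an overwritten position simply ends the walk, so stale history ages out without any explicit invalidation.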
Microprocessors are designed to provide good average performance over a variety of workloads. This can lead to inefficiencies both in power and performance for individual programs and during individual phases within the same program. Microarchitectures with multi-configuration units (e.g. caches, predictors, instruction windows) are able to adapt dynamically to program behavior and enable/disable resources as needed. A key element of existing configuration algorithms is adjusting to program phase changes. This is typically done by "tuning" when a phase change...
Five solutions to the precise interrupt problem in pipelined processors are described and evaluated. An interrupt is precise if the saved process state corresponds to a sequential model of program execution in which one instruction completes before the next begins. In a pipelined processor, precise interrupts are difficult to implement because an instruction may be initiated before its predecessors have completed. The first solution forces instructions to complete and modify the process state in architectural order. The other four solutions allow instructions to complete in any order, but additional hardware is used, so that a precise state can be restored...
An architecture for improving computer performance is presented and discussed. The main feature of the architecture is a high degree of decoupling between operand access and execution. This results in an implementation which has two separate instruction streams that communicate via queues. A similar architecture has been previously proposed for array processors, but in that context the software is called on to do most of the coordination and synchronization between the instruction streams. This paper emphasizes implementation features that remove this burden from the programmer. Performance comparisons with...
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation, which enables a load or store to be speculatively executed before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of the location. Program order among the speculative versions must be tracked to maintain sequential semantics. A previously proposed approach, the address resolution buffer (ARB)...
A proposed performance model for superscalar processors consists of 1) a component that models the relationship between instructions issued per cycle and the size of the instruction window under ideal conditions, and 2) methods for calculating transient performance penalties due to branch mispredictions, instruction cache misses, and data cache misses. Using trace-derived data dependence information, data and instruction cache miss rates, and branch miss-prediction rates as inputs, the model can arrive at performance estimates for a typical superscalar processor that are within 5.8% of detailed simulation on average and within 13%...
Many high performance processors predict conditional branches and consume processor resources based on the prediction. In some situations, resource allocation can be better optimized if a confidence level is assigned to a branch prediction; i.e. if the quantity of resources allocated is a function of the confidence level. To support such optimizations, we consider hardware mechanisms that partition conditional branch predictions into two sets: those which are accurate a relatively high percentage of the time, and those which are accurate a relatively low percentage of the time. The objective is to concentrate as many...
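One way such a partition into high-confidence and low-confidence predictions could be made is sketched below in Python. This is an illustrative assumption, since the truncated abstract does not name a mechanism: each branch (keyed by a hypothetical address `pc`) gets a saturating counter of consecutive correct predictions that resets on a misprediction, and a prediction counts as high confidence once the counter reaches a threshold. The class name, `threshold`, and `max_count` are all hypothetical.

```python
class ResettingConfidenceEstimator:
    """Per-branch counter of consecutive correct predictions (a sketch)."""

    def __init__(self, threshold=4, max_count=15):
        self.threshold = threshold  # assumed cutoff between low and high confidence
        self.max_count = max_count  # assumed saturation value (e.g. a 4-bit counter)
        self.counters = {}          # pc -> current count

    def is_high_confidence(self, pc):
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, prediction_correct):
        if prediction_correct:
            count = self.counters.get(pc, 0) + 1
            self.counters[pc] = min(count, self.max_count)  # saturate
        else:
            self.counters[pc] = 0  # a misprediction resets confidence


est = ResettingConfidenceEstimator()
for _ in range(4):
    est.update(0x10, prediction_correct=True)
print(est.is_high_confidence(0x10))   # four correct in a row reaches the threshold
est.update(0x10, prediction_correct=False)
print(est.is_high_confidence(0x10))   # reset after the misprediction
```

Raising the threshold shrinks the high-confidence set but makes it more accurate, which is the tradeoff any such partition has to tune.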