NFDI4DS | UHH-SEMS - Publication Details

Memory access scheduling

OPENALEX - Publications

Scott Rixner William J. Dally Ujval J. Kapasi Peter Mattson John D. Owens

The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the "3-D" structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40%...

10.1145/339647.339668 article EN 2000-01-01
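The reordering idea in the abstract above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the paper's scheduler: the cycle costs, the `window` size, and the function name `service_cost` are all hypothetical, and the policy shown (serve the oldest pending reference that hits an already-open row, otherwise the oldest reference) is a simplified stand-in for row-locality-aware scheduling.

```python
from collections import deque

ROW_HIT_COST = 1    # assumed cycles for a column access to an already-open row
ROW_MISS_COST = 6   # assumed cycles for precharge + activate + column access


def service_cost(refs, reorder, window=8):
    """Return the total cycles needed to service refs, a list of (bank, row) pairs."""
    pending = deque(refs)
    open_row = {}   # bank -> row currently held open in that bank
    cycles = 0
    while pending:
        pick = 0    # default: serve the oldest reference (in-order behaviour)
        if reorder:
            # Prefer the oldest reference in the window that hits an open row.
            for i, (bank, row) in enumerate(list(pending)[:window]):
                if open_row.get(bank) == row:
                    pick = i
                    break
        bank, row = pending[pick]
        del pending[pick]
        cycles += ROW_HIT_COST if open_row.get(bank) == row else ROW_MISS_COST
        open_row[bank] = row
    return cycles


# Two rows of the same bank accessed alternately: the worst case for in-order service.
refs = [(0, 1), (0, 2), (0, 1), (0, 2), (0, 1), (0, 2)]
in_order = service_cost(refs, reorder=False)
reordered = service_cost(refs, reorder=True)
```

With this trace, in-order service pays the row-miss cost on every reference, while the reordered schedule groups accesses to the same row and pays it only twice. The numbers depend entirely on the assumed costs and say nothing about the results reported in the paper.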

Scheduling I/O in virtual machine monitors

Diego Ongaro Alan L. Cox Scott Rixner

This paper explores the relationship between domain scheduling in a virtual machine monitor (VMM) and I/O performance. Traditionally, VMM schedulers have focused on fairly sharing the processor resources among domains while leaving the scheduling of I/O resources as a secondary concern. However, this can result in poor and/or unpredictable application performance, making virtualization less desirable for applications that require efficient and consistent I/O behavior.

10.1145/1346256.1346258 article EN 2008-03-05

The Hadoop distributed filesystem: Balancing portability and performance

Jeffrey Shafer Scott Rixner Alan L. Cox

Hadoop is a popular open-source implementation of MapReduce for the analysis of large datasets. To manage storage resources across the cluster, Hadoop uses a distributed user-level filesystem. This filesystem, HDFS, is written in Java and designed for portability across heterogeneous hardware and software platforms. This paper analyzes the performance of HDFS and uncovers several performance issues. First, architectural bottlenecks exist in the Hadoop implementation that result in inefficient HDFS usage due to delays in scheduling new MapReduce tasks. Second, portability limitations prevent the Java implementation from exploiting features...

10.1109/ispass.2010.5452045 article EN 2010-03-01

Imagine: media processing with streams

Brucek Khailany William J. Dally Ujval J. Kapasi Peter Mattson Jae‐Eun Namkoong and 4 more

The power-efficient Imagine stream processor achieves performance densities comparable to those of special-purpose embedded processors. Executing programs mapped to streams and kernels, a single Imagine processor is expected to have a peak performance of 20 GFLOPS and sustain 18.3 GOPS on MPEG-2 encoding.

10.1109/40.918001 article EN IEEE Micro 2001-01-01

Programmable stream processors

Ujval J. Kapasi Scott Rixner William J. Dally Brucek Khailany Jung Ho Ahn and 2 more

The demand for flexibility in media processing motivates the use of programmable processors. Stream processing bridges the gap between inflexible special-purpose solutions and current programmable architectures that cannot meet the computational demands of media-processing applications. The central idea behind stream processing is to organize an application into streams and kernels to expose the inherent locality and concurrency in media-processing applications. The performance of the Imagine stream processor on these applications is given.

10.1109/mc.2003.1220582 article EN Computer 2003-08-01

Register organization for media processing

Scott Rixner William J. Dally Brucek Khailany Peter Mattson Ujval J. Kapasi and 1 more

Processor architectures with tens to hundreds of arithmetic units are emerging to handle media processing applications. These applications, such as image coding, image synthesis, and image understanding, require arithmetic rates of up to 10^11 operations per second. As the number of arithmetic units in a processor increases to meet these demands, register storage and communication between the arithmetic units dominate the area, delay, and power of the arithmetic units. In this paper, we show that partitioning the register file along three axes reduces the cost of register storage and communication without significantly impacting performance. We...

10.1109/hpca.2000.824366 article EN 2002-11-07

A bandwidth-efficient architecture for media processing

Scott Rixner William J. Dally Ujval J. Kapasi Brucek Khailany A. Lopez-Lagunas and 2 more

Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are poorly matched to conventional microprocessor architectures, they are a good fit for modern VLSI technology with its high arithmetic capacity but limited global bandwidth. The stream programming model, in which an application is coded as streams of data records passing through computation kernels, exposes both parallelism and locality in media applications that can be...

10.5555/290940.290946 article EN International Symposium on Microarchitecture 1998-11-01

The Imagine Stream Processor

Ujval J. Kapasi William J. Dally Scott Rixner John D. Owens Brucek Khailany

The Imagine Stream Processor is a single-chip programmable media processor with 48 parallel ALUs. At 400 MHz, this translates to a peak arithmetic rate of 16 GFLOPS on single-precision data and 32 GOPS on 16-bit fixed-point data. The scalability of Imagine's programming model and architecture enable it to achieve such high arithmetic rates. Imagine executes applications that have been mapped to the stream programming model. The stream model decomposes applications into a set of computation kernels that operate on data streams. This mapping exposes the inherent locality and parallelism in the application,...

10.1109/iccd.2002.1106783 article EN 2003-06-26

Translation caching

Thomas W. Barr Alan L. Cox Scott Rixner

This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page table. In particular, these caches accelerate the page table walk that occurs after a miss in the Translation Lookaside Buffer. This paper shows that the most effective MMU caches are translation caches, which store partial translations and allow the page walk hardware to skip one or more levels of the page table.

10.1145/1815961.1815970 article EN 2010-06-19
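The level-skipping idea in the abstract above can be sketched in software. This is an illustrative model only, assuming an x86-64-style layout (four levels, 9 index bits per level, 4 KiB pages); the nested-dictionary page table, the unbounded prefix-keyed `cache`, and the helper names `indices`, `map_page`, and `walk` are hypothetical and do not reproduce any of the hardware designs the paper evaluates.

```python
LEVELS = 4        # four-level radix tree, as in x86-64 long mode
BITS = 9          # index bits consumed per level
PAGE_SHIFT = 12   # 4 KiB pages


def indices(vaddr):
    """Split a virtual address into one 9-bit index per level, top level first."""
    vpn = vaddr >> PAGE_SHIFT
    mask = (1 << BITS) - 1
    return [(vpn >> (BITS * (LEVELS - 1 - i))) & mask for i in range(LEVELS)]


def map_page(root, vaddr, frame):
    """Install a mapping from the page containing vaddr to a physical frame number."""
    idx = indices(vaddr)
    node = root
    for i in idx[:-1]:
        node = node.setdefault(i, {})
    node[idx[-1]] = frame


def walk(root, vaddr, cache):
    """Translate vaddr; return (physical address, number of page-table entries read)."""
    idx = indices(vaddr)
    node, depth = root, 0
    # Look for the longest cached partial translation (a prefix of the indices).
    for n in range(LEVELS - 1, 0, -1):
        hit = cache.get(tuple(idx[:n]))
        if hit is not None:
            node, depth = hit, n
            break
    reads = 0
    for level in range(depth, LEVELS):
        node = node[idx[level]]   # one page-table entry read per remaining level
        reads += 1
        if level < LEVELS - 1:
            cache[tuple(idx[:level + 1])] = node   # remember the partial translation
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (node << PAGE_SHIFT) | offset, reads


root, cache = {}, {}
map_page(root, 0x7F0000001000, 0x42)
map_page(root, 0x7F0000002000, 0x43)
first = walk(root, 0x7F0000001234, cache)    # cold cache: full four-level walk
second = walk(root, 0x7F0000002ABC, cache)   # neighbouring page: upper levels skipped
```

The second lookup shares its upper three indices with the first, so the cached partial translation lets the walk read only the final-level entry. A real MMU cache is small and has a replacement policy, which this sketch omits.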

Concurrent Direct Network Access for Virtual Machine Monitors

Jeffrey Shafer David Carr Aravind Menon Scott Rixner Alan L. Cox and 2 more

This paper presents hardware and software mechanisms to enable concurrent direct network access (CDNA) by operating systems running within a virtual machine monitor. In a conventional virtual machine monitor, each operating system running within a virtual machine must access the network through a software-virtualized network interface. These virtual network interfaces are multiplexed in software onto a physical network interface, incurring significant performance overheads. The CDNA architecture improves networking efficiency and performance by dividing the tasks of traffic multiplexing, interrupt delivery, and memory protection between...

10.1109/hpca.2007.346208 article EN 2007-01-01

SpecTLB

Thomas W. Barr Alan L. Cox Scott Rixner

Data-intensive computing applications are using more and more memory and are placing an increasing load on the virtual memory system. While the use of large pages can help alleviate the overhead of address translation, they limit the control the operating system has over memory allocation and protection. We present a novel device, the SpecTLB, that exploits the predictable behavior of reservation-based physical memory allocators to interpolate address translations.

10.1145/2000064.2000101 article EN 2011-06-04

Memory Controller Optimizations for Web Servers

Scott Rixner

This paper analyzes memory access scheduling and virtual channels as mechanisms to reduce the latency of main memory accesses by the CPU and peripherals in web servers. Despite the address filtering effects of the CPU's cache hierarchy, there is significant locality and bank parallelism in the DRAM access stream of a web server, which includes traffic from the operating system, application, and peripherals. However, a sequential memory controller leaves much of this locality and parallelism unexploited, as serialization and bank conflicts affect the realizable latency. Aggressive scheduling within the memory controller to exploit...

10.1109/micro.2004.22 article EN 2005-12-13

Memory access scheduling

Scott Rixner William J. Dally Ujval J. Kapasi Peter Mattson John D. Owens

The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40%...

10.1145/342001.339668 article EN ACM SIGARCH Computer Architecture News 2000-05-01

Achieving 10 Gb/s using safe and transparent network interface virtualization

Kaushik Kumar Ram José Renato Santos Yoshio Turner Alan L. Cox Scott Rixner

This paper presents mechanisms and optimizations to reduce the overhead of network interface virtualization when using the driver domain I/O virtualization model. The driver domain model provides benefits such as support for legacy device drivers and fault isolation. However, the processing overheads incurred in the driver domain to achieve these benefits limit overall I/O performance. This paper demonstrates the effectiveness of two approaches to reduce driver domain overheads. First, Xen is modified to support multi-queue network interfaces to eliminate the software overheads of packet demultiplexing and copying. Second, a grant reuse mechanism...

10.1145/1508293.1508303 article EN 2009-03-10

Predictive parallelization

Myeongjae Jeon Saehoon Kim Seung-won Hwang Yuxiong He Sameh Elnikety and 2 more

Web search engines are optimized to reduce the high-percentile response time to consistently provide fast responses to almost all user queries. This is a challenging task because the query workload exhibits large variability, consisting of many short-running queries and a few long-running queries that significantly impact the high-percentile response time. With modern multicore servers, parallelizing the processing of an individual query is a promising solution to reduce query execution time, but it gives limited benefits compared to sequential execution since most queries see little or no...

10.1145/2600428.2609572 article EN 2014-07-03

Facilitating human interaction in an online programming course

Joe Warren Scott Rixner John Greiner Stephen B. Wong

Human/human interaction is a critical component of learning in many domains including introductory computer programming. For on-campus courses, lectures and problem sessions provide opportunities for students to interact with the instructor(s) and their peers. For online courses, opportunities for human/human interaction are more limited and usually correspond to activities like forum postings and study groups. For online programming courses, the situation is potentially even worse since computational tools designed to facilitate learning to program, such as unit testing, emphasize...

10.1145/2538862.2538893 article EN 2014-02-18

Scalable Multi-Failure Fast Failover via Forwarding Table Compression

Brent Stephens Alan L. Cox Scott Rixner

In datacenter networks, link and switch failures are a common occurrence. Although most of these failures do not disconnect the underlying topology, they do cause routing failures, disrupting communications between some hosts. Unfortunately, current 1:1 redundancy groups are only partly effective at reducing the impact of these failures. In principle, local fast failover schemes, such as OpenFlow fast failover groups, could reduce the impact by preinstalling backup routes that protect against multiple simultaneous failures. However, providing sufficient...

10.1145/2890955.2890957 article EN 2016-03-14

A bandwidth-efficient architecture for media processing

Scott Rixner William J. Dally Ujval J. Kapasi Brucek Khailany A. Lopez-Lagunas and 2 more

Media applications are characterized by large amounts of available parallelism, little data reuse, and a high computation to memory access ratio. While these characteristics are poorly matched to conventional microprocessor architectures, they are a good fit for modern VLSI technology with its high arithmetic capacity but limited global bandwidth. The stream programming model, in which an application is coded as streams of data records passing through computation kernels, exposes both parallelism and locality in media applications that can be...

10.1109/micro.1998.742118 article EN 2002-11-27

Efficient conditional operations for data-parallel architectures

Ujval J. Kapasi William J. Dally Scott Rixner Peter Mattson John D. Owens and 1 more

Published in MICRO 33: Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture, December 2000, pages 159–170. Authors as listed by the publisher: Ujval J. Kapasi (Computer Systems Laboratory, Stanford University, Stanford, CA), William J. Dally, Scott Rixner, Peter R. Mattson, John D. Owens, Brucek Khailany.

10.1145/360128.360145 article EN 2000-12-01

Media processing applications on the Imagine stream processor

John D. Owens Scott Rixner Ujval J. Kapasi Peter Mattson Brian Towles and 2 more

Media applications, such as image processing, signal processing, video, and graphics, require high computation rates and data bandwidths. The stream programming model is a natural and powerful way to describe these applications. Expressing media applications in this model allows hardware and software systems to take advantage of their concurrency and locality in order to meet their high computational demands. The Imagine stream programming system, a set of software tools and algorithms, is used to program media applications in the stream programming model. We achieve real-time performance on a variety of media processing applications with high computation rates (4-15 billion...

10.1109/iccd.2002.1106785 article EN 2003-06-26

Adaptive parallelism for web search

Myeongjae Jeon Yuxiong He Sameh Elnikety Alan L. Cox Scott Rixner

A web search query made to Microsoft Bing is currently parallelized by distributing the query processing across many servers. Within each of these servers, the query is, however, processed sequentially. Although each server may be processing multiple queries concurrently, with modern multicore servers, parallelizing the processing of an individual query within the server may nonetheless improve the user's experience by reducing the response time. In this paper, we describe the issues that make the parallelization of an individual query within a server challenging, and we present a parallelization approach that effectively addresses these challenges. Since...

10.1145/2465351.2465367 article EN 2013-04-15

Plinko

Brent Stephens Alan L. Cox Scott Rixner

This paper introduces Plinko, a network architecture that uses a novel forwarding model and routing algorithm to build networks with forwarding paths that, assuming arbitrarily large forwarding tables, are provably resilient against t link failures, ∀t ∈ N. However, in practice, there are clearly limits on the size of the forwarding tables. Nonetheless, when constrained to hardware comparable to modern top-of-rack (TOR) switches, Plinko scales with high resilience to networks with up to ten thousand hosts. Thus, as long as t or fewer links have failed, the only reason...

10.1145/2535771.2535774 article EN 2013-11-21