- Parallel Computing and Optimization Techniques
- Distributed and Parallel Computing Systems
- Embedded Systems Design Techniques
- Topic Modeling
- Advanced Data Storage Technologies
- Interconnection Networks and Systems
- Natural Language Processing Techniques
- Distributed Systems and Fault Tolerance
- Software Testing and Debugging Techniques
- Security and Verification in Computing
- Logic, Programming, and Type Systems
- Software Engineering Research
- Advanced Malware Detection Techniques
- Speech and Dialogue Systems
- Cloud Computing and Resource Management
- Peer-to-Peer Network Technologies
- Formal Methods in Verification
- Multimodal Machine Learning Applications
- Personal Information Management and User Behavior
- Crystallization and Solubility Studies
- X-ray Diffraction in Crystallography
- Algorithms and Data Compression
- Real-Time Systems Scheduling
- Complex Network Analysis Techniques
- AI in Service Interactions
Stanford University
2016-2025
University of Central Oklahoma
2020-2024
Dean College
2023
Oklahoma Library Association
2022
Laboratoire d'Informatique de Paris-Nord
2005-2021
Qingdao University
2020
Creighton University
2020
La Trobe University
2015
Florey Institute of Neuroscience and Mental Health
2015
The University of Melbourne
2015
Michael E. Wolf and Monica S. Lam. A data locality optimizing algorithm. ACM SIGPLAN Notices 26(6), June 1991, pp. 30–44. https://doi.org/10.1145/113446.113449
The overall goals and major features of the directory architecture for shared memory (DASH) are presented. The fundamental premise behind the architecture is that it is possible to build a scalable high-performance machine with a single address space and coherent caches. The DASH architecture is scalable in that it achieves linear or near-linear performance growth as the number of processors increases from a few to a few thousand. This performance results from distributing the memory among the processing nodes and using a network with scalable bandwidth to connect the nodes. The architecture allows shared data to be cached, significantly reducing the latency...
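To make the directory idea concrete, here is a minimal Python sketch of directory-based coherence in the spirit of the abstract above; the class and method names are mine, and the protocol is heavily simplified relative to the actual DASH design.

```python
# Minimal sketch of directory-based cache coherence (illustrative only;
# names and simplifications are mine, not from the paper).

class Directory:
    """Tracks, per memory block, which nodes hold a cached copy."""
    def __init__(self):
        self.sharers = {}   # block -> set of node ids holding a copy
        self.memory = {}    # block -> value at the home node

    def read(self, node, block):
        # A read adds the node to the sharer set; subsequent accesses can
        # be served from the node's cache, avoiding remote latency.
        self.sharers.setdefault(block, set()).add(node)
        return self.memory.get(block, 0)

    def write(self, node, block, value):
        # Invalidations go point-to-point, only to current sharers, so
        # coherence traffic scales with actual sharing, not machine size.
        for other in self.sharers.get(block, set()) - {node}:
            self.invalidate(other, block)
        self.sharers[block] = {node}
        self.memory[block] = value

    def invalidate(self, node, block):
        print(f"invalidate block {block:#x} in node {node}'s cache")

d = Directory()
d.read(node=1, block=0xA0)              # node 1 caches block 0xA0
d.read(node=2, block=0xA0)              # node 2 caches it too
d.write(node=1, block=0xA0, value=42)   # node 2's copy is invalidated
```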
The basic idea behind software pipelining was first developed by Patel and Davidson for scheduling hardware pipelines. As instruction-level parallelism made its way into general-purpose computing, it became necessary to automate scheduling. How and whether instructions can be scheduled statically have major ramifications on the design of computer architectures. Rau and Glaeser were the first to use software pipelining in a compiler for a machine with specialized hardware designed to support pipelining. In the meantime, trace scheduling was touted as the technique of choice...
Todd C. Mowry, Monica S. Lam, and Anoop Gupta. Design and evaluation of a compiler algorithm for prefetching. ACM SIGPLAN Notices 27(9), September 1992, pp. 62–73. https://doi.org/10.1145/143371.143488
An approach to transformations for general loop nests, in which dependence vectors represent precedence constraints on the iterations of a loop, is presented. Under this model, the dependences extracted from a loop nest must be lexicographically positive. This leads to a simple test for the legality of compound transformations: any code transformation that leaves the dependences lexicographically positive is legal. The theory is applied to the problem of maximizing the degree of coarse- or fine-grain parallelism in a loop nest. It is shown that the maximum degree of parallelism can be achieved by transforming the loops into a nest of coarsest fully...
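The legality test lends itself to a few lines of code. The following Python sketch (my own minimal rendering, not the paper's implementation) checks whether a compound transformation, expressed as a matrix T, keeps every dependence vector lexicographically positive.

```python
# Legality test: a transformation T is legal iff every transformed
# dependence vector T*d remains lexicographically positive.

def lex_positive(v):
    """A vector is lexicographically positive if its first nonzero entry is > 0."""
    for x in v:
        if x != 0:
            return x > 0
    return False  # the zero vector is not lexicographically positive

def transform(T, d):
    return [sum(T[i][j] * d[j] for j in range(len(d))) for i in range(len(T))]

def legal(T, dependences):
    return all(lex_positive(transform(T, d)) for d in dependences)

# Loop interchange on a 2-deep nest is the permutation matrix below.
interchange = [[0, 1],
               [1, 0]]
print(legal(interchange, [(1, 0), (0, 1)]))  # True: both vectors stay positive
print(legal(interchange, [(1, -1)]))         # False: (1, -1) maps to (-1, 1)
```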
This paper introduces DIDUCE, a practical and effective tool that aids programmers in detecting complex program errors and identifying their root causes. By instrumenting a program and observing its behavior as it runs, DIDUCE dynamically formulates hypotheses of invariants obeyed by the program. DIDUCE hypothesizes the strictest invariants at the beginning and gradually relaxes the hypothesis as violations are detected, to allow for new behavior. The violations reported help users catch software bugs as soon as they occur. They also give programmers visibility into...
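A toy rendering of the relaxation idea, assuming a bit-mask representation of "which bits of this value have never changed" (my simplification of DIDUCE's actual scheme):

```python
# For each instrumented program point, remember which bits of the observed
# value have never changed; a violation occurs when a new value differs in
# a bit previously believed constant, and the hypothesis is then relaxed.

class InvariantTracker:
    def __init__(self):
        self.first = None
        self.mask = ~0  # strictest hypothesis: every bit is invariant

    def observe(self, value, where=""):
        if self.first is None:
            self.first = value
            return
        changed = (value ^ self.first) & self.mask
        if changed:
            print(f"{where}: violation, bits {changed:#x} no longer constant")
            self.mask &= ~changed  # relax to admit the new behavior

t = InvariantTracker()
for v in [8, 8, 8, 12]:
    t.observe(v, where="loop counter")  # reports when 12 breaks the pattern
```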
This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipelining is that optimal performance can be achieved with compact object code.
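The overlap is easiest to see cycle by cycle. This Python sketch (mine, not the paper's algorithm) prints the schedule for a loop body with three assumed stages, starting a new iteration every II cycles:

```python
# Software pipelining of a loop whose body is LOAD -> MUL -> STORE:
# a new iteration starts every II cycles, overlapping its predecessors.

STAGES = ["LOAD", "MUL", "STORE"]
II = 1   # initiation interval: cycles between successive iterations
N = 5    # loop trip count

for cycle in range((N - 1) * II + len(STAGES)):
    ops = []
    for i in range(N):
        stage = cycle - i * II
        if 0 <= stage < len(STAGES):
            ops.append(f"{STAGES[stage]}(i={i})")
    print(f"cycle {cycle}: " + ", ".join(ops))

# The early cycles form the prologue, the steady-state cycles are the
# compact kernel (one op from each stage), and the tail is the epilogue.
```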
This paper presents the first scalable context-sensitive, inclusion-based pointer alias analysis for Java programs. Our approach to context sensitivity is to create a clone of a method for every context of interest, and run a context-insensitive algorithm over the expanded call graph to get context-sensitive results. For precision, we generate a clone for every acyclic path through a program's call graph, treating methods in a strongly connected component as a single node. Normally, this formulation is hopelessly intractable; a call graph often has 10^14 acyclic paths or...
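A minimal Python sketch of the cloning idea (the call graph and names are invented for illustration; the paper's actual implementation uses BDDs to make the enormous number of clones tractable):

```python
# Give each method one clone per acyclic path from the root of the call
# graph; a context-insensitive analysis over the clones then yields
# context-sensitive results.

CALLS = {"main": ["parse", "eval"],
         "parse": ["alloc"],
         "eval":  ["alloc"]}

def clones(node, path=()):
    """Yield one clone name per acyclic call path reaching `node`."""
    yield "/".join(path + (node,))
    for callee in CALLS.get(node, []):
        if callee not in path:        # collapse cycles: skip back edges
            yield from clones(callee, path + (node,))

for c in clones("main"):
    print(c)

# alloc appears twice (main/parse/alloc and main/eval/alloc), so the two
# calling contexts no longer pollute each other's points-to sets.
```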
This article describes the automatic parallelization techniques in the SUIF (Stanford University Intermediate Format) compiler that result in good multiprocessor performance for array-based numerical programs. Parallelizing compilers for multiprocessors face many hurdles. However, SUIF's robust analysis and memory optimization techniques enabled speedups on three fourths of the NAS and SPECfp95 benchmark programs.
Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building SUIF into a powerful, flexible system, and we would now like to share the results of our efforts. SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The...
A number of effective error detection tools have been built in recent years to check if a program conforms to certain design rules. An important class of rules deals with sequences of events associated with a set of related objects. This paper presents a language called PQL (Program Query Language) that allows programmers to express such questions easily in an application-specific context. A query looks like a code excerpt corresponding to the shortest amount of code that would violate the rule. Details of the target application's precise...
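PQL's actual syntax is not reproduced here; the Python sketch below (rule, trace, and names all mine) only illustrates the underlying idea of matching the shortest violating event sequence per object over a program trace.

```python
# A rule is the shortest event sequence on one object that constitutes a
# violation, checked against a trace of (event, object) pairs.

RULE = ("close", "read")   # reading a handle after closing it is an error

def find_violations(trace, rule):
    """trace: list of (event, object_id); returns objects matching the rule."""
    progress = {}            # object -> how much of the rule it has matched
    violators = []
    for event, obj in trace:
        i = progress.get(obj, 0)
        if event == rule[i]:
            progress[obj] = i + 1
            if progress[obj] == len(rule):
                violators.append(obj)
    return violators

trace = [("open", "f1"), ("read", "f1"), ("close", "f1"),
         ("read", "f1"),                     # use-after-close on f1
         ("open", "f2"), ("close", "f2")]
print(find_violations(trace, RULE))          # ['f1']
```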
This paper proposes an efficient technique for context-sensitive pointer analysis that is applicable to real C programs. For efficiency, we summarize the effects of procedures using partial transfer functions. A partial transfer function (PTF) describes the behavior of a procedure assuming that certain alias relationships hold when it is called. We can reuse a PTF in many calling contexts as long as the aliases among the inputs are the same. Our empirical results demonstrate that this approach is successful: a single PTF per procedure is usually sufficient to obtain completely...
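The reuse pattern amounts to memoizing procedure summaries on the input alias pattern. A Python sketch of that idea (the cache structure, `swap` procedure, and alias-pattern encoding are illustrative assumptions, not the paper's data structures):

```python
# Analyze a procedure once per distinct input alias pattern and reuse the
# summary at every call site whose inputs alias the same way.

ptf_cache = {}   # (procedure name, input alias pattern) -> summary

def analyze_call(proc, alias_pattern):
    key = (proc.__name__, alias_pattern)
    if key not in ptf_cache:
        print(f"analyzing {proc.__name__} under aliases {alias_pattern}")
        ptf_cache[key] = proc(alias_pattern)   # expensive analysis runs once
    return ptf_cache[key]                      # later contexts reuse the PTF

def swap(alias_pattern):
    # Toy "analysis result": the points-to effects of swap under this pattern.
    return f"summary of swap given {alias_pattern}"

analyze_call(swap, ("p!=q",))   # first context: analysis runs
analyze_call(swap, ("p!=q",))   # second context, same aliases: cache hit
analyze_call(swap, ("p==q",))   # new alias pattern: one more analysis
```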
This paper shows how to quickly move the state of a running computer across a network, including the state in its disks, memory, CPU registers, and I/O devices. We call this state a capsule. Capsule state is hardware state, so it includes the entire operating system as well as applications and running processes. We have chosen to move x86 capsule states because x86 computers are common, cheap, run the software we use, and have tools for migration. Unfortunately, capsules can be large, containing hundreds of megabytes of memory and gigabytes of disk data. We have developed techniques...
Jennifer M. Anderson and Monica S. Lam. Global optimizations for parallelism and locality on scalable parallel machines. PLDI '93: Proceedings of the ACM SIGPLAN 1993 Conference on Programming Language Design and Implementation, August 1993, pp. 112–125. https://doi.org/10.1145/155090.155101
This paper discusses three techniques useful in relaxing the constraints imposed by control flow on parallelism: control dependence analysis, executing multiple flows of control simultaneously, and speculative execution. We evaluate these techniques using trace simulations to find the limits of parallelism for machines that employ different combinations of the techniques. We have three major results. First, local regions of code have limited parallelism, and control dependence analysis is useful in extracting global parallelism from different parts of a program. Second, a superscalar processor is fundamentally...
iWarp is a system architecture for high-speed signal, image, and scientific computing. The heart of an iWarp system is the iWarp component: a single-chip processor that requires only the addition of memory chips to form a complete building block, called a cell. Each component contains both a powerful computation engine (20 MFLOPS) and a high-throughput (320 MBytes/sec), low-latency (100-150 ns) communication engine for interfacing with other cells. Because of its strong communication capabilities, the iWarp component is a versatile building block for various high-performance parallel systems. These systems...
The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells, thus having a peak computation rate of 100 MFLOPS. The array can be extended to include more cells to accommodate applications using the increased computational bandwidth. The Warp array is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler...
Jade, a high-level parallel programming language for managing coarse-grained parallelism, is discussed. Jade simplifies programming by providing the abstractions of sequential execution and a shared address space. It is also platform-independent; the same Jade program runs on uniprocessors, multiprocessors, and heterogeneous networks of machines. An example that illustrates how programmers express irregular, dynamically determined concurrency and how the implementation exploits this source of concurrency is presented. A digital video imaging...
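A Python analog of Jade's programming model (this is not Jade syntax; the task set and field names are invented for illustration): each task declares the shared objects it will read or write, and the runtime extracts concurrency by running tasks whose declared accesses do not conflict.

```python
# Tasks declare read/write sets; non-conflicting tasks may run in parallel
# while conflicting ones keep their sequential order.

def conflicts(a, b):
    """Two tasks conflict if one writes what the other reads or writes."""
    return (a["writes"] & (b["reads"] | b["writes"]) or
            b["writes"] & (a["reads"] | a["writes"]))

tasks = [
    {"name": "t1", "reads": {"frame0"}, "writes": {"out0"}},
    {"name": "t2", "reads": {"frame1"}, "writes": {"out1"}},
    {"name": "t3", "reads": {"out0"},   "writes": {"stats"}},
]

for i, a in enumerate(tasks):
    for b in tasks[i + 1:]:
        rel = "must serialize" if conflicts(a, b) else "may run in parallel"
        print(f"{a['name']} vs {b['name']}: {rel}")

# t1/t2 are independent; t3 reads t1's output, so their sequential order is
# preserved, matching the sequential-execution abstraction.
```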
Component-based software design is a popular and effective approach to designing large systems. While components typically have well-defined interfaces, sequencing information (which calls must come in which order) is often not formally specified. This paper proposes using multiple finite state machine (FSM) submodels to model the interface of a class. A submodel includes a subset of methods that, for example, implement a Java interface or access some particular field. Each state-modifying method...
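A minimal Python sketch of one such FSM submodel (the states, methods, and transition table are my illustrative assumptions, not extracted from any real class):

```python
# One submodel covers the subset of methods touching a file handle, with
# states capturing which call orders are legal.

SUBMODEL = {                      # (state, method) -> next state
    ("closed", "open"):  "opened",
    ("opened", "read"):  "opened",
    ("opened", "close"): "closed",
}

def check(calls):
    state = "closed"
    for m in calls:
        nxt = SUBMODEL.get((state, m))
        if nxt is None:
            return f"illegal: {m}() in state '{state}'"
        state = nxt
    return "ok"

print(check(["open", "read", "read", "close"]))  # ok
print(check(["read"]))                           # illegal: read() in state 'closed'

# A separate submodel over a different method subset (say, the methods
# implementing an iterator interface) would be built the same way.
```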
Jason Nieh and Monica S. Lam. The design, implementation and evaluation of SMART: a scheduler for multimedia applications. SOSP '97: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, October 1997, pp. 184–197. https://doi.org/10.1145/268998.266677