- Parallel Computing and Optimization Techniques
- Advanced Data Storage Technologies
- Distributed systems and fault tolerance
- Distributed and Parallel Computing Systems
- Cloud Computing and Resource Management
- Software System Performance and Reliability
- Interconnection Networks and Systems
- Radiation Effects in Electronics
- Embedded Systems Design Techniques
- Peer-to-Peer Network Technologies
- Scientific Computing and Data Management
- Security and Verification in Computing
- Network Traffic and Congestion Control
- Real-Time Systems Scheduling
- Advanced Malware Detection Techniques
- Advanced Software Engineering Methodologies
- Advanced Memory and Neural Computing
- Service-Oriented Architecture and Web Services
- Brain Tumor Detection and Classification
- Simulation Techniques and Applications
- Network Security and Intrusion Detection
- Caching and Content Delivery
- Cloud Data Security Solutions
- FinTech, Crowdfunding, Digital Finance
- Semiconductor materials and devices
University of New Mexico
2014-2024
Sandia National Laboratories
2011-2021
Lawrence Berkeley National Laboratory
2020-2021
Red Hat (United States)
2021
University of Castilla-La Mancha
2020
Eindhoven University of Technology
2020
Polytechnic University of Turin
2020
The Ohio State University
2020
University of Pittsburgh
2020
IBM (United States)
2011
As high-end computing machines continue to grow in size, issues such as fault tolerance and reliability limit application scalability. Current techniques to ensure progress across faults, like checkpoint-restart, are increasingly problematic at these scales due to excessive overheads predicted to more than double an application's time to solution. Replicated techniques, particularly state machine replication, long used in distributed and mission critical systems, have been suggested as an alternative...
Operating system noise has been shown to be a key limiter of application scalability in high-end systems. While several studies have attempted to quantify the sources and effects of system interference using user-level mechanisms, there are few published studies on the effect of different kinds of kernel-generated noise on application performance at scale. In this paper, we examine the sensitivity of real-world, large-scale applications to a range of OS noise patterns using a kernel-based noise injection mechanism implemented in the Catamount lightweight kernel. Our results...
Palacios is a new open-source VMM under development at Northwestern University and the University of New Mexico that enables applications executing in a virtualized environment to achieve scalable high performance on large machines. Palacios functions as a modularized extension to Kitten, an operating system being developed at Sandia National Laboratories to support large-scale supercomputing applications. Together, Palacios and Kitten provide a thin layer over the hardware to support full-featured virtualized environments alongside Kitten's lightweight native...
Traffic analysis may threaten user privacy, even if the traffic is encrypted. In this paper, we use IEEE 802.11 wireless local area networks (WLANs) as an example to show that inferring users' online activities accurately by traffic analysis, without the administrator's privilege, is possible during very short periods (e.g., a few seconds). The online activities investigated include web browsing, chatting, online gaming, downloading, uploading, and video watching. We implement a hierarchical classification system based on machine learning...
Virtualization has the potential to dramatically increase the usability and reliability of high performance computing (HPC) systems. However, this potential will remain unrealized unless overheads can be minimized. This is particularly challenging on large scale machines that run carefully crafted HPC OSes supporting tightly-coupled, parallel applications. In this paper, we show how careful use of hardware and VMM features enables the virtualization of a large-scale HPC system, specifically a Cray XT4 machine, with ≤ 5%...
This paper describes several techniques designed to improve protocol latency, and reports on their effectiveness when measured on a modern RISC machine employing the DEC Alpha processor. We found that the memory system, which has long been known to dominate network throughput, is also a key factor in protocol latency. As a result, improving instruction cache effectiveness can greatly reduce protocol processing overheads. An important metric in this context is the memory cycles per instruction (mCPI), which is the average number of cycles that an instruction stalls waiting for...
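The mCPI metric described above can be illustrated with a small Python sketch. It assumes that the total number of stall cycles and the number of executed instructions are already available from some measurement source; the function name `mcpi` and the sample counts are hypothetical and do not come from the paper.

```python
def mcpi(stall_cycles: int, instructions: int) -> float:
    """Average number of stall cycles per executed instruction.

    Both inputs are assumed to come from hardware counters or a
    simulator; how they are gathered is outside this sketch.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    if stall_cycles < 0:
        raise ValueError("stall_cycles must be non-negative")
    return stall_cycles / instructions


if __name__ == "__main__":
    # Hypothetical counts for one protocol processing path.
    print(mcpi(stall_cycles=150, instructions=100))
```

A lower value means instructions spend less time stalled, which is the direction the cache-oriented techniques in the abstract aim for.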
Infiniband is becoming an important interconnect technology in high performance computing. Recent efforts in large scale Infiniband deployments are raising scalability questions in the HPC community. Open MPI, a new open source implementation of the MPI standard targeted for production computing, provides several mechanisms to enhance Infiniband scalability. Initial comparisons with MVAPICH, the most widely used Infiniband MPI implementation, show similar performance but much better scalability characteristics. Specifically, small message latency is improved by up to 10%...
The increasing size and complexity of high performance computing (HPC) systems have led to major concerns over fault frequencies and the mechanisms necessary to tolerate these faults. Previous studies have shown that state-of-the-field checkpoint/restart mechanisms will not scale sufficiently for future generation systems. Therefore, optimizations that reduce checkpoint overheads are necessary to keep checkpoint/restart mechanisms effective. In this work, we demonstrate that checkpoint data compression is a feasible mechanism for reducing checkpoint commit latencies and storage overheads....
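As a rough illustration of the feasibility question raised in the abstract above, the following Python sketch compresses a checkpoint buffer with `zlib` and checks whether compressing and then writing the smaller image takes less time than writing the uncompressed image. The function name `compression_is_viable`, the synthetic checkpoint data, and the bandwidth figure are assumptions made for illustration; they are not taken from the paper, which may use a different model and different compressors.

```python
import time
import zlib


def compression_is_viable(checkpoint: bytes, write_bandwidth_bps: float, level: int = 6):
    """Return (viable, compression_factor, compress_seconds).

    viable is True when compress time + compressed write time is less
    than the uncompressed write time. write_bandwidth_bps is a
    hypothetical storage bandwidth in bytes per second.
    """
    if not checkpoint:
        raise ValueError("checkpoint must not be empty")
    if write_bandwidth_bps <= 0:
        raise ValueError("write_bandwidth_bps must be positive")

    start = time.perf_counter()
    compressed = zlib.compress(checkpoint, level)
    compress_seconds = time.perf_counter() - start

    # Fraction of the original size removed by compression.
    compression_factor = 1.0 - len(compressed) / len(checkpoint)

    uncompressed_write = len(checkpoint) / write_bandwidth_bps
    compressed_write = len(compressed) / write_bandwidth_bps
    viable = compress_seconds + compressed_write < uncompressed_write
    return viable, compression_factor, compress_seconds


if __name__ == "__main__":
    # Synthetic, highly compressible stand-in for application state.
    state = bytes(1_000_000)
    viable, factor, seconds = compression_is_viable(state, write_bandwidth_bps=50e6)
    print(f"viable={viable} factor={factor:.3f} compress_time={seconds:.4f}s")
```

The result depends on the measured compression time of the machine running it, so the sketch reports timing rather than asserting a fixed outcome.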
The ability to configure transport protocols from collections of smaller software modules allows the characteristics of the protocol to be customized for a specific application or network technology. This paper describes a configurable transport protocol system called CTP in which microprotocols implementing individual attributes of transport can be combined into a composite protocol that realizes the desired overall functionality. In addition to describing...
The authors describe a Java-based platform for liquid software, called Joust, that is specifically designed to support low-level, communication-oriented systems and to avoid the limitations of general-purpose OSs. They contrast the requirements of communication-oriented software with those of computation-oriented software, identify the limitations of current platforms, and outline the benefits of Joust. They also offer an overview of Scout (the underlying OS upon which Joust is built), its runtime system, and its just-in-time (JIT) compiler.
Instruction-level simulation is necessary to evaluate new architectures. However, single-node simulation cannot predict the behavior of a parallel application on a supercomputer. We present a scalable simulator that couples a cycle-accurate node simulator with a supercomputer network model. Our simulator executes individual instances of IBM's Mambo PowerPC simulator on hundreds of cores. We integrated a NIC emulator into Mambo and model the network instead of fully simulating it. This decouples the individual node simulators and makes our design scalable.
Reaching Exascale will require leveraging massive parallelism while potentially using asynchronous communication to help achieve scalability at such large levels of concurrency. MPI is a good candidate for providing the mechanisms to support communication at such large scales. Two existing MPI mechanisms are particularly relevant to Exascale: multi-threading, to support massive concurrency, and Remote Memory Access (RMA), to support asynchronous communication. Unfortunately, multi-threaded MPI RMA code has not been extensively studied. Part of the reason for this is that no public benchmarks or proxy...
One-sided communication is crucial to enabling communication concurrency. As core counts have increased, particularly with many-core architectures, one-sided (RMA) communication has been proposed to address the ever increasing contention at the network interface. The difficulty in using one-sided (RMA) communication with MPI is that the performance of MPI implementations using RMA with multiple concurrent threads is not well understood. Past studies have been done using MPI RMA in combination with multi-threading (RMA-MT), but they have been performed on older MPI implementations lacking RMA-MT optimizations. In addition, prior work has only been done at smaller...
Infiniband is becoming an important interconnect technology in high performance computing. Efforts in large scale Infiniband deployments are raising scalability questions in the HPC community. Open MPI, a new open source implementation of the MPI standard targeted for production computing, provides several mechanisms to enhance Infiniband scalability. Initial comparisons with MVAPICH, the most widely used Infiniband MPI implementation, show similar performance but much better scalability characteristics. Specifically, small message latency is improved by up to 10%...
Current fault tolerance protocols are not sufficiently scalable for the exascale era. The most widely used method, coordinated checkpointing, places enormous demands on the I/O subsystem and imposes frequent synchronizations. Uncoordinated protocols use message logging, which introduces message rate limitations or undesired memory and storage requirements to hold payload and event logs. In this paper we propose a combination of several techniques, namely coordinated checkpointing, optimistic message logging, and a protocol that glues them together. This...