- Cloud Computing and Resource Management
- IoT and Edge/Fog Computing
- Distributed and Parallel Computing Systems
- Software System Performance and Reliability
- Data Stream Mining Techniques
- Scientific Computing and Data Management
- Scheduling and Optimization Algorithms
- Distributed systems and fault tolerance
- Advanced Data Storage Technologies
- Advanced Database Systems and Queries
- Parallel Computing and Optimization Techniques
- Cloud Data Security Solutions
Carnegie Mellon University
1989-2020
Many shared computing clusters allow users to utilize excess idle resources at lower cost or priority, with the proviso that some or all may be taken away at any time. But exploiting such dynamic resource availability, and the often fluctuating markets for them, requires agile elasticity and effective acquisition strategies. Proteus aggressively exploits such transient revocable resources to do machine learning (ML) cheaper and/or faster. Its parameter server framework, AgileML, efficiently adapts to bulk additions and revocations...
Stratus is a new cluster scheduler specialized for orchestrating batch job execution on virtual clusters, dynamically allocated collections of machine instances on public IaaS platforms. Unlike schedulers for conventional clusters, Stratus focuses primarily on dollar cost considerations, since clouds provide effectively unlimited, highly heterogeneous resources on demand. But since resources are charged for while allocated, Stratus aggressively packs tasks onto machines, guided by runtime estimates, trying to make them either mostly full (highly...
The authors examine a special software development environment called the Parallel Programming and Instrumentation Environment (PIE). PIE is designed to develop performance-efficient parallel and sequential computations. Following an explanation of PIE's general theory and features, visualization tools are used to isolate and repair a parallelism problem in an eight-process computation. Two more difficult examples using PIE are discussed. Some issues involved in correctly presenting visual information, such as features...
Datacenters are under-utilized, primarily due to unused resources on over-provisioned nodes of latency-critical jobs. Such idle resources can be used to run batch data analytic jobs to increase datacenter utilization, but these transient resources must be evicted whenever the latency-critical jobs require them again. Resource evictions often lead to cascading recomputations, which are usually handled by checkpointing intermediate results on stable storages of eviction-free reserved resources. However, checkpointing has major shortcomings in its substantial overhead...
The modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While users can quickly start an application by using various services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services while meeting customer SLAs and minimizing operational cost is becoming...
Resource Managers like YARN and Mesos have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault tolerance, task scheduling and coordination) and reimplement common mechanisms (e.g., caching, bulk-data transfers). This article presents REEF, a development framework that provides control...
Shared multi-tenant infrastructures have enabled companies to consolidate workloads and data, increasing data-sharing and cross-organizational re-use of job outputs. This same resource- and work-sharing has also increased the risk of missed deadlines and diverging priorities as recurring jobs and workflows developed by different teams evolve independently. To prevent incidental business disruptions, identifying and managing dependencies with clarity becomes increasingly important. Owl is a cluster log analysis...