- Distributed and Parallel Computing Systems
- Scientific Computing and Data Management
- Advanced Data Storage Technologies
- Cloud Computing and Resource Management
- Parallel Computing and Optimization Techniques
- Distributed Systems and Fault Tolerance
- Research Data Management Practices
- Peer-to-Peer Network Technologies
- Software System Performance and Reliability
- Genomics and Phylogenetic Studies
- Caching and Content Delivery
- Mobile Crowdsensing and Crowdsourcing
- Privacy-Preserving Technologies in Data
- Embedded Systems Design Techniques
- Data Quality and Management
- Opportunistic and Delay-Tolerant Networks
- Open Source Software Innovations
- Energy Efficient Wireless Sensor Networks
- Stochastic Gradient Optimization Techniques
- Network Security and Intrusion Detection
- Advanced Neural Network Applications
- Machine Learning and Data Classification
- Software Engineering Research
- Green IT and Sustainability
- Digital and Cyber Forensics
University of Notre Dame
2015-2024
Notre Dame of Dadiangas University
2017
Notre Dame University
2006-2014
University of Wisconsin–Madison
2001-2006
GANIL
2003
Laboratoire de Recherche en Informatique
2003
Since 1984, the Condor project has enabled ordinary users to do extraordinary computing. Today, the project continues to explore the social and technical problems of cooperative computing on scales ranging from the desktop to the world-wide computational Grid. In this paper, we provide the history and philosophy of the Condor project and describe how it interacted with other projects and evolved along with the field of distributed computing. We outline the core components of the system and describe how the technology of computing must correspond to social structures. Throughout, we reflect on the lessons of experience and chart the course...
Large scale hardware-supported multithreading, an attractive means of increasing computational power, benefits significantly from low per-thread costs. Hardware support for lightweight threads is a developing area of research. Each architecture with such support provides a unique interface, hindering development for these architectures and comparisons between them. A portable abstraction that provides basic thread control and synchronization primitives is needed. Such an abstraction would assist in exploring both the architectural needs of large scale threading...
Eucalyptus, OpenNebula and Nimbus are three major open-source cloud-computing software platforms. The overall function of these systems is to manage the provisioning of virtual machines for a cloud providing infrastructure-as-a-service. These various open-source projects provide an important alternative for those who do not wish to use a commercially provided cloud. We provide a comparison and analysis of each of these systems. We begin with a short summary comparing the current raw feature set of these projects. After that, we deepen our analysis by describing how...
In recent years, there has been a renewed interest in languages and systems for large scale distributed computing. Unfortunately, most systems available to the end user use a custom description language tightly coupled to a specific runtime implementation, making it difficult to transfer applications between systems. To address this problem we introduce Makeflow, a simple system for expressing and running a data-intensive workflow across multiple execution engines without requiring changes to the application or its description....
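Makeflow's description language is closely modeled on classic Make, which is what lets one workflow run unchanged on several execution engines. A minimal sketch of such a workflow file (the file names and the `sim.py` script are hypothetical):

```make
# Each rule declares its output files, its input files, and a command.
# The explicit inputs/outputs let the runtime build the dependency
# graph and stage files to whichever execution engine runs the task.
out.dat: sim.py in.dat
	python sim.py in.dat > out.dat

summary.txt: out.dat
	sort out.dat > summary.txt
```

Because dependencies are declared rather than inferred, the same file can be dispatched to a local machine, a batch system, or a pool of remote workers without modification.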
Although modern parallel and distributed computing systems provide easy access to large amounts of computing power, it is not always easy for non-expert users to harness these systems effectively. A workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we propose that production systems should provide end users with high-level abstractions that allow the easy expression and efficient execution of data intensive workloads. We present one example of an...
Workflows are a widely used abstraction for representing large scientific applications and executing them on distributed systems such as clusters, clouds, and grids. However, workflow systems have been largely silent on the question of precisely what environment each task in the workflow is expected to run in. As a result, a workflow may run correctly in the environment in which it was designed, but when moved to another machine, it is highly likely to fail due to differences in the operating system, installed applications, available data, and so forth. Lightweight container...
Access to remote data is one of the principal challenges of Grid computing. While performing I/O, Grid applications must be prepared for server crashes, performance variations, and exhausted resources. To achieve high throughput in such a hostile environment, applications need a resilient service that moves data while hiding errors and latencies. We illustrate this idea with Kangaroo, a simple data movement system that makes opportunistic use of disks and networks to keep applications running. We demonstrate that Kangaroo can achieve better end-to-end performance than traditional...
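The core Kangaroo idea, decoupling the application from the network by spooling output locally and moving it in the background, can be sketched as a write-behind queue. This is a schematic in Python under assumed semantics, not the actual Kangaroo implementation, which operates at the level of Grid I/O servers:

```python
import queue
import threading
import time

class WriteBehindMover:
    """Schematic of Kangaroo-style data movement: the application
    enqueues writes to a local spool and keeps computing, while a
    background "courier" thread pushes blocks toward the destination
    and retries on failure, hiding errors and latency."""

    def __init__(self, deliver):
        self.deliver = deliver            # callable(block); may raise OSError
        self.spool = queue.Queue()        # the local "disk" spool
        self._thread = threading.Thread(target=self._courier, daemon=True)
        self._thread.start()

    def write(self, block):
        # Returns immediately: durability at the destination is eventual.
        self.spool.put(block)

    def _courier(self):
        while True:
            block = self.spool.get()
            if block is None:             # shutdown sentinel
                return
            while True:
                try:
                    self.deliver(block)   # attempt the remote transfer
                    break
                except OSError:
                    time.sleep(0.1)       # wait out the outage, then retry

    def close(self):
        # Flush all spooled blocks, then stop the courier.
        self.spool.put(None)
        self._thread.join()
```

A caller would construct the mover with a delivery function for the remote server, call `write()` freely during computation, and `close()` at exit to flush the spool.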
Distributed computing continues to be an alphabet soup of services and protocols for managing computation and storage. To live in this environment, applications require middleware that can transparently adapt standard interfaces to new distributed systems; such software is known as an interposition agent. In this paper, we present several lessons learned about interposition agents via a progressive study of design possibilities. Although performance is an important concern, we pay special attention to less tangible issues such as portability,...
Today, campus grids provide users with easy access to thousands of CPUs. However, it is not always easy for non-expert users to harness these systems effectively. A large workload composed in what seems to be the obvious way by a naive user may accidentally abuse shared resources and achieve very poor performance. To address this problem, we argue that campus grids should provide end users with high-level abstractions that allow the easy expression and efficient execution of data-intensive workloads. We present one example of an abstraction—All-Pairs—that fits...
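The All-Pairs pattern itself can be stated in a few lines. The sketch below (with a hypothetical `hamming_similarity` standing in for the user's comparison binary) shows the abstraction's interface; the argument of the paper is that a runtime behind this interface, rather than the naive user, should decide how to partition the matrix and stage the data:

```python
def all_pairs(set_a, set_b, compare):
    # Evaluate compare() over the Cartesian product A x B.
    # A production runtime would split this matrix into blocks,
    # stage each input file to a node only once, and run blocks
    # in parallel, instead of submitting one job per cell.
    return [[compare(a, b) for b in set_b] for a in set_a]

def hamming_similarity(a, b):
    # Hypothetical stand-in for a user-supplied comparison
    # function, e.g. a biometric image matcher.
    return sum(x == y for x, y in zip(a, b)) / len(a)

matrix = all_pairs(["AAB", "ABB"], ["AAB", "BBB"], hamming_similarity)
```

Submitted naively, an N x M workload like this becomes N*M tiny jobs that each re-fetch their inputs over the shared filesystem, which is exactly the accidental abuse the abstraction is meant to prevent.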
With the rapid growth of online social media and ubiquitous Internet connectivity, social sensing has emerged as a new crowdsourcing application paradigm for collecting observations (often called claims) about the physical environment from humans or from devices on their behalf. A fundamental problem in these applications lies in effectively ascertaining the correctness of claims and the reliability of data sources without knowing either of them a priori, which is referred to as truth discovery. While significant progress has been made to solve...
Data-intensive applications involving the analysis of large datasets often require large amounts of compute and storage resources, for which data locality can be crucial to high throughput and performance. We propose a data diffusion approach that acquires compute and storage resources dynamically, replicates data in response to demand, and schedules computations close to data. As demand increases, more resources are acquired, thus allowing faster response to subsequent requests that refer to the same data; when demand drops, resources are released. This approach can provide the benefits of dedicated hardware without...
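The scheduling logic behind data diffusion can be illustrated with a toy dispatcher. This is a minimal sketch of the idea (locality-first placement with demand-driven acquisition and replication), with class and node names invented for illustration, not the system's actual scheduler:

```python
class DataDiffusionScheduler:
    """Toy sketch of data-diffusion scheduling: send a task to a
    node that already caches its input; otherwise acquire a new
    node (up to a limit) and replicate the data to it on first use."""

    def __init__(self, max_nodes):
        self.max_nodes = max_nodes
        self.caches = {}                  # node -> set of cached data items

    def schedule(self, data_item):
        # Prefer any node that already holds the data (locality).
        for node, cache in self.caches.items():
            if data_item in cache:
                return node
        if len(self.caches) < self.max_nodes:
            # Demand exceeds capacity: acquire a fresh node.
            node = f"node{len(self.caches)}"
            self.caches[node] = set()
        else:
            # At the limit: fall back to the least-loaded node,
            # using smallest cache as a crude load proxy.
            node = min(self.caches, key=lambda n: len(self.caches[n]))
        self.caches[node].add(data_item)  # demand-driven replication
        return node
```

Releasing idle nodes when demand drops (the other half of the abstract's policy) would simply evict entries from `caches`; it is omitted here for brevity.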
Task characteristics estimations, such as runtime, disk space, and memory consumption, are commonly used by scheduling algorithms and resource provisioning techniques to provide successful and efficient workflow executions. These methods assume that accurate estimations are available, but in production systems it is hard to compute such estimates with good accuracy. In this work, we first profile three real scientific workflows, collecting fine-grained information such as process I/O, memory usage, and CPU utilization. We then propose a method...
The landscape of workflow systems for scientific applications is notoriously convoluted, with hundreds of seemingly equivalent systems, many isolated research claims, and a steep learning curve. To address some of these challenges and lay the groundwork for transforming workflows research and development, the WorkflowsRI and ExaWorks projects partnered to bring the international workflows community together. This paper reports on discussions and findings from two virtual "Workflows Community Summits" (January and April, 2021). The overarching goals...
As the size of available datasets has grown from Megabytes to Gigabytes and now into Terabytes, machine learning algorithms and computing infrastructures have continuously evolved in an effort to keep pace. But at large scales, mining for useful patterns still presents challenges in terms of data management as well as computation. These issues can be addressed by dividing both data and computation to build ensembles of classifiers in a distributed fashion, but the trade-offs in cost, performance, and accuracy must be considered when...
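The divide-and-vote structure described here is easy to make concrete. The sketch below is illustrative only: the `train_majority_stump` learner is a deliberately trivial stand-in for whatever real classifier each worker would train on its shard:

```python
from collections import Counter

def partition(data, k):
    # Split the dataset into k shards, one per worker; each shard
    # can be trained on independently and in parallel.
    return [data[i::k] for i in range(k)]

def train_majority_stump(shard):
    # Toy local learner: ignores features and predicts the shard's
    # majority label. A real deployment would train, e.g., a
    # decision tree here.
    majority = Counter(label for _, label in shard).most_common(1)[0][0]
    return lambda x: majority

def ensemble_predict(models, x):
    # Classify by unweighted vote across the per-shard models.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

The cost/accuracy trade-off the abstract mentions shows up directly in `k`: more shards mean cheaper, more parallel training, but each local model sees less data.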
Over the past decade, high performance computing applications have embraced parallel programming and computing models. While parallel computing offers advantages such as good utilization of dedicated hardware resources, it also has several drawbacks, such as poor fault tolerance, scalability, and ability to harness available resources during run-time. The advent of cloud computing presents a viable and promising alternative because of its advantages in offering a distributed computing model. In this work, we establish directives that serve as guidelines for the design...
Container management frameworks, such as Docker, package diverse applications and their complex dependencies in self-contained images, which facilitates application deployment, distribution, and sharing. Currently, Docker employs a shared-nothing storage architecture, i.e., every Docker-enabled host requires its own copy of an image on local storage to create and run containers. This greatly inflates storage utilization, network load, and job completion times in the cluster. In this paper, we investigate the option of storing...
A limitation of traditional molecular dynamics (MD) is that reaction rates are difficult to compute. This is due to the rarity of observing transitions between metastable states, since high energy barriers trap the system in these states. Recently, the weighted ensemble (WE) family of methods has emerged which can flexibly and efficiently sample conformational space without being trapped, and allows the calculation of unbiased rates. However, while WE can sample correctly and efficiently, a scalable implementation applicable to interesting...
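The mechanism that lets WE methods escape the trapping problem is a periodic resampling step: trajectories ("walkers") carry probability weights, and within each bin of conformational space walkers are split or merged so that rare regions stay populated while total probability is conserved. A schematic of that step, not the paper's implementation (real WE also randomizes which merged walker survives, in proportion to weight):

```python
from collections import defaultdict

def we_resample(walkers, bin_of, target):
    """One weighted-ensemble resampling step (schematic).

    walkers: list of (state, weight); bin_of maps a state to a bin
    index. Underpopulated bins split their heaviest walker (halving
    its weight); overpopulated bins merge their two lightest walkers
    (summing weights). Walker counts stay near `target` per occupied
    bin while the total probability is conserved exactly."""
    by_bin = defaultdict(list)
    for state, w in walkers:
        by_bin[bin_of(state)].append([state, w])
    out = []
    for members in by_bin.values():
        # Split: duplicate the heaviest walker until the bin is full.
        while len(members) < target:
            members.sort(key=lambda m: m[1])
            state, w = members[-1]
            members[-1][1] = w / 2
            members.append([state, w / 2])
        # Merge: combine the two lightest walkers until at target.
        while len(members) > target:
            members.sort(key=lambda m: m[1])
            (s1, w1), (s2, w2) = members.pop(0), members.pop(0)
            members.append([s2, w1 + w2])   # keep one representative state
        out.extend((s, w) for s, w in members)
    return out
```

Because weights are only ever divided or summed, the ensemble remains an unbiased representation of the underlying probability distribution, which is what makes the rate estimates unbiased.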
Workflows are a widely used abstraction for describing large scientific applications and running them on distributed systems. However, most workflow systems have been silent on the question of what execution environment each task in the workflow is expected to run in. Consequently, a workflow may run successfully in the environment in which it was created, but fail on other platforms due to differences in the execution environment. Container-based schedulers have recently arisen as a potential solution to this problem, adopting containers to distribute computing resources and deliver...
Scientific workflows have been used almost universally across scientific domains and have underpinned some of the most significant discoveries of the past several decades. Many of these workflows have high computational, storage, and/or communication demands, and thus must execute on a wide range of large-scale platforms, from large clouds to upcoming exascale high-performance computing (HPC) platforms. These executions must be managed using some software infrastructure. Due to the popularity of workflows, workflow management systems (WMSs)...