- Cloud Computing and Resource Management
- IoT and Edge/Fog Computing
- Software System Performance and Reliability
- Software-Defined Networks and 5G
- Advanced Data Storage Technologies
- Network Security and Intrusion Detection
- Distributed Systems and Fault Tolerance
- Scientific Computing and Data Management
- Internet Traffic Analysis and Secure E-voting
- Distributed and Parallel Computing Systems
- Advanced Queuing Theory Analysis
- Access Control and Trust
- Optimization and Search Problems
- Green IT and Sustainability
- Cloud Data Security Solutions
Université de Rennes (2021)
Institut de Recherche en Informatique et Systèmes Aléatoires (2021)
Centre National de la Recherche Scientifique (2021)
Institut National Polytechnique de Toulouse (2021)
Data Management (Italy) (2021)
Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (2021)
Universidade Federal de Pernambuco (2017-2020)
Universidade de Pernambuco (2019)
Guaranteeing high levels of availability is a huge challenge for cloud providers. The authors look at the causes of failures and recommend ways to prevent them or to minimize their effects when they occur.
Distributed digital infrastructures for computation and analytics are now evolving towards an interconnected ecosystem allowing complex applications to be executed from the IoT Edge devices to the HPC Cloud (aka the Computing Continuum, Digital Continuum, or Transcontinuum). Understanding end-to-end performance in such a continuum is challenging. This breaks down to reconciling many, typically contradicting, application requirements and constraints with low-level infrastructure design choices. One important challenge...
Cloud computing has gained popularity in recent years due to its pay-as-you-go business model, high availability of services, and scalability. Service unavailability does not affect just the user experience but also translates into direct costs for cloud provider companies. Part of this comes from SLA breaches: once the interruption time exceeds what was signed in the contract, financial penalties are generated. Thus, providers have tried to identify failure points and estimate the availability of their services. This paper proposes models to assess...
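As a rough illustration of the kind of estimate such availability models produce (all figures below are hypothetical, not taken from the paper), steady-state availability can be derived from MTTF/MTTR and converted into expected yearly downtime and an SLA penalty:

```python
# Steady-state availability from MTTF/MTTR (hypothetical figures).
mttf_h = 8000.0   # mean time to failure, hours
mttr_h = 12.0     # mean time to repair, hours

availability = mttf_h / (mttf_h + mttr_h)
downtime_h_per_year = (1 - availability) * 365 * 24

# Hypothetical SLA: penalties apply to downtime beyond the contracted allowance.
sla_limit_h = 8.76        # yearly downtime allowed by a ~99.9% SLA
penalty_per_h = 1000.0    # hypothetical cost per excess hour of downtime

penalty = max(0.0, downtime_h_per_year - sla_limit_h) * penalty_per_h
print(f"A = {availability:.5f}, downtime/yr = {downtime_h_per_year:.2f} h, "
      f"penalty = ${penalty:.2f}")
```

With these numbers the service misses the 99.9% target, so the excess downtime beyond the contracted limit is billed as a penalty; the papers above build far richer stochastic models, but the accounting at the end has this shape.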
A data center is divided into three basic subsystems: information technology (IT), power, and cooling. Cooling plays an important role in availability, as a failure in this subsystem may cause interruption of services. Generally, redundant cooling is implemented by replacing a failed component with a standby one. However, it can also be based on the rotation of computer room air conditioners (CRACs). This paper proposes scalable models that represent this behavior and evaluate the impact of failures on availability....
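One way to see why standby redundancy helps: if each CRAC unit is independently available with probability A, a room that stays cool as long as at least k of n units work has availability given by a binomial sum. A minimal sketch, with a hypothetical per-unit availability (the paper's models are stochastic Petri nets, which also capture repair and rotation behavior this simple formula ignores):

```python
from math import comb

def k_of_n_availability(a_unit: float, n: int, k: int) -> float:
    """Availability of a k-out-of-n system of independent, identical units."""
    return sum(comb(n, i) * a_unit**i * (1 - a_unit)**(n - i)
               for i in range(k, n + 1))

a = 0.99  # hypothetical availability of a single CRAC unit
no_redundancy = k_of_n_availability(a, n=1, k=1)   # one unit, no spare
n_plus_1      = k_of_n_availability(a, n=4, k=3)   # 3 needed, 4 installed
print(no_redundancy, n_plus_1)
```

Adding a single standby unit moves availability from two nines toward three, which is why the redundant configurations studied in these papers pay off.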
The cloud data center is a complex system composed of power, cooling, and IT subsystems. The power subsystem is crucial to feed the equipment, and power disruptions may result in service unavailability. This paper analyzes the impact of power failures on services under different architecture configurations based on the TIA-942 standard, such as non-redundant, redundant, concurrently maintainable, and fault tolerant. We model both subsystems, power and IT, through Stochastic Petri Nets (SPN). The availability results show that the fault tolerant...
Large data centers are complex systems that depend on several generations of hardware and software components, ranging from legacy mainframes and rack-based appliances to modular blade servers and modern rack scale design solutions. To cope with this heterogeneity, the data center manager must coordinate a multitude of tools, protocols, and standards. Currently, managers, standardization bodies, and hardware/software manufacturers are joining efforts to develop and promote Redfish as the main management standard for data centers, even...
Network Access Control (NAC) management is a critical task, especially in current networks that are composed of many heterogeneous things (Internet of Things) connected to share data, resources, and Internet access. Software-Defined Networking (SDN) simplifies network design and operation and offers new opportunities (programmability, flexibility, dynamicity, standardization) to manage the network. Despite this, access control remains a challenge, since managing security policies involves dealing with...
Traditional data center infrastructure suffers from a lack of standard and ubiquitous management solutions. Despite the advances achieved, the interoperability of existing tools is sometimes hardware dependent. Vendors are already actively participating in the specification and design of new software interfaces within different forums. Nevertheless, the complexity and variety of components, which include servers, cooling, networking, and power hardware, coupled with the introduction of the software-defined paradigm, have led to the parallel development...
Next-generation cloud data centers are based on software-defined data center infrastructures that promote flexibility, automation, optimization, and scalability. The Redfish standard and the Intel Rack Scale Design technology enable the infrastructure to disaggregate bare-metal compute, storage, and networking resources into virtual pools in order to dynamically compose performance-optimized virtual PODs (vPODs) tailored to workload-specific demands. This article proposes four chassis design configurations for Distributed...
Many enterprises rely on cloud infrastructure to host their critical applications (such as trading, banking transactions, airline reservation systems, and credit card authorization). The unavailability of these applications may lead to severe consequences that go beyond financial losses, reaching the provider's reputation too. However, maintaining high availability in a data center is a difficult task due to its complexity. The power subsystem is crucial for the entire operation because it supplies all other subsystems, including...
Making data centers highly available remains a challenge that must be considered from the design phase. The problem is selecting the right strategies and components to achieve this goal given a limited investment. Furthermore, data center designers currently lack reliable specialized tools to accomplish this task. In this paper, we disclose a formal method that chooses components to optimize the availability of a data center while considering the budget as a constraint. For that, we make use of stochastic models to represent the cloud infrastructure based on...
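The selection problem described can be sketched as a tiny combinatorial search: pick one candidate configuration per subsystem so as to maximize availability of the series system without exceeding the budget. The options, availabilities, and costs below are hypothetical placeholders; the paper uses stochastic models rather than fixed per-option availabilities, and a real design space would be far too large for brute force:

```python
from itertools import product

# Hypothetical candidates per subsystem: (availability, cost in arbitrary units).
options = {
    "power":   [(0.999,  10), (0.9999, 25)],
    "cooling": [(0.995,   5), (0.999,  12)],
    "it":      [(0.998,   8), (0.9995, 20)],
}
budget = 45

best = None  # (availability, cost, chosen combo)
for combo in product(*options.values()):
    cost = sum(c for _, c in combo)
    if cost > budget:
        continue  # infeasible under the budget constraint
    avail = 1.0
    for a, _ in combo:
        avail *= a  # series system: every subsystem must be up
    if best is None or avail > best[0]:
        best = (avail, cost, combo)

print(best)
```

Note that the optimum here is not simply "upgrade the most expensive subsystem": the cheapest power option combined with the better cooling and IT options beats spending the budget on premium power, which is exactly the kind of non-obvious trade-off that motivates a formal optimization method.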
Emergency call services are expected to be highly available in order to minimize the loss of urgent calls and, as a consequence, of lives due to the lack of a timely medical response. This service availability depends heavily on the cloud data center in which it is hosted. However, availability information alone cannot provide sufficient understanding of how failures impact the service and the users' perception. In this paper, we evaluate an emergency call system, considering service-level metrics such as the number of users affected per failure and the time it takes...
To assess the availability of different data center configurations, understand the main root causes of failures, and represent their low-level details, such as each subsystem's behavior and their interconnections, we have proposed, in previous works, a set of stochastic models for data center architectures (considering three subsystems: power, cooling, and IT) based on the TIA-942 standard. In this paper, we propose Data Center Availability (DCAV), a web-based software system that allows operators to evaluate their infrastructure through...
In more and more application areas, we are witnessing the emergence of complex workflows that combine computing, analytics, and learning. They often require a hybrid execution infrastructure with IoT devices interconnected to cloud/HPC systems (aka the Computing Continuum). Such workflows are subject to constraints and requirements in terms of performance, resource usage, energy consumption, and financial costs. This makes it challenging to optimize their configuration and deployment. We propose a methodology to support the optimization of real-life...
The next-generation data center introduces the refactoring of traditional data centers in order to create pools of disaggregated resource units, such as processors, memory, storage, network, power, and cooling sources, named composable systems (CSs), with the purpose of offering flexibility, automation, optimization, and scalability. In this paper, we solve an optimization problem to allocate CSs in next-generation data centers. The main goal is to maximize CS availability for the application owner, having its minimum...
Enterprise network managers need to control access to their resources and protect them from malicious users. Current Network Access Control (NAC) solutions rely on approaches such as firewalls, VLANs, ACLs, and LDAP that are inflexible and require per-device, vendor-specific configurations, making them error-prone. Besides, misconfigurations may result in vulnerabilities that could compromise the overall security. Managing security policies involves dealing with many rules, conflicting policies, rule priorities, right...