NFDI4DS | UHH-SEMS - Publication Details

Eli Cortez

ORCID: 0000-0003-4010-5854

About

Contact & Profiles

A5046842171

Research Areas

Web Data Mining and Analysis
Data Quality and Management
Cloud Computing and Resource Management
Advanced Database Systems and Queries
Topic Modeling
Semantic Web and Ontologies
IoT and Edge/Fog Computing
Caching and Content Delivery
Natural Language Processing Techniques
Text and Document Classification Technologies
Spam and Phishing Detection
Big Data and Business Intelligence
Multimedia Communication and Technology
Data Mining Algorithms and Applications
Blockchain Technology Applications and Security
Data Management and Algorithms
Software Engineering Research
Advanced Text Analysis Techniques
Distributed and Parallel Computing Systems
Service-Oriented Architecture and Web Services
Face and Expression Recognition
Scientific Computing and Data Management
Mobile Crowdsensing and Crowdsourcing
Advanced Neural Network Applications
Retinal Imaging and Analysis

Microsoft (United States)
2015-2025

Seattle University
2025

University of Washington
2025

University of California, Los Angeles
2025

University of Illinois Urbana-Champaign
2025

Menlo School
2025

Google (United States)
2025

Microsoft Research (United Kingdom)
2017-2023

Universidade Federal do Amazonas
2007-2013

Resource Central

OPENALEX - Publications

Eli Cortez Anand Bonde Alexandre Muzio Mark Russinovich Marcus Fontoura and 1 more

Cloud research to date has lacked data on the characteristics of the production virtual machine (VM) workloads of large cloud providers. A thorough understanding of these characteristics can inform the providers' resource management systems, e.g. VM scheduler, power manager, server health manager. In this paper, we first introduce an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption. We then show that certain VM behaviors are fairly consistent over...

10.1145/3132747.3132772 article EN 2017-10-12

Toward ML-centric cloud platforms

Ricardo Bianchini Marcus Fontoura Eli Cortez Anand Bonde Alexandre Muzio and 5 more

Exploring the opportunities to use ML, possible designs, and our experience with Microsoft Azure.

10.1145/3364684 article EN Communications of the ACM 2020-01-22

FLUX-CIM

Eli Cortez Altigran S. da Silva Marcos André Gonçalves Filipe Mesquita Edleno Silva de Moura

In this paper we propose a knowledge-base approach to help extract the correct components of citations in any given format. Differently from related approaches that rely on manually built knowledge-bases (KBs) for recognizing the components of a citation, in our case, such a KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning...

10.1145/1255175.1255219 article EN 2007-06-18

A probabilistic approach for automatically filling form-based web interfaces

Guilherme A. Toda Eli Cortez Altigran S. da Silva Edleno Silva de Moura

In this paper we present a proposal for the implementation and evaluation of a novel method for automatically using data-rich text for filling form-based input interfaces. Our solution takes a data-rich text as input, extracts implicit data values from it and fills the appropriate fields. For this task, we rely on knowledge obtained from the values of previous submissions for each field, which are freely obtained from the usage of the interfaces. Our approach, called iForm, exploits features related to the content and the style of these values, which are combined through a Bayesian framework. Through extensive...

10.14778/1929861.1929862 article EN Proceedings of the VLDB Endowment 2010-12-01

Coach: Exploiting Temporal Patterns for All-Resource Oversubscription in Cloud Platforms

Benjamin Reidys Pantea Zardoshti Íñigo Goiri Celine Irvene Daniel S. Berger and 14 more

Cloud platforms remain underutilized despite multiple proposals to improve their utilization (e.g., disaggregation, harvesting, and oversubscription). Our characterization of the resource utilization of virtual machines (VMs) in Azure reveals that, while CPU is the main underutilized resource, we need to provide a solution to manage all resources holistically. We also observe that many VMs exhibit complementary temporal patterns, which can be leveraged to improve the oversubscription of underutilized resources. Based on these insights, we propose Coach: a system...

10.1145/3669940.3707226 preprint EN 2025-02-03

ONDUX

Eli Cortez Altigran S. da Silva Marcos André Gonçalves Edleno Silva de Moura

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. As other unsupervised IETS approaches, ONDUX relies on information...

10.1145/1807167.1807254 article EN 2010-06-06

Joint unsupervised structure discovery and information extraction

Eli Cortez Daniel Oliveira Altigran S. da Silva Edleno Silva de Moura Alberto H. F. Laender

In this paper we present JUDIE (Joint Unsupervised Structure Discovery and Information Extraction), a new method for automatically extracting semi-structured data records in the form of continuous text (e.g., bibliographic citations, postal addresses, classified ads, etc.) and having no explicit delimiters between them. While in state-of-the-art Information Extraction methods the structure of the data records is manually supplied by the user as a training step, JUDIE is capable of detecting the structure of each individual record being extracted without any...

10.1145/1989323.1989380 article EN 2011-06-12

Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-Demand VMs

Fangkai Yang Lu Wang Zhenyu Xu Jue Zhang Liqun Li and 13 more

Cloud providers often have resources that are not being fully utilized, and they may offer them at a lower cost to make up for the reduced availability of these resources. However, customers may be hesitant to use such offerings (such as spot VMs) as making trade-offs between cost and resource availability is not always straightforward. In this work, we propose Snape (Spot On-demand Perfect Mixture), an intelligent framework to optimize the cost and resource availability trade-offs by dynamically mixing on-demand VMs with spot VMs. Through a detailed characterization based on real...

10.1145/3582016.3582028 article EN 2023-03-20

Workload Intelligence: Punching Holes Through the Cloud Abstraction

Lexiang Huang Anjaly Parayil Jue Zhang Xiaoting Qin Chetan Bansal and 11 more

Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receive little to no information from the platform; generally, workloads might receive telemetry from their VMs or exceptional signals (e.g., shortly before a VM is evicted). The narrow interface between workloads and platforms has several drawbacks: (1) a surge in VM types and decorations in public cloud platforms complicates customer selection; (2) essential workload characteristics low...

10.48550/arxiv.2404.19143 preprint EN arXiv (Cornell University) 2024-04-29

A flexible approach for extracting metadata from bibliographic citations

Eli Cortez Altigran S. da Silva Marcos André Gonçalves Filipe Mesquita Edleno Silva de Moura

In this article we present FLUX‐CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX‐CiM to extract from citations in any given format. Differently from previous methods that are based on models learned from user‐driven training, our method relies on a knowledge base...

10.1002/asi.21049 article EN Journal of the American Society for Information Science and Technology 2009-02-25

Building a research social network from an individual perspective

Alberto H. F. Laender Mirella M. Moro Marcos André Gonçalves Clodoveu A. Davis Altigran S. da Silva and 9 more

In this poster paper, we present an overview of CienciaBrasil, a research social network involving researchers within the Brazilian INCT program. We describe its architecture and the solutions adopted for data collection, extraction, and deduplication, and for materializing and visualizing the network.

10.1145/1998076.1998168 article EN 2011-06-13

Annotating database schemas to help enterprise search

Eli Cortez Philip A. Bernstein Yeye He Lev Novik

In large enterprises, data discovery is a common problem faced by users who need to find relevant information in relational databases. In this scenario, schema annotation is a useful tool to enrich a database schema with descriptive keywords. In this paper, we demonstrate Barcelos, a system that automatically annotates corporate databases. Unlike existing annotation approaches that use Web oriented knowledge bases, Barcelos mines enterprise spreadsheets to find candidate annotations. Our experimental evaluation shows that Barcelos produces high quality annotations; the...

10.14778/2824032.2824105 article EN Proceedings of the VLDB Endowment 2015-08-01

Automatically filling form-based web interfaces with free text inputs

Guilherme A. Toda Eli Cortez Filipe Mesquita Altigran S. da Silva Edleno Silva de Moura and 1 more

On the web of today the most prevalent solution for users to interact with data-intensive applications is the use of form-based interfaces composed by several data input fields, such as text boxes, radio buttons, pull-down lists, check boxes, etc. Although these interfaces are popular and effective, in many cases, free text interfaces are preferred over form-based ones. In this paper we discuss the proposal and the implementation of a novel IR-based method for using data rich free text to interact with form-based interfaces. Our solution takes a free text as input, extracts implicitly data values from it and fills appropriate fields using them. For...

10.1145/1526709.1526908 article EN 2009-04-20

Lightweight methods for large-scale product categorization

Eli Cortez Mauro Rojas Herrera Altigran S. da Silva Edleno Silva de Moura Marden Neubert

In this article, we present a study about classification methods for large-scale categorization of product offers on e-shopping web sites. We present a study about the performance of previously proposed approaches and deployed a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the usage of price and store of product offers as features adopted in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site....

10.1002/asi.21586 article EN Journal of the American Society for Information Science and Technology 2011-06-21

Unsupervised strategies for information extraction by text segmentation

Eli Cortez Altigran S. da Silva

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. We report here partial results from a PhD thesis work in which we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. As other...

10.1145/1811136.1811145 article EN 2010-06-11

Spot Virtual Machine Eviction Prediction in Microsoft Cloud

Fangkai Yang Bowen Pang Jue Zhang Bo Qiao Lu Wang and 12 more

Azure Spot Virtual Machines (Spot VMs) utilize unused compute capacity at significant cost savings. They can be evicted when Azure needs the capacity back, and are therefore suitable for workloads that can tolerate interruptions. A good prediction of Spot VM evictions is beneficial for Azure to optimize capacity utilization and offers users information to better plan Spot VM deployments by selecting clusters to reduce potential evictions. The current in-service cluster-level prediction method ignores the node heterogeneity by aggregating node information. In this paper, we...

10.1145/3487553.3524229 article EN Companion Proceedings of the Web Conference 2022 2022-04-25

FleDEx

Filipe Mesquita Denilson Barbosa Eli Cortez Altigran S. da Silva

We propose a lightweight framework for data exchange that is suitable for non-expert and casual users sharing data on the Web or through peer-to-peer systems. Unlike previous work, we consider a simplistic data model and schema formalism that are suitable for describing typical online data, and we propose algorithms for mapping such schemas as well as for translating the corresponding instances. Our solution requires minimal overhead and setup costs compared to existing data exchange systems, making it very attractive in the Web setting. We report experimental results indicating that our...

10.1145/1316902.1316907 article EN 2007-11-09

How Different are the Cloud Workloads? Characterizing Large-Scale Private and Public Cloud Workloads

Xiaoting Qin Minghua Ma Yuheng Zhao Jue Zhang Chao Du and 9 more

With the rapid development of cloud systems, an increasing number of service workloads are deployed in the private cloud and/or public cloud. Although large cloud providers such as Azure and Google have published workload traces in the past, prior work has not focused on analyzing and characterizing the differences between private and public cloud workloads in detail. Based on our experience working with Azure, one of the most widely used cloud platforms in the world, we find that the characteristics of private and public cloud workloads are different. Specifically, compared with public cloud workloads, private cloud workloads tend to be more homogeneous in both...

10.1109/dsn58367.2023.00055 article EN 2023-06-01

Risk-aware Adaptive Virtual CPU Oversubscription in Microsoft Cloud via Prototypical Human-in-the-loop Imitation Learning

Lu Wang Mayukh Das Fangkai Yang Junjie Sheng Bo Qiao and 9 more

Oversubscription is a prevalent practice in cloud services where the system offers more virtual resources, such as virtual cores in virtual machines, to users or applications than its available physical capacity for reducing revenue loss due to unused/redundant capacity. While oversubscription can potentially lead to significant enhancement in efficient resource utilization, the caveat is that it comes with the risks of overloading and introducing jitter at the level of physical nodes if all the co-located virtual machines have high utilization. Thus...

10.48550/arxiv.2401.07033 preprint EN arXiv (Cornell University) 2024-01-01

ICE: Managing cold state for big data applications

Badrish Chandramouli Justin J. Levandoski Eli Cortez

The use of big data in a business revolves around a monitor-mine-manage (M3) loop: data is monitored in real-time, while mined insights are used to manage the business and derive value. While mining has traditionally been performed offline, recent years have seen an increasing need to perform all phases of M3 in real-time. A stream processing engine (SPE) enables such a seamless M3 loop for applications such as targeted advertising, recommender systems, risk analysis, and call-center analytics. However, these M3 applications require the SPE to maintain...

10.1109/icde.2016.7498262 article EN 2016-05-01
