Eli Cortez

ORCID: 0000-0003-4010-5854
Research Areas
  • Web Data Mining and Analysis
  • Data Quality and Management
  • Cloud Computing and Resource Management
  • Advanced Database Systems and Queries
  • Topic Modeling
  • Semantic Web and Ontologies
  • IoT and Edge/Fog Computing
  • Caching and Content Delivery
  • Natural Language Processing Techniques
  • Text and Document Classification Technologies
  • Spam and Phishing Detection
  • Big Data and Business Intelligence
  • Multimedia Communication and Technology
  • Data Mining Algorithms and Applications
  • Blockchain Technology Applications and Security
  • Data Management and Algorithms
  • Software Engineering Research
  • Advanced Text Analysis Techniques
  • Distributed and Parallel Computing Systems
  • Service-Oriented Architecture and Web Services
  • Face and Expression Recognition
  • Scientific Computing and Data Management
  • Mobile Crowdsensing and Crowdsourcing
  • Advanced Neural Network Applications
  • Retinal Imaging and Analysis

Microsoft (United States)
2015-2025

Seattle University
2025

University of Washington
2025

University of California, Los Angeles
2025

University of Illinois Urbana-Champaign
2025

Menlo School
2025

Google (United States)
2025

Microsoft Research (United Kingdom)
2017-2023

Universidade Federal do Amazonas
2007-2013

Cloud research to date has lacked data on the characteristics of the production virtual machine (VM) workloads of large cloud providers. A thorough understanding of these characteristics can inform the providers' resource management systems, e.g. VM scheduler, power manager, server health manager. In this paper, we first introduce an extensive characterization of Microsoft Azure's VM workload, including distributions of the VMs' lifetime, deployment size, and resource consumption. We then show that certain VM behaviors are fairly consistent over...

10.1145/3132747.3132772 article EN 2017-10-12

Exploring the opportunities to use ML, possible designs, and our experience with Microsoft Azure.

10.1145/3364684 article EN Communications of the ACM 2020-01-22

In this paper we propose a knowledge-base approach to help extract the correct components of citations in any given format. Unlike related approaches that rely on manually built knowledge bases (KBs) for recognizing the components of a citation, in our case such a KB is automatically constructed from an existing set of sample metadata records from a given area (e.g., computer science or health sciences). Our approach does not rely on patterns encoding specific delimiters of a particular citation style. It is also unsupervised, in the sense that it does not rely on a learning...

10.1145/1255175.1255219 article EN 2007-06-18

In this paper we present a proposal for the implementation and evaluation of a novel method for automatically using data-rich text for filling form-based input interfaces. Our solution takes a data-rich text as input, extracts implicit data values from it, and fills the appropriate fields. For this task, we rely on knowledge obtained from the values of previous submissions for each field, which are freely available from the usage of the interfaces. Our approach, called iForm, exploits features related to the content and the style of these values, which are combined through a Bayesian framework. Through extensive...

10.14778/1929861.1929862 article EN Proceedings of the VLDB Endowment 2010-12-01

Cloud platforms remain underutilized despite multiple proposals to improve their utilization (e.g., disaggregation, harvesting, and oversubscription). Our characterization of the resource utilization of virtual machines (VMs) in Azure reveals that, while CPU is the main underutilized resource, we need to provide a solution to manage all resources holistically. We also observe that many VMs exhibit complementary temporal patterns, which can be leveraged to improve the oversubscription of underutilized resources. Based on these insights, we propose Coach: a system...

10.1145/3669940.3707226 preprint EN 2025-02-03
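
As a rough illustration of why the complementary temporal patterns mentioned in the abstract above matter for oversubscription, the sketch below compares two ways of sizing a host: reserving the sum of each VM's individual peak, versus reserving the peak of the combined demand. This is not Coach's algorithm. The function names and the utilization numbers are invented for illustration only.

```python
from typing import Sequence


def sum_of_peaks(series: Sequence[Sequence[float]]) -> float:
    """Capacity needed if every VM is provisioned for its own peak."""
    return sum(max(s) for s in series)


def peak_of_sum(series: Sequence[Sequence[float]]) -> float:
    """Capacity needed if VMs are co-located and only the combined peak matters."""
    return max(sum(values) for values in zip(*series))


# Hypothetical CPU demand (in cores) over four time windows.
day_vm = [6, 7, 2, 1]    # busy early, idle late
night_vm = [1, 2, 6, 7]  # idle early, busy late

separate = sum_of_peaks([day_vm, night_vm])  # 7 + 7 = 14 cores
combined = peak_of_sum([day_vm, night_vm])   # max(7, 9, 8, 8) = 9 cores
savings = separate - combined                # 5 cores
```

With these made-up numbers, co-locating the two VMs needs 9 cores instead of 14, because their peaks do not coincide. Two VMs that peak in the same window would show no such saving.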

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other IETS approaches, ONDUX relies on information...

10.1145/1807167.1807254 article EN 2010-06-06

In this paper we present JUDIE (Joint Unsupervised Structure Discovery and Information Extraction), a new method for automatically extracting semi-structured data records that appear in the form of continuous text (e.g., bibliographic citations, postal addresses, classified ads, etc.) with no explicit delimiters between them. While in state-of-the-art Information Extraction methods the structure of the data records is manually supplied by the user as a training step, JUDIE is capable of detecting the structure of each individual record being extracted without any...

10.1145/1989323.1989380 article EN 2011-06-12

Cloud providers often have resources that are not being fully utilized, and they may offer them at a lower cost to make up for the reduced availability of these resources. However, customers may be hesitant to use such offerings (such as spot VMs) because making trade-offs between cost and resource availability is not always straightforward. In this work, we propose Snape (Spot On-demand Perfect Mixture), an intelligent framework to optimize these trade-offs by dynamically mixing on-demand VMs with spot VMs. Through a detailed characterization based on real...

10.1145/3582016.3582028 article EN 2023-03-20

Today, cloud workloads are essentially opaque to the cloud platform. Typically, the only information the platform receives is the virtual machine (VM) type and possibly a decoration to the type (e.g., the VM is evictable). Similarly, workloads receive little to no information from the platform; generally, workloads might receive telemetry from their VMs or exceptional signals (e.g., shortly before a VM is evicted). The narrow interface between workloads and platforms has several drawbacks: (1) a surge in VM types and decorations in public cloud platforms complicates customer selection; (2) essential workload characteristics low...

10.48550/arxiv.2404.19143 preprint EN arXiv (Cornell University) 2024-04-29

In this article we present FLUX‐CiM, a novel method for extracting components (e.g., author names, titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX‐CiM to extract from citations in any given format. Unlike previous methods that are based on models learned from user‐driven training, our method relies on a knowledge base...

10.1002/asi.21049 article EN Journal of the American Society for Information Science and Technology 2009-02-25

In this poster paper, we present an overview of CienciaBrasil, a research social network involving researchers within the Brazilian INCT program. We describe its architecture and the solutions adopted for data collection, extraction, and deduplication, and for materializing and visualizing the network.

10.1145/1998076.1998168 article EN 2011-06-13

In large enterprises, data discovery is a common problem faced by users who need to find relevant information in relational databases. In this scenario, schema annotation is a useful tool to enrich a database schema with descriptive keywords. In this paper, we demonstrate Barcelos, a system that automatically annotates corporate databases. Unlike existing annotation approaches that use Web oriented knowledge bases, Barcelos mines enterprise spreadsheets to find candidate annotations. Our experimental evaluation shows that Barcelos produces high quality annotations; the...

10.14778/2824032.2824105 article EN Proceedings of the VLDB Endowment 2015-08-01

On the web of today, the most prevalent solution for users to interact with data-intensive applications is the use of form-based interfaces composed of several data input fields, such as text boxes, radio buttons, pull-down lists, check boxes, etc. Although these interfaces are popular and effective, in many cases free text interfaces are preferred over form-based ones. In this paper we discuss the proposal and the implementation of a novel IR-based method for using data-rich free text to interact with form-based interfaces. Our solution takes a free text as input, extracts implicit data values from it, and fills the appropriate fields with them. For...

10.1145/1526709.1526908 article EN 2009-04-20

In this article, we present a study of classification methods for large-scale categorization of product offers on e-shopping web sites. We studied the performance of previously proposed approaches and deployed a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the usage of the price and store of product offers as features adopted in the classification process. Our experiments used two collections of over a million product offers categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site....

10.1002/asi.21586 article EN Journal of the American Society for Information Science and Technology 2011-06-21

Information extraction by text segmentation (IETS) applies to cases in which data values of interest are organized in implicit semi-structured records available in textual sources (e.g. postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. We report here partial results from a PhD thesis work in which we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. As other...

10.1145/1811136.1811145 article EN 2010-06-11

Azure Spot Virtual Machines (Spot VMs) utilize unused compute capacity at significant cost savings. They can be evicted when Azure needs the capacity back, and are therefore suitable for workloads that can tolerate interruptions. A good prediction of Spot VM evictions is beneficial for Azure to optimize capacity utilization, and offers users information to better plan Spot VM deployments by selecting clusters to reduce potential evictions. The current in-service cluster-level prediction method ignores node heterogeneity by aggregating node information. In this paper, we...

10.1145/3487553.3524229 article EN Companion Proceedings of The Web Conference 2018 2022-04-25

We propose a lightweight framework for data exchange that is suitable for non-expert and casual users sharing data on the Web or through peer-to-peer systems. Unlike previous work, we consider a simplistic data model and schema formalism that are suitable for describing typical online data, and algorithms for mapping such schemas as well as for translating the corresponding instances. Our solution requires minimal overhead and setup costs compared to existing data exchange systems, making it very attractive in the Web setting. We report experimental results indicating that our...

10.1145/1316902.1316907 article EN 2007-11-09

With the rapid development of cloud systems, an increasing number of service workloads are deployed in the private cloud and/or public cloud. Although large cloud providers such as Azure and Google have published workload traces in the past, prior work has not focused on analyzing and characterizing the differences between private and public cloud workloads in detail. Based on our experience working with Azure, one of the most widely used cloud platforms in the world, we find that the workload characteristics are different between private and public cloud workloads. Specifically, compared to public cloud workloads, private cloud workloads tend to be more homogeneous both...

10.1109/dsn58367.2023.00055 article EN 2023-06-01

Oversubscription is a prevalent practice in cloud services where the system offers more virtual resources, such as virtual cores in virtual machines, to users or applications than its available physical capacity, in order to reduce revenue loss due to unused/redundant capacity. While oversubscription can potentially lead to a significant enhancement in efficient resource utilization, the caveat is that it comes with the risks of overloading and introducing jitter at the level of physical nodes if all the co-located virtual machines have high utilization. Thus...

10.48550/arxiv.2401.07033 preprint EN other-oa arXiv (Cornell University) 2024-01-01
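
The trade-off described in the abstract above can be made concrete with a minimal sketch: a node sells more virtual cores than it physically has, and becomes overloaded only when the co-located VMs are busy at the same time. The `VM` class, the function names, and all numbers here are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    vcpus: int          # virtual cores promised to the customer
    utilization: float  # fraction of those cores in use right now (0.0 to 1.0)


def oversubscription_ratio(vms: List[VM], physical_cores: int) -> float:
    """Virtual cores sold divided by physical cores available."""
    return sum(vm.vcpus for vm in vms) / physical_cores


def cpu_demand(vms: List[VM]) -> float:
    """Physical cores actually demanded at this instant."""
    return sum(vm.vcpus * vm.utilization for vm in vms)


def is_overloaded(vms: List[VM], physical_cores: int) -> bool:
    """True when instantaneous demand exceeds what the node can supply."""
    return cpu_demand(vms) > physical_cores


NODE_CORES = 16  # hypothetical physical node

# Same three 8-vCPU VMs observed at a quiet moment and at a busy moment.
quiet = [VM("a", 8, 0.25), VM("b", 8, 0.5), VM("c", 8, 0.25)]
busy = [VM("a", 8, 1.0), VM("b", 8, 0.75), VM("c", 8, 0.75)]

ratio = oversubscription_ratio(quiet, NODE_CORES)  # 24 / 16 = 1.5
```

In this toy setup the node is oversubscribed by 1.5x in both cases, yet only the busy moment (20 cores demanded on a 16-core node) is overloaded. That is the risk the abstract refers to when all co-located VMs have high utilization.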

The use of big data in a business revolves around a monitor-mine-manage (M3) loop: data is monitored in real-time, while mined insights are used to manage the business and derive value from data. While mining has traditionally been performed offline, recent years have seen an increasing need to perform all phases of M3 in real-time. A stream processing engine (SPE) enables such a seamless M3 loop for applications such as targeted advertising, recommender systems, risk analysis, and call-center analytics. However, these applications require the SPE to maintain...

10.1109/icde.2016.7498262 article EN 2016-05-01