Olga Papaemmanouil

ORCID: 0000-0003-4526-3595
Research Areas
  • Advanced Database Systems and Queries
  • Cloud Computing and Resource Management
  • Data Management and Algorithms
  • Peer-to-Peer Network Technologies
  • Data Stream Mining Techniques
  • Caching and Content Delivery
  • Data Quality and Management
  • Advanced Data Storage Technologies
  • Scientific Computing and Data Management
  • Distributed systems and fault tolerance
  • Optimization and Search Problems
  • Time Series Analysis and Forecasting
  • Semantic Web and Ontologies
  • Recommender Systems and Techniques
  • Distributed and Parallel Computing Systems
  • IoT and Edge/Fog Computing
  • Machine Learning and Data Classification
  • Machine Learning and Algorithms
  • Stochastic Gradient Optimization Techniques
  • Software System Performance and Reliability
  • Functional Brain Connectivity Studies
  • Big Data and Business Intelligence
  • Cell Image Analysis Techniques
  • Anomaly Detection Techniques and Applications
  • Advanced Image and Video Retrieval Techniques

Brandeis University
2012-2022

John Brown University
2006-2013

Brown University
2004-2008

Athens University of Economics and Business
2001

Data exploration is about efficiently extracting knowledge from data even if we do not know exactly what we are looking for. In this tutorial, we survey recent developments in the emerging area of database systems tailored for data exploration. We discuss new ideas on how to store and access data, as well as how to interact with a data system, to enable users and applications to quickly figure out which parts of the data are of interest. In addition, we discuss how to exploit lessons learned from past research, the new challenges data exploration crafts, and future research directions.

10.1145/2723372.2731084 article EN 2015-05-27

Current trends in data management systems, such as cloud and multi-tenant databases, are leading to processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance expectations. In these newly-emerging settings, avoiding potential Quality-of-Service (QoS) violations heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent execution on individual queries in a continuously evolving workload.

10.1145/1989323.1989359 article EN 2011-06-12

Interactive Data Exploration (IDE) is a key ingredient of a diverse set of discovery-oriented applications, including ones from scientific computing and evidence-based medicine. In these applications, data discovery is a highly ad hoc interactive process where users execute numerous exploration queries using varying predicates, aiming to balance the trade-off between collecting all relevant information and reducing the size of the returned data. Therefore, there is a strong need to support these human-in-the-loop applications by assisting their...

10.1145/2588555.2610523 article EN 2014-06-18

Join order selection plays a significant role in query performance. However, modern query optimizers typically employ static join enumeration algorithms that do not incorporate feedback about the quality of the resulting plan. Hence, optimizers often repeatedly choose the same bad plan, as they have no mechanism for "learning from their mistakes." Here, we argue that deep reinforcement learning techniques can be applied to address this challenge. These techniques, powered by artificial neural networks, automatically...

10.1145/3211954.3211957 preprint EN 2018-05-22

Query performance prediction, the task of predicting a query's latency prior to execution, is a challenging problem in database management systems. Existing approaches rely on features and performance models engineered by human experts, but often fail to capture the complex interactions between query operators and input relations, and generally do not adapt naturally to workload characteristics and patterns in query execution plans. In this paper, we argue that deep learning can be applied to the query performance prediction problem, and introduce a novel neural...

10.14778/3342263.3342646 article EN Proceedings of the VLDB Endowment 2019-07-01

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans....

10.14778/3342263.3342644 article EN Proceedings of the VLDB Endowment 2019-07-01

In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from the complex datasets found in many big data applications, such as scientific and healthcare applications, as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminates expensive ad-hoc exploratory...

10.1109/tkde.2016.2599168 article EN IEEE Transactions on Knowledge and Data Engineering 2016-08-10

Borealis is a distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora and inter-node communication functionality from Medusa. We propose to demonstrate some of the key aspects of distributed operation in Borealis, using a multi-player network game as the underlying application. The demonstration will illustrate the dynamic resource management, query optimization, and high availability mechanisms employed by Borealis, using visual performance-monitoring tools as well as the gaming experience.

10.1145/1066157.1066274 article EN 2005-06-14

Query optimization remains one of the most important and well-studied problems in database systems. However, traditional query optimizers are complex heuristically-driven systems, requiring large amounts of time to tune for a particular database and even more time to develop and maintain in the first place. In this vision paper, we argue that a new type of query optimizer, based on deep reinforcement learning, can drastically improve on the state-of-the-art. We identify potential complications for future research that integrates deep learning with...

10.48550/arxiv.1809.10212 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Workload management for cloud databases deals with the tasks of resource provisioning, query placement, and query scheduling in a manner that meets the application's performance goals while minimizing the cost of using cloud resources. Existing solutions have approached these three challenges in isolation, aiming to optimize a single performance metric. In this paper, we introduce WiSeDB, a learning-based framework for generating holistic workload management solutions customized to application-defined performance goals and workload characteristics. Our approach relies on supervised learning...

10.14778/2977797.2977804 article EN Proceedings of the VLDB Endowment 2016-06-01

We address the problem of content-based dissemination of highly-distributed, high-volume data streams for stream-based monitoring applications and large-scale data delivery. Existing approaches commonly rely on distributed filtering trees that require filtering at all brokers of the tree. We present a new semantic multicast approach that eliminates the need for filtering at interior brokers and facilitates fine-grained control over the construction of efficient dissemination trees. The central idea is to split the incoming data streams (based on their contents, rates, and destinations) and then spread...

10.1109/icde.2005.131 article EN 2005-04-19

We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-As-A-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system's operational cost, estimated based on the IaaS provider's pricing model, while satisfying QoS expectations. Specifically, we describe two different approaches, a "white-box" approach that uses a fine-grained estimation of the expected resource consumption and...

10.1109/icdew.2010.5452746 article EN 2010-01-01

Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take...

10.1145/2723372.2723709 article EN 2015-05-27

Predicting query performance under concurrency is a difficult task that has many applications in capacity planning, cloud computing, and batch scheduling. We introduce Contender, a new resource-modeling approach for predicting the concurrent query performance of analytical workloads. Contender's unique feature is that it can generate effective predictions for both static as well as ad-hoc or dynamic workloads with low training requirements. These characteristics make Contender a practical solution for real-world deployment. Contender relies on...

10.5441/002/edbt.2014.11 article EN 2014-01-01

Distributed data management systems often operate on "elastic" clusters that can scale up or down on demand. These systems face numerous challenges, including data fragmentation, replication, and cluster sizing. Unfortunately, these challenges have traditionally been treated independently, leaving administrators with little insight on how the interplay of these decisions affects query performance. This paper introduces NashDB, an adaptive data distribution framework that relies on an economic model to automatically balance the supply...

10.1145/3183713.3196935 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

We introduce XPORT, a profile-driven distributed data dissemination system that supports an extensible set of data types, profile types, and optimization metrics. XPORT efficiently implements a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific filtering, aggregation, and optimization logic. The clean separation between the "plumbing" and the "application" enables XPORT to uniformly support disparate dissemination-based applications. We first provide...

10.1145/1142473.1142541 article EN 2006-06-27

We consider the problem of content-based routing and dissemination of highly-distributed, fast data streams from multiple sources to multiple receivers. Our target application domain includes real-time, stream-based monitoring applications and large-scale event dissemination. We introduce SemCast, a new semantic multicast approach that, unlike previous approaches, eliminates the need for content-based forwarding at interior brokers and facilitates fine-grained control over the construction of dissemination overlays. We present the initial design of SemCast...

10.1145/1017074.1017085 article EN 2004-06-17

Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of optimization metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent decentralized algorithms that work on localized, aggregated statistics,...

10.1109/icde.2009.11 article EN Proceedings - International Conference on Data Engineering 2009-03-01

We introduce Pulse, a framework for processing continuous queries over models of continuous-time data, which can compactly and accurately represent many real-world activities and processes. Pulse implements several query operators, including filters, aggregates, and joins, that work by solving simultaneous equation systems, which in many cases is significantly cheaper than processing a stream of tuples. As such, Pulse translates regular queries to work on continuous-time inputs, to reduce computational overhead and latency while meeting user-specified error...

10.1109/icde.2008.4497475 article EN 2008-04-01