NFDI4DS | UHH-SEMS - Publication Details

Olga Papaemmanouil

ORCID: 0000-0003-4526-3595

About

Contact & Profiles

OpenAlex ID: A5051190490

Research Areas

Advanced Database Systems and Queries
Cloud Computing and Resource Management
Data Management and Algorithms
Peer-to-Peer Network Technologies
Data Stream Mining Techniques
Caching and Content Delivery
Data Quality and Management
Advanced Data Storage Technologies
Scientific Computing and Data Management
Distributed systems and fault tolerance
Optimization and Search Problems
Time Series Analysis and Forecasting
Semantic Web and Ontologies
Recommender Systems and Techniques
Distributed and Parallel Computing Systems
IoT and Edge/Fog Computing
Machine Learning and Data Classification
Machine Learning and Algorithms
Stochastic Gradient Optimization Techniques
Software System Performance and Reliability
Functional Brain Connectivity Studies
Big Data and Business Intelligence
Cell Image Analysis Techniques
Anomaly Detection Techniques and Applications
Advanced Image and Video Retrieval Techniques

Brandeis University
2012-2022

John Brown University
2006-2013

Brown University
2004-2008

Athens University of Economics and Business
2001

Overview of Data Exploration Techniques

OPENALEX - Publications

Stratos Idreos Olga Papaemmanouil Surajit Chaudhuri

Data exploration is about efficiently extracting knowledge from data even if we do not know exactly what we are looking for. In this tutorial, we survey recent developments in the emerging area of database systems tailored for data exploration. We discuss new ideas on how to store and access data, as well as how to interact with a data system to enable users and applications to quickly figure out which data parts are of interest. In addition, we discuss how to exploit lessons learned from past research, the new challenges data exploration crafts, and future research directions.

10.1145/2723372.2731084 article EN 2015-05-27

Performance prediction for concurrent database workloads

OPENALEX - Publications

Jennie Duggan Uğur Çetintemel Olga Papaemmanouil Eli Upfal

Current trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance expectations. In these newly-emerging settings, avoiding potential Quality-of-Service (QoS) violations heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload.

10.1145/1989323.1989359 article EN 2011-06-12

Explore-by-example

OPENALEX - Publications

Kyriaki Dimitriadou Olga Papaemmanouil Yanlei Diao

Interactive Data Exploration (IDE) is a key ingredient of a diverse set of discovery-oriented applications, including ones from scientific computing and evidence-based medicine. In these applications, data discovery is a highly ad hoc interactive process where users execute numerous exploration queries using varying predicates, aiming to balance the trade-off between collecting all relevant information and reducing the size of the returned data. Therefore, there is a strong need to support these human-in-the-loop applications by assisting their...

10.1145/2588555.2610523 article EN 2014-06-18

Deep Reinforcement Learning for Join Order Enumeration

OPENALEX - Publications

Ryan Marcus Olga Papaemmanouil

Join order selection plays a significant role in query performance. However, modern query optimizers typically employ static join enumeration algorithms that do not incorporate feedback about the quality of the resulting plan. Hence, optimizers often repeatedly choose the same bad plan, as they have no mechanism for "learning from their mistakes." Here, we argue that deep reinforcement learning techniques can be applied to address this challenge. These techniques, powered by artificial neural networks, can automatically...

10.1145/3211954.3211957 preprint EN 2018-05-22

Plan-structured deep neural network models for query performance prediction

OPENALEX - Publications

Ryan Marcus Olga Papaemmanouil

Query performance prediction, the task of predicting a query's latency prior to execution, is a challenging problem in database management systems. Existing approaches rely on features and performance models engineered by human experts, but often fail to capture the complex interactions between query operators and input relations, and generally do not adapt naturally to workload characteristics and patterns in query execution plans. In this paper, we argue that deep learning can be applied to the query performance prediction problem, and introduce a novel neural...

10.14778/3342263.3342646 article EN Proceedings of the VLDB Endowment 2019-07-01

Neo

OPENALEX - Publications

Ryan Marcus Parimarjan Negi Hongzi Mao Chi Zhang Mohammad Alizadeh and 3 more

Query optimization is one of the most challenging problems in database systems. Despite the progress made over the past decades, query optimizers remain extremely complex components that require a great deal of hand-tuning for specific workloads and datasets. Motivated by this shortcoming and inspired by recent advances in applying machine learning to data management challenges, we introduce Neo (Neural Optimizer), a novel learning-based query optimizer that relies on deep neural networks to generate query execution plans....

10.14778/3342263.3342644 article EN Proceedings of the VLDB Endowment 2019-07-01

AIDE: An Active Learning-Based Approach for Interactive Data Exploration

OPENALEX - Publications

Kyriaki Dimitriadou Olga Papaemmanouil Yanlei Diao

In this paper, we argue that database systems be augmented with an automated data exploration service that methodically steers users through the data in a meaningful way. Such an automated system is crucial for deriving insights from complex datasets found in many big data applications, such as scientific and healthcare applications, as well as for reducing the human effort of data exploration. Towards this end, we present AIDE, an Automatic Interactive Data Exploration framework that assists users in discovering new interesting data patterns and eliminate expensive ad-hoc exploratory...

10.1109/tkde.2016.2599168 article EN IEEE Transactions on Knowledge and Data Engineering 2016-08-10

Distributed operation in the Borealis stream processing engine

OPENALEX - Publications

Yanif Ahmad Bradley Berg Uğur Çetintemel Mark G. Humphrey Jeong-Hyon Hwang and 8 more

Borealis is a distributed stream processing engine that is being developed at Brandeis University, Brown University, and MIT. Borealis inherits core stream processing functionality from Aurora and inter-node communication functionality from Medusa. We propose to demonstrate some of the key aspects of distributed operation in Borealis, using a multi-player network game as the underlying application. The demonstration will illustrate the dynamic resource management, query optimization, and high availability mechanisms employed by Borealis, using visual performance-monitoring tools as well as the gaming experience.

10.1145/1066157.1066274 article EN 2005-06-14

Towards a Hands-Free Query Optimizer through Deep Learning

OPENALEX - Publications

Ryan Marcus Olga Papaemmanouil

Query optimization remains one of the most important and well-studied problems in database systems. However, traditional query optimizers are complex, heuristically-driven systems, requiring large amounts of time to tune for a particular database and even more time to develop and maintain in the first place. In this vision paper, we argue that a new type of query optimizer, based on deep reinforcement learning, can drastically improve on the state-of-the-art. We identify potential complications for future research that integrates deep learning with...

10.48550/arxiv.1809.10212 preprint EN other-oa arXiv (Cornell University) 2018-01-01

WiSeDB

OPENALEX - Publications

Ryan Marcus Olga Papaemmanouil

Workload management for cloud databases deals with the tasks of resource provisioning, query placement, and query scheduling in a manner that meets the application's performance goals while minimizing the cost of using cloud resources. Existing solutions have approached these three challenges in isolation, aiming to optimize a single performance metric. In this paper, we introduce WiSeDB, a learning-based framework for generating holistic workload management solutions customized to application-defined performance goals and workload characteristics. Our approach relies on supervised learning...

10.14778/2977797.2977804 article EN Proceedings of the VLDB Endowment 2016-06-01

SemCast: Semantic Multicast for Content-Based Data Dissemination

OPENALEX - Publications

Olga Papaemmanouil Uğur Çetintemel

We address the problem of content-based dissemination of highly-distributed, high-volume data streams for stream-based monitoring applications and large-scale data delivery. Existing content-based dissemination approaches commonly rely on distributed filtering trees that require filtering at all brokers on the tree. We present a new semantic multicast approach that eliminates the need for content-based filtering at interior brokers and facilitates fine-grained control over the construction of efficient dissemination trees. The central idea is to split the incoming streams (based on their contents, rates, and destinations) and then spread...

10.1109/icde.2005.131 article EN 2005-04-19

A generic auto-provisioning framework for cloud databases

OPENALEX - Publications

Jennie Rogers Olga Papaemmanouil Uğur Çetintemel

We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-As-A-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system's operational cost, estimated based on the IaaS provider's pricing model, while satisfying QoS expectations. Specifically, we describe two different approaches, a "white-box" approach that uses a fine-grained estimation of the expected resource consumption and...

10.1109/icdew.2010.5452746 article EN 2010-01-01

Skew-Aware Join Optimization for Array Databases

OPENALEX - Publications

Jennie Duggan Olga Papaemmanouil Leilani Battle Michael Stonebraker

Science applications are accumulating an ever-increasing amount of multidimensional data. Although some of it can be processed in a relational database, much of it is better suited to array-based engines. As such, it is important to optimize the query processing of these systems. This paper focuses on efficient query processing of join operations within an array database. These engines invariably "chunk" their data into multidimensional tiles that they use to efficiently process spatial queries. As such, traditional relational algorithms need to be substantially modified to take...

10.1145/2723372.2723709 article EN 2015-05-27

Contender: A Resource Modeling Approach for Concurrent Query Performance Prediction

OPENALEX - Publications

Jennie Duggan Olga Papaemmanouil Uğur Çetintemel Eli Upfal

Predicting query performance under concurrency is a difficult task that has many applications in capacity planning, cloud computing, and batch scheduling. We introduce Contender, a new resource-modeling approach for predicting the concurrent query performance of analytical workloads. Contender's unique feature is that it can generate effective predictions for both static as well as ad-hoc or dynamic workloads with low training requirements. These characteristics make Contender a practical solution for real-world deployment. Contender relies on...

10.5441/002/edbt.2014.11 article EN 2014-01-01

NashDB

OPENALEX - Publications

Ryan Marcus Olga Papaemmanouil Sofiya Semenova Solomon Garber

Distributed data management systems often operate on "elastic" clusters that can scale up or down on demand. These systems face numerous challenges, including data fragmentation, replication, and cluster sizing. Unfortunately, these challenges have traditionally been treated independently, leaving administrators with little insight on how the interplay of these decisions affects query performance. This paper introduces NashDB, an adaptive data distribution framework that relies on an economic model to automatically balance the supply...

10.1145/3183713.3196935 article EN Proceedings of the 2018 International Conference on Management of Data 2018-05-25

Extensible optimization in overlay dissemination trees

OPENALEX - Publications

Olga Papaemmanouil Yanif Ahmad Uğur Çetintemel John Jannotti Yenel Yildirim

We introduce XPORT, a profile-driven distributed data dissemination system that supports an extensible set of data types, profile types, and optimization metrics. XPORT efficiently implements a generic tree-based overlay network, which can be customized per application using a small number of methods that encapsulate application-specific filtering, aggregation, and optimization logic. The clean separation between the "plumbing" and the "application" enables XPORT to uniformly support disparate dissemination-based applications. We first provide...

10.1145/1142473.1142541 article EN 2006-06-27

Big Data Exploration, Visualization and Analytics

OPENALEX - Publications

Nikos Bikakis George Papastefanatos Olga Papaemmanouil

10.1016/j.bdr.2019.100123 article EN Big Data Research 2019-12-01

Semantic multicast for content-based stream dissemination

OPENALEX - Publications

Olga Papaemmanouil Uğur Çetintemel

We consider the problem of content-based routing and dissemination of highly-distributed, fast data streams from multiple sources to multiple receivers. Our target application domain includes real-time, stream-based monitoring applications and large-scale event dissemination. We introduce SemCast, a new semantic multicast approach that, unlike previous approaches, eliminates the need for content-based forwarding at interior brokers and facilitates fine-grained control over the construction of dissemination overlays. We present the initial design of SemCast...

10.1145/1017074.1017085 article EN 2004-06-17

Supporting Generic Cost Models for Wide-Area Stream Processing

OPENALEX - Publications

Olga Papaemmanouil Uğur Çetintemel John Jannotti

Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of optimization metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent decentralized algorithms that work on localized, aggregated statistics,...

10.1109/icde.2009.11 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Simultaneous Equation Systems for Query Processing on Continuous-Time Data Streams

OPENALEX - Publications

Yanif Ahmad Olga Papaemmanouil Uğur Çetintemel Jennie Rogers

We introduce Pulse, a framework for processing continuous queries over models of continuous-time data, which can compactly and accurately represent many real-world activities and processes. Pulse implements several query operators, including filters, aggregates, and joins, that work by solving simultaneous equation systems, which in many cases is significantly cheaper than processing a stream of tuples. As such, Pulse translates regular queries to work on continuous-time inputs, to reduce computational overhead and latency while meeting user-specified error...

10.1109/icde.2008.4497475 article EN 2008-04-01
