NFDI4DS | UHH-SEMS - Publication Details

Ihab F. Ilyas

ORCID: 0000-0001-9052-9714

About

Contact & Profiles

A5000141065

Research Areas

Data Quality and Management
Advanced Database Systems and Queries
Data Management and Algorithms
Privacy-Preserving Technologies in Data
Data Mining Algorithms and Applications
Semantic Web and Ontologies
Topic Modeling
Web Data Mining and Analysis
Advanced Graph Neural Networks
Natural Language Processing Techniques
Big Data and Business Intelligence
Data-Driven Disease Surveillance
Bayesian Modeling and Causal Inference
Distributed systems and fault tolerance
Algorithms and Data Compression
Anomaly Detection Techniques and Applications
Cloud Data Security Solutions
Cryptography and Data Security
Scientific Computing and Data Management
Advanced Image and Video Retrieval Techniques
Peer-to-Peer Network Technologies
Logic, Reasoning, and Knowledge
Graph Theory and Algorithms
Data Stream Mining Techniques
Constraint Satisfaction and Optimization

University of Waterloo
2015-2024

Apple (United States)
2022-2023

Sapienza University of Rome
2023

University of Calgary
2023

University of Michigan
2023

Universitas Flores
2021

Universitas Syiah Kuala
2020

Qatar Cardiovascular Research Center
2012-2013

Qatar Foundation
2011-2013

Qatar Airways (Qatar)
2013

Top-k Query Processing in Uncertain Databases

OPENALEX - Publications

Mohamed A. Soliman Ihab F. Ilyas Kevin Chen–Chuan Chang

Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on a "marriage" of traditional top-k semantics and possible worlds semantics. In the light of these formulations, we construct a framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings. We prove that our techniques are optimal in terms of the number...

10.1109/icde.2007.367935 article EN 2007-04-01

HoloClean

OPENALEX - Publications

Theodoros Rekatsinas Xu Chu Ihab F. Ilyas Christopher Ré

We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. HoloClean unifies qualitative data repairing, which relies on integrity constraints or external data sources, with quantitative data repairing methods, which leverage statistical properties of the input data. Given an inconsistent dataset as input, HoloClean automatically generates a probabilistic program that performs data repairing. Inspired by recent theoretical advances in probabilistic inference, we introduce a series of optimizations which ensure that inference over HoloClean's probabilistic model scales...

10.14778/3137628.3137631 article EN Proceedings of the VLDB Endowment 2017-08-01

Supporting top-k join queries in relational databases

OPENALEX - Publications

Ihab F. Ilyas Walid G. Aref Ahmed K. Elmagarmid

10.1007/s00778-004-0128-2 article EN The VLDB Journal 2004-08-11

Data Cleaning

OPENALEX - Publications

Xu Chu Ihab F. Ilyas Sanjay Krishnan Jiannan Wang

Detecting and repairing dirty data is one of the perennial challenges in data analytics, and failure to do so can result in inaccurate analytics and unreliable decisions. Over the past few years, there has been a surge of interest from both industry and academia on data cleaning problems including new abstractions, interfaces, approaches for scalability, and statistical techniques. To better understand the new advances in the field, we will first present a taxonomy of the data cleaning literature in which we highlight the recent interest in techniques that use constraints, rules, or...

10.1145/2882903.2912574 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16

Holistic data cleaning: Putting violations into context

OPENALEX - Publications

Xu Chu Ihab F. Ilyas Paolo Papotti

Data cleaning is an important problem and data quality rules are the most promising way to face it with a declarative approach. Previous work has focused on specific formalisms, such as functional dependencies (FDs), conditional functional dependencies (CFDs), and matching dependencies (MDs), and those have always been studied in isolation. Moreover, such techniques are usually applied in a pipeline or interleaved. In this work we tackle the problem in a novel, unified framework. First, we let users specify quality rules using denial constraints with ad-hoc predicates. This language subsumes...

10.1109/icde.2013.6544847 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

NADEEF

OPENALEX - Publications

Michele Dallachiesa Amr Ebaid Ahmed Eldawy Ahmed K. Elmagarmid Ihab F. Ilyas and 2 more

Despite the increasing importance of data quality and the rich theoretical and practical contributions in all aspects of data cleaning, there is no single end-to-end off-the-shelf solution to (semi-)automate the detection and the repairing of violations w.r.t. a set of heterogeneous and ad-hoc quality constraints. In short, there is no commodity platform similar to general purpose DBMSs that can be easily customized and deployed to solve application-specific data quality problems. In this paper, we present NADEEF, an extensible, generalized and easy-to-deploy data cleaning platform....

10.1145/2463676.2465327 article EN 2013-06-22

KATARA

OPENALEX - Publications

Xu Chu John Morcos Ihab F. Ilyas Mourad Ouzzani Paolo Papotti and 2 more

Classical approaches to clean data have relied on using integrity constraints, statistics, or machine learning. These approaches are known to be limited in the cleaning accuracy, which can usually be improved by consulting master data and involving experts to resolve ambiguity. The advent of knowledge bases (KBs), both general-purpose and within enterprises, and crowdsourcing marketplaces are providing yet more opportunities to achieve higher accuracy at a larger scale. We propose KATARA, a knowledge base and crowd powered data cleaning system that, given a table,...

10.1145/2723372.2749431 article EN 2015-05-27

Discovering denial constraints

OPENALEX - Publications

Xu Chu Ihab F. Ilyas Paolo Papotti

Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example, an American citizen cannot have a lower salary and a higher tax rate than another citizen in the same state. In this paper, we tackle the challenges of discovering dependencies in a more...

10.14778/2536258.2536262 article EN Proceedings of the VLDB Endowment 2013-08-01

Detecting data errors

OPENALEX - Publications

Ziawasch Abedjan Xu Chu Dong Deng Raul Castro Fernandez Ihab F. Ilyas and 4 more

Data cleaning has played a critical role in ensuring data quality for enterprise applications. Naturally, there has been extensive research in this area, and many data cleaning algorithms have been translated into tools to detect and to possibly repair certain classes of errors such as outliers, duplicates, missing values, and violations of integrity constraints. Since different types of errors may coexist in the same data set, we often need to run more than one kind of tool. In this paper, we investigate two pragmatic questions: (1) are these tools robust enough...

10.14778/2994509.2994518 article EN Proceedings of the VLDB Endowment 2016-08-01

CORDS

OPENALEX - Publications

Ihab F. Ilyas Volker Markl Peter J. Haas Paul Brown Ashraf Aboulnaga

The rich dependency structure found in the columns of real-world relational databases can be exploited to great advantage, but can also cause query optimizers (which usually assume that columns are statistically independent) to underestimate the selectivities of conjunctive predicates by orders of magnitude. We introduce CORDS, an efficient and scalable tool for automatic discovery of correlations and soft functional dependencies between columns. CORDS searches for column pairs that might have interesting and useful dependency relations...

10.1145/1007568.1007641 article EN 2004-06-13

RankSQL

OPENALEX - Publications

Chengkai Li Kevin Chen–Chuan Chang Ihab F. Ilyas Sumin Song

This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluations of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing was studied in the middleware scenario or in RDBMS in a piecemeal fashion, i.e., focusing on a specific operator or sitting outside the core of query engines. In contrast, we aim to support ranking as a first-class database construct. As a key insight, the new ranking relationship can be viewed as another...

10.1145/1066157.1066173 article EN 2005-06-14

Guided data repair

OPENALEX - Publications

Mohamed Yakout Ahmed K. Elmagarmid Jennifer Neville Mourad Ouzzani Ihab F. Ilyas

In this paper we present GDR, a Guided Data Repair framework that incorporates user feedback in the cleaning process to enhance and accelerate existing automatic repair techniques while minimizing user involvement. GDR consults the user on the updates that are most likely to be beneficial in improving data quality. GDR also uses machine learning methods to identify and to apply the correct updates directly to the database without the actual involvement of the user on these specific updates. To rank potential updates for consultation by the user, we first group these repairs and quantify...

10.14778/1952376.1952378 article EN Proceedings of the VLDB Endowment 2011-02-01

Efficient search for the top-k probable nearest neighbors in uncertain databases

OPENALEX - Publications

George Beskales Mohamed A. Soliman Ihab F. Ilyas

Uncertainty pervades many domains in our lives. Current real-life applications, e.g., location tracking using GPS devices or cell phones, multimedia feature extraction, and sensor data management, deal with different kinds of uncertainty. Finding the nearest neighbor objects to a given query point is an important query type in these applications. In this paper, we study the problem of finding objects with the highest marginal probability of being the nearest neighbors to a query object. We adopt a general uncertainty model allowing for data and query uncertainty. Under this model,...

10.14778/1453856.1453895 article EN Proceedings of the VLDB Endowment 2008-08-01

BigDansing

OPENALEX - Publications

Zuhair Khayyat Ihab F. Ilyas Alekh Jindal Samuel Madden Mourad Ouzzani and 4 more

Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms,...

10.1145/2723372.2747646 article EN 2015-05-27

Trends in Cleaning Relational Data: Consistency and Deduplication

OPENALEX - Publications

Ihab F. Ilyas Xu Chu

10.1561/1900000045 article EN Foundations and Trends in Databases 2015-01-01

HoloDetect

OPENALEX - Publications

Alireza Heidari Joshua McGrath Ihab F. Ilyas Theodoros Rekatsinas

We introduce a few-shot learning framework for error detection. We show that data augmentation (a form of weak supervision) is key to training high-quality, ML-based error detection models that require minimal human involvement. Our framework consists of two parts: (1) an expressive model to learn rich representations that capture the inherent syntactic and semantic heterogeneity of errors; and (2) a data augmentation model that, given a small seed of clean records, uses dataset-specific transformations to automatically generate additional training data. Our key insight is to learn data augmentation policies...

10.1145/3299869.3319888 preprint EN Proceedings of the 2019 International Conference on Management of Data 2019-06-18

Rank-aware query optimization

OPENALEX - Publications

Ihab F. Ilyas Rahul Shah Walid G. Aref Jeffrey Scott Vitter Ahmed K. Elmagarmid

Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query processing and optimization. We introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as...

10.1145/1007568.1007593 article EN 2004-06-13

Sampling the repairs of functional dependency violations under hard constraints

OPENALEX - Publications

George Beskales Ihab F. Ilyas Lukasz Golab

Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web data extraction. Resolving these violations is known to be challenging for a variety of reasons, one of them being the exponential number of possible "repairs". Previous work has tackled this problem either by producing a single repair that is (nearly) optimal with respect to some metric, or by computing consistent answers to selected classes of queries without explicitly generating the repairs. In this paper, we propose...

10.14778/1920841.1920870 article EN Proceedings of the VLDB Endowment 2010-09-01

Nile: a query processing engine for data streams

OPENALEX - Publications

Moustafa A. Hammad Mohamed F. Mokbel Mohamed H. Ali Walid G. Aref Ann Christine Catlin and 8 more

We present the demonstration of the design of "STEAM", Purdue Boiler Makers' stream database system that allows for the processing of continuous and snap-shot queries over data streams. Specifically, the demonstration focuses on the query processing engine, "Nile". Nile extends the query processor engine of an object-relational database management system, PREDATOR, to process continuous queries over data streams. Nile supports extended SQL operators that handle sliding-window execution as an approach to restrict the size of the stored state in operators such as join.

10.1109/icde.2004.1320080 article EN 2004-09-28

On the relative trust between inconsistent data and inaccurate constraints

OPENALEX - Publications

George Beskales Ihab F. Ilyas Lukasz Golab Artur Galiullin

Functional dependencies (FDs) specify the intended data semantics while violations of FDs indicate deviation from these semantics. In this paper, we study a data cleaning problem in which the FDs may not be completely correct, e.g., due to data evolution or incomplete knowledge of the data semantics. We argue that the notion of relative trust is a crucial aspect of this problem: if the FDs are outdated, we should modify them to fit the data, but if we suspect that there are problems with the data, we should modify the data to fit the FDs. In practice, it is usually unclear how much to trust the data versus the FDs. To address this problem, we propose an algorithm for...

10.1109/icde.2013.6544854 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01
