Zuhair Khayyat

ORCID: 0000-0003-3650-6997
Research Areas
  • Graph Theory and Algorithms
  • Cloud Computing and Resource Management
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Data Quality and Management
  • Advanced Image and Video Retrieval Techniques
  • Distributed and Parallel Computing Systems
  • Advanced Graph Theory Research
  • Complex Network Analysis Techniques
  • Big Data and Business Intelligence
  • Big Data Technologies and Applications
  • Sentiment Analysis and Opinion Mining
  • Topic Modeling
  • Data Visualization and Analytics
  • Algorithms and Data Compression
  • Graph Labeling and Dimension Problems
  • Spam and Phishing Detection
  • Privacy-Preserving Technologies in Data
  • Scientific Computing and Data Management
  • Advanced Graph Neural Networks
  • Advanced Text Analysis Techniques
  • Caching and Content Delivery
  • Cloud Data Security Solutions
  • Semantic Web and Ontologies

King Abdullah University of Science and Technology
2013-2019

Pregel [23] was recently introduced as a scalable graph mining system that can provide significant performance improvements over traditional MapReduce implementations. Existing implementations focus primarily on graph partitioning as a preprocessing step to balance computation across compute nodes. In this paper, we examine the runtime characteristics of a Pregel system. We show that graph partitioning alone is insufficient for minimizing end-to-end computation. Especially where data is very large or the behavior of the algorithm is unknown, an...

10.1145/2465351.2465369 article EN 2013-04-15
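The runtime effect can be seen in a minimal single-process sketch of Pregel-style supersteps. This is a toy, not the system described above. It assumes integer vertex ids and a hypothetical static partitioning (vertex id modulo the number of workers). It propagates the maximum vertex value along directed edges and records how many messages each worker receives per superstep.

```python
def pregel_max(graph, values, num_workers):
    """Propagate the maximum vertex value along directed edges, Pregel style.

    graph: dict mapping every vertex id (an int) to its list of out-neighbours
    values: dict mapping vertex id -> initial value
    Returns (final values, messages received per worker in each superstep).
    """
    # Hypothetical static partitioning: vertex id modulo the number of workers.
    worker_of = {v: v % num_workers for v in graph}
    values = dict(values)
    inbox = {v: [] for v in graph}
    active = set(graph)  # every vertex is active in superstep 0
    superstep = 0
    history = []
    while active:
        outbox = {v: [] for v in graph}
        for v in active:
            best = max(inbox[v], default=values[v])
            changed = best > values[v]
            if changed:
                values[v] = best
            # Send on the first superstep or whenever the value improved;
            # otherwise the vertex implicitly votes to halt.
            if superstep == 0 or changed:
                for u in graph[v]:
                    outbox[u].append(values[v])
        # Messages each worker must process in the next superstep.
        per_worker = [0] * num_workers
        for u, msgs in outbox.items():
            per_worker[worker_of[u]] += len(msgs)
        history.append(per_worker)
        # Barrier: only vertices that received messages are reactivated.
        inbox = outbox
        active = {v for v, msgs in inbox.items() if msgs}
        superstep += 1
    return values, history


graph = {0: [1], 1: [2], 2: [3], 3: []}
final, history = pregel_max(graph, {0: 9, 1: 1, 2: 4, 3: 2}, num_workers=2)
print(final)    # {0: 9, 1: 9, 2: 9, 3: 9}
print(history)  # [[1, 2], [1, 0], [0, 1], [0, 0]]
```

On this four-vertex chain, the message load moves between workers from one superstep to the next even though the partitioning never changes. A partitioner that runs only as a preprocessing step cannot anticipate this kind of imbalance.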

Data cleansing approaches have usually focused on detecting and fixing errors with little attention to scaling to big datasets. This presents a serious impediment, since data cleansing often involves costly computations such as enumerating pairs of tuples, handling inequality joins, and dealing with user-defined functions. In this paper, we present BigDansing, a Big Data Cleansing system to tackle efficiency, scalability, and ease-of-use issues in data cleansing. The system can run on top of most common general purpose data processing platforms,...

10.1145/2723372.2747646 article EN 2015-05-27
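The cost of pairwise rules can be illustrated with a hypothetical functional dependency, zipcode → city, used as the quality rule. The sketch below is a generic illustration, not BigDansing's API. The naive detector enumerates all O(n²) tuple pairs. The blocked detector first groups tuples by the rule's left-hand side and then compares only tuples that can actually conflict. Both return the same violating pairs.

```python
from collections import defaultdict
from itertools import combinations


def fd_violations_naive(rows, lhs, rhs):
    """Check every pair of tuples against the FD lhs -> rhs: O(n^2) pairs."""
    found = set()
    for (i, a), (j, b) in combinations(enumerate(rows), 2):
        if a[lhs] == b[lhs] and a[rhs] != b[rhs]:
            found.add((i, j))
    return found


def fd_violations_blocked(rows, lhs, rhs):
    """Group tuples by the lhs value first, then compare only within a group."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[row[lhs]].append(i)
    found = set()
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            if rows[i][rhs] != rows[j][rhs]:
                found.add((i, j))
    return found


# Toy table (hypothetical data); rule: zipcode -> city.
rows = [
    {"zipcode": "Z1", "city": "C1"},
    {"zipcode": "Z1", "city": "C2"},
    {"zipcode": "Z2", "city": "C3"},
    {"zipcode": "Z1", "city": "C1"},
]
print(sorted(fd_violations_blocked(rows, "zipcode", "city")))  # [(0, 1), (1, 3)]
```

Blocking only helps for equality-based rules like this one. Rules with inequality conditions cannot be blocked this way, which is why the abstract lists inequality joins as a separate cost.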

Distributed SPARQL engines promise to support very large RDF datasets by utilizing shared-nothing computer clusters. Some are based on distributed frameworks such as MapReduce; others implement proprietary distributed processing; and some rely on expensive preprocessing for data partitioning. These systems exhibit a variety of trade-offs that are not well understood, due to the lack of any comprehensive quantitative and qualitative evaluation. In this paper, we present a survey of 22 state-of-the-art systems that cover the entire spectrum...

10.14778/3151106.3151109 article EN Proceedings of the VLDB Endowment 2017-09-01
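One common strategy among distributed RDF stores is hash partitioning of triples by subject. This is an assumption chosen for illustration; the surveyed systems use a variety of schemes. Under this scheme, a subject-star query such as `?s worksAt ?o1 . ?s livesIn ?o2` can be answered by each worker independently, because all triples sharing a subject land on the same worker. The minimal sketch below uses hypothetical data and `zlib.crc32` as a deterministic hash.

```python
import zlib
from collections import defaultdict


def partition_by_subject(triples, k):
    """Place each (s, p, o) triple on one of k workers by hashing its subject."""
    parts = [[] for _ in range(k)]
    for s, p, o in triples:
        parts[zlib.crc32(s.encode("utf-8")) % k].append((s, p, o))
    return parts


def star_query(triples, p1, p2):
    """Evaluate ?s p1 ?o1 . ?s p2 ?o2 with a hash join on the subject."""
    first = defaultdict(list)
    for s, p, o in triples:
        if p == p1:
            first[s].append(o)
    results = set()
    for s, p, o in triples:
        if p == p2:
            for o1 in first.get(s, []):
                results.add((s, o1, o))
    return results


def distributed_star_query(parts, p1, p2):
    # Subject-star patterns never span partitions, so each worker answers
    # its share locally and the coordinator only takes the union.
    answers = set()
    for part in parts:
        answers |= star_query(part, p1, p2)
    return answers


triples = [
    ("alice", "worksAt", "kaust"),
    ("alice", "livesIn", "thuwal"),
    ("bob", "worksAt", "kaust"),
    ("carol", "livesIn", "jeddah"),
    ("bob", "livesIn", "jeddah"),
]
parts = partition_by_subject(triples, k=3)
print(sorted(distributed_star_query(parts, "worksAt", "livesIn")))
# [('alice', 'kaust', 'thuwal'), ('bob', 'kaust', 'jeddah')]
```

Queries that join on an object position lose this locality and need data exchange between workers. An example is the path `?a knows ?b . ?b worksAt ?c`. This is one typical source of the trade-offs between partitioning and query processing.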

Frequent Subgraph Mining is an essential operation for graph analytics and knowledge extraction. Due to its high computational cost, parallel solutions are necessary. Existing approaches suffer either from load imbalance or from communication and synchronization overheads. In this paper, we propose ScaleMine, a novel frequent subgraph mining system for a single large graph. ScaleMine introduces a two-phase approach. The first phase is approximate; it quickly identifies subgraphs that are frequent with high probability, while...

10.1109/sc.2016.60 article EN 2016-11-01

10.5555/3014904.3014986 article EN IEEE International Conference on High Performance Computing, Data, and Analytics 2016-11-13
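Mining frequent patterns in a single large graph needs a support measure that behaves well when embeddings overlap. The abstract does not name one. A commonly used choice in this setting is minimum image-based (MNI) support, sketched below under that assumption. For each pattern vertex, count the distinct graph vertices it maps to across all embeddings, then take the minimum. The brute-force embedding search is exponential and only meant for tiny inputs. It is not how ScaleMine enumerates subgraphs, and the sketch does not show the approximate first phase.

```python
from itertools import permutations


def embeddings(g_labels, g_edges, p_labels, p_edges):
    """Brute-force all injective, label- and edge-preserving pattern maps.

    Undirected graphs: *_labels maps vertex -> label, *_edges is a set of
    frozenset({u, v}). Exponential; only for tiny examples.
    """
    p_nodes = sorted(p_labels)
    found = []
    for image in permutations(sorted(g_labels), len(p_nodes)):
        m = dict(zip(p_nodes, image))
        labels_ok = all(p_labels[v] == g_labels[m[v]] for v in p_nodes)
        edges_ok = all(frozenset(m[x] for x in e) in g_edges for e in p_edges)
        if labels_ok and edges_ok:
            found.append(m)
    return found


def mni_support(embs, p_nodes):
    """Minimum, over pattern vertices, of the number of distinct images."""
    if not embs:
        return 0
    return min(len({m[v] for m in embs}) for v in p_nodes)


g_labels = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A"}
g_edges = {frozenset(e) for e in [(1, 2), (3, 2), (3, 4), (5, 4)]}
p_labels = {"a": "A", "b": "B"}
p_edges = {frozenset(("a", "b"))}

embs = embeddings(g_labels, g_edges, p_labels, p_edges)
print(len(embs), mni_support(embs, ["a", "b"]))  # 4 2
```

The A-B edge pattern has four embeddings, but its MNI support is 2, because only two distinct B vertices take part. A pattern would then count as frequent when its support reaches a user-given threshold.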

Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases a system that provides multi-platform task execution for such applications. It features a three-layer abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of the system using real-world scenarios from three different applications, namely machine learning, data cleaning, and data fusion.

10.1145/2882903.2899414 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16
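Multi-platform optimization can be pictured as choosing an execution platform for each operator while paying for data movement between platforms. The sketch below is a toy dynamic program over a linear pipeline. It assumes a hypothetical additive cost model and hypothetical platform names (`single_node`, `cluster`). It is not the optimizer from the paper.

```python
def choose_platforms(op_costs, move_cost):
    """Assign one platform per operator of a linear pipeline at minimum cost.

    op_costs[i][platform]: estimated cost of running operator i on platform.
    move_cost: charged when two consecutive operators run on different
               platforms (the intermediate data must be moved or converted).
    Returns (total_cost, list_of_platforms).
    """
    # best[p] = cheapest (cost, plan) covering operators so far, ending on p.
    best = {p: (c, [p]) for p, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for p, c in costs.items():
            nxt[p] = min(
                ((prev + c + (0 if q == p else move_cost), plan + [p])
                 for q, (prev, plan) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Hypothetical costs for a three-operator pipeline on two platforms.
op_costs = [
    {"single_node": 1, "cluster": 6},   # e.g. a small scan
    {"single_node": 10, "cluster": 2},  # an expensive, parallelisable step
    {"single_node": 1, "cluster": 3},   # a cheap final aggregation
]
print(choose_platforms(op_costs, move_cost=4))
# (10, ['single_node', 'cluster', 'cluster'])
```

Moving the cheap last step back to `single_node` would save 2 units of compute but cost 4 units of data movement, so the optimizer keeps it on `cluster`.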

Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there has been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join to indices such as B+-tree, R*-tree, and Bitmap, inequality joins have received little attention, and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put the columns to be joined in sorted arrays and use permutation arrays to encode the positions of tuples in one sorted array w.r.t. the...

10.14778/2831360.2831362 article EN Proceedings of the VLDB Endowment 2015-09-01
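The core idea can be sketched for the single-table case with predicates `x[a] < x[b]` and `y[a] > y[b]`. It combines sorted arrays, a permutation array, and a bit array that records already-visited tuples. This is a simplified illustration under those assumptions, not the paper's algorithm. It omits the bitmap index that speeds up the bit array scan, and it omits the offset arrays needed for two relations. Ties are handled in two ways so that both strict predicates hold: a binary search on the sorted `x` values, and probing each group of equal `y` values before marking it.

```python
from bisect import bisect_left
from itertools import groupby


def ie_self_join(rows):
    """All index pairs (a, b) with x[a] < x[b] and y[a] > y[b].

    rows: list of (x, y) tuples. Simplified self-join sketch; no bitmap
    index over the bit array and no two-relation offset arrays.
    """
    n = len(rows)
    # L1: row ids sorted by x; xs keeps the sorted x values for binary search.
    l1 = sorted(range(n), key=lambda i: rows[i][0])
    xs = [rows[i][0] for i in l1]
    # Permutation array: position of each row id inside L1.
    perm = [0] * n
    for pos, i in enumerate(l1):
        perm[i] = pos
    # L2: row ids sorted by y, largest first, so earlier rows have larger y.
    l2 = sorted(range(n), key=lambda i: rows[i][1], reverse=True)
    bits = [False] * n  # bits[pos] set once row l1[pos] has been visited
    result = []
    for _, group in groupby(l2, key=lambda i: rows[i][1]):
        group = list(group)
        # Probe the whole group before marking it, so rows with equal y
        # never pair up (the y predicate is strict).
        for b in group:
            limit = bisect_left(xs, rows[b][0])  # L1 positions with x < x[b]
            for pos in range(limit):
                if bits[pos]:
                    result.append((l1[pos], b))
        for b in group:
            bits[perm[b]] = True
    return sorted(result)


rows = [(1, 5), (2, 3), (3, 4), (2, 5), (4, 1)]
print(ie_self_join(rows))
# [(0, 1), (0, 2), (0, 4), (1, 4), (2, 4), (3, 2), (3, 4)]
```

The result equals that of a nested-loop join over all pairs. The gain is that every set bit in the scanned prefix is already a match, so no candidate pair has to be re-checked against the predicates.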

This paper provides a detailed description of a new Twitter-based benchmark dataset for Arabic Sentiment Analysis (ASAD), which was launched in a competition sponsored by KAUST, awarding 10,000 USD, 5,000 USD, and 2,000 USD to the first, second, and third place winners, respectively. Compared to other publicly released datasets, ASAD is a large, high-quality annotated dataset (including 95K tweets) with three-class sentiment labels (positive, negative, and neutral). We present the details of the data collection process and annotation...

10.48550/arxiv.2011.00578 preprint EN other-oa arXiv (Cornell University) 2020-01-01

This paper provides an overview of the Arabic Sentiment Analysis Challenge organized by King Abdullah University of Science and Technology (KAUST). The task in this challenge is to develop machine learning models that classify a given tweet into one of three categories: Positive, Negative, or Neutral. From our recently released ASAD dataset, we provide the competitors with 55K tweets for training and 20K tweets for validation, based on which the performance of participating teams is ranked on a leaderboard,...

10.48550/arxiv.2109.14456 preprint EN other-oa arXiv (Cornell University) 2021-01-01
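As a point of reference for the three-class task, the sketch below trains a multinomial Naive Bayes classifier with add-one smoothing on whitespace tokens. It is a hypothetical baseline run on toy English stand-ins, not a model from the challenge. Real Arabic tweets would need proper tokenization and normalization.

```python
import math
from collections import Counter, defaultdict

LABELS = ("Positive", "Negative", "Neutral")


def train_nb(texts, labels):
    """Fit a multinomial Naive Bayes model on whitespace-split tokens."""
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return doc_counts, word_counts, vocab, len(texts)


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    doc_counts, word_counts, vocab, n_docs = model
    best_label, best_score = None, -math.inf
    for label in LABELS:
        if doc_counts[label] == 0:
            continue
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for tok in text.split():
            if tok in vocab:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy English stand-ins for tweets (not ASAD data).
texts = ["great service love it", "love this great", "bad slow service",
         "hate this bad", "the store opens at nine", "meeting at nine today"]
labels = ["Positive", "Positive", "Negative", "Negative", "Neutral", "Neutral"]
model = train_nb(texts, labels)
print(predict_nb(model, "love great"))  # Positive
```

The same model maps "bad slow" to Negative and "at nine" to Neutral.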

This is in response to recent feedback from some readers that requires clarifications regarding our IEJoin algorithm published in [1]. The feedback revolves around four points: (1) a typo in the illustrating example of the join process; (2) a naming error for the index used by the algorithm to improve the bit array scan; (3) the sort order used in our algorithms; and (4) a missing explanation of how duplicates are handled by our self join algorithm.

10.14778/3099622.3099629 article EN Proceedings of the VLDB Endowment 2017-05-01