Roee Shraga

ORCID: 0000-0001-8803-8481
Research Areas
  • Data Quality and Management
  • Semantic Web and Ontologies
  • Advanced Database Systems and Queries
  • Topic Modeling
  • Natural Language Processing Techniques
  • Data Management and Algorithms
  • Mobile Crowdsensing and Crowdsourcing
  • Business Process Modeling and Analysis
  • Data Mining Algorithms and Applications
  • Scientific Computing and Data Management
  • Anomaly Detection Techniques and Applications
  • Privacy-Preserving Technologies in Data
  • Web Data Mining and Analysis
  • Machine Learning and Algorithms
  • Service-Oriented Architecture and Web Services
  • Context-Aware Activity Recognition Systems
  • Sports Analytics and Performance
  • Personal Information Management and User Behavior
  • Information Retrieval and Search Behavior
  • Time Series Analysis and Forecasting
  • Image Retrieval and Classification Techniques
  • Advanced Text Analysis Techniques
  • Sentiment Analysis and Opinion Mining
  • Machine Learning and Data Classification
  • QR Code Applications and Technologies

Worcester Polytechnic Institute
2023-2024

Northeastern University
2022-2023

Universidad del Noreste
2022-2023

Technion – Israel Institute of Technology
2017-2021

Daegu Haany University
2019

Hasselt University
2019

Janssen (Belgium)
2019

Eindhoven University of Technology
2019

Hankuk University of Foreign Studies
2019

Humboldt-Universität zu Berlin
2019

Existing techniques for unionable table search define unionability using metadata (tables must have the same or similar schemas) or column-based metrics (for example, the values in a column should be drawn from the same domain). In this work, we introduce the use of semantic relationships between pairs of columns to improve the accuracy of union search. Consequently, we introduce a new notion of unionability that considers relationships between columns, together with the semantics of columns, in a principled way. To do so, we present two methods to discover semantic relationships between pairs of columns. The first uses an existing knowledge...

10.1145/3588689 article EN Proceedings of the ACM on Management of Data 2023-05-26
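The abstract contrasts its semantic approach with column-based unionability metrics. As a rough illustration of such a baseline (not the method proposed in the paper), the sketch below scores two tables by value overlap, assuming Jaccard similarity as the column metric and a hypothetical `table_unionability` aggregate that averages each column's best match:

```python
def jaccard(col_a, col_b):
    """Jaccard overlap of the distinct values in two columns (0.0 to 1.0)."""
    a, b = set(col_a), set(col_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def table_unionability(cols_a, cols_b):
    """Hypothetical table-level score: average, over columns of A, of the best Jaccard match in B."""
    if not cols_a or not cols_b:
        return 0.0
    best = [max(jaccard(ca, cb) for cb in cols_b) for ca in cols_a]
    return sum(best) / len(best)


# Each table is a list of columns; each column is a list of cell values (illustrative data).
table_a = [["Boston", "Haifa", "Berlin"], ["US", "IL", "DE"]]
table_b = [["Haifa", "Berlin", "Eindhoven"], ["IL", "DE", "NL"]]
print(table_unionability(table_a, table_b))
```

Each column of `table_a` shares two of four distinct values with its best counterpart in `table_b`, so the score is 0.5. A purely value-based metric like this ignores how the columns relate to each other within a table, which is the gap the semantic approach targets.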

We have made tremendous strides in providing tools for data scientists to discover new tables useful for their analyses. But despite these advances, the proper integration of discovered tables has been under-explored. An interesting semantics for integration, called Full Disjunction, was proposed in the 1980's, but there has been little progress in using it for data science to integrate tables culled from data lakes. We provide ALITE, the first proposal for scalable integration of tables that may have been discovered using join, union or related table search. We empirically show that ALITE can outperform previous...

10.14778/3574245.3574274 article EN Proceedings of the VLDB Endowment 2022-12-01

We address the web table retrieval task, aiming to retrieve and rank tables as whole answers to a given information need. To this end, we formally define tables as multimodal objects. We then suggest a neural ranking model, termed MTR, which makes a novel use of Gated Multimodal Units (GMUs) to learn a joint representation of the query and the different table modalities. We further enhance this model with a co-learning approach that utilizes automatically learned query-independent and query-dependent "helper" labels. We evaluate the proposed solution using both...

10.1145/3397271.3401120 article EN 2020-07-25

Data integration is an important step in any data science pipeline where the objective is to unify the information available in different datasets for comprehensive analysis. Full Disjunction, which is an associative extension of the outer join operator, has been shown to be an effective operator for integrating datasets. It fully preserves and combines the available information. Existing Full Disjunction algorithms only consider the equi-join scenario, where only tuples having the same value on joining columns are integrated. This, however, does not...

10.48550/arxiv.2501.09211 preprint EN arXiv (Cornell University) 2025-01-15
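For two tables, Full Disjunction coincides with the full outer join, so the equi-join case the abstract starts from can be sketched with plain Python rows. The function name, the dict-per-row representation, and the assumption that the tables share only the join column are illustrative choices, not the paper's algorithm:

```python
def full_outer_join(left, right, key):
    """Equi-join full outer join of two lists of dict rows on column `key`.

    Every input tuple survives: matching tuples are merged, unmatched ones are
    padded with None for the attributes they lack. Assumes the tables share
    only the `key` column; on any other shared column the left value wins.
    """
    columns = sorted({col for row in left + right for col in row})

    def pad(row):
        return {col: row.get(col) for col in columns}

    result, used_right = [], set()
    for lrow in left:
        lkey = lrow.get(key)
        # None never joins with anything, mirroring SQL NULL semantics.
        partners = [i for i, rrow in enumerate(right)
                    if lkey is not None and rrow.get(key) == lkey]
        for i in partners:
            used_right.add(i)
            result.append(pad({**right[i], **lrow}))
        if not partners:
            result.append(pad(lrow))
    # Right rows that never matched are kept as well.
    for i, rrow in enumerate(right):
        if i not in used_right:
            result.append(pad(rrow))
    return result


left = [{"city": "Haifa", "country": "Israel"},
        {"city": "Berlin", "country": "Germany"}]
right = [{"city": "Haifa", "university": "Technion"},
         {"city": "Worcester", "university": "WPI"}]
for row in full_outer_join(left, right, "city"):
    print(row)
```

Because the join requires exact key equality, rows whose keys differ only in spelling, such as "Haifa" and "haifa", stay unmerged; this is the equi-join restriction the abstract refers to.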

Ordinal regression classifies an object to a class out of a given set of possible classes, where labels possess a natural order. It is relevant to a wide array of domains, including risk assessment, sentiment analysis, image ranking, and recommender systems. Like common classification, the primary goal of ordinal regression is accuracy. Yet, in this context, the severity of prediction errors varies; e.g., Critical Risk is more urgent than High Risk and significantly more urgent than No Risk. This leads to a modified objective of ensuring that the model's output is as close...

10.1609/aaai.v39i18.34158 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11
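The point about unequal error severity can be made concrete with two metrics. The sketch below assumes a three-level scale built from the abstract's example labels and uses mean absolute error over class ranks as one common ordinal-aware measure (not necessarily the objective the paper optimizes); it shows two models with identical accuracy but different error severity:

```python
# Ordinal scale from the abstract's example, least to most severe.
LEVELS = ["No Risk", "High Risk", "Critical Risk"]
RANK = {label: i for i, label in enumerate(LEVELS)}


def accuracy(y_true, y_pred):
    """Fraction of exact hits; every miss costs the same."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between true and predicted ranks on the ordinal scale."""
    return sum(abs(RANK[t] - RANK[p]) for t, p in zip(y_true, y_pred)) / len(y_true)


truth = ["Critical Risk", "High Risk"]
model_a = ["High Risk", "High Risk"]  # misses the critical case by one level
model_b = ["No Risk", "High Risk"]    # misses the critical case by two levels

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(truth, pred), mean_absolute_error(truth, pred))
```

Both models score 0.5 accuracy, but model B's error is twice as far on the ordinal scale, which accuracy alone cannot express.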

Schema matching is a process that serves in integrating structured and semi-structured data. Being a handy tool in multiple contemporary business and commerce applications, it has been investigated in the fields of databases, AI, Semantic Web, and data mining for many years. The core challenge still remains the ability to create quality algorithmic matchers, automatic tools for identifying correspondences among data concepts (e.g., database attributes). In this work, we offer a novel post-processing step to schema matching that improves...

10.14778/3397230.3397237 article EN Proceedings of the VLDB Endowment 2020-05-01

Given a keyword query, the ad hoc table retrieval task aims at retrieving a ranked list of the top-k most relevant tables in a given table corpus. Previous works have primarily focused on designing table-centric lexical and semantic features, which could be utilized for learning-to-rank (LTR) tables. In this work, we make a novel use of intrinsic (passage-based) and extrinsic (manifold-based) table similarities for enhanced retrieval. Using the WikiTables benchmark, we study the merits of utilizing such similarities for this task. To this end, we combine both...

10.1145/3366423.3379995 article EN 2020-04-20

In multi-user environments in which data science and analysis is collaborative, multiple versions of the same datasets are generated. While managing and storing data versions has received some attention in the research literature, the semantic nature of such changes has remained under-explored. In this work, we introduce Explain-Da-V, a framework aiming to explain changes between two given dataset versions. Explain-Da-V generates explanations that use data transformations to explain changes. We further introduce a set of measures to evaluate the validity, generalizability,...

10.14778/3583140.3583169 article EN Proceedings of the VLDB Endowment 2023-02-01

Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task, assuming the existence of a single interpretation of a real-world entity and focusing only on finding matched records, separating corresponding from non-corresponding ones, with respect to this single interpretation. However, in real-world scenarios, where entity resolution is part of a more general data project, downstream applications may have varying...

10.1145/3588722 article EN Proceedings of the ACM on Management of Data 2023-05-26

Schema matching is at the heart of integrating structured and semi-structured data, with applications in data warehousing, data analysis recommendations, Web table matching, etc. Schema matching is known as an uncertain process, and a common method to overcome this uncertainty introduces a human expert with a ranked list of possible schema matches to choose from, known as top-K matching. In this work we propose a learning algorithm that utilizes an innovative set...

10.1109/tkde.2019.2962124 article EN IEEE Transactions on Knowledge and Data Engineering 2019-12-27
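Top-K schema matching presents the expert with the K best one-to-one matchings instead of a single one. The brute-force sketch below assumes a small square matrix of hypothetical integer similarity scores and scores each matching by the sum of its chosen similarities. It enumerates every permutation, so it only illustrates the shape of a top-K output, not the learning algorithm the paper proposes:

```python
import heapq
from itertools import permutations


def top_k_matchings(sim, k):
    """Return the k best one-to-one matchings of a square similarity matrix.

    A matching is a tuple `perm` where source attribute i is matched to
    target attribute perm[i]; its score is the sum of the chosen similarities.
    Brute force over all n! permutations, so only usable for tiny schemas.
    """
    n = len(sim)
    scored = ((sum(sim[i][perm[i]] for i in range(n)), perm)
              for perm in permutations(range(n)))
    return heapq.nlargest(k, scored)


# Rows: source attributes, columns: target attributes (illustrative scores).
sim = [[9, 2, 1],
       [3, 8, 4],
       [2, 5, 7]]
for score, perm in top_k_matchings(sim, 3):
    print(score, perm)
```

The expert would then choose among these three candidate matchings; practical systems avoid enumerating all n! permutations.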

In this work we explore the relationships between human and algorithmic schema matchers. We provide a novel approach to identifying similar matchers, termed coordinated matchers, and use it to predict future matching choices. We show throughout a comprehensive analysis that human matchers are usually coordinated with intuitive algorithms, e.g., those based on attribute name similarity, and frequently do not assign lower confidence levels, which indicates overconfidence in their choices. Finally, we show that matching choices can be reasonably predicted using collaborative opinions and coordination.

10.1145/3209900.3209905 article EN 2018-06-04
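The abstract mentions intuitive algorithmic matchers based on attribute name similarity. A minimal sketch of such a matcher, assuming `difflib.SequenceMatcher.ratio()` from the Python standard library as the string similarity and hypothetical attribute names; the returned ratio plays the role of the matcher's confidence:

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def name_based_matcher(source_attrs, target_attrs):
    """Map each source attribute to its most similar target attribute.

    Returns {source: (target, confidence)}, where confidence is the
    similarity of the chosen pair rounded to two decimals.
    """
    matches = {}
    for s in source_attrs:
        best = max(target_attrs, key=lambda t: name_similarity(s, t))
        matches[s] = (best, round(name_similarity(s, best), 2))
    return matches


source = ["CustomerName", "Phone"]
target = ["cust_name", "phone_number", "address"]
print(name_based_matcher(source, target))
```

difflib's ratio is only one choice here; edit-distance or token-based measures would fit the same slot.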

Structured product data in the form of attribute/value pairs is the foundation of many e-commerce applications, such as faceted search, product comparison, and recommendation. Product offers often only contain textual descriptions of the product attributes in the form of titles or free text. Hence, extracting attribute/value pairs from textual product descriptions is an essential enabler for e-commerce applications. In order to excel, state-of-the-art information extraction methods require large quantities of task-specific training data. The methods also struggle with generalizing to out-of-distribution attribute...

10.48550/arxiv.2306.14921 preprint EN other-oa arXiv (Cornell University) 2023-01-01

10.1109/icde60146.2024.00272 article EN 2024 IEEE 40th International Conference on Data Engineering (ICDE) 2024-05-13

We present InCognitoMatch, the first cognitive-aware crowdsourcing application for matching tasks. InCognitoMatch provides a handy tool to validate, annotate, and correct correspondences using the crowd whilst accounting for human biases. In addition, InCognitoMatch enables system administrators to control the context information visible to workers and to analyze their performance accordingly. For workers, InCognitoMatch is an easy-to-use application that may be accessed from multiple platforms. Workers completing the task are offered suggestions for followup sessions...

10.1145/3318464.3384697 article EN 2020-05-29

Schema matching is a core task of any data integration process. Having been investigated in the fields of databases, AI, Semantic Web, and data mining for many years, its main challenge remains the ability to generate quality matches among data concepts (e.g., database attributes). In this work, we examine a novel angle on the behavior of humans as matchers, studying match creation as a process. We analyze the dynamics of common evaluation measures (precision, recall, and f-measure) with respect to this process and highlight the need for unbiased matching to support this analysis. Unbiased...

10.1145/3483423 article EN Journal of Data and Information Quality 2022-03-28
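Precision, recall, and F-measure compare a set of proposed correspondences against a reference match. Because the abstract studies how these measures evolve as a human builds a match, the sketch below also evaluates every prefix of a decision sequence; the attribute names and the `dynamics` helper are hypothetical:

```python
def evaluate_matches(proposed, reference):
    """Precision, recall and F-measure of proposed correspondences."""
    proposed, reference = set(proposed), set(reference)
    hits = len(proposed & reference)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def dynamics(decisions, reference):
    """Evaluate the match after each decision, in the order decisions were made."""
    return [evaluate_matches(decisions[:i], reference)
            for i in range(1, len(decisions) + 1)]


reference = [("CustomerName", "cust_name"), ("Phone", "phone_number"),
             ("Address", "address")]
decisions = [("CustomerName", "cust_name"), ("Phone", "address"),
             ("Phone", "phone_number")]
for step, (p, r, f) in enumerate(dynamics(decisions, reference), start=1):
    print(f"step {step}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

Precision drops from 1.00 to 0.50 after the wrong Phone correspondence at step 2 and recovers only to 0.67 once the correct one is added.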

Schema matching is a task at the heart of integrating heterogeneous structured and semi-structured data, with applications in data warehousing, process matching, data analysis recommendations, Web table matching, etc. Schema matching is known to be an uncertain process, and a common method of overcoming this uncertainty is introducing a human expert with a ranked list of possible schema matches from which the expert may choose, known as top-K matching. In this work we propose a learning...

10.1109/icdm.2018.00118 article EN 2018 IEEE International Conference on Data Mining (ICDM) 2018-11-01

Measuring key performance indicators, such as queue lengths and waiting times, using event logs can serve for the improvement of resource-driven business processes. However, existing techniques assume the availability of complete life cycle information, including the time a case was scheduled for execution (aka arrival times). Yet, in practice, such information may be missing for a large portion of the recorded cases. In this paper, we propose a methodology to address missing life-cycle data by incorporating predicted arrival times into process analysis...

10.1109/icpm49681.2020.00019 article EN 2020-10-01
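Waiting time and queue length follow directly from arrival and start timestamps, which is why missing arrival times are a problem. A minimal sketch, assuming every case already has an arrival time (recorded or predicted), timestamps in minutes, and hypothetical case data:

```python
# case id -> (arrival time, start of execution), in minutes (illustrative data).
cases = {"c1": (0, 5), "c2": (2, 6), "c3": (4, 4)}


def waiting_times(cases):
    """Waiting time of each case: start of execution minus arrival."""
    return {cid: start - arrival for cid, (arrival, start) in cases.items()}


def queue_length(cases, t):
    """Number of cases that have arrived but not yet started at time t."""
    return sum(1 for arrival, start in cases.values() if arrival <= t < start)


print(waiting_times(cases))
print([queue_length(cases, t) for t in range(7)])
```

If a case's arrival time were missing, neither function could account for it, so both indicators would be biased unless a predicted arrival is substituted.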

Matching is a task at the heart of any data integration process, aimed at identifying correspondences among data elements. Matching problems were traditionally solved in a semi-automatic manner, with correspondences being generated by matching algorithms and outcomes subsequently validated by human experts. Human-in-the-loop matching has been recently challenged by the introduction of big data, and recent studies have analyzed obstacles to effective validation. In this work we characterize experts, those humans whose proposed correspondences can mostly be trusted as valid. We...

10.1109/icde51399.2021.00111 article EN 2021 IEEE 37th International Conference on Data Engineering (ICDE) 2021-04-01

Virtual Knowledge Graphs (VKGs) constitute one of the most promising paradigms for integrating and accessing legacy data sources. A critical bottleneck in the integration process involves the definition, validation, and maintenance of mapping assertions that link data sources to a domain ontology. To support the management of mappings throughout their entire lifecycle, we identify a comprehensive catalog of sophisticated mapping patterns that emerge when linking databases to ontologies. To do so, we build on well-established methodologies...

10.1016/j.datak.2023.102157 article EN cc-by Data & Knowledge Engineering 2023-03-04

Entity matching, a core data integration problem, is the task of deciding whether two data tuples refer to the same real-world entity. Recent advances in deep learning methods, using pre-trained language models, were proposed for resolving entity matching. Although demonstrating unprecedented results, these solutions suffer from a major drawback, as they require large amounts of labeled data for training and, as such, are inadequate to be applied to low-resource entity matching problems. To overcome the challenge of obtaining sufficient...

10.1145/3626711 article EN Proceedings of the ACM on Management of Data 2023-12-08

Discovery plays a key role in data-driven analysis of business processes. The vast majority of contemporary discovery algorithms aim at the identification of control-flow constructs. The increase in data richness, however, enables discovery that incorporates the context of process execution beyond the control-flow perspective. A "control-flow first" approach, where context data serves for refinement and annotation, is limited and fails to detect fundamental changes that depend on data. In this work, we thus propose a novel approach for combining the control-flow and data perspectives under...

10.1109/icpm.2019.00016 article EN 2019-06-01