NFDI4DS | UHH-SEMS - Publication Details

Lukasz Golab

ORCID: 0000-0003-0632-7496

About

Contact & Profiles

OpenAlex author ID: A5049437648

Research Areas

Advanced Database Systems and Queries
Data Quality and Management
Data Management and Algorithms
Data Mining Algorithms and Applications
Semantic Web and Ontologies
Data Stream Mining Techniques
Distributed systems and fault tolerance
Blockchain Technology Applications and Security
Topic Modeling
Cloud Computing and Resource Management
Privacy-Preserving Technologies in Data
Natural Language Processing Techniques
Advanced Data Storage Technologies
Smart Grid Energy Management
Graph Theory and Algorithms
Electric Vehicles and Infrastructure
Cryptography and Data Security
Hate Speech and Cyberbullying Detection
Misinformation and Its Impacts
Explainable Artificial Intelligence (XAI)
Complex Network Analysis Techniques
Scientific Computing and Data Management
Sentiment Analysis and Opinion Mining
Advanced Graph Neural Networks
Social Media and Politics

Affiliations

University of Waterloo
2015-2024

University of Cambridge
2023

Linköping University
2023

National Institute of Informatics
2023

Association for Computing Machinery
2023

Chinese University of Hong Kong
2023

Oracle (United States)
2023

Aalborg University
2019

University of Windsor
2018

AT&T (United States)
2008-2012

Issues in data stream management

OPENALEX - Publications

Lukasz Golab, M. Tamer Özsu

Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems...

10.1145/776985.776986 article EN ACM SIGMOD Record 2003-06-01

Profiling relational data: a survey

Ziawasch Abedjan, Lukasz Golab, Felix Naumann

10.1007/s00778-015-0389-y article EN The VLDB Journal 2015-06-01

FastFabric: Scaling Hyperledger Fabric to 20,000 Transactions per Second

Christian Gorenflo, Stephen Lee, Lukasz Golab, Srinivasan Keshav

Blockchain technologies are expected to make a significant impact on a variety of industries. However, one issue holding them back is their limited transaction throughput, especially compared to established solutions such as distributed database systems. In this paper, we re-architect a modern permissioned blockchain system, Hyperledger Fabric, to increase transaction throughput from 3,000 to 20,000 transactions per second. We focus on performance bottlenecks beyond the consensus mechanism, and we propose architectural...

10.1109/bloc.2019.8751452 article EN 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2019-05-01

FastFabric: Scaling hyperledger fabric to 20 000 transactions per second

Christian Gorenflo, Stephen Lee, Lukasz Golab, Srinivasan Keshav

Blockchain technologies are expected to make a significant impact on a variety of industries. However, one issue holding them back is their limited transaction throughput, especially compared to established solutions such as distributed database systems. In this paper, we rearchitect a modern permissioned blockchain system, Hyperledger Fabric, to increase transaction throughput from 3000 to 20 000 transactions per second. We focus on performance bottlenecks beyond the consensus mechanism, and we propose...

10.1002/nem.2099 article EN International Journal of Network Management 2020-02-11

On generating near-optimal tableaux for conditional functional dependencies

Lukasz Golab, Howard Karloff, Flip Korn, Divesh Srivastava, Bei Yu

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs. This paper first...

10.14778/1453856.1453900 article EN Proceedings of the VLDB Endowment 2008-08-01
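
The role of a pattern tableau described above can be illustrated with a small checker. This is a minimal Python sketch, not the paper's tableau-generation algorithm: it assumes rows are dicts, uses a hypothetical `_` wildcard for "any value" in a pattern row, and reports every row involved in a violation of any pattern row. The names `cfd_violations` and `matches` and the sample data are illustrative only.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for "any value" in a pattern row

def matches(value, pattern):
    """A cell matches a pattern cell if the pattern is the wildcard or the same constant."""
    return pattern == WILDCARD or value == pattern

def cfd_violations(rows, lhs, rhs, tableau):
    """Indices of rows that violate the CFD (lhs -> rhs, tableau)."""
    bad = set()
    for pattern in tableau:
        # Context of this pattern row: tuples whose LHS values match it.
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            if all(matches(row[a], pattern[a]) for a in lhs):
                groups[tuple(row[a] for a in lhs)].append(i)
        for members in groups.values():
            # Embedded FD: tuples that agree on the LHS must agree on the RHS.
            if len({rows[i][rhs] for i in members}) > 1:
                bad.update(members)
            # Constant RHS pattern: each tuple in the context must carry that constant.
            bad.update(i for i in members if not matches(rows[i][rhs], pattern[rhs]))
    return sorted(bad)

# Hypothetical data: country code (cc) and area code (ac) should determine the city.
customers = [
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "212", "city": "NYC"},
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "01", "ac": "908", "city": "NYC"},
    {"cc": "44", "ac": "131", "city": "GLA"},
]
tableau = [
    {"cc": "01", "ac": "908", "city": "MH"},  # constant pattern row
    {"cc": "44", "ac": "_", "city": "_"},     # the FD must hold among all cc=44 tuples
]
print(cfd_violations(customers, ["cc", "ac"], "city", tableau))  # [0, 1, 3, 4, 5]
```

Row 2 (area code 212) falls outside the context of both pattern rows, so it is never checked; this is the sense in which the tableau restricts where the FD must hold.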

Identifying frequent items in sliding windows over on-line packet streams

Lukasz Golab, David DeHaan, Erik D. Demaine, Alejandro López-Ortiz, J. Ian Munro

Internet traffic patterns are believed to obey the power law, implying that most of the bandwidth is consumed by a small set of heavy users. Hence, queries that return a list of frequently occurring items are important in the analysis of real-time packet streams. While several results exist for computing frequent item queries using limited memory in the infinite stream model, in this paper we consider the limited-memory sliding window model. This model maintains the last N items that have arrived at any given time and forbids the storage of the entire window in memory. We...

10.1145/948205.948227 article EN 2003-01-01
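
For contrast with the limited-memory model described above, the following Python sketch computes frequent items exactly over a count-based sliding window of the last N arrivals. It deliberately stores the whole window (O(N) memory), which is precisely what that model forbids, so it is a baseline for intuition and not the paper's algorithm. The class name, the `min_count` threshold, and the sample stream are illustrative assumptions.

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Exact item counts over the last n arrivals (count-based sliding window)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()    # keeps every item in the window: O(n) memory
        self.counts = Counter()

    def add(self, item):
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            expired = self.window.popleft()  # the oldest arrival leaves the window
            self.counts[expired] -= 1
            if self.counts[expired] == 0:
                del self.counts[expired]

    def frequent(self, min_count):
        """Items seen at least min_count times in the current window."""
        return sorted(x for x, c in self.counts.items() if c >= min_count)

# Hypothetical stream of packet source addresses, window of the last 5 packets.
w = SlidingWindowCounter(5)
for src in ["a", "b", "a", "c", "a", "b", "b"]:
    w.add(src)
print(w.frequent(2))  # ['a', 'b']
```

After seven arrivals the window holds a, c, a, b, b, so the earlier third occurrence of a no longer counts; a limited-memory algorithm has to approximate these answers without keeping the deque.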

Sampling the repairs of functional dependency violations under hard constraints

George Beskales, Ihab F. Ilyas, Lukasz Golab

Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web extraction. Resolving these violations is known to be challenging for a variety of reasons, one of them being the exponential number of possible "repairs". Previous work has tackled this problem either by producing a single repair that is (nearly) optimal with respect to some metric, or by computing consistent answers to selected classes of queries without explicitly generating the repairs. In this paper, we propose...

10.14778/1920841.1920870 article EN Proceedings of the VLDB Endowment 2010-09-01

Stream warehousing with DataDepot

Lukasz Golab, Theodore Johnson, Joseph Seidel, Vladislav Shkapenyuk

We describe DataDepot, a tool for generating warehouses from streaming data feeds, such as network-traffic traces, router alerts, financial tickers, transaction logs, and so on. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot is similar to Data Stream Management Systems (DSMSs), with its emphasis on temporal data, best-effort consistency, and real-time response. However, as a data warehouse, DataDepot is designed to store tens to hundreds of terabytes...

10.1145/1559845.1559934 article EN 2009-06-29

Sequential dependencies

Lukasz Golab, Howard Karloff, Flip Korn, Avishek Saha, Divesh Srivastava

We study sequential dependencies that express the semantics of data with ordered domains and help identify quality problems with such data. Given an interval g, we write X →_g Y to denote that the difference between the Y-attribute values of any two consecutive records, when sorted on X, must be in g. For example, time →_(0,∞) sequence_number indicates that sequence numbers are strictly increasing over time, whereas sequence_number →_[4,5] time means that the time "gaps" between consecutive sequence numbers are between 4 and 5. Sequential dependencies express relationships between ordered attributes, and identify missing (gaps too large) and extraneous (gaps too small)...

10.14778/1687627.1687693 article EN Proceedings of the VLDB Endowment 2009-08-01
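
The definition of X →_g Y above translates directly into a check: sort on X, take the differences of consecutive Y values, and report those outside g. A minimal Python sketch, assuming rows are dicts with numeric attributes and g is given by its two endpoints plus openness flags; the helper names and the sample `log` are hypothetical, and this validates a given dependency rather than implementing anything from the paper.

```python
import math

def in_interval(v, low, high, low_open=False, high_open=False):
    """True if v lies in the interval g described by its endpoints and openness flags."""
    above = v > low if low_open else v >= low
    below = v < high if high_open else v <= high
    return above and below

def sd_violations(rows, x, y, low, high, low_open=False, high_open=False):
    """Consecutive pairs (sorted on x) whose difference in y lies outside g."""
    ordered = sorted(rows, key=lambda r: r[x])
    out = []
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt[y] - prev[y]
        if not in_interval(gap, low, high, low_open, high_open):
            out.append((prev[x], nxt[x], gap))
    return out

# Hypothetical polling log.
log = [
    {"sequence_number": 1, "time": 0},
    {"sequence_number": 2, "time": 4},
    {"sequence_number": 3, "time": 9},
    {"sequence_number": 4, "time": 19},
    {"sequence_number": 5, "time": 23},
]

# sequence_number ->_[4,5] time: consecutive records should be 4 to 5 time units apart.
print(sd_violations(log, "sequence_number", "time", 4, 5))  # [(3, 4, 10)]
# time ->_(0,inf) sequence_number: sequence numbers strictly increase over time.
print(sd_violations(log, "time", "sequence_number", 0, math.inf, True, True))  # []
```

The gap of 10 between sequence numbers 3 and 4 is the "gap too large" case, which would point to missing data.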

On the relative trust between inconsistent data and inaccurate constraints

George Beskales, Ihab F. Ilyas, Lukasz Golab, Artur Galiullin

Functional dependencies (FDs) specify the intended data semantics while violations of FDs indicate deviation from these semantics. In this paper, we study a data cleaning problem in which the FDs may not be completely correct, e.g., due to data evolution or incomplete knowledge of the data semantics. We argue that the notion of relative trust is a crucial aspect of this problem: if the FDs are outdated, we should modify them to fit the data, but if we suspect that there are problems with the data, we should modify the data to fit the FDs. In practice, it is usually unclear how much to trust the data versus the FDs. To address this problem, we propose an algorithm for...

10.1109/icde.2013.6544854 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

Interpretable and informative explanations of outcomes

Kareem El Gebaly, Parag Agrawal, Lukasz Golab, Flip Korn, Divesh Srivastava

In this paper, we solve the following data summarization problem: given a multi-dimensional data set augmented with a binary attribute, how can we construct an interpretable and informative summary of the factors affecting the binary attribute in terms of combinations of values of the dimension attributes? We refer to such summaries as explanation tables. We show the hardness of constructing optimally-informative explanation tables from data, and we propose effective and efficient heuristics. The proposed heuristics are based on sampling and include optimizations...

10.14778/2735461.2735467 article EN Proceedings of the VLDB Endowment 2014-09-01

Smart Meter Data Analytics

Xiufeng Liu, Lukasz Golab, Wojciech Golab, Ihab F. Ilyas, Shichao Jin

Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this article, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include...

10.1145/3004295 article EN ACM Transactions on Database Systems 2016-11-21

Iterative Edit-Based Unsupervised Sentence Simplification

Dhruv Kumar, Lili Mou, Lukasz Golab, Olga Vechtomova

We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on the Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.

10.18653/v1/2020.acl-main.707 article EN cc-by 2020-01-01

Scheduling Updates in a Real-Time Stream Warehouse

Lukasz Golab, Theodore Johnson, Vladislav Shkapenyuk

This paper discusses updating a data warehouse that collects near-real-time data streams from a variety of external sources. The objective is to keep all the tables and materialized views up-to-date as new data arrive over time. We define a notion of staleness, formalize the problem of scheduling updates in a way that minimizes average staleness, and present scheduling algorithms designed to handle the complex environment of a real-time stream warehouse. A novel feature of our framework is that it considers the effect of an update on the staleness of the underlying tables rather than any...

10.1109/icde.2009.202 article EN Proceedings - International Conference on Data Engineering 2009-03-01

Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis

Arian Bär, Alessandro Finamore, Pedro Casas, Lukasz Golab, Marco Mellia

The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel...

10.1109/bigdata.2014.7004227 article EN 2014 IEEE International Conference on Big Data (Big Data) 2014-10-01

Effective and complete discovery of order dependencies via set-based axiomatization

Jaroslaw Szlichta, Parke Godfrey, Lukasz Golab, Mehdi Kargar, Divesh Srivastava

Integrity constraints (ICs) are useful for query optimization and for expressing and enforcing application semantics. However, formulating constraints manually requires domain expertise, is prone to human errors, and may be excessively time consuming, especially on large datasets. Hence, proposals for automatic discovery have been made for some classes of ICs, such as functional dependencies (FDs) and, recently, order dependencies (ODs). ODs properly subsume FDs, as they can additionally express business rules involving order; e.g., an...

10.14778/3067421.3067422 article EN Proceedings of the VLDB Endowment 2017-03-01
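
Why ODs subsume FDs can be seen from a direct validation check for one candidate OD over two attributes. This Python sketch only validates a single given OD; it is not the set-based discovery algorithm of the paper. It assumes the definition "for all rows s, t, s[X] ≤ t[X] implies s[Y] ≤ t[Y]" restricted to single attributes, and the function name and `payroll` data are hypothetical.

```python
from collections import defaultdict

def holds_od(rows, x, y):
    """Check the order dependency "x orders y" for single attributes x and y.

    Definition assumed here: for all rows s, t, s[x] <= t[x] implies s[y] <= t[y].
    """
    ys_by_x = defaultdict(set)
    for row in rows:
        ys_by_x[row[x]].add(row[y])
    # Equal x-values force equal y-values, so the OD implies the FD x -> y.
    if any(len(ys) > 1 for ys in ys_by_x.values()):
        return False
    # Across distinct x-values in ascending order, y must never decrease.
    ys = [next(iter(ys_by_x[k])) for k in sorted(ys_by_x)]
    return all(a <= b for a, b in zip(ys, ys[1:]))

# Hypothetical rule: a higher salary never falls into a lower tax bracket.
payroll = [
    {"salary": 40000, "bracket": 1},
    {"salary": 55000, "bracket": 2},
    {"salary": 90000, "bracket": 3},
    {"salary": 55000, "bracket": 2},
]
print(holds_od(payroll, "salary", "bracket"))  # True
print(holds_od(payroll + [{"salary": 60000, "bracket": 1}], "salary", "bracket"))  # False
```

The first check in `holds_od` is exactly an FD test, and the second adds the ordering requirement that an FD cannot express.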

Update-pattern-aware modeling and processing of continuous queries

Lukasz Golab, M. Tamer Özsu

A defining characteristic of continuous queries over on-line data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. New items continually arrive on the input streams and new results are continually produced. Additionally, inputs expire by falling out of range of their sliding windows and results expire when they cease to satisfy the query. This impacts continuous query processing in two ways. First, data stream systems allow tables to be queried alongside data streams, but in terms of query semantics, it is not clear how updates to tables are different...

10.1145/1066157.1066232 article EN 2005-06-14

Sampling from repairs of conditional functional dependency violations

George Beskales, Ihab F. Ilyas, Lukasz Golab, Artur Galiullin

10.1007/s00778-013-0316-z article EN The VLDB Journal 2013-04-26

Distributed data placement to minimize communication costs via graph partitioning

Lukasz Golab, Marios Hadjieleftheriou, Howard Karloff, Barna Saha

With the widespread use of shared-nothing clusters of servers, there has been a proliferation of distributed object stores that offer high availability, reliability and enhanced performance for MapReduce-style workloads. However, data-intensive scientific workflows and join-intensive queries cannot always be evaluated efficiently using MapReduce-style processing without extensive data migrations, which cause network congestion and reduced query throughput. In this paper, we study the problem of computing data placement strategies...

10.1145/2618243.2618258 article EN 2014-06-24

Compact group discovery in attributed graphs and social networks

Abeer Khan, Lukasz Golab, Mehdi Kargar, Jaroslaw Szlichta, Morteza Zihayat

10.1016/j.ipm.2019.102054 article EN Information Processing & Management 2019-06-22

XOX Fabric: A hybrid approach to blockchain transaction execution

Christian Gorenflo, Lukasz Golab, Srinivasan Keshav

Performance and scalability are major concerns for blockchains: permissionless systems are typically limited by slow proof of X consensus algorithms and sequential post-order transaction execution on every node of the network. By introducing a small amount of trust in their participants, permissioned blockchain systems such as Hyperledger Fabric can benefit from more efficient consensus algorithms and make use of parallel pre-order execution on a subset of network nodes. Fabric, in particular, has been shown to handle tens of thousands of transactions per second...

10.1109/icbc48266.2020.9169478 article EN 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2020-05-01

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

Liam Hebert, Gaurav Sahu, Yuxuan Guo, Nanda Kishore Sreenivas, Lukasz Golab, and 1 more

We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech on online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities...

10.1609/aaai.v38i20.30213 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Scalable Scheduling of Updates in Streaming Data Warehouses

Lukasz Golab, Theodore Johnson, Vladislav Shkapenyuk

We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of interarrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time (at time t, if a table has been updated with information up to some earlier time r, ...

10.1109/tkde.2011.45 article EN IEEE Transactions on Knowledge and Data Engineering 2011-02-11
