Lukasz Golab

ORCID: 0000-0003-0632-7496
Research Areas
  • Advanced Database Systems and Queries
  • Data Quality and Management
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Semantic Web and Ontologies
  • Data Stream Mining Techniques
  • Distributed systems and fault tolerance
  • Blockchain Technology Applications and Security
  • Topic Modeling
  • Cloud Computing and Resource Management
  • Privacy-Preserving Technologies in Data
  • Natural Language Processing Techniques
  • Advanced Data Storage Technologies
  • Smart Grid Energy Management
  • Graph Theory and Algorithms
  • Electric Vehicles and Infrastructure
  • Cryptography and Data Security
  • Hate Speech and Cyberbullying Detection
  • Misinformation and Its Impacts
  • Explainable Artificial Intelligence (XAI)
  • Complex Network Analysis Techniques
  • Scientific Computing and Data Management
  • Sentiment Analysis and Opinion Mining
  • Advanced Graph Neural Networks
  • Social Media and Politics

University of Waterloo
2015-2024

University of Cambridge
2023

Linköping University
2023

National Institute of Informatics
2023

Association for Computing Machinery
2023

Chinese University of Hong Kong
2023

Oracle (United States)
2023

Aalborg University
2019

University of Windsor
2018

AT&T (United States)
2008-2012

Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems...

10.1145/776985.776986 article EN ACM SIGMOD Record 2003-06-01

Blockchain technologies are expected to make a significant impact on a variety of industries. However, one issue holding them back is their limited transaction throughput, especially compared to established solutions such as distributed database systems. In this paper, we re-architect a modern permissioned blockchain system, Hyperledger Fabric, to increase transaction throughput from 3,000 to 20,000 transactions per second. We focus on performance bottlenecks beyond the consensus mechanism, and propose architectural...

10.1109/bloc.2019.8751452 article EN 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2019-05-01

Blockchain technologies are expected to make a significant impact on a variety of industries. However, one issue holding them back is their limited transaction throughput, especially compared to established solutions such as distributed database systems. In this paper, we rearchitect a modern permissioned blockchain system, Hyperledger Fabric, to increase transaction throughput from 3000 to 20 000 transactions per second. We focus on performance bottlenecks beyond the consensus mechanism, and propose...

10.1002/nem.2099 article EN International Journal of Network Management 2020-02-11

Conditional functional dependencies (CFDs) have recently been proposed as a useful integrity constraint to summarize data semantics and identify data inconsistencies. A CFD augments a functional dependency (FD) with a pattern tableau that defines the context (i.e., the subset of tuples) in which the underlying FD holds. While many aspects of CFDs have been studied, including static analysis and detecting and repairing violations, there has not been prior work on generating pattern tableaux, which is critical to realize the full potential of CFDs. This paper first...

10.14778/1453856.1453900 article EN Proceedings of the VLDB Endowment 2008-08-01
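To make the role of the pattern tableau concrete, here is a minimal sketch of checking a CFD against a relation. It assumes rows are Python dicts, that the wildcard `_` matches any value, and that each tableau entry is a pair of LHS and RHS patterns; the function names and the customer data are hypothetical, and this is not the tableau-generation method of the paper.

```python
from collections import defaultdict

WILDCARD = "_"  # hypothetical marker for an unnamed variable: matches any value

def matches(row, attrs, pattern):
    """True if row agrees with every constant of pattern on attrs."""
    return all(p == WILDCARD or row[a] == p for a, p in zip(attrs, pattern))

def cfd_violations(rows, lhs, rhs, tableau):
    """Indices of rows involved in a violation of the CFD (lhs -> rhs, tableau).

    Each tableau entry is a pair (lhs_pattern, rhs_pattern).
    """
    bad = set()
    for lhs_pat, rhs_pat in tableau:
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            if not matches(row, lhs, lhs_pat):
                continue  # row lies outside this pattern's context
            if not matches(row, rhs, rhs_pat):
                bad.add(i)  # single-row violation of a constant RHS
            groups[tuple(row[a] for a in lhs)].append(i)
        # Embedded FD: rows in context with equal LHS need equal RHS.
        for idxs in groups.values():
            if len({tuple(rows[i][a] for a in rhs) for i in idxs}) > 1:
                bad.update(idxs)
    return sorted(bad)

# Hypothetical customer data: (country code, area code) -> city.
rows = [
    {"cc": "01", "ac": "908", "city": "MH"},
    {"cc": "01", "ac": "908", "city": "NYC"},
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "44", "ac": "131", "city": "EDI"},
    {"cc": "01", "ac": "212", "city": "NYC"},
    {"cc": "01", "ac": "212", "city": "LA"},
]
tableau = [(("01", "908"), ("MH",)), (("44", WILDCARD), (WILDCARD,))]
print(cfd_violations(rows, ["cc", "ac"], ["city"], tableau))  # [0, 1]
```

Rows 4 and 5 violate the plain FD `(cc, ac) -> city`, but no tableau pattern covers them, so the CFD does not flag them; this is exactly the "context" the tableau defines.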

Internet traffic patterns are believed to obey the power law, implying that most of the bandwidth is consumed by a small set of heavy users. Hence, queries that return a list of frequently occurring items are important in the analysis of real-time packet streams. While several results exist for computing frequent item queries using limited memory in the infinite stream model, in this paper we consider the limited-memory sliding window model. This model maintains the last $N$ items that have arrived at any given time and forbids the storage of the entire window in memory. We...

10.1145/948205.948227 article EN 2003-01-01
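The query semantics of the sliding window model can be pinned down with an exact baseline. The sketch below is only that: it stores the whole window, which is precisely what the limited-memory model above forbids, so it illustrates what is being approximated rather than how. The class name and the threshold convention (a fraction of $N$) are assumptions.

```python
from collections import Counter, deque

class SlidingWindowFrequentItems:
    """Exact frequent-item counts over the last n items of a stream.

    Baseline only: it keeps the entire window in memory.
    """

    def __init__(self, n):
        self.n = n
        self.window = deque()
        self.counts = Counter()

    def insert(self, item):
        self.window.append(item)
        self.counts[item] += 1
        if len(self.window) > self.n:
            old = self.window.popleft()  # oldest item falls out of the window
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def frequent(self, threshold):
        """Items occurring at least threshold * n times in the current window."""
        return sorted(x for x, c in self.counts.items() if c >= threshold * self.n)

# Hypothetical packet sources; the window keeps the last 5 of them.
w = SlidingWindowFrequentItems(5)
for src in ["a", "b", "a", "c", "a", "b", "b", "d"]:
    w.insert(src)
print(w.frequent(0.4))  # ['b']  (window is c, a, b, b, d)
```

Although `a` is the most frequent item over the whole stream, only `b` is frequent in the current window, which is why infinite-stream algorithms do not carry over directly.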

Violations of functional dependencies (FDs) are common in practice, often arising in the context of data integration or Web data extraction. Resolving these violations is known to be challenging for a variety of reasons, one of them being the exponential number of possible "repairs". Previous work has tackled this problem either by producing a single repair that is (nearly) optimal with respect to some metric, or by computing consistent answers to selected classes of queries without explicitly generating the repairs. In this paper, we propose...

10.14778/1920841.1920870 article EN Proceedings of the VLDB Endowment 2010-09-01
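The exponential number of repairs is easy to see for a single FD. The sketch assumes subset-repair semantics (a repair is a maximal subset of tuples that satisfies the FD), which may differ from the repair notion used in the paper. Under that assumption, each LHS group independently keeps the tuples sharing one RHS value, so the count is a product over groups.

```python
from collections import defaultdict
from math import prod

def count_subset_repairs(rows, lhs, rhs):
    """Number of subset repairs of rows w.r.t. the single FD lhs -> rhs.

    For one FD, a maximal consistent subset keeps, in every LHS group, all
    rows that share one chosen RHS value, so the count is a product.
    """
    rhs_values = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        rhs_values[key].add(tuple(row[a] for a in rhs))
    return prod(len(v) for v in rhs_values.values())

# Hypothetical data: ten zip codes, each with two conflicting cities.
rows = [{"zip": z, "city": c} for z in range(10) for c in ("A", "B")]
print(count_subset_repairs(rows, ["zip"], ["city"]))  # 2**10 = 1024
```

Twenty tuples already admit 1024 repairs, which illustrates why enumerating repairs explicitly does not scale.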

We describe DataDepot, a tool for generating warehouses from streaming data feeds, such as network-traffic traces, router alerts, financial tickers, transaction logs, and so on. DataDepot is a streaming data warehouse designed to automate the ingestion of streaming data from a wide variety of sources and to maintain complex materialized views over these sources. As a streaming warehouse, DataDepot is similar to Data Stream Management Systems (DSMSs) with its emphasis on temporal data, best-effort consistency, and real-time response. However, as a data warehouse, DataDepot is designed to store tens to hundreds of terabytes...

10.1145/1559845.1559934 article EN 2009-06-29

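We study sequential dependencies that express the semantics of data with ordered domains and help identify quality problems with such data. Given an interval $g$, we write $X \rightarrow_g Y$ to denote that the difference between the $Y$-attribute values of any two consecutive records, when sorted on $X$, must be in $g$. For example, $\mathit{time} \rightarrow_{(0,\infty)} \mathit{sequence\_number}$ indicates that sequence numbers are strictly increasing over time, whereas $\mathit{sequence\_number} \rightarrow_{[4,5]} \mathit{time}$ means that the time "gaps" between consecutive sequence numbers are between 4 and 5. Sequential dependencies express relationships between ordered attributes, and identify missing (gaps too large), extraneous (gaps too small)...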

10.14778/1687627.1687693 article EN Proceedings of the VLDB Endowment 2009-08-01
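The definition of $X \rightarrow_g Y$ translates directly into a check: sort on $X$, then test each consecutive $Y$-difference against $g$. This minimal sketch assumes numeric attributes in dict rows and encodes $g$ by its endpoints plus open/closed flags; the function name and the log data are hypothetical.

```python
import math

def sd_violations(rows, x, y, lo, hi, lo_open=False, hi_open=False):
    """Check the sequential dependency x ->_g y, where g spans lo..hi.

    Sorts rows on x and returns (x_prev, x_cur, gap) for every pair of
    consecutive rows whose y-difference falls outside g. lo_open/hi_open
    make the corresponding end of g open.
    """
    ordered = sorted(rows, key=lambda r: r[x])
    bad = []
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur[y] - prev[y]
        ok_lo = gap > lo if lo_open else gap >= lo
        ok_hi = gap < hi if hi_open else gap <= hi
        if not (ok_lo and ok_hi):
            bad.append((prev[x], cur[x], gap))
    return bad

# Hypothetical log records.
log = [{"seq": 1, "time": 0}, {"seq": 2, "time": 4}, {"seq": 3, "time": 9},
       {"seq": 4, "time": 20}, {"seq": 5, "time": 24}]
# sequence_number ->_[4,5] time: flags the 11-unit gap (possibly missing data).
print(sd_violations(log, "seq", "time", 4, 5))  # [(3, 4, 11)]
# time ->_(0,inf) sequence_number: sequence numbers strictly increase.
print(sd_violations(log, "time", "seq", 0, math.inf, lo_open=True, hi_open=True))  # []
```

A gap above the interval suggests missing records and a gap below it suggests extraneous ones, matching the two failure modes named above.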

Functional dependencies (FDs) specify the intended data semantics while violations of FDs indicate deviation from these semantics. In this paper, we study a data cleaning problem in which the FDs may not be completely correct, e.g., due to data evolution or incomplete knowledge of the data semantics. We argue that the notion of relative trust is a crucial aspect of this problem: if the FDs are outdated, we should modify them to fit the data, but if we suspect that there are problems with the data, we should modify the data to fit the FDs. In practice, it is usually unclear how much to trust the data versus the FDs. To address this problem, we propose an algorithm for...

10.1109/icde.2013.6544854 article EN 2013 IEEE 29th International Conference on Data Engineering (ICDE) 2013-04-01

In this paper, we solve the following data summarization problem: given a multi-dimensional data set augmented with a binary attribute, how can we construct an interpretable and informative summary of the factors affecting the binary attribute in terms of combinations of values of the dimension attributes? We refer to such summaries as explanation tables. We show the hardness of constructing optimally-informative explanation tables from data, and we propose effective and efficient heuristics. The proposed heuristics are based on sampling and include optimizations...

10.14778/2735461.2735467 article EN Proceedings of the VLDB Endowment 2014-09-01
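Each row of an explanation table pairs a pattern over the dimension attributes, possibly with wildcards, with statistics about the binary attribute. The sketch below computes only that building block: the support of a pattern and the fraction of matching tuples whose binary attribute is 1. Choosing which patterns to include, which is the hard part addressed by the paper's heuristics, is not reproduced. The wildcard symbol and the flight data are hypothetical.

```python
WILDCARD = "*"  # hypothetical wildcard: matches any value of a dimension

def pattern_stats(rows, dims, pattern, outcome):
    """Support of a pattern and the fraction of matching rows with outcome 1."""
    hits = [r for r in rows
            if all(p == WILDCARD or r[d] == p for d, p in zip(dims, pattern))]
    if not hits:
        return 0, None
    return len(hits), sum(r[outcome] for r in hits) / len(hits)

# Hypothetical flight data with a binary "delayed" attribute.
flights = [
    {"day": "Fri", "origin": "SF", "delayed": 1},
    {"day": "Fri", "origin": "LA", "delayed": 1},
    {"day": "Mon", "origin": "SF", "delayed": 0},
    {"day": "Mon", "origin": "LA", "delayed": 1},
]
dims = ["day", "origin"]
for pattern in [("*", "*"), ("Fri", "*"), ("Mon", "*")]:
    print(pattern, pattern_stats(flights, dims, pattern, "delayed"))
```

The all-wildcard row gives the overall rate (0.75), and a more specific row such as `("Fri", "*")` is informative when its rate (1.0) departs from what the rows already in the table would predict.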

Smart electricity meters have been replacing conventional meters worldwide, enabling automated collection of fine-grained (e.g., every 15 minutes or hourly) consumption data. A variety of smart meter analytics algorithms and applications have been proposed, mainly in the smart grid literature. However, the focus has been on what can be done with the data rather than how to do it efficiently. In this article, we examine smart meter analytics from a software performance perspective. First, we design a performance benchmark that includes common smart meter analytics tasks. These include...

10.1145/3004295 article EN ACM Transactions on Database Systems 2016-11-21

We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation. Then, we iteratively perform word and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set, but is more controllable and interpretable. Experiments on Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.

10.18653/v1/2020.acl-main.707 article EN cc-by 2020-01-01

This paper discusses updating a data warehouse that collects near-real-time data streams from a variety of external sources. The objective is to keep all the tables and materialized views up-to-date as new data arrive over time. We define the notion of staleness, formalize the problem of scheduling updates in a way that minimizes average staleness, and present scheduling algorithms designed to handle the complex environment of a real-time stream warehouse. A novel feature of our framework is that it considers the effect of an update on the staleness of the underlying data rather than any...

10.1109/icde.2009.202 article EN Proceedings - International Conference on Data Engineering 2009-03-01

The complexity of the Internet has rapidly increased, making it more important and challenging to design scalable network monitoring tools. Network monitoring typically requires rolling data analysis, i.e., continuously and incrementally updating (rolling-over) various reports and statistics over high-volume data streams. In this paper, we describe DBStream, which is an SQL-based system that explicitly supports incremental queries for rolling data analysis. We also present a performance comparison of DBStream with a parallel...

10.1109/bigdata.2014.7004227 article EN 2014 IEEE International Conference on Big Data (Big Data) 2014-10-01

Integrity constraints (ICs) are useful for query optimization and for expressing and enforcing application semantics. However, formulating ICs manually requires domain expertise, is prone to human errors, and may be excessively time consuming, especially on large datasets. Hence, proposals for automatic discovery have been made for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs, as they can additionally express business rules involving order; e.g., an...

10.14778/3067421.3067422 article EN Proceedings of the VLDB Endowment 2017-03-01
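The claim that ODs subsume FDs can be seen from how an OD is checked. Under the standard definition, assumed here, $X$ orders $Y$ if for all tuples $s, t$, $s[X] \le t[X]$ implies $s[Y] \le t[Y]$, with attribute lists compared lexicographically. Tuples with equal $X$ must then have equal $Y$ (the embedded FD), and $Y$ must not decrease as $X$ increases. A minimal sketch with hypothetical names and data:

```python
from itertools import groupby

def od_holds(rows, x, y):
    """Check the order dependency x orders y (x, y are attribute lists).

    Holds iff for all rows s, t: s[x] <= t[x] implies s[y] <= t[y], with
    attribute lists compared lexicographically.
    """
    def key_x(r):
        return tuple(r[a] for a in x)

    def key_y(r):
        return tuple(r[a] for a in y)

    prev_y = None
    for _, group in groupby(sorted(rows, key=key_x), key=key_x):
        ys = {key_y(r) for r in group}
        if len(ys) > 1:
            return False  # equal x, different y: the embedded FD x -> y fails
        (cur_y,) = ys
        if prev_y is not None and cur_y < prev_y:
            return False  # larger x but smaller y: the ordering fails
        prev_y = cur_y
    return True

# Hypothetical tax records: income should order the tax bracket.
taxes = [{"income": 30, "bracket": 1}, {"income": 50, "bracket": 2},
         {"income": 80, "bracket": 3}]
print(od_holds(taxes, ["income"], ["bracket"]))  # True
taxes.append({"income": 60, "bracket": 1})
print(od_holds(taxes, ["income"], ["bracket"]))  # False
```

After the last append, the FD income -> bracket still holds, but the OD fails because a higher income maps to a lower bracket. An FD alone cannot express this kind of rule.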

A defining characteristic of continuous queries over on-line data streams, possibly bounded by sliding windows, is the potentially infinite and time-evolving nature of their inputs and outputs. New items continually arrive on the input streams and new results are continually produced. Additionally, inputs expire by falling out of range of their sliding windows, and results expire when they cease to satisfy the query. This impacts continuous query processing in two ways. First, data stream systems allow tables to be queried alongside data streams, but in terms of semantics, it is not clear how updates of tables are different...

10.1145/1066157.1066232 article EN 2005-06-14

With the widespread use of shared-nothing clusters of servers, there has been a proliferation of distributed object stores that offer high availability, reliability and enhanced performance for MapReduce-style workloads. However, data-intensive scientific workflows and join-intensive queries cannot always be evaluated efficiently using MapReduce-style processing without extensive data migrations, which cause network congestion and reduced query throughput. In this paper, we study the problem of computing data placement strategies...

10.1145/2618243.2618258 article EN 2014-06-24

Performance and scalability are major concerns for blockchains: permissionless systems are typically limited by slow proof of X consensus algorithms and sequential post-order transaction execution on every node of the network. By introducing a small amount of trust in their participants, permissioned blockchain systems such as Hyperledger Fabric can benefit from more efficient consensus algorithms and make use of parallel pre-order execution on a subset of network nodes. Fabric, in particular, has been shown to handle tens of thousands of transactions per second....

10.1109/icbc48266.2020.9169478 article EN 2020 IEEE International Conference on Blockchain and Cryptocurrency (ICBC) 2020-05-01

We present the Multi-Modal Discussion Transformer (mDT), a novel method for detecting hate speech on online social networks such as Reddit discussions. In contrast to traditional comment-only methods, our approach to labelling a comment as hate speech involves a holistic analysis of text and images grounded in the discussion context. This is done by leveraging graph transformers to capture the contextual relationships in the discussion surrounding a comment and grounding the interwoven fusion layers that combine text and image embeddings instead of processing modalities...

10.1609/aaai.v38i20.30213 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

We discuss update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. In our setting, external sources push append-only data streams into the warehouse with a wide range of interarrival times. While traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. We model the streaming warehouse update problem as a scheduling problem, where jobs correspond to processes that load new data into tables, and whose objective is to minimize data staleness over time (at time t, if a table has been updated with information up to some earlier time r,...

10.1109/tkde.2011.45 article EN IEEE Transactions on Knowledge and Data Engineering 2011-02-11
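Minimizing staleness over time means minimizing the area under each table's staleness curve. The sketch below computes that average for one table. It assumes staleness at time t is t minus r, where r is the time up to which the table holds data, so staleness grows linearly between updates and drops when an update lands. The function name and the update trace are hypothetical, and the paper's scheduling algorithms are not shown.

```python
def average_staleness(updates, start, end, initial_freshness):
    """Time-averaged staleness of one table over [start, end].

    updates: (update_time, freshness) pairs; after an update at update_time
    the table holds data up to time `freshness`. Staleness at time t is
    t - r, where r is the freshness of the most recent update.
    """
    area = 0.0
    t, r = start, initial_freshness
    for u_time, u_fresh in sorted(updates):
        if u_time >= end:
            break
        if u_time > t:
            # Between updates, staleness grows linearly: integrate a trapezoid.
            area += ((t - r) + (u_time - r)) / 2 * (u_time - t)
            t = u_time
        r = u_fresh
    area += ((t - r) + (end - r)) / 2 * (end - t)
    return area / (end - start)

# Hypothetical table: data up to time 0 at start; at time 5 an update
# loads data up to time 4.
print(average_staleness([(5, 4)], start=0, end=10, initial_freshness=0))  # 3.0
```

Without the update, average staleness over the same interval would be 5.0. A scheduler that compares such reductions across tables is one way to reason about which update job to run next.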