Nick Koudas

ORCID: 0000-0001-5648-0638
Research Areas
  • Data Management and Algorithms
  • Advanced Database Systems and Queries
  • Data Quality and Management
  • Data Mining Algorithms and Applications
  • Algorithms and Data Compression
  • Semantic Web and Ontologies
  • Advanced Image and Video Retrieval Techniques
  • Data Stream Mining Techniques
  • Web Data Mining and Analysis
  • Complex Network Analysis Techniques
  • Topic Modeling
  • Peer-to-Peer Network Technologies
  • Privacy-Preserving Technologies in Data
  • Video Analysis and Summarization
  • Time Series Analysis and Forecasting
  • Video Surveillance and Tracking Methods
  • Human Mobility and Location-Based Analysis
  • Machine Learning and Algorithms
  • Anomaly Detection Techniques and Applications
  • Distributed systems and fault tolerance
  • Mobile Crowdsensing and Crowdsourcing
  • Machine Learning and Data Classification
  • Opinion Dynamics and Social Influence
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition

University of Toronto
2015-2025

Hong Kong Baptist University
2021

New Jersey Institute of Technology
2021

Athens University of Economics and Business
2021

The University of Texas at Arlington
2019

Center for Information Technology
2008

National University of Singapore
2008

Information Technology University
2007

Institute of Electrical and Electronics Engineers
2006

AT&T (United States)
2000-2005

XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, with twig matching achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic...

10.1145/564691.564727 article EN 2002-06-03

We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e. 'trends') on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic. Users interact with the system by ordering the identified trends using different criteria and submitting their own description for each trend.

10.1145/1807167.1807306 article EN 2010-06-06

XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. The primitive tree-structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate joins, while the stack-tree algorithms have no counterpart in relational...

10.1109/icde.2002.994704 article EN 2003-06-25
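
The ancestor-descendant matching named in the abstract above can be illustrated with a small stack-based join. This is a simplified sketch in the spirit of stack-based structural joins, not the paper's algorithm: it assumes each element is encoded as a hypothetical (start, end) interval in which ancestors strictly contain their descendants, and that both input lists are sorted by start. The function name `stack_join` is illustrative.

```python
def stack_join(ancestors, descendants):
    """Return (ancestor, descendant) pairs where the ancestor interval contains
    the descendant interval.

    Assumption: intervals come from one tree, so any two are nested or disjoint,
    and both lists are sorted by start position.
    """
    pairs = []
    stack = []   # chain of nested candidate ancestors, outermost at the bottom
    next_anc = 0
    for desc in descendants:
        # Push every candidate ancestor that starts before this descendant.
        while next_anc < len(ancestors) and ancestors[next_anc][0] < desc[0]:
            anc = ancestors[next_anc]
            # Drop candidates that ended before the new candidate starts.
            while stack and stack[-1][1] < anc[0]:
                stack.pop()
            stack.append(anc)
            next_anc += 1
        # Drop candidates that ended before this descendant starts.
        while stack and stack[-1][1] < desc[0]:
            stack.pop()
        # Every interval left on the stack contains the descendant.
        for anc in stack:
            pairs.append((anc, desc))
    return pairs
```

Each input list is scanned once and every candidate is pushed and popped at most once, so the work beyond emitting the output pairs is linear in the input sizes.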

Privacy is a serious concern when microdata need to be released for ad hoc analyses. The privacy goals of existing protection approaches (e.g., k-anonymity and l-diversity) are suitable only for categorical sensitive attributes. Since applying them directly to numerical sensitive attributes (e.g., salary) may result in undesirable information leakage, we propose privacy goals that better capture the need for privacy protection of numerical sensitive attributes. Complementing the desire for privacy is the need to support aggregate analyses over microdata. Existing generalization-based anonymization approaches cannot answer aggregate queries with...

10.1109/icde.2007.367857 article EN 2007-04-01

Many location-based applications require constant monitoring of k-nearest neighbor (k-NN) queries over moving objects within a geographic area. Existing approaches to this problem have focused on predictive queries, and relied on the assumption that the trajectories of the objects are fully predictable at query processing time. We relax this assumption, and propose two efficient and scalable algorithms using grid indices. One is based on indexing objects, and the other on queries. For each approach, a cost model is developed, and a detailed analysis...

10.1109/icde.2005.92 article EN 2005-04-19

This tutorial provides a comprehensive and cohesive overview of the key research results in the area of record linkage methodologies and algorithms for identifying approximate duplicate records, and of the available tools for this purpose. It encompasses techniques introduced in several communities including databases, information retrieval, statistics and machine learning. It aims to identify similarities and differences across the techniques as well as their merits and limitations.

10.1145/1142473.1142599 article EN 2006-06-27

Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these histograms exist, albeit running in quadratic time and linear space. In this paper we provide linear time construction of 1 + ε approximation of optimal histograms, running in polylogarithmic space.

10.1145/380752.380841 article EN 2001-07-06
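
The exact dynamic-programming construction that the abstract above contrasts against can be sketched as follows. This is a minimal illustration under stated assumptions: the error metric is taken to be the sum of squared deviations from each bucket's mean (the abstract does not name one), the function name `optimal_histogram` is hypothetical, and the sketch keeps a full table rather than attempting the linear-space bound. It is not the approximation algorithm the paper proposes.

```python
from itertools import accumulate


def optimal_histogram(values, num_buckets):
    """Split `values` into `num_buckets` contiguous buckets minimizing the total
    squared error when each bucket is replaced by its mean.

    Returns (total_error, [(start, end), ...]) with half-open index ranges.
    """
    n = len(values)
    if not 1 <= num_buckets <= n:
        raise ValueError("num_buckets must be between 1 and len(values)")

    # Prefix sums make the error of any bucket an O(1) computation.
    prefix = [0.0] + list(accumulate(values))
    prefix_sq = [0.0] + list(accumulate(v * v for v in values))

    def bucket_error(i, j):
        total = prefix[j] - prefix[i]
        total_sq = prefix_sq[j] - prefix_sq[i]
        return total_sq - total * total / (j - i)

    inf = float("inf")
    # best[b][j]: minimum error covering values[:j] with exactly b buckets.
    best = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    cut = [[0] * (n + 1) for _ in range(num_buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, num_buckets + 1):
        for j in range(b, n + 1):
            for i in range(b - 1, j):
                candidate = best[b - 1][i] + bucket_error(i, j)
                if candidate < best[b][j]:
                    best[b][j] = candidate
                    cut[b][j] = i

    # Walk the recorded cut points backwards to recover the bucket boundaries.
    boundaries = []
    j = n
    for b in range(num_buckets, 0, -1):
        i = cut[b][j]
        boundaries.append((i, j))
        j = i
    boundaries.reverse()
    return best[num_buckets][n], boundaries
```

The triple loop makes the running time quadratic in the number of values for a fixed number of buckets, which is the cost that motivates the approximation described in the abstract.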

Users often need to optimize the selection of objects by appropriately weighting the importance of multiple object attributes. Such optimization problems appear in operations research and applied mathematics as well as in everyday life; e.g., a buyer may select a home as a weighted function of a number of attributes like its distance from the office, price, area, etc.

10.1145/375663.375690 article EN 2001-05-01

Large-scale data analysis lies in the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in this context, designed to serve long running queries (jobs) which can be processed in batch mode. Taking into account that different jobs often perform similar work, there are many opportunities for sharing. In principle, sharing work reduces...

10.14778/1920841.1920906 article EN Proceedings of the VLDB Endowment 2010-09-01

Histograms are a concise and flexible way to construct summary structures for large data sets. They have attracted a lot of attention in database research due to their utility in many areas, including query optimization and approximate query answering. They are also a basic tool for data visualization and analysis. In this paper, we present a formal study of dynamic multidimensional histogram structures over continuous data streams. At the heart of our proposal is the use of a structure (vastly different from a histogram) maintaining a succinct approximation...

10.1145/564691.564741 article EN 2002-06-03

We investigate the use of biased sampling according to the density of the data set to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional data sets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the data set. We propose a technique for density-biased sampling that can factor in user requirements and properties of interest and can be tuned for specific tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach...

10.1109/tkde.2003.1232271 article EN IEEE Transactions on Knowledge and Data Engineering 2003-09-01

XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, with twig matching achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic...

10.1145/564724.564727 article EN 2002-01-01

Query monitoring refers to the problem of observing and predicting various parameters related to the execution of a query in a database system. In addition to being a useful tool for database users and administrators, it can also serve as an information collection service for resource allocation and adaptive query processing techniques. In this article, we present a query monitoring system from the ground up, describing new techniques for query monitoring, their implementation inside a real database system, and a novel interface that presents the observed and predicted information in an accessible manner. To...

10.1145/1508857.1508858 article EN ACM Transactions on Database Systems 2009-04-01

Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly-coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data necessitate...

10.14778/2168651.2168658 article EN Proceedings of the VLDB Endowment 2012-02-01

Recent works have shown the benefits of keyword proximity search in querying XML documents in addition to text documents. For example, given query keywords over Shakespeare's plays in XML, the user might be interested in knowing how the keywords cooccur. In this paper, we focus on XML trees and define XML keyword proximity queries to return the (possibly heterogeneous) set of minimum connecting trees (MCTs) of the matches to the individual keywords in the query. We consider efficiently executing keyword proximity queries on labeled trees (XML) in various settings: 1) when the XML database has been preprocessed and 2) when no indices...

10.1109/tkde.2006.61 article EN IEEE Transactions on Knowledge and Data Engineering 2006-04-01

The problem of obtaining efficient answers to top-k queries has attracted a lot of research attention. Several algorithms and numerous variants of the top-k retrieval problem have been introduced in recent years. The general form of this problem requests the k highest ranked values from a relation, using monotone combining functions on (a subset of) its attributes. In this paper we explore space performance tradeoffs related to this problem. In particular we study the problem of answering top-k queries using views. A view in this context is a materialized version of a previously posed query,...

10.5555/1182635.1164167 article EN 2006-09-01

Selectivity estimation - the problem of estimating the result size of queries - is a fundamental problem in databases. Accurate estimation of query selectivity involving multiple correlated attributes is especially challenging. Poor cardinality estimates could result in the selection of bad plans by the query optimizer. Recently, deep learning has been applied to this problem with promising results. However, many of the proposed approaches often struggle to provide accurate results for multi-attribute queries involving a large number of predicates and with low selectivity. In this paper, we propose...

10.1145/3318464.3389741 article EN 2020-05-29

In this paper we address the issue of using local embeddings for data visualization in two and three dimensions, and for classification. We advocate their use on the basis that they provide an efficient mapping procedure from the original dimension of the data to a lower intrinsic dimension. We depict how they can accurately capture the user's perception of similarity in high-dimensional data for visualization purposes. Moreover, we exploit the low-dimensional mapping provided by these embeddings to develop new classification techniques, and we show experimentally that the classification accuracy is...

10.1145/775047.775143 article EN 2002-07-23

The integration of data produced and collected across autonomous, heterogeneous web services is an increasingly important and challenging problem. Due to the lack of global identifiers, the same entity (e.g., a product) might have different textual representations across databases. Textual data is also often noisy because of transcription errors, incomplete information, and lack of standard formats. A fundamental task during data integration is matching of strings that refer to the same entity. In this paper, we adopt the widely used and established cosine similarity metric...

10.1145/775152.775166 article EN 2003-01-01
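
As a rough illustration of the cosine similarity metric mentioned in the abstract above, the sketch below compares two strings by their character q-gram counts. The tokenization into padded 3-grams, the raw-count weighting, and the names `qgram_counts` and `cosine_similarity` are assumptions made for this example; the truncated abstract does not say how the paper tokenizes or weights strings.

```python
import math
from collections import Counter


def qgram_counts(text, q=3):
    """Count the character q-grams of `text`, padded so edge characters count."""
    padded = "#" * (q - 1) + text.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def cosine_similarity(a, b, q=3):
    """Cosine of the angle between the q-gram count vectors of two strings."""
    vec_a, vec_b = qgram_counts(a, q), qgram_counts(b, q)
    # Counter returns 0 for missing q-grams, so only shared q-grams contribute.
    dot = sum(count * vec_b[gram] for gram, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this scheme a small transcription error changes only a few q-grams, so two noisy spellings of the same entity name still score well above two unrelated strings.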

XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which data from multiple sources is integrated. In this paper we study the problem of integrating XML data sources through correlations realized as join operations. A challenging aspect of this operation is the XML document structure. Two documents might convey approximately or exactly the same information but may be quite different in structure. Consequently approximate match in structure, in addition to content...

10.1145/564691.564725 article EN 2002-06-03