Aristides Gionis

ORCID: 0000-0002-5211-112X
Publications
Citations
Views
---
Saved
---
About
Contact & Profiles
Research Areas
  • Complex Network Analysis Techniques
  • Data Management and Algorithms
  • Opinion Dynamics and Social Influence
  • Advanced Graph Neural Networks
  • Social Media and Politics
  • Data Mining Algorithms and Applications
  • Algorithms and Data Compression
  • Advanced Database Systems and Queries
  • Mobile Crowdsensing and Crowdsourcing
  • Web Data Mining and Analysis
  • Peer-to-Peer Network Technologies
  • Misinformation and Its Impacts
  • Graph Theory and Algorithms
  • Complexity and Algorithms in Graphs
  • Caching and Content Delivery
  • Rough Sets and Fuzzy Logic
  • Advanced Clustering Algorithms Research
  • Privacy-Preserving Technologies in Data
  • Optimization and Search Problems
  • Recommender Systems and Techniques
  • Advanced Image and Video Retrieval Techniques
  • Spam and Phishing Detection
  • Human Mobility and Location-Based Analysis
  • Advanced Graph Theory Research
  • Machine Learning and Algorithms

KTH Royal Institute of Technology
2019-2025

Aalto University
2013-2021

Helsinki Institute for Information Technology
2005-2020

National University of Singapore
2020

University of Technology
2019

University of Helsinki
2004-2017

St. Francis Hospital
2016

Taylor Wimpey (United Kingdom)
2016

Yahoo (United Kingdom)
2007-2013

Yahoo (Spain)
2007-2013

The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability such increases, task identifying high-quality sites based on user contributions --social media -- becomes increasingly important. Social in general exhibit a rich variety information sources: addition itself, there is wide array non-content available, as links between items explicit ratings members community. In this paper we investigate methods for exploiting community feedback...

10.1145/1341531.1341557 article EN 2008-01-01

We consider the problem of maintaining aggregates and statistics over data streams, with respect to last N elements seen so far. refer this model as sliding window model. following basic problem: Given a stream bits, maintain count number 1's in from stream. show that, using $O(\frac{1}{\epsilon} \log^2 N)$ bits memory, we can estimate within factor $1 + \epsilon$. also give matching lower bound $\Omega(\frac{1}{\epsilon}\log^2 memory for any deterministic or randomized algorithms. extend...

10.1137/s0097539701398363 article EN SIAM Journal on Computing 2002-01-01

We consider the following problem: given a set of clusterings, find single clustering that agrees as much possible with input clusterings. This problem, aggregation , appears naturally in various contexts. For example, categorical data is an instance problem; each attribute can be viewed rows where are grouped together if they take same value on attribute. Clustering also used metaclustering method to improve robustness by combining output multiple algorithms. Furthermore, problem...

10.1145/1217299.1217303 article EN ACM Transactions on Knowledge Discovery from Data 2007-03-01

A lot of research in graph mining has been devoted the discovery communities. Most work focused scenario where communities need to be discovered with only reference input graph. However, for many interesting applications one is interested finding community formed by a given set nodes. In this paper we study query-dependent variant community-detection problem, which call community-search problem: G, and query nodes graph, seek find subgraph G that contains it densely connected.

10.1145/1835804.1835923 article EN 2010-07-25

Social network analysis has gained significant attention in recent years, largely due to the success of online social networking and media-sharing sites, consequent availability a wealth data. In spite growing interest, however, there is little understanding potential business applications mining networks. While large body research on different problems methods for mining, gap between techniques developed by community their deployment real-world applications. Therefore impact these still...

10.1145/1961189.1961194 article EN ACM Transactions on Intelligent Systems and Technology 2011-04-01

Web spam can significantly deteriorate the quality of search engine results. Thus there is a large incentive for commercial engines to detect pages efficiently and accurately. In this paper we present detection system that combines link-based content-based features, uses topology graph by exploiting link dependencies among pages. We find linked hosts tend belong same class: either both are or non-spam. demonstrate three methods incorporating into predictions obtained our base classifier: (i)...

10.1145/1277741.1277814 article EN 2007-07-23

In this paper we study the problem of local triangle counting in large graphs. Namely, given a graph G = (V;E) want to estimate as accurately possible number triangles incident every node υ ∈ V graph. The computing global has been considered before, but our knowledge is first that addresses with focus on efficiency issues arising massive distribution and related clustering coefficient can be used many interesting applications. For example, show measures compute help detect presence spamming...

10.1145/1401890.1401898 article EN 2008-08-24

We study the problem of online team formation. consider a setting in which people possess different skills and compatibility among potential members is modeled by social network. A sequence tasks arrives an fashion, each task requires specific set skills. The goal to form new upon arrival task, so that (i) possesses all required (ii) has small communication overhead, (iii) workload performing balanced fairest possible way.

10.1145/2187836.2187950 article EN 2012-04-16

Web data and computational models can play important roles in analyzing cultural trends.The current study presents an analysis of 23,492 sneaker images metadata collected from a global reselling shop, StockX.com.Based on encompassing 22 years 1999 to 2020, we propose design index that helps track changes the characteristics sneakers using contrastive learning method.Our suggest designs have been employing brighter colors lower hue saturation values over time.We also observe how popular...

10.1145/3485447 preprint EN 2022-04-25

Query logs record the queries and actions of users search engines, as such they contain valuable information about interests, preferences, behavior users, well their implicit feedback to engine results. Mining wealth available in query has many important applications including query-log analysis, user profiling personalization, advertising, recommendation, more.In this paper we introduce query-flow graph, a graph representation interesting knowledge latent querying behavior. Intuitively,...

10.1145/1458082.1458163 article EN 2008-10-26

Which topics spark the most heated debates on social media? Identifying those is not only interesting from a societal point of view but also allows filtering and aggregation media content for disseminating news stories. In this article, we perform systematic methodological study controversy detection by using network structure media. Unlike previous work, rather than studying in single hand-picked topic domain-specific knowledge, take general approach to any domain . Our quantifying based...

10.1145/3140565 article EN ACM Transactions on Social Computing 2018-01-18

Finding dense subgraphs is an important graph-mining task with many applications. Given that the direct optimization of edge density not meaningful, as even a single achieves maximum density, research has focused on optimizing alternative functions. A very popular among such functions average degree, whose maximization leads to well-known densest-subgraph notion. Surprisingly enough, however, densest are typically large graphs, small and diameter.

10.1145/2487575.2487645 article EN 2013-08-11

In this paper we study approximate landmark-based methods for point-to-point distance estimation in very large networks. These involve selecting a subset of nodes as landmarks and computing offline the distances from each node graph to those landmarks. At runtime, when between pair is needed, it can be estimated quickly by combining precomputed distances. We prove that optimal set an NP-hard problem, thus heuristic solutions need employed. therefore explore theoretical insights devise...

10.1145/1645953.1646063 article EN 2009-11-02

We study the problem of correlating micro-blogging activity with stock-market events, defined as changes in price and traded volume stocks. Specifically, we collect messages related to a number companies, search for correlations between events those companies features extracted from messages. The extract can be categorized two groups. Features first group measure overall platform, such posts, re-posts, so on. second properties an induced interaction graph, instance, connected components,...

10.1145/2124295.2124358 article EN 2012-02-08

Echo chambers, i.e., situations where one is exposed only to opinions that agree with their own, are an increasing concern for the political discourse in many democratic countries. This paper studies phenomenon of echo chambers on social media. We identify two components phenomenon: opinion shared, and »chamber» (i.e., network) allows »echo» be re-shared -- examine closely at how these interact. define a production consumption measure social-media users, which captures leaning content shared...

10.1145/3178876.3186139 article EN 2018-01-01

We present a unified approach for simultaneously clustering and discovering outliers in data. Our is formalized as generalization of the k-MEANS problem. prove that problem NP-hard then practical polynomial time algorithm, which guaranteed to converge local optimum. Furthermore we extend our all distance measures can be expressed form Bregman divergence. Experiments on synthetic real datasets demonstrate effectiveness utility carrying out both outlier detection concurrent manner. In...

10.1137/1.9781611972832.21 article EN 2013-05-02

Association-rule mining has heretofore relied on the condition of high support to do its work efficiently. In particular, well-known a priori algorithm is only effective when rules interest are relationships that occur very frequently. However, there number applications, such as data mining, identification similar Web documents, clustering, and collaborative filtering, where have comparatively few instances in data. these cases, we must look for highly correlated items, or possibly even...

10.1109/69.908981 article EN IEEE Transactions on Knowledge and Data Engineering 2001-01-01

Matrix decomposition methods represent a data matrix as product of two factor matrices: one containing basis vectors that meaningful concepts in the data, and another describing how observed can be expressed combinations vectors. Decomposition have been studied extensively, but many return real-valued matrices. Interpreting matrices is hard if original Boolean. In this paper, we describe formulation for Boolean Discrete Basis Problem. The problem seeks binary matrix, thus allowing user to...

10.1109/tkde.2008.53 article EN IEEE Transactions on Knowledge and Data Engineering 2008-08-27

We consider the problem of maintaining aggregates and statistics over data streams, with respect to last N elements seen so far. refer this model as sliding window model. following basic problem: Given a stream bits, maintain count number 1's in from stream. show that using O(1/e log2N) bits memory, we can estimate within factor 1 + e. also give matching lower bound Ω(1/e memory for any deterministic or randomized algorithms. extend our scheme sum positive integers. provide upper bounds more...

10.5555/545381.545466 article EN Symposium on Discrete Algorithms 2002-01-06

Complex networks, such as biological, social, and communication often entail uncertainty, thus, can be modeled probabilistic graphs . Similar to the problem of similarity search in standard graphs, a fundamental for is efficiently answer k-nearest neighbor queries ( k -NN), which computing closest nodes some specific node. In this paper we introduce framework processing -NN graphs. We propose novel distance functions that extend well-known graph concepts, shortest paths. order compute them...

10.14778/1920841.1920967 article EN Proceedings of the VLDB Endowment 2010-09-01

The problem of assessing the significance data mining results on high-dimensional 0--1 datasets has been studied extensively in literature. For problems such as frequent sets and finding correlations, testing can be done by standard statistical tests chi-square, or other methods. However, depend only specific attributes not dataset a whole. Moreover, are difficult to apply patterns complex algorithms. In this article, we consider simple randomization technique that deals with shortcoming....

10.1145/1297332.1297338 article EN ACM Transactions on Knowledge Discovery from Data 2007-12-01

In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore impact of different approaches, such as static vs. dynamic caching, and query results vs.caching posting lists. Using a log spanning whole year limitations demonstrate that lists can achieve higher hit rates than answers. propose new algorithm lists, which outperforms previous methods. also problem finding optimal way to split cache between answers Finally, measure how changes...

10.1145/1277741.1277775 article EN 2007-07-23

XML is rapidly emerging as the new standard for data representation and exchange on Web. An document can be accompanied by a Document Type Descriptor (DTD) which plays role of schema an collection. DTDs contain valuable information structure documents thus have crucial in efficient storage data, well effective formulation optimization queries. In this paper, we propose XTRACT, novel system inferring DTD database documents. Since syntax incorporates full expressive power regular expressions ,...

10.1145/335191.335409 article EN ACM SIGMOD Record 2000-05-16
Coming Soon ...