Antti Ukkonen

ORCID: 0000-0001-6060-1746
Research Areas
  • Data Management and Algorithms
  • Data Mining Algorithms and Applications
  • Complex Network Analysis Techniques
  • Rough Sets and Fuzzy Logic
  • Advanced Clustering Algorithms Research
  • Algorithms and Data Compression
  • Opinion Dynamics and Social Influence
  • Machine Learning and Data Classification
  • Data Visualization and Analytics
  • Time Series Analysis and Forecasting
  • Face and Expression Recognition
  • Advanced Database Systems and Queries
  • Anomaly Detection Techniques and Applications
  • Natural Language Processing Techniques
  • Mobile Crowdsensing and Crowdsourcing
  • Biomedical Text Mining and Ontologies
  • Neural Networks and Applications
  • Digital Marketing and Social Media
  • Speech Recognition and Synthesis
  • Bayesian Modeling and Causal Inference
  • Imbalanced Data Classification Techniques
  • Personal Information Management and User Behavior
  • Data Stream Mining Techniques
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques

University of Helsinki
2008-2022

Speech Pathology Australia
2021

Helsinki Institute for Information Technology
2007-2020

Finnish Institute of Occupational Health
2015-2017

Aalto University
2008-2015

University of Technology
2014

Yahoo (Spain)
2011-2012

Yahoo (United Kingdom)
2011-2012

Information and communications technologies (ICTs) have enabled the rise of so‐called “Collaborative Consumption” (CC): the peer‐to‐peer‐based activity of obtaining, giving, or sharing access to goods and services, coordinated through community‐based online services. CC has been expected to alleviate societal problems such as hyper‐consumption, pollution, and poverty by lowering the cost of economic coordination within communities. However, beyond anecdotal evidence, there is a dearth of understanding why people...

10.1002/asi.23552 article EN Journal of the Association for Information Science and Technology 2015-06-02

We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases,...

10.1371/journal.pone.0040014 article EN cc-by PLoS ONE 2012-07-19

We present Spine, an efficient algorithm for finding the "backbone" of an influence network. Given a social graph and a log of past propagations, we build an instance of the independent-cascade model that describes the propagations. We aim at reducing the complexity of that model, while preserving most of its accuracy in describing the data.

10.1145/2020408.2020492 article EN 2011-08-21
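
The independent-cascade model named in the abstract above can be illustrated with a minimal simulation sketch. This is not the Spine algorithm itself, only the propagation model it builds on; the function name, the adjacency format, and the toy probabilities are assumptions made for illustration.

```python
import random


def simulate_independent_cascade(graph, seeds, rng):
    """Run one independent-cascade propagation (illustrative sketch).

    graph: dict mapping a node to a list of (neighbor, probability) pairs
           (hypothetical format chosen for this example).
    seeds: initially active nodes.
    rng:   a random.Random instance, passed in for reproducibility.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active node gets one chance to activate each neighbor.
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Toy graph with probabilities 0.0 and 1.0 so the outcome is deterministic.
toy_graph = {
    "a": [("b", 1.0), ("c", 0.0)],
    "b": [("d", 1.0)],
    "c": [("d", 1.0)],
}
reached = simulate_independent_cascade(toy_graph, ["a"], random.Random(0))
```

In this toy graph the edge from "a" to "c" never fires, so "c" would be a candidate for pruning if the goal is a sparser model that still explains the observed propagation.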

Information and communications technologies (ICTs) have enabled the rise of so-called "Collaborative Consumption" (CC): the peer-to-peer-based activity of obtaining, giving, or sharing access to goods and services, coordinated through community-based online services. CC has been expected to alleviate societal problems such as hyper-consumption, pollution, and poverty by lowering the cost of economic coordination within communities. However, beyond anecdotal evidence, there is a dearth of understanding why people...

10.2139/ssrn.2271971 article EN SSRN Electronic Journal 2013-01-01

10.1007/s10115-012-0522-9 article EN Knowledge and Information Systems 2012-07-21

We investigate the problem of mining "tips" from Yahoo! Answers and displaying those tips in response to related web queries. Here, a "tip" is a short, concrete and self-contained bit of non-obvious advice such as "To zest a lime if you don't have a zester: use a cheese grater."

10.1145/2124295.2124369 article EN 2012-02-08

It is known that periods of intense social interaction result in shared patterns in collaborators’ physiological signals. However, applied quantitative research on collaboration is hindered due to the scarcity of objective metrics of teamwork effectiveness. Indeed, especially in the domain of productive, ecologically-valid activity such as programming, there is a lack of evidence for the most effective, affordable and reliable measures of collaboration quality. In this study we investigate the synchrony in physiological signals between collaborating computer...

10.1371/journal.pone.0159178 article EN cc-by PLoS ONE 2016-07-14

Adaptation of end-to-end speech recognition systems to new tasks is known to be challenging. A number of solutions have been proposed which apply external language models with various fusion methods, possibly with a combination of two-pass decoding. Also TTS systems have been used to generate adaptation data for the end-to-end models. In this paper we show that RNN-transducer models can be effectively adapted to new domains using only small amounts of textual data. By taking advantage of the model's inherent structure, where the prediction network is interpreted as a language model,...

10.21437/interspeech.2021-1191 article EN Interspeech 2021 2021-08-27

Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since there are groups of items that have practically equal ranks. We consider bucket orders, i.e., total orders with ties. They can be used to capture the essential order information without overfitting the data: they form a useful concept class between total orders and arbitrary partial orders. We address the question of finding a bucket order for a set of items, given pairwise precedence information between the items....

10.1145/1150402.1150468 article EN 2006-08-20

We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that aims at partitioning the graph such that the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as pivot, builds a cluster around it, and removes the cluster from...

10.1145/2339530.2339735 article EN 2012-08-12
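
The pivot-based procedure outlined in the abstract above (pick a random edge, build a cluster around it, remove the cluster) can be sketched as follows. The abstract is truncated, so the rule used here for growing a cluster, adding every remaining node joined to both pivot endpoints by an edge of the pivot's color, is an assumption, as are the function name and the edge representation.

```python
import random


def chromatic_pivot_clustering(nodes, edge_color, rng):
    """Pivot-style clustering of an edge-colored graph (illustrative sketch).

    nodes:      iterable of node identifiers.
    edge_color: dict mapping frozenset({u, v}) to a color label
                (hypothetical representation chosen for this example).
    rng:        a random.Random instance.
    Returns a list of (cluster_set, color) pairs; leftover singletons get None.
    """
    remaining = set(nodes)
    clusters = []
    while True:
        # Edges whose endpoints are both still unclustered.
        live_edges = sorted(
            (tuple(sorted(e)), c) for e, c in edge_color.items() if e <= remaining
        )
        if not live_edges:
            break
        (u, v), color = rng.choice(live_edges)  # random pivot edge
        cluster = {u, v}
        for w in remaining - {u, v}:
            # Assumed growth rule: w joins if it matches the pivot color on both sides.
            if (edge_color.get(frozenset((u, w))) == color
                    and edge_color.get(frozenset((v, w))) == color):
                cluster.add(w)
        clusters.append((cluster, color))
        remaining -= cluster
    # Nodes never covered by a pivot become singleton clusters.
    clusters.extend(({w}, None) for w in sorted(remaining))
    return clusters


# Toy graph: a red triangle a-b-c and a separate blue edge d-e.
toy_edges = {
    frozenset(("a", "b")): "red",
    frozenset(("a", "c")): "red",
    frozenset(("b", "c")): "red",
    frozenset(("d", "e")): "blue",
}
toy_clusters = chromatic_pivot_clustering("abcde", toy_edges, random.Random(0))
```

On this toy input every choice of pivot yields the same two clusters, the red triangle and the blue pair, although the order in which they are found depends on the random draws.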

We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that ensures that the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as pivot, builds a cluster around it, and removes the cluster from the graph. Although being fast,...

10.1145/2728170 article EN ACM Transactions on Knowledge Discovery from Data 2015-06-01

We propose a framework for searching the Wikipedia with contextual information. Our framework extends the typical keyword search, by considering queries of the type (q,p), where q is a set of terms (as in classical Web search), and p is a source Wikipedia document. The query terms q represent the information that the user is interested in finding, and the document p provides the context of the query. The task is to rank other documents with respect to their relevance to q given p. By associating a context to the query terms, the search results of a search initiated in a particular page can be made more relevant.

10.1145/1458082.1458274 article EN 2008-10-26

We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the distances between data points agree as much as possible with distances taken over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem we propose a local-search algorithm. The iterative step of our...

10.1109/icdm.2011.114 article EN 2011-12-01
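
The two label-set measures mentioned in the abstract above can be written down in a few lines. The sketch below also scores a labeling by summing absolute differences between given distances and label-set distances; that particular cost form, and all names used, are assumptions for illustration, since the abstract does not spell out the objective.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard coefficient of two label sets."""
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty label sets
    return 1.0 - len(a & b) / len(union)


def intersection_indicator_distance(a, b):
    """0 if the label sets share at least one label, otherwise 1."""
    return 0.0 if a & b else 1.0


def labeling_cost(distances, labels, label_distance):
    """Total disagreement between data distances and label-set distances.

    distances: dict mapping a pair (i, j) to a distance in [0, 1].
    labels:    dict mapping each point to its set of cluster labels.
    The absolute-difference form is an assumed stand-in for the paper's objective.
    """
    return sum(
        abs(d - label_distance(labels[i], labels[j]))
        for (i, j), d in distances.items()
    )


# Toy data: x belongs to clusters 1 and 2, y to cluster 1, z to cluster 3.
toy_distances = {("x", "y"): 0.5, ("x", "z"): 1.0}
toy_labels = {"x": {1, 2}, "y": {1}, "z": {3}}
cost_jaccard = labeling_cost(toy_distances, toy_labels, jaccard_distance)
cost_indicator = labeling_cost(toy_distances, toy_labels, intersection_indicator_distance)
```

A local search in this setting would repeatedly change one point's label set and keep the change whenever the cost goes down.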

In this work we present the novel ASTRID method for investigating which attribute interactions classifiers exploit when making predictions. Attribute interactions in classification tasks mean that two or more attributes together provide stronger evidence for a particular class label. Knowledge of such interactions makes models more interpretable by revealing associations between attributes. This has applications, e.g., in pharmacovigilance to identify interactions between drugs or in bioinformatics to investigate associations between single nucleotide polymorphisms. We also show...

10.48550/arxiv.1707.07576 preprint EN other-oa arXiv (Cornell University) 2017-01-01

In this paper we address the following density estimation problem: given a number of relative similarity judgements over a set of items D, assign a density value p(x) to each item x in D. Our work is motivated by human computing applications where density can be interpreted e.g. as a measure of the rarity of an item. While humans are excellent at solving a range of different visual tasks, assessing absolute similarity (or distance) of two items (e.g. photographs) is difficult. Relative judgements of similarity, such as A is more similar to B than to C, on the other hand,...

10.1609/hcomp.v3i1.13232 article EN Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 2015-09-23

We study the problem of discrepancy maximization on graphs: given a set of nodes Q of an underlying graph G, we aim to identify a connected subgraph of G that contains many more nodes from Q than other nodes. This variant of the discrepancy-maximization problem extends the well-known notion of "bump hunting" in the Euclidean space. We consider the problem under two access models. In the unrestricted-access model, the whole graph G is given as input, while in the local-access model we can only retrieve the neighbors of a given node in G using a possibly slow and costly interface. We prove that the basic problem of discrepancy maximization on graphs...

10.1109/icde.2015.7113364 article EN 2015-04-01

The power of human computation is founded on the capabilities of humans to process qualitative information in a manner that is hard to reproduce with a computer. However, all machine learning algorithms rely on mathematical operations, such as sums, averages, least squares etc. that are less suitable for human computation. This paper is an effort to combine these two aspects of data processing. We consider the problem of computing a centroid of a data set, a key component in many data-analysis applications such as clustering, using a very simple human intelligence...

10.1609/hcomp.v1i1.13079 article EN Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 2013-11-03

Crowdsourced, or human computation based clustering algorithms usually rely on relative distance comparisons, as these are easier to elicit from human workers than absolute distance information. We build upon existing work on correlation clustering, a well-known non-parametric approach to clustering, and present a novel clustering algorithm for human computation. We first define a novel variant of correlation clustering that is based on relative distance comparisons, and briefly outline an approximation algorithm for this problem. As a second contribution, we propose a more practical algorithm, which we empirically compare against existing methods...

10.1109/icdm.2017.148 article EN 2017 IEEE International Conference on Data Mining (ICDM) 2017-11-01