- Data Management and Algorithms
- Data Mining Algorithms and Applications
- Complex Network Analysis Techniques
- Rough Sets and Fuzzy Logic
- Advanced Clustering Algorithms Research
- Algorithms and Data Compression
- Opinion Dynamics and Social Influence
- Machine Learning and Data Classification
- Data Visualization and Analytics
- Time Series Analysis and Forecasting
- Face and Expression Recognition
- Advanced Database Systems and Queries
- Anomaly Detection Techniques and Applications
- Natural Language Processing Techniques
- Mobile Crowdsensing and Crowdsourcing
- Biomedical Text Mining and Ontologies
- Neural Networks and Applications
- Digital Marketing and Social Media
- Speech Recognition and Synthesis
- Bayesian Modeling and Causal Inference
- Imbalanced Data Classification Techniques
- Personal Information Management and User Behavior
- Data Stream Mining Techniques
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
University of Helsinki, 2008-2022
Speech Pathology Australia, 2021
Helsinki Institute for Information Technology, 2007-2020
Finnish Institute of Occupational Health, 2015-2017
Aalto University, 2008-2015
University of Technology, 2014
Yahoo (Spain), 2011-2012
Yahoo (United Kingdom), 2011-2012
Information and communications technologies (ICTs) have enabled the rise of so-called “Collaborative Consumption” (CC): the peer-to-peer-based activity of obtaining, giving, or sharing access to goods and services, coordinated through community-based online services. CC has been expected to alleviate societal problems such as hyper-consumption, pollution, and poverty by lowering the cost of economic coordination within communities. However, beyond anecdotal evidence, there is a dearth of understanding why people...
We live in a computerized and networked society where many of our actions leave a digital trace and affect other people’s actions. This has led to the emergence of a new data-driven research field: mathematical methods of computer science, statistical physics and sociometry provide insights on a wide range of disciplines ranging from social science to human mobility. A recent important discovery is that search engine traffic (i.e., the number of requests submitted by users to search engines on the www) can be used to track and, in some cases,...
We present Spine, an efficient algorithm for finding the "backbone" of an influence network. Given a social graph and a log of past propagations, we build an instance of the independent-cascade model that describes the propagations. We aim at reducing the complexity of that model, while preserving most of its accuracy in describing the data.
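For readers unfamiliar with the independent-cascade model mentioned above, the following is a minimal simulation sketch. It is not the Spine algorithm itself; the function name `simulate_independent_cascade` and the adjacency format (a dict mapping each node to a list of `(neighbor, probability)` pairs) are assumptions made for illustration.

```python
import random


def simulate_independent_cascade(graph, seeds, rng=None):
    """Run one cascade of the independent-cascade model.

    graph: dict mapping node -> list of (neighbor, activation probability).
           This input format is an assumption of the sketch.
    seeds: iterable of initially active nodes.
    Returns the set of nodes that are active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            # A newly activated node gets exactly one chance to activate
            # each neighbor that is still inactive.
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


if __name__ == "__main__":
    toy_graph = {"a": [("b", 1.0), ("c", 0.0)], "b": [("d", 1.0)]}
    print(sorted(simulate_independent_cascade(toy_graph, ["a"])))
```

Simplifying such a model, as the abstract describes, would amount to keeping only a subset of the edges in `graph` while the simulated cascades still resemble the logged ones.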
We investigate the problem of mining "tips" from Yahoo! Answers and displaying those tips in response to related web queries. Here, a "tip" is a short, concrete and self-contained bit of non-obvious advice such as "To zest a lime if you don't have a zester: use a cheese grater."
It is known that periods of intense social interaction result in shared patterns in collaborators’ physiological signals. However, applied quantitative research on collaboration is hindered due to the scarcity of objective metrics of teamwork effectiveness. Indeed, especially in the domain of productive, ecologically-valid activity such as programming, there is a lack of evidence for the most effective, affordable and reliable measures of collaboration quality. In this study we investigate the synchrony in physiological signals between collaborating computer...
Adaptation of end-to-end speech recognition systems to new tasks is known to be challenging. A number of solutions have been proposed which apply external language models with various fusion methods, possibly with a combination of two-pass decoding. Also, TTS systems have been used to generate adaptation data for the end-to-end models. In this paper we show that RNN-transducer models can be effectively adapted to new domains using only small amounts of textual data. By taking advantage of the model's inherent structure, where the prediction network is interpreted as a language model,...
Ordering and ranking items of different types are important tasks in various applications, such as query processing and scientific data mining. A total order for the items can be misleading, since there are groups of items that have practically equal ranks. We consider bucket orders, i.e., total orders with ties. They can be used to capture the essential order information without overfitting the data: they form a useful concept class between total orders and arbitrary partial orders. We address the question of finding a bucket order for a set of items, given pairwise precedence information between the items....
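To make the notion of a bucket order concrete, here is a small pivot-style sketch that turns pairwise precedence fractions into an ordered list of buckets. The abstract is truncated, so this is one plausible heuristic under stated assumptions rather than the method it goes on to describe: the function name `bucket_pivot`, the `prec` dictionary format, and the tie threshold `beta` are all hypothetical.

```python
import random


def bucket_pivot(items, prec, beta=0.25, rng=None):
    """Build a bucket order (a total order with ties) from pairwise data.

    items: list of hashable items.
    prec:  dict with prec[(x, y)] = fraction of observations in which x
           precedes y, so prec[(x, y)] + prec[(y, x)] == 1 (assumed format).
    beta:  hypothetical threshold; a pair whose precedence fraction lies
           within beta of 0.5 is treated as a tie.
    Returns a list of buckets, earliest bucket first.
    """
    rng = rng or random.Random(0)
    if not items:
        return []
    pivot = rng.choice(items)
    before, same, after = [], [pivot], []
    for x in items:
        if x == pivot:
            continue
        p = prec[(x, pivot)]
        if p >= 0.5 + beta:
            before.append(x)   # x clearly precedes the pivot
        elif p <= 0.5 - beta:
            after.append(x)    # x clearly follows the pivot
        else:
            same.append(x)     # practically tied with the pivot
    return (bucket_pivot(before, prec, beta, rng)
            + [same]
            + bucket_pivot(after, prec, beta, rng))


if __name__ == "__main__":
    rank = {"a": 0, "b": 1, "c": 1, "d": 2}  # b and c are tied
    prec = {}
    for x in rank:
        for y in rank:
            if x != y:
                prec[(x, y)] = 0.5 if rank[x] == rank[y] else float(rank[x] < rank[y])
    print(bucket_pivot(list(rank), prec))
```

With noisy precedence data the result depends on which pivots are drawn, so in practice one would likely run the procedure several times and keep the bucket order that disagrees least with `prec`.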
We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that aims at partitioning the graph such that the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as pivot, builds a cluster around it, and removes the cluster from...
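The edge-pivot procedure described above can be sketched as follows. The abstract does not say how the cluster is grown around the pivot, so the rule used here (add every vertex joined to both pivot endpoints by edges of the pivot's color) is an assumption, as are the function name `chromatic_pivot` and the edge representation.

```python
import random


def chromatic_pivot(vertices, edge_color, rng=None):
    """Cluster an edge-colored graph by repeatedly picking a random pivot edge.

    vertices:   iterable of sortable vertex ids.
    edge_color: dict mapping frozenset({u, v}) -> color (assumed format,
                no self-loops).
    Returns a list of (cluster, color) pairs; leftover vertices become
    singleton clusters with color None.
    """
    rng = rng or random.Random(0)
    remaining = set(vertices)
    clusters = []
    while True:
        # Edges whose endpoints are both still unclustered.
        live = sorted(tuple(sorted(e)) for e in edge_color if e <= remaining)
        if not live:
            break
        x, y = rng.choice(live)
        color = edge_color[frozenset((x, y))]
        cluster = {x, y}
        for z in remaining - {x, y}:
            # Assumed growth rule: z joins only if both of its edges to the
            # pivot endpoints exist and carry the pivot's color.
            if (edge_color.get(frozenset((x, z))) == color
                    and edge_color.get(frozenset((y, z))) == color):
                cluster.add(z)
        clusters.append((cluster, color))
        remaining -= cluster
    clusters.extend(({v}, None) for v in sorted(remaining))
    return clusters


if __name__ == "__main__":
    edges = {}
    for u, v in [("a", "b"), ("a", "c"), ("b", "c")]:
        edges[frozenset((u, v))] = "red"
    for u, v in [("d", "e"), ("d", "f"), ("e", "f")]:
        edges[frozenset((u, v))] = "blue"
    print(chromatic_pivot("abcdef", edges))
```

Each round removes at least the two pivot endpoints, so the loop runs at most once per two vertices.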
We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that ensures the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as pivot, builds a cluster around it, and removes the cluster from the graph. Although being fast,...
We propose a framework for searching the Wikipedia with contextual information. Our framework extends the typical keyword search by considering queries of the type (q,p), where q is a set of terms (as in classical Web search), and p is a source Wikipedia document. The query terms q represent the information that the user is interested in finding, and the document p provides the context of the query. The task is to rank other documents in Wikipedia with respect to their relevance to the query terms q given the context document p. By associating a context to the query terms, the search results of a search initiated in a particular page can be made more relevant.
We introduce a new approach to the problem of overlapping clustering. The main idea is to formulate overlapping clustering as an optimization problem in which each data point is mapped to a small set of labels, representing membership in different clusters. The objective is to find a mapping so that the distances between data points agree as much as possible with distances taken over their label sets. To define distances between label sets, we consider two measures: a set-intersection indicator function and the Jaccard coefficient. To solve the main optimization problem we propose a local-search algorithm. The iterative step of our...
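The two label-set measures named above, and the kind of objective they feed into, can be illustrated with a short sketch. The function names and the use of a summed absolute difference as the cost are assumptions; the abstract states only that point distances should agree as much as possible with label-set distances.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard coefficient of two label sets."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty label sets
    return 1.0 - len(a & b) / len(union)


def intersection_indicator_distance(a, b):
    """0 if the label sets share at least one label, otherwise 1."""
    return 0.0 if a & b else 1.0


def labeling_cost(dist, labels, label_distance):
    """Total disagreement between data distances and label-set distances.

    dist:   dict mapping (u, v) -> distance in [0, 1] between data points.
    labels: dict mapping each point to its set of cluster labels.
    The absolute-difference form is an assumption of this sketch.
    """
    return sum(
        abs(d - label_distance(labels[u], labels[v]))
        for (u, v), d in dist.items()
    )


if __name__ == "__main__":
    dist = {("x", "y"): 0.0, ("x", "z"): 1.0, ("y", "z"): 0.5}
    labels = {"x": {1}, "y": {1, 2}, "z": {2}}
    print(labeling_cost(dist, labels, jaccard_distance))
    print(labeling_cost(dist, labels, intersection_indicator_distance))
```

A local search over such an objective would repeatedly change the label set of a single point and keep the change whenever `labeling_cost` decreases.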
In this work we present the novel ASTRID method for investigating which attribute interactions classifiers exploit when making predictions. Attribute interactions in classification tasks mean that two or more attributes together provide stronger evidence for a particular class label. Knowledge of such interactions makes models more interpretable by revealing associations between attributes. This has applications, e.g., in pharmacovigilance to identify interactions between drugs or in bioinformatics to investigate associations between single nucleotide polymorphisms. We also show...
In this paper we address the following density estimation problem: given a number of relative similarity judgements over a set of items D, assign a density value p(x) to each item x in D. Our work is motivated by human computing applications where density can be interpreted e.g. as a measure of the rarity of an item. While humans are excellent at solving a range of different visual tasks, assessing absolute similarity (or distance) of two items (e.g. photographs) is difficult. Relative judgements of similarity, such as A is more similar to B than to C, on the other hand,...
We study the problem of discrepancy maximization on graphs: given a set of nodes Q of an underlying graph G, we aim to identify a connected subgraph of G that contains many more nodes from Q than other nodes. This variant of the discrepancy-maximization problem extends the well-known notion of "bump hunting" in the Euclidean space. We consider the problem under two access models. In the unrestricted-access model, the whole graph G is given as input, while in the local-access model we can only retrieve the neighbors of a given node in G using a possibly slow and costly interface. We prove that the basic problem of discrepancy maximization on graphs...
The power of human computation is founded on the capabilities of humans to process qualitative information in a manner that is hard to reproduce with a computer. However, all machine learning algorithms rely on mathematical operations, such as sums, averages, least squares etc., that are less suitable for human computation. This paper is an effort to combine these two aspects of data processing. We consider the problem of computing a centroid of a data set, a key component in many data-analysis applications such as clustering, using a very simple human intelligence...
Crowdsourced, or human computation based, clustering algorithms usually rely on relative distance comparisons, as these are easier to elicit from human workers than absolute distance information. We build upon existing work on correlation clustering, a well-known non-parametric approach to clustering, and present a novel clustering algorithm for human computation. We first define a variant of correlation clustering that is based on relative distance comparisons, and briefly outline an approximation algorithm for this problem. As a second contribution, we propose a more practical algorithm, which we empirically compare against existing methods...