- Time Series Analysis and Forecasting
- Data Management and Algorithms
- Algorithms and Data Compression
- Anomaly Detection Techniques and Applications
- Explainable Artificial Intelligence (XAI)
- Advanced Clustering Algorithms Research
- Image Retrieval and Classification Techniques
- Advanced Database Systems and Queries
- Music and Audio Processing
- Adversarial Robustness in Machine Learning
- Topic Modeling
- Neural Networks and Applications
- Advanced Graph Neural Networks
- Data Stream Mining Techniques
- Machine Learning and Data Classification
- Complex Systems and Time Series Analysis
- Advanced Steganography and Watermarking Techniques
- Sparse and Compressive Sensing Techniques
- Natural Language Processing Techniques
- Recommender Systems and Techniques
- Video Analysis and Summarization
- Privacy-Preserving Technologies in Data
- Cell Image Analysis Techniques
- Machine Learning and Algorithms
- Text Readability and Simplification
University of Lausanne
2019-2024
National and Kapodistrian University of Athens
2024
Institute of Communication and Computer Systems
2022
Los Alamitos Medical Center
2020
IBM Research - Zurich
2009-2019
IBM (United States)
2007-2018
IBM Research - Thomas J. Watson Research Center
2006-2017
University of California, Riverside
2002-2006
National Technical University of Athens
2004
We investigate techniques for the analysis and retrieval of object trajectories in two or three dimensional space. Such data usually contain a large amount of noise, which has made previously used metrics fail. Therefore, we formalize non-metric similarity functions based on the longest common subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global...
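As a rough illustration of the LCSS-based similarity described above, the following minimal sketch scores two 2D trajectories with dynamic programming. The parameter names `eps` (spatial matching tolerance) and `delta` (maximum index offset tolerated when stretching in time), the function names, and the normalization by the shorter trajectory are assumptions made for illustration; the abstract does not specify them, and the global transformation it alludes to is not handled here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lcss_length(a: Sequence[Point], b: Sequence[Point],
                eps: float, delta: int) -> int:
    """Length of the longest common subsequence of two trajectories.

    Two points match when every coordinate differs by at most `eps`
    and their positions in the sequences differ by at most `delta`.
    (`eps` and `delta` are illustrative parameters, not from the source.)
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = all(
                abs(p - q) <= eps for p, q in zip(a[i - 1], b[j - 1])
            )
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Skip a point in one of the sequences; this is what
                # lets isolated outliers be ignored instead of penalized.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a: Sequence[Point], b: Sequence[Point],
                    eps: float, delta: int) -> float:
    """LCSS length normalized to [0, 1] by the shorter trajectory."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

With a four-point trajectory and a noisy copy in which one point is a gross outlier, three of the four points still match, so the similarity is 0.75. A Euclidean distance over the same pair would be dominated by the single outlier.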
Although most time-series data mining research has concentrated on providing solutions for a single distance function, in this work we motivate the need for an index structure that can support multiple distance measures. Our specific area of interest is the efficient retrieval and analysis of trajectory similarities. Trajectory datasets are very common in environmental applications, mobility experiments, and video surveillance, and are especially important for the discovery of certain biological patterns. Our primary similarity measure is based...
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., 'Thanksgiving' or 'Christmas gifts') where the elements of the time series are the number of times that the query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries, and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon...
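The idea of keeping the "best" Fourier coefficients together with the energy of the omitted ones can be sketched as follows. This is only an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, "best" is assumed to mean largest magnitude, and a naive O(n^2) transform is used to keep the example dependency-free.

```python
import cmath
import math
from typing import Dict, List, Sequence, Tuple


def dft(x: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform, O(n^2), for illustration only."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def best_k_coefficients(x: Sequence[float],
                        k: int) -> Tuple[Dict[int, complex], float]:
    """Keep the k largest-magnitude coefficients of the series.

    Returns the kept coefficients (index -> value) and the energy of
    the omitted ones, scaled by 1/n so that, by Parseval's theorem,
    kept plus omitted energy equals sum(v * v for v in x).
    """
    coeffs = dft(x)
    order = sorted(range(len(coeffs)),
                   key=lambda i: abs(coeffs[i]), reverse=True)
    kept = {i: coeffs[i] for i in order[:k]}
    omitted_energy = sum(abs(coeffs[i]) ** 2 for i in order[k:]) / len(x)
    return kept, omitted_energy


def reconstruct(kept: Dict[int, complex], n: int) -> List[float]:
    """Inverse transform using only the kept coefficients."""
    return [
        sum(c * cmath.exp(2j * math.pi * i * t / n)
            for i, c in kept.items()).real / n
        for t in range(n)
    ]
```

For a daily series with a clean weekly cycle, two coefficients (a conjugate pair) already carry essentially all of the energy, so the omitted energy is close to zero. For noisier demand curves the omitted energy indicates how much of the signal the compressed form leaves out.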
This work motivates the need for more flexible structural similarity measures between time-series sequences, which are based on the extraction of important periodic features. Specifically, we present non-parametric methods for accurate periodicity detection and introduce new distance measures for time-series sequences. The goal of these tools and techniques is to assist in detecting, monitoring and visualizing changes. It is our belief that these methods can be directly applicable in the manufacturing industry for preventive maintenance and in the medical sciences...
The matching of two-dimensional shapes is an important problem with applications in domains as diverse as biometrics, industry, medicine and anthropology. The distance measure used must be invariant to many distortions, including scale, offset, noise, partial occlusion, etc. Most of these distortions are relatively easy to handle, either in the representation of the data or in the similarity measure used. However, rotation invariance seems to be uniquely difficult. Current approaches typically try to achieve rotation invariance in the representation of the data, at the expense...
In this paper we address the issue of using local embeddings for data visualization in two and three dimensions, and for classification. We advocate their use on the basis that they provide an efficient mapping procedure from the original dimension of the data to a lower intrinsic dimension. We depict how they can accurately capture the user's perception of similarity in high-dimensional data for visualization purposes. Moreover, we exploit the low-dimensional mapping provided by these embeddings to develop new classification techniques, and we show experimentally that the accuracy is...
The past decade has seen a wealth of research on time series representations, because the manipulation, storage, and indexing of large volumes of raw time series data is impractical. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and that can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries...
For the discovery of similar patterns in 1D time-series, it is very typical to perform a normalization of the data (for example, a transformation so that the data follow zero mean and unit standard deviation). Such transformations can reveal latent patterns and are commonly used in data mining applications. However, when dealing with multidimensional time-series, which appear naturally in applications such as video-tracking, motion-capture, etc., similar motion patterns can also be expressed at different orientations. It is therefore imperative to provide support for...
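The 1D normalization mentioned above (zero mean, unit standard deviation) can be sketched in a few lines. The function name, the use of the population standard deviation, and the handling of constant series are assumptions for illustration; the orientation issue the abstract raises for multidimensional data is not addressed by this step.

```python
import math
from typing import List, Sequence


def z_normalize(series: Sequence[float]) -> List[float]:
    """Rescale a 1D series to zero mean and unit standard deviation.

    Uses the population standard deviation (divide by n). A constant
    series has zero deviation, so it is mapped to all zeros rather
    than dividing by zero.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in series]
```

After this transformation, two series that differ only by a positive scale factor and an offset become identical, which is why patterns hidden by differing amplitudes or baselines become comparable.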
This work introduces distance-based criteria for the segmentation of object trajectories. Segmentation leads to simplification of the original objects into smaller, less complex primitives that are better suited for storage and retrieval purposes. Previous work on trajectory segmentation attacked the problem locally, segmenting separately each trajectory of the database. Therefore, it did not directly optimize the inter-object separability, which is necessary for mining operations such as searching, clustering, and classification on large databases. In...
Data mining increasingly faces complex challenges in the real-life world of business problems and needs. The gap between business expectations and R&D results in this area involves key aspects of the field, such as methodologies, targeted problems, pattern interestingness, and infrastructure support. Both researchers and practitioners are realizing the importance of domain knowledge to close this gap and develop actionable knowledge for real user needs.
The ever-increasing number of intrusions in public and commercial networks has created the need for high-speed archival solutions that continuously store streaming network data to enable forensic analysis and auditing. However, "turning back the clock" for post-attack analyses is not a trivial task. The first major challenge is that the solution has to sustain data archiving under extremely high-speed insertion rates. Moreover, the archives need to be stored in a format that is compressed but still amenable to indexing. The above requirements make general-purpose databases...
We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has...
We investigate techniques for similarity analysis of spatio-temporal trajectories of mobile objects. Such data may contain a large number of outliers, which degrade the performance of Euclidean and time warping distance. Therefore, we propose the use of non-metric distance functions based on the longest common subsequence (LCSS), in conjunction with a sigmoidal matching function. Finally, we compare these new methods to various L_p norms and also to time warping distance (for real and synthetic data) and present experimental results that validate...
In this paper we present the Threshold Join Algorithm (TJA), which is an efficient top-k query processing algorithm for distributed sensor networks. The objective of a top-k query is to find the k highest ranked answers to a user defined similarity function. The evaluation of such a query in a sensor network environment is associated with the transfer of data over an extremely expensive communication medium. TJA uses a non-uniform threshold on the queried attribute in order to minimize the number of tuples that have to be transferred towards the querying node....
This paper addresses the task of change analysis of correlated multi-sensor systems. The goal of change analysis is to compute the anomaly score of each sensor when we know that the system has some potential difference from a reference state. Examples include validating the proper performance of various car sensors in the automobile industry. We solve this problem based on the neighborhood preservation principle: if the system is working normally, the neighborhood graph of each sensor is almost invariant against the fluctuations of experimental conditions. Here the neighborhood graph is defined based on the correlation between...
The amount of unstructured text-based data is growing every day. Querying, clustering, and classifying this big data requires similarity computations across large sets of documents. Whereas low-complexity similarity metrics are available, attention has been shifting towards more complex methods that achieve a higher accuracy. In particular, the Word Mover's Distance (WMD) method proposed by Kusner et al. is a promising new approach, but its time complexity grows cubically with the number of unique words in the documents. The Relaxed Word Mover's Distance (RWMD)...
The past decade has seen a wealth of research on time series representations. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and that can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more...