Vassilis Katsouros

ORCID: 0000-0002-4185-2344
Research Areas
  • Music and Audio Processing
  • Music Technology and Sound Studies
  • Handwritten Text Recognition Techniques
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Natural Language Processing Techniques
  • Image Processing and 3D Reconstruction
  • Bayesian Methods and Mixture Models
  • Human Motion and Animation
  • Hand Gesture Recognition Systems
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Vehicle License Plate Recognition
  • Image Retrieval and Classification Techniques
  • Mathematics, Computing, and Information Processing
  • Topic Modeling
  • Neuroscience and Music Perception
  • Digital Media Forensic Detection
  • Tactile and Sensory Interactions
  • Theoretical and Computational Physics
  • Subtitles and Audiovisual Media
  • Machine Learning and Data Classification
  • Data Management and Algorithms
  • Diverse Musicological Studies
  • Enhanced Oil Recovery Techniques

Athena Research and Innovation Center In Information Communication & Knowledge Technologies
2013-2024

Institute for Language and Speech Processing
2015-2024

National Technical University of Athens
2023

Universitat Pompeu Fabra
2021

Imperial Valley College
1999

Industrial Cyber-Physical Systems have benefitted substantially from the introduction of a range of technology enablers. These include web-based and semantic computing, ubiquitous sensing, Internet of Things (IoT) with multi-connectivity, advanced computing architectures and digital platforms, coupled with edge or cloud side data management and analytics, and have contributed to shaping up enhanced or new value chains in manufacturing. While parts of such flows are increasingly automated, there is now a greater demand for more...

10.1016/j.arcontrol.2019.03.004 article EN cc-by Annual Reviews in Control 2019-01-01

A. Vacalopoulou (1), V. Gardelli (2), T. Karafyllidis (3), F. Liwicki (2), H. Mokayed (2), M. Papaevripidou (3), G. Paraskevopoulos (1), S. Stamouli (1), A. Katsamanis, V. Katsouros (1)

10.21125/inted.2024.1877 article EN INTED proceedings 2024-03-01

In this paper, we present tempo estimation and beat tracking algorithms by utilizing percussive/harmonic separation of the audio signal, in order to extract filterbank energies and chroma features from the respective components. Periodicity analysis is carried out by convolution of the feature sequences with a bank of resonators. Target tempo is estimated from the resulting periodicity vector by incorporating metrical relations knowledge. Tempo estimation is followed by a local tempo refinement method to enhance the beat-tracking algorithm. Beat tracking involves...

10.1109/icassp.2012.6287906 article EN 2012-03-01

In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific...

10.1109/icassp48485.2024.10446869 article EN ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024-03-18

This paper reports on high-performance Optical Character Recognition (OCR) experiments using Long Short-Term Memory (LSTM) Networks for Greek polytonic script. Even though there are many Greek polytonic manuscripts, the digitization of such documents has not been widely applied, and very limited work has been done on the recognition of such scripts. We have collected a large number of diverse document pages of Greek polytonic scripts in a novel database, called Polyton-DB, containing 15,689 textlines of synthetic and authentic printed scripts, and performed baseline LSTM...

10.1109/icdar.2015.7333865 article EN 2015-08-01

Recognition of old Greek document images containing polytonic (multi-accent) characters is a challenging task due to the large number of existing character classes (more than 270), which cannot be handled sufficiently by current OCR technologies. Taking into account that the Greek polytonic system was used from late antiquity until recently, a large amount of scanned Greek documents still remains without full text search capabilities. In order to assist the progress of relevant research, this paper introduces the first publicly available...

10.1109/icdar.2015.7333841 article EN 2015-08-01

In this paper, we explore deep learning architectures applied to the air-writing recognition problem, where a person writes text freely in three-dimensional space. We focus on handwritten digits, namely from 0 to 9, which are structured as multidimensional time-series acquired from a Leap Motion Controller (LMC) sensor. We examine both dynamic and static approaches to model the motion trajectory. We train and compare several state-of-the-art convolutional and recurrent architectures. Specifically, we employed a Long Short-Term...

10.1109/icfhr2020.2020.00013 article EN 2020-09-01

This paper addresses the extraction of multipurpose spectral rhythm features that simultaneously tackle a variety of rhythm analysis tasks, namely, dance style classification, meter estimation, and tempo estimation. The term spectral emanates from the origin of the extracted features, which is the periodicity function (PF), a representation that encapsulates the salience of periodicity frequencies. Two dimensionality reduction techniques applied on the PF to extract expressive and compact features are compared, a linear transformation resulting from Principal Component...

10.1109/taslp.2016.2554283 article EN IEEE/ACM Transactions on Audio Speech and Language Processing 2016-04-15

This paper addresses the problem of automatic text-line and word segmentation in handwritten document images. Two novel approaches are presented, one for each task. In text-line segmentation a Viterbi algorithm is proposed, while an SVM-based metric is adopted to locate words in each text-line. The overall algorithm was tested in the ICDAR2007 handwriting segmentation contest and showed highly promising results.

10.1109/icassp.2008.4518379 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01

Automatically synthesizing dance motion sequences is an increasingly popular research task in the broader field of human motion analysis. Recent approaches have mostly used recurrent neural networks (RNNs), which are known to suffer from prediction error accumulation, usually limiting models to synthesize short choreographies of less than 100 poses. In this paper we present a multimodal convolutional autoencoder that combines 2D skeletal and audio information by employing an attention-based feature fusion...

10.1109/access.2022.3169782 article EN cc-by-nc-nd IEEE Access 2022-01-01

A critical issue in the recognition of mathematical expressions is the identification of the spatial relations of the symbols or/and sub-expressions that comprise the entire formula. This paper addresses the problem of structural analysis by constructing appropriate feature vectors to represent the spatial affinity of the objects (mathematical symbols or sub-expressions) under examination and employing two popular machine learning techniques, (i) Support Vector Machines (SVM) and (ii) Artificial Neural Networks (ANN), to recognize the spatial relation between these...

10.1109/icfhr.2014.35 article EN 2014-09-01

Document image segmentation to text lines is a critical stage towards unconstrained handwritten document recognition. Although morphological operations have proved to be effective in processing machine-printed documents for several issues, similar methods for unconstrained handwritten documents lack accuracy. We propose an efficient method based on binary morphology for text-line segmentation of such documents. The basic steps of our approach are: a) sub-sampling and rank order filtering to enhance the text-line structures, and b) applying dilations...

10.1109/icfhr.2010.11 article EN 2010-11-01

This paper discusses the use of BIC with respect to speaker diarization, i.e., the problem of assigning the observation vectors of an audio file to a set of speakers of unknown cardinality. Our primary goals are to examine the two dominant approaches to BIC, namely the global and the local BIC, and to combine the strengths of the two variants into one intuitive criterion, the segmental-BIC. We then consider the asymptotic behavior of the segmental-BIC when dealing with models that are highly misspecified, as the ones commonly used in the diarization task. Our main result is a modified version...

10.1109/jstsp.2010.2048656 article EN IEEE Journal of Selected Topics in Signal Processing 2010-04-20

We present a system for recognizing online mathematical expressions (ME). Symbol recognition is based on a template elastic matching distance between pen direction features. The structural analysis of the ME is based on extracting the baseline of the ME and then classifying symbols into levels above and below the baseline. The symbols are then sequentially analyzed using six spatial relations, and the respective 2D structure is processed to give the resulting MathML representation of the ME. The system was evaluated in the Competition on Recognition of Online Handwritten Mathematical...

10.1109/icfhr.2012.172 article EN International Conference on Frontiers in Handwriting Recognition 2012-09-01

This paper focuses on the hand gesture recognition problem, in which the input is a multidimensional time series signal acquired from a Leap Motion Sensor and the output is a predefined set of gestures. In the present work, we propose the adoption of Convolutional Neural Networks (CNNs), either in combination with a Long Short-Term Memory (LSTM) neural network (i.e. CNN-LSTM), or standalone in a deep architecture (i.e. dCNN), to automate feature learning and classification from the raw input data. The learned features are considered as higher level...

10.23919/eusipco.2019.8902973 article EN 2019 27th European Signal Processing Conference (EUSIPCO) 2019-09-01

Optical Character Recognition (OCR) of ancient Greek polytonic scripts is a challenging task due to the large number of character classes, resulting from variations of diacritical marks on the vowel letters. Classical OCR systems require a character segmentation phase, which in the case of Greek polytonic scripts is the main source of errors that finally affects the overall OCR performance. This paper suggests a character segmentation free HMM-based recognition system and compares its performance with other commercial, open source, and state-of-the-art OCR systems. The evaluation has been...

10.1109/das.2016.60 article EN 2016-04-01

Maintenance management and engineering practice has progressed to adopt approaches which aim to reach maintenance decisions not by means of pre-specified plans and recommendations but increasingly on the basis of the best contextually relevant available information and knowledge, all considered against stated objectives. Different methods for automating event detection, diagnostics and prognostics have been proposed, which may achieve very high performance when appropriately adapted and tuned to serve the needs of well defined...

10.1016/j.ifacol.2016.11.038 article EN IFAC-PapersOnLine 2016-01-01