- Music and Audio Processing
- Speech and Audio Processing
- Music Technology and Sound Studies
- Neuroscience and Music Perception
- Speech Recognition and Synthesis
- Video Analysis and Summarization
- Time Series Analysis and Forecasting
- Diverse Musicological Studies
- Human Pose and Action Recognition
- Data Management and Algorithms
- Human Motion and Animation
- Hand Gesture Recognition Systems
- Internet of Things and Social Network Interactions
- Voice and Speech Disorders
- Radiomics and Machine Learning in Medical Imaging
- Traffic Prediction and Management Techniques
- Human Mobility and Location-Based Analysis
- Peer-to-Peer Network Technologies
- Physical Unclonable Functions (PUFs) and Hardware Security
- Semantic Web and Ontologies
- Advanced Malware Detection Techniques
- Indoor and Outdoor Localization Technologies
- Reinforcement Learning in Robotics
- Engineering Applied Research
- Network Security and Intrusion Detection
University of Piraeus
2015-2024
Capital University
2023
Institute for Language and Speech Processing
2018-2022
Innovative Technologies Center (Greece)
2022
TU Wien
2020
National Technical University of Athens
2008-2017
National and Kapodistrian University of Athens
1998-2009
Hella (Germany)
2008
Athens State University
2006-2007
Signal Processing (United States)
1998
In this paper, we present a novel Deep Neural Network-based indoor localization method that estimates the position of a Bluetooth Low Energy (BLE) transmitter (tag) by using the received signals' characteristics at multiple Anchor Points (APs). We use the received signal strength indicator (RSSI) value and the in-phase and quadrature-phase (IQ) components of the BLE signals at a single time instance to simultaneously estimate the angle of arrival (AoA) at all APs. Through supervised learning on simulated data, various machine learning (ML)...
Forecasting vessel locations is of major importance in the maritime domain, with applications in safety, logistics, etc. Nowadays, tracking has become possible largely due to the increased GPS-based data availability. This paper introduces a novel Vessel Location Forecasting (VLF) framework, based on Long-Short Term Memory (LSTM) Neural Networks, aiming to perform effective location forecasting in time horizons up to 60 minutes, even for vessels not recorded in the past. The proposed VLF framework is specially designed for handling...
The fingerprinting technique is a popular approach to reveal the location of persons, instruments or devices in an indoor environment. Typically based on signal strength measurement, a power level map is created first in the learning phase to align with the measured values in the inference. Second, the location is determined by taking the point for which the recorded received signal strength is closest to the one actually measured. The biggest limit of this technique is the reliability of the measurements, which may lack accuracy in many wireless systems. To this end, this work extends the measurement using multiple...
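The two phases described in this abstract (build a signal strength map, then pick the closest recorded point) can be illustrated with a minimal nearest-neighbour sketch. This is not the paper's extended method: the map contents, the three-anchor layout, the Euclidean distance and the names `RADIO_MAP` and `locate` are all assumptions made for illustration.

```python
import math

# Hypothetical radio map from the learning phase (values are made up):
# position (x, y) in metres -> recorded RSSI in dBm, one value per anchor.
RADIO_MAP = {
    (0.0, 0.0): [-40.0, -70.0, -65.0],
    (5.0, 0.0): [-55.0, -50.0, -72.0],
    (0.0, 5.0): [-60.0, -75.0, -45.0],
    (5.0, 5.0): [-68.0, -58.0, -52.0],
}


def locate(measured, radio_map=RADIO_MAP):
    """Inference phase: return the map position whose recorded RSSI vector
    is closest (Euclidean distance) to the measured RSSI vector."""
    return min(radio_map, key=lambda pos: math.dist(radio_map[pos], measured))


if __name__ == "__main__":
    print(locate([-54.0, -52.0, -70.0]))
```

The result can only be one of the surveyed map points, so the resolution of such a scheme is bounded by how densely the map was recorded.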
In this paper we present a novel method for extracting affective information from movies, based on speech data. The method is based on a 2D representation of emotions (Emotion Wheel). The goal is twofold. First, to investigate whether the Emotion Wheel offers a good representation of emotions associated with speech signals. To this end, several humans have manually annotated speech data from movies using the Emotion Wheel and the level of disagreement has been computed as a measure of representation quality. The results indicate that the emotion wheel offers a good representation of emotions in speech data. Second, a regression approach is adopted, in order to predict the location of an...
This paper presents a multistage system for speech/music discrimination which is based on a three-step procedure. The first step is a computationally efficient scheme consisting of a region growing technique and operates on a 1-D feature sequence, extracted from the raw audio stream. This scheme is used as a preprocessing stage and yields segments with high music and speech precision at the expense of leaving certain parts of the audio recording unclassified. The unclassified parts of the audio stream are then fed as input to a more computationally demanding scheme. The latter treats radio...
In this work, we present a multi-class classification algorithm for audio segments recorded from movies, focusing on the detection of violent content, for protecting sensitive social groups (e.g. children). Towards this end, we have used twelve audio features stemming from the nature of the signals under study. In order to classify the audio segments into six classes (three of them violent), Bayesian networks have been used in combination with the one versus all architecture. The overall system has been trained and tested on a large data set (5000 audio segments), more than 30...
This paper presents a new extension to the variable duration hidden Markov model (HMM), capable of classifying musical patterns that have been extracted from raw audio data into a set of predefined classes. Each pattern is converted into a sequence of music intervals by means of a fundamental frequency tracking procedure. This sequence is subsequently presented as input to a set of variable-duration HMMs. Each one of these models has been trained to recognize the patterns of the corresponding class. Classification is determined based on the highest recognition probability....
This paper presents a method for detecting violent content in video sharing sites. The proposed approach operates on a fusion of three modalities: audio, moving image and text data, the latter being collected from the accompanying user comments. The problem is treated as a binary classification task (violent vs non-violent content) on a 9-dimensional feature space, where 7 out of the 9 features are extracted from the audio stream. The proposed method has been evaluated on 210 YouTube videos and the overall accuracy has reached 82%.
This paper treats gunshot detection in audio streams from movies as a maximization task, where the solution is obtained by means of dynamic programming. The proposed method seeks the sequence of segments and the respective class labels, i.e., gunshots vs. all other audio types, that maximize the product of posterior class label probabilities, given the segments' data. The required posterior probabilities are estimated by combining the soft classification decisions from a set of Bayesian Network combiners. Tests have been performed on a large data set and indicate that the proposed method yields...
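The maximization described in this abstract can be sketched as a dynamic program over segment boundaries. The sketch below is only an illustration under stated assumptions: `segment_posterior` is a toy stand-in (mean of made-up per-frame gunshot probabilities) for the Bayesian Network combiners, and the segment length limits `min_len` and `max_len` are hypothetical parameters that do not come from the paper.

```python
import math


def segment_posterior(frame_probs, start, end):
    """Toy stand-in for a classifier (assumption, not the paper's combiner):
    returns (label, posterior) for frames[start:end] from per-frame
    gunshot probabilities."""
    mean = sum(frame_probs[start:end]) / (end - start)
    return ("gunshot", mean) if mean >= 0.5 else ("other", 1.0 - mean)


def best_segmentation(frame_probs, min_len=1, max_len=4):
    """Find the segmentation and labels that maximize the product of segment
    posteriors, computed as a sum of logs for numerical stability."""
    n = len(frame_probs)
    best = [-math.inf] * (n + 1)  # best[t]: best log-score for frames[:t]
    back = [None] * (n + 1)       # back[t]: (start, label) of the last segment
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == -math.inf:
                continue  # no valid segmentation reaches this start
            label, post = segment_posterior(frame_probs, start, end)
            score = best[start] + math.log(max(post, 1e-12))
            if score > best[end]:
                best[end] = score
                back[end] = (start, label)
    if n > 0 and back[n] is None:
        raise ValueError("no segmentation satisfies the length limits")
    # Backtrack from the end of the stream to recover the segments.
    segments, end = [], n
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    return segments[::-1]


if __name__ == "__main__":
    print(best_segmentation([0.1, 0.1, 0.9, 0.9, 0.1, 0.1]))
```

Because every posterior is at most 1, the product formulation favours fewer, purer segments, which is why the example stream splits into exactly three segments.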
Automatically synthesizing dance motion sequences is an increasingly popular research task in the broader field of human motion analysis. Recent approaches have mostly used recurrent neural networks (RNNs), which are known to suffer from prediction error accumulation, usually limiting models to synthesize short choreographies of less than 100 poses. In this paper we present a multimodal convolutional autoencoder that combines 2D skeletal and audio information by employing attention-based feature fusion...
Automatic recognition of musical patterns plays a crucial part in musicological and ethnomusicological research and can become an indispensable tool for the search and comparison of music extracts within a large multimedia database. This paper presents an efficient method for recognizing isolated musical patterns in a monophonic environment, using a novel extension of dynamic time warping, which we call context dependent dynamic time warping. Each pattern, to be recognized, is converted into a sequence of frequency jumps by means of a fundamental frequency tracking...
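For orientation, the classic dynamic time warping that this abstract extends can be sketched as follows. This is the standard textbook algorithm, not the context dependent variant proposed in the paper. The absolute-difference local cost, the reference patterns and the names `dtw_distance` and `recognize` are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences, using the absolute
    difference as local cost and the usual three-way step pattern."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(pattern, references):
    """Return the name of the reference pattern with the lowest DTW cost."""
    return min(references, key=lambda name: dtw_distance(pattern, references[name]))


if __name__ == "__main__":
    # Hypothetical reference patterns, expressed as frequency jumps in semitones.
    refs = {"A": [2, -2, 2], "B": [0, 1, 0]}
    print(recognize([2, 2, -2, 2], refs))
```

Warping lets a pattern with a repeated jump (for example a held or re-articulated note) still match its reference at zero cost.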
In this paper, we present a study on the efficiency of neural networks for the hard problem of automatically classifying voice disorders. To this end, convolutional architectures combined with feed-forward neural networks are used for the classification of four types of voice disorders. Speech signals and data from medical records, collected by the Far Eastern Memorial Hospital (FEMH), involving four types of speech pathologies (functional dysphonia, phonotrauma, laryngeal neoplasm and unilateral vocal paralysis), were analyzed and the proposed method participated at the FEMH Voice...