Kornel Laskowski

ORCID: 0009-0008-8060-6913
Research Areas
  • Speech and Dialogue Systems
  • Speech Recognition and Synthesis
  • Speech and Audio Processing
  • Phonetics and Phonology Research
  • Music and Audio Processing
  • Language, Discourse, Communication Strategies
  • Topic Modeling
  • Natural Language Processing Techniques
  • Humor Studies and Applications
  • Language, Metaphor, and Cognition
  • Sentiment Analysis and Opinion Mining
  • Opinion Dynamics and Social Influence
  • Emotion and Mood Recognition
  • Multi-Agent Systems and Negotiation
  • Video Analysis and Summarization
  • Neural Networks and Applications
  • Complex Network Analysis Techniques
  • Linguistic Studies and Language Acquisition
  • Text and Document Classification Technologies
  • Machine Learning in Healthcare
  • Advanced Text Analysis Techniques
  • Animal Vocal Communication and Behavior
  • Language and Cultural Evolution
  • Advanced Adaptive Filtering Techniques
  • Linguistic Variation and Morphology

Affiliations

Human Immunome Project
2024

The Human Diagnosis Project
2024

Voci (United States)
2016-2019

Stockholm University
2019

Carnegie Mellon University
2006-2017

KTH Royal Institute of Technology
2011

Karlsruhe University of Education
2007-2008

Karlsruhe Institute of Technology
2006-2007

Publications

Automatic detection of emotions has been evaluated using standard Mel-frequency cepstral coefficients (MFCCs) and a variant, MFCC-low, calculated between 20 and 300 Hz in order to model pitch. Plain pitch features have also been used. These acoustic features are all modeled by Gaussian mixture models (GMMs) on the frame level. The method has been tested on two different corpora and languages: Swedish voice-controlled telephone services and English meetings. The results indicate that frame-level GMM modeling is a feasible technique for emotion...

10.21437/interspeech.2006-277 article EN Interspeech 2006 2006-09-17
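
As a rough illustration of the frame-level GMM approach above: one GMM per emotion class is fit on MFCC frames, and an utterance goes to the class with the highest summed log-likelihood. The feature dimensionality, class labels, and mixture size below are assumptions for the sketch, not the paper's configuration.

import numpy as np
from sklearn.mixture import GaussianMixture

def train_class_gmms(frames_by_class, n_components=16, seed=0):
    # Fit one diagonal-covariance GMM per emotion class on stacked MFCC frames
    # (each value is an array of shape n_frames x n_dims).
    return {label: GaussianMixture(n_components=n_components,
                                   covariance_type="diag",
                                   random_state=seed).fit(frames)
            for label, frames in frames_by_class.items()}

def classify_utterance(gmms, frames):
    # Sum per-frame log-likelihoods under each class GMM; pick the best class.
    scores = {label: gmm.score_samples(frames).sum() for label, gmm in gmms.items()}
    return max(scores, key=scores.get)

# Toy usage with synthetic 13-dimensional stand-ins for MFCC frames.
rng = np.random.default_rng(0)
train = {"neutral": rng.normal(0.0, 1.0, (500, 13)),
         "negative": rng.normal(0.5, 1.2, (500, 13))}
gmms = train_class_gmms(train)
print(classify_utterance(gmms, rng.normal(0.5, 1.2, (80, 13))))  # -> "negative"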

An important task in automatic conversation understanding is the inference of the social structure governing participant behavior. We explore the dependence between several social dimensions, including assigned role, gender, and seniority, and a set of low-level features descriptive of talkspurt deployment in a multiparticipant context. Experiments conducted on two large, publicly available meeting corpora suggest that our features are quite useful for predicting these dimensions, excepting gender. The classification experiments we present...

10.3115/1622064.1622094 article EN 2008-01-01
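
A minimal sketch of the kind of low-level talkspurt-deployment features involved, fed to an off-the-shelf classifier; the three features and the role labels are illustrative assumptions, not the paper's feature set.

import numpy as np
from sklearn.linear_model import LogisticRegression

def talkspurt_features(segments, meeting_dur):
    # segments: list of (start, end) talkspurts in seconds for one participant.
    durs = np.array([e - s for s, e in segments]) if segments else np.zeros(1)
    return np.array([
        durs.sum() / meeting_dur,     # fraction of meeting time spent vocalizing
        len(segments) / meeting_dur,  # talkspurt rate (spurts per second)
        durs.mean(),                  # mean talkspurt duration in seconds
    ])

# Toy usage: two participants with hand-specified segmentations.
X = np.stack([
    talkspurt_features([(0, 30), (40, 90), (100, 160)], 300.0),  # talkative
    talkspurt_features([(200, 205), (250, 252)], 300.0),         # quiet
])
y = np.array([1, 0])  # hypothetical labels, e.g. 1 = "manager", 0 = "other"
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))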

Automatic speech understanding in natural multiparty conversation settings stands to gain from parsing not only verbal but also non-verbal vocal communicative behaviors. In this work, we study the most frequently annotated such behavior, laughter, whose detection has clear implications for speech understanding tasks, and for the automatic recognition of affect in particular. To complement existing acoustic descriptions of the phenomenon, we explore the temporal patterning of laughter over the course of a conversation, with a view towards its...

10.21437/interspeech.2007-395 article EN Interspeech 2007 2007-08-27
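
A small sketch of one way to examine such temporal patterning: a laughter-bout rate profile over normalized conversation time. The bout times and bin count are invented for the example.

import numpy as np

def laughter_rate_profile(bout_times, meeting_dur, n_bins=10):
    # Histogram of laughter-bout onsets over normalized conversation time,
    # converted to bouts per second within each time bin.
    normalized = np.asarray(bout_times) / meeting_dur
    counts, _ = np.histogram(normalized, bins=n_bins, range=(0.0, 1.0))
    return counts / (meeting_dur / n_bins)

bouts = [12.0, 15.5, 300.2, 305.0, 306.8, 870.1]  # onset times in seconds
print(laughter_rate_profile(bouts, meeting_dur=900.0))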

This work explores the timing of very short utterances in conversations, as well as the effects of excluding intervals adjacent to such utterances from distributions of between-speaker interval durations. The results show that very short utterances are more precisely timed to the preceding utterance than longer utterances are, in terms of a smaller variance and a larger proportion of no-gap-no-overlaps. Excluding them furthermore results in measures of central tendency closer to zero (i.e. no-gap-no-overlaps), as well as a larger spread (i.e. relatively more gaps and overlaps). Index Terms: Human speech production, Prosody

10.21437/interspeech.2011-710 article EN Interspeech 2011 2011-08-27
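
A sketch of the underlying analysis: comparing between-speaker interval statistics with and without intervals adjacent to very short utterances (VSUs). The 1-second VSU threshold and the toy data are assumptions for illustration.

import numpy as np

VSU_MAX_DUR = 1.0  # assumed threshold for a "very short utterance", in seconds

def interval_stats(intervals, next_utt_durs):
    # intervals: between-speaker offsets in s (negative = overlap, 0 = no gap/overlap).
    # next_utt_durs: duration of the utterance following each interval.
    intervals, next_utt_durs = map(np.asarray, (intervals, next_utt_durs))
    keep = next_utt_durs > VSU_MAX_DUR  # drop intervals that precede a VSU
    for name, x in (("all", intervals), ("VSUs excluded", intervals[keep])):
        print(f"{name:14s} median={np.median(x):+.3f}s  var={x.var():.3f}")

interval_stats(intervals=[-0.40, 0.05, 0.00, 0.30, -0.05, 0.80],
               next_utt_durs=[0.4, 2.1, 0.6, 3.0, 0.8, 2.5])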

The taking of turns to speak is an intrinsic property of conversation. It is expected that models of turn-taking, by providing a prior distribution over conversational form, can reduce the perplexity of what is attended to and processed by spoken dialogue systems. We propose a single-port model of multi-party turn-taking which allows conversants to behave independently but to condition their behavior on the past of the entire group. The model performs at least as well as an existing multi-port model in predicting subsequent speech activity. We quantify the effect of longer histories...

10.1109/icassp.2011.5947629 article EN 2011-05-01
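
A minimal sketch of the single-port idea under a one-frame history (the paper quantifies longer histories): all participants share one conditional model of next-frame speech activity, given their own preceding state and whether anyone else in the group was speaking.

import numpy as np

def fit_single_port(activity):
    # activity: (n_frames, n_speakers) binary speech/non-speech matrix.
    # Returns P(speak at t | own state at t-1, any-other-speaker state at t-1),
    # with counts pooled across all participants (the single shared "port").
    counts = np.zeros((2, 2, 2))  # indexed [own_prev, others_prev, own_now]
    T, K = activity.shape
    for t in range(1, T):
        for k in range(K):
            own_prev = activity[t - 1, k]
            others_prev = int(activity[t - 1].sum() - own_prev > 0)
            counts[own_prev, others_prev, activity[t, k]] += 1
    return counts[..., 1] / counts.sum(axis=-1).clip(min=1)

rng = np.random.default_rng(1)
toy = (rng.random((1000, 4)) < 0.15).astype(int)
P = fit_single_port(toy)
print(P)  # e.g. P[1, 0]: prob. of continuing to speak when holding the floor alone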

As spoken dialogue systems become deployed in increasingly complex domains, they face rising demands on the naturalness of their interaction. We focus on system responsiveness, aiming to mimic human-like dialogue flow control by predicting speaker changes as observed in real human-human conversations. We derive an instantaneous vector representation of pitch variation and show that it is amenable to standard acoustic modeling techniques. Using a small amount of automatically labeled data, we train models which significantly...

10.1109/icassp.2008.4518791 article EN Proceedings of the ... IEEE International Conference on Acoustics, Speech, and Signal Processing 2008-03-01

The detection of laughter in conversational interaction presents an important challenge for meeting understanding, primarily because laughter is predictive of the emotional state of participants. We present evidence which suggests that ignoring unvoiced laughter improves the prediction of involvement from collocated speech, making a case for the distinction between voiced and unvoiced laughter during detection. Our experiments show that the exclusion of unvoiced laughter from model training, as well as its explicit modeling, lead to scores that are much higher than those otherwise obtained when all...

10.1109/icassp.2009.4960696 article EN IEEE International Conference on Acoustics, Speech and Signal Processing 2009-04-01
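
A sketch of a simple voiced/unvoiced decision that could gate laughter segments, using a normalized-autocorrelation periodicity test; the 0.3 threshold and the 50-400 Hz pitch search range are common heuristics, not values from the paper.

import numpy as np

def is_voiced(frame, sr, fmin=50.0, fmax=400.0, threshold=0.3):
    # Return True if the frame shows a strong periodicity peak in the pitch range.
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False
    ac = ac / ac[0]                          # normalize by zero-lag energy
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag range covering 50-400 Hz
    return ac[lo:hi].max() > threshold

sr = 16000
t = np.arange(int(0.03 * sr)) / sr
voiced_laugh = np.sin(2 * np.pi * 220 * t)                     # periodic, pitched
unvoiced_laugh = np.random.default_rng(0).normal(size=t.size)  # breathy, noisy
print(is_voiced(voiced_laugh, sr), is_voiced(unvoiced_laugh, sr))  # True False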

We propose an algorithm for segmenting multispeaker meeting audio, recorded with personal channel microphones, into speech and non-speech intervals for each microphone’s wearer. An algorithm of this type turns out to be necessary prior to subsequent audio processing because, in spite of the close-talking microphones, the channels exhibit a high degree of crosstalk due to unbalanced calibration and small inter-speaker distance. The proposed algorithm is based on the short-time cross-correlation of all channel pairs. It requires no training and executes in one fifth of real...

10.21437/interspeech.2004-350 article EN Interspeech 2004 2004-10-04
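
A sketch of the core operation: when two personal channels are strongly cross-correlated within a frame, the speech is attributed to the higher-energy channel, its likely wearer. The thresholds here are illustrative, not the published settings.

import numpy as np

def attribute_frame(frames, corr_thresh=0.3, energy_floor=1e-4):
    # frames: (n_channels, frame_len) synchronous samples.
    # Returns a boolean speech/non-speech decision per channel for this frame.
    K = frames.shape[0]
    energy = (frames ** 2).mean(axis=1)
    active = energy > energy_floor
    for i in range(K):
        for j in range(i + 1, K):
            xc = np.correlate(frames[i], frames[j], mode="full")
            norm = np.sqrt((frames[i] ** 2).sum() * (frames[j] ** 2).sum()) + 1e-12
            if np.abs(xc).max() / norm > corr_thresh:  # strong crosstalk i <-> j
                quieter = i if energy[i] < energy[j] else j
                active[quieter] = False                # silence the quieter channel
    return active

rng = np.random.default_rng(2)
speech = rng.normal(size=512)
ch_wearer = speech                                                 # own microphone
ch_bleed = 0.3 * np.roll(speech, 5) + 0.02 * rng.normal(size=512)  # crosstalk
print(attribute_frame(np.stack([ch_wearer, ch_bleed])))  # [ True False]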

The study of meetings, and of multi-party conversation in general, is currently the focus of much attention, calling for more robust and accurate speech activity detection systems. We present a novel multichannel speech activity detection algorithm which explicitly models the overlap incurred by participants taking turns at speaking. Parameters of the overlapped states are estimated during decoding by combining knowledge from other states observed in the same meeting, in an unsupervised manner. We demonstrate on NIST Rich Transcription Spring 2004 data...

10.1109/icassp.2006.1660190 article EN 2006-08-02
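
A speculative sketch of the joint state space involved: each state is a binary vocalization vector over participants, and overlapped-state emission means are composed from single-speaker statistics estimated on the same meeting. The per-channel log-energy feature and the additive crosstalk composition are simplifying assumptions; decoding is omitted.

import numpy as np
from itertools import product

def joint_state_means(solo_on, solo_off):
    # solo_on/solo_off: per-channel mean log-energy when the wearer is/isn't
    # speaking alone, each of shape (K,). Returns {state tuple: mean vector}.
    K = len(solo_on)
    crosstalk = 0.25 * (solo_on - solo_off)  # assumed bleed into other channels
    means = {}
    for state in product((0, 1), repeat=K):
        s = np.array(state)
        mean = np.where(s == 1, solo_on, solo_off)
        mean = mean + (s.sum() - s) * crosstalk  # each *other* speaker adds bleed
        means[state] = mean
    return means

means = joint_state_means(solo_on=np.array([3.0, 3.2, 2.8]),
                          solo_off=np.array([0.5, 0.4, 0.6]))
print(means[(1, 1, 0)])  # composed mean for an overlap of speakers 0 and 1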

In recent years, the field of automatic speaker identification has begun to exploit high-level sources of speaker-discriminative information, in addition to traditional models of spectral shape. These sources include pronunciation models, prosodic dynamics, pitch, pause, and duration features, phone streams, and conversational interaction. As part of this broader thrust, we explore a new frame-level vector representation of instantaneous change in fundamental frequency, known as fundamental frequency variation (FFV). The FFV...

10.1109/icassp.2009.4960640 article EN IEEE International Conference on Acoustics, Speech and Signal Processing 2009-04-01
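
A sketch of the FFV idea: compare the magnitude spectra of the left and right halves of an analysis frame under a small grid of frequency dilations; the best-matching dilation reflects the instantaneous rate of pitch change. The windowing and dilation grid are illustrative, not the published filterbank design.

import numpy as np

def ffv_vector(frame, dilations=np.linspace(0.95, 1.05, 7)):
    # Per-dilation cosine similarity between left- and right-half spectra.
    half = len(frame) // 2
    left = np.abs(np.fft.rfft(frame[:half] * np.hanning(half)))
    right = np.abs(np.fft.rfft(frame[half:] * np.hanning(half)))
    bins = np.arange(len(left))
    out = []
    for rho in dilations:
        right_dilated = np.interp(bins * rho, bins, right, right=0.0)
        den = np.linalg.norm(left) * np.linalg.norm(right_dilated) + 1e-12
        out.append(float(left @ right_dilated) / den)
    return np.array(out)

sr, n = 16000, 1024                        # 64 ms frame; halves 32 ms apart
t = np.arange(n) / sr
f0 = 120.0 + 120.0 * t                     # F0 rising ~3% between half centers
phase = 2 * np.pi * np.cumsum(f0) / sr
frame = sum(np.sin(k * phase) / k for k in range(1, 30))  # harmonic-rich source
v = ffv_vector(frame)
# For rising pitch, the peak lies above the center index of the dilation grid.
print(v.round(3), "peak at dilation index", v.argmax())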

This paper describes the 2006 lecture recognition system developed at the Interactive Systems Laboratories (ISL) for the individual head-microphone (IHM), single distant microphone (SDM), and multiple distant microphone (MDM) conditions. It was evaluated in the RT-06S rich transcription meeting evaluation sponsored by the US National Institute of Standards and Technology (NIST). We describe the principal differences between our current system and those submitted in previous years, namely improved acoustic and language models, cross...

10.21437/interspeech.2006-370 article EN Interspeech 2006 2006-09-17

This paper describes the Interactive Systems Lab’s meeting transcription system, which performs segmentation and speaker clustering as well as transcription of conversational meeting speech. The system described here was evaluated in NIST’s RT-04S “Meeting” speech evaluation. The paper compares the performance of our Broadcast News and our most recent Switchboard systems on meeting data, both with and without a newly-trained recognizer. Furthermore, we investigate the effects of automatic segmentation and adaptation. Our best system achieves 44.5% WER in the MDM condition

10.21437/interspeech.2004-186 article EN Interspeech 2004 2004-10-04

The field of speaker identification has recently seen significant advancement, but improvements have tended to be benchmarked on near-field speech, ignoring the more realistic setting of far-field-instrumented speakers. In this work we present several findings for far-field speech from the MIXER5 Corpus, in the areas of feature extraction, modeling, and multichannel score combination. First, we observe that minimum-variance distortionless response (MVDR) features outperform Mel-frequency cepstral coefficient...

10.1109/icassp.2010.5495590 article EN IEEE International Conference on Acoustics, Speech and Signal Processing 2010-01-01
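
A sketch of the MVDR spectral envelope, S(w) = 1 / (v(w)^H R^-1 v(w)) for steering vector v(w) and Toeplitz autocorrelation matrix R, which can stand in for the FFT power spectrum ahead of cepstral analysis. The model order and frequency grid are illustrative assumptions.

import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(frame, order=20, n_freqs=257):
    # MVDR spectral envelope of one windowed frame on a linear frequency grid.
    x = frame * np.hanning(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]  # lags 0..order
    R = toeplitz(r)                            # (order+1) x (order+1) autocorr matrix
    R += 1e-6 * r[0] * np.eye(order + 1)       # diagonal loading for stability
    Rinv = np.linalg.inv(R)
    omegas = np.pi * np.arange(n_freqs) / (n_freqs - 1)
    V = np.exp(-1j * np.outer(np.arange(order + 1), omegas))  # steering vectors
    denom = np.einsum("if,ij,jf->f", V.conj(), Rinv, V).real  # v^H R^-1 v per freq
    return 1.0 / np.maximum(denom, 1e-12)

rng = np.random.default_rng(3)
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 500 * t) + 0.1 * rng.normal(size=t.size)
S = mvdr_spectrum(frame)
print(S.argmax())  # peak near the bin for 500 Hz: 500/8000 * 256 = 16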