NFDI4DS | UHH-SEMS - Publication Details

Zhuoyao Li

ORCID: 0000-0003-0295-0897

Contact & Profiles

OpenAlex ID: A5066542435

Research Areas

Music and Audio Processing
Music Technology and Sound Studies
Speech and Audio Processing
Hearing Loss and Rehabilitation
Multisensory perception and integration

National University of Singapore
2023

Towards Controllable Audio Texture Morphing

OPENALEX - Publications

Chitralekha Gupta Purnima Kamath Yize Wei Zhuoyao Li Suranga Nanayakkara and 1 more

In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft-labels" distilled from the penultimate layer of an audio classifier trained on a target set of texture classes. We demonstrate that interpolation between such conditions or control vectors provides smooth morphing between the generated textures, and shows similar or better capability compared to state-of-the-art methods. The proposed approach results in a well-organized latent space that generates novel outputs while...
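The interpolation between control vectors mentioned in this abstract can be pictured with a minimal sketch. It assumes plain linear interpolation and made-up three-dimensional soft-label vectors; the paper's actual conditioning vectors, interpolation scheme, and generator are not given in this listing, and `interpolate_conditions` is a hypothetical helper introduced only for illustration.

```python
from typing import List


def interpolate_conditions(a: List[float], b: List[float], steps: int) -> List[List[float]]:
    """Return `steps` vectors moving linearly from `a` to `b`, endpoints included."""
    if len(a) != len(b):
        raise ValueError("control vectors must have the same length")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # mixing weight, 0.0 at `a` and 1.0 at `b`
        path.append([(1.0 - t) * x + t * y for x, y in zip(a, b)])
    return path


# Hypothetical soft-label vectors for two texture classes (illustrative values only).
soft_a = [0.9, 0.1, 0.0]
soft_b = [0.1, 0.2, 0.7]

# Each vector on the path would be fed to a conditional generator in turn
# (the generator itself is out of scope for this sketch).
morph_path = interpolate_conditions(soft_a, soft_b, steps=5)
```

In a setup like the one the abstract describes, each intermediate vector would serve as the condition for one generated texture, so stepping along the path yields a gradual morph.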

10.1109/icassp49357.2023.10096328 article EN ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023-05-05

Evaluating Descriptive Quality of AI-Generated Audio Using Image-Schemas

OPENALEX - Publications

Purnima Kamath Zhuoyao Li Chitralekha Gupta Kokil Jaidka Suranga Nanayakkara and 1 more

Novel AI-generated audio samples are evaluated for descriptive qualities such as the smoothness of a morph using crowdsourced human listening tests. However, methods to design interfaces for such experiments and to effectively articulate the quality under test receive very little attention in the evaluation metrics literature. In this paper, we explore the use of visual metaphors of image-schema to evaluate audio. Furthermore, we highlight the importance of framing and contextualizing measurement constructs. Using both pitched sounds...

10.1145/3581641.3584083 article EN 2023-03-27

Towards Controllable Audio Texture Morphing

OPENALEX - Publications

Chitralekha Gupta Purnima Kamath Yize Wei Zhuoyao Li Suranga Nanayakkara and 1 more

In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft-labels" distilled from the penultimate layer of an audio classifier trained on a target set of texture classes. We demonstrate that interpolation between such conditions or control vectors provides smooth morphing between the generated textures, and shows similar or better capability compared to state-of-the-art methods. The proposed approach results in a well-organized latent space that generates novel outputs while...

10.48550/arxiv.2304.11648 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01
