EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

DOI: 10.48550/arxiv.2401.17690 Publication Date: 2024-01-31
ABSTRACT
We propose EnCLAP, a novel framework for automated audio captioning. EnCLAP employs two acoustic representation models, EnCodec and CLAP, along with a pretrained language model, BART. We also introduce a new training objective called masked codec modeling that improves acoustic awareness of the pretrained language model. Experimental results on AudioCaps and Clotho demonstrate that our model surpasses the performance of baseline models. Source code will be available at https://github.com/jaeyeonkim99/EnCLAP . An online demo is available at https://huggingface.co/spaces/enclap-team/enclap
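
The abstract only names the components, so here is a minimal, self-contained PyTorch sketch of how such a pipeline might fit together: discrete EnCodec codes supply a time-step-level representation, a CLAP embedding supplies a sequence-level representation, and both feed a BART-style encoder-decoder that generates the caption. This is not the authors' implementation from the linked repository. The class name EnCLAPSketch, all dimensions, the toy Transformer standing in for pretrained BART, the use of a single codebook (EnCodec actually produces several parallel codebooks via residual vector quantization), and the fusion strategy of prepending the projected CLAP embedding to the token sequence are all illustrative assumptions; the masked codec modeling objective mentioned in the abstract is likewise omitted.

import torch
import torch.nn as nn

class EnCLAPSketch(nn.Module):
    """Hypothetical sketch of an EnCLAP-style captioning model."""

    def __init__(self, codec_vocab=1024, clap_dim=512, d_model=768):
        super().__init__()
        # Embed discrete EnCodec codes (single codebook for simplicity).
        self.codec_embed = nn.Embedding(codec_vocab, d_model)
        # Project the sequence-level CLAP audio embedding into the
        # encoder's hidden space.
        self.clap_proj = nn.Linear(clap_dim, d_model)
        # Toy encoder-decoder standing in for pretrained BART.
        self.seq2seq = nn.Transformer(
            d_model=d_model, batch_first=True,
            num_encoder_layers=2, num_decoder_layers=2,
        )
        self.lm_head = nn.Linear(d_model, 50265)  # BART-sized vocabulary

    def forward(self, codec_ids, clap_emb, caption_emb):
        # codec_ids: (B, T) discrete codes; clap_emb: (B, clap_dim);
        # caption_emb: (B, L, d_model) embedded caption tokens.
        frames = self.codec_embed(codec_ids)                # (B, T, d)
        global_tok = self.clap_proj(clap_emb).unsqueeze(1)  # (B, 1, d)
        # Fuse by prepending the CLAP vector as an extra "token".
        enc_in = torch.cat([global_tok, frames], dim=1)     # (B, T+1, d)
        dec_out = self.seq2seq(enc_in, caption_emb)         # (B, L, d)
        return self.lm_head(dec_out)                        # token logits

model = EnCLAPSketch()
logits = model(
    torch.randint(0, 1024, (2, 100)),  # dummy EnCodec codes
    torch.randn(2, 512),               # dummy CLAP embedding
    torch.randn(2, 20, 768),           # dummy caption embeddings
)
print(logits.shape)  # torch.Size([2, 20, 50265])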