Effectively Utilizing the Category Labels for Image Captioning

Closed captioning Boosting
DOI: 10.1587/transinf.2022dlp0013 Publication Date: 2023-04-30T22:23:53Z
ABSTRACT
As a further investigation of the image captioning task, some works have extended vision-text datasets for specific subtasks, such as stylized caption generation. The corpus in such datasets is usually composed of obvious sentiment-bearing words. In some special cases, however, captions are classified depending on the image category. This results in a latent problem: the generated sentences are close in semantic meaning but belong to different or even opposite categories. It is therefore a worthy issue to explore an effective way to utilize the category label to boost the caption difference. In this paper, we propose an image captioning network with a label control mechanism (LCNET). First, to improve the caption difference, LCNET employs a semantic enhancement module to provide the decoder with global semantic vectors. Then, through a label control LSTM, LCNET can dynamically modulate caption generation depending on the category labels. Finally, the decoder integrates spatial image features with the global semantic vectors to output the caption. Evaluation with all standard metrics shows that our model outperforms the compared models. Caption analysis demonstrates that our approach improves the performance of semantic representation. Compared with other label control mechanisms, our model is capable of boosting the caption difference according to the labels while also keeping better consistency with the image content.
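The abstract does not give the equations behind the label control mechanism, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that the category label is embedded and concatenated into the LSTM gate inputs, that the spatial image features are pooled by dot-product attention with the hidden state as query, and that the spatial feature dimension equals the hidden size. All names (`LabelConditionedLSTMCell`, `decode_step`) and all sizes are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LabelConditionedLSTMCell:
    """Hypothetical LSTM cell whose gates also see a category-label embedding."""

    def __init__(self, input_size, hidden_size, num_labels, label_size, seed=0):
        rng = np.random.default_rng(seed)
        # One embedding row per category label (assumed design, not from the paper).
        self.label_embedding = rng.normal(0.0, 0.1, (num_labels, label_size))
        concat_size = input_size + hidden_size + label_size
        # Stacked weights: input gate, forget gate, output gate, candidate cell.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, concat_size))
        self.b = np.zeros(4 * hidden_size)

    def step(self, x, h, c, label):
        # The label embedding enters every gate, so it can modulate each step.
        z = np.concatenate([x, h, self.label_embedding[label]])
        i, f, o, g = np.split(self.W @ z + self.b, 4)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_next = sigmoid(o) * np.tanh(c_next)
        return h_next, c_next


def decode_step(cell, word_vec, global_semantic, spatial_feats, h, c, label):
    """One decoder step combining spatial features with a global semantic vector."""
    # Dot-product attention over image regions (assumes feature dim == hidden size).
    scores = spatial_feats @ h
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    context = weights @ spatial_feats
    x = np.concatenate([word_vec, global_semantic, context])
    return cell.step(x, h, c, label)


# Illustrative sizes and random inputs only; no real data or trained weights.
rng = np.random.default_rng(1)
hidden, word_dim, sem_dim, regions = 8, 6, 5, 4
cell = LabelConditionedLSTMCell(word_dim + sem_dim + hidden, hidden,
                                num_labels=2, label_size=3)
h0, c0 = np.zeros(hidden), np.zeros(hidden)
word = rng.normal(size=word_dim)
semantic = rng.normal(size=sem_dim)
spatial = rng.normal(size=(regions, hidden))

# Same image and same previous word, two different category labels.
h_a, _ = decode_step(cell, word, semantic, spatial, h0, c0, label=0)
h_b, _ = decode_step(cell, word, semantic, spatial, h0, c0, label=1)
```

With identical image inputs, the two labels yield different hidden states. In a full model, a difference like this is what would let captions diverge by category while still being driven by the same image content.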