Indoor Scene Change Captioning Based on Multimodality Data

DOI: 10.3390/s20174761 Publication Date: 2020-08-24T01:28:06Z
ABSTRACT
This study proposes a framework for describing a scene change using natural language text, based on indoor scene observations conducted before and after the change. The recognition of scene changes plays an essential role in a variety of real-world applications, such as anomaly detection. Most scene understanding research has focused on static scenes, and most existing scene change captioning methods detect changes from single-view RGB images, neglecting the underlying three-dimensional structures. Previous three-dimensional scene change captioning methods use simulated scenes consisting of geometry primitives, making them unsuitable for real-world applications. To solve these problems, we automatically generated large-scale indoor scene change caption datasets. We propose an end-to-end framework that describes scene changes from various input modalities, namely, RGB images, depth images, and point cloud data, which are available in most robot applications. We conducted experiments with various input modalities and models, and evaluated model performance on datasets with varying levels of complexity. Experimental results show that models combining RGB images and point cloud data as input achieve high sentence generation performance and caption correctness, and are robust to change type on datasets with high complexity. The developed datasets and models contribute to the study of indoor scene change understanding.
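The core idea of the abstract, fusing features from multiple modalities (RGB and point cloud) for the before and after observations and comparing them to isolate the change, can be sketched as follows. This is a minimal illustration, not the paper's model: the encoders are hypothetical stand-ins (global pooling in place of a CNN and a PointNet-style network), and all names and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_rgb(img):
    # Hypothetical stand-in for a CNN image encoder:
    # global average pooling over all pixels -> per-channel feature.
    return img.reshape(-1, img.shape[-1]).mean(axis=0)

def encode_points(pts):
    # Hypothetical stand-in for a PointNet-style encoder:
    # symmetric max pooling over the point set -> per-dimension feature.
    return pts.max(axis=0)

def change_embedding(before, after):
    # Fuse the modalities of each observation by concatenation,
    # then subtract to obtain a representation of what changed;
    # a caption decoder would consume this vector.
    f_before = np.concatenate([encode_rgb(before["rgb"]),
                               encode_points(before["pc"])])
    f_after = np.concatenate([encode_rgb(after["rgb"]),
                              encode_points(after["pc"])])
    return f_after - f_before

# Toy "before" and "after" indoor observations (random stand-in data).
before = {"rgb": rng.random((8, 8, 3)), "pc": rng.random((128, 3))}
after = {"rgb": before["rgb"] * 0.9,          # lighting-like change
         "pc": before["pc"] + 0.1}            # object displacement
z = change_embedding(before, after)
print(z.shape)  # -> (6,): 3 RGB channels + 3 point-cloud dims
```

In a real system the two encoders would be learned jointly with the caption decoder end-to-end; the sketch only shows the fuse-then-compare structure the abstract describes.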