FindVehicle and VehicleFinder: a NER dataset for natural language-based vehicle retrieval and a keyword-based cross-modal vehicle retrieval system

FOS: Computer and information sciences; Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); 0502 economics and business; 05 social sciences
DOI: 10.1007/s11042-023-16373-y Publication Date: 2023-08-14T10:03:13Z
ABSTRACT
Natural language (NL) based vehicle retrieval is a task aiming to retrieve the vehicle that is most consistent with a given NL query from among all candidate vehicles. Because NL queries can be easily obtained, such a task has a promising prospect in building an interactive intelligent traffic system (ITS). Current solutions mainly focus on extracting both text and image features and mapping them to the same latent space to compare their similarity. However, existing methods usually use dependency analysis or semantic role-labelling techniques to find keywords related to vehicle attributes. These techniques may require a lot of pre-processing and post-processing work, and also suffer from extracting the wrong keyword when the NL query is complex. To tackle these problems and simplify the process, we borrow the idea of named entity recognition (NER) and construct FindVehicle, a NER dataset in the traffic domain. It has 42.3k labelled NL descriptions of vehicle tracks, containing information such as the location, orientation, type and colour of the vehicle. FindVehicle also adopts both overlapping entities and fine-grained entities to meet further requirements. To verify its effectiveness, we propose a baseline NL-based vehicle retrieval model called VehicleFinder. Our experiment shows that, by using text encoders pre-trained on FindVehicle, VehicleFinder achieves 87.7% precision and 89.4% recall when retrieving a target vehicle by text command on our homemade dataset based on UA-DETRAC [1]. From loading the command into VehicleFinder to identifying whether the target vehicle is consistent with the command, the time cost is 279.35 ms on one ARM v8.2 CPU and 93.72 ms on one RTX A4000 GPU, which is much faster than the Transformer-based system. The dataset is open-source via the link https://github.com/GuanRunwei/FindVehicle, and the implementation can be found via the link https://github.com/GuanRunwei/VehicleFinder-CTIM.
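
To make the idea of overlapping and fine-grained entities concrete, the following is a minimal Python sketch, not the authors' implementation. The label names (`colour`, `type`, `vehicle`, `location`), the helper functions, and the example sentence are all hypothetical and are not taken from the FindVehicle schema. Candidate vehicles are represented here as plain attribute dictionaries, a simplification that stands in for the image-side features a real cross-modal system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical label name, not the dataset's schema


def tag(text: str, phrase: str, label: str) -> Entity:
    """Build an entity span by locating the first occurrence of `phrase`."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


def overlaps(a: Entity, b: Entity) -> bool:
    """True when two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def keywords(text: str, entities: list) -> dict:
    """Map each label to the surface text of its span."""
    return {e.label: text[e.start:e.end] for e in entities}


def retrieve(query: dict, candidates: list, keys=("colour", "type")) -> list:
    """Keep candidates that agree with the query on every key the query mentions."""
    wanted = {k: v for k, v in query.items() if k in keys}
    return [c for c in candidates if all(c.get(k) == v for k, v in wanted.items())]


# Assumed example description; the coarse "vehicle" span overlaps the
# fine-grained "colour" and "type" spans.
text = "a red sedan in the left lane"
entities = [
    tag(text, "red", "colour"),
    tag(text, "sedan", "type"),
    tag(text, "red sedan", "vehicle"),
    tag(text, "left lane", "location"),
]
kw = keywords(text, entities)

# Hypothetical candidate vehicles with already-known attributes.
candidates = [
    {"id": "v1", "colour": "red", "type": "sedan"},
    {"id": "v2", "colour": "blue", "type": "sedan"},
]
hits = retrieve(kw, candidates)
print([c["id"] for c in hits])
```

Once entity spans are available, the keywords can be read directly from the labelled spans, with no dependency parse or semantic role labelling step in between.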