Leveraging Image-Text Similarity and Caption Modification for the DataComp Challenge: Filtering Track and BYOD Track

DOI: 10.48550/arxiv.2310.14581 Publication Date: 2023-01-01
ABSTRACT
Large web crawl datasets have already played an important role in learning multimodal features with high generalization capabilities. However, there are still very limited studies investigating the details or improvements of data design. Recently, a DataComp challenge has been designed to propose the best training data for fixed models. This paper presents our solution to both the filtering track and the BYOD track of the DataComp challenge. Our solution adopts the large multimodal models CLIP and BLIP-2 to filter and modify web crawl data, and utilizes external datasets along with a bag of tricks to improve data quality. Experiments show that our solution significantly outperforms the DataComp baselines (filtering track: 6.6% improvement; BYOD track: 48.5% improvement).
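The abstract does not spell out the exact filtering rule. As an illustration only, the sketch below shows the general idea of image-text similarity filtering. It ranks each pair by the cosine similarity of its image and text embeddings and keeps the top fraction of the pool. It assumes the embeddings were already computed by a CLIP-style image encoder and text encoder. The function names and the `keep_fraction` parameter are hypothetical. Caption modification with BLIP-2 is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_by_similarity(
    image_embs: List[Sequence[float]],
    text_embs: List[Sequence[float]],
    keep_fraction: float = 0.3,  # hypothetical: share of the pool to retain
) -> List[int]:
    """Return sorted indices of the pairs with the highest image-text similarity.

    Assumes image_embs[i] and text_embs[i] are precomputed embeddings
    (e.g. from a CLIP-style model) of the same image-caption pair.
    """
    if len(image_embs) != len(text_embs):
        raise ValueError("image and text embedding lists must align")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Score every pair by how well its caption matches its image.
    scores: List[Tuple[float, int]] = [
        (cosine_similarity(img, txt), i)
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
    ]
    # Keep at least one pair from a non-empty pool.
    n_keep = max(1, math.ceil(keep_fraction * len(scores))) if scores else 0
    scores.sort(key=lambda s: s[0], reverse=True)
    return sorted(i for _, i in scores[:n_keep])


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]
    captions = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    # Pairs 0 and 2 are aligned, while pairs 1 and 3 are mismatched.
    print(filter_by_similarity(images, captions, keep_fraction=0.5))
```

A top-fraction cut is used here instead of a fixed score threshold. This keeps the size of the filtered pool predictable. A fixed threshold is an equally plausible alternative, and the rule the authors actually used is not stated in the abstract.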