Some Like It Small: Czech Semantic Embedding Models for Industry Applications
FOS: Computer and information sciences
Computer Science - Computation and Language
9. Industry and infrastructure
Computation and Language (cs.CL)
Information Retrieval (cs.IR)
Computer Science - Information Retrieval
DOI: 10.1609/aaai.v38i21.30307
Publication Date: 2024-03-25T13:03:20Z
AUTHORS (4)
ABSTRACT
This article focuses on the development and evaluation of Small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled Czech data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced their previous counterparts, enhancing the overall search experience, for instance in organic search, featured snippets, and image search. This transition has yielded improved performance.