NFDI4DS | UHH-SEMS - Publication Details

Some Like It Small: Czech Semantic Embedding Models for Industry Applications

FOS: Computer and information sciences; Computation and Language (cs.CL); Information Retrieval (cs.IR); 9. Industry and infrastructure

DOI: 10.1609/aaai.v38i21.30307

Publication Date: 2024-03-25T13:03:20Z

AUTHORS (4)

Jiří Bednář

Jakub Náplava

Petra Barančíková

Ondřej Lisický

ABSTRACT

This article focuses on the development and evaluation of Small-sized Czech sentence embedding models. Small models are important components for real-time industry applications in resource-constrained environments. Given the limited availability of labeled data, alternative approaches, including pre-training, knowledge distillation, and unsupervised contrastive fine-tuning, are investigated. Comprehensive intrinsic and extrinsic analyses are conducted, showcasing the competitive performance of our models compared to significantly larger counterparts, with approximately 8 times smaller size and 5 times faster speed than conventional Base-sized models. To promote cooperation and reproducibility, both the models and the evaluation pipeline are made publicly accessible. Ultimately, this article presents practical applications of the developed sentence embedding models in Seznam.cz, the Czech search engine. These models have effectively replaced their previous counterparts, enhancing the overall search experience, for instance in organic search, featured snippets, and image search. This transition has yielded improved performance.
