PyTerrier: Declarative Experimentation in Python from BM25 to Dense Retrieval
dense retrieval; experimentation; neural ranking
0202 electrical engineering, electronic engineering, information engineering
02 engineering and technology
DOI:
10.1145/3459637.3482013
Publication Date:
2021-10-30T18:34:11Z
AUTHORS (4)
ABSTRACT
PyTerrier is a Python-based retrieval framework for expressing simple and complex information retrieval (IR) pipelines in a declarative manner. While making use of the long-established Terrier IR platform for basic text indexing and retrieval, its salient utility comes from its expressive Python operators, which allow for individual IR operations to be pipelined and combined in different flexible manners as requested by the search application. Each operation applies a transformation upon a dataframe, while operators are defined with clear semantics in relational algebra. Going further, we have recently expanded the PyTerrier framework to include additional support for state-of-the-art BERT-based text re-rankers (such as EPIC) and dense retrieval implementations (such as ANCE and ColBERT). Transformer pipelines can be tuned and evaluated in a declarative manner. To increase the reusability of this framework as a resource for the IR community, PyTerrier provides easy access to a variety of standard benchmark datasets, including pre-built indices. Finally, we highlight the advantages of such a framework for information retrieval researchers and educators.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (43)
CITATIONS (64)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....