BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge

DOI: 10.48550/arxiv.2308.16458 Publication Date: 2023-01-01
ABSTRACT
Pre-trained large language models have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate large language models (LLMs) in generating bioinformatics-specific code. BioCoder spans a broad spectrum of the field and covers cross-file dependencies, class declarations, and global variables. It incorporates 1026 Python functions and 1243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We applied it to many models, including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT-4. Furthermore, we finetuned models on our dataset, demonstrating how it can effectively enhance the performance of LLMs on our benchmark (by >15% in terms of Pass@K under certain prompt configurations, and always >3%). The results highlight two key aspects of successful models: (1) Successful models accommodate a long (> ~2600 tokens) context with functional dependencies. (2) They contain specific knowledge of bioinformatics, beyond just general coding knowledge. This is evident from the performance gain of GPT-3.5/4 compared to smaller models (50% vs up to ~25%). Our dataset, benchmark, Docker images, and scripts required for testing are available at https://github.com/gersteinlab/biocoder.
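The abstract reports results in terms of Pass@K, the standard metric for code-generation benchmarks. As a minimal illustration, the sketch below implements the widely used unbiased Pass@K estimator (1 - C(n-c, k)/C(n, k)); whether BioCoder's evaluation scripts compute it exactly this way is an assumption here, and the function name is illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated per problem
    c: samples that pass all tests
    k: evaluation budget
    Returns the probability that at least one of k randomly
    drawn samples (without replacement) passes.
    """
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k
        # samples must include at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 10 samples of which 2 pass, `pass_at_k(10, 2, 1)` yields 0.2, matching the intuitive per-sample pass rate at k=1.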