Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models

DOI: 10.48550/arxiv.2403.19340
Publication Date: 2024-03-28
ABSTRACT
To address the challenges associated with data processing at scale, we propose Dataverse, a unified open-source Extract-Transform-Load (ETL) pipeline for large language models (LLMs) with a user-friendly design at its core. Easy addition of custom processors with a block-based interface in Dataverse allows users to readily and efficiently use Dataverse to build their own ETL pipeline. We hope that Dataverse will serve as a vital tool for LLM development, and we open source the entire library to welcome community contribution. Additionally, we provide a concise, two-minute video demonstration of our system, illustrating its capabilities and implementation.
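
The block-based idea of custom processors can be illustrated with a minimal sketch. This is not Dataverse's actual API: the `REGISTRY` dictionary, the `register` decorator, the block names, and `run_pipeline` are all hypothetical names chosen for illustration. The sketch only assumes that each processor is a function from a list of records to a list of records, and that a pipeline is an ordered list of block names.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Processor = Callable[[List[Record]], List[Record]]

# Hypothetical registry mapping block names to processor functions.
# Illustrative only; this is not taken from the Dataverse library.
REGISTRY: Dict[str, Processor] = {}


def register(name: str) -> Callable[[Processor], Processor]:
    """Register a processor under a block name (hypothetical helper)."""
    def decorator(fn: Processor) -> Processor:
        if name in REGISTRY:
            raise ValueError(f"block already registered: {name}")
        REGISTRY[name] = fn
        return fn
    return decorator


@register("clean.normalize_whitespace")
def normalize_whitespace(records: List[Record]) -> List[Record]:
    # Collapse runs of whitespace and strip leading/trailing spaces.
    return [{**r, "text": " ".join(r["text"].split())} for r in records]


@register("filter.drop_empty")
def drop_empty(records: List[Record]) -> List[Record]:
    # Remove records whose text is empty.
    return [r for r in records if r["text"]]


@register("dedup.exact_lowercase")
def dedup_exact_lowercase(records: List[Record]) -> List[Record]:
    # Keep the first record for each case-insensitive text value.
    seen = set()
    kept = []
    for r in records:
        key = r["text"].lower()
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def run_pipeline(block_names: List[str], records: List[Record]) -> List[Record]:
    """Apply the named blocks in order; each block consumes the previous output."""
    for name in block_names:
        records = REGISTRY[name](records)
    return records


raw = [{"text": "Hello   world"}, {"text": "hello world"}, {"text": "   "}]
result = run_pipeline(
    ["clean.normalize_whitespace", "filter.drop_empty", "dedup.exact_lowercase"],
    raw,
)
print(result)
```

Under these assumptions, adding a custom step means writing one function and registering it under a new name; the pipeline definition itself stays a plain list of block names.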