Copyright Violations and Large Language Models

Keywords: Permission; Redistribution; Memorization; Fair Use; Code (set theory)

DOI: 10.48550/arxiv.2310.13771
Publication Date: 2023-01-01

AUTHORS (4)

Antonia Karamolegkou

Jiaang Li

Li Zhou

Anders Søgaard

ABSTRACT

Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than verbatim reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at https://github.com/coastalcph/CopyrightLLMs.
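
As an illustration of what measuring verbatim memorization can look like in practice, the following is a minimal sketch that scores a model's output by the longest run of consecutive words it shares with a reference text. The abstract does not state which metric the authors use, so the function name `longest_common_word_run`, the whitespace tokenization, and the choice of a longest-contiguous-run score are all assumptions made for this example; the linked repository is the place to look for the actual evaluation code.

```python
def longest_common_word_run(reference: str, generated: str) -> int:
    """Return the length, in words, of the longest contiguous word
    sequence that appears in both texts.

    Hypothetical helper for illustration only: whitespace tokenization
    and exact, case-sensitive matching are simplifying assumptions.
    """
    ref = reference.split()
    gen = generated.split()
    best = 0
    # prev[j] holds the length of the common run ending at ref[i-2], gen[j-1].
    prev = [0] * (len(gen) + 1)
    for i in range(1, len(ref) + 1):
        curr = [0] * (len(gen) + 1)
        for j in range(1, len(gen) + 1):
            if ref[i - 1] == gen[j - 1]:
                # Extend the run that ended at the previous word pair.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Public-domain example text (Dickens) standing in for a book passage.
reference = "it was the best of times it was the worst of times"
generated = "he said it was the best of times indeed"
print(longest_common_word_run(reference, generated))
```

A longer shared run suggests that more of the reference was reproduced word for word. A score like this only counts exact matches, so it is a conservative measure in the sense that paraphrased or lightly altered reproductions are not detected.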
