Copyright Violations and Large Language Models
Permission
Redistribution
Memorization
Fair Use
Code (set theory)
DOI:
10.48550/arxiv.2310.13771
Publication Date:
2023-01-01
AUTHORS (4)
ABSTRACT
Language models may memorize more than just facts, including entire chunks of texts seen during training. Fair use exemptions to copyright laws typically allow for limited use of copyrighted material without permission from the copyright holder, but typically for extraction of information from copyrighted materials, rather than verbatim reproduction. This work explores the issue of copyright violations and large language models through the lens of verbatim memorization, focusing on possible redistribution of copyrighted text. We present experiments with a range of language models over a collection of popular books and coding problems, providing a conservative characterization of the extent to which language models can redistribute these materials. Overall, this research highlights the need for further examination and the potential impact on future developments in natural language processing to ensure adherence to copyright regulations. Code is at https://github.com/coastalcph/CopyrightLLMs.