A Strategy-Aware Technique for Learning Behaviors from Discrete Human Feedback
Trainer
DOI:
10.1609/aaai.v28i1.8839
Publication Date:
2022-06-23T10:03:14Z
AUTHORS (7)
ABSTRACT
This paper introduces two novel algorithms for learning behaviors from human-provided rewards. The primary novelty of these is that instead treating the feedback as a numeric reward signal, they interpret form discrete communication depends on both behavior trainer trying to teach and teaching strategy used by trainer. For example, some human trainers use lack indicate whether actions are correct or incorrect, interpreting this accurately can significantly improve speed. Results user studies show humans variety training strategies in practice learn contextual bandit task faster than treat numeric. Simulated also employed evaluate sequential decision-making tasks with similar results.
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES (0)
CITATIONS (15)
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....