Offline prompt reinforcement learning method based on feature extraction
DOI: 10.7717/peerj-cs.2490
Publication Date: 2025-01-02
AUTHORS (5)
ABSTRACT
Recent studies have shown that combining Transformers with conditional policies for offline reinforcement learning can yield better results. However, in a conventional setting the agent receives single frames of observations one by one, in their natural chronological order, whereas a Transformer receives a whole series at each step. As a result, individual features cannot be extracted efficiently enough to make accurate decisions, and it remains difficult to generalize effectively to out-of-distribution data. We focus on the few-shot characteristics of pre-trained models and combine them with prompts to enhance the ability to adjust the policy in real time. By sampling task-specific information from the dataset as trajectory samples, the task is encoded to help the model quickly understand the characteristics of the sequence-generation paradigm and adapt to downstream tasks. To capture dependencies accurately, we also divide the input trajectory into fixed-size state blocks, extract features from the segmented sub-blocks separately, and finally encode the whole sequence into a GPT model to generate decisions more accurately. Experiments show that the proposed method achieves better performance than the baselines on related tasks, generalizes better to new environments and tasks, and improves the stability and accuracy of decision making.
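For concreteness, below is a minimal PyTorch sketch of the pipeline the abstract describes: a prompt trajectory sampled from the offline dataset is prepended to the current trajectory, the input states are divided into fixed-size blocks whose features are extracted separately, and the resulting tokens are passed through a GPT-style causal Transformer that generates actions. All class names, dimensions, and hyperparameters are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code) of prompt-conditioned, block-wise
# trajectory encoding followed by a causal Transformer that predicts actions.
# All names, dimensions, and hyperparameters below are assumptions.

import torch
import torch.nn as nn


class BlockFeatureExtractor(nn.Module):
    """Encodes each fixed-size block of states into a single token."""

    def __init__(self, state_dim: int, block_size: int, embed_dim: int):
        super().__init__()
        self.block_size = block_size
        self.mlp = nn.Sequential(
            nn.Linear(state_dim * block_size, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        # states: (batch, timesteps, state_dim); for this simplified sketch
        # the number of timesteps must be a multiple of block_size.
        b, t, d = states.shape
        blocks = states.reshape(b, t // self.block_size, self.block_size * d)
        return self.mlp(blocks)  # (batch, num_blocks, embed_dim)


class PromptBlockDecisionModel(nn.Module):
    """Prompt tokens + trajectory block tokens -> causal Transformer -> actions."""

    def __init__(self, state_dim: int, act_dim: int, block_size: int = 4,
                 embed_dim: int = 128, n_layers: int = 3, n_heads: int = 4):
        super().__init__()
        self.extractor = BlockFeatureExtractor(state_dim, block_size, embed_dim)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.gpt = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.action_head = nn.Linear(embed_dim, act_dim)

    def forward(self, prompt_states: torch.Tensor,
                states: torch.Tensor) -> torch.Tensor:
        # Encode the few-shot prompt and the current trajectory with the
        # same block-wise extractor, then concatenate along the sequence.
        prompt_tokens = self.extractor(prompt_states)
        traj_tokens = self.extractor(states)
        tokens = torch.cat([prompt_tokens, traj_tokens], dim=1)
        # Causal mask so each block only attends to itself and earlier blocks.
        n = tokens.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool,
                                     device=tokens.device), diagonal=1)
        hidden = self.gpt(tokens, mask=mask)
        # Predict one action per block of the current trajectory.
        return self.action_head(hidden[:, prompt_tokens.size(1):])


# Example: a prompt of 8 states and a trajectory of 16 states, block size 4.
model = PromptBlockDecisionModel(state_dim=11, act_dim=3)
prompt = torch.randn(2, 8, 11)
traj = torch.randn(2, 16, 11)
actions = model(prompt, traj)  # (2, 4, 3): one action per trajectory block

In this sketch one action is predicted per state block; the paper's model may instead predict actions per timestep and condition on returns and past actions in the style of Decision Transformer, which this simplified example omits.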