Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

DOI: 10.48550/arxiv.2307.03817 Publication Date: 2023-01-01
ABSTRACT
Large language models (LLMs) have shown remarkable abilities to generate code; however, their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software, has not been studied. In this paper we develop an extensible, open source hardware-in-the-loop framework to systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) and assess their capabilities and limitations for embedded system development. We observe through our study that even when these tools fail to produce working code, they are consistently helpful for reasoning about embedded design tasks. We leverage this finding to study how human programmers interact with these tools and develop a human-AI based software engineering workflow for building embedded systems. Our evaluation platform for verifying LLM-generated programs uses sensor-actuator pairs for physical evaluation. We compare all three models across N=450 experiments and find, surprisingly, that GPT-4 especially shows an exceptional level of cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt. In N=50 trials, GPT-4 produces functional I2C interfaces 66% of the time. GPT-4 also produces register-level drivers, code for LoRa communication, and context-specific power optimizations for an nRF52 program, resulting in an over 740x current reduction to 12.2 uA. We also characterize the models' limitations to develop a generalizable human-AI workflow for using LLMs in embedded system development. We evaluate our workflow with 15 users, including novice and expert programmers. We find that our workflow improves productivity and increases the success rate for building an environmental sensor from 25% to 100%, including for users with zero hardware or C/C++ experience.
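To give a sense of the scale of the I2C benchmark task described above (functional interfaces in 66% of N=50 GPT-4 trials), the following is a minimal sketch of an I2C register read in Arduino-style C++. The sensor address (0x48), register (0x00), and the Arduino Wire environment are illustrative assumptions, not details taken from the paper or its evaluation framework.

```cpp
#include <Wire.h>

// Hypothetical 7-bit I2C address and 16-bit register of a temperature sensor.
const uint8_t SENSOR_ADDR = 0x48;
const uint8_t TEMP_REG    = 0x00;

// Read one 16-bit big-endian register from the sensor.
// Returns true on success and stores the raw value in *out.
bool readRegister16(uint8_t reg, uint16_t *out) {
  Wire.beginTransmission(SENSOR_ADDR);
  Wire.write(reg);
  if (Wire.endTransmission(false) != 0) {   // repeated start, keep the bus
    return false;
  }
  if (Wire.requestFrom(SENSOR_ADDR, (uint8_t)2) != 2) {
    return false;
  }
  uint8_t msb = Wire.read();                // read bytes in a defined order
  uint8_t lsb = Wire.read();
  *out = ((uint16_t)msb << 8) | lsb;
  return true;
}

void setup() {
  Serial.begin(115200);
  Wire.begin();                             // join the I2C bus as controller
}

void loop() {
  uint16_t raw;
  if (readRegister16(TEMP_REG, &raw)) {
    Serial.println(raw);                    // print the raw register value
  }
  delay(1000);
}
```

In the paper's setup, correctness of such generated code is checked physically via sensor-actuator pairs on the hardware-in-the-loop platform rather than by inspection alone.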