TAP – Trustworthy Automated Programming [~July 2024 – June 2028]

The challenge is to achieve safety and cybersecurity in software without having to author voluminous formal specifications.

Recent developments in automated code generation with large language models (LLMs) provide a fresh perspective in this regard. Since LLM-based code generation allows programming from natural-language specifications, it holds promise for auto-coding and auto-correction (e.g., VulRepair, which automatically repairs vulnerabilities). This raises not only the overall question of the correctness of automatically generated code, but also the question of when we can start trusting it.

In TAP, we tackle the following questions:

  • What evidence is acceptable for integrating LLM-generated code into a software project? Are the acceptability criteria likely to be more stringent for automatically generated code than for human-written code?
  • Can automatic methods be used to improve the safety and security of code, and how? Can repair techniques generate evidence so that the improved code can be accepted into codebases? Specifically, can we use LLMs to generate proofs of correctness (with respect to security properties), improving the code and the proof together?
  • Can we construct a dataset of vulnerabilities in LLM-generated code, and measure how well repair techniques, supplemented by LLM-based repair, autonomously handle GitHub issues arising in large-scale, real-life library code?
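To make the second question concrete, the sketch below shows one minimal form such repair evidence could take: candidate patches (stand-ins for LLM-generated repairs) are accepted only if they pass an executable oracle encoding the desired security property. All names and the oracle here are illustrative assumptions, not TAP's actual tooling.

```python
# Hypothetical repair-validation loop: accept an LLM-proposed patch only
# when it passes an oracle that encodes the security property.

# Original buggy code: an unchecked index allows out-of-bounds access.
BUGGY = "def get_item(xs, i):\n    return xs[i]\n"

# Candidate repairs; in TAP these would be produced by an LLM.
CANDIDATES = [
    # Plausible but wrong: wrapping the index silently leaks other data.
    "def get_item(xs, i):\n    return xs[i % len(xs)]\n",
    # Correct: out-of-bounds accesses are rejected explicitly.
    "def get_item(xs, i):\n"
    "    if 0 <= i < len(xs):\n"
    "        return xs[i]\n"
    "    return None\n",
]

def oracle(fn):
    """Evidence check: in-bounds reads are preserved,
    out-of-bounds reads must not return data."""
    try:
        if fn([1, 2, 3], 1) != 2:
            return False          # functionality regressed
        if fn([1, 2, 3], 5) is not None:
            return False          # still leaks out-of-bounds data
        return True
    except Exception:
        return False

def first_accepted(candidates):
    """Return the first candidate patch that the oracle accepts."""
    for src in candidates:
        ns = {}
        exec(src, ns)             # load the candidate definition
        if oracle(ns["get_item"]):
            return src
    return None

accepted = first_accepted(CANDIDATES)
print(accepted is not None)       # prints True: second candidate passes
```

In practice, the oracle would be replaced by stronger evidence, such as machine-checkable proofs of the security property, so that acceptance does not rest on a finite test suite alone.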