Optimizing Program Verification Performance with Reinforcement Learning

Background

The past decades have seen the development of several tools like Viper [1], which enable the automated verification of concurrent, heap-manipulating imperative programs. They achieve this by generating a set of logical constraints whose validity implies the correctness of the given program, and using an SMT solver to automatically prove these constraints.

Internally, however, these tools can use substantially different verification algorithms, e.g. symbolic execution (SE) or predicate transformers, to extract proof obligations, and may represent the heap in different ways. Additionally, SE-based tools often offer a plethora of further configuration options, e.g. to choose how to handle branches or how aggressively to simplify the symbolic state.
For any given program, different algorithms and options may result in substantial differences in verification performance and completeness [2]. A user looking to verify a program could therefore benefit greatly from knowing in advance which algorithm will work best for their use case, but making such a prediction is difficult even for experts.

Further, given a program and a verification algorithm, identifying the combination of verification flags that leads to the fastest verification is challenging and time-consuming even for experts. The problem is analogous to compiler flag selection, which is known to be NP-hard [3]. Experience shows that certain algorithms and flags tend to yield faster verification times for certain classes of programs; unfortunately, these observations rarely carry over to other programs, owing to the diversity and complexity of the verification pipeline.
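To see why exhaustive search over flag combinations is infeasible, note that the number of configurations grows multiplicatively with the number of flags. The per-flag value counts below are purely illustrative, not Viper's actual option space:

```python
# Illustrative only: seven hypothetical flags, each with a few possible
# values. The total number of configurations is the product of the counts,
# so even a modest flag set forces hundreds of timing measurements.
import math

options_per_flag = [2, 3, 3, 2, 4, 2, 3]  # hypothetical value counts
n_configs = math.prod(options_per_flag)
print(n_configs)  # 864 candidate configurations to measure exhaustively
```

With realistic option spaces the product quickly reaches thousands, which is why guided search rather than brute force is needed.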

To reduce this laborious workload and make it accessible to non-expert users and domain professionals, we propose to automate the discovery of optimal verification configurations by building on advances in compiler optimization. Such techniques are referred to in the literature as auto-tuning or auto-scheduling [4, 5]. They rely on classical search algorithms (e.g. depth-first search [5], beam search [6], Monte Carlo tree search [7]) guided by an evaluation function. This evaluation function is often approximated by data-driven predictive models [6, 7, 8, 9, 10, 11, 12], which offer an acceptable trade-off between accuracy and cost, given the significant limitations of analytical models. These predictive models (known as cost models) are largely trained in fully supervised, offline settings.
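The interplay between a search algorithm and a cost model can be sketched as follows. All flag names and the cost model below are hypothetical placeholders (not actual Viper options); a real cost model would be a trained neural network over program and configuration features:

```python
# Sketch of cost-model-guided beam search over verification configurations.
# Hypothetical configuration space: each flag has a small set of values.
FLAG_SPACE = {
    "algorithm": ["symbolic-execution", "predicate-transformers"],
    "branch-handling": ["join", "split"],
    "state-simplification": ["none", "aggressive"],
}

def predicted_cost(program_features, config):
    """Stand-in for a learned cost model estimating verification time.
    Returns a dummy deterministic-per-run score; a real model would be a
    trained network over program and configuration features."""
    return sum(hash((program_features, k, v)) % 100
               for k, v in config.items())

def beam_search(program_features, beam_width=2):
    """Fix flag values one flag at a time, keeping only the `beam_width`
    partial configurations with the lowest predicted cost at each step."""
    beam = [{}]
    for flag, values in FLAG_SPACE.items():
        candidates = [{**cfg, flag: v} for cfg in beam for v in values]
        candidates.sort(key=lambda c: predicted_cost(program_features, c))
        beam = candidates[:beam_width]
    return beam[0]

best = beam_search(program_features="example.vpr")
print(best)  # the full configuration the model predicts to be fastest
```

Beam search explores only beam_width × |values| candidates per flag instead of the full cross product, which is exactly where an accurate cost model pays off.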


Your Contribution

The goal of this project is to build an auto-tuning framework for Viper program verification [1] based on deep reinforcement learning. Your contribution can be outlined in four stages:

  • Proposing a characterization (feature representation) of the program to be verified and its specification, the verification algorithm, and the verification flags.
  • Implementing a search-space exploration algorithm that determines, following a policy, the next-best verification configuration to apply.
  • Designing a deep-learning-based cost model and training it online with a no-regret algorithm to clone the behavior of an expert.
  • Evaluating the framework on real-world benchmarks (such as programs from larger verification efforts or verification conditions), comparing against random configurations and the best known configurations found by experts.
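The online training stage above can be sketched as a dataset-aggregation imitation loop in the spirit of no-regret algorithms such as DAgger. The `Policy` and `Expert` classes below are hypothetical stand-ins: the real policy would be a deep network, and the expert oracle would, e.g., time all configurations exhaustively:

```python
# Sketch of a no-regret online imitation loop: roll out the current policy,
# query the expert for the configuration it would have chosen, aggregate
# the labeled pair, and retrain on all data collected so far.
import random

class Policy:
    """Hypothetical learner mapping program features to a configuration."""
    def __init__(self):
        self.memory = {}
    def featurize(self, prog):
        return prog  # stand-in for real feature extraction
    def choose(self, state):
        return self.memory.get(state, "default-config")
    def fit(self, dataset):
        # Stand-in for gradient training: memorize the expert labels.
        self.memory = dict(dataset)

class Expert:
    """Hypothetical oracle, e.g. exhaustive timing of all configurations."""
    def best_config(self, prog):
        return f"best-for-{prog}"

def train_online(programs, policy, expert, n_rounds=10):
    dataset = []
    for _ in range(n_rounds):
        prog = random.choice(programs)
        state = policy.featurize(prog)
        policy.choose(state)                               # roll out policy
        dataset.append((state, expert.best_config(prog)))  # expert label
        policy.fit(dataset)                                # retrain on all data
    return policy

random.seed(0)  # for reproducibility of this sketch
trained = train_online(["p1.vpr", "p2.vpr"], Policy(), Expert(), n_rounds=50)
print(trained.choose("p1.vpr"))  # after training, mimics the expert
```

Training on the aggregated dataset, rather than only the latest round, is what yields the no-regret guarantee under suitable assumptions.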

Prerequisites

  • Experience with deductive verification and/or SMT solvers, e.g. from FMFP or Program Verification at ETH, is advantageous but not strictly necessary. 
  • Familiarity with reinforcement learning, for example through the course 263-5210-00L Probabilistic Artificial Intelligence, is highly beneficial.

Contact

Marco Eilers

Mohamed-Hicham Leghettas

References


[1] P. Müller, M. Schwerhoff, and A. J. Summers: Viper: A Verification Infrastructure for Permission-Based Reasoning. Verification, Model Checking, and Abstract Interpretation (VMCAI), 2016.
[2] M. Eilers, M. Schwerhoff, and P. Müller: Verification Algorithms for Automated Separation Logic Verifiers. Computer Aided Verification (CAV), 2024.
[3] Mark Stephenson, Saman Amarasinghe, Martin Martin, and Una-May O'Reilly. Meta optimization: Improving compiler heuristics with machine learning. SIGPLAN Not., 38(5):77–90, May 2003.
[4] Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, and Kayvon Fatahalian. Automatically scheduling halide image processing pipelines. ACM Trans. Graph., 35(4), July 2016.
[5] Amir Hossein Ashouri, William Killian, John Cavazos, Gianluca Palermo, and Cristina Silvano. A survey on compiler autotuning using machine learning. CoRR, abs/1801.04405, 2018.
[6] Andrew Adams, Karima Ma, Luke Anderson, Riyadh Baghdadi, Tzu-Mao Li, Michaël Gharbi, Benoit Steiner, Steven Johnson, Kayvon Fatahalian, Frédo Durand, and Jonathan Ragan-Kelley. Learning to optimize halide with tree search and random programs. ACM Trans. Graph., 38(4), July 2019.
[7] Riyadh Baghdadi, Massinissa Merouani, Mohamed-Hicham Leghettas, Kamel Abdous, Taha Arbaoui, Karima Benatchba, and Saman Amarasinghe. A deep learning based cost model for automatic code optimization. In Proceedings of the Fourth Conference on Machine Learning and Systems (MLSys 2021), April 2021.
[8] Christophe Dubach, John Cavazos, Björn Franke, Grigori Fursin, Michael F.P. O’Boyle, and Olivier Temam. Fast compiler optimisation evaluation using code-feature based performance prediction. In Proceedings of the 4th International Conference on Computing Frontiers, CF ’07, page 131–142, New York, NY, USA, 2007. Association for Computing Machinery.
[9] Amir Hossein Ashouri, Andrea Bignoli, Gianluca Palermo, and Cristina Silvano. Predictive modeling methodology for compiler phase-ordering. In Proceedings of the 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and the 5th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms, PARMA-DITAM '16, page 7–12, New York, NY, USA, 2016. Association for Computing Machinery.
[10] Charith Mendis, Saman P. Amarasinghe, and Michael Carbin. Ithemal: Accurate, portable and fast basic block throughput estimation using deep neural networks. CoRR, abs/1808.07412, 2018.
[11] Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. Learning to optimize tensor programs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 3393–3404, Red Hook, NY, USA, 2018. Curran Associates Inc.
[12] Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, and Bharat Kaul. Polyscientist: Automatic loop transformations combined with microkernels for optimization of deep learning primitives. CoRR, abs/2002.02145, 2020.
