Lyra
In the past few years, data science has grown considerably in importance and now heavily influences many domains, ranging from computer science to biology, medicine, economics, and finance. As we rely more and more on data science for making decisions, we become increasingly vulnerable to programming errors. The likelihood that errors remain unnoticed is particularly high for data science, as code is often written by domain experts rather than software engineers. In financial applications, flawed code can cause huge monetary losses. In medical applications, programming errors can be deadly.
This research project is the first step in a longer research effort to enhance the reliability of data science code. The goal of the project is to develop foundations for the static analysis of data science code that provide rigorous mathematical guarantees of its behavior. For this purpose, the project targets Python, one of the most popular programming languages for data science.
Project Members
The project has been completed. Please contact Peter Müller in case of questions or comments.
External Collaborators
Links
The project is open-source and available on GitHub.
Publications
- M. Hassan, C. Urban, M. Eilers, and P. Müller, MaxSMT-Based Type Inference for Python 3
In Computer Aided Verification (CAV), 2018.
- C. Urban and P. Müller, An Abstract Interpretation Framework for Input Data Usage
In European Symposium on Programming (ESOP), 2018.
Completed Student Projects
- Radwa Sherif Abdelbar, Bachelor's Thesis, SS 2018
Automatic Checking of Implicit Assumptions on Textual Data
- Lowis Engel, Bachelor's Thesis, SS 2018
Usage Analysis of Data Stored in Map Data Structures
- Madelin Schumacher, Master's Thesis, SS 2017
Automated Generation of Data Quality Checks
- Simon Wehrli, Master's Thesis, SS 2017
Static Program Analysis of Data Usage Properties
- Mostafa Hassan, Bachelor's Thesis, SS 2017
Static Type Inference for Python
Acknowledgments
The Lyra project has been funded by ETH Zurich.