The pragmatic guide to creating value with testing for machine learning

There are plenty of resources on the internet about testing in software engineering. However, as a data scientist, the code you need to test is often very different:

- Functions often take and return complicated data structures such as dataframes, arrays, and tensors.
- The code is often very slow to run (e.g. a model that takes hours to train).
- The results of a function can be non-deterministic (e.g. a random forest model or an API call to an ML service).
- The code is often tightly coupled to the data (e.g. a function that preprocesses a dataframe).
- The code is often tightly coupled to the model (e.g. a function that trains a model).
- We often need to test the whole pipeline (e.g. a function that trains a model and then evaluates it).

Note that this article is about functional testing, not evaluation of the model. The goal of testing is to make sure the code works and keeps working. The goal of evaluation is to make sure the model is good enough for the business case. ...

August 30, 2023 · 6 min · Martin Møldrup