In this article, I will introduce a development process that can make an incredible difference in individual and team productivity. By focusing on getting feedback faster, my team and I have greatly increased our speed of delivery, and we are learning faster. Focusing on fast feedback is an important mindset that, although simple, is often overlooked or deprioritized. By distinguishing between inner and outer development loops, it becomes much easier to see when faster feedback is needed and how to achieve it. I will also introduce a version of this development process tweaked for AI projects. The framework will help you identify which habits and practices to focus on to improve your development process, both for traditional software development and for AI projects.
Small habits and practices are what separate good developers from great ones. Changing habits takes effort and time, but it is worth it.
Let’s start by giving you a quick overview of what we will be covering. Below you can see the two development processes that we will be introducing in this article:
As shown in the figure at a high level, the development process consists of two main parts:
- Inner Loop(s): Focuses on fast development and testing. For AI projects, I recommend having many inner loops of different cycle times, each focusing on a specific aspect of the AI system, such as data preprocessing, coding, model evaluation, etc.
- Outer Loop: Automation for comprehensive integration testing, user acceptance testing, deployment, and continuous monitoring. For AI projects, the outer loop is concerned with the entire pipeline from data collection to model deployment, including steps like data curation, model training, and model evaluation.
When developing AI projects, a well-defined development process suited to the work can greatly increase both the speed of development and the quality of the product. The traditional software development process is well known and is often described as having an inner and an outer loop. This distinction matters because being efficient in both loops is crucial for the success of any software project, including AI projects. In this article, we will tweak the traditional software development loop to better suit AI projects while keeping its core principles intact. But first, we need to understand the traditional loop.
The Traditional Software Development Loop
Let’s start by understanding the traditional software development loop. Data scientists and AI engineers are a special kind of software developer, and if you are building full applications for production, they will often contain parts that are best managed with the traditional software development loop, so let’s make sure we understand it.
As mentioned, the traditional software development loop consists of two main parts: the inner loop and the outer loop. The inner loop focuses on the development and testing of the code, while the outer loop focuses on the deployment and monitoring of the application.
Inner Loop
During development, it is important to start with the inner loop(s) and move out to the outer loop to get fast feedback. If the inner loop fails, the outer loop will also fail.
The inner loop typically consists of the following steps:
- 🟦 Code: Writing the actual code for the application, with help from linters and type checkers. If you apply test-driven development (TDD), writing tests is also part of this step.
- 🟥 Test: Running tests to ensure the code works as expected.
- 🟪 Debug: Identifying and fixing any issues or bugs in the code.
- ⬜ Refactor: Improving the code structure without changing its functionality.
- Repeat: Repeating the cycle to continuously improve the code until it meets the required standards.
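A minimal sketch of one pass through this cycle in Python, using pytest-style test functions (plain functions whose names start with test_ and use bare assert statements, which pytest collects automatically). The slugify function is a hypothetical example chosen only for illustration.

```python
import re


def slugify(title: str) -> str:
    """Turn a title into a URL-friendly slug (hypothetical example function)."""
    # Code step: lower-case and replace every run of non-alphanumeric characters with a hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Refactor step: trimming leading/trailing hyphens is kept in one obvious place
    return slug.strip("-")


# Test step: tiny, fast tests that give feedback in milliseconds
def test_slugify_basic():
    assert slugify("Inner and Outer Loops") == "inner-and-outer-loops"


def test_slugify_strips_punctuation():
    assert slugify("  Hello, World!  ") == "hello-world"
```

Saved as test_slugify.py, running pytest -q on it should finish in well under a second, which is the kind of feedback speed the inner loop is about. In TDD you would write the two tests first, watch them fail, and only then write slugify.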
Tweaking the inner loops requires skill, time, and effort: you need to learn how to write well-structured code, how to write good tests, how to debug effectively, and how to refactor code without introducing new bugs.
Ways to have really fast inner loops:
- Use debugging tools, like pdb or a breakpoint in your IDE, then use the interactive debug console to write new code, or copy the failing code into the console to iterate quickly on different solutions. Use dir() and help() on objects to understand them better, or the inspect module for more advanced introspection.
- Split your code into small functions and test them individually. Become really good at writing unit tests, mocking out dependencies, managing test data, and using fixtures.
- Use test-driven development (TDD) principles to get quick feedback on your code
- Become good at using the generative AI tools built into the IDE (for instance GitHub Copilot or the Cursor IDE)
- Use a Jupyter notebook to avoid the overhead of running the whole script (not recommended)
- Develop a small script that can be run from the command line to test or validate a certain behaviour of the code. But make sure to use unit and integration tests where suitable.
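The advice about small functions, mocking, and fixtures can be sketched with Python's standard unittest module: setUp provides the test fixture (unittest's own term for it), and unittest.mock.Mock replaces a slow external dependency. The function summarize_orders and the client's fetch_orders method are hypothetical names used only for this sketch.

```python
import unittest
from unittest.mock import Mock


def summarize_orders(client) -> dict:
    """Hypothetical function under test: aggregates orders fetched from an external API client."""
    orders = client.fetch_orders()
    return {"count": len(orders), "total": sum(order["amount"] for order in orders)}


class SummarizeOrdersTest(unittest.TestCase):
    def setUp(self):
        # Fixture: small, hand-written test data keeps the test fast and deterministic
        self.orders = [{"amount": 10.0}, {"amount": 5.5}]
        # Mock the external dependency so the test never touches the network
        self.client = Mock()
        self.client.fetch_orders.return_value = self.orders

    def test_summary(self):
        result = summarize_orders(self.client)
        self.assertEqual(result, {"count": 2, "total": 15.5})
        self.client.fetch_orders.assert_called_once()

    def test_empty(self):
        self.client.fetch_orders.return_value = []
        self.assertEqual(summarize_orders(self.client), {"count": 0, "total": 0})
```

Run it with python -m unittest or with pytest, which also collects unittest.TestCase classes.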
The inner loop is where you should spend most of your time, so it should be a good experience to work in. Set up your IDE so that working there is fun and easy, with GenAI assistance, formatters, linters, type checkers, automation scripts, and other tools that remove friction from the development process. Check out toolit, my open-source Python library that makes invoking development and automation tools easy and friction-free; it also automatically makes your tools available in a Model Context Protocol (MCP) server for use with AI agents (GitHub Copilot or the Cursor IDE).
Outer Loop
Outer loops can be implemented in many different ways, but the following steps are the simplest version of the outer loop that I have found to work well. It can be expanded with more steps for improved quality assurance.
The outer loop includes the following steps:
- 🟥 Automated QA Pipeline: Running automated quality assurance tests to ensure the code meets the required standards. This can include static code analysis, code formatting checks, unit tests, and more.
- 🟨 Deploy @Sand: Deploying the code to a sandbox environment for further testing.
- 🟥 Automated System Testing: Running automated tests (for instance smoke tests) to check the basic functionality of the application.
- 🟩 PR Code Review: Conducting a peer review of the code changes through pull requests.
- 🟨 Deploy @Test: Deploying the code to a test environment for more comprehensive testing.
- 🟥 Manual End2End Testing: Performing manual end-to-end testing to ensure the application works as expected from start to finish.
- 🟨 Deploy @Prod: Deploying the code to the production environment.
- 🟩 Monitor: Monitoring the application in production to ensure it runs smoothly and to identify any issues.
- Repeat: We move back to the inner loop if any issues are found and repeat the cycle.
This loop ensures that the software is developed, tested, and deployed in a systematic and efficient manner, leading to higher-quality and more reliable applications. Because much of the process is automated, you can deploy often and be confident that the code works as expected.
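In practice the outer loop runs in a CI/CD system, but its core property fits in a few lines: stages run in a fixed order, and the first failure stops the loop and sends you back to the inner loop. This is a minimal Python sketch; the Stage class, run_outer_loop, and the lambda stage bodies are hypothetical placeholders for real QA, deployment, and test jobs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True if the stage passed


def run_outer_loop(stages: list[Stage]) -> list[str]:
    """Run stages in order, stop at the first failure, and return the names of passed stages."""
    passed: list[str] = []
    for stage in stages:
        if not stage.run():
            print(f"{stage.name} failed: back to the inner loop")
            break
        passed.append(stage.name)
    return passed


if __name__ == "__main__":
    stages = [
        Stage("Automated QA Pipeline", lambda: True),
        Stage("Deploy @Sand", lambda: True),
        Stage("Automated System Testing", lambda: False),  # e.g. a failing smoke test
        Stage("PR Code Review", lambda: True),  # never reached
    ]
    print(run_outer_loop(stages))
```

Ordering the cheap, automated stages first means most failures surface before anyone spends time on code review or manual end-to-end testing.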
Tweaking the development loops for AI projects
Developing an AI system is a bit different from traditional software development. An AI system is often a combination of several non-deterministic components working together to solve a specific problem. Each of these components has many hyperparameters and configuration options that affect the behaviour and accuracy of the AI system. This means we need a different approach to development than the one described above. These additional dimensions to test and evaluate need separate loops that measure and evaluate the behaviour of each element in isolation from the rest of the system. As before, the inner loops must be fast enough to let us iterate quickly on improving the AI system.
Inner loops for AI projects
The inner loop is even more important in AI projects. In AI projects, I prefer to have many inner loops of different cycle times. Examples of inner loops are:
- Data preprocessing loop, including data cleaning, feature engineering, and data augmentation. For faster development, work on a small subset of the data to confirm that the code works, then scale up to the full dataset.
- Coding loop including linting, type checking, and unit testing
- Model behavioural testing loop. It can be a good idea to have local scripts that test specific aspects of the model. For an LLM RAG system, this could be how good the retrieval is, how well it reads tables, or how well it rewrites questions based on the chat history.
- Model evaluation loop, measuring accuracy, precision, recall, F1 score, etc. This is a longer loop that gives the overall performance and should be run in the cloud, where the results are stored to ensure lineage and reproducibility and to keep a history of the model’s performance. This makes it easier for team members to understand what has been done and why.
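A behavioural check for the retrieval part of a RAG system can be a tiny local script. The sketch below assumes a hypothetical toy corpus, a word-overlap retrieve function standing in for a real retriever, and a hand-picked threshold of 0.9. It measures hit rate at k: the fraction of questions whose expected document appears in the top k results.

```python
import re
from typing import Callable

# Hypothetical toy corpus; in a real project this would live in your vector store
CORPUS = {
    "doc-refund": "Refunds are issued within 14 days of the return.",
    "doc-shipping": "Standard shipping takes 3 to 5 business days.",
    "doc-warranty": "The warranty covers manufacturing defects for two years.",
}

# Small hand-written evaluation set: (question, ID of the document that should be retrieved)
EVAL_SET = [
    ("How long does shipping take?", "doc-shipping"),
    ("When are refunds issued?", "doc-refund"),
    ("What does the warranty cover?", "doc-warranty"),
]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, k: int = 2) -> list[str]:
    """Toy stand-in for a real retriever: ranks documents by word overlap with the question."""
    q = tokens(question)
    ranked = sorted(CORPUS, key=lambda doc_id: len(q & tokens(CORPUS[doc_id])), reverse=True)
    return ranked[:k]


def hit_rate_at_k(eval_set: list[tuple[str, str]], retriever: Callable[[str, int], list[str]], k: int = 2) -> float:
    """Fraction of questions whose expected document appears in the top-k results."""
    hits = sum(expected in retriever(question, k) for question, expected in eval_set)
    return hits / len(eval_set)


if __name__ == "__main__":
    score = hit_rate_at_k(EVAL_SET, retrieve)
    print(f"hit rate@2: {score:.2f}")
    # Hypothetical acceptance threshold for this behaviour
    assert score >= 0.9, "retrieval regressed"
```

Because the check isolates retrieval from generation, it runs in milliseconds and can be rerun after every change to chunking, embeddings, or retriever settings.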
There should be a focus on optimizing the inner loops to cover important aspects of the AI system and provide quick feedback.
Keeping your development loops small and focused becomes even more critical as your project grows. Multiple loops at different abstraction levels help you test specific aspects of your AI system efficiently. Always start with the fastest, most targeted loop that can give you the feedback you need.
Principle: Use the smallest, quickest loop that answers your current question or validates your code.
If your code fails in a fast inner loop, it will also fail in the slower, broader loops. Mastering these disciplines and choosing the right loop for each task is key to building robust AI systems fast.
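Applied to the data preprocessing loop, this principle often means developing against a small, reproducible sample before running on the full dataset. A minimal sketch, assuming the data fits in a Python list; sample_rows and the clean step are hypothetical names, and a fixed seed keeps the subset identical between runs.

```python
import random


def sample_rows(rows: list, fraction: float, seed: int = 42) -> list:
    """Return a reproducible random subset of rows for fast inner-loop runs."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed: the same subset on every run
    n = min(len(rows), max(1, round(len(rows) * fraction)))
    return rng.sample(rows, n)


def clean(rows: list[str]) -> list[str]:
    # Hypothetical preprocessing step: strip whitespace and drop empty records
    return [row.strip() for row in rows if row.strip()]


if __name__ == "__main__":
    rows = [f" record {i} " for i in range(10_000)] + ["   "]
    print(len(clean(sample_rows(rows, fraction=0.01))))  # iterate on ~1% of the data
    print(len(clean(rows)))  # scale up once the code works
```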
I have developed a Python package called snappylapy, a tool that makes it easy to speed up the inner loops. It simplifies managing data in unit tests when you split a script into separate steps that can be tested individually. It solves the dilemma of needing tests to be independent while also testing the integration of the different steps and keeping test data up to date. It handles snapshot testing, fuzzy expectations, and much more. Read more about it here: snappylapy at GitHub.
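The snapshot-testing idea can be illustrated without any library. The following is a hand-rolled sketch, not snappylapy’s API: assert_matches_snapshot writes a step’s output to a JSON file on the first run and compares against it afterwards, and the test for the second (hypothetical) pipeline step reads the first step’s stored snapshot as its input.

```python
import json
from pathlib import Path

SNAPSHOT_DIR = Path("__snapshots__")  # hypothetical location for stored snapshots


def assert_matches_snapshot(value, name: str, snapshot_dir: Path = SNAPSHOT_DIR, update: bool = False) -> None:
    """Compare a JSON-serializable value with its stored snapshot.

    On the first run (or when update=True) the snapshot is written instead of compared.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    serialized = json.dumps(value, indent=2, sort_keys=True)
    if update or not path.exists():
        path.write_text(serialized, encoding="utf-8")
        return
    if serialized != path.read_text(encoding="utf-8"):
        raise AssertionError(f"Output differs from snapshot {path}")


def load_step(raw: str) -> list[dict]:
    # Hypothetical pipeline step 1: parse "name,score" lines into records
    return [{"name": n, "score": float(s)} for n, s in (line.split(",") for line in raw.splitlines())]


def rank_step(records: list[dict]) -> list[str]:
    # Hypothetical pipeline step 2: names ordered by descending score
    return [r["name"] for r in sorted(records, key=lambda r: r["score"], reverse=True)]


def test_load_step():
    assert_matches_snapshot(load_step("ada,0.9\nbob,0.7"), "load_step")


def test_rank_step():
    # Step 1's stored output is step 2's input: the test does not call load_step,
    # yet it uses data that matches what step 1 actually produced
    records = json.loads((SNAPSHOT_DIR / "load_step.json").read_text(encoding="utf-8"))
    assert_matches_snapshot(rank_step(records), "rank_step")
```

One limitation of this naive version: test_rank_step depends on the load_step snapshot already existing, so on a fresh checkout the first test has to run before the second.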
I encourage you to think about how to tweak the inner loops to be faster. It takes practice and requires changes in development habits, but trust me - it is worth it.
The AI outer loop for ML Model Operations
The outer loop for AI projects is also a bit different from traditional software development.
In the MLOps model, there are three categories of steps: pipeline steps, data steps, and ML steps. The pipeline steps are concerned with deploying a pipeline that can be used to train and evaluate the AI model. The pipeline can be configured for continuous training, where a new model is trained as new data comes in. The data steps are concerned with gathering raw data from various sources for training and evaluating the AI model. The ML steps train the AI model, adjusting its parameters to improve its performance.
The outer loop can consist of the following steps:
- 🟦 Pipeline Validation: Ensuring that the entire pipeline, from data collection to model deployment, works as expected and meets the required standards.
- 🟦 Pipeline Deployment: Deploying the validated pipeline to a production environment where it can be used for real-world data processing and model training.
- 🟩 Data Collection: Gathering raw data from various sources that will be used for training and evaluating the AI model.
- 🟩 Data Curation: Cleaning and organizing the collected data to ensure it is of high quality and suitable for training the model.
- 🟩 Data Transformation: Converting the curated data into a format that can be used by the AI model, which may include normalization, encoding, and feature extraction.
- 🟩 Data Validation: Verifying that the transformed data is accurate, complete, and suitable for training the model.
- 🟪 Model Training: Using the validated data to train the AI model, adjusting its parameters to improve its performance.
- 🟪 Model Evaluation: Assessing the trained model’s performance using various metrics such as accuracy, precision, recall, and F1 score to ensure it meets the desired criteria. Evaluation also runs in an inner loop, but here we want to store a versioned benchmark, preferably on a different dataset than the one used in the inner loop, to ensure the model generalizes well.
- 🟪 Model Deployment: Deploying the trained and evaluated model to a production environment where it can be used to make predictions on new data.
- 🟪 Model Monitoring: Continuously monitoring the deployed model’s performance in the production environment to ensure it remains accurate and reliable.
- 🟪 Model Feedback: Collecting feedback from the model’s performance in the real world and using it to retrain or improve the AI system, ensuring it adapts to new data and scenarios.
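The Data Validation step can start as a simple gate that refuses to hand bad data to Model Training. A minimal sketch, assuming records are dictionaries and using a hypothetical two-field schema; real pipelines often use a dedicated validation library instead.

```python
# Hypothetical schema: field name -> expected Python type
SCHEMA = {"text": str, "label": int}


def validate_records(records: list[dict], schema: dict[str, type]) -> list[str]:
    """Return human-readable problems; an empty list means the data passed."""
    problems = []
    if not records:
        problems.append("dataset is empty")
    for i, record in enumerate(records):
        for field, expected_type in schema.items():
            value = record.get(field)
            if value is None:
                problems.append(f"row {i}: missing '{field}'")
            elif not isinstance(value, expected_type):
                problems.append(f"row {i}: '{field}' should be {expected_type.__name__}")
    return problems


def check_before_training(records: list[dict], schema: dict[str, type] = SCHEMA) -> list[dict]:
    """Gate between Data Validation and Model Training: stop the pipeline on bad data."""
    problems = validate_records(records, schema)
    if problems:
        raise ValueError("Data validation failed:\n" + "\n".join(problems))
    return records


if __name__ == "__main__":
    batch = [
        {"text": "great product", "label": 1},
        {"text": None, "label": 0},
        {"text": "meh", "label": "0"},
    ]
    for problem in validate_records(batch, SCHEMA):
        print(problem)
```

Returning a list of problems rather than failing on the first one makes a single pipeline run report everything that is wrong with a batch.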
The outer loop ensures that the AI system is developed, tested, and deployed in a systematic and efficient manner, leading to higher quality and more reliable applications.
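For the versioned benchmark mentioned under Model Evaluation, a useful habit is to store the metrics together with the model version and a fingerprint of the held-out evaluation data, so results stay comparable over time. A minimal sketch for binary classification; the record layout and the local JSON-lines file standing in for a cloud experiment store are assumptions.

```python
import hashlib
import json
from datetime import datetime, timezone


def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Binary classification metrics with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def benchmark_record(model_version: str, eval_data: list, y_true: list[int], y_pred: list[int]) -> dict:
    """Bundle metrics with the model version and a fingerprint of the held-out evaluation data."""
    fingerprint = hashlib.sha256(json.dumps(eval_data, sort_keys=True).encode("utf-8")).hexdigest()
    return {
        "model_version": model_version,
        "eval_data_sha256": fingerprint,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "metrics": precision_recall_f1(y_true, y_pred),
    }


def append_benchmark(record: dict, path: str = "benchmarks.jsonl") -> None:
    # One JSON object per line: an append-only history of model performance
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    eval_data = [{"id": i} for i in range(5)]  # hypothetical held-out examples
    record = benchmark_record("model-v2", eval_data, [1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(json.dumps(record, indent=2))
```

If two records share the same eval_data_sha256, their metrics were computed on identical data and can be compared directly.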
In summary, understanding and optimizing both the inner and outer development loops is crucial for the success of AI projects. By focusing on fast, iterative inner loops and robust, automated outer loops, you can significantly enhance the quality and speed of your development process. Remember, the key is to start small, iterate quickly, and continuously improve. Whether you’re working on traditional software or cutting-edge AI systems, these principles will help you deliver better results more efficiently. Happy coding!