Modelling Software Development Processes of Machine Learning Systems

In the rapidly evolving landscape of technology, machine learning (ML) stands out as a transformative force, reshaping how software systems function and adapt. As a subset of artificial intelligence, ML enables systems to learn patterns from data, make decisions, and improve their performance without being explicitly programmed. ML systems, which comprise processes that transform raw data into mathematical or statistical models, are used to make predictions and decisions across diverse domains. Despite the proliferation of ML applications, however, developing ML systems remains a challenging area of software engineering: it involves all the challenges of non-ML systems plus challenges specific to ML projects. The iterative nature of ML system development, characterized by continuous data management, model learning, and the optimization of models through experiments, introduces dynamism and uncertainty into the development process. In addition, the lack of a defined development process, combined with interdisciplinary teams whose members have different technical backgrounds, makes collaboration and communication difficult and creates a need for a common, standardized language. These challenges call for systematic and standardized approaches to modeling the development process and managing the complexities inherent in ML systems. Although frameworks such as CRISP-DM have been proposed for developing ML systems, a formalized and standardized representation of the ML system development process is still missing. Closing this gap requires robust modeling methodologies that not only capture the complexities of ML development but also promote communication, collaboration, and adaptability in ML projects through standardized process models.
To pave the way for innovative solutions, this thesis investigates how the software development process of ML systems can be modeled with the aid of metamodels. We conduct a systematic literature review (SLR) to identify the key processes in ML system development and their core process elements, such as activities, roles, and artifacts. The SLR results allow us to define ML system development processes from a comprehensive perspective and form the basis for a generalized process model. We also investigate situational factors, such as organization size or ML system requirements, that influence this generalized process model and that make tailored approaches necessary. Finally, we present a process model expressed in formal notation based on the Software & Systems Process Engineering Metamodel (SPEM 2.0), which addresses the current absence of a standardized process model for ML system development.

Project information



Thesis for degree:



Mahta Khoobi

Part of research project:

SE4ML - Processes, People and Tools