Tool-Based Discovery of Machine Learning Software Development Processes

Machine learning (ML) is a branch of artificial intelligence and data science that enables computer systems to improve through experience and from large volumes of data. ML system development is iterative and proceeds through distinct phases. Unlike conventional software, ML systems depend on data, and the high dimensionality and diversity of that data make this dependency more complex. In addition, a wide variety of ML tools has been developed for tasks ranging from model development to deployment and maintenance. No single tool covers the entire ML development process, however, which makes tool selection difficult. In response, this thesis proposes a practical solution that follows a systematic approach, Grounded Requirements Engineering. The research procedure involves analyzing and documenting tool information based on the examples and tutorials of tool use cases published on the tools' respective websites. The activities extracted from this analysis will be grouped hierarchically to form a tool taxonomy, which aids in understanding tool functionality and facilitates tool selection. Finally, the ML workflows derived from the analyzed tool use cases will be expressed in terms of the taxonomy's ML phases and activities, offering better insight into each tool’s contribution to distinct phases.

Project information

Thesis for degree:

Sara Prifti

Part of research project:

SE4ML - Processes, People and Tools