Generation of Experiment Workflow Models from Jupyter Notebooks

Computational notebooks such as Jupyter Notebooks have become increasingly popular for machine learning experimentation, since they make it easy to collaborate and share results. However, several problems with computational notebooks have also been identified, including poor comprehensibility, no immediately visible flow of data, and results that are not reproducible. Research has shown that graphical representations of a notebook's workflow are a promising aid for the stakeholders involved. We identify three objectives that address these problems for Jupyter Notebooks: providing a visual representation of the notebook's workflow, visualizing the flow of data, and automatically tracking machine learning artifacts when cells are executed. We develop a concept to achieve these objectives and, through interviews, obtain positive feedback regarding its usefulness. Finally, we implement this concept as a JupyterLab extension called JupyFlow. The extension generates workflow models from Jupyter Notebooks with the help of large language models, tracks cell executions, and logs the relevant artifacts with the help of external machine learning tools. The collected information is presented to the user through an interactive user interface.

Project information

Status:

Finished

Thesis for degree:

Master

Student:

David Kierdorf

Supervisor:
Part of research project:

SE4ML - Processes, People and Tools

Id:

2024-010