Comprehending Data Science Workflows
How can data scientists be aided in understanding data science code (workflows)?
Understanding code is a complex process. Data scientists must not only understand the code itself but also untangle the workflow and the reasoning that is implicit in it. Furthermore, data science code is generally written in a notebook, whose linear sequence of cells does not make the structure of the workflow explicit. Successful understanding supports other activities, such as evaluating the solution, exploring other valid solutions, reasoning about the choice of features and methods, and making decisions. While program understanding is a well-established area in software engineering (SE), it is still at a nascent stage when it comes to data science code. In my research, I investigate how established SE methods can be applied to program understanding in data science.
Related Publications
[1] Ramasamy et al., “Visualising Data Science Workflows to Support Third-Party Notebook Comprehension: An Empirical Study”, Empirical Software Engineering Journal 28, Article number: 58 (2023). Read more in the blog.