How are data science workflows implemented in real life?

Past literature on understanding data science has focused primarily on data scientists and less on data science workflows themselves. My research focuses on empirically understanding real-world data science workflows implemented in notebooks. In particular, I aim to develop methods that support large-scale analysis and understanding of these workflows.

Related Publications

[1] Ramasamy et al., Workflow analysis of data science code in public GitHub repositories, Empirical Software Engineering Journal 28, Article number: 7 (2023). Read more in the blog.

[6] Collaboration with JetBrains, “Observing Fine-Grained Changes in Jupyter Notebooks During Development Time” (2025).