Our journal article on AI-based coding assistants for data science has been published in EMSE.
🚀 Excited to share our latest publication, “𝗔𝗜 𝘀𝘂𝗽𝗽𝗼𝗿𝘁 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝘀𝗰𝗶𝗲𝗻𝘁𝗶𝘀𝘁𝘀: 𝗔𝗻 𝗲𝗺𝗽𝗶𝗿𝗶𝗰𝗮𝗹 𝘀𝘁𝘂𝗱𝘆 𝗼𝗻 𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝗮𝗻𝗱 𝗮𝗹𝘁𝗲𝗿𝗻𝗮𝘁𝗶𝘃𝗲 𝗰𝗼𝗱𝗲 𝗿𝗲𝗰𝗼𝗺𝗺𝗲𝗻𝗱𝗮𝘁𝗶𝗼𝗻𝘀,” in Empirical Software Engineering (EMSE). This work marks my third first-author publication in EMSE!
Our study offers one of the first in-depth empirical looks at how data scientists interact with AI-generated code recommendations from LLM-powered assistants (we studied GPT-4) across both descriptive and predictive data science tasks. We focused especially on whether providing alternative suggestions throughout the data science workflow influences acceptance of recommendations and task success. To support this study, we developed a model-agnostic Jupyter plugin (CATSci) that offers a dedicated interface for how data scientists collaborate with AI assistants.
✨ 𝐊𝐞𝐲 𝐟𝐢𝐧𝐝𝐢𝐧𝐠𝐬:
- Adding explicit workflow step information (e.g., data exploration, modelling) to prompts significantly improves acceptance of recommendations (see the sketch after this list).
- Alternative code recommendations did not significantly improve acceptance or task completion, but they still helped users discover new methods and syntax.
- Acceptance and usefulness of recommendations vary notably between descriptive and predictive tasks, with descriptive tasks posing unique challenges.
- The number of recommendations requested varies significantly across workflow steps and task types.
- Overall, participants showed positive sentiment toward AI assistance and our tailored interface.
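To make the first finding concrete, here is a minimal illustrative sketch (not the paper's actual prompt template or the CATSci implementation; the step names, function, and parameters are hypothetical) of how a workflow step label could be made explicit in a code-generation prompt:

```python
# Hypothetical sketch: prepend the current workflow step to a code-generation
# prompt and ask for several alternative suggestions.
WORKFLOW_STEPS = ["data loading", "data exploration", "data cleaning",
                  "modelling", "evaluation"]

def build_prompt(user_request: str, workflow_step: str, n_alternatives: int = 3) -> str:
    """Compose a prompt that states the current workflow step explicitly
    and requests multiple alternative code snippets."""
    if workflow_step not in WORKFLOW_STEPS:
        raise ValueError(f"unknown workflow step: {workflow_step}")
    return (
        f"You are assisting with the '{workflow_step}' step of a data science workflow.\n"
        f"Task: {user_request}\n"
        f"Provide {n_alternatives} alternative Python code snippets for this step."
    )

# Example usage:
print(build_prompt("Summarise missing values per column in df", "data exploration"))
```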
🎯 𝐖𝐡𝐚𝐭 𝐭𝐡𝐢𝐬 𝐦𝐞𝐚𝐧𝐬:
- AI assistants show promise in supporting data science but require improvements to better handle descriptive tasks and support exploratory workflows.
- Incorporating workflow context in prompts and refining alternatives generation are key to enhancing AI assistant usefulness and adoption in data science.
- Interfaces must balance simplicity with thoughtful design to overcome user biases and drive adoption of new capabilities.
🧭 Our work sheds light on the nuanced dynamics of Human-AI interaction in coding assistance for data science, highlighting exciting opportunities to advance AI-driven productivity tools.
🔗 𝐅𝐢𝐧𝐝 𝐦𝐨𝐫𝐞 𝐢𝐧 𝐨𝐮𝐫 𝐩𝐚𝐩𝐞𝐫 (𝐎𝐩𝐞𝐧 𝐀𝐜𝐜𝐞𝐬𝐬).