Talk: Building a Collaborative & Interactive Data System to Broaden Access to Data Science, AI, & ML
12-1 pm ET Friday, March 7, 2025, UMBC ITE 325b
In an era where data-driven decision-making shapes industries, governments, and everyday life, the ability to leverage data science has become an essential skill. Modern data science techniques, including artificial intelligence, machine learning, and large language models, offer advanced capabilities but often require programming expertise, which limits their accessibility to a broader audience. In this talk, I will discuss my work on Texera, an open-source system designed to make data science, AI, and ML accessible to everyone. I will begin by introducing Texera’s no-code workflow interface and cloud-based platform, which let users from varied backgrounds collaborate seamlessly on data science, with an experience similar to Google Docs and Overleaf. Next, I will discuss the design choices behind Texera’s actor-based parallel execution engine that enable users to interact with a workflow while it is executing. I will then dive deep into my work on enhancing user interactions with the distributed parallel data engine, focusing on data debugging techniques that improve transparency and usability. Specifically, I will present Udon, a debugger for user-defined functions (UDFs) in data systems, and explain how it lets users interact with an operator with fine-grained control down to the code-line level. I will then present IcedTea, a time-travel debugger for data workflows, and demonstrate how it lets users interact with distributed operators while ensuring consistency. To conclude, I will outline future research directions toward an ecosystem that integrates advanced interfaces and intelligent systems to enhance accessibility, efficiency, and user empowerment in data science.
Yicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG) of the Computer Science Department at the University of California, Irvine. Under the guidance of Dr. Chen Li, his research focuses on big data management, data-processing systems, and systems for data science, AI, and ML.
UMBC Center for AI
Posted: March 6, 2025, 4:01 PM
