How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool?

🔬 Research Summary by Oka Kurniawan, a Computer Science faculty member at the Singapore University of Technology and Design, Singapore.

[Original paper by Oka Kurniawan, Christopher M. Poskitt, Ismam Al Hoque, Norman Tiong Seng Lee, Cyrille Jégourel, and Nachamma Sockalingam]

Overview: It is important to provide immediate and accurate feedback to learners. The rise of Artificial Intelligence (AI) has raised hopes of a personal tutor for every learner. This paper studied undergraduate students’ perceptions of learning Python programming with the automated repair tool CLuster And RepAir (CLARA).

Introduction

The availability of ChatGPT created a buzz around artificial intelligence and how it can impact education. The latest version of ChatGPT allows students to interact with an AI agent for programming-related matters. Some papers report using Large Language Models (LLMs) to give feedback to students learning programming. At the same time, many papers also report that LLMs may give incorrect or out-of-context feedback to learners. Such inaccurate feedback can be harmful to learning.

One way to give accurate feedback is through automated repair tools. CLARA, which stands for CLuster And RepAir, is one such tool. The basic idea of CLARA is to use a group of correct solutions from other students to automatically generate feedback for an incorrect solution. The authors enhanced CLARA to support more Python language features and made it available as a service through a RESTful API. They also created a Jupyter Notebook extension that lets students access CLARA directly from their Jupyter Notebook assignments. They called this enhanced system CLARA-S, for CLARA Service.

The authors studied how students use CLARA-S in solving programming problems. They found that one difficulty novice programmers face with such a system is understanding the feedback that CLARA generates, whereas those with programming experience can make better use of it. Participants found CLARA-S most useful for locating their logical or semantic errors. The authors conclude that such a system is still useful, but mainly for students who already have good programming skills or for instructors. For CLARA to be useful for novice programmers, its feedback messages have to be made easier for them to understand.

Key Insights

CLARA and CLARA-S

To generate feedback automatically from the correct solutions, CLARA uses clustering methods to group similar correct solutions. CLARA then creates a representative correct solution for each cluster. The incorrect solution to be repaired is then compared to these representative solutions. Once the representative solution with the minimum distance is found, CLARA generates a repair: a sequence of steps for modifying the incorrect solution into a correct one.
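The cluster, compare, and repair pipeline described above can be illustrated with a small sketch. It is not CLARA's actual algorithm: as a stand-in for CLARA's program comparison, it assumes a plain text-similarity distance from Python's difflib, a greedy clustering rule, and the first member of each cluster as its representative. The function names, the threshold, and the sample solutions are all hypothetical.

```python
import difflib

def distance(a: str, b: str) -> float:
    """Toy distance (an assumption): 1 minus the text similarity ratio."""
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()

def cluster(solutions, threshold=0.3):
    """Greedy clustering: a solution joins the first cluster whose
    representative (its first member) is within `threshold`."""
    clusters = []
    for src in solutions:
        for members in clusters:
            if distance(src, members[0]) <= threshold:
                members.append(src)
                break
        else:
            # No cluster was close enough, so start a new one.
            clusters.append([src])
    return clusters

def nearest_representative(incorrect, clusters):
    """Pick the cluster representative closest to the incorrect solution."""
    return min((members[0] for members in clusters),
               key=lambda rep: distance(incorrect, rep))

def repair_steps(incorrect, representative):
    """List line-level edits that turn the incorrect code into the representative."""
    old, new = incorrect.splitlines(), representative.splitlines()
    matcher = difflib.SequenceMatcher(None, old, new)
    return [(tag, i1 + 1, old[i1:i2], new[j1:j2])  # 1-based line number
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

# Hypothetical submissions for a "sum a list" exercise.
CORRECT = [
    "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
    "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s\n",
    "def total(xs):\n    return sum(xs)\n",
]
INCORRECT = "def total(xs):\n    s = 1\n    for x in xs:\n        s += x\n    return s\n"

clusters = cluster(CORRECT)
representative = nearest_representative(INCORRECT, clusters)
steps = repair_steps(INCORRECT, representative)
```

In this sketch the incorrect loop-based solution is matched to the loop-based cluster rather than to the sum-based one, so the suggested repair stays close to what the student wrote.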

The original version of CLARA has to be run from a terminal and supports only a limited number of Python features. The authors of this paper enhanced CLARA in a few significant ways. First, they modified CLARA to support import statements, lambda functions, and some object-oriented programming (OOP) features. In particular, this enables CLARA to parse Python code that imports and utilizes built-in functions from other libraries. Second, they made CLARA indicate the position of the change to be made. The original CLARA only reports which code should be modified and what the corrected code looks like, so users need time to spot the difference between the two code snippets (the original incorrect code and the correct code) without any visual cue. CLARA-S adds a visual cue showing where the incorrect code changes into the correct code. Third, the authors deployed CLARA as a service through a RESTful API, which allows various applications to use CLARA. Lastly, the authors created a Jupyter Notebook extension that uses this RESTful API to provide feedback on students’ work in their Jupyter Notebook programming assignments.
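To see why a position marker helps, consider a minimal sketch of one possible visual cue. The summary does not describe how CLARA-S actually renders its cue, so the caret format and the function name mark_change below are assumptions made purely for illustration.

```python
def mark_change(old: str, new: str) -> str:
    """Show the old and new line with a caret under the first differing column.

    Hypothetical rendering, not the CLARA-S output format.
    """
    # Index of the first differing character; if one line is a prefix of the
    # other, point just past the end of the shorter line.
    column = next(
        (i for i, (a, b) in enumerate(zip(old, new)) if a != b),
        min(len(old), len(new)),
    )
    # Two spaces of padding account for the "- " and "+ " prefixes.
    return "\n".join(["- " + old, "+ " + new, "  " + " " * column + "^"])

cue = mark_change("s = 1", "s = 0")
print(cue)
```

Without the caret line, a reader has to compare the two snippets character by character; with it, the changed position is visible at a glance.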

Results

The authors ran a comparative study of students solving programming tasks with and without CLARA-S. Their results showed that although all participants found the tool useful, one of the main difficulties they encountered was the feedback message generated by CLARA. Even though CLARA can generate correct feedback and steps to repair the incorrect solutions, participants who considered themselves novice programmers often could not understand the generated message. Those who considered themselves intermediate programmers fared better in using the tool. Unsurprisingly, the intermediate programmers viewed CLARA-S more positively than the novice programmers did.

The study also showed various ways students make use of the repair tool. Most students used the tool only when they got stuck. However, the authors also found that the repair tool can guide students as they write their code. One participant, for example, knew that the solution should contain a for-loop and started by writing a bare for-loop. That participant immediately used CLARA-S to get feedback on how to write the for-loop correctly, and then used CLARA-S incrementally to arrive at the final solution.

Another finding was that most participants found the main usefulness of CLARA-S to be locating the logical error rather than the repair solution itself. Once they located the logical error through the feedback message, they tried to fix it themselves, even when they could not understand the change suggested by CLARA. This is interesting because many of these repair tools focus on creating steps to arrive at the correct solution. For learners and instructors, however, locating the logical error itself was found to be most useful.

Between the lines

So, how do novice programmers find the AI tool CLARA-S?

The AI tool CLARA-S was found to be useful for students learning programming using Python. However, it may be more useful for intermediate than novice programmers. Yet, even for novice programmers, the tool is a welcome help from the learners’ perspective. The main challenge of such a tool for novice programmers is to create a message that novices can easily understand. The rise of ChatGPT and LLMs has occurred at the right moment: such LLMs can be integrated with automated repair tools to provide accurate and easy-to-understand feedback for novices.

Even with their current limitations, tools like CLARA-S can help learners locate their logical errors. Simply providing a test case with the expected output may not tell novices where the error is. Tools such as CLARA-S, however, indicate its location, which can help novice programmers find and fix their errors faster.

Lastly, many educators want to help students improve their thinking process during programming problem-solving and develop debugging skills. Therefore, automated systems that let learners discover their own errors will be more useful than systems that merely provide the answer to a programming problem. Future work should explore how automated systems can facilitate these thinking skills.

Learning is not just about reaching the destination but also about the journey. CLARA-S supports learners in their learning journey, and with more fine-tuning, we can better support the students.