
The difference between learning to code with AI and not with AI is ...

Author: Not bald programmer

This article was co-written with Majeed Kazemitabaar, a PhD student at the University of Toronto who researches AI tools for computer science education.

Original link: https://austinhenley.com/blog/learningwithai.html

Today, tools like ChatGPT can solve many introductory programming tasks. Often, you can simply copy and paste the task instructions and get a detailed solution with an explanation, without doing anything else.

This has sparked a lot of discussion about how code-generating LLMs affect programming education. Educators worry that students will become overly reliant on AI tools instead of actually learning, while others believe these tools could lower the barrier to entry for programming and broaden participation in computing.

To understand what is actually going on here, we asked two basic questions:

(1) What impact do AI code generators have on novices in an introductory programming course?

(2) How do novices use AI code generators in an introductory programming course?

Research Methods

To answer these questions, we conducted a ten-session controlled study in August 2022 with 69 students (ages 10-17) who had no prior experience with Python. In the first session, we used Scratch to teach basic programming concepts such as variables, conditionals, and loops. Students then completed 25 Scratch code-tracing questions to measure their prior computational thinking skills.

For the next seven sessions, students were divided into two groups: the Baseline group and the Codex group. Across these seven sessions, both groups completed 45 two-part tasks using Coding Steps, a tool developed for this study. Coding Steps includes novice-friendly documentation and allows remote teaching assistants to provide real-time feedback on students' submissions. The first part of each task was a code-writing task, in which students wrote code according to the provided instructions; the second part was a code-modification task, in which students modified the correct solution from the first part to meet additional requirements. Only students in the Codex group had the option to use the LLM code generator, and only during the code-writing tasks.

The code generator is based on OpenAI Codex: students type a natural-language description of a program, and it is converted into Python code.
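As a concrete illustration of this interaction, a student might type a prompt like the one in the comment below and get back simple Python along these lines. The prompt and output here are hypothetical, not taken from the study's logs:

    # Hypothetical prompt typed into the code generator:
    #   "ask the user for their name and greet them"
    # Python code the generator might return:
    name = input("What is your name? ")
    print("Hello, " + name + "!")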

The last two sessions focused on assessing student performance; in both, students could not use Codex or any other AI-assisted tools. Each of the two assessment sessions consisted of 10 coding tasks and 40 multiple-choice questions. The first (immediate) test took place one day after the seven training sessions, and the second (retention) test a week later, using similar tasks with slight modifications.

Learning performance with and without AI

First, let's take a look at the training results from the first seven sessions:

  • Across the seven training sessions, students in the Codex group made greater progress, completing a significantly higher percentage of tasks (91%) than the Baseline group (79%).
  • On the code-writing tasks, the Codex group's correctness (80%) was significantly higher than the Baseline group's (44%), and they finished the tasks significantly faster overall.
  • On the code-modification tasks, however, the two groups' correctness was similar, with the Codex group (66%) only slightly higher than the Baseline group (58%).
  • The first two results were somewhat expected, but the near parity on code modification was encouraging: students in the Codex group relied heavily on AI during code writing, yet performed almost as well on modification, and in some cases even better than the Baseline group.

Next, let's take a look at the assessment results from the last two sessions:

  • On the immediate test, conducted one day after training ended, the two groups performed similarly on both the coding tasks (Codex: 61%, Baseline: 63%) and the multiple-choice questions (Codex: 49%, Baseline: 42%).
  • On the retention test a week later, the Codex group performed significantly better on both the coding tasks (Codex: 59%, Baseline: 50%) and the multiple-choice questions (Codex: 44%, Baseline: 35%).
  • Among students in the top 50% of the Scratch pre-test, those who used Codex (74%) performed significantly better than those in the Baseline group (60%).
  • In addition, students in the Codex group were more eager to continue learning programming and reported significantly less stress and discouragement during training.

Learning with AI: Over-reliance vs. self-regulation

We also conducted a thematic analysis of the 1,666 uses of the AI code generator by students in the Codex group across the seven training sessions. We focused on how they used the tool, what prompts they wrote, and how they validated and used the AI-generated code. From this, we found various signs of over-reliance (✕) and self-regulation (✓), listed below:

  • Students often (n=501, 30%) copied the task description verbatim to generate the entire code, without first attempting to write any code themselves. (✕)
  • Sometimes (n=197, 12%) students broke the task into multiple sub-goals and asked the AI to generate only the first sub-goal rather than the entire task. (✓)
  • When breaking a task into multiple sub-goals, students sometimes (n=85, 5%) asked the AI to use code already in the editor. (✕)
  • Although relatively rare (n=16, 1%), students sometimes generated code after they already had a solution, in order to compare the AI's output with their own. (✓)
  • Occasionally (n=89, 5%), students wrote pseudocode-like prompts (e.g., "For num in numbers, if num > large, set large to num"). (✓)
  • While students tested AI-generated code appropriately before submission most of the time, some submissions (n=63, 4%) contained AI-generated code that was never tested. (✕)
  • Although rare (n=30, 2%), students sometimes deliberately tweaked AI-generated code to properly understand its syntax and logic. (✓)
  • Similarly, students sometimes manually added code (such as a "print" statement) to AI-generated code to help verify that it was correct, as illustrated in the sketch after this list. (✓)
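
To make the last few patterns concrete, here is a hypothetical sketch (not from the study's logs) of what the pseudocode-like prompt quoted above might produce, together with the kind of "print" check a student might add by hand:

    # Code the AI might generate for the prompt
    # "For num in numbers, if num > large, set large to num":
    numbers = [3, 41, 12, 9, 74, 15]
    large = numbers[0]
    for num in numbers:
        if num > large:
            large = num

    # A verification line a student might add manually:
    print(large)  # expected: 74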

Finally, we believe that future AI-tool designers and educators should encourage the proper uses of LLM code generators identified above, while discouraging the improper ones.
