AI Coding Agents: Powerful Magic that is Not Easy to Control

By Ed Lyons

Image by Ed Lyons via Midjourney

The AI coding revolution has entered its next phase. First it was autocomplete and suggestions. Then you could chat with it. Now we have agents. They can use the command line to perform all kinds of tasks, and in just a few minutes can produce more working code than you could previously have written by hand in a week. You can ask for a complete application: many versions of your code are generated, tested, and improved, and then one appears for you in minutes, at low cost. But using agents means the code isn’t completely yours anymore, and you can easily lose control of quality, maintainability, and architecture.

Advocates may scoff and say you are still in charge of your code, but after using a variety of agent-based AI tools to create many thousands of lines of JavaScript and Python, I can say that sometimes the code was not mine, and at times I did not fully understand it.

I have been experimenting for two months with Aider, Cursor, and Claude Code. While there are dangers, their abilities are astonishing.

Picking up the Magic Hat

I will use my first experience with Claude Code - a recognized leader in the space - as an example.

Without any preparation, I started using Claude Code for a few hours. What power! I was like Mickey Mouse as the sorcerer’s apprentice in Disney’s classic “Fantasia,” who steals his master’s magical hat, and uses it to command a magical broomstick servant to do his mundane chores. 

I then rushed into creating a new, large application to fully manage chess tournaments. After listing the requirements, I told Claude to use a JavaScript front end and a Python back end with a simple database it could set up (SQLite), as I know those technologies well.

Agents feel great when they are installing packages and doing other mundane tasks for you at the shell. They keep asking for permission to do operations inside the directory you are in. 

It churned and churned, and output many messages about what it was doing. I thought I understood the architecture, but there was a lot of code. And I kept giving it more permission to do things as it iterated, found errors, and sought more dramatic solutions. I became concerned that it was often just throwing code away instead of fixing it.  

After ten minutes, the front end looked nice, but there were some front end and back end errors. They wouldn’t go away. It would regenerate to fix one set of errors, and others would appear. It felt close, though! I gave it more permissions.

To fix the back end, it started doing things that included deleting the entire tree and redesigning the whole Python application because of one unusual bug. I thought, “Wait, I could have fixed that.” Alas, the application I had just seen was gone. A new one suddenly appeared.

I kept prompting it for more fixes, and it kept apologizing and trying harder. 

Despite my original intent to understand the code as it was being written, thousands of lines had been generated across many files, and I was convinced just a few more prompts would get it right and I wouldn’t have to look at the code closely.

I had arrived at the moment in The Sorcerer’s Apprentice when Mickey Mouse realizes he’s lost control of the magic hat he used to automate his chores, and the castle has flooded. 

Coincidentally, once I decided I couldn’t get the bugs fixed, Claude noticed that it could not redeploy because it had left the front end running, and it asked me for permission to fix that.

After asking for so much control, it wanted one more power: to kill all Node processes on my machine. I thought, remembering that Claude runs on Node, “I… wouldn’t do that if I were you.” But as the application had become unsalvageable, I said yes, and it killed itself. 

After a whirlwind terminal session, all was quiet. It was just like when the Sorcerer in Fantasia returned. He nullified all of the magic, snatched back his hat, and Mickey Mouse meekly picked up his buckets, and went back to work.

As for me, I had shamefully fallen into… vibecoding.

Avoiding Vibecoding

“Vibecoding” is a term recently coined to describe how someone who is not a software developer can just prompt their way to a useful application without knowing anything about how the code works. There are good reasons this practice should be avoided by professional programmers.

First, unlike a hobby application built at home, a professional application has to integrate with a lot of other infrastructure that has specific requirements. Second, even AI this powerful makes mistakes and can cause problems that are not acceptable. You need to be able to prevent and fix these problems. Third, AI is not good at architecture and many higher-level engineering concerns, and will not remind you that you forgot to include important requirements and characteristics. Fourth, unlike your vibecoded application at home, professional code needs to be maintained by other people for years to come. Fifth, the skills of software developers matter for maintaining quality and coming up with new ways to achieve business goals. I learned nothing new from trying to create my chess tournament application. Normally, had I written a few thousand lines of code, I would have learned a thing or two.

Software developers need to avoid the temptation to just keep prompting for more code. The speed and productivity do make it hard to stay in engineer mode. We are so biased toward seeing something “working” that we may not notice that we forgot to make the application accessible, scalable, or performant. And the more code you generate without reviewing it, the less inclined you will be to stop and look at it.

Learning How to Remain in Charge

I resolved to be more mindful, and then continued my work with Claude Code, as well as similar tools such as Aider and Cursor. I had better results from taking things more slowly, and giving these agents more direction about how I wanted things done. This helped me understand what was happening. I also initialized a Git repository and had versions committed along the way, so I could go back in time if I wanted to.
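As a rough illustration of the "go back in time" habit, here is a minimal Python sketch of a checkpoint helper you might run before handing control to an agent. It is an assumption about workflow, not a feature of any of the tools mentioned: the function names `checkpoint_commands` and `checkpoint` and the commit message format are hypothetical. It relies only on the standard `git add` and `git commit` commands.

```python
from __future__ import annotations

import subprocess


def checkpoint_commands(label: str) -> list[list[str]]:
    """Build the git commands for one checkpoint commit (hypothetical helper)."""
    message = f"checkpoint: {label}"
    return [
        # Stage every change, including new and deleted files.
        ["git", "add", "--all"],
        # --allow-empty records a checkpoint even when nothing has changed.
        ["git", "commit", "--allow-empty", "-m", message],
    ]


def checkpoint(label: str, repo_dir: str = ".") -> None:
    """Run the checkpoint commands in repo_dir, raising if git reports an error."""
    for command in checkpoint_commands(label):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `checkpoint("before agent session")` inside an existing repository would leave a labeled commit to return to if the agent throws away something worth keeping.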

Yet even after changing how I used agents, I still ran into another problem with something I built with Cursor.

I had generated a set of features I wanted to add to a different application, and was happy that I would now be done a couple of days earlier than hand-coding would have required. But integrating it was a much bigger problem than I had imagined. I hadn’t lapsed into vibecoding, but I didn’t anticipate that the architectural choices that Cursor made would not work with the integration I had in mind.

After re-taking complete control of the code, I ended up taking two days to resolve all of the problems. So much for saving time. 

Soon after that, I created a different JavaScript application with Claude. It came out even better. But upon close inspection, I noticed it was generating an extra copy of a dataset every time a huge operation ran. That would not have been acceptable under heavy production load.
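The generated code in question is not shown here, and it was JavaScript, so the following is only a Python illustration of the general pattern, with hypothetical function names. Both functions produce the same values, but the first allocates a second full-size list on every call, which is the kind of detail that looks fine in a demo and hurts under load.

```python
from __future__ import annotations


def scale_copying(rows: list[float], factor: float) -> list[float]:
    """Return a new list; the original and the copy both stay in memory."""
    return [value * factor for value in rows]


def scale_in_place(rows: list[float], factor: float) -> None:
    """Overwrite the existing list so no second full-size copy is allocated."""
    for index, value in enumerate(rows):
        rows[index] = value * factor


data = [1.0, 2.0, 3.0]
copied = scale_copying(data, 2.0)   # data is untouched, copied is a new list
scale_in_place(data, 2.0)           # data itself now holds the scaled values
```

Whether mutating in place is the right fix depends on who else holds a reference to the data. The point is that someone has to read the code closely enough to make that call.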

After those experiences, I learned even more about how to keep instructions and principles “in context” for all my prompts. For example, Claude Code has a special markdown file (CLAUDE.md) that it always consults when generating code. You can put all kinds of guidelines in there.

I also found that it was beneficial for the AI (and me) to chat about requirements and the approach before generating any code. In terms of the flow, I settled into a mix of using agents, chat, and inline suggestions in order to slow the process down even more. This allowed me to keep better track of what was being generated, and I even learned a thing or two during the discussions and disagreements.

All this learning, discussion, and specification sure wasn’t as much fun as just prompting for the features I wanted. But you cannot let the power of this magic make you lazy. Otherwise you will be just like Mickey Mouse, who became too confident, and fell asleep instead of paying attention to what his automated servants were doing. 

After all these experiments, I came away with two conclusions about agent-based coding tools. First, this technology is too powerful not to use, even if there are challenges. There is no future in professional developers installing packages, typing out obvious unit tests, and doing standard refactorings by hand. 

Second, you have to put in real work learning how to use agents. Because if you don’t learn how to be The Sorcerer, you will end up being the apprentice. Your castle will become flooded, and your boss is not going to be happy when he gets back.
