AI Agents: Rich Oases of Knowledge, Barren Deserts of Wisdom
*We've been building tools for over 3 million years; all conceptually similar. Then we made LLMs.*
Everyone keeps writing and sharing posts about how powerful AI Agents are and how they can fundamentally change the way we do this or that.
But I kept wondering:
- What exactly does it take to unlock that potential?
- Why do I struggle to effectively use them for anything beyond basic tasks?
- Why do I so frequently find myself frustrated by their inconsistent output and the need for endless tweaks to my prompts?
Hardly anyone talks about the answers to these questions.
The Familiar Side
AI Agents (technically speaking: LLMs) are tools. Just like any other, say a sword, an oscilloscope or a programming IDE. They exist to help humans be more efficient at certain types of activities; to extend the limits of the user.
As the animal kingdom's ultimate tool masters, we have developed a repeatable and effective approach to learning how to employ tools: a combination of studying instruction manuals or books, practising and tutoring. And it has been working for millennia. We just need to "pick up the skills".
In this case, though, the good old process didn't work well. I was only able to scratch the surface and often felt more frustrated than truly "empowered".
One of a Kind
Our "pick up the skills" method was created and evolved for every other tool out there. 'Brainless' things that can be put to use by mechanically following a set of (sometimes very complex) instructions.
AI Agents, despite being just tools, are also absolutely unlike any other tool in the history of humanity.
What sets them apart is their ability to reason (within their inherent limits). This single difference is why our time-tested learning method hasn't been very effective: there is a fundamental mismatch between what AI Agents are and what we presume them to be.
A New Lens (What Worked for Me)
As I grappled with AI Agents in my daily workflow, and luckily didn't give up on them, I started to see an abstract pattern forming in my mistakes and the agents' low-quality completions. More importantly, I saw how the two fed into each other, making my experience even worse.
That realisation made me gradually develop a new, and substantially more effective, mental model for interacting with agents:
Treat an agent not like a vending machine or a bash script (one input, one output), but like someone who has read a thousand books on software engineering but has never actually seen a computer or written a single line of code.
Patterns, best practices, historical context about building software, definitions, theories, jargon: the agents know it all. They can tell you about SOLID principles, microservices architectures, design patterns and every design and programming-language concept under the sun.
But they have no practical experience. They've never made a mistake, debugged a live system, felt the pain of a race condition or integrated a finicky third-party API; the kind of hands-on experience that gives birth to insight.
The agents, while immensely knowledgeable, have ZERO wisdom and insight.
This is precisely why simple "zero-shot" or "few-shot" prompts are almost always grossly overrated. They assume the agent can infer all the unspoken context, subtle implications and real-world constraints that a software engineer instinctively understands. No! Despite their constant overconfidence, the agents are woefully in need of guidance.
The real value of agents is unlocked when you mentor the machine through an elaborate, back-and-forth conversational approach to solve a problem.
The "mentoring" is all about guiding the agent through the problem-solving process and seeing it as an active "collaborator". Note that a collaborator cannot be a bash script or a procedure to execute. No. It means someone who can, within constraints and boundaries, make decisions, devise solutions and make mistakes.
Show Me The Code Already!
Let's infuse that philosophy into actionable steps!
These days, the overall structure of my workflow boils down to a few essential steps (sketched in code after the list):
1. Decompose the problem (me and the agent).
2. Propose holistic solutions (agent).
3. Iteratively refine the solution (me and the agent).
4. If the solution feels too big, go to (1).
5. Turn the holistic solution into a set of concrete steps, i.e. the plan (me and the agent).
6. Iteratively refine the plan (me and the agent).
7. Save a snapshot of the plan (agent).
8. Implement the plan step by step, pausing for my review and approval before moving to the next step (agent).
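For the programmatically inclined, here is a hypothetical C# skeleton of that loop, reusing the `IAgent` sketch from above. The prompts, the snapshot file name and the console-based approval are illustrative assumptions, not a prescription.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// A hypothetical skeleton of the workflow above; IAgent comes from the
// earlier sketch. Everything human-facing here is a stand-in.
public static class MentoredWorkflow
{
    public static async Task RunAsync(IAgent agent, string problem)
    {
        // Steps 1-4: decompose together, then refine the holistic
        // solution until it feels right-sized.
        var solution = await agent.SendAsync(
            $"Let's decompose this problem together:\n{problem}\n" +
            "Then propose a holistic solution.");
        solution = await RefineAsync(agent, "solution", solution);

        // Steps 5-7: turn the solution into a concrete plan, refine it
        // and save a snapshot before any code is touched.
        var plan = await agent.SendAsync(
            "Turn the agreed solution into a detailed step-by-step plan.");
        plan = await RefineAsync(agent, "plan", plan);
        File.WriteAllText("plan-snapshot.md", plan);

        // Step 8: implement one step at a time, gated on explicit approval.
        var report = await agent.SendAsync(
            "Implement step 1 of the plan, then pause for my review.");
        while (Ask($"{report}\nApprove and continue? (y/n)") == "y")
        {
            report = await agent.SendAsync(
                "Approved. Implement the next step, then pause again.");
        }
    }

    // Loops until the human reviewer has no more feedback.
    private static async Task<string> RefineAsync(
        IAgent agent, string label, string draft)
    {
        while (true)
        {
            var feedback = Ask(
                $"Current {label}:\n{draft}\nFeedback (empty to accept)?");
            if (feedback.Length == 0) return draft;
            draft = await agent.SendAsync(feedback);
        }
    }

    private static string Ask(string prompt)
    {
        Console.WriteLine(prompt);
        return Console.ReadLine() ?? "";
    }
}
```

Notice the division of labour: the human owns the control flow (reviews, approvals, the snapshot) while the agent only ever generates. That is the mentoring model in code.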
To make it concrete, here's a trimmed-down transcript of a typical interaction between me and the agent:
Me:
Let's take a look at the classes in the following packages:
...
Ingest the source files of those packages and any neighbouring files that you may find relevant.
Let me know when you've got the full context and are ready to proceed.
Agent:
...
Me:
To my mind, the classes in those packages seem quite tightly coupled. For instance, ...
What do you think?
If you agree, list at least 2 alternatives to structuring along with your reasoning and pros and cons.
Agent:
...
Me:
Option 1 looks good. But do we really need to implement X and Y?
Agent:
...
Me:
Makes sense.
Save a copy of Option 1 to ...
Then list your proposed detailed step-by-step plan for implementing Option 1.
Agent:
...
Me:
Hmm... steps 5, 6 and 7 don't make sense. We shouldn't need to touch upstream classes. What alternatives exist?
Agent:
...
Me:
Alternative 3 is way more intuitive.
Revise your plan with that. Then save the plan to ...
Agent:
...
Me:
Let's get started with the implementation.
Once each step is done, pause and wait for my explicit approval before proceeding.
...
Final Words
By shifting our mental model and viewing AI agents for what they actually are, we can transform how we interact with them.
The idea is twofold:
- To leverage their strength, i.e. immense theoretical knowledge, to augment our working memory.
- To leverage our strength, i.e. wisdom and insight, to augment their cognitive abilities.
The synthesis sounds bizarre; I know. Are we tools to The Machine as much as it is to us!?
What I wrote about here deeply influenced my design philosophy when building my new open-source C# library, Flow. Flow helps developers structure business logic in a way that is not only clean and natural for humans to read, but also inherently easier for AI agents to analyse, generate and, more importantly, process the context around it. It's about creating code that speaks clearly to both human intent and machine logic.
If this philosophy resonates with you, explore Flow further: Flow.BahmanmM.com.