Brain-Work, Brain-Toil and Mentoring AI Agents

[Image: A Parthian archer delivering a Parthian shot, a devastatingly effective tactical adaptation that repeatedly crushed the much-hyped Roman legions.]

Change is inevitable. Adapting to it is nature's way. Steering its evolution sets humans apart.

I recently wrote about my enlightenment: "A mental model for working with AI Agents".

TL;DR: see it as a collaborator who has read a thousand books on programming but has never written a single line of code. A collaborator who possesses immense knowledge but lacks the wisdom and insight that come from experience.

That has become the guiding light in all my "augmented coding" sessions, be it in a professional setting or for my hobby projects, and it has made those sessions far more fruitful than before.

I thought I'd share a recent experience, a case study of applying the mental model and using agents to deliver value where it was needed most.

Tidy-Up / Refactoring - Not Vibe-Coding

Vibe-Coding, Spec-Driven Development, Multi-Agent Architecture and the like are all the rage these days; I know.

But I firmly believe that the one area where agents can create the most profound and sustainable value is woefully underrated: the strategic tidy-up and refactoring of existing codebases.

This isn't about letting an agent "feel" its way through a new feature or generate a prototype based on ephemeral "vibes", no. This is about using the machine where it truly shines, i.e. mechanical reasoning, and avoiding its awful weakness, i.e. conscious reasoning.

Tidy-ups and refactorings play exactly to that distinctive strength. If you've already figured out the pattern to be applied, you can lead the agent to repeatedly apply it where it's needed.

Speaking from my own experience, the glory and satisfaction have invariably been in the conscious aspect, with the mechanical part being a test of perseverance and pure toil, i.e. demanding but boring and thankless; precisely the kind of work the machine excels at.

The Challenge

For the past few months, I have been spending a lot of time on a particular business-critical corporate system: 1500+ source files, 100k+ LoC, and an average of 18 pull requests merged every week. A typical mature, live service.

And like most critical services, the team had always found themselves in a position where they had to keep moving forward without enough pause to give the codebase the love it deserves!

The codebase's health is also typical: functionally speaking, a healthy workhorse backed by a massive test-suite with great coverage; non-functionally speaking, a fragmented forest of islands of non-uniform design patterns, making high coupling and low cohesion a widespread characteristic.

Given that, I couldn't resist the typical urge to tidy things up while keeping all the tests happy. To test the waters, I decided to dive deep (😂): to improve and unify the different areas that dealt with transforming database records into domain entities. Fun!

A typical tidy-up; nothing special about it. Except that the scale meant any manual effort would be a monumental time-drain. And exactly because of that, I realised it was a perfect task for an AI agent: repetitive, mechanical reasoning.
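To give a flavour of the duplication (the real code is proprietary, so every name below is a hypothetical stand-in), imagine each area hand-rolling its own slight variation of this:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Order:
    id: int
    customer_id: int
    placed_at: datetime


def order_from_row(row: dict) -> Order:
    # Ad-hoc key access, type coercion and parsing, re-implemented
    # again and again across the codebase with small differences.
    return Order(
        id=int(row["id"]),
        customer_id=int(row["customer_id"]),
        placed_at=datetime.fromisoformat(row["placed_at"]),
    )
```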

The Mental Model in Action

As you may have guessed, I took my own advice, the hybrid approach:

Deliberately separating the brain-work from the brain-toil.

Brain-Work

[Figure: The alternating harmonic series, 1 - 1/2 + 1/3 - 1/4 + ... = ln 2. Brain-Work: conscious, abstract reasoning.]

I fired up my editor and spent a good bit of time doing the first instance of the tidy-up myself.

It obviously involved reading the code and thinking about the right way of abstracting the right things. In parallel, I kept prototyping and iterating until I was finally satisfied with my hand-written code: an intention-revealing API that was comprehensive but not general.
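Purely for illustration, and again with hypothetical names standing in for the proprietary code, the shape of what I arrived at was something along these lines: a declarative description of a mapping, instead of yet another hand-rolled conversion function.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Generic, TypeVar

E = TypeVar("E")


@dataclass(frozen=True)
class EntityMapper(Generic[E]):
    # Declares *what* a record-to-entity mapping is; the "how" now
    # lives in exactly one place.
    entity: Callable[..., E]
    fields: dict[str, Callable]  # column name -> coercion function

    def from_row(self, row: dict) -> E:
        return self.entity(
            **{name: coerce(row[name]) for name, coerce in self.fields.items()}
        )


@dataclass
class Order:
    id: int
    customer_id: int
    placed_at: datetime


# One declarative mapper per entity replaces one hand-rolled function.
order_mapper = EntityMapper(
    entity=Order,
    fields={"id": int, "customer_id": int, "placed_at": datetime.fromisoformat},
)

order = order_mapper.from_row(
    {"id": "7", "customer_id": "42", "placed_at": "2024-05-01T10:00:00"}
)
```

Comprehensive for the cases at hand, but deliberately not a general-purpose mapping framework.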

That was brain-work: Intense. Challenging. Fun.

Brain-Toil

[Figure: The expansion of the series for the first 10 terms, 1 - 1/2 + 1/3 - 1/4 + 1/5 - 1/6 + 1/7 - 1/8 + 1/9 - 1/10. Brain-Toil: mechanical, repetitive application of the result of brain-work.]

Naturally, I was determined not to hand-apply the same pattern in the other 37 places!

So I summoned my genie in a terminal; I use CLI agents for DevX and tool ergonomics reasons. Then, it was all a conversational back-and-forth.

There are four key phases that usually happen during the brain-toil, and this tidy-up was no exception.

1️⃣ Understanding

The aim is to "teach" the agent the pattern and its nuances.

In a conversation, I point the agent to relevant pieces of code and changesets. To prime its context, I make sure to explain "why" I want to do this task, pointing it to concrete places in the codebase that demonstrate my reasoning.

That is followed by an abstract description of "what" I did during brain-work; no tactical or detailed information. The reason is that I want the machine to grok the "how" on its own, as otherwise it would simply regurgitate my own words.
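For a flavour of what that priming looks like, my opening message reads roughly like this (the paths and references are placeholders, not the real ones):

```text
We have ~38 places that turn database rows into domain entities, each
hand-rolled slightly differently; that's why schema changes keep fanning
out across the codebase. I've already tidied one of them up: read the
changeset <commit-ref> and the new code under <path/to/module>, then
look at two or three of the untouched call-sites and tell me what you
make of the difference.
```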

2️⃣ Articulating

The goal of this phase, which I usually run in a loop with phase 1, is to ensure the agent has a proper grasp of where I want to arrive.

As an important part of the teaching process, I ask the agent to explain what it has understood in its own words and save it as a file (see the example below). This has a twofold corrective benefit:

  • I can review the file to make sure it is on the right track, and to guide it in case it is not.
  • I have found that the act of the agent articulating its understanding helps it at later stages; it will make more consistent decisions and drift substantially less. YMMV.
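The instruction itself can stay very simple; something like this (the file name is arbitrary):

```text
Now explain, in your own words, the pattern we are applying: its intent,
its nuances, and where it should and shouldn't be used. Write it to
UNDERSTANDING.md. Don't quote my words back at me; I'll review the file
and correct it where you're off.
```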

At any point, if I don't feel confident with the results, I simply goto 1.

3️⃣ Planning

The objective of this phase is to have the agent arrive at a reasonable action plan on its own but under my strict supervision and guidance.

An effective strategy to enforce some level of consistency on the agent is to have it propose an implementation blueprint/recipe: a document containing the general approach, a non-tactical text outlining the steps in terms of the "what", not the "how". The crucial point here is to ensure that the agent is not dumbed down to a shell script by an overly tactical plan. It also has the same corrective benefits as phase 2.

Occasionally, I realise that the agent's understanding is not deep enough to produce a decent plan. Those are the times when I mostly goto 1; though sometimes I just rage-quit the session and start a new chat 😂

A critical property of a good plan is that every step should be a meaningful checkpoint; just the right granularity. I always instruct the agent to pause after each checkpoint is done and wait for my explicit approval before proceeding further. This way, I can enforce consistency and sanity along the way and identify drifts and deviations from the plan early on.
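To make "the right granularity" concrete, here is a sketch of what a single checkpoint in such a blueprint might look like (not the actual document):

```text
Step 7: billing module, record-to-entity conversion
  - Replace the hand-rolled conversion with the declarative mapper.
  - Leave the module's public API untouched.
  - Run the module's tests; everything green before asking for review.
  CHECKPOINT: stop and wait for explicit approval before step 8.
```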

4️⃣ And Finally, Code Generation...er, Implementation

Reaching this phase means that I am satisfied with the agent's knowledge of what needs to be done and why it needs to be done. And that I can rely on its immense knowledge base, i.e. its training dataset, to carry out the mechanical reasoning and navigate the "how" on its own. The major benefit is that, as in any other implementation project, when it discovers nuances and limitations that I hadn't thought about in the previous phases, it can adapt its definition of "how" on the fly and fine-tune itself with less intervention from me.

This phase may seem straightforward, mechanical and not requiring much back-and-forth. Occasionally it is so. But more often than not, it is where my technical wisdom and insight are fused with the machine's knowledge; that is, course-correcting and steering it in my desired direction.

Anyhow, here's the gist of what happens during implementation:

  1. I instruct the agent to proceed with the next step.
  2. It does its work and asks for my review and approval.
  3. I carefully review and, where needed, provide feedback, in which case goto 2.
  4. I manually commit the changes with a terse but traceable message.
  5. goto 1

The Machine's Efficiency?

If I had to give it a coarse "star rating", it would be a ★★★★☆.

The agent did quite a decent job when it came to its strength, i.e. mechanical reasoning:

  • 🟢 Identified all the places which it was supposed to work on.
  • 🟢 Applied the tidy-up across the codebase.
  • 🟢 Its reasoning, actions and progress remained consistent throughout the session.
  • 🟢 It had to perform the all-too-famous LLM apology only once 🙌

Why not the full score then, you may wonder? Well, the reason is quite telling:

The machine had ZERO sense of writing maintainable code.

It would write code that ticked all the functional boxes yet met almost none of the non-functional standards of modern programming.

My pain-points fit in three major categories:

  • 🔴 It would write procedural code, à la FORTRAN, and decompose problems along the wrong axes.
  • 🔴 It would write snippets of code that looked different on the surface but were semantically the same logic in different contexts, i.e. it couldn't come up with abstractions on its own (see the sketch below).
  • 🔴 It wanted to observe the golden duo of low coupling and high cohesion but had no idea what they meant beyond Wikipedia-like definitions.
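The second pain-point is the easiest to show. In a contrived sketch (not the actual code), the agent would happily produce both of the following without ever noticing they are the same logic wearing different clothes:

```python
def active_customer_ids(customers):
    result = []
    for customer in customers:
        if customer.active:
            result.append(customer.id)
    return result


def open_order_ids(orders):
    ids = []
    for order in orders:
        if order.open:
            ids.append(order.id)
    return ids


# The abstraction it never reached for on its own:
def ids_where(items, predicate):
    return [item.id for item in items if predicate(item)]
```

Both versions pass the tests just fine; the duplication only hurts the humans who have to maintain it.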

I tried modifying my prompts to help it with the above but after a few attempts realised it was just a waste of time. So, at the end of the session, I sat down and manually applied the improvements I wanted to see.

The Session's Efficiency?

I'd give the whole session a ★★★★★; seriously!

Aside from the brain-work phase (which was fun), had it not been for the approach I took, I would probably never have carried out the tidy-up; it would have been too repetitive and time-consuming. And even if I had, assuming I didn't give up halfway, I would never have been as fast as the machine.

But the real reason for my rating is that this approach, in addition to the initial brain-work fun, allowed me to have even more brain-y fun: I managed to read a few chapters of the book "The Parthians" and learn more about my heritage and motherland. And this was while I was whipping the machine to work 😎

Less brain-toil, more brain-work.

The Adaptation Strategy in a Shifting Landscape

This experiment was a pleasant success. Not because the machine did everything, no. But because it helped me see, beyond the billionaires' hype, the emerging new division of labour.

The machine proved to be a helpful force-multiplier for the brain-toil, i.e. repetitive, mechanical grunt work of applying a known pattern. However, it would have been miserable and useless without the brain-work I did: the design, the architectural wisdom, the final pass for semantic consistency, and the judgment to know what "good" looks like.

And I believe this will be the new reality for software engineers: less typing and more thinking and reviewing. It will be about creating the blueprint of the solution, mentoring the non-human collaborator to execute the plan, and providing the crucial wisdom and judgement that only comes from genuine reasoning and experience.


