Epistles on the Robot God: Burnt Offerings of a Devout Parishioner

In the month of April, I burned 4.1 billion tokens communing with the robot god. Whether that's impressive or so low that Jensen Huang would put me on an engineering P.I.P., I don't know, but it feels like a lot. Translated into other units of measure, counting back that many seconds would take you more than a century into the past. It's also enough text to write roughly 65,000 copies of The Great Gatsby (or this blog copy/pasted 1.39 million times).

That's a lot of tokens. That's a lot of burning. That's a lot of getting burned. Here is what the pyre taught me.

The First Commandment: Mise en Place

The temptation to treat AI like a microwave is constant. Simply ask it to "fix this bug" and a result is spat out and it works! Or it doesn't, so you prompt again (and again) until it eventually does. Technically this works, but it's token-for-token the most expensive way to use these tools.

The problem isn't the model's capability – it's asking it to think and execute simultaneously. Without a plan, the model picks the first plausible interpretation of your prompt and sprints. You've reduced a Michelin-starred chef to a line cook interpreting incoherent scribbles: "you are the very best engineer: we are going to build this in one go. make no mistakesd".

Using Plan mode forces the model into a conversation with both you and itself, one that surfaces assumptions and conflicts. It also gives you the opportunity to document where you're starting, where you'd like to go, and, later, how you got there (because your plan isn't done just because your implementation is).

Create your plan document somewhere durable (e.g. .cursor/plans/...), lay out your idea, how you think the solution should look, and your open questions. Then direct the model to spawn subagents to research the codebase and any relevant additional sources: API docs, Slack context, or – the holy grail – recent research papers when applicable. (Models do remarkably well when a relevant research paper exists and can be applied.) Mise en place.
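
For illustration, a skeleton of what such a plan document might look like (the headings and structure are my own convention, not a prescribed format):

```markdown
# Plan: <feature or fix>

## Where we are
Current behavior, the files involved, known constraints.

## Where we want to go
Desired behavior and what "done" looks like.

## Proposed approach
How I think the solution should look, in enough detail to poke holes in.

## Open questions
- Does <the API> actually behave the way we assume under load?
- Is there a blessed path in <the library> for this, or are we fighting it?

## Research notes
Filled in by subagents: codebase findings, API docs, issues, papers.
```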

The planning stage is also for spawning subagents that write throwaway validation scripts and explore APIs and package source code far more granularly than you'd be willing to by hand. Convert your open questions and your plan document into fact through testing and validation, rather than optimistic, educated theory. Ask the agent: does this API actually work the way I think it does? Does this library have a blessed path for what we're trying to achieve? Are there open issues in this package that are going to impact us? Does the source code have any unreported bugs?
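
As those answers come back, record them in the plan as facts with receipts. A hypothetical example of what that section might grow into (every specific here is invented for illustration):

```markdown
## Validated facts
- The API paginates at 100 items max – confirmed with a throwaway probe script.
- The client library already retries on 429s – verified by reading its source.
- An open upstream issue affects streaming responses – workaround noted in the
  proposed approach above.
```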

> Don't reduce a Michelin-starred chef into a line cook interpreting incoherent scribbles: "you are the very best engineer: we are going to build this in one go. make no mistakesd".

To each their own, but my style of plan creation and editing involves me and the model working together, iteration after iteration. Let the model create pass #1, then edit sections by hand as needed. You can even sketch out future sections with subtitles or inline comments about questions you have, which the LLM will ingest and revise in future turns. Be an active participant in creating and modifying your plan's to-do list as well: I always inject breakpoints halfway through a plan's implementation for a mid-plan review, so the build-out is interrupted with a "stop and think about what we've done to this point" step.
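
In practice, that to-do list might look something like this (an illustrative sketch, not a required format):

```markdown
## To-do
- [ ] 1. Add the new schema and migration
- [ ] 2. Implement the service layer
- [ ] 3. CHECKPOINT: stop here. Re-read this plan, diff what's been built
      against it, and list anything that drifted before continuing.
- [ ] 4. Wire up the endpoints
- [ ] 5. Tests and docs
```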

When working on a bug fix, use the plan stage to ask the LLM what visibility it's missing. Counterintuitively, being a giant calculator makes it really good at the art of presuming and gambling. Skip that by giving it exactly the visibility it needs to explicitly diagnose the problem instead of letting it dress assumptions up as objective truth. Ask the LLM to visualize the complete bugged codepath, not just locate what is wrong. Use it to understand the conditions that allow the bug to exist in the first place before you fix it – especially when code is nested and complex, and doing this in your head is headache-inducing.
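
A sketch of the kind of prompt I mean (the wording is illustrative):

```
Trace the complete codepath for this bug, from the entrypoint to the failing
line. List every branch along the way and the runtime conditions required to
reach it. Then tell me: what logging, state, or inputs are you missing that
would let you confirm this diagnosis instead of assuming it?
```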

Spend the bulk of your tokens up front so implementation is straightforward and mechanical. The alternative is uncovering an unforeseen constraint during dev-testing, which now requires surgery on a patient who is already awake and asking questions.

The Second Commandment: Thou Shalt Not Poison the Well

As a session grows long, the model accumulates. It remembers the three approaches that didn't work and it carries forward your contradictory instruction from 30 prompts ago. Eventually your conversation history ends up being more scar tissue than signal, sometimes without you even knowing, as the model begins forming opinions based on everything up to that point.

> Eventually your conversation history ends up being more scar tissue than signal, sometimes without you even knowing as the model begins forming opinions based on everything up to that point. This is context poisoning and the treatment is a new session.

This is context poisoning and the treatment is a new session. The model has no memory between sessions – and that should be viewed as a blessing, so long as your plans become a system of memory. To finalize your plan, remove any scratchpad deliberations and all the work it took to reach the finish line. Leave it streamlined and stateless (keeping all of your research and validations, of course).

Then start a new session and hand it your plan: a blank slate with precise, proven instructions makes implementation a breeze (maybe even one you can step away from to work on another task, if you're fine with reviewing and modifying once the first pass is complete). If your plan is exceedingly large, you can also delegate each plan to-do to a fresh LLM session, keeping things tightly scoped and incremental (or parallelized if and when applicable).
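
The handoff prompt for a single to-do can be short and strict; something like this sketch (the path and numbering are illustrative):

```
Read .cursor/plans/my-feature.md in full. Implement to-do #3 only – do not
start any other to-do. When finished, mark #3 complete in the plan and note
any deviations from the approach it describes.
```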

Once you've implemented your plan in code, mark it complete and update it to reflect what was built. Add language like "Completed", include git commit hashes, and check off all your to-dos so the model has no room to be confused. Place your recap summary at the top of your document (or the bottom) – if the model decides to be lazy and only reads the head or tail of your plan, ensure it still receives the most definitive context possible.
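
The top of a finished plan might read like this (the hashes and wording are placeholders):

```markdown
# [COMPLETED] Plan: <feature>

> Shipped in commits <abc1234>..<def5678>. All to-dos done.
> Deviations from the original approach are listed under "What changed".
```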

Then move it somewhere durable and long-term (e.g. .cursor/plans/completed/...) so future plans can build on past plans. Future LLM sessions can also reference past plans as validated context for how and why specific decisions were made, and what constraints were discovered.

You can even go as far as documenting the chat transcript ID(s) involved in creating your plan and implementing it so you can return later if you ever need to dig deep into why things are the way they are. (LLMs can also read their own transcripts if you need to speed up this process).

Finally, prune aggressively. If the model starts going in a direction you don't like, don't let it finish, don't let it spin its wheels, and don't wait for a chance to correct it. Go scorched earth: rewind to the offending prompt and replace it. This reverts the model to its last checkpoint before the prompt you overwrote, discarding all the potentially poisoned context it had just generated. In the same spirit, update your .cursorignore file liberally to isolate sessions from temporary noise like logs, throwaway scripts, or anything the model doesn't need for the task at hand.
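
.cursorignore takes gitignore-style patterns; the entries themselves will be specific to your repo (these are illustrative):

```
# Keep agent sessions away from noise
logs/
*.log
tmp/
scratch/
```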

The Third Commandment: Canonize Your Rituals

A map is useful because it shows you the lay of the land; a charted course demonstrating how you traverse that land is infinitely more valuable. The models are remarkably good at eventually figuring things out, given enough context and token budget. But "eventually" is expensive, and arriving somewhere useful by a different route each time means you're paying that cost repeatedly. Each time a session figures out a happy path for a recurring problem, you have a golden opportunity to create a powerful new skill.

Resist making skills that explain the what; create skills that outline the how, so future sessions don't have to rediscover what you've already proven. Need to investigate a crashed pod for your service? Create a skill with the precise kubectl commands the model can run, including the flags it needs for full visibility into what went wrong. Leave documentation like a README to explain the destination while your skills explain the route. The same goes for instruction documents: the model is already going to ingest your code, so don't waste context describing the what. You are not leaving breadcrumbs for a human; you're preventing the model from asking questions or burning tokens on discovery when it doesn't need to.
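
To make the crashed-pod example concrete, here's a sketch of such a skill (the placeholders and structure are hypothetical; the kubectl commands and flags are real):

```markdown
# Skill: debug a crashed pod

1. Find the pod and its restart count:
   kubectl get pods -n <namespace> --sort-by=.status.startTime
2. Check why it was killed (OOMKilled, exit codes, failed probes):
   kubectl describe pod <pod> -n <namespace>
3. Read the logs from the *previous* container instance, not the current one:
   kubectl logs <pod> -n <namespace> --previous
4. Check recent events for scheduling or image-pull problems:
   kubectl get events -n <namespace> --sort-by=.lastTimestamp
```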

Skills should be opinionated and direct rather than generalized and open-ended. Include concrete commands, chronological steps, example inputs and outputs, and reasoning at every stage – not just concepts the model will burn tokens interpreting freely. Perhaps one of the most human aspects of these models is their unrelenting need to discover and assign meaning; when you don't provide that meaning, they will think and churn until they've convinced themselves they've found it. For repeatable tasks and problems, discovery should only happen once. Think of skills like browser bookmarks, not a Wikipedia race to see how many clicks it takes to get from "Cheese" to "Philosophy".

The Fourth Commandment: The Agentic Inquisition

The models are convincing: their answers are confident, fluent, and at times so polished you forget you're essentially interfacing with mankind's most expensive autocomplete at scale. Repackage this and leverage it to your advantage.

Create a /devils-advocate skill that spawns opposing subagents, each of which passionately makes the case for the option it's been assigned. Once they complete, their results filter back to the parent agent thread, where it can adjudicate the claims, weigh them against each other, and come to a thought-out conclusion. An added benefit is getting to read two thorough cases for why each option beats the other, so you can understand the choice in front of you as if you'd gone down both paths at once.
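
A sketch of what the skill itself might contain (the structure and wording are my own; adapt to however your tool defines skills):

```markdown
# Skill: devils-advocate

Given two or more options:
1. Spawn one subagent per option. Each argues *for* its option only:
   cite code, docs, and measurements – steelman, don't summarize.
2. Each subagent must also name the strongest objection to its own case.
3. Back in the parent thread: weigh the cases against each other and state a
   recommendation, a confidence level, and the evidence that would change it.
```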

> ...you're essentially interfacing with mankind's most expensive autocomplete at scale.

Once the parent agent has made a recommendation, prod it further: how confident is it in its decision? What's leading it to be that confident (or not)? Does this raise any new open questions? Adversarial exchanges like this force thinking, re-thinking, evaluating claims, and re-evaluating them under scrutiny – and in my experience they lead to better outputs.

You can also use this to further automate your agentic loops by advising the parent agent to call this skill on its own whenever it hits a complicated decision that shouldn't be taken lightly. Note that the more powerful the model, the better the chance the skill actually gets called; weaker models can trivialize and miscalculate how important some decisions truly are.

The Fifth Commandment: No Leaps of Faith

Most tools provide the ability to give the LLM rules in the form of markdown documents. These can be handy, and especially helpful in advanced workflows like glob patterns that expose rules dynamically based on filepaths – but rules were made to be broken. And break them an LLM does: because it disagrees, because your prompt seems to confidently contradict them, or because it's under enough context pressure that it quietly sets them aside.

The models are probabilistic. Expecting them to do exactly what you ask, every time, the same way, is a leap of faith; hooks are how you stop taking it. Hooks are an incredibly powerful feature that lets you introduce deterministic scripts, with bi-directional communication, at defined points in the agentic loop: before a file is edited, after a shell command, when a session starts, when the agent stops. Not only do hooks always run whether the LLM wants them to or not, their output is always provided back to the agent regardless of the result.

Hooks can be great for file linting, preventing unsavoury shell commands (or command optimization – perhaps you want to always replace grep calls with rg etc.), injecting context into a new session without manually including it in a prompt, scanning for secrets, running CI tests, sending notifications and so on.
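
The exact hook API differs by tool and version, so treat this as pseudocode for the shape of a command-rewriting hook rather than a real config:

```
on beforeShellCommand:
  if command starts with "grep"  -> rewrite it to the equivalent rg invocation
                                    and tell the agent why
  if command matches a deny-list -> block it and return an explanation the
                                    agent can act on
```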

Another idea: mix the sessionStart and sessionEnd events with macOS' caffeinate -i command to prevent your Mac from sleeping while your agents are busy. (The -i flag prevents idle sleep but still allows your screen to lock after a period of inactivity, for added security.)
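
The wiring might look roughly like this (pseudocode again – event names and config format depend on your tool; caffeinate and its -i flag are real):

```
on sessionStart: caffeinate -i & echo $! > /tmp/agent-caffeinate.pid
on sessionEnd:   kill "$(cat /tmp/agent-caffeinate.pid)" 2>/dev/null
```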

The Sixth Commandment: Tithing Economically

Mix and match your models – there's no one-size-fits-all, or at least there doesn't need to be. This cuts both ways: scale down for simple tasks where speed is more valuable than reasoning, and reach for the most powerful model for your most difficult deep-thinking and strategy tasks so you don't get stuck with half-baked slop that burns extra tokens to course-correct later.

Render unto Haiku what is Haiku's: straightforward tasks like editing code comments and docstrings, writing throwaway validation scripts, and drafting commit messages. Let Sonnet implement highly detailed, descriptive plans that outline exactly how and what needs to be done. Finally, burn tokens on Opus for extensive thinking and deep reasoning that cuts across a large number of sources and a ton of code.

Benediction

If you've made it this far, hopefully at least one idea has piqued your interest. Sharing these seemed like necessary penance after burning as many tokens as I did. Plan before you build, start fresh before you execute, enforce what matters rather than hoping the model remembers, canonize what works, interrogate every answer (and have the model do that for you), and don't use the sledgehammer on the finishing nail.

TLDR:

  1. Spend the bulk of your tokens up front so implementation is straightforward and mechanical, to avoid uncovering an unforeseen constraint during dev-testing that requires surgery on a patient who is already awake and asking questions.
  2. Counteract context poisoning by starting new sessions regularly. The model has no memory between sessions – keep your plans around long-term so they become a system of memory that builds on itself and gives the LLM context in each new session.
  3. Resist making skills and instruction docs that explain the what; create skills that outline the how (and why) so future sessions don't have to rediscover what you've already proven.
  4. Leverage AI's overconfidence to your advantage by creating adversarial exchanges that force options to be compared and contrasted, and dialogue that works through how and why one option might be better than another.
  5. Hooks let you introduce deterministic actions at defined points in the probabilistic model's agentic loop.
  6. Mix and match your models – scaling down for simple tasks and scaling up for your most difficult deep-thinking and strategy work.

Go in peace and love thy model provider.