
Techniques for saving LLM tokens

Published: at 12:00 AM

Although LLM cost per token has been declining steadily, the average output length of reasoning models has been growing fivefold each year.

The net effect is that using LLMs is getting more expensive. So how can we save tokens and keep costs low without giving up high-quality results?

Here are 10 techniques for saving tokens without sacrificing quality.

Techniques

  1. Keep conversations short
  2. Maintain a tree of README files
  3. Use consistent vocabulary
  4. Create and use skills and commands
  5. Manage memories
  6. Use strong verification methods
  7. Be specific
  8. Use a planning step
  9. Take complex interactions to a new planning step
  10. Keep educating yourself

Keep conversations short

Long conversations eat tokens quickly.

LLM conversations are stateless: every time you send a message, all previous messages are sent again along with it. It isn’t a back-and-forth exchange in which the model remembers its place. Providers do cache tokens, but the model still has to process the entire conversation on every request.
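
To make the cost of statelessness concrete, here is a rough sketch of the arithmetic. It assumes every user message and every reply is the same length (the 500-token figure is invented for illustration) and it ignores caching discounts, so read it as intuition rather than a billing estimate.

```python
def cumulative_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens processed over a conversation of `turns` requests.

    Simplifying assumptions: every user message and every model reply is
    `tokens_per_message` tokens long, and the full history is resent with
    each request.
    """
    total = 0
    history = 0  # tokens currently in the conversation
    for _ in range(turns):
        history += tokens_per_message  # the new user message joins the history
        total += history               # the whole history is sent as input
        history += tokens_per_message  # the model's reply joins the history too
    return total


if __name__ == "__main__":
    for turns in (1, 5, 10, 20):
        print(turns, cumulative_input_tokens(turns, 500))
```

Under these assumptions the total grows with the square of the number of turns: one ten-turn conversation processes 50,000 input tokens, while two separate five-turn conversations process 25,000 between them.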

Visualization: Stateless conversations quickly use up a lot of tokens.


You can also expect higher accuracy from short conversations. Why? When you steer a model through many small follow-up messages, you’re asking it to hold a growing conversation in its context window, reconcile potentially contradictory instructions, and maintain intent across turns. Each of those is a chance for it to go wrong.

Maintain a tree of README files

Maintain context and rules about your project in structured, discoverable documentation. Claude Code and other agents tend to scour your codebase in every conversation, but you can minimize that by providing good documentation.

I’ll call it a tree of README files. Here is an example structure:

project/
├── docs/
│   ├── rules/                  ← rules for various types of development
│   │   ├── front-end-rules.md  ← about your front-end framework and conventions
│   │   ├── back-end-rules.md   ← about your server-side endpoints and conventions
│   │   ├── e2e-rules.md        ← how to write end-to-end tests
│   │   ├── db-rules.md         ← how to write queries or interact with ORM
│   │   ├── schema-rules.md     ← how to write database schema
│   │   └── scaffolding.md      ← how and where to scaffold new files
│   ├── plans/                  ← have AI write plans here
│   │   └── implement-a.md
│   └── modules/                ← details about specific application modules
│       └── module-a.md
├── AGENTS.md                   ← project overview, conventions, rules and links to the docs above
└── CLAUDE.md                   ← symlink to AGENTS.md for maximum compatibility

Principles:

  1. Write with both agents and humans in mind.
  2. Refer to README files from AGENTS.md and other files as applicable. For example, your AGENTS.md file should talk about your back-end frameworks and point agents to look at docs/rules/back-end-rules.md for more information. Then in back-end-rules.md, discuss database interaction and point agents to look at docs/rules/db-rules.md for more information.
  3. In your prompts, mention relevant implementation plans and modules that the agent should consider before moving forward.
  4. Use consistent vocabulary throughout. Add a domain knowledge glossary to AGENTS.md.
  5. Ask AI to analyze your codebase and compose first drafts for these markdown files, then tweak and finalize. Example prompt:
    Analyze the front-end patterns for this project and produce
    instructions suitable for AI agents and humans to follow
    established conventions. Record these instructions in
    docs/rules/front-end-rules.md and make a note in AGENTS.md
    referring to this new document.
  6. Keep files short: no more than 4 KB to 8 KB each. When files get too long, ask AI to make them more concise, split them into smaller units, or both.
  7. Ask AI to update these files when things change. Stale documentation can be worse than no documentation. Make it a habit to review the relevant module README every time you add features. Schedule a regular housekeeping task to have AI review the files for inconsistencies and contradictions.
  8. Consider using a skill such as Grill With Docs to create formal architecture decision records (ADRs) that document and specify plans.
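
The size guideline in the principles above is easy to check automatically. Below is a minimal sketch, assuming your documentation is Markdown under a docs/ directory as in the example structure; the function name and the 8 KB default are illustrative choices, not part of any tool.

```python
from pathlib import Path

MAX_BYTES = 8 * 1024  # upper end of the 4 KB to 8 KB guideline


def oversized_docs(root: str, max_bytes: int = MAX_BYTES) -> list[tuple[str, int]]:
    """Return (path, size_in_bytes) for each Markdown file under `root` above `max_bytes`."""
    base = Path(root)
    if not base.is_dir():
        return []
    results = []
    for path in sorted(base.rglob("*.md")):
        size = path.stat().st_size
        if size > max_bytes:
            results.append((str(path), size))
    return results


if __name__ == "__main__":
    for path, size in oversized_docs("docs"):
        print(f"{path}: {size / 1024:.1f} KB, consider condensing or splitting")
```

A check like this could run as part of the regular housekeeping task, with the flagged files handed to the agent to condense or split.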

Use consistent vocabulary

Agents are more focused and less prone to error if you use consistent vocabulary throughout your entire project. As mentioned above, consider adding a glossary to AGENTS.md that defines knowledge specific to your project. No need to outline general programming concepts, but do define business jargon, custom frameworks, and uncommon patterns your project uses.
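
One way to keep vocabulary from drifting is to check documents against the glossary. The sketch below is only an illustration: the glossary entries are invented, and in practice you would load your own terms from wherever you keep them. It matches whole words only, so plurals and other variants need their own entries.

```python
import re

# Hypothetical glossary: preferred term -> synonyms to avoid.
GLOSSARY = {
    "customer": ["client", "shopper"],
    "order": ["purchase"],
}


def find_inconsistent_terms(
    text: str, glossary: dict[str, list[str]]
) -> list[tuple[int, str, str]]:
    """Return (line_number, found_word, preferred_term) for each discouraged synonym."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for preferred, synonyms in glossary.items():
            for synonym in synonyms:
                # \b restricts the match to whole words, case-insensitively.
                pattern = rf"\b{re.escape(synonym)}\b"
                if re.search(pattern, line, flags=re.IGNORECASE):
                    findings.append((line_number, synonym, preferred))
    return findings


if __name__ == "__main__":
    sample = "The client places an order.\nEach purchase belongs to a customer."
    for line_number, found, preferred in find_inconsistent_terms(sample, GLOSSARY):
        print(f"line {line_number}: '{found}' found, prefer '{preferred}'")
```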

Create and use skills and commands

Claude Code and other agents support skills: reusable prompt fragments that encode how to do something specific in your project. They’re not code; they’re instructions, written in plain English, that live as files in your repo. Learn how to find community-written skills and how to write your own.

Agents see each skill’s description in every conversation and decide when to use it. In that way, skills behave somewhat like tools.

Commands are similar to skills, but the agent isn’t aware of them at startup; you have to invoke a command explicitly.

Manage memories

Memories are persistent facts—things that are true about your environment, your preferences, or your project’s state that the model should always know without being told each time.

In Claude Code, for example, memories live in ~/.claude/projects/project/memory/MEMORY.md.

Every time a memory is created, consider adding the details to your tree of README files instead. As memories accumulate, they can start to take up a lot of tokens.

Use strong verification methods

At a minimum, have the agent format, lint, and run unit tests.

  1. Instruct your AI to first format your files with a tool such as Prettier or Biome.
  2. Then type-check with the TypeScript compiler (tsc) and optionally lint with something like ESLint or Biome.
  3. Ask the agent to verify that unit tests still pass after writing any code.
  4. Provide instructions for running any other scripts or tools that verify code.
  5. This is also where you might add a code-review step by a separate agent.
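
The steps above can be wrapped in a single script so the agent has one command to run. This is a minimal sketch, assuming a TypeScript project that uses Prettier, tsc, ESLint, and an npm test script; the step list and function names are illustrative, so substitute whatever your project actually uses.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Assumed commands for a TypeScript project; swap in your own tools and scripts.
STEPS: list[tuple[str, list[str]]] = [
    ("format", ["npx", "prettier", "--write", "."]),
    ("type-check", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npx", "eslint", "."]),
    ("unit tests", ["npm", "test"]),
]


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(command)).returncode


def verify(
    steps: Sequence[tuple[str, Sequence[str]]] = STEPS,
    runner: Callable[[Sequence[str]], int] = run_command,
) -> Optional[str]:
    """Run the steps in order and stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        if runner(command) != 0:
            return name
    return None


# Example use from a project root:
#     failed = verify()
#     print(f"Verification failed at: {failed}" if failed else "All steps passed")
```

Stopping at the first failure keeps the output the agent has to read short, which is the point: a short, specific failure message costs fewer tokens than a wall of errors from every tool at once.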

The stronger your verification methods, the fewer loops your agent will take. (The Agentic Loop is the process in which an agent writes code, verifies it, and makes updates.) Keep the agent’s rework to a minimum, because each pass through the loop consumes tokens the same way a multi-turn conversation does: the Agentic Loop is stateless, just like regular conversations.

Be specific

Vague language produces vague output. Take time to write good requirements and you will save time overall.

Say what you mean, exactly. “Add authentication to the checkout flow” is an intent, not a specification. Better is “Add JWT verification middleware to the POST /checkout endpoint, following the patterns in docs/modules/authentication.md, rejecting unauthenticated requests with a 401 and the standard error envelope.” The latter costs more tokens up front but requires less follow-up.

State what you don’t want. For example, “Do not use approach X” saves a round of cleanup. If there are wrong answers, rule them out in the prompt.

State what is out of scope. For example, “Do not tackle front-end messaging about the new authentication requirement; we’ll tackle that in a future step.”
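
If you find yourself writing prompts like this often, a small template helps you remember all three parts: the task, what to avoid, and what is out of scope. The helper below is a hypothetical sketch; its name and section headings are illustrative, not a convention any agent requires.

```python
def build_prompt(
    task: str,
    references: list[str],
    avoid: list[str],
    out_of_scope: list[str],
) -> str:
    """Assemble a prompt from a task, reference docs, exclusions, and out-of-scope items."""

    def bullet_list(title: str, items: list[str]) -> str:
        return title + "\n" + "\n".join(f"- {item}" for item in items)

    sections = [task.strip()]
    if references:
        sections.append(bullet_list("Read first:", references))
    if avoid:
        sections.append(bullet_list("Do not:", avoid))
    if out_of_scope:
        sections.append(bullet_list("Out of scope:", out_of_scope))
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(
        build_prompt(
            task="Add JWT verification middleware to the POST /checkout endpoint.",
            references=["docs/modules/authentication.md"],
            avoid=["Use approach X"],
            out_of_scope=["Front-end messaging about the new authentication requirement"],
        )
    )
```

Empty sections are simply omitted, so the template adds no tokens when there is nothing to rule out.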

Use a planning step

For larger initiatives, use one turn to create a plan and store it permanently in docs/plans. Ask the agent to break the initiative into small steps that can each be completed in one short conversation.

Take complex interactions to a new planning step

Sometimes a task becomes unexpectedly complex, or the agent runs into a problem or ambiguity. Rather than pushing on in the same conversation, take a turn to write up what you discovered in a new prompt (or in a new docs/plans file for a larger initiative).

Keep educating yourself

Here are some great sources for continued learning:

  1. TLDR AI Newsletter
  2. Matt Pocock on YouTube
  3. Web Dev Simplified on YouTube
  4. Claude on YouTube
  5. Claude Code Docs

Conclusion

Shift your mindset from “I’ll iterate toward what I want” to “I’ll plan ahead and specify exactly what I want.” It takes practice, but these techniques will make a big difference.

Invest this time up front and you’ll save time overall. You’ll use fewer tokens and get better results at the same time.