Cost Tracking

Track token usage and estimated costs across models with built-in pricing tables.

The cost tracking system records token usage per model and estimates USD costs using built-in pricing tables. It supports input, output, cache read, and cache creation tokens.
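
The arithmetic behind the estimate is simple: each token count is scaled by the model's per-million-token rate and summed. A minimal sketch of that calculation, using the ModelPricing fields shown under Custom pricing below (cache read and cache creation tokens are priced the same way against their own rates); this is illustrative, not the library's internals:

// Illustrative only; not the library's internals.
function estimateCostUSD(
  usage: { inputTokens: number; outputTokens: number },
  pricing: { inputPer1M: number; outputPer1M: number },
): number {
  // Rates are per one million tokens.
  return (
    (usage.inputTokens / 1_000_000) * pricing.inputPer1M +
    (usage.outputTokens / 1_000_000) * pricing.outputPer1M
  );
}

// 12,300 input and 2,100 output tokens at $3 / $15 per 1M:
// 12_300 / 1e6 * 3 + 2_100 / 1e6 * 15 = 0.0369 + 0.0315 = $0.0684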

Configuration

import { Agent } from "noumen";
import { LocalSandbox } from "noumen/local";

// `provider` is a configured model provider, created elsewhere in your setup.
const code = new Agent({
  provider,
  sandbox: LocalSandbox({ cwd: "/my/project" }),
  options: {
    costTracking: {
      enabled: true,
    },
  },
});

Getting cost summaries

After running threads, get a summary from the Agent instance:

const summary = code.getCostSummary();
console.log(`Total cost: $${summary.totalCostUSD.toFixed(4)}`);
console.log(`Input tokens: ${summary.totalInputTokens}`);
console.log(`Output tokens: ${summary.totalOutputTokens}`);

Or print a formatted summary from a standalone CostTracker:

import { CostTracker } from "noumen";

const tracker = new CostTracker();
// ... after some usage ...
console.log(tracker.formatSummary());
// Total cost: $0.0234
// Usage by model:
//   gpt-4o: 12.3k input, 2.1k output ($0.0180)
//   gpt-4o-mini: 5.2k input, 800 output ($0.0054)

CostSummary type

interface CostSummary {
  totalCostUSD: number;
  totalInputTokens: number;
  totalOutputTokens: number;
  totalCacheReadTokens: number;
  totalCacheCreationTokens: number;
  byModel: Record<string, ModelUsageSummary>;
  duration: { apiMs: number; wallMs: number };
}
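
byModel breaks the totals down per model. The exact fields of ModelUsageSummary are not shown here; this sketch assumes it mirrors the totals with inputTokens, outputTokens, and costUSD fields:

for (const [model, usage] of Object.entries(summary.byModel)) {
  // Field names on `usage` are an assumption; adjust to the real ModelUsageSummary shape.
  console.log(`${model}: ${usage.inputTokens} in, ${usage.outputTokens} out ($${usage.costUSD.toFixed(4)})`);
}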

Custom pricing

Override built-in pricing by passing a custom pricing table:

import { CostTracker, type ModelPricing } from "noumen";

const pricing: Record<string, ModelPricing> = {
  "my-custom-model": {
    inputPer1M: 3.0,
    outputPer1M: 15.0,
  },
};

const tracker = new CostTracker(pricing);
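
For example, a call that used 10,000 input tokens and 2,000 output tokens on my-custom-model would be estimated at 10,000 / 1,000,000 × $3.00 + 2,000 / 1,000,000 × $15.00 = $0.06.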

Stream events

Cost updates are emitted as cost_update stream events during agent runs:

// Assuming `thread` was started from the Agent configured above.
for await (const event of thread.run("...")) {
  if (event.type === "cost_update") {
    console.log(`Running cost: $${event.summary.totalCostUSD.toFixed(4)}`);
  }
}

Prompt caching

Prompt caching reduces cost and latency by enabling providers to cache and reuse the prefix of each request. When enabled, noumen sorts tool definitions into a deterministic order and injects cache_control breakpoints into messages so that the provider can serve the shared prefix from cache on subsequent requests.

const code = new Agent({
  provider,
  sandbox,
  options: {
    promptCaching: {
      enabled: true,
      ttl: "1h",        // cache TTL (optional)
      scope: "global",  // "global" or "org" (optional)
    },
  },
});

This is opt-in because it changes the request payload: tool definitions are reordered (built-in tools first, then MCP tools, each group sorted alphabetically) and cache_control markers are added to messages. These changes are invisible to the model, but they matter if you compare raw API payloads or rely on a specific tool ordering. Prompt caching only takes effect with providers that support it (Anthropic, and some OpenAI models).
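
If you need to predict the resulting payload, the ordering described above is easy to reproduce. A minimal sketch (a hypothetical helper, not the library's code; the source field is assumed for illustration):

interface ToolDef {
  name: string;
  source: "builtin" | "mcp"; // assumed field for this sketch
}

// Built-in tools first, then MCP tools, each group sorted alphabetically by name.
function orderTools(tools: ToolDef[]): ToolDef[] {
  return [...tools].sort((a, b) => {
    if (a.source !== b.source) return a.source === "builtin" ? -1 : 1;
    return a.name.localeCompare(b.name);
  });
}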

CacheControlConfig

interface CacheControlConfig {
  enabled: boolean;          // Enable prompt caching
  ttl?: "1h";                // Cache time-to-live hint for the provider
  scope?: "global" | "org";  // Cache sharing scope
}