Skip to content

Token Efficiency

Every tool response includes measurements of how many tokens it returned versus how many tokens a naive approach (reading entire files) would have consumed. This is not an estimate -- it is calculated for every call and tracked across sessions.

Per-response efficiency

Every response from the server includes a _meta.token_efficiency block:

"_meta": {
  "token_efficiency": {
    "returned": 244,
    "equivalent_file_read": 9817,
    "reduction_percent": 97.5,
    "method": "byte_estimate"
  }
}
Field Meaning
returned Tokens in this response
equivalent_file_read Tokens if you had read the full file(s) instead
reduction_percent How much was saved: (1 - returned/equivalent) * 100
method How tokens were counted (see below)

When read_symbol returns a 30-line function from a 500-line file, returned is the token count of those 30 lines and equivalent_file_read is the token count of the entire file. The difference is what your agent did not have to process.

Measurement methods

The method field tells you how tokens were counted:

Method When used Accuracy
tiktoken_cl100k Both sides can use tiktoken (Python) Exact
byte_estimate Tiktoken unavailable or cross-language Approximate (bytes / 4)

When the server and your agent both use tiktoken's cl100k_base encoding, the counts are exact. When that is not possible (e.g., the file content is only available as bytes), a byte-based estimate is used. The method is always reported so you know the precision of the numbers.

Session tracking

Token efficiency accumulates across all tool calls in a session. Use usage_stats to see the running totals:

usage_stats()
{
  "session": {
    "duration_seconds": 2366.2,
    "tool_calls": 35,
    "symbols_retrieved": 14,
    "tokens_returned": 28373,
    "tokens_avoided": 131090,
    "token_efficiency": {
      "total_returned": 25543,
      "total_equivalent": 104555,
      "reduction_percent": 75.6,
      "by_category": {
        "search": {
          "calls": 2,
          "returned": 17359,
          "equivalent": 40557
        },
        "retrieval": {
          "calls": 16,
          "returned": 8184,
          "equivalent": 63998
        },
        "analysis": {
          "calls": 0,
          "returned": 0,
          "equivalent": 0
        }
      }
    }
  }
}

The by_category breakdown shows efficiency by tool type:

Category Tools included
search find_code, find_text, find_docs, search_all_repos
retrieval read_symbol, read_doc_section, whats_in_file, understand_symbol
analysis what_breaks_if_i_change, who_calls_this, inheritance_chain
indexing index_project, reindex_file, index_multi_repo
meta usage_stats, open_dashboard, indexed_repos

Retrieval tools typically have the highest reduction percentages because they return small slices of large files. Search tools return compact result lists that replace what would otherwise be multiple file reads.

All-time tracking

Efficiency data persists across server restarts. The overall section of usage_stats shows lifetime totals:

"overall": {
  "repos_used": 2,
  "days_active": 14,
  "total_tool_calls": 1847,
  "total_tokens_returned": 482000,
  "total_tokens_avoided": 3210000,
  "total_symbols_retrieved": 920,
  "first_used": "2025-01-10",
  "last_used": "2025-01-24"
}

This gives you a long-term picture of how much the server is saving. If total_tokens_avoided is in the millions, the server is doing significant work to keep your agent's context window focused.

The efficiency ring

The dashboard (covered in the previous chapter) displays an efficiency ring -- a visual gauge showing the current session's reduction percentage. A 75% ring means three-quarters of the tokens your agent would have consumed were avoided.

The ring updates in real-time as tool calls are made. It is the quickest way to see whether the server is earning its keep during an active session.

Filtering by repo

To see efficiency for a specific repository:

usage_stats(repo="my-project")

This filters the session and overall statistics to only include tool calls that touched that repository. Useful for comparing how different projects benefit from indexing.

The input cost

Sylvan registers 65 tools with descriptions and parameter schemas. This adds approximately 9,200 tokens to your agent's context at session start. On a 200K context model that is 4.6%, on a 1M model under 1%. The cost is fixed and does not grow during the conversation.

See TRANSPARENCY.md in the repository root for the full breakdown, including what data sylvan does and does not access.

What the numbers mean in practice

  • 90%+ reduction is typical for read_symbol calls, where a single function is returned from a large file.
  • 70-85% reduction is typical for search calls, where a ranked result list replaces reading multiple files.
  • 50-70% reduction is typical for whats_in_file, where signatures replace full source.
  • Negative reduction can happen with find_text on small files, where the context lines plus metadata exceed the file size. This is rare and the absolute token count is small when it happens.

The overall session reduction percentage is the number that matters most. If it stays above 70%, the server is consistently returning focused results instead of dumping entire files into your agent's context.