Instrumentation & Integrations

How dexcost auto-instruments LLM providers and HTTP libraries, and how to record costs manually or via LangChain.

Automatic LLM capture

dexcost.init() creates a CostTracker, which iterates over ALL_SUPPORTED_INSTRUMENTS and uses wrapt to monkey-patch each SDK that is installed. Patching happens once, when init() runs. The patch is transparent: call sites in your code keep calling the same functions, unmodified.

Provider matrix

Provider	Package	Patched method
OpenAI	`openai`	`openai.resources.chat.completions.Completions.create` (sync + async)
Anthropic	`anthropic`	`anthropic.resources.messages.Messages.create` (sync + async)
LiteLLM	`litellm`	`litellm.completion` (sync) + `litellm.acompletion` (async)
Google Gemini	`google-genai`	`google.genai.models.Models.generate_content` (sync)
AWS Bedrock	`boto3` / `botocore`	`botocore.client.BaseClient._make_api_call` — only `bedrock-runtime` `InvokeModel` operations are intercepted
Cohere	`cohere`	`cohere.Client.chat` (sync) + `cohere.AsyncClient.chat` (async)
MCP	`mcp`	`mcp.ClientSession.call_tool`

Each provider module stores the original method before patching so uninstrument_* can restore it exactly.
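The save-then-restore pattern can be sketched in a few lines. This is an illustration only: it uses plain attribute assignment from the standard library rather than wrapt, and the Completions class, instrument(), uninstrument(), and calls names are hypothetical stand-ins, not dexcost internals.

```python
import functools


class Completions:
    """Stand-in for an SDK class; not a real provider client."""

    def create(self, prompt):
        return f"echo: {prompt}"


_original_create = None  # saved so the patch can be undone exactly
calls = []               # stand-in for the event sink


def instrument():
    """Replace Completions.create with a recording wrapper."""
    global _original_create
    if _original_create is not None:
        raise RuntimeError("already instrumented")
    original = Completions.create
    _original_create = original

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        calls.append(("create", args, kwargs))
        return result

    Completions.create = wrapper


def uninstrument():
    """Put the saved original back; a no-op if nothing is patched."""
    global _original_create
    if _original_create is not None:
        Completions.create = _original_create
        _original_create = None


unpatched = Completions.create
instrument()
reply = Completions().create("hi")           # call site is unchanged
patched_differs = Completions.create is not unpatched
uninstrument()
restored = Completions.create is unpatched   # the very same function object
```

Because the original function object itself is stored, restoring it leaves no trace of the wrapper on the class.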

Streaming support

All six LLM providers capture streamed responses. For a sync stream the wrapper returns an iterator that yields chunks unchanged and records the llm_call event — with model, token counts, and latency — once the stream is fully consumed. Providers that expose an async client (OpenAI, Anthropic, LiteLLM, Cohere) capture async streams the same way. For Gemini, the sync generate_content_stream method is patched alongside generate_content. For Cohere, chat_stream (sync and async) is patched alongside chat.
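A minimal sketch of the sync case, assuming chunks are plain dicts with a hypothetical "tokens" field (real SDK chunk types differ per provider). The capture_stream name and the event shape are illustrative, not the dexcost implementation.

```python
import time


def capture_stream(chunks, record):
    """Yield each chunk unchanged; call record() once after the last chunk."""
    started = time.monotonic()
    output_tokens = 0
    for chunk in chunks:
        output_tokens += chunk.get("tokens", 0)
        yield chunk
    latency_ms = (time.monotonic() - started) * 1000
    record({
        "event": "llm_call",
        "output_tokens": output_tokens,
        "latency_ms": latency_ms,
    })


events = []
fake_stream = iter([{"text": "Hel", "tokens": 1}, {"text": "lo", "tokens": 1}])
wrapped = capture_stream(fake_stream, events.append)

first = next(wrapped)
events_mid_stream = len(events)  # nothing has been recorded yet
rest = list(wrapped)             # draining the stream triggers the record
```

In this sketch a consumer that abandons the stream early never triggers the record step; how dexcost handles that case is not covered here.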

Controlling which providers are instrumented

By default all installed providers are auto-patched. Pass an explicit list to auto_instrument to limit the scope:

import dexcost

# Only instrument OpenAI and Anthropic
dexcost.init(api_key="dx_live_...", auto_instrument=["openai", "anthropic"])

Pass an empty list to skip all auto-instrumentation entirely:

dexcost.init(api_key="dx_live_...", auto_instrument=[])

Valid names are: "openai", "anthropic", "litellm", "gemini", "bedrock", "cohere", "mcp".

Manual instrument / uninstrument

The module-level functions instrument_* and uninstrument_* let you add or remove a provider after init():

from dexcost import CostTracker, instrument_openai, uninstrument_openai

# A tracker created with no auto-instrumentation
tracker = CostTracker(auto_instrument=[])

# Add a provider later — the module-level helpers take the tracker
instrument_openai(tracker)

# uninstrument restores the original, un-patched method
uninstrument_openai()

# The tracker also instruments by name
tracker.instrument("openai")   # raises RuntimeError if already instrumented
tracker.uninstrument("openai") # safe to call even if not currently instrumented

tracker.instrumented returns a frozenset[str] of names currently patched on that tracker.

HTTP & non-LLM cost capture

When track_http=True (the default), dexcost.init() calls track_http() from dexcost.adapters.http, which patches five HTTP libraries:

Library	Patched method
`requests`	`requests.Session.send`
`httpx`	`httpx.Client.send`
`aiohttp`	`aiohttp.ClientSession._request`
`botocore`	`botocore.httpsession.URLLib3Session.send`
`urllib3`	`urllib3.HTTPConnectionPool.urlopen`

Each library is patched only if it is installed; missing ones are silently skipped. A thread-local guard prevents double-counting when requests or botocore (which both use urllib3 internally) make a call that would otherwise be captured by both layers.

Service catalog

The bundled service catalog maps 163 external service domains to pricing rules. When an HTTP call's hostname matches a catalog entry, an external_cost event is recorded automatically — no rate registration needed. Examples: Pinecone, Stripe, Twilio, SendGrid, Firecrawl, Exa Search.

To refresh the catalog from a remote URL at startup:

dexcost.init(
    api_key="dx_live_...",
    service_catalog_url="https://catalog.example.com/dexcost-catalog.json",
)

Domain rate registry

For services not in the built-in catalog, register a per-request rate directly via register_domain_rate. User-registered rates take precedence over the service catalog:

from dexcost.adapters.http import register_domain_rate

register_domain_rate("api.example.com", cost_usd="0.01", per="request")

After registration, every HTTP call whose hostname is api.example.com records an external_cost event with cost_usd="0.01".
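The precedence rule (user-registered rate first, then the bundled catalog) can be sketched as a two-step lookup on the request hostname. Everything here is hypothetical: the catalog entries, their prices, and the register_rate_sketch and cost_for helpers are illustrative and do not reflect dexcost's data or internals.

```python
from decimal import Decimal
from urllib.parse import urlsplit

# Hypothetical catalog entries with made-up prices.
catalog = {
    "vectors.example.com": Decimal("0.0004"),
    "search.example.com": Decimal("0.001"),
}
user_rates = {}


def register_rate_sketch(domain, cost_usd):
    """Store a per-request rate; strings keep the decimal value exact."""
    user_rates[domain] = Decimal(cost_usd)


def cost_for(url):
    """Return the per-request cost for a URL, or None if the host is unknown."""
    host = urlsplit(url).hostname  # hostname is lowercased by urlsplit
    if host in user_rates:
        return user_rates[host]    # user-registered rates win
    return catalog.get(host)


register_rate_sketch("api.example.com", "0.01")
register_rate_sketch("vectors.example.com", "0.002")  # overrides the catalog
```

With these registrations, cost_for("https://vectors.example.com/query") resolves to the user rate, while search.example.com still falls back to the catalog.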

Manual recording

Use the TrackedTask methods when auto-instrumentation does not cover your cost source or you need fine-grained control.

Tasks

Open a task with the dexcost.task() context manager; all costs recorded inside the block are grouped under that task:

import dexcost

with dexcost.task(task_type="generate_report") as t:
    # auto-captured LLM calls land here automatically
    t.record_cost(service="pdf_renderer", cost_usd="0.002")
    t.record_cost(service="cloud_storage", cost_usd="0.0001", event_type="compute_cost")

task() supports both sync (with) and async (async with) usage. When the block exits cleanly the task status is "success"; an uncaught exception marks it "failed" but the events are still persisted.

Nesting is supported. Any dexcost.task() opened inside an active task is automatically linked via parent_task_id:

with dexcost.task(task_type="pipeline") as outer:
    with dexcost.task(task_type="step_one") as inner:
        ...  # inner.task.parent_task_id == outer.task_id

Manual start/end (CostTracker.start_task) is available for architectures where context managers do not fit (e.g., Celery workers):

from dexcost import CostTracker

tracker = CostTracker()
t = tracker.start_task(task_type="celery_job", customer_id="acme")
try:
    t.record_cost(service="queue", cost_usd="0.0005")
    t.end(status="success")
except Exception:
    t.end(status="failed")
    raise

The caller must call t.end() exactly once; forgetting it emits a ResourceWarning when the object is garbage-collected.
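A warning at garbage-collection time is typically raised from a finalizer. The sketch below shows one way that could look, assuming CPython's reference counting; ManualTask is a hypothetical class, not the dexcost TrackedTask.

```python
import warnings


class ManualTask:
    def __init__(self, task_type):
        self.task_type = task_type
        self.status = "running"
        self._ended = False

    def end(self, status):
        self.status = status
        self._ended = True

    def __del__(self):
        # Runs when the object is collected; warn if end() was never called.
        if not self._ended:
            warnings.warn(
                f"task {self.task_type!r} was never ended", ResourceWarning
            )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ResourceWarning is ignored by default
    leaked = ManualTask("celery_job")
    del leaked                       # CPython frees the object here

    finished = ManualTask("celery_job")
    finished.end(status="success")
    del finished                     # ended properly: no warning
```

Because Python ignores ResourceWarning by default, a warning like this is usually only visible when running with -W default or under a test runner that enables it.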

`record_cost`

Record any non-LLM cost as a dollar amount:

t.record_cost(
    service="google_maps_api",
    cost_usd="0.005",
    event_type="external_cost",   # or "compute_cost"
    cost_confidence="exact",
    details={"endpoint": "/geocode", "region": "us-east-1"},
)

cost_usd accepts a Decimal or a string — never a float.
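The reason for excluding floats is that binary floating point cannot represent most decimal fractions exactly, so small costs drift when summed. The snippet below demonstrates this with the standard library only; the to_cost helper is a hypothetical illustration of rejecting floats, not the dexcost validation code.

```python
from decimal import Decimal

# Floats accumulate representation error.
float_total = 0.1 + 0.2

# Decimals built from strings stay exact.
decimal_total = Decimal("0.1") + Decimal("0.2")

# Converting a float to Decimal preserves the error instead of fixing it.
from_float = Decimal(0.1)
from_string = Decimal("0.1")


def to_cost(value):
    """Accept Decimal or str; refuse float so no rounding error sneaks in."""
    if isinstance(value, float):
        raise TypeError("cost must be a Decimal or a string, not a float")
    return Decimal(value)


cost = to_cost("0.005")
```

Here float_total differs from 0.3, while decimal_total equals Decimal("0.3") exactly.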

`record_usage`

Compute cost from the rate registry. Register the rate once, then call record_usage anywhere:

# tracker is an existing CostTracker instance
tracker.register_rate("maps.googleapis.com", per="request", cost_usd="0.005")

with tracker.task(task_type="route_calculation") as t:
    t.record_usage("maps.googleapis.com", units=3)  # records 3 × $0.005 = $0.015

Raises ValueError if no rate is registered for the given service name.
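The arithmetic behind record_usage amounts to a registry lookup followed by a multiplication. A minimal sketch, assuming a plain dict as the registry; rates, register_rate, and usage_cost here are hypothetical module-level stand-ins for the tracker methods.

```python
from decimal import Decimal

rates = {}  # service name -> (unit label, cost per unit)


def register_rate(service, per, cost_usd):
    rates[service] = (per, Decimal(cost_usd))


def usage_cost(service, units):
    """Multiply the registered unit cost by the number of units used."""
    if service not in rates:
        raise ValueError(f"no rate registered for {service!r}")
    _, unit_cost = rates[service]
    return unit_cost * units


register_rate("maps.googleapis.com", per="request", cost_usd="0.005")
total = usage_cost("maps.googleapis.com", units=3)  # Decimal("0.015")
```

Looking up a service that was never registered raises ValueError, matching the behaviour described above.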

`record_llm_call`

Manually record an LLM call — useful when using a provider not yet in the auto-instrument list, or when testing:

t.record_llm_call(
    provider="openai",
    model="gpt-4o",
    input_tokens=800,
    output_tokens=150,
    # cost_usd omitted → auto-computed via PricingEngine
    cached_tokens=200,
    latency_ms=420,
    error_type="rate_limit",  # marks event for retry detection
)

When cost_usd is omitted, the PricingEngine computes it from the bundled LiteLLM pricing data.

Retry tracking

Flag a retry explicitly with mark_retry. This creates a retry_marker event and contributes to the task's retry_cost_usd aggregate:

import openai

# client is an existing OpenAI client; t is the active task
try:
    response = client.chat.completions.create(...)
except openai.RateLimitError:
    t.mark_retry(reason="rate_limit", cost_usd="0.001")
    raise

mark_not_retry removes the retry flag from the most recent retry event, or from a specific event by event_id:

t.mark_not_retry()            # un-flag the most recent retry event
t.mark_not_retry(event_id=e)  # un-flag a specific event

Heuristic retry detection is opt-in via enable_retry_heuristics=True in dexcost.init(). When enabled, record_llm_call inspects recent events in the same task and automatically sets is_retry=True if a prior call for the same model failed with a transient error (rate_limit, timeout, 5xx, server_error, connection_error) within the configured window.
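The heuristic described above can be sketched as a backwards scan over a task's recent events. This is an assumption-laden illustration: the LlmEvent dataclass, the flag_if_retry function, and the 30-second window are hypothetical, and the event history is assumed to be in time order.

```python
from dataclasses import dataclass
from typing import Optional

TRANSIENT_ERRORS = {
    "rate_limit", "timeout", "5xx", "server_error", "connection_error",
}


@dataclass
class LlmEvent:
    model: str
    timestamp: float  # seconds
    error_type: Optional[str] = None
    is_retry: bool = False


def flag_if_retry(history, new_event, window_seconds):
    """Flag new_event if a recent call to the same model failed transiently."""
    for prior in reversed(history):
        if new_event.timestamp - prior.timestamp > window_seconds:
            break  # everything earlier is outside the window too
        if prior.model == new_event.model and prior.error_type in TRANSIENT_ERRORS:
            new_event.is_retry = True
            break
    history.append(new_event)
    return new_event


history = []
failed = flag_if_retry(history, LlmEvent("gpt-4o", 0.0, error_type="rate_limit"), 30.0)
retry = flag_if_retry(history, LlmEvent("gpt-4o", 5.0), 30.0)
other = flag_if_retry(history, LlmEvent("other-model", 6.0), 30.0)
late = flag_if_retry(history, LlmEvent("gpt-4o", 100.0), 30.0)
```

Only the second call is flagged: the third uses a different model, and the fourth falls outside the window.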

Framework integrations

LangChain

DexcostCallbackHandler is a duck-typed LangChain callback handler — it matches BaseCallbackHandler's interface without inheriting from it, so langchain is not a required dependency:

import dexcost
from dexcost import CostTracker
from dexcost.integrations import DexcostCallbackHandler
from langchain_openai import ChatOpenAI

dexcost.init(api_key="dx_live_...")
dexcost.set_context(customer_id="acme-corp")

tracker = CostTracker()
handler = DexcostCallbackHandler(tracker)

llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])

with dexcost.task(task_type="lc_chain") as t:
    result = llm.invoke("Summarise this text")

The handler implements three lifecycle methods:

Method	When called	What it records
`on_llm_start(serialized, prompts, *, run_id)`	LLM generation starts	Stores start time and model name
`on_llm_end(response, *, run_id)`	LLM generation completes	Records `llm_call` event with tokens, cost, and latency
`on_llm_error(error, *, run_id)`	LLM call raises an exception	Records `llm_call` event with `error_type` and latency

Token usage is read from response.llm_output["token_usage"] (prompt_tokens and completion_tokens). If no usage data is present, cost is recorded as Decimal("0") with cost_confidence="unknown".

The handler requires an active dexcost.task() context when on_llm_end fires. If no task is active, the event is skipped and a warning is logged.
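Duck typing here means the handler only needs methods with the names and signatures LangChain calls. The sketch below shows that shape with no langchain import; SketchHandler, its event dict, and the usage_known field are hypothetical, and the response object is faked with SimpleNamespace.

```python
import time
import uuid
from types import SimpleNamespace


class SketchHandler:
    """Defines the callback method names; inherits from nothing."""

    def __init__(self):
        self.events = []
        self._started = {}  # run_id -> start time

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs):
        self._started[run_id] = time.monotonic()

    def on_llm_end(self, response, *, run_id, **kwargs):
        started = self._started.pop(run_id, time.monotonic())
        llm_output = getattr(response, "llm_output", None) or {}
        usage = llm_output.get("token_usage") or {}
        self.events.append({
            "input_tokens": usage.get("prompt_tokens", 0),
            "output_tokens": usage.get("completion_tokens", 0),
            "usage_known": bool(usage),  # False when no usage data is present
            "latency_ms": (time.monotonic() - started) * 1000,
        })


handler = SketchHandler()

run_id = uuid.uuid4()
handler.on_llm_start({"name": "fake-llm"}, ["Summarise this text"], run_id=run_id)
handler.on_llm_end(
    SimpleNamespace(
        llm_output={"token_usage": {"prompt_tokens": 800, "completion_tokens": 150}}
    ),
    run_id=run_id,
)

other_run = uuid.uuid4()
handler.on_llm_start({"name": "fake-llm"}, ["hi"], run_id=other_run)
handler.on_llm_end(SimpleNamespace(llm_output=None), run_id=other_run)
```

Keying start times by run_id is what lets concurrent generations be timed independently.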
