Instrumentation & Integrations
How dexcost auto-instruments LLM providers and HTTP libraries, and how to record costs manually or via LangChain.
Automatic LLM capture
dexcost.init() calls CostTracker, which iterates ALL_SUPPORTED_INSTRUMENTS and uses wrapt to monkey-patch each supported SDK that is installed. The wrap happens once at process start. Call sites in your code keep calling the same functions unmodified, so the patch is transparent to them.
Provider matrix
| Provider | Package | Patched method |
|---|---|---|
| OpenAI | openai | openai.resources.chat.completions.Completions.create (sync + async) |
| Anthropic | anthropic | anthropic.resources.messages.Messages.create (sync + async) |
| LiteLLM | litellm | litellm.completion (sync) + litellm.acompletion (async) |
| Google Gemini | google-genai | google.genai.models.Models.generate_content (sync) |
| AWS Bedrock | boto3 / botocore | botocore.client.BaseClient._make_api_call — only bedrock-runtime InvokeModel operations are intercepted |
| Cohere | cohere | cohere.Client.chat (sync) + cohere.AsyncClient.chat (async) |
| MCP | mcp | mcp.ClientSession.call_tool |
Each provider module stores the original method before patching so uninstrument_* can restore it exactly.
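The store-and-restore pattern described above can be sketched in plain Python. This is an illustration only, not dexcost's implementation: it uses functools.wraps rather than wrapt, and FakeCompletions, instrument_fake, uninstrument_fake, and the events list are hypothetical names invented for the example.

```python
import functools
import time


class FakeCompletions:
    """Hypothetical stand-in for an SDK resource class."""

    def create(self, model, prompt):
        return {"model": model, "text": prompt.upper()}


events = []      # hypothetical sink for recorded call metadata
_originals = {}  # original methods, kept so they can be restored exactly


def instrument_fake():
    """Replace FakeCompletions.create with a recording wrapper."""
    if "create" in _originals:
        raise RuntimeError("already instrumented")
    original = FakeCompletions.create
    _originals["create"] = original

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        events.append({
            "model": result["model"],
            "latency_s": time.perf_counter() - start,
        })
        return result  # the caller sees the unmodified return value

    FakeCompletions.create = wrapper


def uninstrument_fake():
    """Restore the original method; a no-op if nothing is patched."""
    original = _originals.pop("create", None)
    if original is not None:
        FakeCompletions.create = original
```

Because the wrapper is installed on the class, existing call sites pick it up without any change, which is the sense in which the patch is transparent.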
Streaming support
All six LLM providers capture streamed responses. For a sync stream the wrapper returns an iterator that yields chunks unchanged and records the llm_call event — with model, token counts, and latency — once the stream is fully consumed. Providers that expose an async client (OpenAI, Anthropic, LiteLLM, Cohere) capture async streams the same way. For Gemini, the sync generate_content_stream method is patched alongside generate_content. For Cohere, chat_stream (sync and async) is patched alongside chat.
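A minimal sketch of the sync streaming behaviour described above: a generator passes chunks through untouched and reports totals only after the last chunk. The function name capture_stream, the on_complete callback, and the per-chunk "tokens" field are assumptions made for illustration and are not part of the dexcost API.

```python
import time


def capture_stream(chunks, on_complete):
    """Yield chunks unchanged; report totals once the stream is exhausted."""
    start = time.perf_counter()  # runs when iteration begins, not at creation
    output_tokens = 0
    for chunk in chunks:
        output_tokens += chunk.get("tokens", 0)
        yield chunk
    # Reached only after the caller has consumed every chunk.
    on_complete({
        "output_tokens": output_tokens,
        "latency_ms": (time.perf_counter() - start) * 1000,
    })
```

In this sketch a stream that is abandoned midway never reaches on_complete. The source does not say how dexcost treats partially consumed streams.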
Controlling which providers are instrumented
By default all installed providers are auto-patched. Pass an explicit list to auto_instrument to limit the scope:
import dexcost
# Only instrument OpenAI and Anthropic
dexcost.init(api_key="dx_live_...", auto_instrument=["openai", "anthropic"])
Pass an empty list to skip all auto-instrumentation entirely:
dexcost.init(api_key="dx_live_...", auto_instrument=[])
Valid names are: "openai", "anthropic", "litellm", "gemini", "bedrock", "cohere", "mcp".
Manual instrument / uninstrument
The module-level functions instrument_* and uninstrument_* let you add or remove a provider after init():
from dexcost import CostTracker, instrument_openai, uninstrument_openai
# A tracker created with no auto-instrumentation
tracker = CostTracker(auto_instrument=[])
# Add a provider later — the module-level helpers take the tracker
instrument_openai(tracker)
# uninstrument restores the original, un-patched method
uninstrument_openai()
# The tracker also instruments by name
tracker.instrument("openai") # raises RuntimeError if already instrumented
tracker.uninstrument("openai") # safe to call even if not currently instrumented
tracker.instrumented returns a frozenset[str] of names currently patched on that tracker.
HTTP & non-LLM cost capture
When track_http=True (the default), dexcost.init() calls track_http() from dexcost.adapters.http, which patches five HTTP libraries:
| Library | Patched method |
|---|---|
| requests | requests.Session.send |
| httpx | httpx.Client.send |
| aiohttp | aiohttp.ClientSession._request |
| botocore | botocore.httpsession.URLLib3Session.send |
| urllib3 | urllib3.HTTPConnectionPool.urlopen |
Each library is patched only if it is installed; missing ones are silently skipped. A thread-local guard prevents double-counting when requests or botocore (which both use urllib3 internally) make a call that would otherwise be captured by both layers.
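The double-counting guard can be pictured with threading.local. The sketch below is an assumption about the general technique, not dexcost's code: high_level_send and low_level_send are hypothetical stand-ins for a requests-level and a urllib3-level hook.

```python
import threading

_guard = threading.local()  # one "active" flag per thread
captured = []               # hypothetical record of captured calls


def low_level_send(url):
    """Stand-in for the lower (urllib3-like) layer."""
    if not getattr(_guard, "active", False):
        # Record only when no higher layer is already tracking this call.
        captured.append(("low", url))
    return 200


def high_level_send(url):
    """Stand-in for the higher (requests-like) layer, which delegates down."""
    if getattr(_guard, "active", False):
        return low_level_send(url)
    _guard.active = True
    try:
        captured.append(("high", url))
        return low_level_send(url)
    finally:
        _guard.active = False  # always release, even if the call raises
```

A call made through the higher layer is recorded once, while a direct call to the lower layer is still captured.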
Service catalog
The bundled service catalog maps 163 external service domains to pricing rules. When an HTTP call's hostname matches a catalog entry, an external_cost event is recorded automatically — no rate registration needed. Examples: Pinecone, Stripe, Twilio, SendGrid, Firecrawl, Exa Search.
To refresh the catalog from a remote URL at startup:
dexcost.init(
    api_key="dx_live_...",
    service_catalog_url="https://catalog.example.com/dexcost-catalog.json",
)
Domain rate registry
For services not in the built-in catalog, register a per-request rate directly via register_domain_rate. User-registered rates take precedence over the service catalog:
from dexcost.adapters.http import register_domain_rate
register_domain_rate("api.example.com", cost_usd="0.01", per="request")
After registration, every HTTP call whose hostname is api.example.com records an external_cost event with cost_usd="0.01".
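The precedence rule (user-registered rates first, then the catalog) amounts to a two-step hostname lookup. The following is a rough sketch under stated assumptions: the dictionaries, the helper names register_rate and cost_for, and the prices are all invented for illustration and do not reflect dexcost internals or real catalog data.

```python
from decimal import Decimal
from urllib.parse import urlsplit

# Hypothetical catalog entry with an invented price.
CATALOG = {"catalog.example.org": Decimal("0.002")}
USER_RATES = {}


def register_rate(hostname, cost_usd):
    """Store a per-request rate supplied by the user."""
    USER_RATES[hostname] = Decimal(cost_usd)


def cost_for(url):
    """Return the per-request cost for a URL, or None if nothing matches."""
    host = urlsplit(url).hostname
    if host in USER_RATES:  # user-registered rates take precedence
        return USER_RATES[host]
    return CATALOG.get(host)
```

Only the hostname takes part in the match here; the path and query string are ignored.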
Manual recording
Use the TrackedTask methods when auto-instrumentation does not cover your cost source or you need fine-grained control.
Tasks
Open a task with the dexcost.task() context manager and all costs recorded inside it are grouped together:
import dexcost
with dexcost.task(task_type="generate_report") as t:
    # auto-captured LLM calls land here automatically
    t.record_cost(service="pdf_renderer", cost_usd="0.002")
    t.record_cost(service="cloud_storage", cost_usd="0.0001", event_type="compute_cost")
task() supports both sync (with) and async (async with) usage. When the block exits cleanly the task status is "success"; an uncaught exception marks it "failed" but the events are still persisted.
Nesting is supported. Any dexcost.task() opened inside an active task is automatically linked via parent_task_id:
with dexcost.task(task_type="pipeline") as outer:
    with dexcost.task(task_type="step_one") as inner:
        ... # inner.task.parent_task_id == outer.task_id
Manual start/end (CostTracker.start_task) is available for architectures where context managers do not fit (e.g., Celery workers):
from dexcost import CostTracker
tracker = CostTracker()
t = tracker.start_task(task_type="celery_job", customer_id="acme")
try:
    t.record_cost(service="queue", cost_usd="0.0005")
    t.end(status="success")
except Exception:
    t.end(status="failed")
    raise
The caller must call t.end() exactly once; forgetting it emits a ResourceWarning when the object is garbage-collected.
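One way a library can warn about a task that was never ended is a finalizer that checks a status flag. The sketch below shows that general technique with a hypothetical ManualTask class. How dexcost actually detects an unended task is not stated in the source.

```python
import warnings


class ManualTask:
    """Hypothetical task object that warns if it is never ended."""

    def __init__(self, task_type):
        self.task_type = task_type
        self.status = None  # None means end() has not been called yet

    def end(self, status):
        self.status = status

    def __del__(self):
        # Runs when the object is garbage-collected.
        if self.status is None:
            warnings.warn(
                f"task {self.task_type!r} was never ended",
                ResourceWarning,
            )
```

Python ignores ResourceWarning by default, so a warning of this kind only becomes visible when warnings are enabled, for example with python -W default or under most test runners.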
record_cost
Record any non-LLM cost as a dollar amount:
t.record_cost(
    service="google_maps_api",
    cost_usd="0.005",
    event_type="external_cost", # or "compute_cost"
    cost_confidence="exact",
    details={"endpoint": "/geocode", "region": "us-east-1"},
)
cost_usd accepts a Decimal or a string, never a float.
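The reason for ruling out floats is binary rounding: small dollar amounts do not add up exactly. The snippet below demonstrates this with the standard decimal module. The to_cost helper is a hypothetical validator written for illustration, and rejecting floats with TypeError is an assumption, since the source does not say how dexcost reacts to a float.

```python
from decimal import Decimal

# Binary floats accumulate rounding error; Decimal built from strings does not.
float_total = 0.1 + 0.2
decimal_total = Decimal("0.1") + Decimal("0.2")


def to_cost(value):
    """Hypothetical validator: accept Decimal or str, reject float."""
    if isinstance(value, float):
        raise TypeError("pass cost_usd as Decimal or str, not float")
    return Decimal(value)
```

Here float_total differs from 0.3, while decimal_total equals Decimal("0.3") exactly.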
record_usage
Compute cost from the rate registry. Register the rate once, then call record_usage anywhere:
tracker.register_rate("maps.googleapis.com", per="request", cost_usd="0.005")
with tracker.task(task_type="route_calculation") as t:
    t.record_usage("maps.googleapis.com", units=3) # records 3 × $0.005 = $0.015
Raises ValueError if no rate is registered for the given service name.
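The arithmetic behind record_usage is units multiplied by the registered unit rate. A minimal sketch, assuming a plain dictionary as the registry: the names _rates, register_rate, and usage_cost are hypothetical and only mirror the behaviour described above.

```python
from decimal import Decimal

_rates = {}  # hypothetical registry: service name -> (unit, cost per unit)


def register_rate(service, per, cost_usd):
    _rates[service] = (per, Decimal(cost_usd))


def usage_cost(service, units):
    """Return units × unit rate, or raise if the service has no rate."""
    if service not in _rates:
        raise ValueError(f"no rate registered for {service!r}")
    _per, unit_cost = _rates[service]
    return unit_cost * units
```

With a rate of "0.005" per request, three units come to Decimal("0.015"), matching the comment in the example above.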
record_llm_call
Manually record an LLM call — useful when using a provider not yet in the auto-instrument list, or when testing:
t.record_llm_call(
    provider="openai",
    model="gpt-4o",
    input_tokens=800,
    output_tokens=150,
    # cost_usd omitted → auto-computed via PricingEngine
    cached_tokens=200,
    latency_ms=420,
    error_type="rate_limit", # marks event for retry detection
)
When cost_usd is omitted, the PricingEngine computes it from the bundled LiteLLM pricing data.
Retry tracking
Flag a retry explicitly with mark_retry. This creates a retry_marker event and contributes to the task's retry_cost_usd aggregate:
import openai
try:
    response = client.chat.completions.create(...)
except openai.RateLimitError:
    t.mark_retry(reason="rate_limit", cost_usd="0.001")
    raise
mark_not_retry removes the retry flag from the most recent retry event, or from a specific event by event_id:
t.mark_not_retry() # un-flag the most recent retry event
t.mark_not_retry(event_id=e) # un-flag a specific event
Heuristic retry detection is opt-in via enable_retry_heuristics=True in dexcost.init(). When enabled, record_llm_call inspects recent events in the same task and automatically sets is_retry=True if a prior call for the same model failed with a transient error (rate_limit, timeout, 5xx, server_error, connection_error) within the configured window.
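The heuristic described above can be approximated as a scan over recent events in the task. This is a sketch under assumptions: the event dictionaries, the looks_like_retry function, and the 60-second default window are invented for illustration. Only the list of transient error types comes from the text.

```python
TRANSIENT_ERRORS = {
    "rate_limit", "timeout", "5xx", "server_error", "connection_error",
}


def looks_like_retry(recent_events, model, now, window_s=60.0):
    """Return True if a prior call for `model` failed transiently in the window."""
    for event in recent_events:
        same_model = event["model"] == model
        transient = event.get("error_type") in TRANSIENT_ERRORS
        in_window = 0 <= now - event["timestamp"] <= window_s
        if same_model and transient and in_window:
            return True
    return False
```

A successful prior call, a different model, or a failure older than the window all leave the new call unflagged.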
Framework integrations
LangChain
DexcostCallbackHandler is a duck-typed LangChain callback handler — it matches BaseCallbackHandler's interface without inheriting from it, so langchain is not a required dependency:
import dexcost
from dexcost import CostTracker
from dexcost.integrations import DexcostCallbackHandler
from langchain_openai import ChatOpenAI
dexcost.init(api_key="dx_live_...")
dexcost.set_context(customer_id="acme-corp")
tracker = CostTracker()
handler = DexcostCallbackHandler(tracker)
llm = ChatOpenAI(model="gpt-4o", callbacks=[handler])
with dexcost.task(task_type="lc_chain") as t:
    result = llm.invoke("Summarise this text")
The handler implements three lifecycle methods:
| Method | When called | What it records |
|---|---|---|
| on_llm_start(serialized, prompts, *, run_id) | LLM generation starts | Stores start time and model name |
| on_llm_end(response, *, run_id) | LLM generation completes | Records llm_call event with tokens, cost, and latency |
| on_llm_error(error, *, run_id) | LLM call raises an exception | Records llm_call event with error_type and latency |
Token usage is read from response.llm_output["token_usage"] (prompt_tokens and completion_tokens). If no usage data is present, cost is recorded as Decimal("0") with cost_confidence="unknown".
The handler requires an active dexcost.task() context when on_llm_end fires. If no task is active, the event is skipped and a warning is logged.
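Duck typing here means the handler only needs methods with the right names and signatures. The sketch below is a hypothetical SketchCallbackHandler, not the real DexcostCallbackHandler. It inherits from nothing, reads token usage the way the text describes, and falls back to a zero cost with unknown confidence. The events list stands in for a tracker, and the fake response object is built with SimpleNamespace.

```python
import time
from decimal import Decimal
from types import SimpleNamespace


class SketchCallbackHandler:
    """Hypothetical duck-typed handler; no LangChain import is needed."""

    def __init__(self):
        self._started = {}  # run_id -> start time
        self.events = []    # stand-in for a tracker

    def on_llm_start(self, serialized, prompts, *, run_id, **kwargs):
        self._started[run_id] = time.perf_counter()

    def on_llm_end(self, response, *, run_id, **kwargs):
        started = self._started.pop(run_id, None)
        latency_ms = None
        if started is not None:
            latency_ms = (time.perf_counter() - started) * 1000
        usage = (response.llm_output or {}).get("token_usage")
        event = {"event_type": "llm_call", "latency_ms": latency_ms}
        if usage:
            event["input_tokens"] = usage["prompt_tokens"]
            event["output_tokens"] = usage["completion_tokens"]
        else:
            # No usage data: zero cost, flagged as unknown confidence.
            event["cost_usd"] = Decimal("0")
            event["cost_confidence"] = "unknown"
        self.events.append(event)


# Usage with a fake response that carries no usage data.
handler = SketchCallbackHandler()
handler.on_llm_start({}, ["hello"], run_id="run-1")
handler.on_llm_end(SimpleNamespace(llm_output=None), run_id="run-1")
```

Keying the start times by run_id lets the sketch pair each completion with its own start, even when several generations overlap.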