Instrumentation & Integrations
How dexcost captures LLM costs via wrapper clients, records non-LLM HTTP costs, and integrates with axum, tower, and actix-web.
Automatic LLM capture
Rust cannot monkey-patch running code, so dexcost uses explicit wrapper clients. After calling init(Config::default()), construct the wrapper once, passing a shared PricingEngine, and call its record_response method immediately after each LLM call with the token counts from the provider response.
Provider matrix
| Provider | Wrapper type | record_response arguments |
|---|---|---|
| OpenAI | TrackedOpenAI | model, input_tokens, output_tokens, cached_tokens?, latency_ms? |
| Anthropic | TrackedAnthropic | model, input_tokens, output_tokens, cache_creation_tokens?, cache_read_tokens?, latency_ms? |
| Google Gemini | TrackedGemini | model, prompt_token_count, candidates_token_count, cached_content_token_count?, latency_ms? |
All three are re-exported from the top-level dexcost crate: dexcost::TrackedOpenAI, dexcost::TrackedAnthropic, dexcost::TrackedGemini.
use std::sync::Arc;
use dexcost::{init, Config, TrackedOpenAI, pricing::engine::PricingEngine};
#[tokio::main]
async fn main() {
    init(Config::default()).unwrap();
    let pricing = Arc::new(PricingEngine::new());
    let openai_wrapper = TrackedOpenAI::new(pricing);
    // After your OpenAI SDK call, extract token counts and pass them:
    let event = openai_wrapper.record_response(
        "gpt-4o",
        1_000,     // input_tokens
        500,       // output_tokens
        None,      // cached_tokens
        Some(320), // latency_ms
    );
}

TrackedAnthropic separates cache_creation_tokens (written to the prompt cache at the higher creation rate) from cache_read_tokens (read from the cache at the discounted rate):
use std::sync::Arc;
use dexcost::{TrackedAnthropic, pricing::engine::PricingEngine};
let pricing = Arc::new(PricingEngine::new());
let anthropic_wrapper = TrackedAnthropic::new(pricing);
let event = anthropic_wrapper.record_response(
    "claude-3-5-sonnet-20241022",
    800,         // input_tokens
    300,         // output_tokens
    Some(100),   // cache_creation_tokens
    Some(50),    // cache_read_tokens
    Some(1_200), // latency_ms
);

TrackedGemini uses Gemini's field names (prompt_token_count, candidates_token_count, cached_content_token_count) and records event.provider = "google":
use std::sync::Arc;
use dexcost::{TrackedGemini, pricing::engine::PricingEngine};
let pricing = Arc::new(PricingEngine::new());
let gemini_wrapper = TrackedGemini::new(pricing);
let event = gemini_wrapper.record_response(
    "gemini-1.5-pro",
    2_000,     // prompt_token_count
    800,       // candidates_token_count
    Some(300), // cached_content_token_count
    Some(950), // latency_ms
);

Each wrapper also provides a record_response_buffered variant that creates an auto-task, aggregates costs, and writes the event to an EventBuffer in a single call. This is useful when you do not have an explicit TrackedTask in scope.
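To make the cache_creation_tokens / cache_read_tokens split described for TrackedAnthropic concrete, the following is a minimal, self-contained sketch of how such a call could be priced. TokenRates, example_rates, price_call, and every rate value are hypothetical placeholders, not dexcost's API or its bundled pricing data. The sketch also assumes that input_tokens excludes the cached tokens, and it uses integer micro-dollars to avoid floating-point drift.

```rust
/// Hypothetical rates in micro-dollars per one million tokens (placeholders, not real pricing).
struct TokenRates {
    input: u64,
    output: u64,
    cache_creation: u64,
    cache_read: u64,
}

/// Placeholder rate card used only for this illustration.
fn example_rates() -> TokenRates {
    TokenRates {
        input: 3_000_000,
        output: 15_000_000,
        cache_creation: 3_750_000, // cache writes billed above the base input rate
        cache_read: 300_000,       // cache reads billed at a discount
    }
}

/// Price one call in micro-dollars: each token class is billed at its own rate.
/// Integer division truncates any fraction of a micro-dollar.
fn price_call(
    rates: &TokenRates,
    input_tokens: u64,
    output_tokens: u64,
    cache_creation_tokens: u64,
    cache_read_tokens: u64,
) -> u64 {
    let weighted = input_tokens * rates.input
        + output_tokens * rates.output
        + cache_creation_tokens * rates.cache_creation
        + cache_read_tokens * rates.cache_read;
    weighted / 1_000_000
}
```

With the token counts from the TrackedAnthropic example (800, 300, 100, 50), these placeholder rates give 7,290 micro-dollars, or $0.00729.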
Map recorders
When the response is already deserialized into a serde_json::Value, use the map recorders in dexcost::clients::wrappers:
| Function | Provider |
|---|---|
| record_openai_response(buffer, pricing, task_id, response) | OpenAI (usage.prompt_tokens, usage.completion_tokens) |
| record_anthropic_response(buffer, pricing, task_id, response) | Anthropic (usage.input_tokens, usage.output_tokens) |
| record_gemini_response(buffer, pricing, task_id, response) | Google Gemini (usageMetadata.promptTokenCount, usageMetadata.candidatesTokenCount) |
| record_litellm_response(buffer, pricing, task_id, response) | LiteLLM (OpenAI shape, stamps provider = "litellm") |
| record_mcp_response(buffer, catalog, task_id, server_url, response_size_bytes?) | MCP server (matches URL against service catalog) |
| record_mcp_tool_call(buffer, catalog, rates?, task_id, tool_name, server?, latency_ms?, is_error) | MCP tool call (resolves cost via rate registry, then tool map, then catalog) |
use dexcost::clients::wrappers::{record_openai_response, record_litellm_response};
use dexcost::transport::buffer::EventBuffer;
use dexcost::pricing::engine::PricingEngine;
let mut buffer = EventBuffer::new().unwrap();
let pricing = PricingEngine::new();
let response = serde_json::json!({
    "model": "gpt-4o",
    "usage": { "prompt_tokens": 1000, "completion_tokens": 500 }
});
let event = record_openai_response(&mut buffer, &pricing, "task-123", &response)
    .await
    .unwrap();

Streaming
SSE streaming support is available behind the streaming Cargo feature. Enable it in Cargo.toml:
[dependencies]
dexcost = { version = "0.1", features = ["streaming"] }

The dexcost::clients::streaming module provides drain_openai_stream and drain_anthropic_stream. Each function consumes a pinned futures::Stream<Item = Result<Bytes, E>>, aggregates token usage from the SSE chunks, and returns a synthetic response serde_json::Value suitable for the map recorders:
use futures::StreamExt;
use dexcost::clients::streaming::drain_openai_stream;
use dexcost::clients::wrappers::record_openai_response;
// `byte_stream` is a `Pin<Box<dyn Stream<Item = Result<Bytes, _>>>>` from your HTTP client
let synth = drain_openai_stream(byte_stream).await.unwrap();
record_openai_response(&mut buffer, &pricing, "task-123", &synth)
    .await
    .unwrap();

For OpenAI, token usage is read from the final usage chunk (requires stream_options.include_usage = true in the request). For Anthropic, usage arrives in the message_start and message_delta events; drain_anthropic_stream captures both and returns final cumulative output token counts.
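The Anthropic case can be illustrated with a small, self-contained sketch of the aggregation step. UsageEvent, Usage, and aggregate_usage are hypothetical stand-ins written for this example, not the types drain_anthropic_stream uses, and the sketch starts from already-parsed events rather than raw SSE bytes. The point it shows is that message_delta counts are cumulative, so the latest value replaces the running total instead of being added to it.

```rust
/// Simplified stand-ins for the two usage-bearing Anthropic stream events.
enum UsageEvent {
    /// `message_start`: input tokens plus the initial output token count.
    MessageStart { input_tokens: u64, output_tokens: u64 },
    /// `message_delta`: cumulative output tokens generated so far.
    MessageDelta { output_tokens: u64 },
}

/// Final token usage for one streamed response.
#[derive(Debug, Default, PartialEq)]
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Fold the events of one stream into a final usage record.
fn aggregate_usage(events: &[UsageEvent]) -> Usage {
    let mut usage = Usage::default();
    for event in events {
        match event {
            UsageEvent::MessageStart { input_tokens, output_tokens } => {
                usage.input_tokens = *input_tokens;
                usage.output_tokens = *output_tokens;
            }
            // Cumulative count: overwrite the running value, do not sum.
            UsageEvent::MessageDelta { output_tokens } => {
                usage.output_tokens = *output_tokens;
            }
        }
    }
    usage
}
```

Summing the deltas instead of overwriting would overcount output tokens, which is why the sketch keeps only the last cumulative value.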
HTTP & non-LLM cost capture
DexcostMiddleware (reqwest-middleware)
Rust has no global HTTP hook. The reqwest-middleware Cargo feature adds DexcostMiddleware, which you attach manually to a single reqwest client using reqwest_middleware::ClientBuilder:
[dependencies]
dexcost = { version = "0.1", features = ["reqwest-middleware"] }
reqwest-middleware = "0.4"

use std::sync::Arc;
use tokio::sync::Mutex;
use reqwest_middleware::ClientBuilder;
use dexcost::adapters::reqwest_middleware::DexcostMiddleware;
use dexcost::pricing::engine::PricingEngine;
use dexcost::pricing::service_catalog::ServiceCatalog;
use dexcost::transport::buffer::EventBuffer;
let catalog = Arc::new(ServiceCatalog::new());
let pricing = Arc::new(PricingEngine::new());
let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));
let mw = DexcostMiddleware::new(
    catalog,
    pricing,
    buffer,
    Some("task-id-to-attribute".to_string()), // pass None for per-host auto-tasks
);
let client = ClientBuilder::new(reqwest::Client::new())
    .with(mw)
    .build();

After attaching, every outgoing request is inspected against the service catalog by hostname. The middleware checks three fallback layers in order:
- Catalog match: the URL matches a known service entry; cost is extracted from the JSON response body.
- LLM provider match: for openai.com and anthropic.com hosts, token usage is extracted and an llm_call event is recorded.
- Domain rate fallback: the hostname is registered in the domain rate registry; a fixed per-call cost is recorded.
Events are written to the durable EventBuffer so they are picked up by the background pusher.
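The three layers can be read as a first-match-wins lookup. The sketch below is a self-contained illustration of that control flow only: Resolution, is_host_of, and resolve are hypothetical names, the catalog is reduced to a list of hostnames, and the hostnames in any usage are made up. It follows the order of the list above and does not model how registered domain rates interact with catalog entries when both match.

```rust
use std::collections::HashMap;

/// Which layer handled the request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Resolution {
    Catalog,
    LlmProvider,
    /// Fixed per-call cost in micro-dollars.
    DomainRate(u64),
    Unmatched,
}

/// True when `host` is `domain` itself or one of its subdomains.
fn is_host_of(host: &str, domain: &str) -> bool {
    host == domain || host.ends_with(&format!(".{domain}"))
}

/// Check the layers in the listed order and stop at the first match.
fn resolve(host: &str, catalog_hosts: &[&str], domain_rates: &HashMap<String, u64>) -> Resolution {
    // Layer 1: known service entry.
    if catalog_hosts.iter().any(|entry| *entry == host) {
        return Resolution::Catalog;
    }
    // Layer 2: LLM provider hosts.
    if is_host_of(host, "openai.com") || is_host_of(host, "anthropic.com") {
        return Resolution::LlmProvider;
    }
    // Layer 3: fixed per-call rate registered for this hostname.
    match domain_rates.get(host) {
        Some(rate) => Resolution::DomainRate(*rate),
        None => Resolution::Unmatched,
    }
}
```

A host that matches none of the layers resolves to Unmatched here; what the real middleware does in that case is not specified above.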
Service catalog
The bundled service catalog maps hundreds of external-service domains to pricing rules. When DexcostMiddleware intercepts a request to a matching hostname, it automatically records an external_cost event without any additional configuration. Examples include Exa Search, Pinecone, Stripe, Twilio, SendGrid, and Firecrawl.
To extend the catalog with a remote URL at startup, set service_catalog_url in Config. The SDK fetches and merges the remote catalog in the background on init:
dexcost::init(Config {
    service_catalog_url: Some("https://catalog.example.com/dexcost-catalog.json".into()),
    ..Default::default()
}).unwrap();

Domain rate registry
For services not in the bundled catalog, register a per-request rate with register_domain_rate. Domain rates take precedence over catalog entries when both match:
use rust_decimal_macros::dec;
use dexcost::adapters::http::register_domain_rate;
register_domain_rate("api.example.com", dec!(0.005), "request");

After registration, any request to api.example.com intercepted by DexcostMiddleware (or a record_http_cost call) records an external_cost event with that cost. Use get_domain_rates() to inspect the registry and clear_domain_rates() to remove all registrations.
Manual recording
record_cost
Records a non-LLM cost event on a TrackedTask. Defaults to CostConfidence::Exact and PricingSource::Manual:
use rust_decimal_macros::dec;
use dexcost::core::models::EventType;
task.record_cost("google_maps", dec!(0.005), None, None).await?;
// With an explicit event type:
task.record_cost(
    "my_compute",
    dec!(0.01),
    None,
    Some(EventType::ComputeCost),
).await?;

Use record_cost_with to override CostConfidence, PricingSource, or pricing_version:
use rust_decimal_macros::dec;
use dexcost::core::tracker::RecordCostOptions;
use dexcost::core::models::{CostConfidence, PricingSource};
task.record_cost_with(
    "pinecone",
    dec!(0.004),
    None,
    RecordCostOptions {
        cost_confidence: Some(CostConfidence::Estimated),
        pricing_source: Some(PricingSource::ServiceCatalog),
        pricing_version: Some("v-abc".into()),
        details: None,
    },
).await?;

record_usage
Looks up the service in the RateRegistry, multiplies the registered rate by units, and records an external_cost event. Returns an error when no rate is registered for the service:
use rust_decimal_macros::dec;
// Register the rate once, typically at startup
{
    let reg = dexcost::rate_registry().unwrap();
    let mut reg = reg.lock().await;
    reg.register("maps.googleapis.com", "request", dec!(0.005));
}
// Per call:
task.record_usage("maps.googleapis.com", 3).await?; // records 3 × $0.005 = $0.015

record_llm_call
Manually records an llm_call event. When cost_usd is None or zero, the pricing engine computes cost from the bundled LiteLLM data:
use rust_decimal_macros::dec;
task.record_llm_call(
    "openai",
    "gpt-4o",
    1_000,     // input_tokens
    500,       // output_tokens
    None,      // cost_usd: auto-priced
    Some(200), // cached_tokens
    Some(420), // latency_ms
).await?;

Use record_llm_call_with to pass RecordLlmCallOptions for error_type, extra details, or cost confidence overrides:
use dexcost::core::tracker::RecordLlmCallOptions;
task.record_llm_call_with(
    "openai", "gpt-4o",
    1_000, 500, None, None, None,
    RecordLlmCallOptions {
        error_type: Some("rate_limit".into()),
        ..Default::default()
    },
).await?;

Tasks: start_task and nesting
start_task returns a TrackedTask. Pass TaskOptions to set attribution and call task.end(TaskStatus::Success) when done:
use dexcost::{start_task, flush, TaskOptions, TaskStatus};
let mut task = start_task("resolve_ticket", TaskOptions {
    customer_id: Some("acme-corp".into()),
    project_id: Some("support".into()),
    ..Default::default()
}).await?;
task.record_llm_call("openai", "gpt-4o", 1_000, 500, None, None, None)
    .await?;
task.end(TaskStatus::Success).await?;
flush().await?;

Nest tasks with scope. Any start_task call inside a scope future automatically sets parent_task_id to the parent task:
let mut parent = start_task("pipeline", TaskOptions::default()).await?;
parent.scope(async {
    // child.parent_task_id == parent.task_id
    let mut child = start_task("step_one", TaskOptions::default()).await?;
    child.end(TaskStatus::Success).await
}).await?;
parent.end(TaskStatus::Success).await?;

Retry tracking
Flag a retry explicitly with mark_retry. This creates a retry_marker event and increments task.retry_count and task.retry_cost_usd:
task.mark_retry("rate_limit", dec!(0.002)).await?;

mark_not_retry clears the retry flag. Pass None to target the most recent retry event, or Some(event_id) to target a specific one:
task.mark_not_retry(None).await?;             // clear most recent
task.mark_not_retry(Some("event-id")).await?; // clear specific event

Heuristic retry detection is opt-in via TaskOptions::heuristics. When HeuristicConfig is set, the RetryHeuristicEngine checks each record_llm_call against recent events in the same task and automatically sets is_retry = true when a prior call with the same model and a transient error type occurred within the sliding window:
use dexcost::{start_task, TaskOptions};
use dexcost::core::tracker::HeuristicConfig;
let mut task = start_task("resolve_ticket", TaskOptions {
    heuristics: Some(HeuristicConfig {
        window_seconds: 30.0, // default
        threshold: 0.8,       // default
    }),
    ..Default::default()
}).await?;

Framework integrations
All framework integrations create a dexcost task per request, record the request latency as a trace link, and end the task with TaskStatus::Success for responses below 500 or TaskStatus::Failed for 5xx responses.
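The status rule shared by the integrations can be written down in a few lines. This is a self-contained sketch: the local TaskStatus enum only mirrors the two variants named above and status_for is a hypothetical helper, not a function exported by dexcost.

```rust
/// Local stand-in for the two task outcomes named above.
#[derive(Debug, PartialEq)]
enum TaskStatus {
    Success,
    Failed,
}

/// Map an HTTP response status code to a task outcome:
/// anything below 500 counts as success, 5xx counts as failure.
fn status_for(http_status: u16) -> TaskStatus {
    if http_status >= 500 {
        TaskStatus::Failed
    } else {
        TaskStatus::Success
    }
}
```

Under this rule a 404 or 429 response still ends the task as Success, since client errors are below 500; only server errors mark the task as Failed.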
axum
The axum-middleware feature enables dexcost::middleware::axum::dexcost_middleware:
dexcost = { version = "0.1", features = ["axum-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use axum::{Router, middleware};
use dexcost::middleware::axum::dexcost_middleware;
use dexcost::transport::buffer::EventBuffer;
let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));
let app = Router::new()
    .layer(middleware::from_fn(move |req, next| {
        dexcost_middleware(req, next, buffer.clone(), None)
    }));

tower
The tower-middleware feature enables dexcost::middleware::tower::DexcostLayer, compatible with any tower::Service-based framework (Hyper, Tonic, Axum's lower layers):
dexcost = { version = "0.1", features = ["tower-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use tower::ServiceBuilder;
use dexcost::middleware::tower::DexcostLayer;
use dexcost::transport::buffer::EventBuffer;
let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));
let svc = ServiceBuilder::new()
    .layer(DexcostLayer::new(buffer.clone(), None))
    .service(my_inner_service);

actix-web
The actix-middleware feature enables dexcost::middleware::actix::DexcostMiddleware:
dexcost = { version = "0.1", features = ["actix-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use actix_web::{App, HttpServer};
use dexcost::middleware::actix::DexcostMiddleware;
use dexcost::transport::buffer::EventBuffer;
#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));
    HttpServer::new(move || {
        App::new().wrap(DexcostMiddleware::new(buffer.clone(), None))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}