Instrumentation & Integrations

How dexcost captures LLM costs via wrapper clients, records non-LLM HTTP costs, and integrates with axum, tower, and actix-web.

Automatic LLM capture

Rust cannot monkey-patch running code, so dexcost uses explicit wrapper clients. After calling init(Config::default()), construct each wrapper once with a shared PricingEngine, then call its record_response method immediately after each LLM call, passing the token counts from the provider response.

Provider matrix

Provider	Wrapper type	`record_response` arguments
OpenAI	`TrackedOpenAI`	`model`, `input_tokens`, `output_tokens`, `cached_tokens?`, `latency_ms?`
Anthropic	`TrackedAnthropic`	`model`, `input_tokens`, `output_tokens`, `cache_creation_tokens?`, `cache_read_tokens?`, `latency_ms?`
Google Gemini	`TrackedGemini`	`model`, `prompt_token_count`, `candidates_token_count`, `cached_content_token_count?`, `latency_ms?`

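All three are re-exported from the top-level dexcost crate: dexcost::TrackedOpenAI, dexcost::TrackedAnthropic, dexcost::TrackedGemini. In the table above, a trailing ? marks an optional argument, passed as an Option (None when the value is unavailable).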

use std::sync::Arc;
use dexcost::{init, Config, TrackedOpenAI, pricing::engine::PricingEngine};

#[tokio::main]
async fn main() {
    init(Config::default()).unwrap();

    let pricing = Arc::new(PricingEngine::new());
    let openai_wrapper = TrackedOpenAI::new(pricing);

    // After your OpenAI SDK call, extract token counts and pass them:
    let event = openai_wrapper.record_response(
        "gpt-4o",
        1_000,       // input_tokens
        500,         // output_tokens
        None,        // cached_tokens
        Some(320),   // latency_ms
    );
}

TrackedAnthropic separates cache_creation_tokens (written to the prompt cache at the higher creation rate) from cache_read_tokens (read from the cache at the discounted rate):

use std::sync::Arc;
use dexcost::{TrackedAnthropic, pricing::engine::PricingEngine};

let pricing = Arc::new(PricingEngine::new());
let anthropic_wrapper = TrackedAnthropic::new(pricing);

let event = anthropic_wrapper.record_response(
    "claude-3-5-sonnet-20241022",
    800,         // input_tokens
    300,         // output_tokens
    Some(100),   // cache_creation_tokens
    Some(50),    // cache_read_tokens
    Some(1_200), // latency_ms
);
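
To make the distinction between the two cache token classes concrete, here is a minimal, std-only sketch of how a cache-aware price could be computed. It is not dexcost's pricing engine: the CacheAwareRates and AnthropicUsage types, the cost_micro_usd function, and the use of integer micro-USD are all assumptions made for illustration, and it assumes input_tokens does not already include the cached tokens.

```rust
/// Hypothetical per-model rates in micro-USD per one million tokens.
/// Placeholder values go here; this is not dexcost's bundled pricing data.
pub struct CacheAwareRates {
    pub input: u64,
    pub output: u64,
    pub cache_creation: u64,
    pub cache_read: u64,
}

/// Token counts in the same shape as the `record_response` arguments above.
pub struct AnthropicUsage {
    pub input_tokens: u64,
    pub output_tokens: u64,
    pub cache_creation_tokens: Option<u64>,
    pub cache_read_tokens: Option<u64>,
}

/// Returns the call cost in micro-USD, pricing each token class at its own rate.
/// Assumes the four token classes are disjoint.
pub fn cost_micro_usd(rates: &CacheAwareRates, usage: &AnthropicUsage) -> u64 {
    let weighted = usage.input_tokens * rates.input
        + usage.output_tokens * rates.output
        + usage.cache_creation_tokens.unwrap_or(0) * rates.cache_creation
        + usage.cache_read_tokens.unwrap_or(0) * rates.cache_read;
    // Rates are per million tokens; dividing once at the end limits rounding loss.
    weighted / 1_000_000
}
```

With illustrative rates of 3.00, 15.00, 3.75, and 0.30 USD per million tokens (input, output, cache creation, cache read), the token counts from the example above come to 7,290 micro-USD, or $0.00729.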

TrackedGemini uses Gemini's field names (prompt_token_count, candidates_token_count, cached_content_token_count) and records event.provider = "google":

use std::sync::Arc;
use dexcost::{TrackedGemini, pricing::engine::PricingEngine};

let pricing = Arc::new(PricingEngine::new());
let gemini_wrapper = TrackedGemini::new(pricing);

let event = gemini_wrapper.record_response(
    "gemini-1.5-pro",
    2_000,       // prompt_token_count
    800,         // candidates_token_count
    Some(300),   // cached_content_token_count
    Some(950),   // latency_ms
);

Each wrapper also provides a record_response_buffered variant that creates an auto-task, aggregates costs, and writes the event to an EventBuffer in a single call. It is useful when no explicit TrackedTask is in scope.

Map recorders

When the response is already deserialized into a serde_json::Value, use the map recorders in dexcost::clients::wrappers:

Function	Provider
`record_openai_response(buffer, pricing, task_id, response)`	OpenAI (`usage.prompt_tokens`, `usage.completion_tokens`)
`record_anthropic_response(buffer, pricing, task_id, response)`	Anthropic (`usage.input_tokens`, `usage.output_tokens`)
`record_gemini_response(buffer, pricing, task_id, response)`	Google Gemini (`usageMetadata.promptTokenCount`, `usageMetadata.candidatesTokenCount`)
`record_litellm_response(buffer, pricing, task_id, response)`	LiteLLM (OpenAI shape, stamps `provider = "litellm"`)
`record_mcp_response(buffer, catalog, task_id, server_url, response_size_bytes?)`	MCP server (matches URL against service catalog)
`record_mcp_tool_call(buffer, catalog, rates?, task_id, tool_name, server?, latency_ms?, is_error)`	MCP tool call (resolves cost via rate registry then tool map then catalog)

use dexcost::clients::wrappers::{record_openai_response, record_litellm_response};
use dexcost::transport::buffer::EventBuffer;
use dexcost::pricing::engine::PricingEngine;

let mut buffer = EventBuffer::new().unwrap();
let pricing = PricingEngine::new();
let response = serde_json::json!({
    "model": "gpt-4o",
    "usage": { "prompt_tokens": 1000, "completion_tokens": 500 }
});

let event = record_openai_response(&mut buffer, &pricing, "task-123", &response)
    .await
    .unwrap();

Streaming

SSE streaming support is available behind the streaming Cargo feature. Enable it in Cargo.toml:

[dependencies]
dexcost = { version = "0.1", features = ["streaming"] }

The dexcost::clients::streaming module provides drain_openai_stream and drain_anthropic_stream. Each function consumes a pinned futures::Stream<Item = Result<Bytes, E>>, aggregates token usage from the SSE chunks, and returns a synthetic response serde_json::Value suitable for the map recorders:

use futures::StreamExt;
use dexcost::clients::streaming::drain_openai_stream;
use dexcost::clients::wrappers::record_openai_response;

// `byte_stream` is a `Pin<Box<dyn Stream<Item = Result<Bytes, _>>>>` from your HTTP client.
// `buffer` and `pricing` are constructed as in the map recorder example above.
let synth = drain_openai_stream(byte_stream).await.unwrap();
record_openai_response(&mut buffer, &pricing, "task-123", &synth)
    .await
    .unwrap();

For OpenAI, token usage is read from the final usage chunk, which the API sends only when the request sets stream_options.include_usage = true. For Anthropic, usage arrives in the message_start and message_delta events; drain_anthropic_stream captures both and returns the final cumulative output token count.
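
The Anthropic case is easiest to see on already-parsed events. The sketch below is a simplified illustration, not the body of drain_anthropic_stream: the UsageEvent and Usage types and the fold_usage function are hypothetical, and SSE framing and JSON parsing are assumed to have happened earlier. The point is that output counts in message_delta are cumulative, so they overwrite rather than add.

```rust
/// Simplified, already-parsed view of the Anthropic SSE events that carry usage.
/// Hypothetical type for illustration; not part of dexcost.
pub enum UsageEvent {
    /// `message_start`: input tokens plus an initial output count.
    MessageStart { input_tokens: u64, output_tokens: u64 },
    /// `message_delta`: the cumulative output token count so far.
    MessageDelta { output_tokens: u64 },
    /// Any other event (content deltas, pings, and so on).
    Other,
}

#[derive(Debug, Default, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub output_tokens: u64,
}

/// Folds a sequence of events into the final usage for the whole message.
pub fn fold_usage(events: &[UsageEvent]) -> Usage {
    let mut usage = Usage::default();
    for event in events {
        match event {
            UsageEvent::MessageStart { input_tokens, output_tokens } => {
                usage.input_tokens = *input_tokens;
                usage.output_tokens = *output_tokens;
            }
            // The count is cumulative, so overwrite instead of adding.
            UsageEvent::MessageDelta { output_tokens } => {
                usage.output_tokens = *output_tokens;
            }
            UsageEvent::Other => {}
        }
    }
    usage
}
```

Summing the output counts instead would double-count tokens whenever more than one usage-bearing event arrives.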

HTTP & non-LLM cost capture

`DexcostMiddleware` (reqwest-middleware)

Rust has no global HTTP hook. The reqwest-middleware Cargo feature adds DexcostMiddleware, which you attach manually to a single reqwest client using reqwest_middleware::ClientBuilder:

[dependencies]
dexcost = { version = "0.1", features = ["reqwest-middleware"] }
reqwest-middleware = "0.4"

use std::sync::Arc;
use tokio::sync::Mutex;
use reqwest_middleware::ClientBuilder;
use dexcost::adapters::reqwest_middleware::DexcostMiddleware;
use dexcost::pricing::engine::PricingEngine;
use dexcost::pricing::service_catalog::ServiceCatalog;
use dexcost::transport::buffer::EventBuffer;

let catalog = Arc::new(ServiceCatalog::new());
let pricing = Arc::new(PricingEngine::new());
let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));

let mw = DexcostMiddleware::new(
    catalog,
    pricing,
    buffer,
    Some("task-id-to-attribute".to_string()), // pass None for per-host auto-tasks
);

let client = ClientBuilder::new(reqwest::Client::new())
    .with(mw)
    .build();

After attaching, every outgoing request is inspected against the service catalog by hostname. The middleware checks three fallback layers in order:

1. Catalog match: the URL matches a known service entry; cost is extracted from the JSON response body.
2. LLM provider match: for openai.com and anthropic.com hosts, token usage is extracted and an llm_call event is recorded.
3. Domain rate fallback: the hostname is registered in the domain rate registry; a fixed per-call cost is recorded.

Events are written to the durable EventBuffer so they are picked up by the background pusher.
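
The three layers listed above can be pictured as a first-match-wins lookup. The following std-only sketch is an assumption about the shape of that decision, not the middleware's actual code: CostSource, resolve, and is_host_or_subdomain are hypothetical names, the catalog is reduced to a set of hostnames, and rates are expressed in integer micro-USD.

```rust
use std::collections::{HashMap, HashSet};

/// Which layer claimed the request (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum CostSource {
    Catalog,
    LlmProvider,
    /// Fixed per-call cost in micro-USD.
    DomainRate(u64),
    Untracked,
}

/// True when `host` is `domain` itself or one of its subdomains.
fn is_host_or_subdomain(host: &str, domain: &str) -> bool {
    host == domain || host.ends_with(&format!(".{domain}"))
}

/// Checks the layers in the order listed above; the first match wins.
pub fn resolve(
    host: &str,
    catalog_hosts: &HashSet<&str>,
    domain_rates: &HashMap<&str, u64>,
) -> CostSource {
    if catalog_hosts.contains(host) {
        return CostSource::Catalog;
    }
    if is_host_or_subdomain(host, "openai.com") || is_host_or_subdomain(host, "anthropic.com") {
        return CostSource::LlmProvider;
    }
    if let Some(rate) = domain_rates.get(host) {
        return CostSource::DomainRate(*rate);
    }
    CostSource::Untracked
}
```

In this sketch a host that matches no layer is simply left untracked; what the real middleware does in that case is not stated here.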

Service catalog

The bundled service catalog maps hundreds of external-service domains to pricing rules. When DexcostMiddleware intercepts a request to a matching hostname, it automatically records an external_cost event without any additional configuration. Examples include Exa Search, Pinecone, Stripe, Twilio, SendGrid, and Firecrawl.

To extend the bundled catalog from a remote URL, set service_catalog_url in Config. On init, the SDK fetches the remote catalog in the background and merges it into the bundled one:

use dexcost::Config;

dexcost::init(Config {
    service_catalog_url: Some("https://catalog.example.com/dexcost-catalog.json".into()),
    ..Default::default()
}).unwrap();

Domain rate registry

For services not in the bundled catalog, register a per-request rate with register_domain_rate. Domain rates take precedence over catalog entries when both match:

use rust_decimal_macros::dec;
use dexcost::adapters::http::register_domain_rate;

register_domain_rate("api.example.com", dec!(0.005), "request");

After registration, any request to api.example.com intercepted by DexcostMiddleware (or a record_http_cost call) records an external_cost event with that cost. Use get_domain_rates() to inspect the registry and clear_domain_rates() to remove all registrations.

Manual recording

`record_cost`

Records a non-LLM cost event on a TrackedTask. Defaults to CostConfidence::Exact and PricingSource::Manual:

use rust_decimal_macros::dec;
use dexcost::core::models::EventType;

task.record_cost("google_maps", dec!(0.005), None, None).await?;

// With an explicit event type:
task.record_cost(
    "my_compute",
    dec!(0.01),
    None,
    Some(EventType::ComputeCost),
).await?;

Use record_cost_with to override CostConfidence, PricingSource, or pricing_version:

use dexcost::core::tracker::RecordCostOptions;
use dexcost::core::models::{CostConfidence, PricingSource};

task.record_cost_with(
    "pinecone",
    dec!(0.004),
    None,
    RecordCostOptions {
        cost_confidence: Some(CostConfidence::Estimated),
        pricing_source: Some(PricingSource::ServiceCatalog),
        pricing_version: Some("v-abc".into()),
        details: None,
    },
).await?;

`record_usage`

Looks up the service in the RateRegistry, multiplies the registered rate by units, and records an external_cost event. Returns an error when no rate is registered for the service:

// Register the rate once, typically at startup
{
    let reg = dexcost::rate_registry().unwrap();
    let mut reg = reg.lock().await;
    reg.register("maps.googleapis.com", "request", dec!(0.005));
}

// Per call:
task.record_usage("maps.googleapis.com", 3).await?; // records 3 × $0.005 = $0.015
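
The arithmetic behind record_usage is a lookup followed by a multiplication. Here is a minimal, std-only sketch of that behavior under stated assumptions: SketchRateRegistry and usage_cost are hypothetical names, not the dexcost RateRegistry, and costs are held as integer micro-USD rather than decimals.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a rate registry:
/// service name -> (unit label, micro-USD per unit).
#[derive(Default)]
pub struct SketchRateRegistry {
    rates: HashMap<String, (String, u64)>,
}

impl SketchRateRegistry {
    /// Registers or replaces the rate for `service`.
    pub fn register(&mut self, service: &str, unit: &str, micro_usd_per_unit: u64) {
        self.rates
            .insert(service.to_string(), (unit.to_string(), micro_usd_per_unit));
    }

    /// Returns rate × units, or an error when no rate is registered.
    pub fn usage_cost(&self, service: &str, units: u64) -> Result<u64, String> {
        let (_unit, rate) = self
            .rates
            .get(service)
            .ok_or_else(|| format!("no rate registered for {service}"))?;
        Ok(rate * units)
    }
}
```

Registering 5,000 micro-USD per request and asking for 3 units yields 15,000 micro-USD, matching the $0.015 in the example above.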

`record_llm_call`

Manually records an llm_call event. When cost_usd is None or zero, the pricing engine computes cost from the bundled LiteLLM data:

use rust_decimal_macros::dec;

task.record_llm_call(
    "openai",
    "gpt-4o",
    1_000,       // input_tokens
    500,         // output_tokens
    None,        // cost_usd — auto-priced
    Some(200),   // cached_tokens
    Some(420),   // latency_ms
).await?;

Use record_llm_call_with to pass RecordLlmCallOptions for error_type, extra details, or cost confidence overrides:

use dexcost::core::tracker::RecordLlmCallOptions;

task.record_llm_call_with(
    "openai", "gpt-4o",
    1_000, 500, None, None, None,
    RecordLlmCallOptions {
        error_type: Some("rate_limit".into()),
        ..Default::default()
    },
).await?;

Tasks — `start_task` and nesting

start_task returns a TrackedTask. Pass TaskOptions to set attribution, and call task.end(TaskStatus::Success) when the work is done:

use dexcost::{start_task, flush, TaskOptions, TaskStatus};

let mut task = start_task("resolve_ticket", TaskOptions {
    customer_id: Some("acme-corp".into()),
    project_id: Some("support".into()),
    ..Default::default()
}).await?;

task.record_llm_call("openai", "gpt-4o", 1_000, 500, None, None, None)
    .await?;

task.end(TaskStatus::Success).await?;
flush().await?;

Nest tasks with scope. Any start_task call made inside the scope future automatically sets its parent_task_id to the parent's task_id:

let mut parent = start_task("pipeline", TaskOptions::default()).await?;

parent.scope(async {
    // child.parent_task_id == parent.task_id
    let mut child = start_task("step_one", TaskOptions::default()).await?;
    child.end(TaskStatus::Success).await
}).await?;

parent.end(TaskStatus::Success).await?;
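
One way to picture how a scope can supply the parent implicitly is an ambient stack of active task ids. The sketch below is a synchronous, thread-local analogue written for illustration only; with_scope and current_parent are hypothetical names, and nothing here is claimed about how dexcost implements scope. An async implementation would need task-local rather than thread-local storage (for example tokio's task_local! macro), since several tasks can share one thread.

```rust
use std::cell::RefCell;

thread_local! {
    /// Stack of task ids whose scope is currently active on this thread.
    static ACTIVE_TASKS: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Runs `body` with `task_id` as the ambient parent, then restores the previous state.
/// Simplification: the id is not popped if `body` panics.
pub fn with_scope<T>(task_id: &str, body: impl FnOnce() -> T) -> T {
    ACTIVE_TASKS.with(|stack| stack.borrow_mut().push(task_id.to_string()));
    let result = body();
    ACTIVE_TASKS.with(|stack| {
        stack.borrow_mut().pop();
    });
    result
}

/// Returns the id a newly started task would record as its parent, if any.
pub fn current_parent() -> Option<String> {
    ACTIVE_TASKS.with(|stack| stack.borrow().last().cloned())
}
```

Inside with_scope("pipeline", ...), current_parent() returns Some("pipeline"); outside any scope it returns None, which corresponds to a root task.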

Retry tracking

Flag a retry explicitly with mark_retry. This creates a retry_marker event and increments task.retry_count and task.retry_cost_usd:

task.mark_retry("rate_limit", dec!(0.002)).await?;

mark_not_retry clears the retry flag. Pass None to target the most recent retry event, or Some(event_id) to target a specific one:

task.mark_not_retry(None).await?;            // clear most recent
task.mark_not_retry(Some("event-id")).await?; // clear specific event
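
The targeting rule for mark_not_retry (most recent marker for None, a specific one for Some) can be sketched with a small ledger. This is an illustration under assumptions, not dexcost's bookkeeping: RetryMarker and RetryLedger are hypothetical, mark_retry here takes an event id instead of a reason, and the sketch assumes that clearing a flag also removes that marker from the retry count and retry cost totals, which the text above does not state.

```rust
/// One retry marker (hypothetical type for this sketch).
pub struct RetryMarker {
    pub event_id: String,
    pub cost_micro_usd: u64,
    pub is_retry: bool,
}

#[derive(Default)]
pub struct RetryLedger {
    markers: Vec<RetryMarker>,
}

impl RetryLedger {
    /// Appends a marker flagged as a retry.
    pub fn mark_retry(&mut self, event_id: &str, cost_micro_usd: u64) {
        self.markers.push(RetryMarker {
            event_id: event_id.to_string(),
            cost_micro_usd,
            is_retry: true,
        });
    }

    /// Clears the most recent flagged marker (`None`) or the flagged marker
    /// with the given id. Returns true if a marker was found and cleared.
    pub fn mark_not_retry(&mut self, event_id: Option<&str>) -> bool {
        let target = self
            .markers
            .iter_mut()
            .rev()
            .find(|m| m.is_retry && event_id.map_or(true, |id| m.event_id == id));
        match target {
            Some(marker) => {
                marker.is_retry = false;
                true
            }
            None => false,
        }
    }

    /// Number of markers still flagged as retries.
    pub fn retry_count(&self) -> usize {
        self.markers.iter().filter(|m| m.is_retry).count()
    }

    /// Total cost of markers still flagged as retries.
    pub fn retry_cost_micro_usd(&self) -> u64 {
        self.markers
            .iter()
            .filter(|m| m.is_retry)
            .map(|m| m.cost_micro_usd)
            .sum()
    }
}
```

Deriving the totals from the flagged markers, rather than storing running counters, keeps them consistent after a flag is cleared.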

Heuristic retry detection is opt-in via TaskOptions::heuristics. When HeuristicConfig is set, the RetryHeuristicEngine checks each record_llm_call against recent events in the same task and automatically sets is_retry = true when a prior call with the same model and a transient error type occurred within the sliding window:

use dexcost::{start_task, TaskOptions};
use dexcost::core::tracker::HeuristicConfig;

let mut task = start_task("resolve_ticket", TaskOptions {
    heuristics: Some(HeuristicConfig {
        window_seconds: 30.0,  // default
        threshold: 0.8,        // default
    }),
    ..Default::default()
}).await?;
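
The sliding-window rule described above can be sketched as a single predicate over recent calls. This is a simplified illustration, not the RetryHeuristicEngine: RecentCall, TRANSIENT_ERRORS, and looks_like_retry are hypothetical, the list of transient error types is an assumption, and the threshold setting is left out because its meaning is not defined in this section.

```rust
/// A prior LLM call in the same task (hypothetical type for this sketch).
pub struct RecentCall<'a> {
    pub model: &'a str,
    pub error_type: Option<&'a str>,
    /// Time of the call, in seconds on any monotonic scale.
    pub at_seconds: f64,
}

/// Error types treated as transient here; this list is an assumption.
const TRANSIENT_ERRORS: [&str; 3] = ["rate_limit", "timeout", "server_error"];

/// True when a prior call with the same model failed with a transient
/// error no more than `window_seconds` before `now_seconds`.
pub fn looks_like_retry(
    recent: &[RecentCall<'_>],
    model: &str,
    now_seconds: f64,
    window_seconds: f64,
) -> bool {
    recent.iter().any(|call| {
        let age = now_seconds - call.at_seconds;
        call.model == model
            && (0.0..=window_seconds).contains(&age)
            && call.error_type.map_or(false, |e| TRANSIENT_ERRORS.contains(&e))
    })
}
```

With a 30-second window, a call made 20 seconds after a rate_limit failure on the same model is flagged, while the same call made 40 seconds later is not.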

Framework integrations

All framework integrations create a dexcost task per request, record the request latency as a trace link, and end the task with TaskStatus::Success for status codes below 500 or TaskStatus::Failed for 5xx status codes.
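
The status mapping described above is small enough to state as code. The sketch uses a hypothetical SketchTaskStatus enum and status_for function rather than the dexcost types, so it stays self-contained.

```rust
/// Stand-in for the task status (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum SketchTaskStatus {
    Success,
    Failed,
}

/// Maps an HTTP response status code to a task status:
/// anything below 500 is a success, 500 and above is a failure.
pub fn status_for(http_status: u16) -> SketchTaskStatus {
    if http_status >= 500 {
        SketchTaskStatus::Failed
    } else {
        SketchTaskStatus::Success
    }
}
```

Note that under this rule a 4xx client error still ends the task as a success; only server-side failures mark it as failed.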

axum

The axum-middleware feature enables dexcost::middleware::axum::dexcost_middleware:

[dependencies]
dexcost = { version = "0.1", features = ["axum-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use axum::{Router, middleware};
use dexcost::middleware::axum::dexcost_middleware;
use dexcost::transport::buffer::EventBuffer;

let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));

let app = Router::new()
    .layer(middleware::from_fn(move |req, next| {
        dexcost_middleware(req, next, buffer.clone(), None)
    }));

tower

The tower-middleware feature enables dexcost::middleware::tower::DexcostLayer, which is compatible with any tower::Service-based framework (hyper, tonic, and axum's lower layers):

[dependencies]
dexcost = { version = "0.1", features = ["tower-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use tower::ServiceBuilder;
use dexcost::middleware::tower::DexcostLayer;
use dexcost::transport::buffer::EventBuffer;

let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));

let svc = ServiceBuilder::new()
    .layer(DexcostLayer::new(buffer.clone(), None))
    .service(my_inner_service);

actix-web

The actix-middleware feature enables dexcost::middleware::actix::DexcostMiddleware:

[dependencies]
dexcost = { version = "0.1", features = ["actix-middleware"] }

use std::sync::Arc;
use tokio::sync::Mutex;
use actix_web::{App, HttpServer};
use dexcost::middleware::actix::DexcostMiddleware;
use dexcost::transport::buffer::EventBuffer;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let buffer = Arc::new(Mutex::new(EventBuffer::new().unwrap()));
    HttpServer::new(move || {
        App::new().wrap(DexcostMiddleware::new(buffer.clone(), None))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}
