dexcost Documentation

Capture serverless compute, GPU, and browser-session costs alongside LLM and API spend, attributed to the same tasks.

Beyond LLM and HTTP costs, the Python SDK can capture the compute your agent burns — serverless invocations, GPU seconds, and headless-browser time — and attribute it to the same tasks and customers as everything else. These costs land as compute_cost, gpu_cost, and gpu_utilization_signal events.

How it works

Compute and GPU capture is opt-in. Unlike LLM and HTTP instrumentation, it is not started by dexcost.init(). You enable it by wrapping the entry point you want to measure — a serverless handler with one of the wrap_* decorators, or a browser session with track_browser().

Two rules apply to every wrapper on this page:

It only records inside an active task. If there is no dexcost.task() (or auto-session) active when the wrapped code runs, the wrapper is a transparent pass-through and records nothing.
Pricing is deferred. At capture time the event is stored with a zero placeholder cost; the dollar amount is computed at task finalize from the bundled compute/GPU pricing catalogs. This keeps the hot path cheap and lets pricing data refresh without re-instrumenting.

Costs incurred by a handler that raises are still recorded — the event is persisted before the exception is re-raised, because the compute was consumed regardless.
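The two behaviors above (a zero placeholder cost at capture time, and an event that is stored even when the handler raises) can be sketched in a few lines. This is a minimal illustration, not dexcost's implementation: `events`, `timed`, `finalize`, and the flat `rate_per_second` are hypothetical names and assumptions made up for the example.

```python
import time
from decimal import Decimal
from functools import wraps

events = []  # hypothetical in-memory event store, for illustration only


def timed(fn):
    """Record one compute_cost event per call, even if fn raises."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs on success and on failure; any exception continues to
            # propagate to the caller after the event has been stored.
            events.append({
                "event_type": "compute_cost",
                "duration_s": time.perf_counter() - start,
                "cost_usd": Decimal("0"),  # placeholder, priced later
            })
    return wrapper


def finalize(pending, rate_per_second):
    """Fill in the dollar amount after capture (assumed flat per-second rate)."""
    for event in pending:
        event["cost_usd"] = Decimal(str(event["duration_s"])) * rate_per_second
```

Keeping the pricing step separate from the timing step is what lets the capture path stay cheap: the wrapper only measures and appends.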

Serverless compute

Wrap your function's handler with the decorator for its platform. Each wrapper times the invocation (wall-clock), reads peak memory from the Linux cgroup, and emits one compute_cost event per invocation. All five are importable directly from dexcost:

Platform	Decorator
AWS Lambda	`dexcost.wrap_lambda_handler`
Google Cloud Run	`dexcost.wrap_cloud_run_handler`
Google Cloud Functions	`dexcost.wrap_cloud_functions_handler`
Azure Functions	`dexcost.wrap_azure_functions_handler`
Vercel Functions	`dexcost.wrap_vercel_handler`

import dexcost

dexcost.init(api_key="dx_live_...")

@dexcost.wrap_lambda_handler
def handler(event, context):
    dexcost.set_context(customer_id="acme-corp")
    with dexcost.task(task_type="process_order"):
        # LLM/API costs captured here are grouped with the compute cost
        ...
    return {"ok": True}

The wrappers read platform environment variables to enrich the event — for example AWS Lambda uses AWS_LAMBDA_FUNCTION_MEMORY_SIZE, AWS_LAMBDA_INITIALIZATION_TYPE (cold vs. warm start), and AWS_REGION; Vercel uses VERCEL_REGION; Azure Functions uses REGION_NAME. These are populated automatically by the platform at runtime.
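To see which values the Lambda wrapper has available, the same variables can be read directly. The helper below is a hypothetical sketch (`lambda_environment` is not part of dexcost); it only shows the three variables named above and returns None for any that are unset, for example when running locally.

```python
import os


def lambda_environment(environ=None):
    """Collect the AWS Lambda variables named above; missing ones become None."""
    env = os.environ if environ is None else environ
    memory = env.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE")
    return {
        "memory_mb": int(memory) if memory is not None else None,
        "initialization_type": env.get("AWS_LAMBDA_INITIALIZATION_TYPE"),
        "region": env.get("AWS_REGION"),
    }
```

Passing a plain dict as `environ` makes it easy to check the result outside of Lambda.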

Cloud Run billing model

By default, dexcost prices Cloud Run on the request-based billing model and records the result with estimated confidence. If your account is billed on the instance-based model, switch the math with a billing override at init():

dexcost.init(
    api_key="dx_live_...",
    compute_billing_overrides={"cloud_run": "instance"},
)

{"cloud_run": "instance"} is the only override recognized in this release.

GPU

Three decorators capture GPU usage for serverless GPU platforms:

Platform	Decorator
Modal	`dexcost.wrap_modal_handler`
RunPod	`dexcost.wrap_runpod_handler`
Replicate	`dexcost.wrap_replicate_handler`

Each one snapshots GPU state with NVML before the handler runs, integrates per-process SM (streaming-multiprocessor) time over the invocation, and emits one gpu_cost event plus one gpu_utilization_signal event per GPU device touched. GPU work is billed per active GPU-second.

import dexcost

dexcost.init(api_key="dx_live_...")

@dexcost.wrap_modal_handler
def run_inference(payload):
    with dexcost.task(task_type="image_generation"):
        result = ...  # your GPU workload; assign its output to result
    return result

gpu_cost carries the priced GPU-seconds. gpu_utilization_signal is observability only — it records SM utilization, memory utilization, and peak VRAM per device, and always has cost_usd = 0, so it is never summed into cost totals.
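As a rough illustration of what "integrating SM time over the invocation" can mean, the sketch below turns a series of utilization samples into active GPU-seconds using the trapezoidal rule. The sampling format and the choice of trapezoidal integration are assumptions for this example; the SDK's actual sampling and integration method is not described on this page, and `active_gpu_seconds` is a hypothetical name.

```python
def active_gpu_seconds(samples):
    """Integrate SM utilization over time with the trapezoidal rule.

    samples: time-ordered list of (timestamp_seconds, sm_utilization),
    where sm_utilization is a fraction between 0.0 and 1.0.
    """
    total = 0.0
    for (t0, u0), (t1, u1) in zip(samples, samples[1:]):
        # Area of one trapezoid: interval length times mean utilization.
        total += (t1 - t0) * (u0 + u1) / 2
    return total
```

For example, two seconds at 50% utilization followed by a two-second ramp from 50% to 100% gives 1.0 + 1.5 = 2.5 active GPU-seconds, even though 4 seconds of wall-clock time elapsed.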

GPU requirements

GPU capture reads NVIDIA's management library (NVML) through the pynvml module, which is provided by the optional, separately installed nvidia-ml-py package:

pip install nvidia-ml-py

At runtime, GPU capture also requires a loaded NVIDIA driver and at least one visible GPU. On a host missing any of the three (pynvml, the driver, or a visible device), the GPU wrappers record nothing and do not raise. GPU capture is NVIDIA-only.
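To check ahead of time whether a host meets these requirements, you can probe NVML yourself. The helper below is a sketch, not part of dexcost (`nvml_gpu_count` is a hypothetical name); it uses the standard pynvml calls `nvmlInit`, `nvmlDeviceGetCount`, and `nvmlShutdown`, and returns 0 whenever any of the three requirements is missing.

```python
def nvml_gpu_count():
    """Return the number of visible NVIDIA GPUs, or 0 if NVML is unusable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return 0  # pynvml is not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return 0  # NVML library or driver is not available
    try:
        return pynvml.nvmlDeviceGetCount()
    except pynvml.NVMLError:
        return 0
    finally:
        pynvml.nvmlShutdown()
```

A result of 0 suggests the GPU wrappers would record nothing on that host.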

Browser sessions

For agents that drive a headless browser (for example Playwright), track_browser() records the wall-clock time a page is open as a compute_cost event. It is an async context manager:

from dexcost import CostTracker
from dexcost.adapters.browser import track_browser

tracker = CostTracker()

async def scrape(page):
    async with tracker.task(task_type="scrape_listing"):
        async with track_browser(page, rate_per_minute="0.02") as p:
            await p.goto("https://example.com")
            ...

track_browser() records only when a task is active in the current context. In async code, use tracker.task() (the CostTracker method, which supports both with and async with) rather than the module-level dexcost.task(), which is a sync-only context manager built with @contextmanager.

rate_per_minute is the per-minute USD rate (default 0.01) and accepts a Decimal or string. The recorded cost is elapsed_minutes × rate_per_minute, with service_name="playwright_browser". page is duck-typed — any object with a .url attribute works, so Playwright is not a required dependency. Like the other wrappers, it is a no-op if no task is active.
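The cost formula above is simple enough to check by hand. The sketch below reproduces elapsed_minutes × rate_per_minute with Decimal arithmetic; `browser_cost` is a hypothetical helper for illustration, and any rounding the SDK may apply to the stored value is not modeled here.

```python
from decimal import Decimal


def browser_cost(elapsed_seconds, rate_per_minute="0.01"):
    """elapsed_minutes x rate_per_minute, using exact decimal arithmetic."""
    elapsed_minutes = Decimal(str(elapsed_seconds)) / Decimal(60)
    # Decimal() accepts both a string and an existing Decimal.
    return elapsed_minutes * Decimal(rate_per_minute)
```

A page held open for 90 seconds at a rate of 0.02 USD per minute comes to 1.5 × 0.02 = 0.03 USD.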

Standalone Lambda cost estimator

If you want to estimate an AWS Lambda invocation's cost without instrumenting a live handler, lambda_cost() is a pure function that computes it from duration, memory, and region:

from dexcost.adapters.aws_lambda import lambda_cost, get_supported_regions

result = lambda_cost(duration_ms=1500, memory_mb=512, region="us-east-1")
print(result["cost_usd"])      # Decimal, total invocation cost
print(result["details"])       # region, duration_ms, memory_mb, gb_seconds, duration_cost_usd, request_cost_usd, rate_per_gb_second

get_supported_regions()        # list of region codes the bundled rate table covers

It performs no I/O and has no side effects (it reads a bundled rate table). It raises ValueError for an unknown region, a negative duration_ms, or a non-positive memory_mb. This estimator uses its own bundled Lambda rate table and is independent of the wrap_lambda_handler capture path.
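For intuition about the fields in details, the arithmetic behind a Lambda estimate can be sketched independently of the SDK. The function below is hypothetical (`estimate_lambda_cost` is not the dexcost function), and the two rates are assumptions taken from AWS's public x86 list prices for us-east-1 at the time of writing; the bundled rate table may use different values.

```python
from decimal import Decimal

# Assumed rates for illustration only; the SDK's bundled table may differ.
RATE_PER_GB_SECOND = Decimal("0.0000166667")
RATE_PER_REQUEST = Decimal("0.0000002")  # 0.20 USD per million requests


def estimate_lambda_cost(duration_ms, memory_mb):
    """Estimate one invocation: GB-seconds x rate, plus a per-request fee."""
    if duration_ms < 0:
        raise ValueError("duration_ms must not be negative")
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    gb_seconds = (Decimal(duration_ms) / 1000) * (Decimal(memory_mb) / 1024)
    duration_cost = gb_seconds * RATE_PER_GB_SECOND
    return {
        "gb_seconds": gb_seconds,
        "duration_cost_usd": duration_cost,
        "request_cost_usd": RATE_PER_REQUEST,
        "cost_usd": duration_cost + RATE_PER_REQUEST,
    }
```

With duration_ms=1500 and memory_mb=512, this works out to 1.5 s × 0.5 GB = 0.75 GB-seconds.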

Requirements & current limitations

Opt-in. Nothing on this page is enabled by init() alone — you must apply a wrapper and run it inside a dexcost.task().
Serverless + browser only, today. The serverless wrap_* handlers and track_browser() are the capture paths wired in this release. Pricing for always-on runtimes (EC2, Fargate, GCE, Azure VMs, Kubernetes pods) and their GPU equivalents exists in the engine, but automatic capture for those long-running runtimes is not yet enabled. The k8s_node_aware flag on init() is reserved for that work and currently has no effect.
Compute memory peak is Linux cgroup v2. Peak-memory readings use cgroup v2 (memory.peak needs Linux kernel ≥ 5.19). On other platforms the duration is still captured; memory degrades gracefully to unavailable.
GPU is NVIDIA + NVML. Requires nvidia-ml-py, a loaded NVIDIA driver, and at least one device.
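To check whether peak-memory readings will be available on a given host, you can try reading the cgroup v2 file yourself. This is a sketch under assumptions: `read_peak_memory_bytes` is a hypothetical helper, and the default path assumes the process's own cgroup is mounted at /sys/fs/cgroup, as is typical inside a container. The path the SDK actually reads is not stated on this page.

```python
from pathlib import Path


def read_peak_memory_bytes(path="/sys/fs/cgroup/memory.peak"):
    """Return peak memory in bytes from cgroup v2, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux, cgroup v1, kernel < 5.19) or unreadable.
        return None
```

A None result corresponds to the "memory unavailable" case described above, where duration is still captured.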

Next steps

Configuration — compute_billing_overrides, k8s_node_aware, and the rest of dexcost.init().
API Reference — event types and the full public API.
