Compute & GPU
Capture serverless compute, GPU, and browser-session costs alongside LLM and API spend, attributed to the same tasks.
Beyond LLM and HTTP costs, the Python SDK can capture the compute your agent burns — serverless invocations, GPU seconds, and headless-browser time — and attribute it to the same tasks and customers as everything else. These costs land as compute_cost, gpu_cost, and gpu_utilization_signal events.
How it works
Compute and GPU capture is opt-in. Unlike LLM and HTTP instrumentation, it is not started by dexcost.init(). You enable it by wrapping the entry point you want to measure — a serverless handler with one of the wrap_* decorators, or a browser session with track_browser().
Two rules apply to every wrapper on this page:
- It only records inside an active task. If there is no dexcost.task() (or auto-session) active when the wrapped code runs, the wrapper is a transparent pass-through and records nothing.
- Pricing is deferred. At capture time the event is stored with a zero placeholder cost; the dollar amount is computed at task finalize from the bundled compute/GPU pricing catalogs. This keeps the hot path cheap and lets pricing data refresh without re-instrumenting.
Costs incurred by a handler that raises are still recorded — the event is persisted before the exception is re-raised, because the compute was consumed regardless.
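The pass-through rule, the placeholder cost, and the record-before-re-raise behavior can be sketched in plain Python. This is a minimal illustration under stated assumptions, not dexcost's implementation: `_active_task`, `EVENTS`, `fake_task`, and `wrap_handler` are hypothetical names, an in-memory list stands in for event storage, and checking for the task at wrapper entry is a simplification.

```python
import contextvars
import functools
import time
from contextlib import contextmanager
from decimal import Decimal

# Hypothetical stand-ins for the SDK's task context and event store.
_active_task = contextvars.ContextVar("active_task", default=None)
EVENTS = []


@contextmanager
def fake_task(task_type):
    """Mark a task as active for the duration of the block."""
    token = _active_task.set(task_type)
    try:
        yield
    finally:
        _active_task.reset(token)


def wrap_handler(fn):
    """Time a call and record one event, but only inside an active task."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        task = _active_task.get()
        if task is None:
            # No active task: behave as a transparent pass-through.
            return fn(*args, **kwargs)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs on success and on failure, so the event is stored
            # before any exception propagates to the caller.
            EVENTS.append({
                "task": task,
                "duration_s": time.perf_counter() - start,
                # Placeholder cost; pricing would happen at task finalize.
                "cost_usd": Decimal("0"),
            })
    return wrapper


@wrap_handler
def failing_handler():
    raise RuntimeError("boom")
```

Calling `failing_handler()` outside `fake_task(...)` records nothing. Calling it inside one appends a single zero-cost event and then lets the `RuntimeError` propagate.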
Serverless compute
Wrap your function's handler with the decorator for its platform. Each wrapper times the invocation (wall-clock), reads peak memory from the Linux cgroup, and emits one compute_cost event per invocation. All five are importable directly from dexcost:
| Platform | Decorator |
|---|---|
| AWS Lambda | dexcost.wrap_lambda_handler |
| Google Cloud Run | dexcost.wrap_cloud_run_handler |
| Google Cloud Functions | dexcost.wrap_cloud_functions_handler |
| Azure Functions | dexcost.wrap_azure_functions_handler |
| Vercel Functions | dexcost.wrap_vercel_handler |

    import dexcost

    dexcost.init(api_key="dx_live_...")

    @dexcost.wrap_lambda_handler
    def handler(event, context):
        dexcost.set_context(customer_id="acme-corp")
        with dexcost.task(task_type="process_order"):
            # LLM/API costs captured here are grouped with the compute cost
            ...
        return {"ok": True}

The wrappers read platform environment variables to enrich the event. For example, AWS Lambda uses AWS_LAMBDA_FUNCTION_MEMORY_SIZE, AWS_LAMBDA_INITIALIZATION_TYPE (cold vs. warm start), and AWS_REGION; Vercel uses VERCEL_REGION; Azure Functions uses REGION_NAME. The platform populates these automatically at runtime.
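To make the enrichment step concrete, here is a minimal sketch of reading the three Lambda variables named above. The function name `lambda_environment` and the returned key names are hypothetical; how the SDK actually stores these values is not described here.

```python
import os


def lambda_environment(environ=None):
    """Collect the Lambda runtime variables, tolerating their absence."""
    env = os.environ if environ is None else environ
    memory = env.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE")
    return {
        # Configured memory in MB, or None when not running on Lambda.
        "memory_mb": int(memory) if memory is not None else None,
        "initialization_type": env.get("AWS_LAMBDA_INITIALIZATION_TYPE"),
        "region": env.get("AWS_REGION"),
    }
```

Outside Lambda, every value comes back as `None`, which mirrors the idea that enrichment is best-effort rather than required.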
Cloud Run billing model
Cloud Run is billed as request-based by default (recorded with estimated confidence). If your account is billed on the instance-based model, switch the math with a billing override at init():

    dexcost.init(
        api_key="dx_live_...",
        compute_billing_overrides={"cloud_run": "instance"},
    )

{"cloud_run": "instance"} is the only override recognized in this release.
GPU
Three decorators capture GPU usage for serverless GPU platforms:
| Platform | Decorator |
|---|---|
| Modal | dexcost.wrap_modal_handler |
| RunPod | dexcost.wrap_runpod_handler |
| Replicate | dexcost.wrap_replicate_handler |
Each one snapshots GPU state with NVML before the handler runs, integrates per-process SM (streaming-multiprocessor) time over the invocation, and emits one gpu_cost event plus one gpu_utilization_signal event per GPU device touched. GPU work is billed per active GPU-second.

    import dexcost

    dexcost.init(api_key="dx_live_...")

    @dexcost.wrap_modal_handler
    def run_inference(payload):
        with dexcost.task(task_type="image_generation"):
            result = ...  # your GPU workload
        return result

gpu_cost carries the priced GPU-seconds. gpu_utilization_signal is observability only: it records SM utilization, memory utilization, and peak VRAM per device, and always has cost_usd = 0, so it is never summed into cost totals.
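The idea of integrating SM activity into active GPU-seconds can be illustrated with a small calculation. This is a sketch under assumptions, not the SDK's sampling logic: it assumes utilization samples arrive as (timestamp, percent) pairs and uses the trapezoidal rule, and `active_gpu_seconds` is a hypothetical name.

```python
def active_gpu_seconds(samples):
    """Integrate SM utilization over time with the trapezoidal rule.

    `samples` is a list of (timestamp_seconds, sm_utilization_percent)
    pairs, sorted by timestamp.
    """
    total = 0.0
    for (t0, u0), (t1, u1) in zip(samples, samples[1:]):
        # Average utilization across the interval, scaled from percent.
        total += (t1 - t0) * (u0 + u1) / 2 / 100
    return total
```

For example, a device that ramps from idle to 100% over one second, stays there for two seconds, and ramps down over one more second yields 3.0 active GPU-seconds across a 4-second invocation.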
GPU requirements
GPU capture reads NVIDIA's management library through pynvml, which is an optional, separately installed dependency:

    pip install nvidia-ml-py

It then requires, at runtime, a loaded NVIDIA driver and at least one visible GPU. On any host missing any of the three (pynvml, a driver, or a visible device), the GPU wrappers record nothing and do not raise. GPU capture is NVIDIA-only.
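If you want to check ahead of time whether a host meets these three conditions, a probe along the following lines may help. It is a sketch, not part of dexcost: `gpu_capture_available` is a hypothetical helper, and it relies only on the standard pynvml calls `nvmlInit`, `nvmlDeviceGetCount`, and `nvmlShutdown`.

```python
def gpu_capture_available():
    """Return True only if pynvml imports, NVML initializes, and a GPU is visible."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return False
    try:
        pynvml.nvmlInit()  # fails when no NVIDIA driver is loaded
    except pynvml.NVMLError:
        return False
    try:
        return pynvml.nvmlDeviceGetCount() > 0
    except pynvml.NVMLError:
        return False
    finally:
        # Release NVML once initialization has succeeded.
        pynvml.nvmlShutdown()
```

Like the wrappers described above, the probe returns a plain answer instead of raising, so it is safe to call on CPU-only hosts.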
Browser sessions
For agents that drive a headless browser (for example Playwright), track_browser() records the wall-clock time a page is open as a compute_cost event. It is an async context manager:

    import dexcost
    from dexcost.adapters.browser import track_browser

    async with dexcost.task(task_type="scrape_listing"):
        async with track_browser(page, rate_per_minute="0.02") as p:
            await p.goto("https://example.com")
            ...

rate_per_minute is the per-minute USD rate (default 0.01) and accepts a Decimal or string. The recorded cost is elapsed_minutes × rate_per_minute, with service_name="playwright_browser". page is duck-typed: any object with a .url attribute works, so Playwright is not a required dependency. Like the other wrappers, it is a no-op if no task is active.
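The cost formula is simple enough to check by hand. The sketch below reproduces elapsed_minutes × rate_per_minute with `Decimal` arithmetic; `browser_cost` is a hypothetical helper for illustration, and the SDK's rounding behavior, if any, is not covered here.

```python
from decimal import Decimal


def browser_cost(elapsed_seconds, rate_per_minute="0.01"):
    """Return elapsed_minutes * rate_per_minute as a Decimal (USD)."""
    # Convert via str so float inputs do not carry binary rounding noise.
    elapsed_minutes = Decimal(str(elapsed_seconds)) / Decimal(60)
    return elapsed_minutes * Decimal(str(rate_per_minute))
```

A page held open for 90 seconds at a rate of 0.02 works out to 1.5 minutes × 0.02, or 0.03 USD.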
Standalone Lambda cost estimator
If you want to estimate an AWS Lambda invocation's cost without instrumenting a live handler, lambda_cost() is a pure function that computes it from duration, memory, and region:

    from dexcost.adapters.aws_lambda import lambda_cost, get_supported_regions

    result = lambda_cost(duration_ms=1500, memory_mb=512, region="us-east-1")
    print(result["cost_usd"])  # Decimal, total invocation cost
    print(result["details"])   # gb_seconds, duration_cost_usd, request_cost_usd, rate_per_gb_second, ...
    get_supported_regions()    # list of region codes the bundled rate table covers

It performs no I/O and has no side effects (it reads a bundled rate table). It raises ValueError for an unknown region, a negative duration_ms, or a non-positive memory_mb. This estimator uses its own bundled Lambda rate table and is independent of the wrap_lambda_handler capture path.
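For intuition about what such an estimate involves, the general Lambda billing shape (GB-seconds times a duration rate, plus a flat per-request charge) can be sketched as follows. This is not the library's code: `estimate_lambda_cost` is a hypothetical function, and the two rates are illustrative assumptions, not values read from the bundled table.

```python
from decimal import Decimal

# Illustrative rates only (assumptions, not the bundled rate table).
RATE_PER_GB_SECOND = Decimal("0.0000166667")
RATE_PER_REQUEST = Decimal("0.0000002")


def estimate_lambda_cost(duration_ms, memory_mb):
    """Estimate one invocation's cost from duration and configured memory."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    if memory_mb <= 0:
        raise ValueError("memory_mb must be positive")
    # GB-seconds = seconds of runtime x gigabytes of configured memory.
    gb_seconds = (Decimal(duration_ms) / 1000) * (Decimal(memory_mb) / 1024)
    duration_cost = gb_seconds * RATE_PER_GB_SECOND
    return {
        "cost_usd": duration_cost + RATE_PER_REQUEST,
        "details": {
            "gb_seconds": gb_seconds,
            "duration_cost_usd": duration_cost,
            "request_cost_usd": RATE_PER_REQUEST,
            "rate_per_gb_second": RATE_PER_GB_SECOND,
        },
    }
```

With duration_ms=1500 and memory_mb=512, the invocation consumes 1.5 s × 0.5 GB = 0.75 GB-seconds; the dollar figure then depends entirely on the rates in use.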
Requirements & current limitations
- Opt-in. Nothing on this page is enabled by init() alone; you must apply a wrapper and run it inside a dexcost.task().
- Serverless + browser only, today. The serverless wrap_* handlers and track_browser() are the capture paths wired in this release. Pricing for always-on runtimes (EC2, Fargate, GCE, Azure VMs, Kubernetes pods) and their GPU equivalents exists in the engine, but automatic capture for those long-running runtimes is not yet enabled. The k8s_node_aware flag on init() is reserved for that work and currently has no effect.
- Compute memory peak is Linux cgroup v2. Peak-memory readings use cgroup v2 (memory.peak needs Linux kernel ≥ 5.19). On other platforms the duration is still captured; memory degrades gracefully to unavailable.
- GPU is NVIDIA + NVML. Requires nvidia-ml-py, a loaded NVIDIA driver, and at least one device.
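The graceful degradation described for peak memory can be sketched as a read that returns None when the cgroup v2 file is missing. This is an illustration under assumptions: `read_peak_memory_bytes` is a hypothetical helper, and the default path assumes the process sees its own cgroup mounted at /sys/fs/cgroup, which is typical inside containers but not guaranteed.

```python
from pathlib import Path


def read_peak_memory_bytes(path="/sys/fs/cgroup/memory.peak"):
    """Return peak memory in bytes from cgroup v2, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Missing file (non-Linux, cgroup v1, older kernel) or unexpected
        # contents: report the reading as unavailable instead of failing.
        return None
```

A caller can then record duration unconditionally and attach memory only when the result is not None.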
Next steps
- Configuration: compute_billing_overrides, k8s_node_aware, and the rest of dexcost.init().
- API Reference: event types and the full public API.