Compute & GPU

Capture serverless compute, GPU, and browser-session costs alongside LLM and API spend, attributed to the same tasks.

Beyond LLM and HTTP costs, the TypeScript SDK can capture the compute your agent burns — serverless invocations, GPU seconds, and headless-browser time — and attribute it to the same tasks and customers as everything else. These costs land as compute_cost, gpu_cost, and gpu_utilization_signal events. Every symbol on this page is exported from the package root, @dexcost/sdk.

How it works

Compute and GPU capture is opt-in. Unlike LLM and HTTP instrumentation, it is not started by init(). You enable it by wrapping the entry point you want to measure with one of the wrap*Handler functions (or, for browser work, trackBrowser).

Two rules apply to every wrapper here:

  1. It only records inside an active task. Each wrapper checks getCurrentTask() — if no task is active when the wrapped handler runs and returns, the wrapper is a transparent pass-through and records nothing.
  2. Pricing is deferred. At capture time the event is stored with a zero placeholder cost; the dollar amount is computed when the task ends, from the bundled compute/GPU pricing catalogs.

Costs incurred by a handler that throws are still recorded — the event is persisted in a finally block before the error propagates.
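
The two rules and the throw behavior can be pictured with a small self-contained sketch. It is not the SDK's implementation: wrapHandlerSketch, currentTask, recorded, and the event fields are hypothetical stand-ins chosen for illustration, and the real wrappers also capture memory and platform metadata.

```typescript
// Hypothetical stand-ins for SDK internals; none of these names are the SDK's API.
type Task = { id: string };
type ComputeEvent = { taskId: string; durationMs: number; costUsd: number };

let currentTask: Task | undefined;   // plays the role of getCurrentTask()
const recorded: ComputeEvent[] = []; // plays the role of the event store

function wrapHandlerSketch<A extends unknown[], R>(
  handler: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    try {
      return await handler(...args);
    } finally {
      // Runs on success and on throw, so a failing handler is still recorded.
      const task = currentTask;
      if (task) {
        // Zero placeholder cost; the real amount would be filled in when the task ends.
        recorded.push({ taskId: task.id, durationMs: Date.now() - start, costUsd: 0 });
      }
      // With no active task, nothing is recorded: a transparent pass-through.
    }
  };
}
```

In this sketch the error from a throwing handler still reaches the caller, because the finally block only appends an event and never swallows the exception.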

Serverless compute

Each wrapper times the invocation, reads peak memory, and emits one compute_cost event per invocation:

Platform                 Wrapper
AWS Lambda               wrapLambdaHandler
Google Cloud Run         wrapCloudRunHandler
Google Cloud Functions   wrapCloudFunctionsHandler
Azure Functions          wrapAzureFunctionsHandler
Vercel Functions         wrapVercelHandler

Because the wrapper records when the handler returns, the task must stay active across the whole invocation. The reliable composition is to open the task with track() on the outside and call the wrapped handler inside it:

import { init, track, wrapLambdaHandler } from '@dexcost/sdk';

init({ apiKey: 'dx_live_...' });

// 1. Wrap your business logic.
const run = wrapLambdaHandler(async (event, context) => {
  // LLM/API costs captured in here are grouped with the compute cost.
  return { ok: true };
});

// 2. Lambda entry point: open a task, then invoke the wrapped handler inside it.
export const handler = (event: unknown, context: unknown) =>
  track({ taskType: 'process_order', customerId: 'acme-corp' }, () => run(event, context));

If you call the wrapped handler without an enclosing track() (or other active task), nothing is recorded — the wrapper passes straight through.

wrapLambdaHandler wraps a (event, context) handler; the other four wrap a (...args) handler. Each reads its platform's environment variables to enrich the event — for example AWS Lambda reads AWS_LAMBDA_FUNCTION_MEMORY_SIZE, AWS_LAMBDA_INITIALIZATION_TYPE, and AWS_REGION; Vercel reads VERCEL_REGION; Azure Functions reads REGION_NAME. These are populated by the platform at runtime.
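
As a rough illustration of that enrichment step, the sketch below reads the three Lambda variables named above from an environment map. The function name readLambdaEnv and the returned field names are assumptions made for this example; the page does not document the SDK's internal event shape.

```typescript
// Hypothetical result shape; the SDK's actual event fields are not documented here.
interface LambdaEnvInfo {
  memoryMb: number | undefined;
  initializationType: string | undefined;
  region: string | undefined;
}

// Takes the environment as a parameter so it can be exercised outside Lambda.
function readLambdaEnv(env: Record<string, string | undefined>): LambdaEnvInfo {
  const rawMemory = env['AWS_LAMBDA_FUNCTION_MEMORY_SIZE'];
  const parsed = rawMemory === undefined ? Number.NaN : Number.parseInt(rawMemory, 10);
  return {
    // Configured memory in MB; undefined when the variable is absent or not numeric.
    memoryMb: Number.isFinite(parsed) ? parsed : undefined,
    initializationType: env['AWS_LAMBDA_INITIALIZATION_TYPE'],
    region: env['AWS_REGION'],
  };
}
```

Inside a real function you would pass process.env. Outside the platform the variables are unset, so every field comes back undefined rather than throwing.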

Cloud Run billing model

By default the SDK prices Cloud Run invocations on the request-based billing model and records them with estimated confidence. If your account is billed on the instance-based model, switch the math with a billing override at init():

init({
  apiKey: 'dx_live_...',
  computeBillingOverrides: { cloud_run: 'instance' },
});

{ cloud_run: 'instance' } is the only override recognized in this release.

GPU

Three wrappers capture GPU usage for serverless GPU platforms:

Platform    Wrapper
Modal       wrapModalHandler
RunPod      wrapRunpodHandler
Replicate   wrapReplicateHandler

Each integrates per-process SM (streaming-multiprocessor) time over the invocation and emits one gpu_cost event plus one gpu_utilization_signal event per GPU device touched, billed per active GPU-second. Compose them the same way — wrapped handler called inside a track():

import { init, track, wrapModalHandler } from '@dexcost/sdk';

init({ apiKey: 'dx_live_...' });

const run = wrapModalHandler(async (payload: unknown) => {
  // Your GPU workload goes here; the placeholder result keeps the example runnable.
  return { ok: true };
});

export const handler = (payload: unknown) =>
  track({ taskType: 'image_generation', customerId: 'acme-corp' }, () => run(payload));

gpu_cost carries the priced GPU-seconds. gpu_utilization_signal is observability only — SM utilization, memory utilization, and peak VRAM per device — and always has a cost of 0, so it is never summed into cost totals.
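
To see why the zero-cost signal events are harmless in aggregates, here is a minimal sketch that totals cost per event type. The CostEvent shape and its type and costUsd field names are assumptions for illustration, not the SDK's documented schema.

```typescript
// Assumed event shape for illustration only.
type CostEventType = 'compute_cost' | 'gpu_cost' | 'gpu_utilization_signal';
type CostEvent = { type: CostEventType; costUsd: number };

// Sum cost per event type, plus a grand total across all events.
function summarizeCosts(events: CostEvent[]): { byType: Record<CostEventType, number>; totalUsd: number } {
  const byType: Record<CostEventType, number> = {
    compute_cost: 0,
    gpu_cost: 0,
    gpu_utilization_signal: 0,
  };
  let totalUsd = 0;
  for (const event of events) {
    byType[event.type] += event.costUsd;
    totalUsd += event.costUsd;
  }
  return { byType, totalUsd };
}

const summary = summarizeCosts([
  { type: 'compute_cost', costUsd: 0.25 },
  { type: 'gpu_cost', costUsd: 0.5 },
  { type: 'gpu_utilization_signal', costUsd: 0 }, // observability only
]);
console.log(summary.totalUsd); // 0.75
```

Because the signal events always carry a cost of 0, a naive sum over every event gives the same total as a sum over the priced events alone.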

GPU requirements

There is no maintained native NVML binding for Node, so GPU capture reads NVIDIA's tooling by shelling out to the nvidia-smi CLI. That means:

  • No npm package to install. GPU capture needs nvidia-smi on the host's PATH — which ships with the NVIDIA driver — and at least one NVIDIA GPU.
  • If nvidia-smi is missing or reports zero devices, the GPU wrappers record nothing (they do not throw).
  • GPU capture is NVIDIA-only and Node-only (it short-circuits in browser environments).
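
If you want to check a host before relying on GPU capture, a probe along these lines mirrors the "record nothing, do not throw" behavior. It is a sketch under assumptions: listGpus and parseGpuList are hypothetical helpers, and it uses nvidia-smi -L (the CLI's device-listing flag); which nvidia-smi queries the SDK itself issues is not stated on this page.

```typescript
import { execFileSync } from 'node:child_process';

// `nvidia-smi -L` prints one line per device, of the form
// "GPU <index>: <name> (UUID: <uuid>)". Keep only those lines.
function parseGpuList(output: string): string[] {
  return output
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => /^GPU \d+:/.test(line));
}

// Returns the device lines, or an empty list when nvidia-smi is missing or fails.
function listGpus(): string[] {
  try {
    const output = execFileSync('nvidia-smi', ['-L'], {
      encoding: 'utf8',
      timeout: 5000,
      stdio: ['ignore', 'pipe', 'ignore'],
    });
    return parseGpuList(output);
  } catch {
    // Not on PATH, no driver, or a non-zero exit: treat as "no GPUs".
    return [];
  }
}

console.log(`GPUs visible to nvidia-smi: ${listGpus().length}`);
```

An empty result here suggests the GPU wrappers would record nothing on that host.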

Browser sessions

For agents that drive a headless browser, trackBrowser(page, fn, options?) records the wall-clock time of a browser block as a compute_cost event (default rate: 0.01 USD per minute). It is documented in full, with a Playwright example, under Instrumentation → Playwright browser adapter.

Standalone Lambda cost estimator

To estimate an AWS Lambda invocation's cost without instrumenting a live handler, lambdaCost() is a pure function:

import { lambdaCost, getSupportedRegions } from '@dexcost/sdk';

const result = lambdaCost(1500, 512, 'us-east-1');  // durationMs, memoryMb, region
console.log(result.costUsd);   // number — total invocation cost
console.log(result.details);   // { gbSeconds, durationCostUsd, requestCostUsd, ratePerGbSecond, ... }

getSupportedRegions();         // string[] of region codes the bundled rate table covers

It performs no I/O and throws for an unknown region, a negative durationMs, or a non-positive memoryMb. This estimator uses its own bundled Lambda rate table and is independent of the wrapLambdaHandler capture path.
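
For intuition about the arithmetic behind such an estimate, the sketch below applies the standard Lambda formula: GB-seconds times a per-GB-second rate, plus a per-request charge. It is not the SDK's code. estimateLambdaCost is a hypothetical name, it returns a flat object rather than the costUsd/details shape above, and the two rates are assumed values taken from AWS's published x86 list prices for us-east-1, which may differ from the bundled rate table.

```typescript
// Assumed rates (AWS public x86 list prices, us-east-1); the SDK's table may differ.
const ASSUMED_RATE_PER_GB_SECOND = 0.0000166667;
const ASSUMED_RATE_PER_REQUEST = 0.2 / 1_000_000;

interface LambdaEstimate {
  gbSeconds: number;
  durationCostUsd: number;
  requestCostUsd: number;
  costUsd: number;
}

function estimateLambdaCost(
  durationMs: number,
  memoryMb: number,
  ratePerGbSecond: number = ASSUMED_RATE_PER_GB_SECOND,
  ratePerRequest: number = ASSUMED_RATE_PER_REQUEST,
): LambdaEstimate {
  // Same input validation the page describes for durationMs and memoryMb.
  if (durationMs < 0) throw new RangeError('durationMs must be non-negative');
  if (memoryMb <= 0) throw new RangeError('memoryMb must be positive');

  const gbSeconds = (durationMs / 1000) * (memoryMb / 1024);
  const durationCostUsd = gbSeconds * ratePerGbSecond;
  const requestCostUsd = ratePerRequest; // one invocation
  return { gbSeconds, durationCostUsd, requestCostUsd, costUsd: durationCostUsd + requestCostUsd };
}

const estimate = estimateLambdaCost(1500, 512);
console.log(estimate.gbSeconds); // 0.75 (1.5 s at 0.5 GB)
```

Passing the rates as parameters keeps the formula separate from any particular price table, which is the same separation the real estimator gets from its bundled per-region rates.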

Requirements & current limitations

  • Opt-in. Nothing here is enabled by init() alone — you must apply a wrapper and run it inside a track().
  • Serverless + browser only, today. The serverless wrap*Handler functions and trackBrowser() are the capture paths wired in this release. Pricing for always-on runtimes (EC2, Fargate, GCE, Azure VMs, Kubernetes pods) and their GPU equivalents exists in the engine, but automatic capture for those long-running runtimes is not yet enabled. The k8sNodeAware option on init() is reserved for that work and currently has no effect.
  • GPU needs nvidia-smi. Requires the NVIDIA driver (which provides nvidia-smi) and at least one device. Per-probe overhead (~50 ms) makes it suitable for end-of-task finalization, which is exactly when it runs.

Next steps

  • Configuration: computeBillingOverrides, k8sNodeAware, and the rest of init().
  • API Reference: event types and the full public API.
