GPU Monitoring

tmam can collect real-time GPU metrics from NVIDIA and AMD GPUs and display them in the dashboard's GPU Analytics view.

Enabling GPU Monitoring

Pass collect_gpu_stats=True to init():

from tmam import init

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secrect_key="sk-tmam-xxxxxxxx",
    application_name="my-gpu-app",
    collect_gpu_stats=True,
)

tmam will auto-detect whether an NVIDIA or AMD GPU is present.

GPU Vendor Requirements

NVIDIA GPUs

Install the pynvml library (NVIDIA Management Library Python bindings):

pip install pynvml

The NVIDIA driver must be installed on the host. Any CUDA-capable GPU is supported.

AMD GPUs

Install the amdsmi library:

pip install amdsmi

The AMD ROCm drivers must be installed on the host.
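
Before enabling GPU collection, it can help to confirm that one of the two vendor libraries is actually importable in the current environment. The following is a minimal sketch, not part of tmam: the helper name installed_vendor_libraries is hypothetical, and it assumes the import names match the pip package names above (pynvml and amdsmi).

```python
from importlib.util import find_spec

# Import names of the vendor libraries named in this section
# (assumed to match the pip package names).
VENDOR_LIBRARIES = {"nvidia": "pynvml", "amd": "amdsmi"}


def installed_vendor_libraries(libraries=VENDOR_LIBRARIES):
    """Return the vendors whose Python bindings are importable here.

    Hypothetical helper for illustration; it only checks that the module
    can be found, not that a driver or GPU is present.
    """
    return [
        vendor
        for vendor, module in libraries.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    found = installed_vendor_libraries()
    print("GPU bindings found:", ", ".join(found) or "none")
```

An empty result means neither library is installed, so GPU collection would have nothing to read from even on a host with a GPU.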

Metrics Collected

For each GPU in the system, tmam collects the following metrics, tagged with the GPU index, UUID, and name:

Metric Name	OTel Name	Description
Utilization	`gpu.utilization`	Core utilization %
Encoder Utilization	`gpu.enc.utilization`	Video encoder utilization %
Decoder Utilization	`gpu.dec.utilization`	Video decoder utilization %
Temperature	`gpu.temperature`	Temperature in °C
Fan Speed	`gpu.fan_speed`	Fan speed (NVIDIA only)
Memory Available	`gpu.memory.available`	Available VRAM in MB
Memory Total	`gpu.memory.total`	Total VRAM in MB
Memory Used	`gpu.memory.used`	Used VRAM in MB
Memory Free	`gpu.memory.free`	Free VRAM in MB
Power Draw	`gpu.power.draw`	Current power draw in W
Power Limit	`gpu.power.limit`	Power limit in W

All metrics are tagged with:

gpu.index — GPU index (0, 1, 2...)
gpu.uuid — GPU UUID
gpu.name — GPU model name
service.name — your application_name
deployment.environment — your environment
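
To sanity-check the values shown in the dashboard on an NVIDIA host, a few of the same quantities can be read directly through pynvml. This is a rough sketch for cross-checking only and is not how tmam is known to collect metrics: the function name read_nvidia_snapshot is hypothetical, the dictionary keys simply reuse the names from the table and tag list above, and memory is converted to MiB as an assumption, since this page does not say whether "MB" means 10^6 or 2^20 bytes.

```python
MIB = 1024 * 1024  # assumption: report memory in MiB


def read_nvidia_snapshot():
    """Return one dict per NVIDIA GPU, or [] if NVML is unavailable."""
    try:
        import pynvml
    except ImportError:
        return []  # pynvml is not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []  # no driver or no NVML library on this host

    def query(call, *args):
        # Some queries are unsupported on some GPUs; report None for those.
        try:
            return call(*args)
        except pynvml.NVMLError:
            return None

    def as_text(value):
        # Older pynvml releases return bytes, newer ones return str.
        return value.decode() if isinstance(value, bytes) else value

    snapshot = []
    try:
        for index in range(query(pynvml.nvmlDeviceGetCount) or 0):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            memory = query(pynvml.nvmlDeviceGetMemoryInfo, handle)
            rates = query(pynvml.nvmlDeviceGetUtilizationRates, handle)
            power_mw = query(pynvml.nvmlDeviceGetPowerUsage, handle)
            snapshot.append({
                "gpu.index": index,
                "gpu.uuid": as_text(query(pynvml.nvmlDeviceGetUUID, handle)),
                "gpu.name": as_text(query(pynvml.nvmlDeviceGetName, handle)),
                "gpu.utilization": rates.gpu if rates else None,  # percent
                "gpu.temperature": query(
                    pynvml.nvmlDeviceGetTemperature,
                    handle,
                    pynvml.NVML_TEMPERATURE_GPU,
                ),  # degrees Celsius
                "gpu.memory.total": memory.total / MIB if memory else None,
                "gpu.memory.used": memory.used / MIB if memory else None,
                # NVML reports power in milliwatts; convert to watts.
                "gpu.power.draw": power_mw / 1000 if power_mw is not None else None,
            })
    finally:
        pynvml.nvmlShutdown()
    return snapshot


if __name__ == "__main__":
    for gpu in read_nvidia_snapshot():
        print(gpu)
```

On a host without pynvml or without an NVIDIA driver, the function returns an empty list rather than raising.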

Viewing GPU Metrics

In the dashboard, navigate to Analytics → GPU to see:

GPU utilization over time
Memory usage (used vs. total)
Temperature and power draw
Per-GPU breakdowns for multi-GPU systems

No GPU Detected

If collect_gpu_stats=True but no supported GPU is found, tmam logs:

Tmam GPU Instrumentation Error: No supported GPUs found.
If this is a non-GPU host, set `collect_gpu_stats=False` to disable GPU stats.

This error is non-fatal: other tracing and metrics collection continue unaffected.
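
When the same code runs on both GPU and non-GPU hosts, one way to avoid this log message is to decide the flag at startup. The sketch below is one possible heuristic, not a tmam feature: the helper name gpu_tooling_present is hypothetical, and it assumes that the vendor command-line tools (nvidia-smi, rocm-smi, or amd-smi) are on PATH whenever the corresponding drivers are installed.

```python
import shutil

# Vendor command-line tools that usually ship with the GPU driver stack.
GPU_TOOLS = ("nvidia-smi", "rocm-smi", "amd-smi")


def gpu_tooling_present(which=shutil.which, tools=GPU_TOOLS):
    """Return True if any known GPU vendor tool is found on PATH.

    Hypothetical helper for illustration; `which` is injectable so the
    check can be exercised without a GPU.
    """
    return any(which(tool) is not None for tool in tools)


if __name__ == "__main__":
    print("collect_gpu_stats should be", gpu_tooling_present())
```

The result can then be passed as the collect_gpu_stats argument to init() in place of a hard-coded True.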

Example: LLM + GPU Monitoring

from tmam import init
from transformers import pipeline

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secrect_key="sk-tmam-xxxxxxxx",
    application_name="local-llm",
    environment="dev",
    collect_gpu_stats=True,  # monitor GPU while running inference
)

# Transformers calls are auto-instrumented
generator = pipeline("text-generation", model="gpt2", device=0)
output = generator("The future of AI is", max_new_tokens=50)
print(output[0]["generated_text"])

While inference runs, tmam records both the LLM span (tokens, latency) and the GPU metrics (VRAM usage, utilization). The two can be correlated by timestamp in the dashboard.
