GPU Monitoring
tmam can collect real-time GPU metrics from NVIDIA and AMD GPUs and display them in the dashboard's GPU Analytics view.
Enabling GPU Monitoring
Pass `collect_gpu_stats=True` to `init()`:

```python
from tmam import init

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secret_key="sk-tmam-xxxxxxxx",
    application_name="my-gpu-app",
    collect_gpu_stats=True,
)
```
tmam will auto-detect whether an NVIDIA or AMD GPU is present.
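Auto-detection can be pictured as trying the NVIDIA bindings first, then falling back to AMD. The following is an illustrative sketch of that pattern, not tmam's actual internal code:

```python
def detect_gpu_vendor():
    """Best-effort vendor detection: NVIDIA first, then AMD, else None."""
    try:
        import pynvml
        pynvml.nvmlInit()     # raises if NVIDIA drivers/GPU are absent
        pynvml.nvmlShutdown()
        return "nvidia"
    except Exception:
        pass
    try:
        import amdsmi
        amdsmi.amdsmi_init()  # raises if ROCm drivers/GPU are absent
        amdsmi.amdsmi_shut_down()
        return "amd"
    except Exception:
        return None
```

On a host with neither library installed, the function simply returns `None`, which mirrors tmam's non-fatal behavior described below under "No GPU Detected".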
GPU Vendor Requirements
NVIDIA GPUs
Install the pynvml library (NVIDIA Management Library Python bindings):
```shell
pip install pynvml
```
Requires NVIDIA drivers to be installed on the host. Works with any CUDA-capable GPU.
AMD GPUs
Install the amdsmi library:
```shell
pip install amdsmi
```
Requires AMD ROCm drivers to be installed on the host.
Metrics Collected
For each GPU in the system, tmam collects the following metrics, tagged with the GPU index, UUID, and name:
| Metric Name | OTel Name | Description |
|---|---|---|
| Utilization | gpu.utilization | Core utilization % |
| Encoder Utilization | gpu.enc.utilization | Video encoder utilization % |
| Decoder Utilization | gpu.dec.utilization | Video decoder utilization % |
| Temperature | gpu.temperature | Temperature in °C |
| Fan Speed | gpu.fan_speed | Fan speed % (NVIDIA only) |
| Memory Available | gpu.memory.available | Available VRAM in MB |
| Memory Total | gpu.memory.total | Total VRAM in MB |
| Memory Used | gpu.memory.used | Used VRAM in MB |
| Memory Free | gpu.memory.free | Free VRAM in MB |
| Power Draw | gpu.power.draw | Current power draw in W |
| Power Limit | gpu.power.limit | Power limit in W |
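The memory metrics above are reported in MB, while the underlying vendor APIs (e.g. NVML's memory-info query) report bytes, so a conversion along these lines is involved. This is an illustrative sketch, not tmam's actual code:

```python
def bytes_to_mb(num_bytes: int) -> int:
    """Convert a byte count (as reported by NVML/amdsmi) to whole megabytes."""
    return num_bytes // (1024 * 1024)

# Example: a GPU reporting 8 GiB of total VRAM -> 8192 MB
total_mb = bytes_to_mb(8 * 1024**3)
```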
All metrics are tagged with:
- `gpu.index` — GPU index (0, 1, 2...)
- `gpu.uuid` — GPU UUID
- `gpu.name` — GPU model name
- `service.name` — your `application_name`
- `deployment.environment` — your `environment`
Viewing GPU Metrics
In the dashboard, navigate to Analytics → GPU to see:
- GPU utilization over time
- Memory usage (used vs. total)
- Temperature and power draw
- Per-GPU breakdowns for multi-GPU systems
No GPU Detected
If collect_gpu_stats=True but no supported GPU is found, tmam logs:
```
Tmam GPU Instrumentation Error: No supported GPUs found.
```
The error is non-fatal and does not affect other tracing or metrics collection. If this is a non-GPU host, set `collect_gpu_stats=False` to silence it.
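For code deployed to a mix of GPU and CPU-only hosts, one common pattern is to gate the flag on an environment variable rather than hard-coding it. The variable name `TMAM_GPU_STATS` below is an assumption for illustration, not something tmam reads itself:

```python
import os

# Hypothetical opt-in: set TMAM_GPU_STATS=1 only on GPU hosts.
# This env var is an assumed name, not part of the tmam API.
collect_gpu = os.environ.get("TMAM_GPU_STATS", "0") == "1"
```

The resulting boolean can then be passed as `collect_gpu_stats=collect_gpu` in the `init()` call, so the same deployment artifact works everywhere.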
Example: LLM + GPU Monitoring
```python
from tmam import init
from transformers import pipeline

init(
    url="http://localhost:5050/api/sdk",
    public_key="pk-tmam-xxxxxxxx",
    secret_key="sk-tmam-xxxxxxxx",
    application_name="local-llm",
    environment="dev",
    collect_gpu_stats=True,  # monitor GPU while running inference
)

# Transformers calls are auto-instrumented
generator = pipeline("text-generation", model="gpt2", device=0)
output = generator("The future of AI is", max_new_tokens=50)
print(output[0]["generated_text"])
```
While inference runs, tmam records both the LLM span (tokens, latency) and GPU metrics (VRAM usage, utilization), which can be correlated by timestamp in the dashboard.