Month: May 2019

Monitoring Nvidia GPUs via Telegraf

The nvidia-smi plugin for Telegraf gives you an overview of your GPU usage. This “guide” was written against v1.10.4, the most current release at the time, and assumes you are using Windows as your host OS. Linux should be fairly easy to get going as long as you know where your nvidia-smi executable is located.
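If you are not sure where the nvidia-smi executable lives, a small script can look for it. This is only a rough sketch: the function name find_nvidia_smi is made up for illustration, and apart from the NVSMI path used later in this post, the candidate locations are assumptions that vary by driver version and distribution.

```python
import os
import shutil

# Candidate install locations. Only the first one comes from this post;
# the others are assumptions and may not match your driver or distro.
CANDIDATE_PATHS = [
    r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe",
    r"C:\Windows\System32\nvidia-smi.exe",
    "/usr/bin/nvidia-smi",
]


def find_nvidia_smi(candidates=CANDIDATE_PATHS):
    """Return the path to nvidia-smi, or None if it cannot be found."""
    # Try the PATH first, since that is where Telegraf looks
    # when bin_path is not set.
    on_path = shutil.which("nvidia-smi")
    if on_path:
        return on_path
    # Otherwise probe the candidate locations one by one.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_nvidia_smi() or "nvidia-smi not found")
```

Whatever path it prints is the value to use for bin_path in the plugin config.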

If you do not have Telegraf installed, check out my guides here.

Create a new conf file in the telegraf.d folder.

notepad.exe C:\telegraf\telegraf.d\nvidiasmi.conf

Paste the following into the new file and save/close it.

# Pulls statistics from nvidia GPUs attached to the host
[[inputs.nvidia_smi]]
  ## Optional: path to nvidia-smi binary, defaults to $PATH via exec.LookPath
  bin_path = "C:\\Program Files\\NVIDIA Corporation\\NVSMI\\nvidia-smi.exe"

  ## Optional: timeout for GPU polling
  timeout = "5s"

Restart Telegraf.

net stop telegraf
net start telegraf

NOTE: On Windows, bin_path is a double-quoted TOML string, so every backslash in the path has to be escaped as \\. Otherwise you’ll get errors when Telegraf queries nvidia-smi.exe.
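To make the escaping rule concrete, here is a minimal sketch that turns a plain Windows path into a TOML string. The helper names toml_basic_string and toml_literal_string are hypothetical. Single-quoted literal strings are part of the TOML spec, but whether Telegraf's parser accepts them for bin_path is an assumption I have not verified.

```python
def toml_basic_string(value: str) -> str:
    """Quote a value as a TOML basic (double-quoted) string."""
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def toml_literal_string(value: str) -> str:
    """Quote a value as a TOML literal (single-quoted) string."""
    # Literal strings keep backslashes as-is but cannot hold a single quote.
    if "'" in value:
        raise ValueError("TOML literal strings cannot contain a single quote")
    return f"'{value}'"


if __name__ == "__main__":
    path = r"C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
    # Double-quoted form, with doubled backslashes as in the config above.
    print("bin_path =", toml_basic_string(path))
    # Single-quoted form, with the path left untouched.
    print("bin_path =", toml_literal_string(path))
```

The first line printed should match the bin_path line in the config above.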

Once you have verified Telegraf is reporting Nvidia stats you can start creating your panels in Grafana. Use the nvidia_smi measurement from your telegraf data source to build the panels.
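One way to do that verification is to run Telegraf once in test mode and check that an nvidia_smi metric comes back. The sketch below assumes telegraf.exe lives in C:\telegraf (only the config path comes from this post) and that the small conf file is enough on its own for a test run. The helper names are hypothetical. It relies on the --config, --input-filter and --test flags, and on test mode printing each metric as a line starting with "> ".

```python
import os
import re
import subprocess

# Assumed install location; adjust to wherever telegraf.exe actually is.
TELEGRAF_EXE = r"C:\telegraf\telegraf.exe"
CONFIG_PATH = r"C:\telegraf\telegraf.d\nvidiasmi.conf"


def measurements_in(test_output: str) -> set:
    """Collect measurement names from `telegraf --test` style output."""
    names = set()
    for line in test_output.splitlines():
        line = line.strip()
        # Metric lines are line protocol prefixed with "> "; skip log lines.
        if not line.startswith("> "):
            continue
        # The measurement name ends at the first comma (tags) or space (fields).
        # Escaped commas or spaces inside the name are not handled here.
        name = re.split(r"[, ]", line[2:], maxsplit=1)[0]
        if name:
            names.add(name)
    return names


def run_telegraf_test(telegraf_exe: str, config_path: str) -> str:
    """Run one collection in test mode and return what Telegraf printed."""
    result = subprocess.run(
        [telegraf_exe, "--config", config_path,
         "--input-filter", "nvidia_smi", "--test"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    if os.path.isfile(TELEGRAF_EXE):
        output = run_telegraf_test(TELEGRAF_EXE, CONFIG_PATH)
        print("nvidia_smi reporting:", "nvidia_smi" in measurements_in(output))
    else:
        print("telegraf.exe not found at", TELEGRAF_EXE)
```

If it reports False, the bin_path escaping mentioned above is the first thing to check.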

Monitoring Hyper-V via Telegraf

Now the cool thing about Telegraf on Windows is that you can monitor any system service that exposes Windows performance counters. So creating a Hyper-V dashboard is actually fairly easy.

You’ll first want to edit your telegraf.conf file and add the following objects under the [[inputs.win_perf_counters]] section:

    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Virtual Machine Health Summary"
    Instances = ["------"]
    Measurement = "hyperv_health"
    Counters = [
      "Health Ok",
      "Health Critical",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Hypervisor"
    Instances = ["------"]
    Measurement = "hyperv_hypervisor"
    Counters = [
      "Logical Processors",
      "Partitions",
    ]

    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Hypervisor Virtual Processor"
    Instances = ["*"]
    Measurement = "hyperv_processor"
    Counters = [
      "% Guest Run Time",
      "% Hypervisor Run Time",
      "% Idle Time",
      "% Total Run Time",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Dynamic Memory VM"
    Instances = ["*"]
    Measurement = "hyperv_dynamic_memory"
    Counters = [
      "Current Pressure",
      "Guest Visible Physical Memory",
    ]

    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V VM Vid Partition"
    Instances = ["*"]
    Measurement = "hyperv_vid"
    Counters = [
      "Physical Pages Allocated",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Virtual Switch"
    Instances = ["*"]
    Measurement = "hyperv_vswitch"
    Counters = [
      "Bytes Received/Sec",
      "Bytes Sent/Sec",
      "Packets Received/Sec",
      "Packets Sent/Sec",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Virtual Network Adapter"
    Instances = ["*"]
    Measurement = "hyperv_vmnet"
    Counters = [
      "Bytes Received/Sec",
      "Bytes Sent/Sec",
      "Packets Received/Sec",
      "Packets Sent/Sec",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Virtual IDE Controller"
    Instances = ["*"]
    Measurement = "hyperv_vmdisk"
    Counters = [
      "Read Bytes/Sec",
      "Write Bytes/Sec",
      "Read Sectors/Sec",
      "Write Sectors/Sec",
    ]
    
    [[inputs.win_perf_counters.object]]
    ObjectName = "Hyper-V Virtual Storage Device"
    Instances = ["*"]
    Measurement = "hyperv_storage"
    Counters = [
      "Write Operations/Sec",
      "Read Operations/Sec",
      "Read Bytes/Sec",
      "Write Bytes/Sec",
      "Latency",
      "Throughput",
    ]
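Every stanza above has the same four keys: ObjectName, Instances, Measurement and Counters. Instances = ["*"] collects every instance of an object (each VM, switch or adapter), while ["------"] is what the plugin expects for objects that have no instances at all. If you want to add more objects without hand-typing them, a small generator can help. The render_object helper below is hypothetical, and the sketch assumes the names contain no double quotes or backslashes.

```python
def render_object(object_name, measurement, counters, instances=("*",)):
    """Render one [[inputs.win_perf_counters.object]] stanza as TOML text."""
    # Names are inserted as-is, so they must not contain quotes or backslashes.
    lines = [
        "[[inputs.win_perf_counters.object]]",
        f'  ObjectName = "{object_name}"',
        "  Instances = [" + ", ".join(f'"{i}"' for i in instances) + "]",
        f'  Measurement = "{measurement}"',
        "  Counters = [",
    ]
    # One counter per line, each with a trailing comma like the config above.
    lines += [f'    "{counter}",' for counter in counters]
    lines.append("  ]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Reproduces the Hyper-V Hypervisor stanza, which has no instances.
    print(render_object(
        "Hyper-V Hypervisor",
        "hyperv_hypervisor",
        ["Logical Processors", "Partitions"],
        instances=("------",),
    ))
```

Paste the printed text into telegraf.conf next to the other objects.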

Restart Telegraf so it picks up the new config, then import dashboard ID 2618 into Grafana and set your data source to telegraf.

To see all Hyper-V counters you can check out this PowerShell counters export, here.