Continuous Monitoring

About Continuous Monitoring

This is the latest, most complete version of the scripts for Detailed system and process monitoring.

On slow systems such as Raspberry Pi computers, from the Zero W (v1) to the 4 Model B, the single-thread script (conmon-st) is preferred; it uses a sampling delay derived from each Raspberry Pi's CPU to avoid interfering with other processes. On faster systems such as servers and desktop PCs, the multi-thread script (conmon-mt) is preferred; it uses much smaller sampling delays to capture fast changes in resource utilization more accurately.

In addition to the main conmon process, there are a few scripts that usually only make sense to run on a single host in the network:

  • conmon-speedtest reports the download and upload speeds (Mbps) and ping (ms) measured by sivel/speedtest-cli.
  • conmon-mystrom reports the telemetry obtained via the Get report method in the myStrom REST API from one or more smart plug/switch devices that report energy usage (see the example request after this list).
  • conmon-tapo.py reports temperature, humidity, power consumption and other monitoring data from Tapo devices.
  • tapo.yaml is a minimal configuration file for conmon-tapo.py.
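
For instance, the myStrom report is a single unauthenticated REST call against the device (IP address hypothetical); the JSON response includes at least the Ws, power and temperature fields that conmon-mystrom extracts:

$ curl http://192.168.0.191/report
{"power":9.13,"Ws":9.04,"relay":true,"temperature":21.5}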

Kubernetes Setup

This setup has been migrated to run on a single-node Kubernetes cluster, and the scripts have been updated to support two targets at once: InfluxDB over HTTP without authentication and/or over HTTPS with Basic Auth.
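
In the scripts below this is controlled by two variables; either one may be left empty to skip that target (URLs hypothetical):

# InfluxDB targets: plain HTTP in-cluster, HTTPS through an ingress.
TARGET_HTTP='http://localhost:8086'
TARGET_HTTPS='https://influxdb.example.com'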

Install InfluxDB

Install InfluxDB OSS, or simply:

# apt install influxdb influxdb-client -y

Set it up and test writing data.

Create a monitoring database (the name can be different; if so, update the DBNAME variable in the conmon scripts).
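
For example, with the InfluxDB 1.x command-line client, and assuming the default monitoring name and a local server on the default port:

# influx -execute 'CREATE DATABASE monitoring'
# curl -i -XPOST 'http://localhost:8086/write?db=monitoring' \
    --data-binary 'test,host=myhost value=1'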

Install Conmon

The monitoring script can be installed in any system to report metrics back to the InfluxDB server. A few common tools are required which are not always installed by default:

# apt install -y curl jq iotop-c lm-sensors 

To gather metrics from GPUs, also install:

  • intel-gpu-tools for Intel GPUs.
  • The latest nvidia-utils-xxx for NVidia GPUs (check with apt-cache search nvidia-smi).

If the target InfluxDB server requires HTTP authentication, copy or create the credentials in /etc/conmon/influxdb-auth (and chmod 400 it).
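
The conmon scripts pass the file contents directly to curl -u, so the file is expected to hold a single username:password line (credentials hypothetical):

# echo 'conmon:PLEASE_CHOOSE_A_SENSIBLE_PASSWORD' >/etc/conmon/influxdb-auth
# chmod 400 /etc/conmon/influxdb-auth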

Then choose a version of the conmon script, install it as /usr/local/bin/conmon, and run it as a service by creating /etc/systemd/system/conmon.service as follows:

[Unit]
Description=Continuous Monitoring
After=influxd.service
Wants=influxd.service

[Service]
ExecStart=/usr/local/bin/conmon
Restart=on-failure
StandardOutput=null

[Install]
WantedBy=multi-user.target

Reload systemd, then enable and start the service:

# systemctl daemon-reload
# systemctl enable conmon.service
# systemctl start conmon.service
# systemctl status conmon.service

Once all this is working, the monitoring script can be updated with deploy-to-pcs or deploy-to-rpis.

After a minute or so, there should be enough metrics in InfluxDB to create a dashboard in Grafana to display them. A good starting point is to clone an existing dashboard from a similar system, then tweak the tag::host filter in all queries and some of the Max values.
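
For instance, a CPU panel can be driven by a query over the top measurement written by the scripts; the tag::host filter corresponds to the WHERE host = '...' clause (host name hypothetical), which can be tested from the command line:

$ influx -database monitoring -execute \
  "SELECT mean(value) FROM top WHERE host = 'pi-z1' AND time > now() - 1h GROUP BY time(1m)"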

The future of Flux

Flux is going into maintenance mode and will not be supported in InfluxDB 3.0. This was a decision based on the broad demand for SQL and the continued growth and adoption of InfluxQL. We are continuing to support Flux for users in 1.x and 2.x so you can continue using it with no changes to your code. If you are interested in transitioning to InfluxDB 3.0 and want to future-proof your code, we suggest using InfluxQL.

Install Grafana

Grafana is used to visualize the metrics in this setup.

Follow the steps to install Grafana on Ubuntu, start the server with systemd and reset its admin password:

# grafana-cli admin reset-admin-password \
  PLEASE_CHOOSE_A_SENSIBLE_PASSWORD
INFO[03-20|15:02:11] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO[03-20|15:02:11] Starting DB migrations                   logger=migrator
Admin password changed successfully ✔

Add your InfluxDB data source to Grafana, create a new Dashboard and Add > Visualization for each measurement.

Finally, enable anonymous authentication by tweaking /etc/grafana/grafana.ini as follows:

#################################### Anonymous Auth ######################
[auth.anonymous]
# enable anonymous access
enabled = true

# specify organization name that should be used for unauthenticated users
org_name = Main Org.

# specify role for unauthenticated users
org_role = Viewer

# systemctl restart grafana-server.service

Conmon Scripts

Single-thread

deploy-to-rpis

In environments with multiple Raspberry Pi computers, the deploy-to-rpis script below can be used to deploy the latest version of conmon-st to all of them at once.

deploy-to-rpis
#!/bin/bash
#
# Deploy conmon-st to Raspberry Pi hosts.

for host in alfred pi-f1 pi-z1 pi-z2 pi3a; do
  if nc 2>&1 -zv ${host} 22 | grep -q succeeded; then
    echo "Deploying to ${host} ..."
    scp 2>/dev/null \
      -qr \
      ../conmon \
      pi@${host}:src/
    ssh 2>/dev/null \
      pi@${host} \
      "sudo cp /home/pi/src/conmon/conmon-st /usr/local/bin/conmon"
    ssh 2>/dev/null \
      pi@${host} \
      "sudo systemctl restart conmon.service"
    etc_influxdb_auth=/etc/conmon/influxdb-auth
    for src_influxdb_auth in /etc/conmon/influxdb-auth "${HOME}/.conmon-influxdb-auth"; do
      if [ -f $src_influxdb_auth ]; then
        auth=$(cat $src_influxdb_auth 2>/dev/null)
        ssh 2>/dev/null \
          pi@${host} \
          "sudo mkdir -p $(dirname $etc_influxdb_auth)"
        ssh 2>&1 >/dev/null \
          pi@${host} \
          "echo '${auth}' | sudo tee $etc_influxdb_auth"
        ssh 2>/dev/null \
          pi@${host} \
          "sudo chmod 400 $etc_influxdb_auth"
      fi
    done
  fi
done

conmon-st

conmon-st
#!/bin/bash
#
# Export system monitoring metrics to influxdb.

# InfluxDB target.
DBNAME=monitoring
TARGET_HTTP='' # Leave empty to skip.
TARGET_HTTPS='http://octavo:30086'

# Default delay between each POST request to InfluxDB.
DELAY_POST=4

# Data file for batch POST.
DDIR="/dev/shm/$$"
DATA="${DDIR}/DATA.txt"
mkdir -p "${DDIR}"

host=$(hostname)

timestamp_ns() {
  date +'%s%N'
}

store_line() {
  # Write a line of data to the temporary in-memory file.
  # Exit immediately if this fails.
  echo "$1" >>"${DATA}" || exit 1
}

report_top_per_process() {
  # Per-process CPU(%) & MEM(bytes) usage.
  # Re-use the ptop file created by report_top().
  ts=$1
  ptop=$2
  awk '{print $4}' "${ptop}" | sort -u | while read cmd; do
    user=$(grep " ${cmd}\$" ${ptop} | cut -f1 -d' ' | sort -u | head -1)
    # CPU per process, only when > 0
    cpu=$(grep " ${cmd}\$" ${ptop} | cut -f2 -d' ' | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql)
    if [[ ! -z "${cpu}" ]]; then
      store_line "top_cpu,host=${host},user=${user},command=${cmd} value=${cpu} ${ts}"
    fi
    # Memory per process, only when > 0
    mem=$(grep " ${cmd}\$" ${ptop} | cut -f3 -d' ' | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql | sed 's/^\./0./')
    # Multiply %mem by total system RAM.
    tot=$(free -b | grep 'Mem:' | awk '{print $2}')
    ram=$(bc -ql <<<"${tot} * ${mem} / 100")
    if [[ ! -z "${ram}" ]]; then
      store_line "top_mem,host=${host},user=${user},command=${cmd} value=${ram} ${ts}"
    fi
  done
  rm -f "${ptop}"
}

report_top() {
  # Stats from top: CPU (overall and per process) and RAM (per process).
  # Depends on: top, free.
  top_cmd=$(command -v top)
  top="${DDIR}/top"
  ts=$(timestamp_ns)
  ${top_cmd} -b -c -n 1 -w 512 |
    grep -vE ' 0\..   0\..|^top|^Tasks|^%|^MiB|^$|[[:blank:]]*PID USER' |
    awk '{print $2,$9,$10,$12,$13,$14,$15,$16,$17,$18,$19,$20}' |
    tr '\\' '/' |
    sed 's/\.minecraft\/bin\/[0-9a-f]\+.*/minecraft/' |
    sed 's/[C-Z]:\/.*\/\([a-zA-Z0-9 _-]\+\.[a-z][a-z][a-z]\).*/\1/' |
    sed 's/\/[^ ]*\///' | sed 's/\(bash\|sh\|python\|python3\) .*\///' |
    tr -d '[' |
    tr -d ']' |
    awk '{print $1,$2,$3,$4}' >"${top}"
  # Total CPU(%) usage.
  cpu_load=$(awk '{print $2}' ${top} | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql)
  store_line "top,host=${host} value=${cpu_load} ${ts}"
  # Launch the slower per-process metrics in the background.
  ptop="${top}.$RANDOM"
  mv "${top}" "${ptop}"
  report_top_per_process "${ts}" "${ptop}" &
}

report_vcgencmd_clock() {
  # Raspberry Pi CPU clock frequency.
  # Depends on: vcgencmd.
  vcgencmd=$(command -v vcgencmd)
  if [ -z "${vcgencmd}" ] || [ ! -f "${vcgencmd}" ]; then
    return
  fi
  ts=$(timestamp_ns)
  cpu_clock=$(echo "scale=2; $(${vcgencmd} measure_clock arm | cut -d '=' -f 2) / 1000000" | bc)
  store_line "vcgencmd,metric=clock,host=${host} value=${cpu_clock} ${ts}"
}

report_vcgencmd_temp() {
  # Raspberry Pi CPU temperature.
  # Depends on: vcgencmd.
  vcgencmd=$(command -v vcgencmd)
  if [ -z "${vcgencmd}" ] || [ ! -f "${vcgencmd}" ]; then
    return
  fi
  ts=$(timestamp_ns)
  cpu_temp=$(${vcgencmd} measure_temp | cut -f2 -d= | cut -f1 -d"'")
  store_line "vcgencmd,metric=temp,host=${host} value=${cpu_temp} ${ts}"
  if grep -q 'Pi 4' /proc/device-tree/model; then
    ts=$(timestamp_ns)
    pmic_temp=$(${vcgencmd} measure_temp pmic | cut -f2 -d= | cut -f1 -d"'")
    store_line "vcgencmd,metric=temp_pmic,host=${host} value=${pmic_temp} ${ts}"
  fi
}

report_sensors() {
  # CPU, SSD, NVMe temperatures and other sensors (if available).
  # Depends on: jq, sensors.
  jq=$(command -v jq)
  if [ -z "${jq}" ] || [ ! -f "${jq}" ]; then
    return
  fi
  sensors=$(command -v sensors)
  if [ -z "${sensors}" ] || [ ! -f "${sensors}" ]; then
    return
  fi
  sensors_json="${DDIR}/sensors"
  ts=$(timestamp_ns)
  "${sensors}" -j >"${sensors_json}"
  $jq 'keys' "${sensors_json}" | grep '^  "' | cut -f2 -d'"' | while read adapter; do
    echo "adapter: $adapter"
    $jq ".\"${adapter}\"" "${sensors_json}" | $jq 'keys' | grep '^  "' | grep -v '"Adapter"' | cut -f2 -d'"' | while read name; do
      key=$($jq ".\"${adapter}\".\"${name}\"" "${sensors_json}" | $jq 'keys' | grep '^  "' | grep '_input"' | cut -f2 -d'"')
      value=$($jq ".\"${adapter}\".\"${name}\".\"${key}\"" "${sensors_json}")
      store_line "sensors,host=${host},adapter=${adapter},name=${name/ /_} value=${value} ${ts}"
    done
  done
}

report_intel_gpu_top() {
  # Intel GPU.
  # Depends on: intel_gpu_top (as root).
  intel_gpu_top=$(command -v intel_gpu_top)
  if [ -z "${intel_gpu_top}" ] || [ ! -f "${intel_gpu_top}" ]; then
    return
  fi
  ts=$(timestamp_ns)
  intel_gpu_csv="${DDIR}/intel_gpu"
  # TODO: find out why this works only when running in a console.
  timeout 1s sudo ${intel_gpu_top} -c | head -2 >"${intel_gpu_csv}"
  for N in $(seq 18); do
    metric=$(awk -v cn=$N -F',' '{print $cn}' ${intel_gpu_csv} | head -1)
    value=$(awk -v cn=$N -F',' '{print $cn}' ${intel_gpu_csv} | tail -1)
    if echo "$value" | egrep -q '^0\.0+$'; then continue; fi
    echo "intel_gpu,host=${host},metric=${metric/ /_} value=${value} ${ts}"
    store_line "intel_gpu,host=${host},metric=${metric/ /_} value=${value} ${ts}"
  done
}

report_nvidia_smi() {
  # NVidia GPU.
  # Depends on: nvidia-smi
  nvidia_smi=$(command -v nvidia-smi)
  if [ -z "${nvidia_smi}" ] || [ ! -f "${nvidia_smi}" ]; then
    return
  fi
  ts=$(timestamp_ns)
  temp=$(${nvidia_smi} -i 0 --query-gpu=temperature.gpu --format=csv,noheader)
  util=$(${nvidia_smi} -i 0 --query-gpu=utilization.gpu --format=csv,noheader | cut -f1 -d' ')
  vram=$(${nvidia_smi} -i 0 --query-gpu=memory.used --format=csv,noheader | cut -f1 -d' ')
  draw=$(${nvidia_smi} -i 0 --query-gpu=power.draw --format=csv,noheader | cut -f1 -d' ')
  fans=$(${nvidia_smi} -i 0 --query-gpu=fan.speed --format=csv,noheader | cut -f1 -d' ')
  store_line "nvidia_smi,host=${host},metric=temperature value=${temp} ${ts}"
  store_line "nvidia_smi,host=${host},metric=utilization value=${util} ${ts}"
  store_line "nvidia_smi,host=${host},metric=memory value=${vram} ${ts}"
  store_line "nvidia_smi,host=${host},metric=power value=${draw} ${ts}"
  store_line "nvidia_smi,host=${host},metric=fan value=${fans} ${ts}"
}

report_free() {
  # Stats from free: RAM used, buffered/cached, free.
  # Depends on: free.
  ts=$(timestamp_ns)
  mem_used=$(free -b | grep 'Mem:' | awk '{print $3}')
  mem_free=$(free -b | grep 'Mem:' | awk '{print $4}')
  mem_buff=$(free -b | grep 'Mem:' | awk '{print $6}')
  store_line "free,host=${host},metric=used value=${mem_used} ${ts}"
  store_line "free,host=${host},metric=free value=${mem_free} ${ts}"
  store_line "free,host=${host},metric=buff_cache value=${mem_buff} ${ts}"
}

report_df() {
  # Stats from df: used/free space per file system.
  # Depends on: df.
  ts=$(timestamp_ns)
  df -k | egrep -v 'udev|loop|/sys|/run|/dev$' |
    grep ' /' | awk '{print $4,$5,$6}' |
    while read line; do
      # Note free space is given in 1KB blocks.
      fs_path=$(echo "${line}" | cut -f3 -d' ')
      fs_used=$(echo "${line}" | cut -f2 -d' ' | cut -f1 -d'%')
      fs_free=$(echo "${line}" | cut -f1 -d' ')
      fs_free=$((1024 * fs_free))
      store_line "df,host=${host},path=${fs_path},metric=tot_free value=${fs_free} ${ts}"
      store_line "df,host=${host},path=${fs_path},metric=pct_used value=${fs_used} ${ts}"
    done
}

report_diskstats() {
  # Disk I/O stats from /proc/diskstats.
  # Depends on: /proc/diskstats.
  ts=$(timestamp_ns)
  # The /proc/diskstats file displays the I/O statistics of block devices.
  # Each line contains the following fields (and more, omitted here):
  #  1  major number
  #  2  minor mumber
  #  3  device name
  #  4  reads completed successfully
  #  5  reads merged
  #  6  sectors read
  #  7  time spent reading (ms)
  #  8  writes completed
  #  9  writes merged
  # 10  sectors written
  # 11  time spent writing (ms)
  # Source: https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats
  grep -v ' 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0' /proc/diskstats | egrep -v 'loop|[hs]d[a-z][1-9]|k0p[0-9]' | awk '{print $3,$4,$6,$7,$8,$10,$11}' | sort >$DDIR/disk1
  if [[ -f "${DDIR}/disk0" ]]; then
    # Columns used: disk reads rsect rtime writs wsect wtime
    # disk:   3  device name
    # reads:  4  reads completed successfully
    # rsect:  6  sectors read
    # rtime:  7  time spent reading (ms)
    # writs:  8  writes completed
    # wsect: 10  sectors written
    # wtime: 11  time spent writing (ms)
    paste "${DDIR}/disk0" "${DDIR}/disk1" |
      while read \
        disk0 reads0 rsect0 rtime0 writs0 wsect0 wtime0 \
        disk1 reads1 rsect1 rtime1 writs1 wsect1 wtime1; do
        # Skip momentary inconsistencies when disks are added or removed.
        if [ "${disk0}" == "${disk1}" ]; then
          disk=$disk0
          # Assume 512-byte sectors.
          rsect=$((512 * (rsect1 - rsect0)))
          wsect=$((512 * (wsect1 - wsect0)))
          store_line "diskstats,host=${host},device=${disk},metric=bytes_read    value=${rsect} ${ts}"
          store_line "diskstats,host=${host},device=${disk},metric=bytes_written value=${wsect} ${ts}"
          # Successful I/O ops.
          reads=$((reads1 - reads0))
          writs=$((writs1 - writs0))
          store_line "diskstats,host=${host},device=${disk},metric=successful_reads  value=${reads} ${ts}"
          store_line "diskstats,host=${host},device=${disk},metric=successful_writes value=${writs} ${ts}"
          # Time spent.
          read_ms=$((rtime1 - rtime0))
          writ_ms=$((wtime1 - wtime0))
          store_line "diskstats,host=${host},device=${disk},metric=time_spent_reads  value=${read_ms} ${ts}"
          store_line "diskstats,host=${host},device=${disk},metric=time_spent_writes value=${writ_ms} ${ts}"
        fi
      done
  fi
  mv "${DDIR}/disk1" "${DDIR}/disk0"
}

bytes_from_value_and_units() {
  value=$1
  units=$2
  factor=1
  case "${units}" in
  "K/s")
    factor=1024
    ;;
  "M/s")
    factor=1024*1024
    ;;
  "G/s")
    factor=1024*1024*1024
    ;;
  "T/s")
    factor=1024*1024*1024*1024
    ;;
  esac
  bc -ql <<<"${factor} * ${value}"
}

report_iotop() {
  # I/O stats from iotop (per process).
  # Depends on: iotop (as root).
  iotop=$(command -v iotop)
  if [ -z "${iotop}" ] || [ ! -f "${iotop}" ]; then
    return
  fi
  ts=$(timestamp_ns)
  iotop_stats=$DDIR/iotop
  # The Linux kernel interfaces that iotop relies on now require root
  # privileges or the NET_ADMIN capability. This change occurred because a
  # security issue (CVE-2011-2494) was found that allows leakage of sensitive
  # data across user boundaries. If you require the ability to run iotop as a
  # non-root user, please configure sudo to allow you to run iotop as root.
  # WARNING: using -n 1 results in only zero values all the time.
  sudo $iotop -b -P -n 2 |
    grep -Ev 'task_delayacct|locale|DISK READ|0.00 B/s    0.00 B/s' |
    tr -d '[' |
    tr -d ']' |
    sed 's/kworker\///' |
    sed 's/u64:[0-9]\+-//' |
    awk '{print $4,$5,$6,$7,$9}' \
      >"${iotop_stats}"
  # Return if iotop could not be run successfully.
  # This is typically caused by 'Netlink error: Operation not permitted'.
  # Note: check the exit status of iotop itself, not of the last pipe stage.
  if [[ ${PIPESTATUS[0]} -gt 0 ]]; then
    return
  fi
  # Report I/O stats for each process.
  # Note: this aggregates I/O per process name (basename), across all PIDs.
  cat "${iotop_stats}" |
    cut -f5 -d' ' |
    sort -u |
    while read -r process; do
      process_read=0
      process_write=0
      while read -r line; do
        read_value=$(echo "${line}" | cut -f1 -d' ')
        read_units=$(echo "${line}" | cut -f2 -d' ')
        read_bytes=$(bytes_from_value_and_units "${read_value}" "${read_units}")
        process_read=$(bc -ql <<<"${process_read}+${read_bytes}")
        write_value=$(echo "${line}" | cut -f3 -d' ')
        write_units=$(echo "${line}" | cut -f4 -d' ')
        write_bytes=$(bytes_from_value_and_units "${write_value}" "${write_units}")
        process_write=$(bc -ql <<<"${process_write}+${write_bytes}")
      done < <(grep " ${process}\$" "${iotop_stats}")
      if [[ $(echo "${process_read}" | sed 's/\..*//') -gt 0 ]]; then
        store_line "iotop,host=${host},command=${process},metric=read value=${process_read} ${ts}"
      fi
      if [[ "$(echo "${process_write}" | sed 's/\..*//')" -gt 0 ]]; then
        store_line "iotop,host=${host},command=${process},metric=write value=${process_write} ${ts}"
      fi
    done
}

report_net_dev() {
  # Network I/O stats from /proc/net/dev.
  # Depends on: /proc/net/dev.
  ts=$(timestamp_ns)
  grep -Ev 'Inter|face' /proc/net/dev | tr -d ':' | awk '{print $1,$2,$10}' | sort >"${DDIR}/net1"
  if [[ -f "${DDIR}/net0" ]]; then
    ts0=$(cat "${DDIR}/net0-ts")
    echo "${ts}" >"${DDIR}/net1-ts"
    paste "${DDIR}/net0" "${DDIR}/net1" | while read -r dev0 rx0 tx0 dev1 rx1 tx1; do
      # Skip momentary inconsistencies when devs are added or removed.
      if [[ "${dev0}" == "${dev1}" ]]; then
        # Compute rx/tx bytes / sec (ts is in nanoseconds).
        rx=$(bc -ql <<<"1000000000 * (${rx1} - ${rx0}) / (${ts} - ${ts0})")
        tx=$(bc -ql <<<"1000000000 * (${tx1} - ${tx0}) / (${ts} - ${ts0})")
        store_line "net_dev,host=${host},device=${dev0},metric=rx value=${rx} ${ts}"
        store_line "net_dev,host=${host},device=${dev0},metric=tx value=${tx} ${ts}"
      fi
    done
  fi
  mv "${DDIR}/net1" "${DDIR}/net0"
  mv "${DDIR}/net1-ts" "${DDIR}/net0-ts"
}

report_du() {
  # Disk usage stats from du.
  # Files and directories to monitor in /etc/conmon/du
  # Depends on du (as root).
  if [ ! -f /etc/conmon/du ]; then
    return
  fi
  ts=$(timestamp_ns)
  while read -r path; do
    kbytes=$(sudo du -s "${path}" | awk '{print $1}')
    bytes=$((1024 * kbytes))
    store_line "du,host=${host},path=${path} value=${bytes} ${ts}"
  done </etc/conmon/du
}

post_lines_to_influxdb() {
  # POST data to InfluxDB in batch, when target is available.
  # Depends on: curl, nc.
  # All other tasks write data to the file in append mode (>>).
  # This task reads everything at once and immediately deletes the file.
  # This makes all the other tasks write to the same file, created anew.
  sleep ${DELAY_POST}
  mv -f "${DATA}" "${DATA}.POST"
  # Post over HTTP without auth.
  if [ -n "$TARGET_HTTP" ]; then
    host_and_port=$(echo "$TARGET_HTTP" | sed 's/.*\///' | tr : ' ')
    if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
      curl >/dev/null 2>/dev/null \
        -i -XPOST "${TARGET_HTTP}/write?db=${DBNAME}" \
        --data-binary @"${DATA}.POST"
    fi
  fi
  # Post over HTTPS with Basic Auth, provided credentials are
  # found in /etc/conmon/influxdb-auth
  if [ -n "$TARGET_HTTPS" ]; then
    influxdb_auth=/etc/conmon/influxdb-auth
    if [ ! -f "${influxdb_auth}" ]; then
      return
    fi
    host_and_port=$(echo "$TARGET_HTTPS" | sed 's/.*\///' | tr : ' ')
    if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
      curl >/dev/null 2>/dev/null \
        -u $(sudo cat $influxdb_auth) \
        -i -XPOST "${TARGET_HTTPS}/write?db=${DBNAME}" \
        --data-binary @"${DATA}.POST"
    fi
  fi
}

pause_depending_on_rpi_model() {
  # Pause for a few seconds depending on which model of Raspberry Pi.
  # The delay depends also on the system load as reported by top.
  # Depends on: grep, /proc/cpuinfo.
  top_cmd=$(command -v top)
  load=$(${top_cmd} -b -n 1 | head -1 | sed 's/.*load average: //' | cut -f1 -d, | tr -d '.' | sed 's/^0//')
  declare -A cpu_mhz_per_model=(
    ["Raspberry_Pi_Zero_W_Rev_1_1"]=1000         # 1x1000 MHz
    ["Raspberry_Pi_Zero_2_W_Rev_1_0"]=4000       # 4x1000 MHz
    ["Raspberry_Pi_3_Model_A_Plus_Rev_1_0"]=1400 # 1x1400 MHz
    ["Raspberry_Pi_3_Model_B_Rev_1_2"]=4800      # 4x1200 MHz
    ["Raspberry_Pi_3_Model_B_Rev_1_3"]=4800      # 4x1200 MHz
    ["Raspberry_Pi_4_Model_B_Rev_1_4"]=6000      # 4x1500 MHz
    ["Raspberry_Pi_5_Model_B_Rev_1_0"]=9600      # 4x2400 MHz
  )
  model=$(grep Model /proc/cpuinfo | sed 's/.*: //' | tr ' ' '_' | tr '.' '_')
  mhz=${cpu_mhz_per_model["${model}"]}
  delay=$((250 * load / mhz))
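  # Example: on a Pi 4 Model B (6000) with a load average of 1.50
  # ("150" after stripping the dot), delay = 250 * 150 / 6000 = 6 seconds.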
  sleep ${delay}
}

# Run all the above tasks in a loop.
# Each task is responsible for its own checks.
while true; do
  report_df
  report_du
  report_top
  report_free
  report_iotop
  report_sensors
  report_net_dev
  report_vcgencmd_clock
  report_vcgencmd_temp
  report_diskstats
  report_nvidia_smi
  report_intel_gpu_top
  post_lines_to_influxdb
  pause_depending_on_rpi_model
done
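
Once conmon is running, a quick way to check that all tasks are writing data is to list the measurements, e.g. with the InfluxDB 1.x client on the server:

$ influx -database monitoring -execute 'SHOW MEASUREMENTS'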

Multi-thread

To run in faster systems such as servers and desktop PCs, the multi-thread version below (conmon-mt) is preferred, with much smaller sampling delays to capture fast changes in resource utilization more accurately. Each metric-gathering task runs in its own loop, with fast, top and slow rounds each having their own delay.

deploy-to-pcs

In environments with multiple desktop or server computers, the deploy-to-pcs script below can be used to deploy the latest version of conmon-mt to all of them at once.

deploy-to-pcs
#!/bin/bash
#
# Deploy conmon-mt to PC hosts.

for host in octavo cubito super-tuna computer smart-computer lexicon rapture; do
  if nc 2>&1 -zv ${host} 22 | grep -q succeeded; then
    echo "Deploying to ${host} ..."
    scp 2>/dev/null \
      -qr \
      ../conmon \
      root@${host}:
    ssh 2>/dev/null \
      root@${host} \
      "cp /root/conmon/conmon-mt /usr/local/bin/conmon"
    ssh 2>/dev/null \
      root@${host} \
      "systemctl restart conmon.service"
    etc_influxdb_auth=/etc/conmon/influxdb-auth
    for src_influxdb_auth in /etc/conmon/influxdb-auth "${HOME}/.conmon-influxdb-auth"; do
      if [ -f $src_influxdb_auth ]; then
        ssh 2>/dev/null \
          root@${host} \
          "mkdir -p $(dirname $etc_influxdb_auth)"
        scp 2>/dev/null \
          -qr \
          $src_influxdb_auth \
          root@${host}:$etc_influxdb_auth
        ssh 2>/dev/null \
          root@${host} \
          "chmod 400 $etc_influxdb_auth"
      fi
    done
  fi
done

conmon-mt

conmon-mt
#!/bin/bash
#
# Export system monitoring metrics to influxdb.

# InfluxDB target.
DBNAME=monitoring
TARGET_HTTP='' # Leave empty to skip.
TARGET_HTTPS='http://octavo:30086'

# Default delay between each POST request to InfluxDB.
DELAY_POST=4

# Default delay between each round of "fast" metrics.
DELAY_FAST=2

# Default delay between each round of top (neither fast nor slow).
DELAY_TOP=2

# Default delay between each round of slow metrics (e.g. I/O bound).
DELAY_SLOW=300

# Data file for batch POST.
DDIR="/dev/shm/$$"
DATA="${DDIR}/DATA.txt"
mkdir -p "${DDIR}"

host=$(hostname)

timestamp_ns() {
  date +'%s%N'
}

store_line() {
  # Write a line of data to the temporary in-memory file.
  # Exit immediately if this fails.
  echo "$1" >>"${DATA}" || exit 1
}

report_top_per_process() {
  # Per-process CPU(%) & MEM(bytes) usage.
  # Re-use the ptop file created by report_top().
  ts=$1
  ptop=$2
  awk '{print $4}' "${ptop}" | sort -u | while read cmd; do
    user=$(grep " ${cmd}\$" ${ptop} | cut -f1 -d' ' | sort -u | head -1)
    # CPU per process, only when > 0
    cpu=$(grep " ${cmd}\$" ${ptop} | cut -f2 -d' ' | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql)
    if [[ ! -z "${cpu}" ]]; then
      store_line "top_cpu,host=${host},user=${user},command=${cmd} value=${cpu} ${ts}"
    fi
    # Memory per process, only when > 0
    mem=$(grep " ${cmd}\$" ${ptop} | cut -f3 -d' ' | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql | sed 's/^\./0./')
    # Multiply %mem by total system RAM.
    tot=$(free -b | grep 'Mem:' | awk '{print $2}')
    ram=$(bc -ql <<<"${tot} * ${mem} / 100")
    if [[ ! -z "${ram}" ]]; then
      store_line "top_mem,host=${host},user=${user},command=${cmd} value=${ram} ${ts}"
    fi
  done
  rm -f "${ptop}"
}

report_cpufreq() {
  # CPU frequency.
  # Depends on: bc.
  while true; do
    top="${DDIR}/cpufreq"
    ts=$(timestamp_ns)
    ncpus=$(grep -c 'cpu MHz' /proc/cpuinfo)
    cpufreq=$(echo "($(grep 'cpu MHz' /proc/cpuinfo | sed 's/.*: //' | tr '\n' +)0)/$ncpus" | bc -ql)
    store_line "cpufreq,host=${host} value=${cpufreq} ${ts}"
    sleep ${DELAY_FAST}
  done
}

report_top() {
  # Stats from top: CPU (overall and per process) and RAM (per process).
  # Depends on: top, free.
  while true; do
    top_cmd=$(command -v top)
    top="${DDIR}/top"
    ts=$(timestamp_ns)
    ${top_cmd} -b -c -n 1 -w 512 |
      grep -vE ' 0\..   0\..|^top|^Tasks|^%|^MiB|^$|[[:blank:]]*PID USER' |
      awk '{print $2,$9,$10,$12,$13,$14,$15,$16,$17,$18,$19,$20}' |
      tr '\\' '/' |
      sed 's/\.minecraft\/bin\/[0-9a-f]\+.*/minecraft/' |
      sed 's/[C-Z]:\/.*\/\([a-zA-Z0-9 _-]\+\.[a-z][a-z][a-z]\).*/\1/' |
      sed 's/\/[^ ]*\///' | sed 's/\(bash\|sh\|python\|python3\) .*\///' |
      tr -d '[' |
      tr -d ']' |
      awk '{print $1,$2,$3,$4}' |
      tr ',' '.' >"${top}"
    # Total CPU(%) usage.
    cpu_load=$(awk '{print $2}' ${top} | grep -v '^0\.0$' | tr '\n' '+' | sed 's/+$/\n/' | bc -ql)
    store_line "top,host=${host} value=${cpu_load} ${ts}"
    # Launch the slower per-process metrics in the background.
    ptop="${top}.$RANDOM"
    mv "${top}" "${ptop}"
    report_top_per_process "${ts}" "${ptop}" &
    sleep ${DELAY_TOP}
  done
}

report_vcgencmd() {
  # Raspberry Pi CPU temperature.
  # Depends on: vcgencmd.
  vcgencmd=$(command -v vcgencmd)
  if [ -z "${vcgencmd}" ] || [ ! -f "${vcgencmd}" ]; then
    return
  fi
  while true; do
    ts=$(timestamp_ns)
    cpu_temp=$(${vcgencmd} measure_temp | cut -f2 -d= | cut -f1 -d"'")
    store_line "vcgencmd,metric=temp,host=${host} value=${cpu_temp} ${ts}"
    if grep -q 'Pi 4' /proc/device-tree/model; then
      ts=$(timestamp_ns)
      pmic_temp=$(${vcgencmd} measure_temp pmic | cut -f2 -d= | cut -f1 -d"'")
      store_line "vcgencmd,metric=temp_pmic,host=${host} value=${pmic_temp} ${ts}"
    fi
    cat "${DATA}" | cut -f1 -d, | sort | uniq -c
    sleep ${DELAY_FAST}
  done
}

report_sensors() {
  # CPU, SSD, NVMe temperatures and other sensors (if available).
  # Depends on: jq, sensors.
  jq=$(command -v jq)
  if [ -z "${jq}" ] || [ ! -f "${jq}" ]; then
    return
  fi
  sensors=$(command -v sensors)
  if [ -z "${sensors}" ] || [ ! -f "${sensors}" ]; then
    return
  fi
  sensors_json="${DDIR}/sensors"
  while true; do
    ts=$(timestamp_ns)
    "${sensors}" -j >"${sensors_json}"
    $jq 'keys' "${sensors_json}" | grep '^  "' | cut -f2 -d'"' | while read adapter; do
      echo "adapter: $adapter"
      $jq ".\"${adapter}\"" "${sensors_json}" | $jq 'keys' | grep '^  "' | grep -v '"Adapter"' | cut -f2 -d'"' | while read name; do
        key=$($jq ".\"${adapter}\".\"${name}\"" "${sensors_json}" | $jq 'keys' | grep '^  "' | grep '_input"' | cut -f2 -d'"')
        value=$($jq ".\"${adapter}\".\"${name}\".\"${key}\"" "${sensors_json}")
        store_line "sensors,host=${host},adapter=${adapter},name=${name/ /_} value=${value} ${ts}"
      done
    done
    sleep ${DELAY_FAST}
  done
}

report_intel_gpu_top() {
  # Intel GPU.
  # Depends on: intel_gpu_top (as root).
  intel_gpu_top=$(command -v intel_gpu_top)
  if [ -z "${intel_gpu_top}" ] || [ ! -f "${intel_gpu_top}" ]; then
    return
  fi
  while true; do
    ts=$(timestamp_ns)
    intel_gpu_csv="${DDIR}/intel_gpu"
    # TODO: find out why this works only when running in a console.
    timeout 1s sudo ${intel_gpu_top} -c | head -2 >"${intel_gpu_csv}"
    for N in $(seq 18); do
      metric=$(awk -v cn=$N -F',' '{print $cn}' ${intel_gpu_csv} | head -1)
      value=$(awk -v cn=$N -F',' '{print $cn}' ${intel_gpu_csv} | tail -1)
      if echo "$value" | egrep -q '^0\.0+$'; then continue; fi
      echo "intel_gpu,host=${host},metric=${metric/ /_} value=${value} ${ts}"
      store_line "intel_gpu,host=${host},metric=${metric/ /_} value=${value} ${ts}"
    done
    sleep ${DELAY_FAST}
  done
}

report_nvidia_smi() {
  # NVidia GPU.
  # Depends on: nvidia-smi
  nvidia_smi=$(command -v nvidia-smi)
  if [ -z "${nvidia_smi}" ] || [ ! -f "${nvidia_smi}" ]; then
    return
  fi
  while true; do
    ts=$(timestamp_ns)
    temp=$(${nvidia_smi} -i 0 --query-gpu=temperature.gpu --format=csv,noheader)
    util=$(${nvidia_smi} -i 0 --query-gpu=utilization.gpu --format=csv,noheader | cut -f1 -d' ')
    vram=$(${nvidia_smi} -i 0 --query-gpu=memory.used --format=csv,noheader | cut -f1 -d' ')
    draw=$(${nvidia_smi} -i 0 --query-gpu=power.draw --format=csv,noheader | cut -f1 -d' ')
    fans=$(${nvidia_smi} -i 0 --query-gpu=fan.speed --format=csv,noheader | cut -f1 -d' ')
    store_line "nvidia_smi,host=${host},metric=temperature value=${temp} ${ts}"
    store_line "nvidia_smi,host=${host},metric=utilization value=${util} ${ts}"
    store_line "nvidia_smi,host=${host},metric=memory value=${vram} ${ts}"
    store_line "nvidia_smi,host=${host},metric=power value=${draw} ${ts}"
    store_line "nvidia_smi,host=${host},metric=fan value=${fans} ${ts}"
    sleep ${DELAY_FAST}
  done
}

report_free() {
  # Stats from free: RAM used, buffered/cached, free.
  # Depends on: free.
  while true; do
    ts=$(timestamp_ns)
    mem_used=$(free -b | grep 'Mem:' | awk '{print $3}')
    mem_free=$(free -b | grep 'Mem:' | awk '{print $4}')
    mem_buff=$(free -b | grep 'Mem:' | awk '{print $6}')
    store_line "free,host=${host},metric=used value=${mem_used} ${ts}"
    store_line "free,host=${host},metric=free value=${mem_free} ${ts}"
    store_line "free,host=${host},metric=buff_cache value=${mem_buff} ${ts}"
    sleep ${DELAY_FAST}
  done
}

report_df() {
  # Stats from df: used/free space per file system.
  # Depends on: df.
  while true; do
    ts=$(timestamp_ns)
    df -k | egrep -v '/run|/pods|udev|loop|/sys|/dev$' |
      grep ' /' | awk '{print $4,$5,$6}' |
      while read line; do
        # Note free space is given in 1KB blocks.
        fs_path=$(echo "${line}" | cut -f3 -d' ')
        fs_used=$(echo "${line}" | cut -f2 -d' ' | cut -f1 -d'%')
        fs_free=$(echo "${line}" | cut -f1 -d' ')
        fs_free=$((1024 * fs_free))
        store_line "df,host=${host},path=${fs_path},metric=tot_free value=${fs_free} ${ts}"
        store_line "df,host=${host},path=${fs_path},metric=pct_used value=${fs_used} ${ts}"
      done
    sleep ${DELAY_FAST}
  done
}

report_diskstats() {
  # Disk I/O stats from /proc/diskstats.
  # Depends on: /proc/diskstats.
  while true; do
    ts=$(timestamp_ns)
    # The /proc/diskstats file displays the I/O statistics of block devices.
    # Each line contains the following fields (and more, omitted here):
    #  1  major number
    #  2  minor mumber
    #  3  device name
    #  4  reads completed successfully
    #  5  reads merged
    #  6  sectors read
    #  7  time spent reading (ms)
    #  8  writes completed
    #  9  writes merged
    # 10  sectors written
    # 11  time spent writing (ms)
    # Source: https://www.kernel.org/doc/Documentation/ABI/testing/procfs-diskstats
    grep -v ' 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0' /proc/diskstats | egrep -v 'loop|[hs]d[a-z][1-9]|k0p[0-9]' | awk '{print $3,$4,$6,$7,$8,$10,$11}' | sort >$DDIR/disk1
    if [[ -f "${DDIR}/disk0" ]]; then
      # Columns used: disk reads rsect rtime writs wsect wtime
      # disk:   3  device name
      # reads:  4  reads completed successfully
      # rsect:  6  sectors read
      # rtime:  7  time spent reading (ms)
      # writs:  8  writes completed
      # wsect: 10  sectors written
      # wtime: 11  time spent writing (ms)
      paste "${DDIR}/disk0" "${DDIR}/disk1" |
        while read \
          disk0 reads0 rsect0 rtime0 writs0 wsect0 wtime0 \
          disk1 reads1 rsect1 rtime1 writs1 wsect1 wtime1; do
          # Skip momentary inconsistencies when disks are added or removed.
          if [ "${disk0}" == "${disk1}" ]; then
            disk=$disk0
            # Assume 512-byte sectors.
            rsect=$((512 * (rsect1 - rsect0)))
            wsect=$((512 * (wsect1 - wsect0)))
            store_line "diskstats,host=${host},device=${disk},metric=bytes_read    value=${rsect} ${ts}"
            store_line "diskstats,host=${host},device=${disk},metric=bytes_written value=${wsect} ${ts}"
            # Successful I/O ops.
            reads=$((reads1 - reads0))
            writs=$((writs1 - writs0))
            store_line "diskstats,host=${host},device=${disk},metric=successful_reads  value=${reads} ${ts}"
            store_line "diskstats,host=${host},device=${disk},metric=successful_writes value=${writs} ${ts}"
            # Time spent.
            read_ms=$((rtime1 - rtime0))
            writ_ms=$((wtime1 - wtime0))
            store_line "diskstats,host=${host},device=${disk},metric=time_spent_reads  value=${read_ms} ${ts}"
            store_line "diskstats,host=${host},device=${disk},metric=time_spent_writes value=${writ_ms} ${ts}"
          fi
        done
    fi
    mv "${DDIR}/disk1" "${DDIR}/disk0"
    sleep ${DELAY_FAST}
  done
}

bytes_from_value_and_units() {
  value=$1
  units=$2
  factor=1
  case "${units}" in
  "K/s")
    factor=1024
    ;;
  "M/s")
    factor=1024*1024
    ;;
  "G/s")
    factor=1024*1024*1024
    ;;
  "T/s")
    factor=1024*1024*1024*1024
    ;;
  esac
  bc -ql <<<"${factor} * ${value}"
}

report_iotop() {
  # I/O stats from iotop (per process).
  # Depends on: iotop (as root).
  iotop=$(command -v iotop)
  if [ -z "${iotop}" ] || [ ! -f "${iotop}" ]; then
    return
  fi
  while true; do
    ts=$(timestamp_ns)
    iotop_stats=$DDIR/iotop
    # The Linux kernel interfaces that iotop relies on now require root
    # privileges or the NET_ADMIN capability. This change occurred because a
    # security issue (CVE-2011-2494) was found that allows leakage of sensitive
    # data across user boundaries. If you require the ability to run iotop as a
    # non-root user, please configure sudo to allow you to run iotop as root.
    # WARNING: using -n 1 results in only zero values all the time.
    sudo $iotop -b -P -n 2 |
      grep -Ev 'task_delayacct|locale|DISK READ|0.00 B/s    0.00 B/s' |
      tr -d '[' |
      tr -d ']' |
      sed 's/ %//g' |
      sed 's/kworker\///' |
      sed 's/u64:[0-9]\+-//' |
      awk '{print $4,$5,$6,$7,$10}' \
        >"${iotop_stats}"
    # Return if iotop could not be run successfully.
    # This is typically caused by 'Netlink error: Operation not permitted'.
    # Note: check the exit status of iotop itself, not of the last pipe stage.
    if [[ ${PIPESTATUS[0]} -gt 0 ]]; then
      return
    fi
    # Report I/O stats for each process.
    # Note: this aggregates I/O per process name (basename), across all PIDs.
    cat "${iotop_stats}" |
      cut -f5 -d' ' |
      sort -u |
      while read -r process; do
        process_read=0
        process_write=0
        while read -r line; do
          read_value=$(echo "${line}" | cut -f1 -d' ')
          read_units=$(echo "${line}" | cut -f2 -d' ')
          read_bytes=$(bytes_from_value_and_units "${read_value}" "${read_units}")
          process_read=$(bc -ql <<<"${process_read}+${read_bytes}")
          write_value=$(echo "${line}" | cut -f3 -d' ')
          write_units=$(echo "${line}" | cut -f4 -d' ')
          write_bytes=$(bytes_from_value_and_units "${write_value}" "${write_units}")
          process_write=$(bc -ql <<<"${process_write}+${write_bytes}")
        done < <(grep " ${process}\$" "${iotop_stats}")
        if [[ $(echo "${process_read}" | sed 's/\..*//') -gt 0 ]]; then
          store_line "iotop,host=${host},command=${process},metric=read value=${process_read} ${ts}"
        fi
        if [[ "$(echo "${process_write}" | sed 's/\..*//')" -gt 0 ]]; then
          store_line "iotop,host=${host},command=${process},metric=write value=${process_write} ${ts}"
        fi
      done
    sleep ${DELAY_FAST}
  done
}

report_net_dev() {
  # Network I/O stats from /proc/net/dev.
  # Depends on: /proc/net/dev.
  while true; do
    ts=$(timestamp_ns)
    grep -Ev 'Inter|face' /proc/net/dev | tr -d ':' | awk '{print $1,$2,$10}' | sort >"${DDIR}/net1"
    if [[ -f "${DDIR}/net0" ]]; then
      ts0=$(cat "${DDIR}/net0-ts")
      echo "${ts}" >"${DDIR}/net1-ts"
      paste "${DDIR}/net0" "${DDIR}/net1" | while read -r dev0 rx0 tx0 dev1 rx1 tx1; do
        # Skip momentary inconsistencies when devs are added or removed.
        if [[ "${dev0}" == "${dev1}" ]]; then
          # Compute rx/tx bytes / sec (ts is in nanoseconds).
          rx=$(bc -ql <<<"1000000000 * (${rx1} - ${rx0}) / (${ts} - ${ts0})")
          tx=$(bc -ql <<<"1000000000 * (${tx1} - ${tx0}) / (${ts} - ${ts0})")
          store_line "net_dev,host=${host},device=${dev0},metric=rx value=${rx} ${ts}"
          store_line "net_dev,host=${host},device=${dev0},metric=tx value=${tx} ${ts}"
        fi
      done
    fi
    mv "${DDIR}/net1" "${DDIR}/net0"
    mv "${DDIR}/net1-ts" "${DDIR}/net0-ts"
    sleep ${DELAY_FAST}
  done
}

report_du() {
  # Disk usage stats from du.
  # Files and directories to monitor in /etc/conmon/du
  # Depends on du (as root).
  if [ ! -f /etc/conmon/du ]; then
    return
  fi
  while true; do
    ts=$(timestamp_ns)
    while read -r path; do
      kbytes=$(sudo du -s "${path}" | awk '{print $1}')
      bytes=$((1024 * kbytes))
      store_line "du,host=${host},path=${path} value=${bytes} ${ts}"
    done </etc/conmon/du
    sleep ${DELAY_SLOW}
  done
}

post_lines_to_influxdb() {
  # POST data to InfluxDB in batch, when targets are available.
  # Depends on: curl, nc.
  # All other tasks write data to the file in append mode (>>).
  # This task reads everything at once and immediately deletes the file.
  # This makes all the other tasks write to the same file, created anew.
  # To avoid losing data between reading and deleting the file, rename it,
  # wait a little (DELAY_FAST) for the writes of tasks that already had
  # it open, then other tasks will have to write to a new file.
  while true; do
    sleep ${DELAY_POST}
    mv -f "${DATA}" "${DATA}.POST"
    # Post over HTTP without auth.
    if [ -n "$TARGET_HTTP" ]; then
      host_and_port=$(echo "$TARGET_HTTP" | sed 's/.*\///' | tr : ' ')
      if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
        curl >/dev/null 2>/dev/null \
          -i -XPOST "${TARGET_HTTP}/write?db=${DBNAME}" \
          --data-binary @"${DATA}.POST"
      fi
    fi
    # Post over HTTPS with Basic Auth, provided credentials are
    # found in /etc/conmon/influxdb-auth
    if [ -n "$TARGET_HTTPS" ]; then
      influxdb_auth=/etc/conmon/influxdb-auth
      # Do not return here: that would stop posting over HTTP as well.
      if [ ! -f "${influxdb_auth}" ]; then
        continue
      fi
      host_and_port=$(echo "$TARGET_HTTPS" | sed 's/.*\///' | tr : ' ')
      if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
        curl >/dev/null 2>/dev/null \
          -u $(sudo cat $influxdb_auth) \
          -i -XPOST "${TARGET_HTTPS}/write?db=${DBNAME}" \
          --data-binary @"${DATA}.POST"
      fi
    fi
  done
}

# Run all the above tasks in parallel.
# Each task is responsible for its own checks and sampling rate / delay.
report_df &
report_du &
report_top &
report_free &
report_iotop &
report_cpufreq &
report_sensors &
report_net_dev &
report_vcgencmd &
report_diskstats &
report_nvidia_smi &
report_intel_gpu_top &
post_lines_to_influxdb &

# HACK: sleep for a very long time while the above tasks run in the
# background, so that the whole group can still be stopped with Ctrl+C.
sleep 10000000000

Additional scripts

conmon-speedtest

conmon-speedtest
#!/bin/bash

# InfluxDB target.
DBNAME=monitoring
TARGET_HTTP='' # Leave empty to skip.
TARGET_HTTPS='http://octavo:30086'

host=$(hostname)

# Data file for batch POST.
DATA="/dev/shm/$$.txt"

tmp=/dev/shm/speedtest
speedtest-cli --secure >$tmp
netspd_up=$(grep -E 'Upload' $tmp | cut -f2 -d: | grep -Eo '[0-9]+\.[0-9]')
netspd_down=$(grep -E 'Download' $tmp | cut -f2 -d: | grep -Eo '[0-9]+\.[0-9]')
netspd_ping=$(grep -E 'ms$' $tmp | cut -f2 -d: | grep -Eo '[0-9]+\.[0-9]')

echo "inet_up,host=${host} value=${netspd_up}" >>"$DATA"
echo "inet_down,host=${host} value=${netspd_down}" >>"$DATA"
echo "inet_ping,host=${host} value=${netspd_ping}" >>"$DATA"

# POST all data points in one batch request.
# Post over HTTP without auth.
if [ -n "$TARGET_HTTP" ]; then
    host_and_port=$(echo $TARGET_HTTP | sed 's/.*\///' | tr : ' ')
    if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
        curl >/dev/null 2>/dev/null \
            -i -XPOST "${TARGET_HTTP}/write?db=${DBNAME}" \
            --data-binary @"${DATA}"
    fi
fi
# Post over HTTPS with Basic Auth, provided credentials are
# found in /etc/conmon/influxdb-auth
if [ -n "$TARGET_HTTPS" ]; then
    influxdb_auth=/etc/conmon/influxdb-auth
    if [ ! -f $influxdb_auth ]; then
        exit 1
    fi
    host_and_port=$(echo $TARGET_HTTPS | sed 's/.*\///' | tr : ' ')
    if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
        curl >/dev/null 2>/dev/null \
            -u $(cat $influxdb_auth) \
            -i -XPOST "${TARGET_HTTPS}/write?db=${DBNAME}" \
            --data-binary @"${DATA}"
    fi
fi
rm -f "${DATA}"
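
Since conmon-speedtest takes a single measurement and exits, one way to run it periodically is from cron, e.g. (path and interval are assumptions):

# /etc/cron.d/conmon-speedtest: run every 30 minutes as root.
*/30 * * * * root /usr/local/bin/conmon-speedtest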

conmon-mystrom

conmon-mystrom
#!/bin/bash
#
# Export myStrom switch telemetry to influxdb.

# InfluxDB target.
DBNAME=monitoring
TARGET_HTTP='' # Leave empty to skip.
TARGET_HTTPS='http://octavo:30086'

# MyStrom switches.
declare -A switches=(
    ["office"]="192.168.0.191"
)

# Default delay between each POST request to InfluxDB.
DELAY_POST=4

# Data file for batch POST.
DDIR="/dev/shm/$$"
DATA="${DDIR}/DATA.txt"
mkdir -p "${DDIR}"

host=$(hostname)

timestamp_ns() {
    date +'%s%N'
}

store_line() {
    # Write a line of data to the temporary in-memory file.
    # Exit immediately if this fails.
    echo $1 >>"${DATA}" || exit 1
}

post_lines_to_influxdb() {
    # POST data to InfluxDB in batch, when target is available.
    # Depends on: curl, nc.
    # All other tasks write data to the file in append mode (>>).
    # This task reads everything at once and immediately deletes the file.
    # This makes all the other tasks write to the same file, created anew.
    # To avoid losing data between reading and deleting the file, rename it,
    # wait a little (DELAY_POST) for the writes of tasks that already had
    # it open, then other tasks will have to write to a new file.
    sleep ${DELAY_POST}
    mv -f "${DATA}" "${DATA}.POST"
    # Post over HTTP without auth.
    if [ -n "$TARGET_HTTP" ]; then
        host_and_port=$(echo $TARGET_HTTP | sed 's/.*\///' | tr : ' ')
        if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
            curl >/dev/null 2>/dev/null \
                -i -XPOST "${TARGET_HTTP}/write?db=${DBNAME}" \
                --data-binary @"${DATA}.POST"
        fi
    fi
    # Post over HTTPS with Basic Auth, provided credentials are
    # found in /etc/conmon/influxdb-auth
    if [ -n "$TARGET_HTTPS" ]; then
        influxdb_auth=/etc/conmon/influxdb-auth
        if [ ! -f "${influxdb_auth}" ]; then
            return
        fi
        host_and_port=$(echo $TARGET_HTTPS | sed 's/.*\///' | tr : ' ')
        if nc 2>&1 -zv $host_and_port | grep -q succeeded; then
            curl >/dev/null 2>/dev/null \
                -u $(cat $influxdb_auth) \
                -i -XPOST "${TARGET_HTTPS}/write?db=${DBNAME}" \
                --data-binary @"${DATA}.POST"
        fi
    fi
}

# myStrom WiFi switch.
# API: https://api.mystrom.ch/
# Depends on: curl.
while true; do
    for name in "${!switches[@]}"; do
        ts=$(timestamp_ns)
        switch_ip="${switches[$name]}"
        json=$(curl 2>/dev/null http://${switch_ip}/report)
        for metric in Ws power temperature; do
            value=$(echo ${json} | grep -o ".${metric}.:[^,}]*" | cut -f2 -d:)
            store_line "mystrom,switch=${name},metric=${metric} value=${value} ${ts}"
        done
    done
    post_lines_to_influxdb
done
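
For each switch, the loop above emits one line-protocol point per metric, e.g. (values hypothetical):

mystrom,switch=office,metric=power value=9.13 1700000000000000000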

conmon-tapo.py

Continuous Monitoring for TP-Link Tapo devices explains this script and its dependencies in more detail.

#!/usr/bin/env python3

"""Script to poll TAPO devices for monitoring data and post it to InfluxDB.

Run conmon-tapo --help to see all available options.

Usage:
  conmon-tapo [options]

Requires:
  absl-py
  https://github.com/abseil/abseil-py

  tapo
  https://github.com/mihai-dinculescu/tapo/
"""

import asyncio
import os
import socket
import time

import yaml
from absl import app, flags
from influxdb import InfluxDBClient

from tapo import ApiClient
from tapo.responses import T31XResult

FLAGS = flags.FLAGS

flags.DEFINE_string(
    "config",
    "/etc/conmon/tapo.yaml",
    "Configuration file with settings for InfluxDB and Tapo devices.",
    short_name="c"
)


def load_config(filepath):
    with open(filepath) as f:
        config = yaml.load(f, Loader=yaml.FullLoader)
    # Override values from environment variables.
    for section in ("influxdb", "tapo_auth"):
        for variable in config[section]:
            if variable[-8:] in ("username", "password"):
                env_value = os.getenv(config[section][variable])
                if env_value:
                    config[section][variable] = env_value
    return config


async def fetch_reports(config):
    client = ApiClient(**config["tapo_auth"])
    reports = []
    for device in config["devices"]:
        model = device["model"]
        report = dict(model=model)
        try:
            if model == "H100":
                device_conn = await client.h100(device["ip"])
                device_info = await device_conn.get_device_info()
                report = dict(
                    model=device_info.to_dict().get("model"),
                    nickname=device_info.to_dict().get("nickname")
                )
                children = []
                child_device_list = await device_conn.get_child_device_list()
                for child in child_device_list:
                    if isinstance(child, T31XResult):
                        t315 = await device_conn.t315(device_id=child.device_id)
                        children.append(dict(
                            model="T315",
                            nickname=child.nickname,
                            humidity=child.current_humidity,
                            temperature=child.current_temperature
                        ))
                report.update(dict(children=children))
            elif model in ("P110", "P115"):
                device_conn = await client.p110(device["ip"])
                device_info = await device_conn.get_device_info()
                energy_usage = await device_conn.get_energy_usage()
                report = dict(
                    model=device_info.to_dict().get("model"),
                    nickname=device_info.to_dict().get("nickname"),
                    current_power=float(energy_usage.to_dict().get("current_power")/1000)
                )
        except Exception:
            print("Could not get data from %s on %s" % (
                device["model"], device["ip"]))
        else:
            reports.append(report)
    return reports


def always_on_reports(config):
    reports = []
    if "always_on" not in config:
      return reports
    for device in config["always_on"]:
        reports.append(dict(
            model="P115",
            nickname=device["name"],
            current_power=float(device["power"])
        ))
    return reports


def json_body_point(measurement, value, ts, tags):
    tags.update(dict(host=socket.gethostname()))
    return {
        "measurement": measurement,
        "tags": tags,
        "time": ts,
        "fields": {
            "value": value
        }
    }


def json_body_points_from_report(report, ts):
    json_body = []
    tags = dict(model=report["model"], nickname=report["nickname"])
    for field in report:
        if field in ("model", "nickname"): continue
        json_body.append(json_body_point("tapo_%s" % field, report[field], ts, tags))
    return json_body


def post_reports(config, reports):
    client = InfluxDBClient(**config["influxdb"])
    json_body = []
    ts = int(1000000000 * time.mktime(time.localtime()))
    for report in reports:
        if "children" in report:
            for child in report["children"]:
                json_body.extend(json_body_points_from_report(child, ts))
        else:
            json_body.extend(json_body_points_from_report(report, ts))
    client.write_points(json_body)


def main(argv):
    config = load_config(FLAGS.config)
    reports = asyncio.run(fetch_reports(config))
    reports.extend(always_on_reports(config))
    post_reports(config, reports)


if __name__ == "__main__":
    app.run(main)

tapo.yaml

always_on:
  - name: "Dehumidifier"
    power: "250"
  - name: "PC"
    power: "150"
devices:
  - ip: "192.168.0.115"
    model: "P115"
  - ip: "192.168.0.100"
    model: "H100"
influxdb:
  host: inf.ssl.uu.am
  port: 443
  database: "home"
  username: "INFLUXDB_USERNAME"
  password: "INFLUXDB_PASSWORD"
  ssl: True
  verify_ssl: True
tapo_auth:
  tapo_username: "TAPO_USERNAME"
  tapo_password: "TAPO_PASSWORD"
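
Note that the username and password entries in tapo.yaml hold the names of environment variables; load_config() replaces them with the variables' values when those are set. For example (values hypothetical):

$ export INFLUXDB_USERNAME=conmon
$ export INFLUXDB_PASSWORD=secret
$ export TAPO_USERNAME=me@example.com
$ export TAPO_PASSWORD=secret
$ ~/.venvs/tapo/bin/python ./conmon-tapo.py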

Python dependencies

The absl and influxdb modules can be installed with apt:

# apt install python3-absl python3-influxdb

The tapo module is more complicated to install. The recommended approach is to use virtualenv:

# apt install python3-venv -y
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  python3-pip-whl python3-setuptools-whl python3.12-venv
The following NEW packages will be installed:
  python3-pip-whl python3-setuptools-whl python3-venv python3.12-venv
0 upgraded, 4 newly installed, 0 to remove and 3 not upgraded.
Need to get 2,425 kB of archives.
After this operation, 2,777 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 python3-pip-whl all 24.0+dfsg-1ubuntu1.1 [1,703 kB]
Get:2 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 python3-setuptools-whl all 68.1.2-2ubuntu1.1 [716 kB]
Get:3 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 python3.12-venv amd64 3.12.3-1ubuntu0.3 [5,678 B]
Get:4 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 python3-venv amd64 3.12.3-0ubuntu2 [1,034 B]
Fetched 2,425 kB in 1s (1,736 kB/s)      
Selecting previously unselected package python3-pip-whl.
(Reading database ... 425834 files and directories currently installed.)
Preparing to unpack .../python3-pip-whl_24.0+dfsg-1ubuntu1.1_all.deb ...
Unpacking python3-pip-whl (24.0+dfsg-1ubuntu1.1) ...
Selecting previously unselected package python3-setuptools-whl.
Preparing to unpack .../python3-setuptools-whl_68.1.2-2ubuntu1.1_all.deb ...
Unpacking python3-setuptools-whl (68.1.2-2ubuntu1.1) ...
Selecting previously unselected package python3.12-venv.
Preparing to unpack .../python3.12-venv_3.12.3-1ubuntu0.3_amd64.deb ...
Unpacking python3.12-venv (3.12.3-1ubuntu0.3) ...
Selecting previously unselected package python3-venv.
Preparing to unpack .../python3-venv_3.12.3-0ubuntu2_amd64.deb ...
Unpacking python3-venv (3.12.3-0ubuntu2) ...
Setting up python3-setuptools-whl (68.1.2-2ubuntu1.1) ...
Setting up python3-pip-whl (24.0+dfsg-1ubuntu1.1) ...
Setting up python3.12-venv (3.12.3-1ubuntu0.3) ...
Setting up python3-venv (3.12.3-0ubuntu2) ...
$ mkdir -p ~/.venvs
$ python3 -m venv --system-site-packages ~/.venvs/tapo
$ ~/.venvs/tapo/bin/python -m pip install tapo
Collecting tapo
  Using cached tapo-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Using cached tapo-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Installing collected packages: tapo
Successfully installed tapo-0.8.0

$ ~/.venvs/tapo/bin/python ./conmon-tapo.py

The tapo module can also be installed system-wide, which is not recommended:

# pip install --break-system-packages tapo
Collecting tapo
  Using cached tapo-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Using cached tapo-0.8.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Installing collected packages: tapo
Successfully installed tapo-0.8.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Device IP dependencies

The "conmon-tapo.py script was initially writen on the assumption that Tapo devices would keep their IP address constant long-term, so if they change IP addresses the script crashes when trying to read the wrong fields, or simply not being able to connect to one of them:

Trying to reach a device at nobody's IP address
$ ~/.venvs/tapo/bin/python ./conmon-tapo.py 
Traceback (most recent call last):
  File "/home/coder/src/conmon/./conmon-tapo.py", line 138, in <module>
    app.run(main)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
            ^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 132, in main
    reports = asyncio.run(fetch_reports(config))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
          ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
          ^^^^^^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 83, in fetch_reports
    device_conn = await client.p110(device["ip"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Http(reqwest::Error { kind: Request, url: "http://192.168.0.41/app", source: hyper_util::client::legacy::Error(Connect, ConnectError("tcp connect error", Os { code: 113, kind: HostUnreachable, message: "No route to host" })) })
Trying to reach a device at somebody else's IP address
$ ~/.venvs/tapo/bin/python ./conmon-tapo.py 
Traceback (most recent call last):
  File "/home/coder/src/conmon/./conmon-tapo.py", line 138, in <module>
    app.run(main)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
            ^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 132, in main
    reports = asyncio.run(fetch_reports(config))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
          ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
          ^^^^^^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 83, in fetch_reports
    device_conn = await client.p110(device["ip"])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Http(reqwest::Error { kind: Request, url: "http://192.168.0.32/app", source: hyper_util::client::legacy::Error(SendRequest, hyper::Error(Io, Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })) })
Trying to read a P115 like it was an H100
$ ~/.venvs/tapo/bin/python ./conmon-tapo.py
Traceback (most recent call last):
  File "/home/coder/src/conmon/./conmon-tapo.py", line 138, in <module>
    app.run(main)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/usr/lib/python3/dist-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
            ^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 132, in main
    reports = asyncio.run(fetch_reports(config))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
          ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
          ^^^^^^^^^^^^^^^
  File "/home/coder/src/conmon/./conmon-tapo.py", line 65, in fetch_reports
    device_info = await device_conn.get_device_info()
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Serde(Error("missing field `in_alarm_source`", line: 1, column: 822))