Performance stats and count instances of a process - log it to single line in file

Andyofhumbleknowledge

from LinuxQuestions.org on 2020-11-19 10:46 (#5AK31)

I'm trying to analyse the performance of a file processing setup. The client is given files, sends them to the server for processing, records the result, and moves each file into one of two folders depending on whether it is wanted or not. The server's job is more intensive, so the bottleneck is on the server, but the load profiles look odd. Processing is multi-stage: some files need analysing only at level1, some need level1 then level2, and some go all the way to level3. At level3, processing speed is directly related to the number of instances of 'interesting process' that are running. Sometimes during the processing of a batch we see fewer of these than we expect.

The aim is to get a combination of cpu statistics, number of times 'interesting process' is running and log this out to a file so that we can a) graph what's going on to get a feel of the thing and b) cross-reference this to the logs on the client to find what the client is up to in the slow bits of the batch cycle.

I've tried sar, and it's great for most of the metrics, but I can't find a way to count the instances of 'interesting process' with it.

Currently I'm running:

watch "uptime | tee -a /path/logfile.txt; pgrep -c processname | tee -a /path/logfile.txt"

This gives me a nice text file with a row every 2 seconds containing

a timestamp for the record
load averages
number of times the machine is running the process I'm interested in

Load average isn't accurate enough, or more specifically, it lags too much. I need to include instantaneous CPU usage. I could do with something like the output from the top row of top or sar -u 1 1, but just appended to the line in the output file, and unlabeled, so that I can then run analysis on it and x-ref to the client logs.

I've also tried adding "mpstat | awk '$12 ~ /[0-9.]+/ { print 100 - $12 }'" into the watch, but it doesn't work. On its own the command works fine, but when you wrap it into the watch command, you get idle time for each cpu, on separate lines - not funny with multiple cores.

Ideal output would be Timestamp, Number of times 'interesting process' is running, cpu %usr %sys %ni %idle %wait %hi %si

so the data would be

10:04:42, 16, 58.0 us, 26.0 sy, 0.0 ni, 12.0 id, 0.0 wa, 0.0 hi, 0.0 si
10:04:44, 18, 64.0 us, 29.0 sy, 0.0 ni, 7.0 id, 0.0 wa, 0.0 hi, 0.0 si
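
One possible way to get rows like these without watch is to sample /proc/stat twice and turn the difference into percentages. The sketch below is only an illustration under assumptions: it expects Linux with the usual /proc/stat layout (user, nice, system, idle, iowait, irq, softirq), it ignores the steal and guest columns, and PROC_NAME, LOGFILE and INTERVAL are placeholder names, not anything from the setup described. The values are written unlabeled, in the order us, sy, ni, id, wa, hi, si.

```shell
#!/bin/bash
# Sketch: one row per interval: time, process count, CPU percentages.
# PROC_NAME, LOGFILE and INTERVAL are placeholders (assumptions).
PROC_NAME="${PROC_NAME:-processname}"
LOGFILE="${LOGFILE:-/path/logfile.txt}"
INTERVAL="${INTERVAL:-2}"

# Print the aggregate "cpu" line from /proc/stat (cumulative ticks since boot).
cpu_snapshot() {
    grep '^cpu ' /proc/stat
}

# Given two snapshot lines, print "us, sy, ni, id, wa, hi, si" as
# percentages of the ticks that elapsed between them.
cpu_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 8; i++) a[i] = $i; next }
        {
            total = 0
            for (i = 2; i <= 8; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) total = 1   # avoid dividing by zero
            # /proc/stat column order: user nice system idle iowait irq softirq
            printf "%.1f, %.1f, %.1f, %.1f, %.1f, %.1f, %.1f\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total,
                100*d[5]/total, 100*d[6]/total, 100*d[7]/total,
                100*d[8]/total
        }'
}

# Append one row to LOGFILE every INTERVAL seconds until interrupted.
log_loop() {
    local prev cur count
    prev=$(cpu_snapshot)
    while sleep "$INTERVAL"; do
        cur=$(cpu_snapshot)
        count=$(pgrep -c "$PROC_NAME")   # prints 0 when nothing matches
        printf '%s, %s, %s\n' "$(date +%T)" "$count" \
            "$(cpu_pct "$prev" "$cur")" >> "$LOGFILE"
        prev=$cur
    done
}

# Only start logging when called with the argument "run".
if [ "${1:-}" = "run" ]; then log_loop; fi
```

Saved under a hypothetical name such as cpulog.sh, it would be started with: bash cpulog.sh run. Because each row covers only the last interval, it should not lag the way load average does.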

As a bonus section, is there a way I can add onto the end of the row "percentage of I/O channel to disk used" to help work out if I'm actually I/O bound?
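
A rough approximation of "how busy is the disk" is the share of wall-clock time the device had I/O in flight, which is what the %util column of iostat -x reports. The sketch below reads the same counter directly, assuming Linux, where column 13 of /proc/diskstats is the cumulative number of milliseconds the device has spent doing I/O. The device name sda and the 2-second interval are placeholders, and a high figure only suggests saturation; it does not prove the workload is I/O bound, especially on devices that serve requests in parallel.

```shell
#!/bin/bash
# Sketch: percentage of an interval during which a disk was busy with I/O.

# Print the cumulative "ms spent doing I/O" counter for one device
# (column 13 of /proc/diskstats). The device name is passed as $1.
io_ticks() {
    awk -v dev="$1" '$3 == dev { print $13 }' /proc/diskstats
}

# util_pct BEFORE_MS AFTER_MS INTERVAL_SECONDS
# Busy milliseconds divided by elapsed milliseconds, as a percentage.
util_pct() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", 100 * (b - a) / (s * 1000) }'
}

# Example use (sda and the 2-second interval are assumptions):
#   before=$(io_ticks sda); sleep 2; after=$(io_ticks sda)
#   util_pct "$before" "$after" 2
```

The result of util_pct could be appended as one more field on each logged row, sampled over the same interval as the CPU figures.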

Am I heading the right way or is there a better way? - any ideas appreciated.


Source	RSS or Atom Feed
Feed Location	https://feeds.feedburner.com/linuxquestions/latest
Feed Title	LinuxQuestions.org
Feed Link	https://www.linuxquestions.org/questions/