Performance stats and count instances of a process - log it to single line in file
by Andyofhumbleknowledge from LinuxQuestions.org on (#5AK31)
I'm trying to analyse the performance of a file processing setup. The client is given files, sends them to the server for processing, records the result and moves each file into one of two folders depending on whether it is wanted or not. The server's job is more intensive, so the bottleneck is on the server, but the load profiles look odd. Processing is multi-stage: some files need analysing only at level1, some need level1 then level2, and some go all the way to level3. At level3, processing speed is directly related to the number of instances of 'interesting process' that are running. Sometimes during the processing of a batch we see fewer of these than we expect.
The aim is to get a combination of cpu statistics, number of times 'interesting process' is running and log this out to a file so that we can a) graph what's going on to get a feel of the thing and b) cross-reference this to the logs on the client to find what the client is up to in the slow bits of the batch cycle.
I've tried sar, and it's great for most of the metrics, but I can't find a way to count the instances of 'interesting process' with it.
Currently I'm running:
    watch "uptime | tee -a /path/logfile.txt; pgrep -c processname | tee -a /path/logfile.txt"
This gives me a nice text file with a row every 2 seconds containing:
- a timestamp for the record
- load averages
- number of times the machine is running the process I'm interested in
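For comparison, here is a minimal sketch of collecting the same three items on one line per sample rather than through two separate tee calls. The function name sample_line is made up for illustration, "processname" is the same placeholder as above, and it assumes a Linux box with procps (for pgrep) and /proc/loadavg available.

```shell
# Sketch: emit "HH:MM:SS, count, load1 load5 load15" as a single line.
sample_line() {
    # pgrep -c prints the number of matching processes; it exits non-zero
    # when nothing matches, so "|| true" keeps this safe under "set -e".
    count=$(pgrep -c "$1" || true)
    # The first three fields of /proc/loadavg are the 1, 5 and 15 minute
    # load averages (the same numbers uptime shows).
    load=$(cut -d ' ' -f 1-3 /proc/loadavg)
    printf '%s, %s, %s\n' "$(date +%T)" "${count:-0}" "$load"
}
```

It could be driven by a plain loop instead of watch, for example: while sleep 2; do sample_line processname >> /path/logfile.txt; done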
I've also tried adding
    "mpstat | awk '$12 ~ /[0-9.]+/ { print 100 - $12 }'"
into the watch, but it doesn't work. On its own the command works fine, but when it's wrapped into the watch command you get the idle time for each CPU on separate lines, which is no fun with multiple cores.
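Two things may be worth checking here, offered as guesses rather than a diagnosis: inside a double-quoted string the calling shell expands $12 before awk ever sees it, and the position of the %idle column shifts depending on whether the timestamp includes AM/PM. The sketch below sidesteps both by putting the awk program in a function and finding %idle by its header name. The function name busy_from_mpstat is hypothetical, and it assumes mpstat prints an aggregate "all" row, as sysstat's mpstat does by default.

```shell
# Sketch: read mpstat output on stdin, print 100 - %idle for the "all" row.
busy_from_mpstat() {
    awk '
        # Remember which column is labelled %idle when the header goes by.
        { for (i = 1; i <= NF; i++) if ($i == "%idle") idle = i }
        # First aggregate data row after the header: print busy % and stop.
        idle && !/%idle/ && / all / { printf "%.1f\n", 100 - $idle; exit }
    '
}
```

A possible invocation is LC_ALL=C mpstat 1 1 | busy_from_mpstat, where LC_ALL=C keeps the decimal separator a dot and "1 1" asks for a single one-second sample.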
Ideal output would be: timestamp, number of instances of 'interesting process' running, then CPU %usr, %sys, %ni, %idle, %wait, %hi and %si,
so the data would look like:
10:04:42, 16, 58.0 us, 26.0 sy, 0.0 ni, 12.0 id, 0.0 wa, 0.0 hi, 0.0 si
10:04:44, 18, 64.0 us, 29.0 sy, 0.0 ni, 7.0 id, 0.0 wa, 0.0 hi, 0.0 si
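One way rows in this shape might be produced without sar or mpstat is to difference two readings of the aggregate "cpu" line in /proc/stat. This is only a sketch under assumptions: the function names cpu_row and log_loop are invented for illustration, the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for Linux /proc/stat, and steal time is counted in the total but not printed.

```shell
# Sketch: turn two "cpu ..." lines from /proc/stat into top-style percentages.
cpu_row() {
    # $1 = earlier reading, $2 = later reading
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i; next }
        {
            # Per-field tick deltas between the two readings, plus their sum.
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) total = 1   # avoid dividing by zero
            printf "%.1f us, %.1f sy, %.1f ni, %.1f id, %.1f wa, %.1f hi, %.1f si\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
                100*d[6]/total, 100*d[7]/total, 100*d[8]/total
        }'
}

# Sketch: append "time, count, cpu figures" to a log file every few seconds.
log_loop() {
    # $1 = process name, $2 = log file, $3 = interval in seconds (default 2)
    read -r prev < /proc/stat          # first line is the aggregate "cpu" line
    while sleep "${3:-2}"; do
        read -r cur < /proc/stat
        printf '%s, %s, %s\n' "$(date +%T)" "$(pgrep -c "$1" || true)" \
            "$(cpu_row "$prev" "$cur")" >> "$2"
        prev=$cur
    done
}
```

Calling log_loop processname /path/logfile.txt would then write one comma-separated row per interval, which should be straightforward to graph or to line up against the client logs by timestamp.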
As a bonus, is there a way I can add "percentage of I/O channel to disk used" onto the end of each row, to help work out if I'm actually I/O bound?
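A rough stand-in for that figure might be the per-device utilisation that iostat -x reports as %util, which can also be derived from /proc/diskstats: column 13 of a device's line is the cumulative number of milliseconds the device spent doing I/O. The sketch below assumes that layout; the function name disk_util and the device name sda in the usage note are hypothetical.

```shell
# Sketch: percentage of the interval during which the device was busy.
disk_util() {
    # $1, $2 = two /proc/diskstats lines for the same device, taken apart
    # $3     = elapsed time between them in milliseconds
    printf '%s\n%s\n' "$1" "$2" | awk -v ms="$3" '
        NR == 1 { busy = $13; next }            # ms spent doing I/O, earlier
        { printf "%.1f\n", 100 * ($13 - busy) / ms }
    '
}
```

The two input lines could come from grep ' sda ' /proc/diskstats run at the start and end of each interval. A value near 100 suggests the device was busy for the whole interval, although on devices that serve requests in parallel (SSDs, RAID arrays) that does not necessarily mean it was saturated.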
Am I heading the right way, or is there a better way? Any ideas appreciated.