SAR (system activity reporter) on Solaris
As someone responsible for running servers and for their performance, you will probably one day find yourself in a situation where you need to show that your server is fine (if you are lucky) and that the performance problem is most likely in the network or on the users' desktops. A tool that can help you is SAR (system activity reporter).
It originates from Sun; it gathers system activity data and creates periodic reports. It is normally run from cron.
On Solaris, there is a predefined crontab file, /var/spool/cron/crontabs/sys, so SAR is run from cron by the user sys.
The Solaris packages are:
# pkginfo | grep SUNWacc
system SUNWaccr System Accounting, (Root)
system SUNWaccu System Accounting, (Usr)
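As a quick check before relying on SAR, one could verify that both accounting packages are installed. The sketch below is only an illustration: the function name has_sar_pkgs is made up here, and it simply scans pkginfo-style output on standard input for the two package names listed above.

```shell
#!/bin/sh
# has_sar_pkgs: read "pkginfo" output on stdin and succeed only if both
# accounting packages (SUNWaccr and SUNWaccu) are listed.
# The function name is hypothetical; the package names come from the listing above.
has_sar_pkgs() {
    awk '
        $2 == "SUNWaccr" { root = 1 }
        $2 == "SUNWaccu" { usr = 1 }
        END { exit ((root && usr) ? 0 : 1) }
    '
}

# On a Solaris host this would be used as:
#   pkginfo | has_sar_pkgs && echo "SAR packages present"
```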
SAR works in two steps:
- Collecting system activity data, which is done by the predefined shell script /usr/lib/sa/sa1. This script uses the binary executable /usr/lib/sa/sadc, the 'system activity data collector', which samples data a given number of times with a given interval between samples and writes the results in binary format to a file.
- Creating the system activity report, which is done by the second predefined script, /usr/lib/sa/sa2. This script uses the binary executable /usr/bin/sar, the 'system activity reporter'. The reporter reads the binary file with the data collected in step one and creates the report. Here you can specify the time interval between reported data points and which data to report (for example only disk performance, only CPU, or everything).
The crontab for the sys user (file /var/spool/cron/crontabs/sys) can, for example, look like this:
# collects 1 sample at an interval of 120 sec (2 min) and writes it to the file /var/adm/sa/sa`/usr/bin/date +%d`
* * * * * /usr/lib/sa/sa1 120 1
# creates a report with an interval (-i) of 900 sec (15 min) using all data (-A)
45 23 * * * /usr/lib/sa/sa2 -i 900 -A
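The daily data file named in the crontab comment can also be read by hand with sar's -f option. The following is a minimal sketch, assuming the default /var/adm/sa directory; the helper name sa_file is invented for this example, and the sar command is only echoed so the snippet is harmless on hosts without SAR.

```shell
#!/bin/sh
# Directory where sa1 writes its daily binary files (from the crontab comment above).
SA_DIR=/var/adm/sa

# sa_file DAY: print the data file path for a two-digit day of the month.
# (Hypothetical helper name, used only in this sketch.)
sa_file() {
    printf '%s/sa%s\n' "$SA_DIR" "$1"
}

# Show the command that would report CPU data (-u) from today's file (-f).
# Remove the leading "echo" on a Solaris host to actually run sar.
echo sar -u -f "$(sa_file "$(date +%d)")"
```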
Now that you have a report, you will want to view it as graphs.
I have tried kSar (version 5.0.6), http://sourceforge.net/projects/ksar/, which can, for example, create a PDF report.
Here are notes (from the kSar docs) to help understand the Solaris graphs:
CPU (sar -u)
The CPU spends its time doing one of 4 things:
- running in user mode (time spent on application processes)
- running in system (kernel) mode (processing system calls, dealing with hardware, etc.)
- being idle (CPU available to process instructions)
- waiting for I/O (better to check iostat if you suspect an I/O bottleneck, since this figure can be misleading on multiprocessor systems)
So if the CPU idle graph stays at 0 for a long period of time, the host probably needs more CPU power, but also look at the time spent in system mode.
If you are running an NFS server and your system time is very high, have a look at your NFS parameters and try tuning them to get things running better.
If you see a lot of I/O wait, look at disk performance to see whether the box is really losing time waiting on the disks. Reportedly, I/O wait has been disabled on Solaris 10.
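To spot the "idle at 0" situation without a graph, the text output of sar -u can be filtered directly. This is a rough sketch, assuming the Solaris column layout (time, %usr, %sys, %wio, %idle); the function name low_idle and the sample numbers are invented for illustration and are not real measurements.

```shell
#!/bin/sh
# low_idle LIMIT: read "sar -u" text output on stdin and print the time and
# %idle of every sample whose %idle is below LIMIT.
# Assumes the column layout: time %usr %sys %wio %idle.
low_idle() {
    awk -v limit="$1" '
        $1 == "Average" { next }   # skip the summary line
        NF == 5 && $5 ~ /^[0-9]+$/ && $5 + 0 < limit + 0 { print $1, $5 }
    '
}

# Example with made-up sample values:
low_idle 10 <<'EOF'
00:00:00    %usr    %sys    %wio   %idle
00:15:00      60      38       0       2
00:30:00      10       5       0      85
EOF
# prints: 00:15:00 2
```

On a live system the input would come from something like sar -u -f on the day's data file instead of the here-document.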
Disk (Disk Transfer and Disk Wait) (sar -d)
Disk has two pages.
Disk Transfer has three graphs about data transferred from/to the disk:
- bytes/s (how many bytes are read from or written to the disk per second)
- read+write/s (number of read and write commands issued; one command can span many blocks)
- avserv/ms (average time spent handling one request, including seek, rotational latency and data transfer)
Disk Wait has statistics about the queue and waiting I/O:
- avque (average number of requests in the queue waiting to be processed by the disk)
- avwait (how long a request stays idle in the queue; the longer it stays, the more trouble the disk has emptying the queue)
- %busy (% of disk utilization; if it stays at 100% for a long period of time, getting a faster disk and/or changing the data striping might help)
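The %busy rule of thumb can be checked against sar -d text output with a small filter. This is only a sketch under stated assumptions: it expects the Solaris layout "time device %busy avque r+w/s blks/s avwait avserv", in which only the first device line of each sample carries a timestamp, and the function name busy_disks and the sample rows are invented for the example.

```shell
#!/bin/sh
# busy_disks LIMIT: read "sar -d" text output on stdin and print the device
# name and %busy for every line whose %busy is at or above LIMIT.
busy_disks() {
    awk -v limit="$1" '
        $1 == "Average" { summary = 1 }   # the summary block starts here
        summary         { next }          # ignore everything in the summary
        {
            # Lines with a leading hh:mm:ss timestamp have one extra field.
            off = ($1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) ? 1 : 0
            dev = $(1 + off); busy = $(2 + off)
            if (busy ~ /^[0-9]+$/ && busy + 0 >= limit + 0) print dev, busy
        }
    '
}

# Example with made-up sample values:
busy_disks 90 <<'EOF'
00:00:00   device        %busy   avque   r+w/s  blks/s  avwait  avserv
00:15:00   sd0             100     3.2     210    4200    15.1     9.8
           sd1               4     0.0       3      48     0.0     5.0
EOF
# prints: sd0 100
```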
Run Queue (sar -q)
- The Run Queue Size (runq-sz) graph is the number of processes/threads that are ready to run
- The % of time (%runocc) graph is the percentage of time the run queue holds at least one process. If the run queue has more than 2 processes/threads per processor, you may need to add CPUs, because processes are waiting for an idle CPU
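The "more than 2 per processor" rule is simple arithmetic: divide runq-sz by the number of processors. A minimal sketch follows; the function name runq_per_cpu and the labels it prints are made up for this example.

```shell
#!/bin/sh
# runq_per_cpu RUNQ_SZ NCPU: divide the run queue size by the number of
# processors and flag the result when it exceeds 2 (the rule of thumb above).
# On Solaris the processor count could be taken from "psrinfo | wc -l".
runq_per_cpu() {
    awk -v q="$1" -v n="$2" 'BEGIN {
        if (n <= 0) exit 1                 # avoid division by zero
        per = q / n
        printf "%.2f %s\n", per, (per > 2 ? "check-cpu" : "ok")
    }'
}

runq_per_cpu 9.0 4    # prints: 2.25 check-cpu
runq_per_cpu 3.0 4    # prints: 0.75 ok
```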
The Swapping panel (sar -w) reports information about swapping and lightweight processes (LWPs).
An LWP runs in user space on top of a kernel thread and shares its address space with the other LWPs of the same process.
The graphs report the number of 512-byte blocks swapped out to disk or swapped in to memory, and the number of swap-in and swap-out requests.
There is also a graph reporting the number of LWP switches performed by the CPU (review it together with the 'Run Queue Size' graph).
Swap Queue (sar -q)
The Swap Queue is almost always at zero; if you see a non-zero value, then you have, or have had, a problem.
Typically, the swap queue is only used when memory is severely exhausted.
The swpq-sz (swap queue size) is the number of processes the system had to swap out to disk to free some memory. To find out which processes have been swapped out, you can search for processes whose RSS (resident set size) is 0, either with "prstat" or "ps -efly".
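The search for processes with an RSS of 0 can be scripted. The sketch below deliberately uses ps -e -o pid,rss,comm rather than the "ps -efly" form mentioned above, so that the RSS value always sits in the second column; that substitution, the function name zero_rss, and the sample rows are choices made for this example. Some system processes may legitimately show an RSS of 0, so treat the result as a starting point.

```shell
#!/bin/sh
# zero_rss: read "ps -e -o pid,rss,comm" output on stdin and print the PID and
# command name of every process whose resident set size is 0.
zero_rss() {
    awk 'NR > 1 && $2 == 0 { print $1, $3 }'   # NR > 1 skips the header line
}

# Typical use on a live system:
#   ps -e -o pid,rss,comm | zero_rss

# Example with made-up sample rows:
zero_rss <<'EOF'
  PID  RSS COMMAND
  101    0 idleproc
  202 1500 sshd
EOF
# prints: 101 idleproc
```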
Buffers (sar -b)
The Buffers panel has 4 graphs.
- Reads from the system buffers, raw disk reads and disk reads (Read)
- The same as the previous one, for write operations (Write)
- The buffer write cache (%wcache); the value from this graph is not very useful
- The most important graph is the buffer read cache (%rcache); if your value falls below 99%, your buffers are probably not very effective, and adding some memory can make your application run faster
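For reference, the sar manual page describes %rcache as a hit ratio derived from two other columns of sar -b: (1 - bread/lread) expressed as a percentage, where bread/s counts physical reads into the buffer cache and lread/s counts logical reads from the system buffers. A small sketch of that arithmetic, with an invented function name and invented input values:

```shell
#!/bin/sh
# rcache_pct BREAD LREAD: read-cache hit ratio in percent, computed as
# (1 - bread/lread) * 100. Prints "n/a" when there were no logical reads.
rcache_pct() {
    awk -v b="$1" -v l="$2" 'BEGIN {
        if (l <= 0) { print "n/a"; exit }
        printf "%.1f\n", (1 - b / l) * 100
    }'
}

rcache_pct 2 400     # prints: 99.5
rcache_pct 0 0       # prints: n/a
```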
Syscalls (sar -c)
The Syscalls panel has 4 graphs.
- The number of read/write calls per second
- The number of syscalls per second
- The number of fork/exec calls per second
- The number of characters transferred by the read/write system calls
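One way to combine two of these graphs is to divide the characters read per second (the rchar/s column of sar -c) by the read calls per second (sread/s), which gives a rough average size per read call. This derived figure is not part of the kSar notes; the function name and the input values below are invented for illustration.

```shell
#!/bin/sh
# avg_read_size RCHAR_PER_SEC SREAD_PER_SEC: average number of characters
# transferred per read() call. Prints "n/a" when no read calls were made.
avg_read_size() {
    awk -v c="$1" -v r="$2" 'BEGIN {
        if (r <= 0) { print "n/a"; exit }
        printf "%.0f\n", c / r
    }'
}

avg_read_size 81920 20    # prints: 4096
```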
File Access activity (sar -a)
The File panel reports information about file-system utilization.
- iget is the number of inode requests per second
- namei is the number of name resolutions (file-system path searches) per second
- dirbk is the number of directory blocks read per second
TTY (Terminal) activity (sar -y)
This reports TTY activity per second.
- rawch/s = input character rate
- canch/s = rate of input characters processed by canon (canonical input processing)
- outch/s = output character rate
- rcvin/s = receive interrupt rates
- xmtin/s = transmit interrupt rates
- mdmin/s = modem interrupt rates
Messages & Semaphores (sar -m)
It shows message and semaphore activity, that is, interprocess communication. Note that Oracle uses semaphores a lot.
Paging (first page, page-OUT activity) (sar -g)
This Paging panel has four graphs.
- pgfree/s = pages/sec placed on free list by page stealing daemon
- pgout/s and ppgout/s = number of page-out requests per second and number of pages paged out per second (written from memory to disk)
- pgscan/s = pages per second scanned by page stealing daemon
- %ufs_ipf = % of UFS inodes taken off the freelist by iget which had reusable pages associated with them
Paging (second page, page-IN activity) (sar -p)
This panel has four graphs.
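- atch/s = page faults per second that are satisfied by reclaiming a page currently in memory (attaches)
- pgin/s and ppgin/s = number of page-in requests per second and number of pages paged in per second (read from disk into memory)
- pflt/s and vflt/s = page faults per second from protection errors (illegal access to a page, or copy-on-write) and address translation page faults per second (valid page not in memory); sustained high values, together with a page scanner that is frantically searching for pages to free (pgscan/s on the page-out page), may point to a memory shortage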
- slock/s = faults per second caused by software lock requests requiring physical I/O
Memory Usage (sar -r)
The Memory usage panel has two graphs.
- freeswap = number of 512-byte disk blocks available for page swapping
- freemem = average number of memory pages available to user processes; the size of a memory page is machine-dependent (find it with the command 'pagesize', which returns the value in bytes)
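Because freeswap is counted in 512-byte blocks and freemem in pages, both values need a conversion before they can be compared with sizes in megabytes. A minimal sketch of that arithmetic follows; the function names and the example figures are invented, and getconf PAGESIZE is used here as a portable stand-in for the Solaris 'pagesize' command.

```shell
#!/bin/sh
# pages_to_mb PAGES PAGESIZE: convert a freemem value (pages) to megabytes.
# Integer arithmetic, so the result is truncated to whole megabytes.
pages_to_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# blocks_to_mb BLOCKS: convert a freeswap value (512-byte blocks) to megabytes.
blocks_to_mb() {
    echo $(( $1 * 512 / 1048576 ))
}

# Page size of the current machine, in bytes.
page_bytes=$(getconf PAGESIZE)
echo "page size: $page_bytes bytes"

pages_to_mb 25600 8192     # prints: 200
blocks_to_mb 2097152       # prints: 1024
```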
Kernel Memory Allocation (sar -k)
The KMA allows a kernel subsystem to allocate and free memory as needed.
Rather than statically allocating the maximum amount of memory it is expected to require under peak load, the KMA divides requests for memory into three categories: small (less than 256 bytes), large (512 bytes to 4 Kbytes), and oversized (greater than 4 Kbytes).
It keeps two pools of memory to satisfy small and large requests. The oversized requests are satisfied by allocating memory from the system page allocator.