Original: Unix Review - January 2006 Hemant Sharma
The performance of a system is dependent on the I/O, memory, CPU, and network. In most Unix environments, iostat, vmstat, and netstat are the commonly available tools for system reporting. These tools provide live data about the system's memory, disks, and network and enable us to work out the possible problem source. In this article, I will revisit these tools and explain how to use them to maximize your understanding of system activities.
Iostat stands for input output statistics and it provides data about the input output devices such as disk, terminals, serial devices, and CPU, but we will use it here for disk-related data only.
The basic syntax is:
iostat <options> interval count
option – This may differ among operating system. Refer to the man iostat on your system for supported options.
interval – Time period in seconds between two samples. iostat 5 will give data at each 5 seconds interval.
count – Number of times the data is needed. iostat 5 4 will give data at 5 seconds interval 4 times. If no count is specified, the samples continue indefinitely and must be terminated by pressing ^c. Commonly, the command is run without count, and samples are observed to get a feel of system state.
The very first iostat command gives a snapshot of activity since the last reboot. Subsequent statistics are for the given interval and correspond to the real-time data. Because the system state changes very quickly, the first line of output can be safely ignored as it doesn't give the useful information about current system state.
These commands use system resources and, as such, the time interval should be used with care to avoid the impact on the results. Too little interval (fewer than 5 seconds) may add command overhead to the output results and may not reflect the correct system state.
A typical iostat command on a Solaris system would give the following output:
example% iostat -xtc 5 2 extended disk statistics tty cpu disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b tin tout us sy wt id d0 6.2 0.0 21.5 0.0 0.0 0.1 24.1 0 15 0 84 4 94 2 0 d1 1.8 0.0 14.3 0.0 0.0 0.1 41.6 0 7
wait – Average number of transactions waiting for service (queue length).
actv – Average number of transactions actively being serviced (removed from the queue but not yet completed).
svc _t – Average service time.
%w – Percent of time there are transactions waiting for service.
%b – Percent of time the disk is busy (transactions).
The columns r/s, w/s, kr/s, and kw/s denote read and write per seconds in bytes and KB, respectively. Similarly, in AIX, it would give the following output:
# iostat 5 2 tty: tin tout cpu: % user % sys % idle % iowait 0.0 0.0 23.1 9.0 65.9 2.0 Disks: % tm_act Kbps tps Kb_read Kb_wrtn hdisk0 2.4 6.4 1.6 0 32 cd0 0.0 0.0 0.0 0 0
% tm_act – Percentage of time that the physical disk was active.
Kbps – Amount of data transferred to the drive in KB per second.
tps – A transfer is an I/O request through the device driver level to the physical disk.
Kb_read, Kb_wrtn – Reports the total data (in KB) read/written from the physical volume during the measured interval.
Important columns to observe are svc_t, wait %w and %b – the higher the service time, the lower the performance. Together service time and %busy time gives an idea of whether a system is facing a disk I/O problem. If a disk is showing a constant disk busy state greater than 65% and service time greater than 50ms, then it is suffering from a performance bottleneck.
If the tps and the corresponding %tm_act are higher than expected, but the transfer rate is low and the busy rate is high, then the disk may have problems with the file system or the partition.
1. Balance the load from one disk to multiple disks using striping and adding more disks.
2. Distribute the swap on multiple disks. This may be relevant where swamping happens frequently.
3. Related data (such as data files and index files) should be present in the same partition.
4. The system should have sufficient memory so that unnecessary swapping does not occur, and system resources are free for application processing.
5. Tune the application to use disk I/O more efficiently by modifying the disk queries and using available cache facilities of application servers.
6. If disks have databases on them, tune the SQL queries to read tables efficiently and avoid full table scans. Regularly analyze tables to fix any errors.
7. If the disk is suffering from 100% utilization, spread the file system onto two or more disks using disk volume management software.
8. Move the file system to another, faster disk/controller or replace the existing disk/controller with a faster one.
Vmstat reports virtual memory statistics of processes, virtual memory, disk, trap, and CPU activity. Vmstat is useful to correlate memory, processes, and CPU utilization of a system. On multi-processor systems, vmstat averages the number of CPUs into the output.
The basic syntax for vmstat is to use the command followed by the option and time interval:
# vmstat <options> interval count
options – Options are used to get information about specific activity. A complete set of options can be seen by referring to man vmstat on your system.
interval – This is the time period in seconds between two samples. Too small an interval (< 5 sec) may add command overhead to the output results, and the results may not reflect the true system state.
count – This is the number of times the data is needed. The first line of vmstat gives the average of data since the time of system reboot and can be ignored. Without the count option, the command produces continuous samples until terminated by ^c. As a common practice, the samples are observed for some time without count argument to get a feel of system state.
Without interval or count options, vmstat displays a one-line summary of the virtual memory activity since the system was booted.
example# vmstat 5 procs memory page disk faults cpu r b w swap free re mf pi p fr de sr s0 s1 s2 s3 in sy cs us sy id 0 0 0 11456 4120 1 41 19 1 3 0 2 0 4 0 0 48 112 130 4 14 82 0 0 1 10132 4280 0 4 44 0 0 0 0 0 23 0 0 211 230 144 3 35 62 0 0 1 10132 4616 0 0 20 0 0 0 0 0 19 0 0 150 172 146 3 33 64
The relevant fields of vmstat's display are:
r in run queue b blocked for resources I/O, paging etc. w swapped
page (per second)
re page reclaims mf minor faults pi kilobytes paged in po kilobytes paged out fr kilobytes freed de anticipated short-term memory shortfall (Kbytes) sr pages scanned by clock algorithm
cpu – Percentage usage of CPU time. On multiprocessors, this is an average across all processors.
% vmstat 5 3 kthr memory page faults cpu ----- ----------- ------------------------ ------------ ------------ r b avm fre re pi po fr sr cy in sy cs us sy id wa 2 1 504038 3489832 0 0 0 0 0 0 2760 8705 1730 4 3 92 0 1 0 503080 3490975 0 0 0 0 0 0 5676 8619 4277 0 15 85 0 0 0 503073 3490982 0 0 0 0 0 0 1937 2707 232 0 0 99 0
* 1 page = 4096 bytes
r – Kernel threads placed on the run queue.
b – Kernel threads blocked.
avm – Active virtual memory pages.
fre – Free memory pages.
re – Pager input/output list.
wa – % CPU time waiting on I/O.
To diagnose CPU-related issues, we need to look at the processes in the run queue under the “proc” and “cpu” columns (procs r).
procs cpu r b w us sy id 0 0 0 4 14 82 0 0 1 3 35 62 0 0 1 3 33 64
The output column denotes the following:
r – Average number of runnable kernel threads over the sampling interval. Runnable refers to threads that are ready but waiting to run.
b – Average number of kernel threads awaiting resource and input/output over the sampling interval.
w – Idle and swapped processes.
us – The percent of CPU time spent in user mode. When in user mode, a process executes in its application code and does not require kernel resources.
sy – Percentage of time the CPU spends executing a process in system mode, making system calls for accessing the kernel.
id – Percentage of time the CPU is idle, that is, there are no threads available for execution or the run queue is empty. Vmstat does not take into account iowait when calculating CPU idle time, whereas top and mpstat do. So, whenever there is a high iowait, vmstat and top will show different results.
wa – Percentage of time the CPU is idle with pending local disk and NFS-mounted disks operations.
Problems can be identified by observing the following points:
r – This value should be lower than 5 for non-SMP systems. For SMP systems, this value should be lower than 5 x (Ntotal - Nbind), where Ntotal is the total number of processors and Nbind is the number of processors that have been bound to processes.
b – This value should be near zero. If the run-queue value increases, the wait-queue also increases. If threads are awakened simultaneously during a 1-second interval, the run-queue could be high but still show low CPU utilization if the threads go right back to sleep. If processes are suspended due to memory, the blocked column (b) in the vmstat report indicates the increase in the number of threads rather than the run queue.
wa – A “wa” value of more than 25 percent indicates that the impacted disks are heavily loaded.
1. If the number of processes in the run queue column “r” under procs is consistently greater than the number of CPUs on the system, it will slow down the system because there are more processes than available CPUs.
2. If this number is more than four times the number of available CPUs in the system, then the system is facing a shortage of CPU resources.
3. If the idle time (cpu id) is consistently 0 and the system time (cpu sy) is double the user time (cpu us), the system is facing a shortage of CPU resources.
Resolution of these types of issues involves:
1. Identifying and tuning heavy CPU-using processes.
2. Tuning of application procedures to make efficient use of CPU.
3. Upgrading to more powerful CPU or adding more/faster cpu to system.
4. The CPU bottleneck may also result from memory, so if the scan rate is high, adding memory can relieve CPU bottlenecks.
5. Adjusting process priorities so that lower-priority processes do not consume all the CPU resources.
6. Control processor utilization to keep the CPU-intensive processes limited to certain processors.
7. Adjust time slices so that the CPU provides a good response time. Time slices can be adjusted using kernel parameters.
Memory-related information can be obtained from the following vmstat columns:
pi – The number of pages paged in from virtual memory that reside on disk.
po – The number of pages paged out to paging space. When a process terminates normally, any paging space allocated to that process is freed.
fr – Number of pages that were freed per second. If I/O is not necessary, minimal system resources are required to free a page.
sr – Scan rate is the number of pages examined per second by the page-replacement algorithm. On systems with multiple processes using many different pages, the scan rate might greatly exceed the free rate.
de – In Solaris, the “de” column represents the anticipated short-term memory shortfall in Kbytes, which is used by the page scanner to free enough pages to satisfy requests.
1. Memory is over-committed when the ratio of fr to sr (fr:sr) is high.
2. An fr:sr ratio of 1:4 means that for every page freed, four pages had to be examined. We can't determine memory constraint based on this ratio because it is workload/application dependent.
3. Memory bottlenecks are determined by the scan rate (sr). The scan rate is the number of pages scanned by the clock algorithm per second. If the scan rate (sr) is continuously more than 200 pages per second, then there is a memory shortage.
4. If the “de” column is non-zero, then the system is anticipating a memory shortfall, and free memory will be reclaimed in expectation of its use. For overall memory availability, “de” gives a good indication of whether the system is experiencing memory shortfall.
5. If there are I/Os queued for the swap device, application paging is occurring. If there is heavy I/O to the swap device, memory upgrade may be needed.
6. A zombie process is not swapped out, so it continues to run in the memory and occupies the memory.
Resolution of these types of issues involves:
1. Gather more information about the processes using the highest amount of memory and investigate to reduce the memory consumption by the thread.
2. Tune the Java memory allocation for Java server instances and stop the unnecessary instances.
3. Tune the applications and servers to make efficient use of memory and cache.
4. Increase system memory.
5. Tune the paging process by implementing priority paging in pre-Solaris 8 versions by adding line “set priority paging=1” in /etc/system. Remove this line if upgrading from Solaris 7 to 8 and retaining old /etc/system file.
6. Remove zombie processes from the system.
Netstat displays the contents of various network-related data structures depending on the options selected. Options are dependent on the operating system, so check man netstat on your system for all supported options.
Multiple options can be given at one time from the following list:
-a – Displays the state of all sockets.
-r – Shows the system routing tables.
-i – Gives statistics on a per-interface basis.
-v – Verbose mode.
-n – Show network addresses as numbers, IP addresses. By default, netstat tries to resolve IP addresses into name for display. This option applies to the -a, -i, -r, and -v options. Because the network is being debugged, it is possible that netstat may not be able to reach naming services and it may hang looking for names. It is always a good idea to use the -n option.
interval – Number for continuous display of static tics.
The different options for netstat provide data about different network aspects, such as network connectivity, network response, and network socket status.
$netstat -rn Routing Table: IPv4 Destination Gateway Flags Ref Use Interface -------------------- -------------------- ----- ----- ------ --------- 192.168.1.0 192.168.1.11 U 1 1444 le0 126.96.36.199 192.168.1.11 U 1 0 le0 default 192.168.1.1 UG 1 68276 127.0.0.1 127.0.0.1 UH 1 10497 lo0
This shows the output on a Solaris machine whose IP address is 192.168.1.11 with a default router at 192.168.1.1. This command is mostly useful in troubleshooting network connectivity issues. When the outside network is not accessible, check the following:
1. Make sure the default router IP address is correct.
2. Make sure you can ping the router from your machine.
3. If router address is incorrect, it can be changed with route add command. See man route for more information.
4. If the router address is correct but still you can't ping it, there may be some network cable hub/switch problem and you will need to eliminate the faulty component.
The -i option to netstat gives the packet transmission and collision data about network interfaces:
$ netstat -i Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue lo0 8232 loopback localhost 77814 0 77814 0 0 0 hme0 1500 server1 server1 10658566 3 4832511 0 279257 0
Values to note in the above example are:
The value in the collision column depends on the type of network. A network using the switches for connectivity will not have collisions as compared to hub-based network.
The values in the netstat -i output can be used to make an intelligent guess about network problem areas, as follows:
1. (Collis+Ierrs+Oerrs)/(Ipkts+Opkts) > 2% – This may indicate a network hardware issue.
2. (Collis/Opkts) > 10% – Indicates that the interface is overloaded and network load must be redistributed.
3. (Ierrs/Ipkts) > 25% – Indicates packets dropped by the host as a result of an overloaded network or host.
4. If there are more than 120 collisions/second, the network is overloaded.
5. If the sum of input and output packets is higher than about 600 for a 10-Mbs interface or 6000 for a 100-Mbs interface, the network segment is too busy and needs a redistribution of load.
6. A high number of errors in the “Ierrs” and “Oerrs” columns indicates a transmitting/receiving problem. The most common reason may be that your source and destination have different network settings, like full-duplex and half-duplex.
netstat -a gives important information about network socket and TCP state. Network states returned by netstat -a are as follows:
CLOSED – Closed; the socket is not being used.
LISTEN – Listening for incoming connections.
SYN_SENT – Actively trying to establish connection.
SYN_RECEIVED – Initial synchronization of the connection underway.
ESTABLISHED – Connection has been established.
CLOSE_WAIT – Remote shutdown; waiting for the socket to close.
FIN_WAIT_1 – Socket closed; shutting down connection.
CLOSING – Closed, then remote shutdown; awaiting acknowledgement.
LAST_ACK – Remote shutdown, then closed; awaiting acknowledgement.
FIN_WAIT_2 – Socket closed; waiting for shutdown from remote.
TIME_WAIT – Wait after close for remote shutdown retransmission.
#netstat -a Local Address Remote Address Swind Send-Q Rwind Recv-Q State *.* *.* 0 0 24576 0 IDLE *.22 *.* 0 0 24576 0 LISTEN *.32775 *.* 0 0 24576 0 LISTEN *.32776 *.* 0 0 24576 0 LISTEN *.* *.* 0 0 24576 0 IDLE 192.168.1.184.22 192.168.1.186.56806 38912 0 24616 0 ESTABLISHED 192.168.1.184.22 192.168.1.183.58672 18048 0 24616 0 ESTABLISHED
If you see lots of connections in FIN_WAIT state, TCP/IP parameters need to be tuned because the connections are not being closed and are accumulating. After some time, the system may run out of resources. TCP parameters can be tuned to define a timeout so that connections can be released and used by new connections.
This article provided knowledge about the tools of trade in Unix, and I hope it will help you troubleshoot some of your system performance issues.
Hemant Sharma has more than 10 years of experience in Unix systems administration and application management. He runs his own Web site (http://www.adminschoice.com) for sys admins and writes articles on Unix systems administration. He can be contacted at: email@example.com.