Thursday, February 9, 2012

A summary of Linux commands for monitoring the system


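Linux tracks many kinds of system resources, and different commands report on each of them. Here are a few of the commonly used ones.
1. The top command
The key fields in top's output mean the following: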
Table 17-1. top column headers
Header   Meaning
PID      The task's process ID. This unique identifier allows you to manipulate a task.
USER     The user name of the task's owner, the account it runs as.
PR       The task priority.
NI       The task niceness, an indication of how willing this task is to yield CPU cycles to other tasks. A lower or negative niceness means a higher priority.
VIRT     The total amount of memory used by the task, including shared and swap memory.
RES      The total amount of physical memory used by the task, excluding swap memory.
SHR      The amount of shared memory used by the task. This memory is usually allocated by libraries and is also usable by other tasks.
S        Task status. This indicates whether a task is running (R), sleeping (D or S), stopped (T), or zombie (Z).
%CPU     Percentage of available CPU cycles this task has used since the last screen update.
%MEM     Percentage of available RAM used by this task. (Note: this does not include swap.)
TIME+    Total CPU time the task has used since it started.
COMMAND  The name of the task being monitored.
----------------------------------------------------------------------------------------------
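One more thing to remember: by default, the %CPU that top shows is relative to a single CPU, not to all CPUs combined. On a machine with 8 CPUs, a PID that top shows at 100% has used up the equivalent of one CPU; the other CPUs may well be idle.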
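
To make the point above concrete, here is a minimal Python sketch that converts a per-CPU percentage, as top reports it by default, into a share of the whole machine. The function name share_of_machine is made up for this illustration and is not part of top.

```python
def share_of_machine(task_cpu_percent: float, n_cpus: int) -> float:
    """Convert a %CPU value that is relative to one CPU into a
    percentage of the machine's total CPU capacity."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return task_cpu_percent / n_cpus


# A task shown at 100% on an 8-CPU machine uses one eighth of the machine.
print(share_of_machine(100.0, 8))  # 12.5
```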

2. vmstat
The following descriptions of the vmstat fields are excerpted from man vmstat:
  Procs (processes)
      r: The number of processes waiting for run time.
      b: The number of processes in uninterruptible sleep.

  Memory
      swpd: the amount of virtual memory used.
      free: the amount of idle memory.
      buff: the amount of memory used as buffers.
      cache: the amount of memory used as cache.
      inact: the amount of inactive memory. (-a option)
      active: the amount of active memory. (-a option)

  Swap
      si: Amount of memory swapped in from disk (/s).
      so: Amount of memory swapped to disk (/s).

  IO
      bi: Blocks received from a block device (blocks/s).
      bo: Blocks sent to a block device (blocks/s).

  System
      in: The number of interrupts per second, including the clock.
      cs: The number of context switches per second.

  CPU
      These are percentages of total CPU time.
      us: Time spent running non-kernel code. (user time, including nice time)
      sy: Time spent running kernel code. (system time)
      id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
      wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
      st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
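Here I want to focus on three groups of values: swap, IO, and CPU.
First, swap and the CPU wa column.
When OS performance drops and vmstat shows very high si and so values, check whether the machine is short of memory and is therefore paging in and out of swap excessively. High si and so values usually come with a high wa value as well: wa is the time the CPU spends waiting for IO, and swapping is disk IO, so in this situation it reflects time spent waiting for swapped data. When all three figures are high, check whether an application is using too much memory and forcing constant page-ins and page-outs.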
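
The check described above can be sketched in Python. This is only an illustration: the sample text is invented output in vmstat's usual layout, the function names are hypothetical, and the two thresholds are arbitrary assumptions that would need tuning for a real machine.

```python
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 512000  20480 300000    0    0    12    30  200  400  5  2 92  1  0
 2  3 812344  10240   1200  52000  950 1200  4100  5200  900 1500 10  8 30 52  0
"""


def parse_vmstat(text):
    """Turn vmstat text output into a list of {column name: int} dicts."""
    rows = [line.split() for line in text.strip().splitlines()]
    # The column names are on the header line whose first field is "r".
    names = next(cols for cols in rows if cols and cols[0] == "r")
    samples = []
    for cols in rows:
        # Keep only data lines: same width as the header, all numeric.
        if len(cols) == len(names) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(names, map(int, cols))))
    return samples


def looks_like_swap_pressure(sample, swap_threshold=100, wa_threshold=20):
    """True when si, so and wa are all high (thresholds are assumptions)."""
    return (sample["si"] > swap_threshold
            and sample["so"] > swap_threshold
            and sample["wa"] > wa_threshold)


for s in parse_vmstat(SAMPLE_VMSTAT):
    print(s["si"], s["so"], s["wa"], looks_like_swap_pressure(s))
```

A flag from this check is only a hint to go and look at which application is using the memory, not a diagnosis.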

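Next, the IO columns.
bi is data read in from block devices; bo is data written out to them.
These values show the system's IO activity. From vmstat's sampling interval and the bi or bo figures, you can estimate the IO load over that period, then compare it with the disk's maximum read/write throughput to see whether there is an IO problem.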
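
As a worked example of that estimate, here is a small Python sketch. It assumes a block is 1024 bytes, which is what the notes in man vmstat state for Linux; the function names, the sample figures and the 100 MiB/s disk limit are all hypothetical.

```python
BLOCK_SIZE_BYTES = 1024  # assumption: block size as stated in man vmstat
MIB = 1024 * 1024


def io_volume_mib(blocks_per_sec, interval_sec, block_size=BLOCK_SIZE_BYTES):
    """Data moved during one sampling interval, in MiB."""
    return blocks_per_sec * interval_sec * block_size / MIB


def fraction_of_disk_limit(blocks_per_sec, disk_max_mib_per_sec,
                           block_size=BLOCK_SIZE_BYTES):
    """Observed throughput as a fraction of the disk's maximum throughput."""
    return (blocks_per_sec * block_size / MIB) / disk_max_mib_per_sec


# Hypothetical reading: bo = 5120 blocks/s, sampled with "vmstat 5".
print(io_volume_mib(5120, 5))               # MiB written in the 5 s window
print(fraction_of_disk_limit(5120, 100.0))  # share of a 100 MiB/s disk
```

Because bi and bo are already per-second rates, the interval only matters when you want the total volume for the window.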

3. iostat
This command is fairly simple: it reports IO statistics, and its output should be mostly self-explanatory.

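4. mpstat
mpstat is short for Multiprocessor Statistics; it is a real-time system monitoring tool. It reports CPU statistics, which are kept in /proc/stat. On a multi-CPU system it can show both the average across all CPUs and the figures for a specific CPU. Only the CPU-related mpstat columns are described here. Each is measured over the sampling interval, and Δ means the change in a counter over that interval.
CPU      Processor ID
user     User-mode CPU time (%), excluding the time counted under nice: Δusr/Δtotal*100
nice     User-mode CPU time (%) of processes running with a raised nice value, that is, lowered priority: Δnice/Δtotal*100
system   Kernel-mode CPU time (%): Δsystem/Δtotal*100
iowait   Time spent waiting for disk IO (%): Δiowait/Δtotal*100
irq      Hardware interrupt time (%): Δirq/Δtotal*100
soft     Software interrupt time (%): Δsoftirq/Δtotal*100
idle     Time the CPU was idle for any reason other than waiting for disk IO (%): Δidle/Δtotal*100
intr/s   Interrupts received by the CPU per second: Δintr divided by the interval length in seconds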
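
The percentage columns are ratios of counter deltas between two readings of /proc/stat. The following Python sketch shows that calculation on two invented readings of a cpu0 line; the helper names are hypothetical, and it assumes the first eight counters on the line are user, nice, system, idle, iowait, irq, softirq and steal, as documented in proc(5).

```python
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

# Two invented readings of the same line, taken one interval apart.
BEFORE = "cpu0 1000 50 300 8000 200 10 40 0 0 0"
AFTER = "cpu0 1300 50 400 8500 280 10 60 0 0 0"


def parse_cpu_line(line):
    """Split a /proc/stat cpu line into (cpu name, {field: counter})."""
    parts = line.split()
    counters = dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    return parts[0], counters


def cpu_percentages(before, after):
    """Compute delta(field) / delta(total) * 100 for every field."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


_, first = parse_cpu_line(BEFORE)
_, second = parse_cpu_line(AFTER)
for name, pct in cpu_percentages(first, second).items():
    print(f"{name:8s} {pct:5.1f}")
```

On a live system the two readings would come from reading /proc/stat twice with a sleep in between, which is roughly what an interval-based tool has to do.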

mpstat is invoked as follows:
$ mpstat          show summary statistics for all CPUs combined
$ mpstat -P ALL   show statistics for each CPU separately
$ mpstat -P 1     show statistics for CPU 1 only

5. pidstat
Reports the IO, CPU, memory, and other resources used by a specific process.

6. netstat
Lists the state of network resources.

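7. sar
sar collects, reports, and saves system activity information (CPU, memory, disks, interrupts, network interfaces, TTY, kernel tables, etc.).
sar is very powerful: it collects historical statistics and saves them to text or binary files, so you can go back and look at them whenever you need to.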
