>> High but not full CPU utilization (70~90%) 

>> CPU Time spent in user application (85+% in user)

>> Low Disk utilization (5~15% for each disk)

>> Low Network utilization (10~30% per network, less than 5% collisions)

 

2. General tuning model


  Agree level of Performance to reach

->

Gather Data using monitoring tool(vmstat, iostat, netstat,...)

->

Analyze Data

->

Work from the biggest bottleneck first

 Note: System performance depends on how the system's resources are used.

3. General order of tuning


 

1st. Application Tuning

2nd. DataBase Tuning

3rd. OS Tuning(System Tuning)

 

4. Some available monitoring tools


 

- vmstat : command to view status of memory and CPU

- ps : command to find which processes are hogs

- swap : command for available swap space

- iostat : command for terminal, disk, cpu utilization

- netstat, nfsstat : command for network performance

- sar : command to view system activity (need SUNWaccr, SUNWaccu packages)

- mpstat : command to view status of multi-cpu

- truss : command to trace system calls and see what is going on

- cachefs : mechanism to speed up read-mostly NFS

- PrestoServe : application for synchronous writes (many small files, Mail server)

- DiskPak of Eagle company : application for fragmented disks

 

5. Hints: tuning a database management system


 

  • Configure disk for speed, Not capacity
  • I/O load needs many random access disks
  • 3*1.05GB is over twice as fast as 1*2.9GB
  • Use raw disk for tablespaces to reduce CPU load
  • Save inode and indirect block updates
  • Use dd | compress into filesystem for snap backup then ufsdump normally
  • Use UFS for tablespaces to reduce I/O load
  • Extra level of caching needs more RAM
  • With UFS, use PrestoServe/NVSIMM or Logging option
  • Use large shared memory area (up to 25% of RAM)
  • If uo value is upper 25%, must expand shared memory

 

 

6. Factors that determine system performance


 

  • CPU : number of CPUs
  • I/O Devices : disk, printer, terminal, transfer information
  • Memory : primary memory(RAM), secondary memory(on disk)
  • Kernel : kernel parameters (/etc/system)
  • Network

Note: When tuning, always consider the system and the network together.

 

 

7. Variables to watch closely


 

            

          

 Category             Factors
 Source Code          Algorithm, Language, Programming Model, Compiler
 Executable           Environment, Filesystem Type
 DataBase             Buffer Sizes, Indexing
 Kernel               Buffer Sizes, Paging, Tuning, Configuring
 Memory               Cache Type, Line Size and Miss Cost
 Disk                 Driver Algorithms, Disk Type, Load Balance
 Windows & Graphics   Window System, Graphics Library, Bus Throughput
 CPU                  Processor Implementation
 Multiprocessors      Load Balancing, Concurrency, Bus Throughput
 Network              Protocol, Hardware, Usage Pattern

 

8. Ten steps for first-pass system tuning


 

1st. The system will usually have a disk bottleneck.

2nd. You will be told that the system is NOT I/O bound.

3rd. After first pass tuning the system will still have a disk bottleneck.

4th. Poor NFS response times are hard to pin down.

5th. Avoid the common memory usage misconceptions.

6th. Don't panic when you see page-ins and page-outs in vmstat.

7th. Look for page scanner activity.

8th. Look for a long run queue (vmstat procs r).

9th. Look for processes blocked waiting for I/O (vmstat procs b).

10th. Look for CPU system time dominating user time.

 

9. First tuning approach


 

1st. Clear up any RAM shortage.

     => If the first round of monitoring indicates paging, add more RAM.

2nd. Make sure that processor speed, or the number of processors, is sufficient.

     => Clear out unnecessary processes.

     => Make sure that the run queue is as small as possible.

3rd. Focus I/O subsystems (disk, networks)

4th. Use "iostat -x" to monitor the disk.

     => Check busy(%b) and service time(svc_t)

5th. Continue to cycle around the tuning path until all subsystems, and indeed the machine itself, are fast enough to reach the required performance metric.

Analysis using tools and how to read their output

 

 

 

 

 

CPU Performance

 

ps command


 

How to use ps

 

#> ps

TIME : the total amount of CPU time used by the process since it began

#> ps -efl

SZ : shows the amount of virtual memory required by the process

 

Example of ps

 

#> ps
 PID TTY       TIME COMD
 346 pts/2     0:01 ksh
1029 pts/2     0:00 ps
1199 pts/2     0:01 ksh
#> ps -efl
 UID  PID PPID  C PRI NI     ADDR    SZ    WCHAN    STIME TTY TIME COMD
root    0    0 80   0 SY f01706f0     0          19:47:44 ?   0:01 sched
root    1    0 80  99 20 fc18f800   173 fc18f9c8 19:47:48 ?   0:36 /etc/init -

  

vmstat command


 How to use vmstat

 

 

#> vmstat 5

procs

r b w

r : In the run queue, waiting for processing time

b : Blocked, waiting for resources

cpu

cs us sy id

us : percentage of CPU time spent in USER mode

sy : percentage of CPU time spent in SYSTEM mode

id : percentage of CPU time spent idle

 

How to read vmstat data

 

If CPU spends most of its time in USER mode, one or more processes may be monopolizing the CPU.

=> Check "ps -ef"

A low "id" value indicates a memory-starved or I/O-bound system.

=> RECOMMENDATION : idle time "id" should be greater than 20% most of the time

=> GENERAL guide

   id < 15% : processes have to wait before being put into execution

   us > 70% : the application load may need some balancing

   sy = 30% : is a good watermark

Compute Intensive program.

=> One Compute Intensive program can push utilization rate to 100%.

If the CPU is mainly in system mode, then it is probably I/O bound.
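
The thresholds above can be turned into a small check. This is only a sketch: the function name `read_cpu_columns` is made up for illustration, and treating anything above the 30% "sy" watermark as a warning is an assumption, since the text only names 30% as a good watermark.

```python
def read_cpu_columns(us, sy, idle):
    """Apply the rule-of-thumb thresholds from the text to vmstat's us/sy/id columns."""
    notes = []
    if idle < 15:
        notes.append("id < 15%: processes wait before being put into execution")
    elif idle < 20:
        notes.append("id is below the recommended 20%")
    if us > 70:
        notes.append("us > 70%: application load may need balancing, check ps -ef")
    if sy > 30:
        # Assumption: values above the 30% watermark are worth a look.
        notes.append("sy above 30%: the system may be I/O bound")
    return notes

# The sample vmstat row in this section has us=1, sy=1, id=98.
print(read_cpu_columns(us=1, sy=1, idle=98))  # prints []
```

An empty list means none of the guide values were crossed for that sample.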

 

Example of vmstat

 

#> vmstat
procs   memory          page              disk    faults    cpu
r b w  swap free re mf pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 72836 7688  0  1  5  1  4  0  2  0  0  0  1 16 37 30  1  1 98

Note: If the sr value stays high, suspect a memory shortage.

 CPU Solutions


 If your CPU is often busy, or

If it often deals with jobs that monopolize the system,

 

=> Lower the priority of other processes (nice command).

=> Check for runaway processes, or other processes monopolizing the CPU or MEMORY.

   check "ps -ef" (TIME field)

   check "ps -efl" (SZ field)

=> Evaluate your system's memory usage.

 Note: Insufficient memory causes excessive swapping and paging.

 

MEMORY Performance

swap command


 

How to use swap

 

#> swap -l

blocks : 512 bytes block

free   : 512 bytes block

#> swap -s

How much swap space is available

 

 

Example of swap

 

#> swap -l
swapfile          dev   swaplo blocks  free
/dev/dsk/c0t3d0s1 32,25      8 156232 95200
#> swap -s
total: 33092k bytes allocated + 9104k reserved = 42196k used, 53172k available
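
To put the `swap -s` line in perspective, the used and available figures can be turned into a percentage. The sketch below parses the sample line shown above; the function name `swap_usage` is hypothetical, and it assumes the line always has the "NNNk used, NNNk available" shape seen in this example.

```python
import re

def swap_usage(line):
    """Parse one line of `swap -s` output: return (used_kb, available_kb, percent_used)."""
    used = int(re.search(r"(\d+)k used", line).group(1))
    available = int(re.search(r"(\d+)k available", line).group(1))
    percent_used = 100.0 * used / (used + available)
    return used, available, percent_used

sample = "total: 33092k bytes allocated + 9104k reserved = 42196k used, 53172k available"
used, available, pct = swap_usage(sample)
print(f"{used}k used, {available}k available, {pct:.1f}% in use")
```

For the sample line this works out to roughly 44% of swap in use.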

 

 

 

vmstat -S command


 How to use vmstat -S

 

#> vmstat -S 5

po : kbytes paged out

pi : kbytes paged in

si : number of pages swapped in per second

so : number of whole processes swapped out

 

 

 

How to read vmstat -S output

  po = 0 : no paging occurring

Note 1. check the "procs r" field

r > 1     : may indicate insufficient processor speed

r > 2 - 4 : consider adding more CPUs

Note 2. check the "procs b" field

If this column has values, run "iostat" to tune the disk I/O.

po > 0 : not enough RAM for the application

Consult the application vendor regarding the memory requirements

pi rate is NOT important

 Note: Distinguish between memory access time and disk access time.

 Example of vmstat -S

 

#> vmstat -S 5
procs   memory          page              disk    faults    cpu
r b w  swap free si so pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 53036 2848  0  0 15  8 19  0 11  0  0  0  0 52 37 92  4  3 93
0 0 0 53140 6328  0  1  5  0  0  0  0  0  0  0  0 10 21 30  1  1 98
0 0 0 53140 6328  0  0  0  0  0  0  0  0  0  0  0 31 26 62  0  1 99

 

 

 

 

sar -g command


 

How to use sar -g

 

#> sar -g 5 20

pgout/s : The average number of page-out requests per second

<= A good indication of memory performance

pgfree/s : The average number of pages per second that were added to the free list

pgscan/s : The average number of pages that needed to be scanned in order to find more memory

 

 

How to read sar data

 

consistently pgout = 0 : NO memory problem

pgout > 0 over several intervals : System performance is suffering

pgfree and pgscan should be small (less than 5)

 

 

Example of sar -g

 

#> sar -g 5 20

SunOS hostname 5.5.1 Generic_103640-12 sun4u 11/12/98

16:59:36 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
16:59:41    1.39     5.78    23.90    79.88     0.00
16:59:46    1.60   113.77    27.74   253.69     0.00
16:59:51    0.80     2.00    13.80    81.40     0.00
16:59:56    0.20     0.20     0.20     0.00     0.00
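
The reading rules for `sar -g` can be applied to the sample rows above. This is a sketch only: the function name `memory_pressure` is invented for illustration, and combining "pgout > 0" with "pgfree or pgscan at 5 or more" into one flag is one possible interpretation of the guidance, not something the text prescribes.

```python
SAMPLE = """\
16:59:41    1.39     5.78    23.90    79.88     0.00
16:59:46    1.60   113.77    27.74   253.69     0.00
16:59:51    0.80     2.00    13.80    81.40     0.00
16:59:56    0.20     0.20     0.20     0.00     0.00
"""

def memory_pressure(rows, limit=5.0):
    """Return timestamps where pgout/s > 0 and pgfree/s or pgscan/s reaches the limit."""
    flagged = []
    for line in rows.strip().splitlines():
        time, pgout, ppgout, pgfree, pgscan, ufs_ipf = line.split()
        if float(pgout) > 0 and (float(pgfree) >= limit or float(pgscan) >= limit):
            flagged.append(time)
    return flagged

print(memory_pressure(SAMPLE))  # prints ['16:59:41', '16:59:46', '16:59:51']
```

The last interval is not flagged because its pgfree/s and pgscan/s values are both small.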

 

Memory Solutions


 

1st. Two memory problems

=> The system spends a lot of time paging and/or swapping.

=> The system runs out of swap space.

2nd. Check the SZ field from "ps -efl"

3rd. For the first problem:

=> Add physical memory until paging and swapping stop.

 

 

 

 

 

DISK Performance

 

Factors for Disk Performance


 

  • Speed of disk
    • transfer rate
    • seek time
    • rotation latency
  • Load balance across multiple disks
  • Access type
    • single user vs multi user
    • sequential vs random
  • Memory
    • disk buffers, used when transferring information to and from disk, are stored in memory

df -k command


 How to use df

 

#> df

capacity : How much of the file system's total capacity has been used

 How to read df output

 

%capacity = 100%

=> remove core and any s/w packages

=> add more disk space or move files to another partition

 

Example of df

 

#> df -k
Filesystem        kbytes   used avail capacity Mounted on
/dev/dsk/c0t3d0s0  21615  14909  4546      77% /
/dev/dsk/c0t3d0s6 240463 211348  5075      98% /usr
/proc                  0      0     0       0% /proc
 

 

iostat command


 

How to use iostat

 

#> iostat 5

disk

serv : average service time, in milliseconds

cpu

us : time spent in user mode

sy : time spent in system mode

wt : time spent waiting I/O

#> iostat -x 5

svc_t : service time

%w    : percentage of time the queue is not empty

%b    : percentage of time the disk busy

 

How to read iostat -x output

 

%b

5%  > %b : ignore

30% < %b : be concerned about anything

60% < %b : need fixing

svc_t

if %b < 5%, svc_t : ignore

if %w has values, 10 - 50ms : OK

                  100 - 150 : Need fixing
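
The %b and svc_t guide values can be combined into one helper. This is a sketch under stated assumptions: the name `disk_status` is hypothetical, and the text leaves the 50 - 100 ms range undefined, so treating it as "be concerned" is a guess.

```python
def disk_status(busy_pct, svc_t_ms):
    """Classify one disk from iostat -x using the %b and svc_t guide values."""
    if busy_pct < 5:
        return "ignore"            # svc_t is not meaningful on a nearly idle disk
    if busy_pct > 60 or svc_t_ms >= 100:
        return "needs fixing"
    if busy_pct > 30 or svc_t_ms > 50:
        return "be concerned"      # assumption: 50 - 100 ms is treated as a warning
    return "ok"

# sd0 in the iostat -x example below shows svc_t 99.5 ms but %b 0.
print(disk_status(busy_pct=0, svc_t_ms=99.5))  # prints ignore
```

The sd0 case shows why %b is checked first: a long service time on an idle disk is not a problem.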

 

 

 

 

 

iostat -D command


 

How to use iostat -D

 

#> iostat -D 5

util : percentage of disk utilization

We can find the load balance between disks.

 

Example of iostat

 

#> iostat 5
     tty          fd0          sd0          sd1          sd2         cpu
tin tout Kps tps serv Kps tps serv Kps tps serv Kps tps serv us sy wt id
  0  560   0   0    0   0   0   99   1   0   49  10   2   54  4  4  4 88
  0   16   0   0    0   0   0    0   0   0    0   0   0    0  0  0  0 99

#> iostat -x 5
                    extended disk statistics
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
fd0  0.0 0.0  0.0  0.0  0.0  0.0   0.0  0  0
sd0  0.0 0.0  0.2  0.1  0.0  0.0  99.5  0  0

 

 


#> iostat -D
         fd0          sd0          sd1          sd2
rps wps util rps wps util rps wps util rps wps util
  0   0  0.0   0   0  0.2   0   0  0.4   1   0  3.1

 

sar -a -b -d command


 

How to use sar

 

#> sar -a 5 3

-a : Report use of file access system routines (report on file access)

iget/s

namei/s

dirbk/s

Average

#> sar -b 5 3

-b : Report buffer activity (report on disk buffers)

%rcache : Fraction of logical reads found in the system buffers

%wcache : Fraction of logical writes found in the system buffers

 

#> sar -d 5 3

-d : Report activity for each block device (report on disk transfers)

r+w/s : read + write per second

%busy : Percentage of time the device spends servicing a transfer

blk/s : Number of 512-byte blocks transferred to device, per second

 

How to read sar data

 

sar -a

The larger the values, the more time the kernel is spending to ACCESS user files

This report is USEFUL for understanding "HOW disk-dependent a system is"

 

sar -b

%rcache < 90% and %wcache < 65%

=> may be possible to improve performance by increasing the buffer space

 

sar -d

%busy > 85% : high utilization, load problem

r+w/s > 65 : overload

 

 

Example of sar output

 

#> sar -a 5 3
16:59:36 iget/s namei/s dirbk/s
16:59:41      0       0       0
16:59:46      8      26      15
16:59:51    271     297     288

Average      93     108     101
#> sar -b 5 3
16:59:36 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
16:59:41      19      57      67      22      26      17       0       0
16:59:46      23      73      69      18      23      20       0       0
16:59:51      25      51      51      15      27      46       0       0

Average       22      60      63      18      25      28       0       0
#> sar -d 5 3
13:21:07 device %busy avque r+w/s blks/s avwait avserv
13:21:12 sd1       52   0.8    30    216    4.8   21.1
         sd3        6   0.1     1     14    0.0   45.4
average
         sd1       32   0.5    17    187    3.5   23.9
         sd2       45   0.4    24    208    0.0   18.9
         sd3        4   0.0     1      9    0.0   51.1

 

DISK Solutions


 

1st. Check for overloaded file systems (above 90 or 95% capacity)

=> Clean out unused files from /var/adm, /var/adm/sa and /var/lp/logs.

=> Clean out the core files.

=> Find files that have not been modified for more than 60 days.

   #> find /home -type f -mtime +60 -print

   #> find /home -name core -exec rm {} \;

2nd. If you have more than one disk,

=> Distribute the file systems for a more balanced load between the disks.

=> Try reducing disk seeking through careful planning of data positioning on disk.

   (I/O on the outer sectors can be far faster)

3rd. Consider adding memory.

=> Additional memory reduces swapping and paging, and allows an expanded buffer pool.

4th. Consider buying faster disks.

5th. Make sure disks are not overloading the SCSI controller.

=> Keep SCSI bus utilization below 60%.

6th. Consider adding disks.

=> A disk should not be busy more than 40 - 60% of the time.

   (%b : iostat -xct 5 and %busy : sar -d 5 3)

7th. Consider using an in-memory file system for the /tmp directory.

=> It's the default in Solaris 2.x.

 

NETWORK Performance

 

Overview of Network Performance


 

Congested or Collision - resend

A network has limited bandwidth, and so can only transmit a certain amount of data

EtherNet

10Mbit/sec

Minimum packet size : 64 bytes (about 14,880 packets/sec)

Maximum packet size : 1518 bytes

Inter-packet gap : 9.6 microseconds

30-40% utilization because of collision contention

Latency

Not as important as it is for disks

Must consider that remote system has resources including disk
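
The Ethernet packet-rate figure can be checked from the numbers given above. The sketch below assumes an 8-byte preamble in front of every frame, which the text does not mention; with that assumption, 64-byte packets on a 10 Mbit/sec wire come out at roughly 14,880 packets per second.

```python
BIT_RATE = 10_000_000        # 10 Mbit/s Ethernet
PREAMBLE_BYTES = 8           # assumption: preamble plus start-of-frame delimiter
INTER_PACKET_GAP_S = 9.6e-6  # inter-packet gap from the text

def max_packets_per_second(frame_bytes):
    """Upper bound on back-to-back frames of the given size on an otherwise idle wire."""
    wire_time = (frame_bytes + PREAMBLE_BYTES) * 8 / BIT_RATE + INTER_PACKET_GAP_S
    return 1.0 / wire_time

for size in (64, 1518):
    print(size, "bytes:", int(max_packets_per_second(size)), "packets/sec")
```

This is a theoretical ceiling for a single sender; real shared Ethernet saturates far earlier because of collisions, as the 30-40% utilization figure indicates.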

 

NFS

UDP : Common protocol in use, being part of TCP/IP and allows fast network throughput with little overhead.

Logical packet size : 9Kbytes

On ethernet : 6 * 1518 bytes

After a collision, ALL of these Ethernet packets have to be resent

Slower remote server

The remote server is CPU bound

Network Monitoring Tool

nfsstat

netstat

snoop

ping

spray

 

 

ping command


 

How to use ping

 

#> ping

Send a packet to a host on the network.

-s : send one packet per second

 

How to read ping -s output

 

Two SPARCstations on a quiet Ethernet always respond in less than 1 millisecond

 

Example of ping

 

#> ping -s host
PING host: 56 data bytes
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms


----host PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max = 1/2/7

 spray command


 

How to use spray

 

#> spray

Send a one-way stream of packets to a host

Reports how many were received and the transfer rate.

-c : count(number) of packets

-d : specifies the delay, in microseconds

     Default: 9.6 microsecond

-l : specifies the Length(size) of the packet

 

How to read spray output

 

If many packets are dropped even when you use the -d option,

=> Check hardware, such as loose cables or missing termination

=> Check for a congested network

=> Use the netstat command to get more information

 

 

Example of spray

 

#> spray -d 20 -c 100 -l 2048 host

sending 100 packets of length 2048 to host ...
        no packets dropped by host
        560 packets/sec, 1147576 bytes/sec

 

 

netstat -i command


 

How to use netstat -i

 

#> netstat -i 5

errs : the number of errors

packets : the number of packets

colls : the number of collisions

* Collision percentage rate = colls/output packets * 100

 

How to read netstat data

 

collision percentage > 5% (one system)

=> Check the network interface and cabling

collision percentage > 5% (all systems)

=> The network is congested

errs field has data

=> Suspect BAD hardware generating illegal sized packets

=> Check Repeated Network

 

Example of netstat

 

#> netstat -i 5
    input   le0      output         input  (Total)   output
packets errs packets errs colls packets errs packets errs colls
71853   1    27270   8    4839  72526   1    27943   8    4839
7       0    0       0    0     7       0    0       0    0   
14      0    0       0    0     14      0    0       0    0   
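
The collision percentage formula given earlier (colls / output packets * 100) can be applied to the first row of this example, which holds the cumulative counters for le0. The function name `collision_rate` is only illustrative.

```python
def collision_rate(output_packets, colls):
    """Collision percentage = colls / output packets * 100."""
    if output_packets == 0:
        return 0.0  # nothing sent, so no collisions to report
    return 100.0 * colls / output_packets

# le0 in the sample: 27270 output packets, 4839 collisions.
rate = collision_rate(27270, 4839)
print(f"{rate:.1f}% collisions")
```

At roughly 17.7%, this interface is well above the 5% guide value, so the interface, the cabling and overall network load would all be worth checking.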

 

snoop command


 

How to use snoop

 

#> snoop

Capture packets from the network

Display their contents

 

Example of snoop

 #> snoop host1
#> snoop -o filename host1 host2
#> snoop -i filename -t r | more
#> snoop -i filename -p99,108
#> snoop -i filename -v -p101
#> snoop -i filename rpc nfs and host1 and host2

 nfsstat -c command


 

How to use nfsstat -c

 

#> nfsstat -c

Display a summary of servers and client statistics

Can be used to IDENTIFY NFS problems

retrans : Number of remote procedure calls(RPCs) that were retransmitted

badxids : Number of times that a duplicate acknowledgement was received for a single NFS request

timeout : Number of calls that timed out

readlink : Number of reads to symbolic links

 

How to read nfsstat data

 

retrans > 5% of calls : maybe a network problem

=> Look for network congestion

=> Look for overloaded servers

=> Check the Ethernet interface

high badxid, as well as timeout : the remote server is slow

=> Increase the time-out period

   #> mount -F nfs -o rw,soft,timeo=15 host:/home /home

readlink > 10% of calls : too many symbolic links

 

Example of nfsstat

 

#> nfsstat -c

Client rpc:
calls   badcalls retrans badxid timeout wait newcred timers
13185   0        8       0      8       0    0       50

Client nfs:
calls   badcalls nclget  nclcreate
13147   0        13147   0
null    getattr  setattr root   lookup   readlink read
0  0%   794  6%  10   0% 0  0%  2141 16% 2720 21% 6283 48%
wrcache write    create  remove rename   link     symlink
0  0%   581  4%  33   0% 29  0% 4  0%    0  0%    0  0%
mkdir   rmdir    readdir statf
0  0%   0  0%    539  4% 13 0%
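
The retrans and readlink checks can be run against the counters in this example. This is a minimal sketch: the function name `nfs_client_findings` is hypothetical, and the badxid/timeout rule is left out because the text does not give a numeric threshold for it.

```python
def nfs_client_findings(calls, retrans, readlink_pct):
    """Apply the retrans (> 5%) and readlink (> 10%) guide values from the text."""
    findings = []
    if calls and 100.0 * retrans / calls > 5.0:
        findings.append("retrans > 5% of calls: look for network congestion or overloaded servers")
    if readlink_pct > 10.0:
        findings.append("readlink > 10% of calls: too many symbolic links")
    return findings

# Sample above: 13185 RPC calls, 8 retransmissions, readlink at 21% of NFS calls.
for finding in nfs_client_findings(calls=13185, retrans=8, readlink_pct=21):
    print(finding)
```

For the sample, retransmissions are far below 5% of calls, so only the readlink finding is reported.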

 Network Solutions


 

1st. Consider adding the Prestoserve NFS Write accelerator.

=> If the write percentage from nfsstat -s is above 15%, consider installing it.

2nd. Subnetting

=> If your network is congested, consider subnetting.

=> That is, if the collision rate is above 5%, subnet.

3rd. Install a bridge

=> If your network is congested and physical segmentation is NOT possible.

=> Isolate physical segments of a busy network.

4th. Install local disks in diskless machines.

 

>> Bottlenecks

 

 

I/O Bottleneck

Detection


         

 Symptom                               sar field
 Uneven workload                       r+w/s, avque
 Many threads blocked waiting on I/O   %wio
 High disk utilization rate            %busy
 Active disk with no free space

 

 

Solutions


Balance the disk load

Use mmap instead of read and write

Use shared libraries

Put busier filesystems on smaller disks

Organize I/O requests to be more contiguous

Add more/faster disks

Memory Bottleneck

 

Detection


 Symptom                                         sar field
 Steady page-out activity                        ppgout/s
 Scan rate is non-zero (page daemon is active)   pgscan/s
 Swapper is active                               swpot/s, swpq-sz
 Free memory is at or below lotsfree             freemem, pgfree/s
 Hardware cache misses

 

 

 

 

Solutions


 

Modify process load

Tune paging parameters

Add more memory

Use shared libraries

Use memcntl to use memory more efficiently within application

Analyze locality of reference in applications 

Set memory limits - setrlimit

 

CPU Bottleneck

 

       

 Symptom                                   sar field
 CPU idle time is low                      %usr, %sys, %idle
 Threads waiting on run queue              runq-sz, %runocc
 Slower response/interactive performance

 

 Solutions


 

Use priocntl / nice to modify process/thread priorities

Modify dispatch parameter tables

Modify applications to use system calls more efficiently

System daemons

Device interrupts

Modify/limit process load

Custom device drivers

More, faster CPUs

>> Rules Table

 

Network Rules


 

Notation used in tables


Rules

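Measurements are named by combining the command name, a ".", and the variable name. For example, disk service time measured with "iostat -x" at 30-second intervals is written as "iostat-x30.svc_t".

Variables are combined with the logical operators "&&", "||", and "==", and for brevity ranges are written as "0 <= X < 100".

Levels

The level in each table indicates the severity of the condition, as shown in the table below.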

 

Level   Description
white   low usage
blue    under-utilization/imbalance of resource
green   target utilization/no problem
amber   warning level
red     critical level that needs to be fixed
black   problems that can prevent your system from working

 

Actions

The action to take for each rule in the tables is listed, along with a brief note on the problem and related information.

  

Rules based upon ethernet collisions


 

Rule for each network interface, with its Level and Action

Rule   : (0 < netstat-i30.output.packets < 10) &&
         (100*netstat-i30.output.colls/netstat-i30.output.packets < 0.5%) &&
         (other nets white or green)
Level  : White
Action : No Problem

Rule   : (0 < netstat-i30.output.packets < 10) &&
         (100*netstat-i30.output.colls/netstat-i30.output.packets < 0.5%) &&
         (other nets amber or red)
Level  : Blue
Action : Inactive Net

Rule   : (10 <= netstat-i30.output.packets) &&
         (0.5% <= 100*netstat-i30.output.colls/netstat-i30.output.packets < 2.0%)
Level  : Green
Action : No Problem

Rule   : (10 <= netstat-i30.output.packets) &&
         (2.0% <= 100*netstat-i30.output.colls/netstat-i30.output.packets < 5.0%)
Level  : Amber
Action : Busy Net

Rule   : (10 <= netstat-i30.output.packets) &&
         (5.0% <= 100*netstat-i30.output.colls/netstat-i30.output.packets)
Level  : Red
Action : Busy Net

Rule   : network type is not "ie", "le", "ne", or "qe"; it is "bf" or "nf".
Level  : Green
Action : Not Ether

 

Inactive Net

An inactive network is a waste of throughput when other networks are overloaded. Rebalance the load so that all networks are used more evenly.

Busy Net

A network with too many collisions reduces throughput and increases response time for users. Move some of the load to inactive networks if there are any. Add more Ethernets or upgrade to a faster interface type like FDDI, 100Mbit Ethernet or ATM.

Not Ether

If the last letter of the interface name is not "e" then this is not an Ethernet, so the collision-based network performance rule should not be used.
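
The collision rules for one Ethernet interface can be sketched as a function. The name `ethernet_level` and the `other_nets_busy` flag are made up for illustration, and returning None for combinations the rules do not cover (for example, 10 or more packets with under 0.5% collisions) is an assumption about how to handle those gaps.

```python
def ethernet_level(output_packets, colls, other_nets_busy=False):
    """Return (level, action) for one interface, or None when no rule matches."""
    if output_packets <= 0:
        return None
    collision_pct = 100.0 * colls / output_packets
    if output_packets < 10:
        if collision_pct < 0.5:
            if other_nets_busy:          # other nets are amber or red
                return ("blue", "Inactive Net")
            return ("white", "No Problem")
        return None                      # not covered by the rules
    if collision_pct < 0.5:
        return None                      # not covered by the rules
    if collision_pct < 2.0:
        return ("green", "No Problem")
    if collision_pct < 5.0:
        return ("amber", "Busy Net")
    return ("red", "Busy Net")

# Hypothetical 30-second sample: 1000 output packets, 30 collisions (3%).
print(ethernet_level(1000, 30))  # prints ('amber', 'Busy Net')
```

The function applies only to Ethernet interfaces; for other interface types the "Not Ether" rule means collisions should not be evaluated at all.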

 

CPU Rules


 


Rules for SunOS4 and Solaris2


 

CPU Rule                             Level   Action
0 == vmstat30.r                      White   CPU Idle
0 < (vmstat30.r / ncpus) < 3.0       Green   No problem
3.0 <= (vmstat30.r / ncpus) < 5.0    Amber   CPU Busy
5.0 <= (vmstat30.r / ncpus)          Red     CPU Busy
mpstat30.smtx < 200                  Green   No problem
200 <= mpstat30.smtx < 400           Amber   Mutex Stall
400 <= mpstat30.smtx                 Red     Mutex Stall

 

CPU Idle

The CPU power of this system is underutilized. Fewer or less powerful CPUs could be used to do this job.

CPU Busy

There is insufficient CPU power and jobs are spending an increasing amount of time in the queue before being assigned to a CPU. This reduces throughput and increases interactive response times.

Mutex Stall

If the number of stalls per CPU per second exceeds the limit there is mutex contention happening in the kernel which wastes CPU time and degrades multiprocessor scaling.
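
The CPU rules can be sketched as two small functions, one for the run queue and one for mutex stalls. The names `run_queue_level` and `mutex_level` are hypothetical, and a run queue of exactly 5.0 per CPU is treated as red here, which is an assumption about the boundary.

```python
def run_queue_level(r, ncpus):
    """Classify the vmstat run queue (procs r) relative to the number of CPUs."""
    if r == 0:
        return ("white", "CPU Idle")
    per_cpu = r / ncpus
    if per_cpu < 3.0:
        return ("green", "No problem")
    if per_cpu < 5.0:
        return ("amber", "CPU Busy")
    return ("red", "CPU Busy")           # assumption: exactly 5.0 counts as red

def mutex_level(smtx):
    """Classify the mpstat smtx column (mutex stalls per CPU per second)."""
    if smtx < 200:
        return ("green", "No problem")
    if smtx < 400:
        return ("amber", "Mutex Stall")
    return ("red", "Mutex Stall")

# Hypothetical readings: 8 runnable processes on 2 CPUs, 250 mutex stalls/sec.
print(run_queue_level(8, 2))   # prints ('amber', 'CPU Busy')
print(mutex_level(250))        # prints ('amber', 'Mutex Stall')
```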

>> Tunable Kernel Parameters

 

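Kernel variables depend heavily on the operating system release, so a setting that works well in one release may not work properly in another.

The table below shows the recommended variable values for Solaris 2.3 and 2.4.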

 

Name              Default                  Min   Max
maxusers          MB available (physmem)   8     2048
pt_cnt            48                       48    3000
ncsize            ~(maxusers*17) + 90      226   34906
ufs_ninode        ~(maxusers*17) + 90      226   34906
autoup            30 sec
tune_t_fsflushr   5 sec

For autoup and tune_t_fsflushr: If (fsflush > 5% of CPU time)

=> double autoup

=> tune_t_fsflushr += 5

 

Note: Do not set the "autoup" variable to 120 seconds or more.

 

 

maxusers : kernel-tunable variable


 

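In most cases the maxusers variable is set to the number of megabytes of system memory. It can be set as high as 2048 through the /etc/system file, but the system itself never sets maxusers above 1024.

 The following kernel variables are affected by the value of maxusers on a SunOS 5.4 system.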

 

 max_nprocs = ( 10 + 16 * maxusers )

 ufs_ninode = ( max_nprocs + 16 + maxusers ) + 64

 ndquot = ( ( maxusers * NMOUNT ) / 4 ) + max_nprocs

 maxuprc = ( max_nprocs - 5 )

 ncsize = ( max_nprocs + 16 + maxusers ) + 64
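
The formulas above can be evaluated for a given maxusers to see how the dependent variables scale. This is only a sketch: the function name `derived_limits` is invented, and NMOUNT is passed in as a parameter because its value is not given in the text.

```python
def derived_limits(maxusers, nmount):
    """Evaluate the SunOS 5.4 formulas listed above for one maxusers value.

    nmount stands in for the kernel constant NMOUNT, whose value is not stated here.
    """
    max_nprocs = 10 + 16 * maxusers
    return {
        "max_nprocs": max_nprocs,
        "ufs_ninode": (max_nprocs + 16 + maxusers) + 64,
        "ndquot": (maxusers * nmount) // 4 + max_nprocs,
        "maxuprc": max_nprocs - 5,
        "ncsize": (max_nprocs + 16 + maxusers) + 64,
    }

# maxusers at its minimum (8) and maximum (2048); nmount=40 is a placeholder value.
for maxusers in (8, 2048):
    limits = derived_limits(maxusers, nmount=40)
    print(maxusers, limits["ncsize"], limits["ufs_ninode"])
```

With maxusers at 8 and 2048 the ncsize and ufs_ninode results are 226 and 34906, which matches the Min and Max columns of the parameter table and the approximation (maxusers*17) + 90.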

 

>> Tuning Tips

 

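▶ Check one thing at a time.

▶ Spend most of your time on the factor that has the greatest impact.

▶ Consider all of the following together.

  • Disk Access
  • CPU Access
  • Main Memory Access
  • Network I/O devices

▶ Keep in mind that tuning means evenly redistributing a load that is skewed to one side.

Check for disk bottlenecks: if a disk is more than 30% busy and its service time is over 50 ms, spread the data elsewhere or stripe it with a tool such as DiskSuite.

▶ Do not believe claims that the disks are fine. Examine them carefully with "iostat -x 30".

▶ If tuning has improved system performance, check disk busy again.

An NFS client waits for the server as idle time, not as I/O wait. On a slow NFS client, use "nfsstat -m" to determine whether the problem lies in the network or in the NFS server.

Do not be concerned about the free RAM value reported by vmstat. Only about 1/6 of the total stays free.

Do not be concerned about high page-in and page-out values in vmstat. All file system I/O is done through page-ins and page-outs.

▶ If the run queue length or the load average exceeds four times the number of CPUs, you probably need more CPUs.

If the vmstat procs b value is as large as the procs r value, check whether the disks are slow.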

 

 
