
Requirements of an Ideal System

Unknown user 2015. 12. 29. 22:10


>> High but not full CPU utilization (70~90%)

>> CPU Time spent in user application (85+% in user)

>> Low Disk utilization (5~15% for each disk)

>> Low Network utilization (10~30% per network, less than 5% collisions)
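The four targets above can be written down as a simple check. The Python sketch below is only an illustration and is not part of any Solaris tool. The function name check_ideal and its parameters are hypothetical, and you are assumed to have already collected the percentages from vmstat, iostat and netstat. It reads "85+% in user" as the user-mode share of busy CPU time, and it only flags disks and networks that are above their ranges.

```python
def check_ideal(cpu_util, user_share, disk_utils, net_utils, collision_pct):
    """Compare measured values (all in percent) with the ideal-system targets.

    cpu_util      -- total CPU utilization (us + sy)
    user_share    -- share of busy CPU time spent in user mode (assumed reading)
    disk_utils    -- per-disk utilization, e.g. %b from iostat -x
    net_utils     -- per-network utilization
    collision_pct -- Ethernet collision rate
    Returns a list describing every target that is not met.
    """
    problems = []
    if not 70 <= cpu_util <= 90:
        problems.append("CPU utilization outside 70-90%")
    if user_share < 85:
        problems.append("less than 85% of CPU time in user mode")
    for i, util in enumerate(disk_utils):
        if util > 15:
            problems.append(f"disk {i} above 15% utilization")
    for i, util in enumerate(net_utils):
        if util > 30:
            problems.append(f"network {i} above 30% utilization")
    if collision_pct >= 5:
        problems.append("collision rate 5% or higher")
    return problems


print(check_ideal(80, 90, [10, 12], [20], 2))  # prints []
```

An overloaded sample such as check_ideal(95, 60, [40], [20], 6) returns one message for each missed target.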

2. General Tuning Model

Agree on the level of performance to reach

Gather data using monitoring tools (vmstat, iostat, netstat, ...)

Analyze Data

Work from the biggest bottleneck first

Note: System performance depends on how the system's resources are used.

3. General Order of Tuning

1st. Application Tuning

2nd. Database Tuning

3rd. OS Tuning(System Tuning)

4. Some Available Monitoring Tools

- vmstat : command to view the status of memory and CPU

- ps : command to find which processes are hogs

- swap : command for available swap space

- iostat : command for terminal, disk and CPU utilization

- netstat, nfsstat : commands for network performance

- sar : command to view system activity (requires the SUNWaccr and SUNWaccu packages)

- mpstat : command to view per-CPU statistics on multiprocessor systems

- truss : command to trace system calls and see what a process is doing

- cachefs : mechanism to speed up read-mostly NFS file systems

- PrestoServe : product that accelerates synchronous writes (many small files, mail servers)

- DiskPak from Eagle : tool for fragmented disks

5. Hints: Database Management System Tuning

- Configure disks for speed, not capacity
  - I/O load needs many random-access disks
  - Three 1.05 GB disks are over twice as fast as one 2.9 GB disk
- Use raw disk for tablespaces to reduce CPU load
  - Saves inode and indirect block updates
  - Use dd | compress into a filesystem for a snapshot backup, then run ufsdump normally
- Use UFS for tablespaces to reduce I/O load
  - The extra level of caching needs more RAM
  - With UFS, use PrestoServe/NVSIMM or the logging option
- Use a large shared memory area (up to 25% of RAM)
  - If the uo value is above 25%, the shared memory must be expanded

6. Factors that Determine System Performance

CPU : number of CPUs
I/O Devices : disk, printer, terminal, transfer information
Memory : primary memory(RAM), secondary memory(on disk)
Kernel : kernel parameters (/etc/system)
Network

Note: When tuning, always consider the system and the network together.

7. Variables to Watch Carefully

Item	Variables
Source Code	Algorithm, Language, Programming Model, Compiler
Executable	Environment, Filesystem Type
Database	Buffer Sizes, Indexing
Kernel	Buffer Sizes, Paging, Tuning, Configuring
Memory	Cache Type, Line Size and Miss Cost
Disk	Driver Algorithms, Disk Type, Load Balance
Windows & Graphics	Window System, Graphics Library, Bus Throughput
CPU	Processor Implementation
Multiprocessors	Load Balancing, Concurrency, Bus Throughput
Network	Protocol, Hardware, Usage Pattern

8. Ten Steps for First-Pass System Tuning

1st. The system will usually have a disk bottleneck.

2nd. You will be told that the system is NOT I/O bound.

3rd. After first pass tuning the system will still have a disk bottleneck.

4th. Poor NFS response times are hard to pin down.

5th. Avoid the common memory usage misconceptions.

6th. Don't panic when you see page-ins and page-outs in vmstat.

7th. Look for page scanner activity.

8th. Look for a long run queue (vmstat procs r).

9th. Look for processes blocked waiting for I/O (vmstat procs b).

10th. Look for CPU system time dominating user time.

9. First Tuning Approach

1st. Clear up any RAM shortage.

=> If monitoring first shows paging, add more RAM.

2nd. Make sure that processor speed, or the number of processors, is sufficient.

=> Clear out unnecessary processes.

=> Make sure that the run queue is as small as possible.

3rd. Focus on the I/O subsystems (disks, networks).

4th. Use "iostat -x" to monitor the disks.

=> Check busy (%b) and service time (svc_t).

5th. Continue to cycle around the tuning path until all subsystems, and indeed the machine itself, are fast enough to reach the required performance metric.

Analysis Using Tools & How to Read Them

CPU Performance

ps command

How to use ps

#> ps

TIME : the total amount of CPU time used by the process since it began

#> ps -efl

SZ : shows the amount of virtual memory required by the process

Example of ps

#> ps
PID TTY       TIME COMD
346 pts/2     0:01 ksh
1029 pts/2     0:00 ps
1199 pts/2     0:01 ksh
#> ps -efl
UID PID PPID C PRI NI     ADDR    SZ    WCHAN    STIME TTY TIME COMD
root    0    0 80   0 SY f01706f0     0          19:47:44 ?   0:01 sched
root    1    0 80 99 20 fc18f800   173 fc18f9c8 19:47:48 ?   0:36 /etc/init -

vmstat command

How to use vmstat

#> vmstat 5

procs

r b w

r : In the run queue, waiting for processing time

b : Blocked, waiting for resources

cpu

cs us sy id

us : percentage of CPU time spent in USER mode

sy : percentage of CPU time spent in SYSTEM mode

id : percentage of CPU time spent idle

How to read vmstat data

▶ If CPU spends most of its time in USER mode, one or more processes may be monopolizing the CPU.

=> Check "ps -ef"

▶ A low "id" value indicates a memory-starved or I/O-bound system.

=> RECOMMENDATION: idle time ("id") should be greater than 20% most of the time.

=> General guidelines:

id < 15% : processes have to wait before being put into execution

us > 70% : the application load may need some balancing

sy = 30% : a good water mark

▶ Compute-intensive programs

=> A single compute-intensive program can push utilization to 100%.

▶ If the CPU is mainly in system mode, then it is probably I/O bound.

Example of vmstat

#> vmstat
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 72836 7688 0 1 5 1 4 0 2 0 0 0 1 16 37 30 1 1 98

Note: If the sr value stays high, suspect a memory shortage.
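The vmstat guidelines above can be applied mechanically to a sample line. This is a minimal Python sketch and the helper names are hypothetical. It takes the CPU columns by position because the faults section also has a "sy" column. It only encodes the rules stated in this section: id below 15% or 20%, us above 70%, sy above 30%, and more system than user time.

```python
def cpu_fields(vmstat_line):
    """Return (us, sy, id) from one vmstat data line.

    The last three columns are the cpu us, sy and id values. The faults
    section also has a "sy" column (system calls), so the fields are
    taken by position instead of by header name.
    """
    us, sy, idle = (int(x) for x in vmstat_line.split()[-3:])
    return us, sy, idle


def read_vmstat_cpu(us, sy, idle):
    """Apply the vmstat CPU guidelines above to one sample (percentages)."""
    notes = []
    if idle < 15:
        notes.append("id < 15%: processes wait before being put into execution")
    elif idle < 20:
        notes.append("id below the recommended 20%")
    if us > 70:
        notes.append("us > 70%: application load may need balancing (check ps -ef)")
    if sy > 30:
        notes.append("sy above the 30% water mark")
    if sy > us:
        notes.append("mainly system mode: probably I/O bound")
    return notes


# Data line from the vmstat example above
sample = "0 0 0 72836 7688 0 1 5 1 4 0 2 0 0 0 1 16 37 30 1 1 98"
print(read_vmstat_cpu(*cpu_fields(sample)))  # prints []
```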

CPU Solutions

If your CPU is often busy, or

If it often deals with jobs that monopolize the system,

=> Lower the priority of other processes (nice command).

=> Check for runaway processes, or other processes monopolizing the CPU or MEMORY.

check "ps -ef" (TIME field)

check "ps -efl" (SZ field)

=> Evaluate your system's memory usage.

Note: Insufficient memory causes excessive swapping and paging.

MEMORY Performance

swap command

How to use swap

#> swap -l

blocks : size of the swap area, in 512-byte blocks

free : free space, in 512-byte blocks

#> swap -s

Shows how much swap space is available

Example of swap

#> swap -l
swapfile dev swaplo blocks free
/dev/dsk/c0t3d0s1 32,25 8 156232 95200
#> swap -s
total: 33092k bytes allocated + 9104k reserved = 42196k used, 53172k available
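If you collect swap figures from many machines, it helps to parse the swap -s summary line. The following is a sketch that assumes the exact line format shown above, since other releases may word it differently. The names parse_swap_s and blocks_to_kb are hypothetical. blocks_to_kb converts the 512-byte blocks that swap -l reports.

```python
import re

# Matches the summary line printed by `swap -s` (format as shown above)
SWAP_S = re.compile(
    r"total: (\d+)k bytes allocated \+ (\d+)k reserved = (\d+)k used, (\d+)k available"
)


def parse_swap_s(line):
    """Parse one line of `swap -s` output into kilobyte figures."""
    m = SWAP_S.search(line)
    if m is None:
        raise ValueError("unrecognized swap -s output")
    allocated, reserved, used, available = (int(g) for g in m.groups())
    return {
        "allocated_kb": allocated,
        "reserved_kb": reserved,
        "used_kb": used,
        "available_kb": available,
        "used_pct": 100.0 * used / (used + available),
    }


def blocks_to_kb(blocks):
    """swap -l reports sizes in 512-byte blocks."""
    return blocks * 512 // 1024


line = "total: 33092k bytes allocated + 9104k reserved = 42196k used, 53172k available"
print(parse_swap_s(line))
print(blocks_to_kb(156232), "KB of swap on /dev/dsk/c0t3d0s1")
```

For the example line this gives 42196 KB used and 53172 KB available, so roughly 44% of swap is in use.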

vmstat -S command

How to use vmstat -S

#> vmstat -S 5

po : kbytes paged out

pi : kbytes paged in

si : number of pages swapped in per second

so : number of whole processes swapped out

How to read vmstat -S output

▶ po = 0 : no paging occurring

Note 1. Check the "procs r" field

r > 1 : processor speed may be a limiting factor

r > 2 - 4 : consider adding more CPUs

Note 2. Check the "procs b" field

If this column has values, run "iostat" to tune the disk I/O.

▶ po > 0 : not sufficient RAM for the application

Consult the application vendor regarding the memory requirements

▶ pi rate is NOT important

Note: Distinguish between memory access time and disk access time.

Example of vmstat -S

#> vmstat -S 5
procs memory page disk faults cpu
r b w swap free si so pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
0 0 0 53036 2848 0 0 15 8 19 0 11 0 0 0 0 52 37 92 4 3 93
0 0 0 53140 6328 0 1 5 0 0 0 0 0 0 0 0 10 21 30 1 1 98
0 0 0 53140 6328 0 0 0 0 0 0 0 0 0 0 0 31 26 62 0 1 99

sar -g command

How to use sar -g

#> sar -g 5 20

pgout/s : The average number of page-out requests per second

<= A good indication of memory performance

pgfree/s : The average number of pages per second that were added to the free list

pgscan/s : The average number of pages that needed to be scanned in order to find more memory

How to read sar data

▶ consistently pgout = 0 : NO memory problem

▶ pgout > 0 over several intervals : system performance is suffering

▶ pgfree and pgscan should be small (less than 5)

Example of sar -g

#> sar -g 5 20

SunOS hostname 5.5.1 Generic_103640-12 sun4u 11/12/98

16:59:36 pgout/s ppgout/s pgfree/s pgscan/s %ufs_ipf
16:59:41    1.39     5.78    23.90    79.88     0.00
16:59:46    1.60   113.77    27.74   253.69     0.00
16:59:51    0.80     2.00    13.80    81.40     0.00
16:59:56    0.20     0.20     0.20     0.00     0.00
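The sar -g rules above can be checked with a short sketch. This Python function is hypothetical. It reads "several intervals" as two or more, and it reports a single non-zero interval separately because the rules do not cover that case. The example list contains the pgout/s, pgfree/s and pgscan/s columns from the output above.

```python
def read_sar_g(samples):
    """Apply the sar -g guidelines above.

    samples: list of (pgout_per_s, pgfree_per_s, pgscan_per_s) tuples,
    one per sar interval.
    """
    paging_intervals = sum(1 for pgout, _, _ in samples if pgout > 0)
    if paging_intervals == 0:
        verdict = "no memory problem"
    elif paging_intervals >= 2:
        verdict = "system performance is suffering"
    else:
        # A single non-zero interval is not covered by the rules above.
        verdict = "isolated page-out; keep watching"
    # pgfree/s and pgscan/s should both stay below 5
    scanning = any(pgfree >= 5 or pgscan >= 5 for _, pgfree, pgscan in samples)
    return verdict, scanning


example = [
    (1.39, 23.90, 79.88),
    (1.60, 27.74, 253.69),
    (0.80, 13.80, 81.40),
    (0.20, 0.20, 0.00),
]
print(read_sar_g(example))
```

For the sample output, every interval pages out and the scan rate is far above 5, so both checks point to a memory shortage.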

Memory Solutions

1st. There are two memory problems:

=> The system spends a lot of time paging and/or swapping.

=> The system runs out of swap space.

2nd. Check the SZ field from "ps -efl".

3rd. For the first problem:

=> Add physical memory until paging and swapping stop.

DISK Performance

Factors for Disk Performance

Speed of disk

transfer rate
seek time
rotation latency

Load balance across multiple disks
Access type

single user vs multi user
sequential vs random

Memory

Disk buffers, used when transferring information to and from disk, are stored in memory

df -k command

How to use df

#> df -k

capacity : How much of the file system's total capacity has been used

How to read df output

▶ %capacity = 100%

=> Remove core files and any unneeded software packages

=> add more disk space or move files to another partition

Example of df

#> df -k
Filesystem         kbytes   used avail capacity Mounted on
/dev/dsk/c0t3d0s0 21615 14909 4546      77% /
/dev/dsk/c0t3d0s6 240463 211348 5075      98% /usr
/proc                  0      0     0       0% /proc
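Checking df output by hand gets tedious on machines with many partitions. The Python sketch below flags filesystems at or above a capacity threshold. It assumes the six-column df -k layout shown above. The 90% default comes from the Disk Solutions advice later in this post, and the function name full_filesystems is hypothetical.

```python
def full_filesystems(df_output, threshold=90):
    """Return (mount point, capacity %) for each filesystem at or above threshold.

    Expects `df -k` output: a header line, then one line per filesystem
    with the capacity in the fifth column and the mount point in the sixth.
    """
    full = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) < 6:
            # Lines without all six columns (such as a long device name
            # wrapped onto its own line) are skipped, not rejoined.
            continue
        capacity = int(fields[4].rstrip("%"))
        if capacity >= threshold:
            full.append((fields[5], capacity))
    return full


DF_K = """\
Filesystem         kbytes   used avail capacity Mounted on
/dev/dsk/c0t3d0s0   21615  14909  4546      77% /
/dev/dsk/c0t3d0s6  240463 211348  5075      98% /usr
/proc                   0      0     0       0% /proc
"""
print(full_filesystems(DF_K))  # prints [('/usr', 98)]
```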

iostat command

How to use iostat

#> iostat 5

disk

serv : average service time, in milliseconds

cpu

us : time spent in user mode

sy : time spent in system mode

wt : time spent waiting for I/O

#> iostat -x 5

svc_t : average service time, in milliseconds

%w : percentage of time the wait queue is not empty

%b : percentage of time the disk is busy

How to read iostat -x output

▶ %b

%b < 5% : ignore

%b > 30% : be concerned

%b > 60% : needs fixing

▶ svc_t

If %b < 5% : ignore svc_t

If %w is non-zero : svc_t of 10 - 50 ms is OK

svc_t of 100 - 150 ms : needs fixing
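The iostat -x guidelines above can be turned into a small classifier for one disk line. This is a hedged Python sketch with a hypothetical function name. The text gives no rule for service times between 50 and 100 ms, so the sketch leaves that range to the %b checks. Slow service times only count when requests are actually queueing (%w above zero).

```python
def read_iostat_x(busy_pct, svc_t_ms, wait_pct):
    """Classify one disk line from `iostat -x` using %b, svc_t and %w."""
    if busy_pct < 5:
        return "ignore"          # lightly used disk: svc_t is not meaningful
    if busy_pct > 60:
        return "needs fixing"
    if wait_pct > 0 and svc_t_ms >= 100:
        return "needs fixing"    # requests queue and each takes 100 ms or more
    if busy_pct > 30:
        return "be concerned"
    return "ok"


# The sd0 line in the iostat -x example: svc_t 99.5 ms, but %b is 0
print(read_iostat_x(0, 99.5, 0))  # prints ignore
```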

iostat -D command

How to use iostat -D

#> iostat -D 5

util : percentage of disk utilization

Use it to check the load balance between disks.

Example of iostat

#> iostat 5
     tty          fd0          sd0          sd1          sd2         cpu
tin tout Kps tps serv Kps tps serv Kps tps serv Kps tps serv us sy wt id
  0 560   0   0    0   0   0   99   1   0   49 10   2   54 4 4 4 88
  0   16   0   0    0   0   0    0   0   0    0   0   0    0 0 0 0 99

#> iostat -x 5
                    extended disk statistics
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
fd0 0.0 0.0 0.0 0.0 0.0 0.0   0.0 0 0
sd0 0.0 0.0 0.2 0.1 0.0 0.0 99.5 0 0

#> iostat -D
fd0 sd0 sd1 sd2
rps wps util rps wps util rps wps util rps wps util
0 0 0.0 0 0 0.2 0 0 0.4 1 0 3.1

sar -a -b -d command

How to use sar

#> sar -a 5 3

-a : Report use of file access system routines (report on file access)

iget/s

namei/s

dirbk/s

Average

#> sar -b 5 3

-b : Report buffer activity (report on disk buffers)

%rcache : Fraction of logical reads found in the system buffers

%wcache : Fraction of logical writes found in the system buffers

#> sar -d 5 3

-d : Report activity for each block device (report on disk transfers)

r+w/s : read + write per second

%busy : Percentage of time the device spends servicing a transfer

blk/s : Number of 512-byte blocks transferred to device, per second

How to read sar data

sar -a

▶ The larger the values, the more time the kernel is spending accessing user files

▶ This report is USEFUL for understanding "HOW disk-dependent a system is"

sar -b

▶ %rcache < 90% and %wcache < 65%

=> It may be possible to improve performance by increasing the buffer space

sar -d

▶ %busy > 85% : high utilization, load problem

▶ r+w/s > 65 (transfers per second) : overload

Example of sar output

#> sar -a 5 3
16:59:36 iget/s namei/s dirbk/s
16:59:41      0       0       0
16:59:46      8      26      15
16:59:51    271     297     288

Average      93     108     101
#> sar -b 5 3
16:59:36 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
16:59:41      19      57      67      22      26      17       0       0
16:59:46      23      73      69      18      23      20       0       0
16:59:51      25      51      51      15      27      46       0       0

Average       22      60      63      18      25      28       0       0
#> sar -d 5 3
13:21:07 device %busy avque r+w/s blks/s avwait avserv
13:21:12 sd1       52   0.8    30    216    4.8   21.1
         sd3        6   0.1     1     14    0.0   45.4
average
         sd1       32   0.5    17    187    3.5   23.9
         sd2       45   0.4    24    208    0.0   18.9
         sd3        4   0.0     1      9    0.0   51.1
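The sar -b and sar -d thresholds above are easy to apply to the Average lines of a report. The Python sketch below is illustrative only, and both function names are hypothetical. It treats the r+w/s threshold as 65 transfers per second, because r+w/s is a rate rather than a percentage.

```python
def buffers_too_small(rcache_pct, wcache_pct):
    """sar -b: low read and write hit rates suggest more buffer space may help."""
    return rcache_pct < 90 and wcache_pct < 65


def read_sar_d(busy_pct, rw_per_s):
    """sar -d: flag one device line using the thresholds above."""
    notes = []
    if busy_pct > 85:
        notes.append("high utilization, load problem")
    if rw_per_s > 65:
        notes.append("overload")
    return notes


# Average line of the sar -b example: %rcache 63, %wcache 28
print(buffers_too_small(63, 28))  # prints True
# Average line for sd1 in the sar -d example: %busy 32, r+w/s 17
print(read_sar_d(32, 17))  # prints []
```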

DISK Solutions

1st. Check the file system overload (above 90 or 95% capacity)

=> Clean out unused files from /var/adm, /var/adm/sa and /var/lp/logs.

=> Clean out the core files.

=> Find files that have not been modified for more than 60 days.

#> find /home -type f -mtime +60 -print

#> find /home -name core -exec rm {} \;

2nd. If you have more than one disk,

=> Distribute the file systems for a more balanced load between the disks.

=> Try reducing disk seeking through careful planning in data positioning on disk.

(I/O on the outer sectors can be far faster)

3rd. Consider adding Memory.

=> Additional memory reduces swapping and paging, and allows an expanded buffer pool.

4th. Consider buying faster disks.

5th. Make sure the disks are not overloading the SCSI controller.

=> Keep SCSI bus utilization below 60%.

6th. Consider adding disks.

=> A disk should not be busy more than 40 - 60% of the time.

(%b : iostat -xct 5 and %busy : sar -d 5 3)

7th. Consider using an in-memory file system for the /tmp directory.

=> This is the default in Solaris 2.x.

NETWORK Performance

Overview of Network Performance

● Congestion or collisions force resends

A network has limited bandwidth, so it can only transmit a certain amount of data

● Ethernet

10 Mbit/sec (about 14,880 packets/sec at the minimum packet size)

Minimum packet size : 64 bytes (about 14,880 packets/sec)

Maximum packet size : 1518 bytes

Inter-packet gap : 9.6 microseconds

Usable utilization is only about 30-40% because of collision contention
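The packet-rate figures above follow from simple arithmetic. The Python sketch below computes the theoretical maximum frame rate on a collision-free segment. It assumes an 8-byte preamble/start-of-frame delimiter per frame, and it treats the 9.6 microsecond gap as 96 bit times at 10 Mbit/s. The function name max_frame_rate is hypothetical.

```python
def max_frame_rate(bit_rate, frame_bytes, preamble_bytes=8, gap_bits=96):
    """Theoretical maximum frames per second on a collision-free Ethernet segment.

    Each frame costs its own bits plus the preamble/start-of-frame bytes
    and the inter-packet gap (9.6 us = 96 bit times at 10 Mbit/s).
    """
    bits_per_frame = (frame_bytes + preamble_bytes) * 8 + gap_bits
    return bit_rate / bits_per_frame


for size in (64, 1518):
    print(size, round(max_frame_rate(10_000_000, size)))
```

This gives roughly 14,880 frames/sec for 64-byte frames and about 813 frames/sec for 1518-byte frames. Collisions push real throughput well below these ceilings.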

● Latency

Not as important as disk latency

Keep in mind that the remote system has its own resources, including disks

● NFS

UDP : the common protocol in use; it is part of TCP/IP and allows fast network throughput with little overhead.

Logical packet size : 9Kbytes

On ethernet : 6 * 1518 bytes

After a collision, all of the Ethernet packets that make up the request must be resent

● Slower remote server

The remote server is CPU bound

● Network Monitoring Tool

nfsstat

netstat

snoop

ping

spray

ping command

How to use ping

#> ping

Send a packet to a host on the network.

-s : send one packet per second

How to read ping -s output

▶ Two SPARCstations on a quiet Ethernet always respond in less than 1 millisecond

Example of ping

#> ping -s host
PING host: 56 data bytes
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms
64 bytes from host (1.1.1.1): icmp_seq=0. time=7. ms

----host PING Statistics----
5 packets transmitted, 5 packets received, 0% packet loss
round-trip (ms) min/avg/max = 1/2/7

spray command

How to use spray

#> spray

Send a one-way stream of packets to a host

Reports how many were received and the transfer rate.

-c : count(number) of packets

-d : specifies the delay, in microseconds

Default: 9.6 microsecond

-l : specifies the Length(size) of the packet

How to read spray output

▶ If many packets are dropped even when you use the -d option,

=> Check hardware, such as loose cables or missing termination

=> Check for possible network congestion

=> Use the netstat command to get more information

Example of spray

#> spray -d 20 -c 100 -l 2048 host

sending 100 packets of length 2048 to host ...
no packets dropped by host
560 packets/sec, 1147576 bytes/sec

netstat -i command

How to use netstat -i

#> netstat -i 5

errs : the number of errors

packets : the number of packets

colls : the number of collisions

* Collision percentage = colls / output packets * 100

How to read netstat data

▶ collision percentage > 5% (on one system)

=> Check the network interface and cabling

▶ collision percentage > 5% (on all systems)

=> The network is congested

▶ errs field has data

=> Suspect BAD hardware generating illegal sized packets

=> Check Repeated Network

Example of netstat

#> netstat -i 5
    input   le0      output         input (Total)   output
packets errs packets errs colls packets errs packets errs colls
71853   1    27270   8    4839 72526   1    27943   8    4839
7       0    0       0    0     7       0    0       0    0
14      0    0       0    0    14      0    0       0    0
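The collision formula above applied to the example output. This is a minimal Python sketch and collision_pct is a hypothetical name. Note that in interval mode the first netstat -i row usually holds totals since boot, so later rows give a better picture of current load.

```python
def collision_pct(output_packets, collisions):
    """Collision percentage = colls / output packets * 100."""
    if output_packets == 0:
        return 0.0
    return 100.0 * collisions / output_packets


# First row of the netstat -i example (le0: 27270 output packets, 4839 collisions)
pct = collision_pct(27270, 4839)
print(f"{pct:.1f}%", "above 5%" if pct > 5 else "ok")  # prints 17.7% above 5%
```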

snoop command

How to use snoop

#> snoop

Capture packets from the network

Display their contents

Example of snoop

#> snoop host1
#> snoop -o filename host1 host2
#> snoop -i filename -t r | more
#> snoop -i filename -p99,108
#> snoop -i filename -v -p101
#> snoop -i filename rpc nfs and host1 and host2

nfsstat -c command

How to use nfsstat -c

#> nfsstat -c

Displays a summary of client statistics

Can be used to IDENTIFY NFS problems

retrans : Number of remote procedure calls(RPCs) that were retransmitted

badxids : Number of times that a duplicate acknowledgement was received for a single NFS request

timeout : Number of calls that timed out

readlink : Number of reads to symbolic links

How to read nfsstat data

▶ retrans > 5% of calls : possibly a network problem

=> Look for network congestion

=> Look for overloaded servers

=> Check the Ethernet interface

▶ high badxid, as well as timeout : remote server slow

=> Increase the time-out period

#> mount -F nfs -o rw,soft,timeo=15 host:/home /home

▶ readlink > 10% of calls : too many symbolic links

Example of nfsstat

#> nfsstat -c

Client rpc:
calls   badcalls retrans badxid timeout wait newcred timers
13185   0        8       0      8       0    0       50

Client nfs:
calls   badcalls nclget nclcreate
13147   0        13147   0
null    getattr setattr root   lookup   readlink read
0 0%   794 6%  10   0% 0 0%  2141 16% 2720 21% 6283 48%
wrcache write    create remove rename   link     symlink
0 0%   581 4% 33   0% 29 0% 4 0%    0 0%    0 0%
mkdir   rmdir    readdir statf
0 0%   0  0%    539  4% 13 0%
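The nfsstat -c checks above can be applied to the counters from the example. This is a Python sketch with a hypothetical function name. The text only says "high" for badxid and timeout, so as a simplification the sketch flags them whenever both are non-zero. The rpc counts come from the "Client rpc" section and the nfs counts from the "Client nfs" section.

```python
def read_nfsstat_c(rpc_calls, retrans, badxid, timeout, nfs_calls, readlink):
    """Apply the nfsstat -c guidelines above.

    rpc_calls, retrans, badxid and timeout come from the "Client rpc"
    section; nfs_calls and readlink come from the "Client nfs" section.
    """
    notes = []
    if rpc_calls and 100.0 * retrans / rpc_calls > 5:
        notes.append("retrans > 5%: possible network problem")
    # The text only says "high"; as a simplification, any non-zero pair counts.
    if badxid > 0 and timeout > 0:
        notes.append("badxid and timeout: remote server may be slow")
    if nfs_calls and 100.0 * readlink / nfs_calls > 10:
        notes.append("readlink > 10%: too many symbolic links")
    return notes


# Values from the nfsstat -c example above
print(read_nfsstat_c(13185, 8, 0, 8, 13147, 2720))
```

In the example, retransmissions are well under 1% of calls, but readlink makes up about 21% of NFS calls, so the symbolic-link rule fires.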

Network Solutions

1st. Consider adding the Prestoserve NFS Write accelerator.

=> If the write percentage from nfsstat -s is above 15%, consider installing it.

2nd. Subnetting

=> If your network is congested, consider subnetting.

=> That is, if the collision rate is above 5%, subnet the network.

3rd. Install a bridge

=> If your network is congested and physical segmentation is NOT possible.

=> Isolate physical segments of a busy network.

4th. Install local disks in diskless machines.

>> Bottlenecks

I/O Bottleneck

Detection

Symptom	sar field
Uneven workload	r+w/s, avque
Many threads blocked waiting on I/O	%wio
High disk utilization rate	%busy
Active disk with no free space

Solutions

▶ Balance the disk load

▶ Use mmap instead of read and write

▶ Use shared libraries

▶ Put busier filesystems on smaller disks

▶ Organize I/O requests to be more contiguous

▶ Add more/faster disks

Memory Bottleneck

Detection

Symptom	sar field
Steady page-out activity	ppgout/s
Scan rate is non-zero (page daemon is active)	pgscan/s
Swapper is active	swpot/s,swpq-sz
Free memory is at or below lotsfree	freemem,pgfree/s
Hardware cache misses

Solutions

▶ Modify process load

▶ Tune paging parameters

▶ Add more memory

▶ Use shared libraries

▶ Use memcntl to use memory more efficiently within application

▶ Analyze locality of reference in applications

▶ Set memory limits - setrlimit

CPU Bottleneck

Detection

Symptom	sar field
CPU idle time is low	%usr,%sys,%idle
Threads waiting on run queue	runq-sz,%runocc
Slower response/interactive performance

Solutions

▶ Use priocntl / nice to modify process/thread priorities

▶ Modify dispatch parameter tables

▶ Modify applications to use system calls more efficiently

▶ System daemons

▶ Device interrupts

▶ Modify/limit process load

▶ Custom device drivers

▶ More, faster CPUs

>> Rules Table

Network Rules

Notation used in tables

▶ Rules

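Each measurement is written as the command name joined to the variable name with ".". For example, if the disk service time is measured with the "iostat -x" command at 30-second intervals, it is written as "iostat-x30.svc_t".

Variables are combined with the logical operators "&&", "||" and "==", and for brevity ranges are written in the form "0 <= X < 100".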

▶ Levels

The level in each table indicates how severe the condition is, as listed below.

Level	Description
white	low usage
blue	under-utilization/imbalance of resource
green	target utilization/no problem
amber	warning level
red	critical level that needs to be fixed
black	problems that can prevent your system from working

▶ Actions

Each table lists the action to take for each rule, with a short note on the problem and related points.

Rules based upon Ethernet collisions

Rule for each network interface	Level	Action
(0<netstat-i30.output.packets<10)&&(100*netstat-i30.output.colls/netstat-i30.output.packets<0.5%)&&(other nets white or green)	White	No Problem
(0<netstat-i30.output.packets<10)&&(100*netstat-i30.output.colls/netstat-i30.output.packets<0.5%)&&(other nets amber or red)	Blue	Inactive Net
(10<=netstat-i30.output.packets)&&(0.5%<=100*netstat-i30.output.colls/netstat-i30.output.packets<2.0%)	Green	No Problem
(10<=netstat-i30.output.packets)&&(2.0%<=100*netstat-i30.output.colls/netstat-i30.output.packets<5.0%)	Amber	Busy Net
(10<=netstat-i30.output.packets)&&(5.0%<=100*netstat-i30.output.colls/netstat-i30.output.packets)	Red	Busy Net
Network type is not "ie", "le", "ne" or "qe" (for example "bf" or "nf")	Green	Not Ether

☞ Inactive Net

An inactive network is a waste of throughput when other networks are overloaded. Rebalance the load so that all networks are used more evenly.

☞ Busy Net

A network with too many collisions reduces throughput and increases response time for users. Move some of the load to inactive networks if there are any. Add more Ethernets or upgrade to a faster interface type like FDDI, 100MBit ethernet or ATM.

☞ Not Ether

If the last letter of the interface name is not "e", the interface is not Ethernet, so the collision-based network performance rules should not be used.
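The Ethernet collision rules table can be expressed as a function that returns a level and an action for one interface. This Python sketch is not from any Solaris tool, and collision_level is a hypothetical name. It uses the "Not Ether" note's last-letter test (after stripping the unit number). It returns None for combinations the table does not cover, such as 10 or more packets with under 0.5% collisions.

```python
def collision_level(ifname, packets, colls, other_nets_busy=False):
    """Return (level, action) for one interface, following the Ethernet
    collision rules table; None when no rule in the table applies.

    packets and colls are the output packets and collisions measured by
    netstat -i 30; other_nets_busy says whether any other network is at
    amber or red.
    """
    # "Not Ether" note: the interface name, minus its unit number, must end in "e"
    if not ifname.rstrip("0123456789").endswith("e"):
        return ("green", "Not Ether")
    if packets <= 0:
        return None
    pct = 100.0 * colls / packets
    if packets < 10:
        if pct < 0.5:
            if other_nets_busy:
                return ("blue", "Inactive Net")
            return ("white", "No Problem")
        return None
    if pct < 0.5:
        return None                      # not covered by the table
    if pct < 2.0:
        return ("green", "No Problem")
    if pct < 5.0:
        return ("amber", "Busy Net")
    return ("red", "Busy Net")


print(collision_level("le0", 27270, 4839))  # prints ('red', 'Busy Net')
```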


CPU Rules

Rules for SunOS4 and Solaris2

CPU Rule	Level	Action
0 == vmstat30.r	White	CPU Idle
0 < (vmstat30.r / ncpus) < 3.0	Green	No problem
3.0 <= (vmstat30.r / ncpus) < 5.0	Amber	CPU Busy
5.0 <= (vmstat30.r / ncpus)	Red	CPU Busy
mpstat30.smtx < 200	Green	No problem
200 <= mpstat30.smtx < 400	Amber	Mutex Stall
400 <= mpstat30.smtx	Red	Mutex Stall

☞ CPU Idle

The CPU power of this system is underutilized. Fewer or less powerful CPUs could be used to do this job.

☞ CPU Busy

There is insufficient CPU power and jobs are spending an increasing amount of time in the queue before being assigned to a CPU. This reduces throughput and increases interactive response times.

☞ Mutex Stall

If the number of stalls per CPU per second exceeds the limit there is mutex contention happening in the kernel which wastes CPU time and degrades multiprocessor scaling.
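The CPU rules table can be sketched the same way. These Python functions are hypothetical. They treat a run-queue ratio of exactly 5.0 as red and take smtx as the per-CPU value that mpstat reports for one interval.

```python
def cpu_level(run_queue, ncpus):
    """Level and action for the vmstat run-queue rules (vmstat30.r / ncpus)."""
    if run_queue == 0:
        return ("white", "CPU Idle")
    per_cpu = run_queue / ncpus
    if per_cpu < 3.0:
        return ("green", "No problem")
    if per_cpu < 5.0:
        return ("amber", "CPU Busy")
    return ("red", "CPU Busy")


def mutex_level(smtx):
    """Level and action for the mpstat smtx (mutex stalls per CPU per second) rules."""
    if smtx < 200:
        return ("green", "No problem")
    if smtx < 400:
        return ("amber", "Mutex Stall")
    return ("red", "Mutex Stall")


print(cpu_level(12, 4), mutex_level(250))
```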

>> Tunable Kernel Parameters

Kernel variables are highly dependent on the operating system release, so a value that works well on one release may not work properly on another.

The table below shows the recommended values for Solaris 2.3 and 2.4.

Name	Default	Min	Max
maxusers	MB of physical memory (physmem)	8	2048
pt_cnt	48	48	3000
ncsize	~(maxusers*17) + 90	226	34906
ufs_ninode	~(maxusers*17) + 90	226	34906
autoup	30 sec
tune_t_fsflushr	5 sec

If fsflush takes more than 5% of CPU time: double autoup, then tune_t_fsflushr += 5.

Note: Do not set "autoup" higher than 120 seconds.

maxusers : kernel-tunable variable

In most cases maxusers is set to the number of megabytes of system memory. It can be set as high as 2048 in the /etc/system file, but the system itself will never set maxusers higher than 1024.

▶ The kernel variables below are affected by the value of maxusers on SunOS 5.4 systems.

max_nprocs = ( 10 + 16 * maxusers )

ufs_ninode = ( max_nprocs + 16 + maxusers ) + 64

ndquot = ( ( maxusers * NMOUNT ) / 4 ) + max_nprocs

maxuprc = ( max_nprocs - 5 )

ncsize = ( max_nprocs + 16 + maxusers ) + 64
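The SunOS 5.4 formulas above can be evaluated directly to see what a given maxusers produces. This is a Python sketch with a hypothetical function name. NMOUNT is release-specific, so the caller must supply it, and the value 40 used below is purely hypothetical. Expanding the formulas gives ufs_ninode = ncsize = 17 * maxusers + 90. That matches the "~(maxusers*17) + 90" default in the table, and it reproduces the 226 and 34906 bounds at maxusers 8 and 2048.

```python
def maxusers_params(maxusers, nmount):
    """Kernel values derived from maxusers, per the SunOS 5.4 formulas above.

    nmount is the NMOUNT constant used by the ndquot formula; its value is
    release-specific, so the caller must supply it.
    """
    max_nprocs = 10 + 16 * maxusers
    return {
        "max_nprocs": max_nprocs,
        "ufs_ninode": (max_nprocs + 16 + maxusers) + 64,
        "ndquot": (maxusers * nmount) // 4 + max_nprocs,  # integer division, as in C
        "maxuprc": max_nprocs - 5,
        "ncsize": (max_nprocs + 16 + maxusers) + 64,
    }


# maxusers = 64 (a 64 MB machine); nmount = 40 is a hypothetical value
print(maxusers_params(64, nmount=40))
```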

>> Tuning Tips

▶ Check one thing at a time.

▶ Spend the most time on the factors that have the biggest impact.

▶ Consider all of the following together:

Disk Access
CPU Access
Main Memory Access
Network I/O devices

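▶ Keep in mind that tuning is about evening out load that is concentrated in one place.

▶ Check for disk bottlenecks: if a disk is more than 30% busy and its service time is over 50 ms, move data elsewhere or stripe it with a tool such as DiskSuite.

▶ Don't believe claims that the disks are fine. Examine them carefully with the "iostat -x 30" command.

▶ If tuning has improved system performance, check disk busy again.

▶ NFS clients wait for the server as idle time, not as I/O wait. On a slow NFS client, use the "nfsstat -m" command to check whether the problem lies in the network or in the NFS server.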

▶ Don't worry about the amount of free RAM that vmstat reports; only about 1/6 of the total remains on the free list.

▶ Don't worry about high page-in and page-out values in vmstat. All filesystem I/O is done through page-ins and page-outs.

▶ If the run queue length or load average is more than four times the number of CPUs, you will probably need more CPUs.

▶ If vmstat shows procs b values as large as the procs r values, check whether the disks are slow.

Techdata VINA Knowledge Sharing
