講解DB2數(shù)據(jù)庫性能調(diào)整的十個(gè)實(shí)用技巧
1. SQL COST ANALYSIS
許多情況下,一個(gè)簡(jiǎn)單的SQL就可能讓DB2系統(tǒng)處于尷尬的狀態(tài)。調(diào)整參數(shù)也不能解決此問題。由于DBA很難去改變這些垃圾SQL的現(xiàn)狀,所以留給DBA的就是下面的情況:
(1). Change or add indexes
(2). Change clustering
(3). Change catalog statistics.
注:一個(gè)SQL語句的cost= 每次執(zhí)行的資源代價(jià)*執(zhí)行的次數(shù)。
目前,DBA面臨的挑戰(zhàn)就是要找到那些有很高cost的語句,并且盡力去減少它的代價(jià)。可以借助DB2 Explain 工具或者DB2 UDB SQL Event Monitor數(shù)據(jù)來分析SQL語句的代價(jià)。尤其是對(duì)SQL Event Monitor的數(shù)據(jù)分析,但這么做需要耗費(fèi)很大的精力和時(shí)間。
一般DBA的流程是:
(1). Create an SQL Event Monitor, write to file:
$> db2 'create event monitor SQLCOST for statements write to ...'
(2). Activate the event monitor (be sure ample free disk space is available):
$> db2 'set event monitor SQLCOST state = 1'
(3). Let the application run.
(4). Deactivate the event monitor:
$> db2 'set event monitor SQLCOST state = 0'
(5). Use the DB2-supplied db2evmon tool to format the raw SQL Event Monitor data (hundreds of megabytes of free disk space may be required depending on SQL throughput rates):
$> db2evmon -db DBNAME -evm SQLCOST
> sqltrace.txt
(6). Browse through the formatted file scanning for unusually large cost numbers, a time-consuming process:
$> more sqltrace.txt
(7). Undertake a more complete analysis of the formatted file that attempts to identify unique statements (independent of literal values), each unique statement's frequency (how many times it occurred), and the aggregate of its total CPU, sort, and other resource costs. Such a thorough analysis could take a week or more on just a 30-minute sample of application SQL activity.
為了以最快的速度找到相應(yīng)的SQL,我們可以考慮上文講過的一些方法:
針對(duì)第4個(gè)tip:計(jì)算每個(gè)交易從一個(gè)table里面取出的行數(shù)。如果數(shù)值很高,就可以找到相應(yīng)的語句。
針對(duì)第3個(gè)tip:計(jì)算每個(gè)tablespace的asynchronous read percentage and physical I/O read rates.如果一個(gè)tablespace有很高的asynchronous read percentage 和高于平均的physical I/O read rates,那么有可能這個(gè)tablesapce里面有table scan情況。從catalog中可以找尋tablespace中相應(yīng)的table(如果一個(gè)tablespace上只有一個(gè)表,那么很容易定位了),然后從SQL Event Monitor 中尋找相關(guān)的table。這樣也可以縮小范圍。
觀察DB2 Explain信息,尋找可疑的地方。有時(shí)候,經(jīng)常執(zhí)行的、而且是代價(jià)比較低的語句也會(huì)瘋狂占用系統(tǒng)資源!
很多時(shí)候,我們可以充分借助工具!這樣能省時(shí)省力。
Staying in Tune
需要特別注意的是,性能優(yōu)化不能僅僅只是消除那些好的SQL語句,也要保證合理的物理構(gòu)架,確保高性能的結(jié)果、內(nèi)存分配在pool和heap中,I/O都在DISk之間平衡分布。
2. BUFFER POOL OPTIMIZATION
目前一般的系統(tǒng)內(nèi)存都可以達(dá)到2G,4G,8G了,但是DB2缺省的IBMDEFAULTBP只有16M。在此情況下,一般可以建立一個(gè)buffer pool 給SYSCATSPACE catalog tablespace, 一個(gè)buffer pool給 TEMPSPACE tablespace, 至少兩個(gè)BP_RAND and BP_SEQ. 隨機(jī)存取的Tablespaces 應(yīng)該有一個(gè)buffer pool來應(yīng)付隨機(jī)的objectives,這就是 BP_RAND. 順序存取的Tablespaces (with asynchronous prefetch I/O) 應(yīng)該建立一個(gè)buffer pool給sequential objectives, BP_SEQ. 也可以建立其它的buffer pools,這要根據(jù)應(yīng)用來說。比如可以建立一個(gè)足夠大的buffer pool 來存放熱點(diǎn)經(jīng)常存取的數(shù)據(jù)。有時(shí)候需要為大的table建立單一的buffer pool.
太小的buffer pool會(huì)導(dǎo)致大量的、不必要的物理I/O。太大的biffer pool有可能會(huì)產(chǎn)生系統(tǒng)paging,增加不必要的CPU管理內(nèi)存開銷。
buffer pool的大與小是相對(duì)的,一個(gè)系統(tǒng)的buffer pool大小應(yīng)該'合適的'!當(dāng)達(dá)到diminishing return達(dá)到時(shí),就是合適的。如果不是使用自動(dòng)工具,應(yīng)該有條理的測(cè)試buffer pool性能,比如命中率,I/O次數(shù),物理I/O讀的比率,直到達(dá)到合適狀態(tài)。當(dāng)然,應(yīng)用是變化的,所以最優(yōu)狀態(tài)不是不邊的,也是要定期的評(píng)估。
3. TABLESPACE ANALYSIS
tablespace snapshot對(duì)理解哪些數(shù)據(jù)被訪問和怎么訪問的有很大的價(jià)值。
db2 'get snapshot for tablespaces on DBNAME'
對(duì)每一個(gè)tablespace,要注意:
What is the average read time (ms)?
What is the average write time (ms)?
What percentage of the physical I/O is asynchronous (prefetched) vs. synchronous (random)?
What are the buffer pool hit ratios for each tablespace?
How many physical pages are being read each minute?
How many physical and logical pages are being read for each transaction?
對(duì)所有的tablespaces,注意:
Which tablespaces have the slowest read and write times? Why?
Containers on slow disks? Are container sizes unequal?
Are the access attributes, asynchronous versus synchronous access, consistent with expectations?
Randomly read tables should have randomly read tablespaces, meaning high synchronous read percentages, usually higher buffer pool hit ratios, and lower physical I/O rates.
對(duì)每個(gè)tablespace,要注意Prefetch size是Extent size的倍數(shù)。如果必要,可以修改tablespace的prefetch size。
顯示tablespace信息:db2 'list tablespaces show detail'
顯示containers 信息:db2 'list tablespace containers for N show detail'
4. TABLE ACCESS
要查出來每次查詢讀出的row,
1) db2 'get snapshot for database on DBNAME'
看到多少交易發(fā)生,the sum of Commit statements attempted + Rollback statements attempted
2) db2 'get snapshot for tables on DBNAME'
區(qū)分出交易讀出的row。divide the number of rows read by the number of transactions (RowsPerTX).OLTP一般每次交易從一個(gè)table里面讀出20 row,如果發(fā)現(xiàn)一個(gè)交易能讀出成百上千行數(shù)據(jù),表掃描就可能出現(xiàn),可能需要看看index是否需要。簡(jiǎn)單情況下是運(yùn)行runstats收集信息。
Sample output from 'get snapshot for tables on DBNAME' follows:
Snapshot timestamp = 09-25-2000 4:47:09.970811
Database name= DGIDB
Database path= /fs/inst1/inst1/NODE0000/SQL00001/
Input database alias= DGIDB
Number of accessed tables= 8
Table List
Table Schema= INST1
Table Name= DGI_SALES_ LOGS_TB
Table Type= User
Rows Written= 0
Rows Read= 98857
Overflows= 0
Page Reorgs= 0
有很高的Overflows ,就需要re-org table。當(dāng)一行寬度改變,可能DB2就會(huì)把一行放到不同的頁中。
5. SORT MEMORY
OLTP應(yīng)該沒有大規(guī)模的sort,因?yàn)閟ort會(huì)消耗大量的CPU, I/O和時(shí)間。
缺省的SORTHEAP = 256*4K=1M,一般是足夠了。應(yīng)該知道sort overflows 的數(shù)目和每個(gè)交易的sort number。
Db2 'get snapshot for database on DBNAME'
察看如下項(xiàng)目:
Total sort heap allocated= 0
Total sorts = 1
Total sort time (ms)= 8
Sort overflows = 0
Active sorts = 0
Commit statements attempted = 3
Rollback statements attempted = 0
Let transactions = Commit statements attempted + Rollback statements
attempted
Let SortsPerTX= Total sorts / transactions
Let PercentSortOverflows = Sort overflows * 100 / Total sorts
如果PercentSortOverflows 超過3%,可能說明應(yīng)用中有比較嚴(yán)重的sort SQL。因?yàn)榇罅康膐verflows說明有大量的sort出現(xiàn),為零或者小于1時(shí)比較理想的。
如果有大量的overflow出現(xiàn),權(quán)宜之計(jì)是增加SORTHEAP,但是這么做只是隱藏了問題。根本解決是:要定位SQL,通過調(diào)整SQL,INDEX,clustering 來減少sort 代價(jià)。
如果SortsPerTX 大于5,說明每個(gè)交易的sort數(shù)目過多,某些應(yīng)用可能執(zhí)行了大量的小復(fù)合查詢,不會(huì)overflow,但是有很小的時(shí)間段。但是會(huì)消耗大量的CPU。同樣是要調(diào)整SQL,INDEX,clustering來解決問題。
6. Temporary Tablespaces
臨時(shí)表空間一般要有3個(gè)containers在不同的disk上,可以實(shí)現(xiàn)并行I/O,提高sorts, hash joins,或者其他在TEMPSPACE上的動(dòng)作的性能。
db2 'list tablespaces show detail',可查看臨時(shí)表空間的container:
Tablespace ID= 1
Name= TEMPSPACE1
Type= System managed space
Contents= Temporary data
State= 0x0000
Detailed explanation: Normal
Total pages= 1
Useable pages= 1
Used pages= 1
Free pages= Not applicable
High water mark (pages)= Not applicable
Page size (bytes)= 4096
Extent size (pages)= 32
Prefetch size (pages)= 96
Number of containers= 3
這里表示有3個(gè)container,Prefetch size是Extent size的3倍。為了最好的并行性能,最好Prefetch size是Extent size的倍數(shù)。一般倍數(shù)是container的數(shù)目。
db2 'list tablespace containers for 1 show detail'
可以看到containers的定義。
7. Locks
缺省的LOCKTIMEOUT=-1,就是說不設(shè)置lock的timeout,在OLTP中這可能是一個(gè)災(zāi)難。我們要設(shè)置比較小的數(shù)值,比如設(shè)置LOCKTIMEOUT=10或者15秒。
查看命令:
db2 'get db cfg for DBNAME',
繼續(xù)查看下面的信息:
Lock timeout (sec) (LOCKTIMEOUT) = -1
要和應(yīng)用人員將明白,他們是否已經(jīng)在程序中可以處理timeout的情況。然后設(shè)置:
db2 'update db cfg for DBNAME using LOCKTIMEOUT 15'
可以在系統(tǒng)中察看lock wait的數(shù)目,lock wait time, lock list 使用的內(nèi)存量。
db2 'get snapshot for database on DBNAME'
查看:
Locks held currently= 0
Lock waits= 0
Time database waited on locks (ms)= 0
Lock list memory in use (Bytes)= 576
Deadlocks detected= 0
Lock escalations= 0
Exclusive lock escalations= 0
Agents currently waiting on locks= 0
Lock Timeouts= 0
假如lock list的內(nèi)存量(bytes)超過LOCKLIST 的50%,那么需要增加LOCKLIST的量,LOCKLIST是按4k計(jì)算。
8. Maximum Open Files
最大的打開文件數(shù)目
DB2限制同時(shí)打開的文件數(shù)目,數(shù)據(jù)庫參數(shù)'MAXFILOP'限定了并發(fā)打開的文件數(shù)目。如達(dá)到這個(gè)數(shù)目,DB2就會(huì)開始關(guān)閉和打開Tablespace文件,包括raw device,這樣會(huì)降低SQL反映時(shí)間和占用CPU。
使用命令來查看是否有文件關(guān)閉情況:
db2 'get snapshot for database on DBNAME'
看看其中的 'Database files closed = 0'
如果值不是零,就需要修改MAXFILOP,
db2 'update db cfg for DBNAME using MAXFILOP N'
9. Agents
需要保證有足夠的agent應(yīng)付系統(tǒng)負(fù)載。
命令:db2 'get snapshot for database manager'
此時(shí)需要觀察“Agents waiting for a token” 或者“ Agents stolen from another application”,假如有值,就需要增加DB manager的agent值,也就是修改MAXAGENTS 和/或者 MAX_COORDAGENTS的值。
High water mark for agents registered = 7
High water mark for agents waiting for a token = 0
Agents registered= 7
Agents waiting for a token= 0
Idle agents= 5
Agents assigned from pool= 158
Agents created from empty Pool = 7
Agents stolen from another application= 0
High water mark for coordinating agents= 7
Max agents overflow= 0
10. Monitor Switches
打開Monitor Switch后才可以獲得性能方面的信息,詳細(xì)命令如下:
db2 'update monitor switches using lock ON sort ON bufferpool ON uow ON table ON statement ON'
