這裡接着前面的系列文章繼續進行學習探索(本文就是純技術研究,僅代表個人觀點,與公司無關)。達夢資料庫本質上是一個單程序多線程架構,跟MySQL類似。如下所示。
[[email protected] ~]# ps -ef|grep dmserver|grep -v grep
dmdba 9539 1 0 00:47 ? 00:00:02 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole
[[email protected] ~]#
SQL> select pname,trace_name,type$,pid from v$process;
LINEID PNAME TRACE_NAME TYPE$ PID
---------- -------- ---------- ----------- -----------
1 dmserver 1 9539
[[email protected] enmotech]# pstree -p |grep dmserver
|-dmserver(9539)-+-{dmserver}(9902)
| |-{dmserver}(9956)
| |-{dmserver}(9957)
| |-{dmserver}(9958)
| |-{dmserver}(9959)
| |-{dmserver}(10051)
| |-{dmserver}(10052)
| |-{dmserver}(10151)
| |-{dmserver}(10152)
| |-{dmserver}(10153)
| |-{dmserver}(10154)
| |-{dmserver}(10187)
| |-{dmserver}(10190)
| |-{dmserver}(10192)
| |-{dmserver}(10206)
| |-{dmserver}(10207)
| |-{dmserver}(10208)
| |-{dmserver}(10209)
| |-{dmserver}(10210)
| |-{dmserver}(10211)
| |-{dmserver}(10212)
| |-{dmserver}(10213)
| |-{dmserver}(10219)
| |-{dmserver}(10220)
| `-{dmserver}(10787)
可以看到這裡的PID即是dmserver的作業系統程序号。跟Oracle 中的pid不同。 接下來我們看看dm8中有哪些線程:
SQL> select id,name,THREAD_DESC from v$threads;
LINEID ID NAME THREAD_DESC
---------- -------------------- -------------- -------------------------------------------------------------------------------------
1 9902 dm_quit_thd Thread for executing shutdown-normal operation
2 9956 dm_io_thd IO thread
3 9957 dm_io_thd IO thread
4 9958 dm_io_thd IO thread
5 9959 dm_io_thd IO thread
6 10051 dm_chkpnt_thd Flush checkpoint thread
7 10052 dm_redolog_thd Redo log thread, used to flush log
8 10153 dm_hio_thd IO thread for HFS to read data pages
9 10152 dm_hio_thd IO thread for HFS to read data pages
10 10151 dm_hio_thd IO thread for HFS to read data pages
11 10154 dm_hio_thd IO thread for HFS to read data pages
12 10187 dm_sqllog_thd Thread for writing dmsql dmserver
13 10190 dm_purge_thd Purge thread
14 10211 dm_wrkgrp_thd User working thread
15 10206 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
16 10210 dm_trctsk_thd Thread for writing trace information
17 10209 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
18 10208 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
19 10207 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
20 10212 dm_wrkgrp_thd User working thread
21 10213 dm_audit_thd Thread for flush audit logs
22 10219 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers
23 10220 dm_lsnr_thd Service listener thread
24 10787 dm_sql_thd User session thread
SQL> select name,THREAD_DESC,count(1) from v$threads group by
2 name,THREAD_DESC order by 3;
LINEID NAME THREAD_DESC COUNT(1)
---------- -------------- ------------------------------------------------------------------------------------- --------------------
1 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers 1
2 dm_chkpnt_thd Flush checkpoint thread 1
3 dm_lsnr_thd Service listener thread 1
4 dm_sql_thd User session thread 1
5 dm_sqllog_thd Thread for writing dmsql dmserver 1
6 dm_purge_thd Purge thread 1
7 dm_quit_thd Thread for executing shutdown-normal operation 1
8 dm_redolog_thd Redo log thread, used to flush log 1
9 dm_trctsk_thd Thread for writing trace information 1
10 dm_audit_thd Thread for flush audit logs 1
11 dm_wrkgrp_thd User working thread 2
12 dm_hio_thd IO thread for HFS to read data pages 4
13 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 4
14 dm_io_thd IO thread 4
14 rows got
used time: 31.360(ms). Execute id is 15.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
SQL> select id,name,THREAD_DESC from v$threads;
LINEID ID NAME THREAD_DESC
---------- -------------------- -------------- -------------------------------------------------------------------------------------
1 9902 dm_quit_thd Thread for executing shutdown-normal operation
2 9956 dm_io_thd IO thread
3 9957 dm_io_thd IO thread
4 9958 dm_io_thd IO thread
5 9959 dm_io_thd IO thread
6 10051 dm_chkpnt_thd Flush checkpoint thread
7 10052 dm_redolog_thd Redo log thread, used to flush log
8 10153 dm_hio_thd IO thread for HFS to read data pages
9 10152 dm_hio_thd IO thread for HFS to read data pages
10 10151 dm_hio_thd IO thread for HFS to read data pages
11 10154 dm_hio_thd IO thread for HFS to read data pages
12 10187 dm_sqllog_thd Thread for writing dmsql dmserver
13 10190 dm_purge_thd Purge thread
14 10211 dm_wrkgrp_thd User working thread
15 10206 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
16 10210 dm_trctsk_thd Thread for writing trace information
17 10209 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
18 10208 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
19 10207 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself
20 10212 dm_wrkgrp_thd User working thread
21 10213 dm_audit_thd Thread for flush audit logs
22 10219 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers
23 10220 dm_lsnr_thd Service listener thread
24 10787 dm_sql_thd User session thread
SQL> select name,THREAD_DESC,count(1) from v$threads group by
2 name,THREAD_DESC order by 3;
LINEID NAME THREAD_DESC COUNT(1)
---------- -------------- ------------------------------------------------------------------------------------- --------------------
1 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers 1
2 dm_chkpnt_thd Flush checkpoint thread 1
3 dm_lsnr_thd Service listener thread 1
4 dm_sql_thd User session thread 1
5 dm_sqllog_thd Thread for writing dmsql dmserver 1
6 dm_purge_thd Purge thread 1
7 dm_quit_thd Thread for executing shutdown-normal operation 1
8 dm_redolog_thd Redo log thread, used to flush log 1
9 dm_trctsk_thd Thread for writing trace information 1
10 dm_audit_thd Thread for flush audit logs 1
11 dm_wrkgrp_thd User working thread 2
12 dm_hio_thd IO thread for HFS to read data pages 4
13 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 4
14 dm_io_thd IO thread 4
14 rows got
used time: 31.360(ms). Execute id is 15.
下面我們針對上述線程簡單進行描述和總結。
1、dm_lsnr_thd 監聽線程
類似Oracle的listener 程式,任務是在伺服器端口上進行循環監聽,一旦有來自客戶的連接配接請求,監 聽線程被喚醒并生成一個會話申請任務,加入工作線程的任務隊列,等待工作線程進行處理。
達夢資料庫中監聽線程的監聽端口範圍為1024 – 65534;預設監聽的端口為5236。通過分析參數檔案配置,可以看到與端口相關的一些參數:
[[email protected] enmotech]# cat dm.ini |grep PORT
PORT_NUM = 5236 #Port number on which the database server will listen
ELOG_REPORT_LINK_SQL = 0 #Whether to write the SQLs that sent to remote database by DBLINKs into error log file
DFS_HUGE_SUPPORT = 1 #Whether support huge table operation in DDFS
EXTERNAL_JFUN_PORT = 6363 #DmAgent port for external java fun.
EXTERNAL_AP_PORT = 4236 #DmAp port for external fun.
DCRS_PORT_NUM = 6236 #Port number on which the database dcrs will listen
AP_PORT_NUM = 0 #Port number on which the database ap will listen
DW_PORT = 0 #Instance tcp port for watch2
DCP_PORT_NUM = 5237 #Port number on which DCP will listen
HA_INST_CHECK_PORT = 65534 #HA instance check port
[[email protected] enmotech]#
根據達夢官方手冊的描述,為了保證在處理大量客戶連接配接 時系統具有較短的響應時間,監聽線程比普通線程優先級更高。從目前來看,DM8中隻能有一個監聽線程;在高并發場景,可能面臨請求處理不及時的情況(大家知道在Oracle中,如果單個監聽程序無法處理時,我們可以還有建立多個監聽程式)。
2、dm_tskwrk_thd 工作線程
這是達夢資料庫的核心線程;它從任務隊列中取出任務,并根據任務的類型進行相應的處理,負責所有實際的資料相關操作。
DM8 的初始工作線程個數由配置檔案指定,随着會話連接配接的增加,工作線程也會同步增加,以保持每個會話都有專門的工作線程處理請求。
看上去其預設值是cpu core的2倍。高并發場景下,該參數應該是需要調大的。
[[email protected] enmotech]# cat dm.ini |grep TASK
TASK_THREADS = 4 #Number Of Task Threads
3、dm_io_thd 即IO thread
該線程也是核心線程,負責髒資料寫出,類似Oracle中的DB writer程序。根據官方的手冊描述,有如下幾種情況下,需要IO線程來完成工作。
DM Server需要進行IO操作的時機主要有以下三種:
1)需要處理的資料頁不在緩沖區中,此時需要将相關資料頁讀入緩沖區;
2)緩沖區滿或系統關閉時,此時需要将部分髒資料頁寫入磁盤;
3)檢查點到來時,需要将所有髒資料頁寫入磁盤。
通過如下2個參數來進行IO線程數量的控制;
[[email protected] enmotech]# cat dm.ini |grep IO_THR
IO_THR_GROUPS = 2 #The Number Of Io Thread Groups(Non-Windows Only)
HIO_THR_GROUPS = 2 #The Number Of Huge Io Thread Groups(Non-Windows Only)
這裡簡單建立測試表,并觸發檢查點,來觀察一下IO thread的行為:
SQL> create table test0825 as select * from dba_objects;
executed successfully
used time: 68.693(ms). Execute id is 29.
SQL> select checkpoint(20);
LINEID CHECKPOINT(20)
---------- --------------
1 0
used time: 9.623(ms). Execute id is 30.
SQL>
我們可以通過perf top -p 9956 來觀察IO線程的函數調用情況:
70.49% [kernel] [k] finish_task_switch
4.13% [kernel] [k] __do_softirq
2.58% [kernel] [k] _raw_spin_unlock_irqrestore
2.26% libpthread-2.17.so [.] pthread_mutex_lock
1.30% dmserver [.] ini_get_value
1.17% dmserver [.] trx4_clear_ptab_version
1.17% libpthread-2.17.so [.] __pthread_enable_asynccancel
1.16% [vdso] [.] __vdso_clock_gettime
1.16% libc-2.17.so [.] __GI___libc_poll
1.00% [kernel] [k] system_call_after_swapgs
0.78% [kernel] [k] __audit_syscall_exit
0.77% [kernel] [k] timespec_add_safe
0.77% dmserver [.] rlog4_log_get
0.76% dmserver [.] buf4_pool_is_empty_or_recycle_or_fast
0.69% [kernel] [k] do_select
0.67% [vdso] [.] __vdso_time
0.67% dmserver [.] uevent_reset
0.67% libpthread-2.17.so [.] _L_unlock_738
0.65% [kernel] [k] dput
0.59% libc-2.17.so [.] __localtime_r
0.57% libpthread-2.17.so [.] pthread_mutex_unlock
0.56% [kernel] [k] static_key_enabled
0.51% [kernel] [k] unroll_tree_refs
0.45% [kernel] [k] mntput
0.45% libc-2.17.so [.] __clock_gettime
0.39% [kernel] [k] __virt_addr_valid
0.39% [kernel] [k] sysret_signal
0.39% dmserver [.] capt_enter
0.39% libpthread-2.17.so [.] 0x000000000000ee41
0.35% libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2
0.34% [kernel] [k] plist_del
0.34% dmserver [.] ini_get_double_value
0.26% [kernel] [k] path_put
0.23% dmserver [.] buf4_pool_force_flush_pages
0.20% [kernel] [k] __audit_syscall_entry
0.20% dmserver [.] sys_get_sys_mode_low
0.14% [kernel] [k] sys_poll
0.14% [vdso] [.] __vdso_gettimeofday
0.12% dmserver [.] os_thread_sleep_low
0.12% dmserver [.] purg2_thread
這裡我建立一個測試表,插入數十萬資料,觸發全量檢查點,對程序進行跟蹤一下看看具體的資料寫出行為:
SQL> select checkpoint(100);
LINEID CHECKPOINT(100)
---------- ---------------
1 0
[[email protected] yum.repos.d]# ps -ef|grep dms
dmdba 9539 1 0 00:47 ? 00:00:06 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole
root 11437 11138 0 01:37 pts/2 00:00:00 grep --color=auto dms
[[email protected] yum.repos.d]# strace -fr -o /tmp/io_thread.log -p 9539
strace: Process 9539 attached with 26 threads
^Cstrace: Process 9539 detached
strace: Process 9902 detached
strace: Process 9956 detached
strace: Process 9957 detached
strace: Process 9958 detached
strace: Process 9959 detached
strace: Process 10051 detached
strace: Process 10052 detached
strace: Process 10151 detached
。。。。。
strace: Process 10213 detached
strace: Process 10219 detached
strace: Process 10220 detached
strace: Process 10787 detached
[[email protected] ~]# ls -ltr /proc/9539/fd
total 0
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null
[[email protected] ~]# cat /tmp/io_thread.log |grep "9956" |grep pwrite64|more
9956 0.000016 pwrite64(11, "\1\0\0\0\320\234\0\0\377\377\377\377\377\377\377\377\377\377\377\377$\0\0\0\0\0\0\0\31Q\260\2"..., 8192, 328859648 <unfinished ...>
9956 0.000005 <... pwrite64 resumed> ) = 8192
9956 0.000030 pwrite64(12, "\4\0\0\0000\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\25\0\0\0\0\0\0\[email protected]\260\2"..., 8192, 393216 <unfinished ...>
9956 0.000012 <... pwrite64 resumed> ) = 8192
9956 0.000020 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0\206Q\260\2"..., 8192, 401408 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000017 pwrite64(12, "\4\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\23\0\0\0\0\0\0\0,R\260\2"..., 8192, 0 <unfinished ...>
9956 0.000007 <... pwrite64 resumed> ) = 8192
9956 0.000015 pwrite64(12, "\4\0\0\0q\0\0\0\0\0p\0\0\0\0\0r\0\0\0\24\0\0\0\0\0\0\0gQ\260\2"..., 8192, 925696 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000020 pwrite64(12, "\4\0\0\0\10\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\21\0\0\0\0\0\0\0,R\260\2"..., 8192, 65536 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000019 pwrite64(12, "\4\0\0\0r\0\0\0\0\0q\0\0\0\0\0s\0\0\0\24\0\0\0\0\0\0\0iQ\260\2"..., 8192, 933888 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000029 pwrite64(12, "\4\0\0\0\260\0\0\0\0\0\257\0\0\0\0\0\261\0\0\0\24\0\0\0\0\0\0\0\370Q\260\2"..., 8192, 1441792 <unfinished ...>
9956 0.000009 <... pwrite64 resumed> ) = 8192
9956 0.000029 pwrite64(12, "\4\0\0\0e\0\0\0\0\0d\0\0\0\0\0f\0\0\0\24\0\0\0\0\0\0\0KQ\260\2"..., 8192, 827392 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000105 pwrite64(12, "\4\0\0\0\242\0\0\0\0\0\241\0\0\0\0\0\243\0\0\0\24\0\0\0\0\0\0\0\330Q\260\2"..., 8192, 1327104) = 8192
9956 0.000019 pwrite64(12, "\4\0\0\0f\0\0\0\0\0e\0\0\0\0\0g\0\0\0\24\0\0\0\0\0\0\0MQ\260\2"..., 8192, 835584 <unfinished ...>
9956 0.000007 <... pwrite64 resumed> ) = 8192
9956 0.000103 pwrite64(12, "\4\0\0\0\243\0\0\0\0\0\242\0\0\0\0\0\244\0\0\0\24\0\0\0\0\0\0\0\332Q\260\2"..., 8192, 1335296) = 8192
9956 0.000019 pwrite64(12, "\4\0\0\0g\0\0\0\0\0f\0\0\0\0\0h\0\0\0\24\0\0\0\0\0\0\0OQ\260\2"..., 8192, 843776 <unfinished ...>
9956 0.000008 <... pwrite64 resumed> ) = 8192
9956 0.000210 pwrite64(12, "\4\0\0\0\244\0\0\0\0\0\243\0\0\0\0\0\245\0\0\0\24\0\0\0\0\0\0\0\334Q\260\2"..., 8192, 1343488) = 8192
9956 0.000022 pwrite64(12, "\4\0\0\0h\0\0\0\0\0g\0\0\0\0\0i\0\0\0\24\0\0\0\0\0\0\0QQ\260\2"..., 8192, 851968 <unfinished ...>
9956 0.000009 <... pwrite64 resumed> ) = 8192
9956 0.000119 pwrite64(12, "\4\0\0\0\245\0\0\0\0\0\244\0\0\0\0\0\246\0\0\0\24\0\0\0\0\0\0\0\336Q\260\2"..., 8192, 1351680) = 8192
9956 0.000013 pwrite64(12, "\4\0\0\0i\0\0\0\0\0h\0\0\0\0\0j\0\0\0\24\0\0\0\0\0\0\0SQ\260\2"..., 8192, 860160 <unfinished ...>
9956 0.000005 <... pwrite64 resumed> ) = 8192
9956 0.000070 pwrite64(12, "\4\0\0\0\246\0\0\0\0\0\245\0\0\0\0\0\247\0\0\0\24\0\0\0\0\0\0\0\340Q\260\2"..., 8192, 1359872) = 8192
9956 0.000019 pwrite64(12, "\4\0\0\0j\0\0\0\0\0i\0\0\0\0\0k\0\0\0\24\0\0\0\0\0\0\0UQ\260\2"..., 8192, 868352 <unfinished ...>
9956 0.000027 <... pwrite64 resumed> ) = 8192
9956 0.000110 pwrite64(12, "\4\0\0\0\247\0\0\0\0\0\246\0\0\0\0\0\250\0\0\0\24\0\0\0\0\0\0\0\345Q\260\2"..., 8192, 1368064) = 8192
9956 0.000019 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0lS\260\2"..., 8192, 401408 <unfinished ...>
9956 0.000009 <... pwrite64 resumed> ) = 8192
9956 0.000016 pwrite64(12, "\4\0\0\0y\0\0\0\0\0x\0\0\0\0\0z\0\0\0\24\0\0\0\0\0\0\0zQ\260\2"..., 8192, 991232 <unfinished ...>
9956 0.000007 <... pwrite64 resumed> ) = 8192
9956 0.000013 pwrite64(12, "\4\0\0\0\267\0\0\0\0\0\266\0\0\0\0\0\270\0\0\0\24\0\0\0\0\0\0\0\tR\260\2"..., 8192, 1499136 <unfinished ...>
9956 0.000005 <... pwrite64 resumed> ) = 8192
9956 0.000013 pwrite64(12, "\4\0\0\0\361\0\0\0\0\0\360\0\0\0\0\0\362\0\0\0\24\0\0\0\0\0\0\0\223R\260\2"..., 8192, 1974272 <unfinished ...>
。。。。。。
SQL> select name,value,DESCRIPTION from v$parameter where NAME like '%PAGE_SIZE%';
LINEID NAME VALUE DESCRIPTION
---------- --------------------- ---------- ---------------------
1 GLOBAL_PAGE_SIZE_MASK 4294959104 global_page_size_mask
2 GLOBAL_PAGE_SIZE 8192 global_page_size
3 GLOBAL_LOG_PAGE_SIZE 512 global_log_page_size
看上去這裡DM的IO線程是單page寫入(DM預設page size是8k);居然不會進行IO合并。這看起來不太對,否則性能太低。使用blktrace來深入分析看看IO情況。
[[email protected] tmp]# blktrace -d /dev/sda3
^C=== sda3 ===
CPU 0: 3116 events, 147 KiB data
CPU 1: 7807 events, 366 KiB data
Total: 10923 events (dropped 0), 513 KiB data
通過btt格式化後可以進一步進行觀察btt -i sda3.blktrace.0 -B sda3_io_thread.log
[[email protected] tmp]# blkparse -i sda3.blktrace.0 |grep dm_io_thd|more
8,3 0 156 8.082262362 9956 Q WS 55248800 + 1024 [dm_io_thd]
8,3 0 157 8.082264411 9956 G WS 55248800 + 1024 [dm_io_thd]
8,3 0 158 8.082265881 9956 I WS 55248800 + 1024 [dm_io_thd]
8,3 0 159 8.082266808 9956 D WS 55248800 + 1024 [dm_io_thd]
8,3 0 161 8.082361026 9956 Q WS 55249824 + 456 [dm_io_thd]
8,3 0 162 8.082362306 9956 G WS 55249824 + 456 [dm_io_thd]
8,3 0 163 8.082363357 9956 I WS 55249824 + 456 [dm_io_thd]
8,3 0 164 8.082364168 9956 D WS 55249824 + 456 [dm_io_thd]
8,3 0 168 8.082943188 9956 Q WSM 52753552 + 9 [dm_io_thd]
8,3 0 169 8.082944128 9956 G WSM 52753552 + 9 [dm_io_thd]
8,3 0 170 8.082944516 9956 P N [dm_io_thd]
8,3 0 171 8.082945063 9956 I WSM 52753552 + 9 [dm_io_thd]
8,3 0 172 8.082945461 9956 U N [dm_io_thd] 1
8,3 0 173 8.082945848 9956 D WSM 52753552 + 9 [dm_io_thd]
8,3 1 358 8.122993041 9957 Q WS 55265344 + 136 [dm_io_thd]
......
8,3 1 387 8.123178082 9957 G WS 55265616 + 16 [dm_io_thd]
8,3 1 388 8.123178399 9957 I WS 55265616 + 16 [dm_io_thd]
8,3 1 389 8.123178805 9957 D WS 55265616 + 16 [dm_io_thd]
8,3 1 393 8.123224770 9957 Q WS 55265648 + 16 [dm_io_thd]
8,3 1 394 8.123225332 9957 G WS 55265648 + 16 [dm_io_thd]
8,3 1 395 8.123225650 9957 I WS 55265648 + 16 [dm_io_thd]
8,3 1 396 8.123226082 9957 D WS 55265648 + 16 [dm_io_thd]
8,3 1 398 8.123257086 9957 Q WS 55265680 + 16 [dm_io_thd]
[[email protected] tmp]# cat sda3_io_thread.log_8,3_w.dat|more
7420.463667378 81083264 81083280
7420.463748341 90913576 90913592
7420.464266588 52873936 52873952
7420.464338445 52874072 52874088
7420.464624548 54613992 54614072
7420.464725512 54587992 54588152
7420.465154622 54588152 54588312
7420.465240773 54588312 54588472
7420.465317470 54588472 54588632
7420.465685249 54641704 54641896
7420.465762457 54641896 54642072
7420.465836406 54642072 54642232
7420.465926814 54642232 54642424
7420.466335284 54642424 54642584 --以這裡為例,前後相減,大概是160個IO操作;每次IO size 512 byte,那麼應該是81920 bytes。
7420.466460526 55242640 55243664
7420.466763160 55243664 55244688
......
[[email protected] tmp]# cat /sys/block/sda/queue/physical_block_size
512
[[email protected] tmp]# cat /sys/block/sda/queue/logical_block_size
512
[[email protected] tmp]#
由此可見,還是會存在IO合并的操作,不是單page寫入,page應該是最小IO單元。
4、dm_sched_thd 排程程序
類似Oracle中的job 程序,負責定時任務的排程,預設情況下每秒輪詢一次。不過達夢資料庫中的排程線程,
不單單是這些功能,從官方手冊來看,包括了如下一系列功能:
1) 檢查系統級的時間觸發器,如果滿足觸發條件則生成任務加到工作線程的任務隊列 由工作線程執行;
2) 清理 SQL 緩存、計劃緩存中失效的項,或者超出緩存限制後淘汰不常用的緩存項;
3) 檢查資料重演捕獲持續時間是否到期,到期則自動停止捕獲;
4) 執行動态緩沖區檢查。根據需要動态擴充或動态收縮系統緩沖池;
5) 自動執行檢查點。為了保證日志的及時刷盤,減少系統故障時恢複時間,根據 INI參數設定的自動檢查點執行間隔定期執行檢查點操作;
6) 會話逾時檢測。當客戶連接配接設定了連接配接逾時時,定期檢測是否逾時,如果逾時則自動斷開連接配接;
7) 必要時執行資料更新頁刷盤;
8) 喚醒等待的工作線程
可以通過perf top -p 10219 來簡單窺視一下排程程序的一些函數調用:
68.73% [kernel] [k] finish_task_switch
2.66% [kernel] [k] __do_softirq
2.52% [kernel] [k] _raw_spin_unlock_irqrestore
1.92% libpthread-2.17.so [.] pthread_mutex_lock
1.77% [kernel] [k] __audit_syscall_exit
1.73% dmserver [.] buf4_pool_is_empty_or_recycle_or_fast
1.42% libpthread-2.17.so [.] pthread_mutex_unlock
1.22% libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2
1.10% [kernel] [k] static_key_enabled
1.00% libc-2.17.so [.] __GI___libc_poll
0.85% dmserver [.] ini_get_value
0.85% dmserver [.] ntsk_leave
0.83% [kernel] [k] fget_light
0.75% [kernel] [k] do_sys_poll
0.75% dmserver [.] mal_lsnr_port_check
0.70% libpthread-2.17.so [.] __pthread_mutex_cond_lock
0.69% [kernel] [k] __check_object_size
0.69% [kernel] [k] unroll_tree_refs
0.64% [kernel] [k] futex_wake
0.57% dmserver [.] ini_get_double_value
0.57% libc-2.17.so [.] __clock_gettime
0.55% [kernel] [k] auditsys
0.55% [kernel] [k] __audit_syscall_entry
0.51% [kernel] [k] wake_q_add
0.50% dmserver [.] buf4_force_flush_pages
0.50% libc-2.17.so [.] __memcpy_ssse3_back
0.50% libpthread-2.17.so [.] __pthread_disable_asynccancel
0.49% dmserver [.] purg2_crash_cmt_trx
0.44% [kernel] [k] __virt_addr_valid
0.44% [kernel] [k] hash_futex
0.38% [kernel] [k] __smp_mb__after_atomic
0.38% [vdso] [.] __vdso_clock_gettime
0.38% dmserver [.] trx4_clear_ptab_version
0.29% dmserver [.] [email protected]
0.22% [kernel] [k] sys_select
0.20% dmserver [.] os_event2_reset
0.17% [kernel] [k] ret_from_sys_call
0.17% dmserver [.] rlog4_flush_for_db_low
0.17% dmserver [.] trx4_active_view_broadcast_thread
0.15% [kernel] [k] _copy_from_user
0.15% [kernel] [k] _raw_qspin_lock
5、dm_redolog_thd 即日志flush 線程
該線程作用類似Oracle LGWR程序,負責将日志緩沖區中的buffer寫出到redo日志檔案中,用于確定事務中斷或異常時可以進行恢複,保證事務一緻性。
在DM8之前的版本中,是沒有redolog flush線程的,統一由IO線程負責。DM8中對這個性能瓶頸點進行了優化拆分,将redolog flush線程獨立了出來;從這點來看DM8
在日志寫方面比之前老版本應該有較大的性能提升。不過仍然是一個單線程。當redo産生量極大的情況下,單線程應該還是會存在瓶頸。
大家知道Oracle 12c版本之前,redo buffer的寫出程序lgwr也是單程序,從12c版本開始,Oracle引入了多程序模式,解決了最後一個瓶頸。
那麼多redolog flush線程會合并寫入嗎? 這裡看看過濾看看前面的strace跟蹤即可:
[[email protected] ~]# cat /tmp/io_thread.log |grep "10052" |grep pwrite64|more
10052 0.000447 pwrite64(7, "\0,\2\0.*\2\0\337\233W\0236\240\332\10\1\0\0\0\0\0m\344\n\0\0\0\0\0m\344"..., 142336, 2559817216) = 142336
10052 0.000914 pwrite64(7, "\0R\4\0\334P\4\0\337\233W\0236\240\332\10\1\0\0\0\0\0n\344\n\0\0\0\0\0n\344"..., 283136, 2559959552) = 283136
10052 0.000270 pwrite64(7, "\0\354\7\0\23\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0o\344\n\0\0\0\0\0o\344"..., 519168, 2560242688) = 519168
10052 0.000010 pwrite64(7, "\0\276\0\0F\275\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0p\344\n\0\0\0\0\0p\344"..., 48640, 2560761856 <unfinished ...>
10052 0.003413 <... pwrite64 resumed> ) = 48640
10052 0.000116 pwrite64(7, "\0\4\0\0y\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0q\344\n\0\0\0\0\0q\344"..., 1024, 2560810496) = 1024
10052 0.000527 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0r\344\n\0\0\0\0\0r\344"..., 512, 2560811520) = 512
10052 0.000409 pwrite64(7, "\0\0\10\0\305\376\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0s\344\n\0\0\0\0\0s\344"..., 524288, 2560812032 <unfinished ...>
10052 0.000564 <... pwrite64 resumed> ) = 524288
10052 0.000351 pwrite64(7, "\0\374\7\0\323\372\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0t\344\n\0\0\0\0\0t\344"..., 523264, 2561336320) = 523264
10052 0.000373 pwrite64(7, "\0R\1\0&Q\1\0\337\233W\0236\240\332\10\1\0\0\0\0\0u\344\n\0\0\0\0\0u\344"..., 86528, 2561859584) = 86528
10052 0.002699 pwrite64(7, "\0\340\17\0\221\336\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0v\344\n\0\0\0\0\0v\344"..., 1040384, 2561946112) = 1040384
10052 0.000019 pwrite64(7, "\0\336\17\0\334\334\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0w\344\n\0\0\0\0\0w\344"..., 1039872, 2562986496 <unfinished ...>
10052 0.000426 <... pwrite64 resumed> ) = 1039872
10052 0.000392 pwrite64(7, "\0\4\3\0\234\3\3\0\337\233W\0236\240\332\10\1\0\0\0\0\0x\344\n\0\0\0\0\0x\344"..., 197632, 2564026368) = 197632
10052 0.000237 pwrite64(7, "\0\4\0\0\t\3\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0y\344\n\0\0\0\0\0y\344"..., 1024, 2564224000) = 1024
10052 0.000125 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0z\344\n\0\0\0\0\0z\344"..., 512, 2564225024) = 512
10052 0.002194 pwrite64(7, "\0\374\7\0005\373\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0{\344\n\0\0\0\0\0{\344"..., 523264, 2564225536) = 523264
10052 0.001846 pwrite64(7, "\0\364\17\0!\363\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0|\344\n\0\0\0\0\0|\344"..., 1045504, 2564748800) = 1045504
10052 0.001553 pwrite64(7, "\0\354\7\0\1\353\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0}\344\n\0\0\0\0\0}\344"..., 519168, 2565794304) = 519168
10052 0.001214 pwrite64(7, "\0\362\7\0\240\360\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0~\344\n\0\0\0\0\0~\344"..., 520704, 2566313472) = 520704
10052 0.000467 pwrite64(7, "\0\366\7\0\210\365\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\177\344\n\0\0\0\0\0\177\344"..., 521728, 2566834176) = 521728
10052 0.000331 pwrite64(7, "\0\354\7\0`\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\200\344\n\0\0\0\0\0\200\344"..., 519168, 2567355904) = 519168
10052 0.000304 pwrite64(7, "\0\360\7\0'\357\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\201\344\n\0\0\0\0\0\201\344"..., 520192, 2567875072) = 520192
10052 0.000895 pwrite64(7, "\0\\\5\0c[\5\0\337\233W\0236\240\332\10\1\0\0\0\0\0\202\344\n\0\0\0\0\0\202\344"..., 351232, 2568395264) = 351232
10052 0.000167 pwrite64(7, "\0\4\0\0\1\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\203\344\n\0\0\0\0\0\203\344"..., 1024, 2568746496) = 1024
10052 0.000169 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\204\344\n\0\0\0\0\0\204\344"..., 512, 2568747520) = 512
10052 0.000027 pwrite64(7, "\0\24\0\0\270\23\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\205\344\n\0\0\0\0\0\205\344"..., 5120, 2568748032) = 5120
10052 0.000026 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\206\344\n\0\0\0\0\0\206\344"..., 3584, 2568753152) = 3584
10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\207\344\n\0\0\0\0\0\207\344"..., 3584, 2568756736) = 3584
10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\210\344\n\0\0\0\0\0\210\344"..., 3584, 2568760320) = 3584
10052 0.000051 pwrite64(7, "\0\16\0\0\230\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\211\344\n\0\0\0\0\0\211\344"..., 3584, 2568763904) = 3584
10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\212\344\n\0\0\0\0\0\212\344"..., 3584, 2568767488) = 3584
10052 0.000026 pwrite64(7, "\0\16\0\0008\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\213\344\n\0\0\0\0\0\213\344"..., 3584, 2568771072) = 3584
[[email protected] ~]# ls -ltr /proc/9539/fd
total 0
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null
[[email protected] ~]#
可見達夢8中,redolog flush線程會進行日志緩沖的合并寫出。這也是正常的處理邏輯,否則性能上不去的。最後還有一些其他線程,如dm_chkpnt_thd 即檢查點線程;purge線程,則主要是負責復原段的清理;歸檔線程、審計線程等等。
總結:
看達夢資料庫的14類線程,不少核心線程原理似乎跟Oracle類似(實際上關系型資料庫應該都差不多);不過仍然有一些差别。
這裡不發表太多評論,防止被噴。從上面測試來看,監聽線程,redolog flush線程 是可能存在瓶頸的點。