天天看點

達夢資料庫學習筆記之 — 線程架構

這裡接着前面的系列文章繼續進行學習探索(本文就是純技術研究,僅代表個人觀點,與公司無關)。達夢資料庫本質上是一個單程序多線程架構,跟MySQL類似。如下所示。

[[email protected] ~]# ps -ef|grep dmserver|grep -v grep
dmdba      9539      1  0 00:47 ?        00:00:02 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole
[[email protected] ~]# 
 
SQL> select pname,trace_name,type$,pid from v$process;
 
LINEID     PNAME    TRACE_NAME TYPE$       PID        
---------- -------- ---------- ----------- -----------
1          dmserver            1           9539
 
[[email protected] enmotech]# pstree -p |grep dmserver
           |-dmserver(9539)-+-{dmserver}(9902)
           |                |-{dmserver}(9956)
           |                |-{dmserver}(9957)
           |                |-{dmserver}(9958)
           |                |-{dmserver}(9959)
           |                |-{dmserver}(10051)
           |                |-{dmserver}(10052)
           |                |-{dmserver}(10151)
           |                |-{dmserver}(10152)
           |                |-{dmserver}(10153)
           |                |-{dmserver}(10154)
           |                |-{dmserver}(10187)
           |                |-{dmserver}(10190)
           |                |-{dmserver}(10192)
           |                |-{dmserver}(10206)
           |                |-{dmserver}(10207)
           |                |-{dmserver}(10208)
           |                |-{dmserver}(10209)
           |                |-{dmserver}(10210)
           |                |-{dmserver}(10211)
           |                |-{dmserver}(10212)
           |                |-{dmserver}(10213)
           |                |-{dmserver}(10219)
           |                |-{dmserver}(10220)
           |                `-{dmserver}(10787)
可以看到這裡的PID即是dmserver的作業系統程序号。跟Oracle 中的pid不同。 接下來我們看看dm8中有哪些線程:

SQL> select id,name,THREAD_DESC from v$threads;
LINEID     ID                   NAME           THREAD_DESC                                                                          
---------- -------------------- -------------- -------------------------------------------------------------------------------------
1          9902                 dm_quit_thd    Thread for executing shutdown-normal operation
2          9956                 dm_io_thd      IO thread
3          9957                 dm_io_thd      IO thread
4          9958                 dm_io_thd      IO thread
5          9959                 dm_io_thd      IO thread
6          10051                dm_chkpnt_thd  Flush checkpoint thread
7          10052                dm_redolog_thd Redo log thread, used to flush log
8          10153                dm_hio_thd     IO thread for HFS to read data pages
9          10152                dm_hio_thd     IO thread for HFS to read data pages
10         10151                dm_hio_thd     IO thread for HFS to read data pages
11         10154                dm_hio_thd     IO thread for HFS to read data pages
12         10187                dm_sqllog_thd  Thread for writing dmsql dmserver
13         10190                dm_purge_thd   Purge thread
14         10211                dm_wrkgrp_thd  User working thread
15         10206                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
16         10210                dm_trctsk_thd  Thread for writing trace information
17         10209                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
18         10208                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
19         10207                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
20         10212                dm_wrkgrp_thd  User working thread
21         10213                dm_audit_thd   Thread for flush audit logs
22         10219                dm_sched_thd   Server scheduling thread,used to trigger background checkpoint, time-related triggers
23         10220                dm_lsnr_thd    Service listener thread
24         10787                dm_sql_thd     User session thread
SQL> select name,THREAD_DESC,count(1) from v$threads group by
2   name,THREAD_DESC order by 3;
LINEID     NAME           THREAD_DESC                                                                           COUNT(1)            
---------- -------------- ------------------------------------------------------------------------------------- --------------------
1          dm_sched_thd   Server scheduling thread,used to trigger background checkpoint, time-related triggers 1
2          dm_chkpnt_thd  Flush checkpoint thread                                                               1
3          dm_lsnr_thd    Service listener thread                                                               1
4          dm_sql_thd     User session thread                                                                   1
5          dm_sqllog_thd  Thread for writing dmsql dmserver                                                     1
6          dm_purge_thd   Purge thread                                                                          1
7          dm_quit_thd    Thread for executing shutdown-normal operation                                        1
8          dm_redolog_thd Redo log thread, used to flush log                                                    1
9          dm_trctsk_thd  Thread for writing trace information                                                  1
10         dm_audit_thd   Thread for flush audit logs                                                           1
11         dm_wrkgrp_thd  User working thread                                                                   2
12         dm_hio_thd     IO thread for HFS to read data pages                                                  4
13         dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself                    4
14         dm_io_thd      IO thread                                                                             4
14 rows got
used time: 31.360(ms). Execute id is 15.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
SQL> select id,name,THREAD_DESC from v$threads;
 
LINEID     ID                   NAME           THREAD_DESC                                                                          
---------- -------------------- -------------- -------------------------------------------------------------------------------------
1          9902                 dm_quit_thd    Thread for executing shutdown-normal operation
2          9956                 dm_io_thd      IO thread
3          9957                 dm_io_thd      IO thread
4          9958                 dm_io_thd      IO thread
5          9959                 dm_io_thd      IO thread
6          10051                dm_chkpnt_thd  Flush checkpoint thread
7          10052                dm_redolog_thd Redo log thread, used to flush log
8          10153                dm_hio_thd     IO thread for HFS to read data pages
9          10152                dm_hio_thd     IO thread for HFS to read data pages
10         10151                dm_hio_thd     IO thread for HFS to read data pages
11         10154                dm_hio_thd     IO thread for HFS to read data pages
12         10187                dm_sqllog_thd  Thread for writing dmsql dmserver
13         10190                dm_purge_thd   Purge thread
14         10211                dm_wrkgrp_thd  User working thread
15         10206                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
16         10210                dm_trctsk_thd  Thread for writing trace information
17         10209                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
18         10208                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
19         10207                dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself
20         10212                dm_wrkgrp_thd  User working thread
21         10213                dm_audit_thd   Thread for flush audit logs
22         10219                dm_sched_thd   Server scheduling thread,used to trigger background checkpoint, time-related triggers
23         10220                dm_lsnr_thd    Service listener thread
24         10787                dm_sql_thd     User session thread
 
SQL> select name,THREAD_DESC,count(1) from v$threads group by
2   name,THREAD_DESC order by 3;
 
LINEID     NAME           THREAD_DESC                                                                           COUNT(1)            
---------- -------------- ------------------------------------------------------------------------------------- --------------------
1          dm_sched_thd   Server scheduling thread,used to trigger background checkpoint, time-related triggers 1
2          dm_chkpnt_thd  Flush checkpoint thread                                                               1
3          dm_lsnr_thd    Service listener thread                                                               1
4          dm_sql_thd     User session thread                                                                   1
5          dm_sqllog_thd  Thread for writing dmsql dmserver                                                     1
6          dm_purge_thd   Purge thread                                                                          1
7          dm_quit_thd    Thread for executing shutdown-normal operation                                        1
8          dm_redolog_thd Redo log thread, used to flush log                                                    1
9          dm_trctsk_thd  Thread for writing trace information                                                  1
10         dm_audit_thd   Thread for flush audit logs                                                           1
11         dm_wrkgrp_thd  User working thread                                                                   2
12         dm_hio_thd     IO thread for HFS to read data pages                                                  4
13         dm_tskwrk_thd  Task Worker Thread for SQL parsing and execution for sevrer itself                    4
14         dm_io_thd      IO thread                                                                             4
 
14 rows got
 
used time: 31.360(ms). Execute id is 15.
           

下面我們針對上述線程簡單進行描述和總結。

1、dm_lsnr_thd 監聽線程

類似Oracle的listener 程式,任務是在伺服器端口上進行循環監聽,一旦有來自客戶的連接配接請求,監 聽線程被喚醒并生成一個會話申請任務,加入工作線程的任務隊列,等待工作線程進行處理。

達夢資料庫中監聽線程的監聽端口範圍為1024 – 65534;預設監聽的端口為5236。通過分析參數檔案配置,可以看到與端口相關的一些參數:

[[email protected] enmotech]# cat dm.ini |grep PORT
                PORT_NUM                        = 5236                  #Port number on which the database server will listen
                ELOG_REPORT_LINK_SQL            = 0                     #Whether to write the SQLs that sent to remote database by DBLINKs into error log file
                DFS_HUGE_SUPPORT                = 1                     #Whether support huge table operation in DDFS
                EXTERNAL_JFUN_PORT              = 6363                  #DmAgent port for external java fun. 
                EXTERNAL_AP_PORT                = 4236                  #DmAp port for external fun.
                DCRS_PORT_NUM                   = 6236                  #Port number on which the database dcrs will listen
                AP_PORT_NUM                     = 0                     #Port number on which the database ap will listen
                DW_PORT                         = 0                     #Instance tcp port for watch2
                DCP_PORT_NUM                    = 5237                  #Port number on which DCP will listen
                HA_INST_CHECK_PORT              = 65534                 #HA instance check port
[[email protected] enmotech]#
           

根據達夢官方手冊的描述,為了保證在處理大量客戶連接配接 時系統具有較短的響應時間,監聽線程比普通線程優先級更高。從目前來看,DM8中隻能有一個監聽線程;在高并發場景,可能面臨請求處理不及時的情況(大家知道在Oracle中,如果單個監聽程序無法處理時,我們可以還有建立多個監聽程式)。

2、dm_tskwrk_thd 工作線程

這是達夢資料庫的核心線程;它從任務隊列中取出任務,并根據任務的類型進行相應的處理,負責所有實際的資料相關操作。

DM8 的初始工作線程個數由配置檔案指定,随着會話連接配接的增加,工作線程也會同步增加,以保持每個會話都有專門的工作線程處理請求。

看上去其預設值是cpu core的2倍。高并發場景下,該參數應該是需要調大的。

[[email protected] enmotech]# cat dm.ini |grep TASK
                TASK_THREADS                    = 4                     #Number Of Task Threads
           

3、dm_io_thd 即IO thread

該線程也是核心線程,負責髒資料寫出,類似Oracle中的DB writer程序。根據官方的手冊描述,有如下幾種情況下,需要IO線程來完成工作。

DM Server需要進行IO操作的時機主要有以下三種:

1)需要處理的資料頁不在緩沖區中,此時需要将相關資料頁讀入緩沖區;

2)緩沖區滿或系統關閉時,此時需要将部分髒資料頁寫入磁盤;

3)檢查點到來時,需要将所有髒資料頁寫入磁盤。

通過如下2個參數來進行IO線程數量的控制;

[[email protected] enmotech]# cat dm.ini |grep IO_THR
                IO_THR_GROUPS                   = 2                     #The Number Of Io Thread Groups(Non-Windows Only)
                HIO_THR_GROUPS                  = 2                     #The Number Of Huge Io Thread Groups(Non-Windows Only)
           

這裡簡單建立測試表,并觸發檢查點,來觀察一下IO thread的行為:

SQL> create table test0825 as select * from dba_objects;
executed successfully
used time: 68.693(ms). Execute id is 29.
SQL>  select checkpoint(20);
 
LINEID     CHECKPOINT(20)
---------- --------------
1          0
 
used time: 9.623(ms). Execute id is 30.
SQL>
           

我們可以通過perf top -p 9956 來觀察IO線程的函數調用情況:

70.49%  [kernel]            [k] finish_task_switch
   4.13%  [kernel]            [k] __do_softirq
   2.58%  [kernel]            [k] _raw_spin_unlock_irqrestore
   2.26%  libpthread-2.17.so  [.] pthread_mutex_lock
   1.30%  dmserver            [.] ini_get_value
   1.17%  dmserver            [.] trx4_clear_ptab_version
   1.17%  libpthread-2.17.so  [.] __pthread_enable_asynccancel
   1.16%  [vdso]              [.] __vdso_clock_gettime
   1.16%  libc-2.17.so        [.] __GI___libc_poll
   1.00%  [kernel]            [k] system_call_after_swapgs
   0.78%  [kernel]            [k] __audit_syscall_exit
   0.77%  [kernel]            [k] timespec_add_safe
   0.77%  dmserver            [.] rlog4_log_get
   0.76%  dmserver            [.] buf4_pool_is_empty_or_recycle_or_fast
   0.69%  [kernel]            [k] do_select
   0.67%  [vdso]              [.] __vdso_time
   0.67%  dmserver            [.] uevent_reset
   0.67%  libpthread-2.17.so  [.] _L_unlock_738
   0.65%  [kernel]            [k] dput
   0.59%  libc-2.17.so        [.] __localtime_r
   0.57%  libpthread-2.17.so  [.] pthread_mutex_unlock
   0.56%  [kernel]            [k] static_key_enabled
   0.51%  [kernel]            [k] unroll_tree_refs
   0.45%  [kernel]            [k] mntput
   0.45%  libc-2.17.so        [.] __clock_gettime
   0.39%  [kernel]            [k] __virt_addr_valid
   0.39%  [kernel]            [k] sysret_signal
   0.39%  dmserver            [.] capt_enter
   0.39%  libpthread-2.17.so  [.] 0x000000000000ee41
   0.35%  libpthread-2.17.so  [.] pthread_cond_timedwait@@GLIBC_2.3.2
   0.34%  [kernel]            [k] plist_del
   0.34%  dmserver            [.] ini_get_double_value
   0.26%  [kernel]            [k] path_put
   0.23%  dmserver            [.] buf4_pool_force_flush_pages
   0.20%  [kernel]            [k] __audit_syscall_entry
   0.20%  dmserver            [.] sys_get_sys_mode_low
   0.14%  [kernel]            [k] sys_poll
   0.14%  [vdso]              [.] __vdso_gettimeofday
   0.12%  dmserver            [.] os_thread_sleep_low
   0.12%  dmserver            [.] purg2_thread
           

這裡我建立一個測試表,插入數十萬資料,觸發全量檢查點,對程序進行跟蹤一下看看具體的資料寫出行為:

SQL> select checkpoint(100);
 
LINEID     CHECKPOINT(100)
---------- ---------------
1          0
 
[[email protected] yum.repos.d]# ps -ef|grep dms
dmdba      9539      1  0 00:47 ?        00:00:06 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole
root      11437  11138  0 01:37 pts/2    00:00:00 grep --color=auto dms
[[email protected] yum.repos.d]# strace -fr -o /tmp/io_thread.log -p 9539
strace: Process 9539 attached with 26 threads
 ^Cstrace: Process 9539 detached
strace: Process 9902 detached
strace: Process 9956 detached
strace: Process 9957 detached
strace: Process 9958 detached
strace: Process 9959 detached
strace: Process 10051 detached
strace: Process 10052 detached
strace: Process 10151 detached
。。。。。
strace: Process 10213 detached
strace: Process 10219 detached
strace: Process 10220 detached
strace: Process 10787 detached
[[email protected] ~]# ls -ltr /proc/9539/fd
total 0
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null
[[email protected] ~]# cat /tmp/io_thread.log |grep "9956" |grep pwrite64|more
9956       0.000016 pwrite64(11, "\1\0\0\0\320\234\0\0\377\377\377\377\377\377\377\377\377\377\377\377$\0\0\0\0\0\0\0\31Q\260\2"..., 8192, 328859648 <unfinished ...>
9956       0.000005 <... pwrite64 resumed> ) = 8192
9956       0.000030 pwrite64(12, "\4\0\0\0000\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\25\0\0\0\0\0\0\[email protected]\260\2"..., 8192, 393216 <unfinished ...>
9956       0.000012 <... pwrite64 resumed> ) = 8192
9956       0.000020 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0\206Q\260\2"..., 8192, 401408 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000017 pwrite64(12, "\4\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\23\0\0\0\0\0\0\0,R\260\2"..., 8192, 0 <unfinished ...>
9956       0.000007 <... pwrite64 resumed> ) = 8192
9956       0.000015 pwrite64(12, "\4\0\0\0q\0\0\0\0\0p\0\0\0\0\0r\0\0\0\24\0\0\0\0\0\0\0gQ\260\2"..., 8192, 925696 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000020 pwrite64(12, "\4\0\0\0\10\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\21\0\0\0\0\0\0\0,R\260\2"..., 8192, 65536 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000019 pwrite64(12, "\4\0\0\0r\0\0\0\0\0q\0\0\0\0\0s\0\0\0\24\0\0\0\0\0\0\0iQ\260\2"..., 8192, 933888 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000029 pwrite64(12, "\4\0\0\0\260\0\0\0\0\0\257\0\0\0\0\0\261\0\0\0\24\0\0\0\0\0\0\0\370Q\260\2"..., 8192, 1441792 <unfinished ...>
9956       0.000009 <... pwrite64 resumed> ) = 8192
9956       0.000029 pwrite64(12, "\4\0\0\0e\0\0\0\0\0d\0\0\0\0\0f\0\0\0\24\0\0\0\0\0\0\0KQ\260\2"..., 8192, 827392 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000105 pwrite64(12, "\4\0\0\0\242\0\0\0\0\0\241\0\0\0\0\0\243\0\0\0\24\0\0\0\0\0\0\0\330Q\260\2"..., 8192, 1327104) = 8192
9956       0.000019 pwrite64(12, "\4\0\0\0f\0\0\0\0\0e\0\0\0\0\0g\0\0\0\24\0\0\0\0\0\0\0MQ\260\2"..., 8192, 835584 <unfinished ...>
9956       0.000007 <... pwrite64 resumed> ) = 8192
9956       0.000103 pwrite64(12, "\4\0\0\0\243\0\0\0\0\0\242\0\0\0\0\0\244\0\0\0\24\0\0\0\0\0\0\0\332Q\260\2"..., 8192, 1335296) = 8192
9956       0.000019 pwrite64(12, "\4\0\0\0g\0\0\0\0\0f\0\0\0\0\0h\0\0\0\24\0\0\0\0\0\0\0OQ\260\2"..., 8192, 843776 <unfinished ...>
9956       0.000008 <... pwrite64 resumed> ) = 8192
9956       0.000210 pwrite64(12, "\4\0\0\0\244\0\0\0\0\0\243\0\0\0\0\0\245\0\0\0\24\0\0\0\0\0\0\0\334Q\260\2"..., 8192, 1343488) = 8192
9956       0.000022 pwrite64(12, "\4\0\0\0h\0\0\0\0\0g\0\0\0\0\0i\0\0\0\24\0\0\0\0\0\0\0QQ\260\2"..., 8192, 851968 <unfinished ...>
9956       0.000009 <... pwrite64 resumed> ) = 8192
9956       0.000119 pwrite64(12, "\4\0\0\0\245\0\0\0\0\0\244\0\0\0\0\0\246\0\0\0\24\0\0\0\0\0\0\0\336Q\260\2"..., 8192, 1351680) = 8192
9956       0.000013 pwrite64(12, "\4\0\0\0i\0\0\0\0\0h\0\0\0\0\0j\0\0\0\24\0\0\0\0\0\0\0SQ\260\2"..., 8192, 860160 <unfinished ...>
9956       0.000005 <... pwrite64 resumed> ) = 8192
9956       0.000070 pwrite64(12, "\4\0\0\0\246\0\0\0\0\0\245\0\0\0\0\0\247\0\0\0\24\0\0\0\0\0\0\0\340Q\260\2"..., 8192, 1359872) = 8192
9956       0.000019 pwrite64(12, "\4\0\0\0j\0\0\0\0\0i\0\0\0\0\0k\0\0\0\24\0\0\0\0\0\0\0UQ\260\2"..., 8192, 868352 <unfinished ...>
9956       0.000027 <... pwrite64 resumed> ) = 8192
9956       0.000110 pwrite64(12, "\4\0\0\0\247\0\0\0\0\0\246\0\0\0\0\0\250\0\0\0\24\0\0\0\0\0\0\0\345Q\260\2"..., 8192, 1368064) = 8192
9956       0.000019 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0lS\260\2"..., 8192, 401408 <unfinished ...>
9956       0.000009 <... pwrite64 resumed> ) = 8192
9956       0.000016 pwrite64(12, "\4\0\0\0y\0\0\0\0\0x\0\0\0\0\0z\0\0\0\24\0\0\0\0\0\0\0zQ\260\2"..., 8192, 991232 <unfinished ...>
9956       0.000007 <... pwrite64 resumed> ) = 8192
9956       0.000013 pwrite64(12, "\4\0\0\0\267\0\0\0\0\0\266\0\0\0\0\0\270\0\0\0\24\0\0\0\0\0\0\0\tR\260\2"..., 8192, 1499136 <unfinished ...>
9956       0.000005 <... pwrite64 resumed> ) = 8192
9956       0.000013 pwrite64(12, "\4\0\0\0\361\0\0\0\0\0\360\0\0\0\0\0\362\0\0\0\24\0\0\0\0\0\0\0\223R\260\2"..., 8192, 1974272 <unfinished ...>
。。。。。。
 
SQL> select name,value,DESCRIPTION from v$parameter where NAME like '%PAGE_SIZE%';
 
LINEID     NAME                  VALUE      DESCRIPTION          
---------- --------------------- ---------- ---------------------
1          GLOBAL_PAGE_SIZE_MASK 4294959104 global_page_size_mask
2          GLOBAL_PAGE_SIZE      8192       global_page_size
3          GLOBAL_LOG_PAGE_SIZE  512        global_log_page_size
           

看上去這裡DM的IO線程是單page寫入(DM預設page size是8k);居然不會進行IO合并。這看起來不太對,否則性能太低。使用blktrace來深入分析看看IO情況。

[[email protected] tmp]# blktrace -d /dev/sda3
^C=== sda3 ===
  CPU  0:                 3116 events,      147 KiB data
  CPU  1:                 7807 events,      366 KiB data
  Total:                 10923 events (dropped 0),      513 KiB data
           

通過btt格式化後可以進一步進行觀察btt -i sda3.blktrace.0 -B sda3_io_thread.log

[[email protected] tmp]# blkparse -i  sda3.blktrace.0 |grep dm_io_thd|more
  8,3    0      156     8.082262362  9956  Q  WS 55248800 + 1024 [dm_io_thd]
  8,3    0      157     8.082264411  9956  G  WS 55248800 + 1024 [dm_io_thd]
  8,3    0      158     8.082265881  9956  I  WS 55248800 + 1024 [dm_io_thd]
  8,3    0      159     8.082266808  9956  D  WS 55248800 + 1024 [dm_io_thd]
  8,3    0      161     8.082361026  9956  Q  WS 55249824 + 456 [dm_io_thd]
  8,3    0      162     8.082362306  9956  G  WS 55249824 + 456 [dm_io_thd]
  8,3    0      163     8.082363357  9956  I  WS 55249824 + 456 [dm_io_thd]
  8,3    0      164     8.082364168  9956  D  WS 55249824 + 456 [dm_io_thd]
  8,3    0      168     8.082943188  9956  Q WSM 52753552 + 9 [dm_io_thd]
  8,3    0      169     8.082944128  9956  G WSM 52753552 + 9 [dm_io_thd]
  8,3    0      170     8.082944516  9956  P   N [dm_io_thd]
  8,3    0      171     8.082945063  9956  I WSM 52753552 + 9 [dm_io_thd]
  8,3    0      172     8.082945461  9956  U   N [dm_io_thd] 1
  8,3    0      173     8.082945848  9956  D WSM 52753552 + 9 [dm_io_thd]
  8,3    1      358     8.122993041  9957  Q  WS 55265344 + 136 [dm_io_thd]
  ......
  8,3    1      387     8.123178082  9957  G  WS 55265616 + 16 [dm_io_thd]
  8,3    1      388     8.123178399  9957  I  WS 55265616 + 16 [dm_io_thd]
  8,3    1      389     8.123178805  9957  D  WS 55265616 + 16 [dm_io_thd]
  8,3    1      393     8.123224770  9957  Q  WS 55265648 + 16 [dm_io_thd]
  8,3    1      394     8.123225332  9957  G  WS 55265648 + 16 [dm_io_thd]
  8,3    1      395     8.123225650  9957  I  WS 55265648 + 16 [dm_io_thd]
  8,3    1      396     8.123226082  9957  D  WS 55265648 + 16 [dm_io_thd]
  8,3    1      398     8.123257086  9957  Q  WS 55265680 + 16 [dm_io_thd]
 
 
  [[email protected] tmp]# cat sda3_io_thread.log_8,3_w.dat|more
 7420.463667378 81083264 81083280
 7420.463748341 90913576 90913592
 7420.464266588 52873936 52873952
 7420.464338445 52874072 52874088
 7420.464624548 54613992 54614072
 7420.464725512 54587992 54588152
 7420.465154622 54588152 54588312
 7420.465240773 54588312 54588472
 7420.465317470 54588472 54588632
 7420.465685249 54641704 54641896
 7420.465762457 54641896 54642072
 7420.465836406 54642072 54642232
 7420.465926814 54642232 54642424
 7420.466335284 54642424 54642584  --以這裡為例,前後相減,大概是160個IO操作;每次IO size 512 byte,那麼應該是81920 bytes。
 7420.466460526 55242640 55243664
 7420.466763160 55243664 55244688
 ......
 
[[email protected] tmp]# cat /sys/block/sda/queue/physical_block_size
512
[[email protected] tmp]# cat /sys/block/sda/queue/logical_block_size 
512
[[email protected] tmp]# 
           

由此可見,還是會存在IO合并的操作,不是單page寫入,page應該是最小IO單元。

4、dm_sched_thd 排程程序

類似Oracle中的job 程序,負責定時任務的排程,預設情況下每秒輪詢一次。不過達夢資料庫中的排程線程,

不單單是這些功能,從官方手冊來看,包括了如下一系列功能:

1) 檢查系統級的時間觸發器,如果滿足觸發條件則生成任務加到工作線程的任務隊列 由工作線程執行;

2) 清理 SQL 緩存、計劃緩存中失效的項,或者超出緩存限制後淘汰不常用的緩存項;

3) 檢查資料重演捕獲持續時間是否到期,到期則自動停止捕獲;

4) 執行動态緩沖區檢查。根據需要動态擴充或動态收縮系統緩沖池;

5) 自動執行檢查點。為了保證日志的及時刷盤,減少系統故障時恢複時間,根據 INI參數設定的自動檢查點執行間隔定期執行檢查點操作;

6) 會話逾時檢測。當客戶連接配接設定了連接配接逾時時,定期檢測是否逾時,如果逾時則自動斷開連接配接;

7) 必要時執行資料更新頁刷盤;

8) 喚醒等待的工作線程

可以通過perf top -p 10219 來簡單窺視一下排程程序的一些函數調用:

68.73%  [kernel]            [k] finish_task_switch
   2.66%  [kernel]            [k] __do_softirq
   2.52%  [kernel]            [k] _raw_spin_unlock_irqrestore
   1.92%  libpthread-2.17.so  [.] pthread_mutex_lock
   1.77%  [kernel]            [k] __audit_syscall_exit
   1.73%  dmserver            [.] buf4_pool_is_empty_or_recycle_or_fast
   1.42%  libpthread-2.17.so  [.] pthread_mutex_unlock
   1.22%  libpthread-2.17.so  [.] pthread_cond_timedwait@@GLIBC_2.3.2
   1.10%  [kernel]            [k] static_key_enabled
   1.00%  libc-2.17.so        [.] __GI___libc_poll
   0.85%  dmserver            [.] ini_get_value
   0.85%  dmserver            [.] ntsk_leave
   0.83%  [kernel]            [k] fget_light
   0.75%  [kernel]            [k] do_sys_poll
   0.75%  dmserver            [.] mal_lsnr_port_check
   0.70%  libpthread-2.17.so  [.] __pthread_mutex_cond_lock
   0.69%  [kernel]            [k] __check_object_size
   0.69%  [kernel]            [k] unroll_tree_refs
   0.64%  [kernel]            [k] futex_wake
   0.57%  dmserver            [.] ini_get_double_value
   0.57%  libc-2.17.so        [.] __clock_gettime
   0.55%  [kernel]            [k] auditsys
   0.55%  [kernel]            [k] __audit_syscall_entry
   0.51%  [kernel]            [k] wake_q_add
   0.50%  dmserver            [.] buf4_force_flush_pages
   0.50%  libc-2.17.so        [.] __memcpy_ssse3_back
   0.50%  libpthread-2.17.so  [.] __pthread_disable_asynccancel
   0.49%  dmserver            [.] purg2_crash_cmt_trx
   0.44%  [kernel]            [k] __virt_addr_valid
   0.44%  [kernel]            [k] hash_futex
   0.38%  [kernel]            [k] __smp_mb__after_atomic
   0.38%  [vdso]              [.] __vdso_clock_gettime
   0.38%  dmserver            [.] trx4_clear_ptab_version
   0.29%  dmserver            [.] [email protected]
   0.22%  [kernel]            [k] sys_select
   0.20%  dmserver            [.] os_event2_reset
   0.17%  [kernel]            [k] ret_from_sys_call
   0.17%  dmserver            [.] rlog4_flush_for_db_low
   0.17%  dmserver            [.] trx4_active_view_broadcast_thread
   0.15%  [kernel]            [k] _copy_from_user
   0.15%  [kernel]            [k] _raw_qspin_lock
           

5、dm_redolog_thd 即日志flush 線程

該線程作用類似Oracle LGWR程序,負責将日志緩沖區中的buffer寫出到redo日志檔案中,用于確定事務中斷或異常時可以進行恢複,保證事務一緻性。

在DM8之前的版本中,是沒有redolog flush線程的,統一由IO線程負責。DM8中對這個性能瓶頸點進行了優化拆分,将redolog flush線程獨立了出來;從這點來看DM8

在日志寫方面比之前老版本應該有較大的性能提升。不過仍然是一個單線程。當redo産生量極大的情況下,單線程應該還是會存在瓶頸。

大家知道Oracle 12c版本之前,redo buffer的寫出程序lgwr也是單程序,從12c版本開始,Oracle引入了多程序模式,解決了最後一個瓶頸。

那麼多redolog flush線程會合并寫入嗎? 這裡看看過濾看看前面的strace跟蹤即可:

[[email protected] ~]# cat /tmp/io_thread.log |grep "10052" |grep pwrite64|more    
10052      0.000447 pwrite64(7, "\0,\2\0.*\2\0\337\233W\0236\240\332\10\1\0\0\0\0\0m\344\n\0\0\0\0\0m\344"..., 142336, 2559817216) = 142336
10052      0.000914 pwrite64(7, "\0R\4\0\334P\4\0\337\233W\0236\240\332\10\1\0\0\0\0\0n\344\n\0\0\0\0\0n\344"..., 283136, 2559959552) = 283136
10052      0.000270 pwrite64(7, "\0\354\7\0\23\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0o\344\n\0\0\0\0\0o\344"..., 519168, 2560242688) = 519168
10052      0.000010 pwrite64(7, "\0\276\0\0F\275\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0p\344\n\0\0\0\0\0p\344"..., 48640, 2560761856 <unfinished ...>
10052      0.003413 <... pwrite64 resumed> ) = 48640
10052      0.000116 pwrite64(7, "\0\4\0\0y\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0q\344\n\0\0\0\0\0q\344"..., 1024, 2560810496) = 1024
10052      0.000527 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0r\344\n\0\0\0\0\0r\344"..., 512, 2560811520) = 512
10052      0.000409 pwrite64(7, "\0\0\10\0\305\376\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0s\344\n\0\0\0\0\0s\344"..., 524288, 2560812032 <unfinished ...>
10052      0.000564 <... pwrite64 resumed> ) = 524288
10052      0.000351 pwrite64(7, "\0\374\7\0\323\372\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0t\344\n\0\0\0\0\0t\344"..., 523264, 2561336320) = 523264
10052      0.000373 pwrite64(7, "\0R\1\0&Q\1\0\337\233W\0236\240\332\10\1\0\0\0\0\0u\344\n\0\0\0\0\0u\344"..., 86528, 2561859584) = 86528
10052      0.002699 pwrite64(7, "\0\340\17\0\221\336\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0v\344\n\0\0\0\0\0v\344"..., 1040384, 2561946112) = 1040384
10052      0.000019 pwrite64(7, "\0\336\17\0\334\334\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0w\344\n\0\0\0\0\0w\344"..., 1039872, 2562986496 <unfinished ...>
10052      0.000426 <... pwrite64 resumed> ) = 1039872
10052      0.000392 pwrite64(7, "\0\4\3\0\234\3\3\0\337\233W\0236\240\332\10\1\0\0\0\0\0x\344\n\0\0\0\0\0x\344"..., 197632, 2564026368) = 197632
10052      0.000237 pwrite64(7, "\0\4\0\0\t\3\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0y\344\n\0\0\0\0\0y\344"..., 1024, 2564224000) = 1024
10052      0.000125 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0z\344\n\0\0\0\0\0z\344"..., 512, 2564225024) = 512
10052      0.002194 pwrite64(7, "\0\374\7\0005\373\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0{\344\n\0\0\0\0\0{\344"..., 523264, 2564225536) = 523264
10052      0.001846 pwrite64(7, "\0\364\17\0!\363\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0|\344\n\0\0\0\0\0|\344"..., 1045504, 2564748800) = 1045504
10052      0.001553 pwrite64(7, "\0\354\7\0\1\353\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0}\344\n\0\0\0\0\0}\344"..., 519168, 2565794304) = 519168
10052      0.001214 pwrite64(7, "\0\362\7\0\240\360\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0~\344\n\0\0\0\0\0~\344"..., 520704, 2566313472) = 520704
10052      0.000467 pwrite64(7, "\0\366\7\0\210\365\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\177\344\n\0\0\0\0\0\177\344"..., 521728, 2566834176) = 521728
10052      0.000331 pwrite64(7, "\0\354\7\0`\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\200\344\n\0\0\0\0\0\200\344"..., 519168, 2567355904) = 519168
10052      0.000304 pwrite64(7, "\0\360\7\0'\357\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\201\344\n\0\0\0\0\0\201\344"..., 520192, 2567875072) = 520192
10052      0.000895 pwrite64(7, "\0\\\5\0c[\5\0\337\233W\0236\240\332\10\1\0\0\0\0\0\202\344\n\0\0\0\0\0\202\344"..., 351232, 2568395264) = 351232
10052      0.000167 pwrite64(7, "\0\4\0\0\1\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\203\344\n\0\0\0\0\0\203\344"..., 1024, 2568746496) = 1024
10052      0.000169 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\204\344\n\0\0\0\0\0\204\344"..., 512, 2568747520) = 512
10052      0.000027 pwrite64(7, "\0\24\0\0\270\23\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\205\344\n\0\0\0\0\0\205\344"..., 5120, 2568748032) = 5120
10052      0.000026 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\206\344\n\0\0\0\0\0\206\344"..., 3584, 2568753152) = 3584
10052      0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\207\344\n\0\0\0\0\0\207\344"..., 3584, 2568756736) = 3584
10052      0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\210\344\n\0\0\0\0\0\210\344"..., 3584, 2568760320) = 3584
10052      0.000051 pwrite64(7, "\0\16\0\0\230\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\211\344\n\0\0\0\0\0\211\344"..., 3584, 2568763904) = 3584
10052      0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\212\344\n\0\0\0\0\0\212\344"..., 3584, 2568767488) = 3584
10052      0.000026 pwrite64(7, "\0\16\0\0008\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\213\344\n\0\0\0\0\0\213\344"..., 3584, 2568771072) = 3584
[[email protected] ~]# ls -ltr /proc/9539/fd
total 0
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988]
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987]
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987]
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF
lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf
l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log
lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null
[[email protected] ~]#
           

可見達夢8中,redolog flush線程會進行日志緩沖的合并寫出。這也是正常的處理邏輯,否則性能上不去的。最後還有一些其他線程,如dm_chkpnt_thd 即檢查點線程;purge線程,則主要是負責復原段的清理;歸檔線程、審計線程等等。

總結:

看達夢資料庫的14類線程,不少核心線程原理似乎跟Oracle類似(實際上關系型資料庫應該都差不多);不過仍然有一些差别。

這裡不發表太多評論,防止被噴。從上面測試來看,監聽線程,redolog flush線程 是可能存在瓶頸的點。