天天看點

mongoDB and numa policy interleave

今天看到hellodba的一篇關于numa架構下單伺服器跑多執行個體mysql的文章位址http://www.hellodb.net/tag/numa

我發現mongodb也有類似情形,可能需要在啟動資料庫的時候加一個調整numa記憶體配置設定政策的設定,如下:

su - mongo -c "numactl --interleave=all mongod -f /opt/mongo/conf/mongod5281.conf"

這樣的話記憶體配置設定政策由預設的default修改為interleave模式,具體可以參考不同的模式的意思。

通過檢視程序的numa_maps

一個實體兩個node的numa硬體架構,已經啟用了interleave政策如下:

cat /proc/$pid/numa_maps

00400000 interleave=0-1 file=/opt/mongo/bin/mongod mapped=539 active=210 n0=269 n1=270

00d46000 interleave=0-1 file=/opt/mongo/bin/mongod anon=12 dirty=12 mapped=23 active=12 n0=12 n1=11

00d63000 interleave=0-1 anon=15 dirty=15 n0=7 n1=8

11d18000 interleave=0-1 heap anon=45 dirty=45 active=44 n0=24 n1=21

4021f000 interleave=0-1

40220000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

41131000 interleave=0-1

41132000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

41b32000 interleave=0-1

41b33000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

42533000 interleave=0-1

42534000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

42f34000 interleave=0-1

42f35000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

43935000 interleave=0-1

43936000 interleave=0-1 anon=2 dirty=2 n0=1 n1=1

44336000 interleave=0-1

44337000 interleave=0-1 anon=5 dirty=5 n0=3 n1=2

44d37000 interleave=0-1

44d38000 interleave=0-1 anon=3 dirty=3 n0=1 n1=2

45738000 interleave=0-1

45739000 interleave=0-1 anon=4 dirty=4 n0=2 n1=2

46139000 interleave=0-1

4613a000 interleave=0-1 anon=3 dirty=3 n0=1 n1=2

46b3a000 interleave=0-1

46b3b000 interleave=0-1 anon=3 dirty=3 n0=2 n1=1

3adea00000 interleave=0-1 file=/lib64/ld-2.5.so mapped=20 mapmax=44 n0=20

3adec1b000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 n1=1

3adec1c000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 n0=1

3adee00000 interleave=0-1 file=/lib64/libc-2.5.so mapped=126 mapmax=49 n0=126

3adef4e000 interleave=0-1 file=/lib64/libc-2.5.so

3adf14e000 interleave=0-1 file=/lib64/libc-2.5.so anon=2 dirty=2 mapped=4 mapmax=26 n0=3 n1=1

3adf152000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 n0=1

3adf153000 interleave=0-1 anon=5 dirty=5 n0=2 n1=3

3adf600000 interleave=0-1 file=/lib64/libpthread-2.5.so mapped=16 mapmax=21 n0=16

3adf616000 interleave=0-1 file=/lib64/libpthread-2.5.so

3adf815000 interleave=0-1 file=/lib64/libpthread-2.5.so anon=1 dirty=1 n1=1

3adf816000 interleave=0-1 file=/lib64/libpthread-2.5.so anon=1 dirty=1 n0=1

3adf817000 interleave=0-1 anon=1 dirty=1 n0=1

3adfa00000 interleave=0-1 file=/lib64/libm-2.5.so mapped=27 mapmax=10 active=11 n0=27

3adfa82000 interleave=0-1 file=/lib64/libm-2.5.so

3adfc81000 interleave=0-1 file=/lib64/libm-2.5.so anon=1 dirty=1 n1=1

3adfc82000 interleave=0-1 file=/lib64/libm-2.5.so anon=1 dirty=1 n0=1

3aed000000 interleave=0-1 file=/lib64/libgcc_s-4.1.2-20080825.so.1 mapped=2 mapmax=5 n0=2

3aed00d000 interleave=0-1 file=/lib64/libgcc_s-4.1.2-20080825.so.1

3aed20d000 interleave=0-1 file=/lib64/libgcc_s-4.1.2-20080825.so.1 anon=1 dirty=1 n1=1

3af0c00000 interleave=0-1 file=/usr/lib64/libstdc++.so.6.0.8 mapped=66 mapmax=3 active=65 n0=66

3af0ce6000 interleave=0-1 file=/usr/lib64/libstdc++.so.6.0.8

3af0ee5000 interleave=0-1 file=/usr/lib64/libstdc++.so.6.0.8 anon=5 dirty=5 mapped=6 mapmax=2 n0=4 n1=2

3af0eeb000 interleave=0-1 file=/usr/lib64/libstdc++.so.6.0.8 anon=3 dirty=3 n0=1 n1=2

3af0eee000 interleave=0-1 anon=3 dirty=3 n0=1 n1=2

2aaaaaac5000 interleave=0-1 file=/lib64/libnss_files-2.5.so mapped=5 mapmax=16 n0=5

2aaaaaacf000 interleave=0-1 file=/lib64/libnss_files-2.5.so

2aaaaacce000 interleave=0-1 file=/lib64/libnss_files-2.5.so anon=1 dirty=1 n1=1

2aaaaaccf000 interleave=0-1 file=/lib64/libnss_files-2.5.so anon=1 dirty=1 n0=1

2aaaac000000 interleave=0-1 anon=17 dirty=17 n0=7 n1=10

2aaaac021000 interleave=0-1

2aaab0000000 interleave=0-1 file=/data1/mongodata/local/local.ns

2aaaf0000000 interleave=0-1 file=/data1/mongodata/local/local.ns mapped=262144 active=262132 n0=131072 n1=131072

2aab30000000 interleave=0-1 file=/data1/mongodata/local/local.0

2aab34000000 interleave=0-1 file=/data1/mongodata/local/local.0 mapped=3 active=1 n0=2 n1=1

2aab38000000 interleave=0-1 file=/data1/mongodata/local/local.1

2aabb7f00000 interleave=0-1 file=/data1/mongodata/local/local.1 mapped=2 active=0 n1=2

2aac37e00000 interleave=0-1 file=/data1/mongodata/local/local.2

2aacb7d00000 interleave=0-1 file=/data1/mongodata/local/local.2 mapped=2 active=0 n0=2

2aad37c00000 interleave=0-1 file=/data1/mongodata/local/local.3

2aadb7b00000 interleave=0-1 file=/data1/mongodata/local/local.3 mapped=2 active=0 n0=2

2aae37a00000 interleave=0-1 file=/data1/mongodata/local/local.4

2aaeb7900000 interleave=0-1 file=/data1/mongodata/local/local.4 mapped=2 active=0 n0=2

2aaf37800000 interleave=0-1 file=/data1/mongodata/local/local.5

2aafb7700000 interleave=0-1 file=/data1/mongodata/local/local.5 mapped=2 active=0 n0=2

2b3f9d648000 interleave=0-1 anon=3 dirty=3 n0=2 n1=1

2b3f9d662000 interleave=0-1 anon=4 dirty=4 n0=2 n1=2

7fff14f57000 interleave=0-1 stack anon=6 dirty=6 n0=3 n1=3

7fff14f76000 interleave=0-1 mapped=1 mapmax=32 active=0 n0=1

另一台伺服器,不調整numa記憶體配置設定政策啟動mongodb,檢視結果如下 : 

00400000 default file=/opt/mongo/bin/mongod mapped=542 n0=525 n1=17

00d46000 default file=/opt/mongo/bin/mongod anon=12 dirty=12 mapped=24 n0=12 n1=12

00d63000 default anon=15 dirty=15 n1=15

1e78b000 default heap anon=38 dirty=38 n0=3 n1=35

40c02000 default

40c03000 default anon=2 dirty=2 n0=1 n1=1

41603000 default

41604000 default anon=2 dirty=2 n0=1 n1=1

42004000 default

42005000 default anon=2 dirty=2 active=0 n1=2

42a05000 default

42a06000 default anon=2 dirty=2 n0=1 n1=1

43406000 default

43407000 default anon=2 dirty=2 n0=2

43e07000 default

43e08000 default anon=2 dirty=2 n0=2

44808000 default

44809000 default anon=5 dirty=5 n0=5

45209000 default

4520a000 default anon=4 dirty=4 active=3 n0=1 n1=3

45c0a000 default

45c0b000 default anon=2 dirty=2 n0=2

32c4e00000 default file=/lib64/ld-2.5.so mapped=20 mapmax=46 n0=20

32c501b000 default file=/lib64/ld-2.5.so anon=1 dirty=1 n1=1

32c501c000 default file=/lib64/ld-2.5.so anon=1 dirty=1 n1=1

32c5200000 default file=/lib64/libc-2.5.so mapped=124 mapmax=51 n0=120 n1=4

32c534e000 default file=/lib64/libc-2.5.so

32c554e000 default file=/lib64/libc-2.5.so anon=2 dirty=2 mapped=4 mapmax=27 n0=2 n1=2

32c5552000 default file=/lib64/libc-2.5.so anon=1 dirty=1 n1=1

32c5553000 default anon=5 dirty=5 n0=1 n1=4

32c5a00000 default file=/lib64/libpthread-2.5.so mapped=16 mapmax=21 n1=16

32c5a16000 default file=/lib64/libpthread-2.5.so

32c5c15000 default file=/lib64/libpthread-2.5.so anon=1 dirty=1 n1=1

32c5c16000 default file=/lib64/libpthread-2.5.so anon=1 dirty=1 n1=1

32c5c17000 default anon=1 dirty=1 n1=1

32c5e00000 default file=/lib64/libm-2.5.so mapped=27 mapmax=10 n0=14 n1=13

32c5e82000 default file=/lib64/libm-2.5.so

32c6081000 default file=/lib64/libm-2.5.so anon=1 dirty=1 n1=1

32c6082000 default file=/lib64/libm-2.5.so anon=1 dirty=1 n1=1

32d4000000 default file=/lib64/libgcc_s-4.1.2-20080825.so.1 mapped=9 mapmax=5 n0=7 n1=2

32d400d000 default file=/lib64/libgcc_s-4.1.2-20080825.so.1

32d420d000 default file=/lib64/libgcc_s-4.1.2-20080825.so.1 anon=1 dirty=1 n1=1

32d7000000 default file=/usr/lib64/libstdc++.so.6.0.8 mapped=77 mapmax=3 n0=77

32d70e6000 default file=/usr/lib64/libstdc++.so.6.0.8

32d72e5000 default file=/usr/lib64/libstdc++.so.6.0.8 anon=5 dirty=5 mapped=6 mapmax=2 n0=1 n1=5

32d72eb000 default file=/usr/lib64/libstdc++.so.6.0.8 anon=3 dirty=3 n1=3

32d72ee000 default anon=3 dirty=3 n1=3

2aaaaaac5000 default file=/lib64/libnss_files-2.5.so mapped=5 mapmax=18 n1=5

2aaaaaacf000 default file=/lib64/libnss_files-2.5.so

2aaaaacce000 default file=/lib64/libnss_files-2.5.so anon=1 dirty=1 n0=1

2aaaaaccf000 default file=/lib64/libnss_files-2.5.so anon=1 dirty=1 n0=1

2aaaac000000 default anon=18 dirty=18 active=13 n0=7 n1=11

2aaaac021000 default

2aaab0000000 default file=/opt/mongodata/local/local.ns

2aaaf0000000 default file=/opt/mongodata/local/local.ns mapped=262144 n0=237592 n1=24552

2aab30000000 default file=/opt/mongodata/local/local.0

2aab34000000 default file=/opt/mongodata/local/local.0 mapped=2 n0=2

2aab38000000 default file=/opt/mongodata/local/local.1

2aabb7f00000 default file=/opt/mongodata/local/local.1 mapped=1 n0=1

2aac37e00000 default file=/opt/mongodata/local/local.2

2aacb7d00000 default file=/opt/mongodata/local/local.2 mapped=1 n0=1

2aad37c00000 default file=/opt/mongodata/local/local.3

2aadb7b00000 default file=/opt/mongodata/local/local.3 mapped=1 n1=1

2aae37a00000 default file=/opt/mongodata/local/local.4

2aaeb7900000 default file=/opt/mongodata/local/local.4 mapped=1 n1=1

2aaf37800000 default file=/opt/mongodata/local/local.5

2aafb7700000 default file=/opt/mongodata/local/local.5 mapped=1 n1=1

2b2a3fa63000 default anon=3 dirty=3 n1=3

2b2a3fa7d000 default anon=5 dirty=5 n1=5

7fffe3123000 default stack anon=7 dirty=7 n1=7

7fffe31fc000 default mapped=1 mapmax=34 active=0 n0=1

【參考】

numactl

description

       numactl runs processes with a specific numa scheduling or memory placement policy.  the policy is set for  com-

       mand and inherited by all of its children.  in addition it can set persistent policy for shared memory segments

       or files.

       policy settings are:

       --interleave=nodes, -i nodes

              set an memory interleave policy. memory will be allocated using round robin on nodes.  when memory  can-

              not be allocated on the current interleave target fall back to other nodes.

       --membind=nodes, -m nodes

              only  allocate  memory  from  nodes.   allocation will fail when there is not enough memory available on

              these nodes.

       --cpunodebind=nodes, -n nodes

              only execute process on the cpus of nodes.  note that nodes may consist of multiple cpus.

       --physcpubind=cpus, -c cpus

              only execute process on cpus.  this accepts physical cpu numbers as shown in  the  processor  fields  of

              /proc/cpuinfo.

       --localalloc, -l

              do always local allocation on the current node.

不好說哪個政策絕對是好的,隻有适合什麼場景的。

前段時間還因為numa政策沒有使用好,導緻oracle資料庫的某些程序僅使用本地node的記憶體,另一個node有free的記憶體卻不使用的情況。進而swap空間被占用。性能下降。其實這種情況下可以考慮interleave政策。