1. Introduction
While researching this recently, I found very few worked examples of how systemd uses cgroups. Many people are also unsure how cgroups are meant to be used on el7 at all, and how to have systemd apply cgroups to a process or a service.
This article walks through a concrete service and shows how systemd uses cgroups to manage its CPU, memory, and block I/O resources.
It assumes a basic understanding of the cgroup concepts in systemd and of how systemd manages services.
Note also that libcgroup should no longer be used on RHEL 7: doing so can lead to unexpected behavior, and systemd takes its place. The RHEL 7 documentation, under "Using Control Groups", carries this warning:
WARNING The deprecated cgconfig tool from the libcgroup package is available to mount and handle hierarchies for controllers not yet supported by systemd (most notably the net-prio controller). Never use libcgroup tools to modify the default hierarchies mounted by systemd since it would lead to unexpected behavior. The libcgroup library will be removed in the future versions of Red Hat Enterprise Linux. For more information on how to use cgconfig, see Chapter 3, Using libcgroup Tools.
From RHEL 7 onward, libcgroup is deprecated in favor of systemd. systemd provides a set of directives for using cgroups, but it does not cover everything. For example, you cannot use cpuset or freezer through systemd: cpuset or freezer are currently not exposed at all due to the broken inheritance semantics of the kernel logic. Also, migrating units to a different slice at runtime is not supported (i.e. altering the Slice= property for running units) as the kernel currently lacks atomic cgroup subtree moves.
This raises the question: How is it possible to safely achieve the same result (exclusive cpu access/arbitrary cpusets for userland applications)?
Although systemd does not support cpuset today, it will probably do so eventually. In the meantime there is a somewhat clumsy workaround that achieves the same goal; it is described later in this article.
2. Hands-on walkthrough
Step 1: Create the slice and the service
Use systemd to create and start cc.service
[root@localhost /home/ahao.mah/systemd]
#cat /usr/libexec/cc.py
#!/usr/bin/python
while True:
    pass
[root@localhost /home/ahao.mah/systemd]
#chmod +x /usr/libexec/cc.py
Create the unit file for cc.service
[root@localhost /home/ahao.mah/systemd]
#vim /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
[Install]
WantedBy=multi-user.target
Start the cc service
[root@localhost /home/ahao.mah/systemd]
#systemctl restart cc
[root@localhost /home/ahao.mah/systemd]
#systemctl status cc
● cc.service - cc
Loaded: loaded (/etc/systemd/system/cc.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2016-08-26 11:00:12 CST; 5s ago
Main PID: 33542 (cc.py)
CGroup: /system.slice/cc.service
└─33542 /usr/bin/python /usr/libexec/cc.py
Aug 26 11:00:12 localhost systemd[1]: Started cc.
Aug 26 11:00:12 localhost systemd[1]: Starting cc...
The cc service saturates one CPU
[root@localhost /home/ahao.mah/systemd]
#top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
33542 root 20 0 125320 4532 2032 R 100.0 0.0 0:23.39 cc.py
[root@localhost /home/ahao.mah/systemd]
#mpstat -P ALL 1 10
Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 08/26/2016 _x86_64_ (24 CPU)
12:10:30 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
12:10:31 PM all 4.16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 95.84
12:10:31 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:31 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
12:10:31 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Because cc.service was started by systemd, systemd-cgls shows it under system.slice. If you run /usr/libexec/cc.py directly instead of through systemd, systemd-cgls shows the process under user.slice in the cgroup tree.
[root@localhost /home/ahao.mah/systemd]
#systemd-cgls
└─system.slice
├─cc.service
│ └─35480 /usr/bin/python /usr/libexec/cc.py
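Step 2: Control process resources with cgroups
First, determine which branch of the cgroup tree the cc service belongs to. Since we have not configured a slice for it, it is in system.slice by default.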
[root@localhost /home/ahao.mah/systemd]
#systemctl show cc
Slice=system.slice
ControlGroup=/system.slice/cc.service
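The membership reported by systemctl show can also be cross-checked from the kernel side. Per cgroups(7), each line of /proc/<pid>/cgroup has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal Python 3 sketch of a parser for that format; the function name parse_proc_cgroup and the sample text are illustrative assumptions, not output captured from the machine used in this article.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Expects the cgroup v1 layout of /proc/<pid>/cgroup:
    hierarchy-ID:controller-list:cgroup-path
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice, because the path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            result[controller] = path
    return result


# Hypothetical sample: a service tracked by systemd in system.slice,
# with no resource controller enabled for it yet.
SAMPLE = """\
4:cpuacct,cpu:/
3:memory:/
1:name=systemd:/system.slice/cc.service
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(controller, path)
```

On a real system the input would come from reading /proc/<pid>/cgroup for the service's main PID.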
Change the slice the service belongs to
[root@localhost /home/ahao.mah/systemd]
#vim /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
[Install]
WantedBy=multi-user.target
[root@localhost /home/ahao.mah/systemd]
#systemd-cgls
├─jiangyi.slice
│ └─cc.service
│ └─37720 /usr/bin/python /usr/libexec/cc.py
At this point, however, no cgroup controller has been enabled for jiangyi.slice
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep jiangyi.slice
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep cc.service
Add CPUAccounting=yes to /etc/systemd/system/cc.service, then reload systemd and restart the service. This declares that jiangyi.slice, and the cc.service under it, start using the cpu,cpuacct cgroup controller.
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep jiangyi
cpu,cpuacct:/jiangyi.slice
cpu,cpuacct:/jiangyi.slice/cc.service
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep cc.service
cpu,cpuacct:/jiangyi.slice/cc.service
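Even so, cc.service still uses 100% of a CPU, because both parameters shown below are at their default values. cpu.cfs_period_us and cpu.cfs_quota_us together limit how much CPU time all processes in the group may use per period. Here cfs stands for Completely Fair Scheduler. cpu.cfs_period_us is the length of the period; the default is 100000, i.e. 100 ms. cpu.cfs_quota_us is the CPU time that may be used within each period; the default of -1 means no limit.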
[root@localhost /home/ahao.mah/systemd]
#cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_period_us
100000
[root@localhost /home/ahao.mah/systemd]
#cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_quota_us
-1
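The relationship between the two files is simple arithmetic: the limit, as a fraction of one CPU, is quota divided by period. A minimal Python 3 sketch follows; the function name cpu_limit_percent is an assumption made for illustration.

```python
def cpu_limit_percent(quota_us, period_us):
    """Return the CFS bandwidth limit as a percentage of one CPU.

    Returns None when quota_us is negative, since -1 means "no limit".
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    if quota_us < 0:
        return None
    return 100.0 * quota_us / period_us


if __name__ == "__main__":
    # The default values read from the cgroup files: unlimited.
    print(cpu_limit_percent(-1, 100000))
    # Half of every 100 ms period.
    print(cpu_limit_percent(50000, 100000))
    # A quota larger than the period allows more than one CPU's worth of time.
    print(cpu_limit_percent(200000, 100000))
```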
So the following two steps immediately drop the CPU usage of cc.service to 50%.
[root@localhost /home/ahao.mah/systemd]
#ps aux | grep cc
root 39402 99.8 0.0 125320 4536 ? Rs 11:21 5:38 /usr/bin/python /usr/libexec/cc.py
echo 50000 > /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.cfs_quota_us
echo 39402 > /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/tasks
[root@localhost /home/ahao.mah/systemd]
#top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
39402 root 20 0 125320 4536 2032 R 50.2 0.0 7:57.40 cc.py
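Next, how do we manage resources with cgroups through the systemd unit file itself?
Step 3: Control cgroups through systemd
How systemd uses cgroups puzzles many people. systemd exposes cgroup features through settings in the unit file. For example, to put cc.service under cgroup CPU, memory, and block I/O resource management, the required settings are: CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes
These settings are explained in detail in man systemd.resource-control.
Example: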
[root@localhost /home/ahao.mah/systemd]
#cat /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
CPUAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
Check that cc.service and jiangyi.slice now appear in the cgroup tree
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep jiangyi
cpu,cpuacct:/jiangyi.slice
cpu,cpuacct:/jiangyi.slice/cc.service
blkio:/jiangyi.slice
blkio:/jiangyi.slice/cc.service
memory:/jiangyi.slice
memory:/jiangyi.slice/cc.service
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep cc.service
cpu,cpuacct:/jiangyi.slice/cc.service
blkio:/jiangyi.slice/cc.service
memory:/jiangyi.slice/cc.service
The cgroup information also shows up in systemctl status cc.
[root@localhost /home/ahao.mah/systemd]
#systemctl status cc
● cc.service - cc
Loaded: loaded (/etc/systemd/system/cc.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2016-08-26 14:18:28 CST; 24s ago
Main PID: 84861 (cc.py)
Memory: 2.5M
CGroup: /jiangyi.slice/cc.service
└─84861 /usr/bin/python /usr/libexec/cc.py
Aug 26 14:18:28 localhost systemd[1]: Started cc.
Aug 26 14:18:28 localhost systemd[1]: Starting cc...
3. Practical applications
3.1 Limiting CPU: cpu.shares
cc.service
[root@localhost /root]
#systemctl cat cc
# /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
CPUAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
ee.service
[root@localhost /root]
#systemctl cat ee
# /etc/systemd/system/ee.service
[Unit]
Description=ee
ConditionFileIsExecutable=/usr/libexec/ee.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
CPUAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
By default, cpu.shares is 1024 everywhere
[root@localhost /root]
#cat /sys/fs/cgroup/cpu/jiangyi.slice/cpu.shares
1024
[root@localhost /root]
#cat /sys/fs/cgroup/cpu/jiangyi.slice/cc.service/cpu.shares
1024
[root@localhost /root]
#cat /sys/fs/cgroup/cpu/jiangyi.slice/ee.service/cpu.shares
1024
mpstat -P ALL 1 2: two CPU cores are saturated
[root@localhost /root]
#mpstat -P ALL 1 2
Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU)
08:32:09 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
08:32:10 PM all 8.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 91.67
08:32:10 PM 0 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
08:32:10 PM 1 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
08:32:10 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 20 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:32:10 PM 21 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
08:32:10 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
08:32:10 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
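cpu.shares does not limit the absolute CPU time a process may use; it controls the relative allocation between groups. That is why, with 24 CPUs and only two busy processes, each service still gets a full core here.
For background, see: "Managing CPU resources with cgroups"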
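To make the idea of relative weights concrete: when sibling groups compete for the same CPU time, each one receives a fraction proportional to its cpu.shares value. The Python 3 sketch below computes those fractions; the function name cpu_share_fractions and the 512 value in the second example are assumptions chosen for illustration, and the result only describes the contended case.

```python
def cpu_share_fractions(shares):
    """Return each group's fraction of CPU time under full contention.

    `shares` maps a group name to its cpu.shares value.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {name: float(value) / total for name, value in shares.items()}


if __name__ == "__main__":
    # Both services at the default weight: an even split under contention.
    print(cpu_share_fractions({"cc.service": 1024, "ee.service": 1024}))
    # Hypothetical: halving one weight gives a 2:1 split.
    print(cpu_share_fractions({"cc.service": 1024, "ee.service": 512}))
```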
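3.2 Limiting CPU: CPUQuota=40%
As shown below, setting only CPUAccounting=yes MemoryAccounting=yes TasksAccounting=yes BlockIOAccounting=yes merely turns on accounting. That is not enough; we also need to limit how much of each resource the service may use.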
[root@localhost /home/ahao.mah/systemd]
#cat /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
CPUAccounting=yes
MemoryAccounting=yes
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
As seen earlier, cc.service consumed 100% of one CPU. Now we limit it by adding the setting CPUQuota=40%
[root@localhost /root]
#cat /etc/systemd/system/cc.service
[Unit]
Description=cc
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/cc.py
Slice=jiangyi.slice
CPUAccounting=yes
CPUQuota=40%
MemoryAccounting=yes
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
[root@localhost /root]
#systemctl daemon-reload
[root@localhost /root]
#systemctl restart cc.service
As shown below, cc.service can now use at most 40% of a single CPU.
[root@localhost /root]
#mpstat -P ALL 1 3
Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU)
05:28:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
05:28:44 PM all 1.75 0.00 0.08 0.00 0.00 0.00 0.00 0.00 0.00 98.17
05:28:44 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 4 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
05:28:44 PM 5 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
05:28:44 PM 6 40.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 60.00
05:28:44 PM 7 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
05:28:44 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 10 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.01
05:28:44 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 13 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
05:28:44 PM 14 0.00 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.00 99.01
05:28:44 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 16 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
05:28:44 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:28:44 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
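The numbers in this output can be checked with a little arithmetic. Assuming the 100 ms CFS period seen earlier, a 40% quota corresponds to 40000 µs of CPU time per period, and 40% of one CPU on a 24-CPU machine is about 1.67% of the whole machine, close to the "all" row above. A minimal Python 3 sketch, with function names that are illustrative assumptions:

```python
def cpu_quota_to_cfs_quota_us(cpu_quota_percent, period_us=100000):
    """Convert a percentage of one CPU into a CFS quota in microseconds.

    Assumes the given period; 100000 us is the default seen in this article.
    """
    if cpu_quota_percent <= 0:
        raise ValueError("cpu_quota_percent must be positive")
    return int(round(cpu_quota_percent / 100.0 * period_us))


def machine_wide_percent(cpu_quota_percent, n_cpus):
    """Express a per-CPU percentage as a share of the whole machine."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return cpu_quota_percent / float(n_cpus)


if __name__ == "__main__":
    print(cpu_quota_to_cfs_quota_us(40))
    print(round(machine_wide_percent(40, 24), 2))
```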
3.3 Limiting memory
A script whose memory usage keeps climbing
[root@localhost /root]
#cat /usr/libexec/dd
#!/usr/bin/bash
x="a"
while [ True ];do
x=$x$x
done;
Memory usage climbs to 2G
[root@localhost /root]
#systemctl status dd
● dd.service - dd
Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2016-09-18 17:58:01 CST; 59s ago
Main PID: 53549 (dd)
Memory: 2.0G
CGroup: /jiangyi.slice/dd.service
└─53549 /usr/bin/bash /usr/libexec/dd
[root@localhost /root]
#pid=`ps -ef|grep cc|grep -v grep |awk '{print $2}'` ; vmrss=`cat /proc/${pid}/status|grep -i VmRSS|awk '{print $2}'`;vmrss_m=$(($vmrss/1024));echo $vmrss_m
4
Limit memory usage to at most 200M
[root@localhost /root]
#cat /etc/systemd/system/dd.service
[Unit]
Description=dd
ConditionFileIsExecutable=/usr/libexec/cc.py
[Service]
Type=simple
ExecStart=/usr/libexec/dd
Slice=jiangyi.slice
CPUAccounting=yes
CPUQuota=40%
MemoryAccounting=yes
MemoryMax=100M
MemoryLimit=200M
TasksAccounting=yes
BlockIOAccounting=yes
[Install]
WantedBy=multi-user.target
[root@localhost /root]
#cat /sys/fs/cgroup/memory/jiangyi.slice/dd.service/memory.limit_in_bytes
209715200
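As shown below, the effect is clear: MemoryMax=100M (the newer setting) did not take effect, while MemoryLimit=200M (the older setting) did. This appears to be because MemoryMax is a cgroup-v2 setting.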
[root@localhost /root]
#systemctl status dd
● dd.service - dd
Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2016-09-18 19:44:42 CST; 27s ago
Main PID: 82182 (dd)
Memory: 199.8M (limit: 200.0M)
CGroup: /jiangyi.slice/dd.service
└─82182 /usr/bin/bash /usr/libexec/dd
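After watching for a while: the process was not OOM-killed immediately; it was killed only after some time. The kernel log below shows about 2 GB of swap charged to the cgroup, which would explain the delay, since memory.limit_in_bytes does not cap swap usage.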
[root@localhost /root]
#systemctl status dd
● dd.service - dd
Loaded: loaded (/etc/systemd/system/dd.service; disabled; vendor preset: disabled)
Active: failed (Result: signal) since Sun 2016-09-18 20:00:06 CST; 10min ago
Process: 84350 ExecStart=/usr/libexec/dd (code=killed, signal=KILL)
Main PID: 84350 (code=killed, signal=KILL)
The log confirms that it was killed by the OOM killer
Sep 18 20:18:35 jiangyi02 kernel: dd invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Sep 18 20:18:35 jiangyi02 kernel: dd cpuset=/ mems_allowed=0
Sep 18 20:18:35 jiangyi02 kernel: CPU: 0 PID: 89722 Comm: dd Not tainted 3.10.0-327.alx2000.alxos7.x86_64 #1
Sep 18 20:18:35 jiangyi02 kernel: Task in /jiangyi.slice/dd.service killed as a result of limit of /jiangyi.slice/dd.service
Sep 18 20:18:35 jiangyi02 kernel: Memory cgroup stats for /jiangyi.slice/dd.service: cache:0KB rss:204800KB rss_huge:0KB mapped_file:0KB swap:2097064KB inactive_anon:102532KB active_anon:102268KB inactive_file:0KB active_file:0KB unevictable:0KB
Sep 18 20:18:35 jiangyi02 kernel: [89722] 0 89722 684180 51453 1138 524310 0 dd
Sep 18 20:18:35 jiangyi02 kernel: Memory cgroup out of memory: Kill process 89722 (dd) score 972 or sacrifice child
Sep 18 20:18:35 jiangyi02 kernel: Killed process 89722 (dd) total-vm:2736720kB, anon-rss:204528kB, file-rss:1284kB
Sep 18 20:18:35 jiangyi02 systemd[1]: dd.service: main process exited, code=killed, status=9/KILL
Sep 18 20:18:35 jiangyi02 systemd[1]: Unit dd.service entered failed state.
Sep 18 20:18:35 jiangyi02 systemd[1]: dd.service failed.
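Below are the explanations of MemoryMax=bytes (the newer setting) and MemoryLimit=bytes (the older one):
- MemoryMax=bytes Sets an absolute hard limit on how much memory the processes in this unit may use. This limit cannot be exceeded; hitting it causes processes to be killed for running out of memory. It is recommended to use MemoryHigh= as the main memory control and MemoryMax= as the last line of defense.
The value may be an absolute size in bytes (the base-1024 suffixes K, M, G, T are accepted), a percentage relative to the total physical memory of the system, or the special value "infinity" for no limit. This setting controls the cgroup attribute "memory.max"; see cgroup-v2.txt for details.
This setting implies "MemoryAccounting=true".
This is the new-style resource control setting. It is the equivalent of the legacy MemoryLimit= setting.
- MemoryLimit=bytes Sets an absolute hard limit on how much memory the processes in this unit may use. This limit cannot be exceeded; hitting it causes processes to be killed for running out of memory. The value may be an absolute size in bytes (the base-1024 suffixes K, M, G, T are accepted), a percentage relative to the total physical memory of the system, or the special value "infinity" for no limit. This setting controls the cgroup attribute "memory.limit_in_bytes"; see memory.txt for details.
This is the legacy resource control setting. Using the new-style MemoryMax= setting is recommended.
Besides MemoryLimit=bytes there are other memory-related settings; see the man page.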
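The base-1024 suffixes explain the value seen in memory.limit_in_bytes: 200M is 200 × 1024 × 1024 = 209715200 bytes. The Python 3 sketch below reproduces that conversion. The function name parse_memory_size is an assumption, and the sketch deliberately handles only plain byte counts, the K/M/G/T suffixes, and "infinity"; it does not handle percentages.

```python
# Base-1024 multipliers for the suffixes described above.
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_memory_size(value):
    """Convert a size such as '200M' into bytes.

    Returns None for 'infinity' (no limit). Raises ValueError for
    anything else that is not an integer with an optional suffix.
    """
    text = value.strip()
    if text == "infinity":
        return None
    suffix = text[-1:]
    if suffix in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[suffix]
    return int(text)


if __name__ == "__main__":
    print(parse_memory_size("200M"))
    print(parse_memory_size("100M"))
    print(parse_memory_size("infinity"))
```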
3.4 Restricting CPUs: How to use cgroup cpusets with systemd in RHEL7?
systemd currently does not support using cpuset to pin a service to specific CPUs, so the man page does not cover it either.
However, currently the cpuset interface is not exposed through systemd, so as far as we can tell, this functionality is now not available: [2][3] Note that the number of cgroup attributes currently exposed as unit properties is limited. This will be extended later on, as their kernel interfaces are cleaned up. For example cpuset or freezer are currently not exposed at all due to the broken inheritance semantics of the kernel logic. Also, migrating units to a different slice at runtime is not supported (i.e. altering the Slice= property for running units) as the kernel currently lacks atomic cgroup subtree moves.
There is, however, a workaround. The following is taken from "How to use cgroup cpusets with systemd in RHEL7?"
[root@localhost /root]
#cat /usr/libexec/ff.py
#!/usr/bin/python
while True:
    pass
[root@localhost /root]
#vim /usr/lib/systemd/system/ff.service
[Unit]
Description=ff
After=syslog.target network.target auditd.service
[Service]
# Create a cpuset group to manage the process
ExecStartPre=/usr/bin/mkdir -p /sys/fs/cgroup/cpuset/mygroup1
# Assign CPU 4 to this group
ExecStartPre=/bin/bash -c '/usr/bin/echo "4" > /sys/fs/cgroup/cpuset/mygroup1/cpuset.cpus'
ExecStartPre=/bin/bash -c '/usr/bin/echo "0" > /sys/fs/cgroup/cpuset/mygroup1/cpuset.mems'
# Run the process
ExecStart=/usr/libexec/ff.py
# Move the main process into the group by writing its PID to the tasks file
ExecStartPost=/bin/bash -c '/usr/bin/echo $MAINPID > /sys/fs/cgroup/cpuset/mygroup1/tasks'
# Remove the group when the service stops
ExecStopPost=/usr/bin/rmdir /sys/fs/cgroup/cpuset/mygroup1
Restart=on-failure
[Install]
WantedBy=multi-user.target
[root@localhost /root]
#systemctl daemon-reload
[root@localhost /root]
#systemctl restart ff
[root@localhost /root]
#mpstat -P ALL 1 2
Linux 3.10.0-327.alx2000.alxos7.x86_64 (localhost) 09/18/2016 _x86_64_ (24 CPU)
09:15:50 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:15:51 PM all 4.20 0.00 0.04 0.00 0.00 0.00 0.00 0.00 0.00 95.76
09:15:51 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 4 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
09:15:51 PM 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 10 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
09:15:51 PM 23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
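The value written to cpuset.cpus uses the kernel's CPU list format: comma-separated CPU numbers and inclusive ranges, so "4" means CPU 4 only and something like "0-2,4" would mean CPUs 0, 1, 2 and 4. The Python 3 sketch below expands such a string, which can help when checking which CPUs a group is allowed to run on; the function name parse_cpu_list is an assumption made for illustration.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0-2,4' into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.strip().split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            low, high = int(start), int(end)
            if low > high:
                raise ValueError("invalid range: " + part)
            # Ranges are inclusive on both ends.
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("4"))
    print(parse_cpu_list("0-2,4"))
```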
4. Additional notes: systemd-run
systemd-run creates transient cgroups. Example: create a service named toptest.service under test.slice.
[root@localhost /home/ahao.mah/systemd]
#systemd-run --unit=toptest --slice=test top -b
Running as unit toptest.service.
├─test.slice
│ └─toptest.service
│ └─34670 /usr/bin/top -b
Transient means that the unit itself is transient as well.
#systemctl cat toptest.service
# /run/systemd/system/toptest.service
# Transient stub
# /run/systemd/system/toptest.service.d/50-Description.conf
[Unit]
Description=/usr/bin/top -b
# /run/systemd/system/toptest.service.d/50-ExecStart.conf
[Service]
ExecStart=
ExecStart=@/usr/bin/top "/usr/bin/top" "-b"
# /run/systemd/system/toptest.service.d/50-Slice.conf
[Service]
Slice=test.slice
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep test.slice
cpu,cpuacct:/--slice\x3dtest.slice
cpu,cpuacct:/test.slice
blkio:/--slice\x3dtest.slice
blkio:/test.slice
memory:/--slice\x3dtest.slice
memory:/test.slice
[root@localhost /home/ahao.mah/systemd]
#lscgroup |grep toptest.service
[root@localhost /home/ahao.mah/systemd]
#systemctl status toptest.service
● toptest.service - /usr/bin/top -b
Loaded: loaded (/run/systemd/system/toptest.service; static; vendor preset: disabled)
Drop-In: /run/systemd/system/toptest.service.d
└─50-Description.conf, 50-ExecStart.conf, 50-Slice.conf
Active: active (running) since Fri 2016-08-26 11:04:41 CST; 3h 29min ago
Main PID: 34670 (top)
CGroup: /test.slice/toptest.service
└─34670 /usr/bin/top -b
[root@localhost /home/ahao.mah/systemd]
#ll /sys/fs/cgroup/cpu/test.slice/
cgroup.clone_children cpuacct.proc_stat cpuacct.usage_percpu cpu.rt_period_us cpu.stat
cgroup.event_control cpuacct.stat cpu.cfs_period_us cpu.rt_runtime_us notify_on_release
cgroup.procs cpuacct.usage cpu.cfs_quota_us cpu.shares tasks
5. Day-to-day operations
Stop a service
#systemctl kill toptest.service --signal=SIGTERM
Set parameters from the command line
systemctl set-property httpd.service CPUShares=600 MemoryLimit=500M
To make the change temporary, add --runtime
systemctl set-property --runtime name property=value
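When these commands are driven from a script, it helps to build the argument list explicitly rather than concatenating strings. The Python 3 sketch below only assembles the argv for the systemctl set-property forms shown above; the function name build_set_property_cmd is an assumption, and actually running the command (for example with subprocess.check_call) requires root privileges on a systemd host.

```python
def build_set_property_cmd(unit, properties, runtime=False):
    """Build the argv for `systemctl set-property`.

    `properties` is a list of (name, value) pairs so that the order
    on the command line is preserved.
    """
    if not properties:
        raise ValueError("at least one property is required")
    cmd = ["systemctl", "set-property"]
    if runtime:
        # --runtime makes the change temporary.
        cmd.append("--runtime")
    cmd.append(unit)
    for name, value in properties:
        cmd.append("{}={}".format(name, value))
    return cmd


if __name__ == "__main__":
    print(" ".join(build_set_property_cmd(
        "httpd.service", [("CPUShares", "600"), ("MemoryLimit", "500M")])))
```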
Live view of cgroup resource usage
systemd-cgtop
6. References
- Linux Programmer’s Manual CGROUPS(7)
- The New Control Group Interfaces @freedesktop.org
- Official RHEL 7 documentation
