### 概述
GNU parallel 是一個 shell 工具,用于使用一台或多台計算機并行執行作業。作業可以是單個指令或必須為輸入中的每一行運作的小腳本。典型的輸入是檔案清單、主機清單、使用者清單、URL 清單或表清單。作業也可以是從管道讀取的指令。 GNU parallel 然後可以拆分輸入并将其通過管道并行傳輸到指令中。
- 在 shell 中編寫循環,你會發現 GNU parallel 可以取代大部分循環,并通過并行運作多個作業來使它們運作得更快。
- 對于每一行輸入,GNU parallel 将以該行作為參數執行指令。如果沒有給出指令,則執行輸入行。多條線路将并行運作。
- GNU parallel 確定指令的輸出與順序運作指令的輸出相同。這使得使用 GNU parallel 的輸出作為其他程式的輸入成為可能。
![在這裡插入圖檔描述](https://img-blog.csdnimg.cn/8c06525096064280bf44917344ba86c5.png)
## 基本文法
熟悉xargs的同學對這個應該了解起來很快
### 1、生成五個檔案并重定向輸入
```bash
seq 5 | parallel seq {} '>' example.{}
# 回憶一下for 循環怎麼寫來着
# for i in `seq 5`;do echo `seq $i` > example-for.$i;done
```
### 2、parallel的輸入
::: 後面跟的是其從指令行的輸入
- parallel echo ::: 1 2 3 4 5
輸出是
```bash
1
2
3
4
5
```
- parallel wc ::: example.*
輸入是檔案名
```bash
1 1 2 example.1
2 2 4 example.2
3 3 6 example.3
4 4 8 example.4
5 5 10 example.5
```
wc 預設輸出解釋
```bash
wc example.3
3 3 6 example.3
#行數 單詞數 位元組數 檔案名
```
- parallel echo ::: S M L ::: Green Red
多個::: 輸入,輸出是排列組合
```bash
S Green
S Red
M Green
M Red
L Green
L Red
```
- find example.* -print | parallel echo File
parallel從标準輸入讀取
```bash
File example.1
File example.2
File example.3
File example.4
File example.5
```
### 3、和指令行的結合
```bash
# parallel echo counting lines';' wc -l ::: example.*
counting lines
1 example.1
counting lines
2 example.2
counting lines
3 example.3
counting lines
4 example.4
counting lines
5 example.5
```
用{}進行字元替換,這個是不是和xargs 很像
```bash
parallel echo test lines';' wc -l ::: example.*
test example.1
1 example.1
test example.2
2 example.2
test example.4
4 example.4
test example.3
3 example.3
test example.5
5 example.5
```
當有多個輸入的時候,使用{1} {2}
例如需要分别統計example.*中的行數和位元組數
```bash
# parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*
count -l in example.1
1 example.1
count -l in example.2
2 example.2
count -l in example.3
3 example.3
count -l in example.4
4 example.4
count -l in example.5
5 example.5
count -c in example.1
2 example.1
count -c in example.2
4 example.2
count -c in example.3
6 example.3
count -c in example.4
8 example.4
count -c in example.5
10 example.5
```
--dry-run 測試
```bash
# parallel --dry-run echo count {1} {2} ';' wc {1} {2} ::: -c -l ::: example.*
# 看這個結果已經不是順序得了
echo count -c example.1 ; wc -c example.1
echo count -c example.2 ; wc -c example.2
echo count -c example.3 ; wc -c example.3
echo count -c example.5 ; wc -c example.5
echo count -c example.4 ; wc -c example.4
echo count -l example.1 ; wc -l example.1
echo count -l example.2 ; wc -l example.2
echo count -l example.3 ; wc -l example.3
echo count -l example.4 ; wc -l example.4
```
### 4、輸出
### 5、并行數量
當然這個是并行的,并行數設定多少合适呢?
預設值是和你的os 的cores相同。一般為了限制parallel占據所有的cpu資源,建議使用 --jobs限制其并發數,作為腳本的參數輸入比較常見
--jobs 0 竟可能多的并行
測試
```bash
# 并行為1,理論上就是5+4+3+2+1 =15 s
time parallel --jobs 1 sleep {}';' echo {} done ::: 5 4 3 1 2
# 并行為0,取決于最慢的那個sleep
time parallel --jobs 0 sleep {}';' echo {} done ::: 5 4 3 1 2
```
如果是五個job
```bash
Job slot 1: 55555
Job slot 2: 4444
Job slot 3: 333
Job slot 4: 1
Job slot 5: 22
```
### 6、處理大文本資料
将資料塊傳遞給标準輸入上
```bash
#seq 1000000 | parallel --pipe wc
165668 165668 1048571
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
85352 85352 597465
```
大約 1 MB 的塊傳遞給每個作業
1mb的行數、字元數、位元組數
## 實戰 并發docker run
并行啟動dokcer 容器進行redis key遷移,效能大幅度提升。
通過以下腳本可以體會到 parallel的魅力:
- 代替了shell中的循環
- 線程數量可控制(原生shell循環做不到)
- 多線程輸出可保證順序(原生shell循環做不到)
```bash
#!/bin/bash
# date 2023年2月9日17:57:20
# author ninesun
# desc parallel docker run
set -e
set -o pipefail
# 擷取到程式的絕對路徑
SCRIPT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"
# parallel 并行數量
JOBS=${JOBS:-5}
ERRORS="$(pwd)/errors"
INFO="$(pwd)/info"
dockerrun() {
f=$1
docker rm -f redis-img-${f}
# echo ${f}
docker run --name redis-img-${f} 10.50.10.185/harbortest/redis-mig:1.2 python3 redisMigrate.py 10.50.10.45 19000 10.50.10.170 7100 \
${f} ::: "${files[@]}" > $INFO/dockerrun.log
}
echo
echo
main(){
# get the indexfile
IFS=#39;\n'
mapfile -t files < <(find ./ -name "st*.txt.*" -o -name "line*.txt.*" |sed 's|./||'| sort)
unset IFS
# docker run all jobs
echo "Running in parallel with ${JOBS} jobs."
# 開啟$jobs 個 /opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun xxx.txt
parallel --tag --verbose --ungroup -j"${JOBS}" "$SCRIPT" dockerrun {1} ::: "${files[@]}"
if [[ ! -f "$ERRORS" ]]; then
echo "No errors, hooray!"
else
echo "[ERROR] Some images did not build correctly, see below." >&2
echo "These images failed: $(cat "$ERRORS")" >&2
exit 1
fi
}
run(){
args=$*
f=$1
if [[ "$f" == "" ]]; then
main "$args"
else
$args
fi
}
run "$@"
```
10個并發測試
```bash
./mig-v2.sh
Running in parallel with 10 jobs.
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.003
lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.004
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.005
lineurl.txt.001
lineurl.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.006
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.007
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.008
lineurl.txt.004
lineurl.txt.003
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.009
lineurl.txt.008
lineurl.txt.005
lineurl.txt.007
lineurl.txt.006
lineurl.txt.009
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.010
lineurl.txt.010
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.000
startline.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.001
startline.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.002
startline.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.003
startline.txt.003
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.004
startline.txt.004
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.005
startline.txt.005
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.006
startline.txt.006
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.007
startline.txt.007
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.008
startline.txt.008
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.009
startline.txt.009
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.010
startline.txt.010
No errors, hooray!
```
10個線程 壓測性能
2g2c 本地拉起的虛拟機
10mins遷移完成
![在這裡插入圖檔描述](https://img-blog.csdnimg.cn/86817486d0be4d2385b30dad6702f413.png)
```bash
]# bash -x mig-v2.sh
+ set -e
+ set -o pipefail
+++ dirname mig-v2.sh
++ cd .
++ pwd
++ basename mig-v2.sh
+ SCRIPT=/opt/redis-mig/redis_key_mig/mig-v2.sh
+ JOBS=1
++ pwd
+ ERRORS=/opt/redis-mig/redis_key_mig/errors
++ pwd
+ INFO=/opt/redis-mig/redis_key_mig/info
+ echo
+ echo
+ run
+ args=
+ f=
+ [[ '' == '' ]]
+ main ''
+ IFS='
'
+ mapfile -t files
++ find ./ -name 'st*.txt.*' -o -name 'line*.txt.*'
++ sed 's|./||'
++ sort
+ unset IFS
+ echo
+ echo 'Running in parallel with 1 jobs.'
Running in parallel with 1 jobs.
+ parallel --tag --verbose --ungroup -j1 /opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun '{1}' ::: lineurl.txt.000 lineurl.txt.001 lineurl.txt.002 lineurl.txt.003 lineurl.txt.004 lineurl.txt.005 lineurl.txt.006 lineurl.txt.007 lineurl.txt.008 lineurl.txt.009 lineurl.txt.010 startline.txt.000 startline.txt.001 startline.txt.002 startline.txt.003 startline.txt.004 startline.txt.005 startline.txt.006 startline.txt.007 startline.txt.008 startline.txt.009 startline.txt.010
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.000
lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.001
lineurl.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.002
```
## 參考
GNU_Parallel_2018.pdf
https://www.gnu.org/software/parallel/