parallel: a power tool for running shell commands in parallel

Author: 弈秋的美好生活

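### Overview

GNU parallel is a shell tool for executing jobs in parallel on one or more computers. A job can be a single command or a small script that has to be run for each line of the input. Typical input is a list of files, hosts, users, URLs, or tables. A job can also be a command that reads from a pipe; GNU parallel can then split the input and pipe it into the commands in parallel.

- If you write loops in the shell, you will find that GNU parallel can replace most of them and make them faster by running several jobs in parallel.

- For each line of input, GNU parallel executes the command with that line as its arguments. If no command is given, the input line itself is executed. Several lines are run in parallel.

- GNU parallel makes sure the output of the commands is the same as if they had been run sequentially (output from different jobs is not interleaved). This makes it possible to use the output of GNU parallel as input for other programs.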

![image](https://img-blog.csdnimg.cn/8c06525096064280bf44917344ba86c5.png)

## Basic syntax

If you are already familiar with xargs, this should be quick to pick up.

### 1. Generate five files and redirect the output

```bash
seq 5 | parallel seq {} '>' example.{}

# For comparison, the equivalent for loop:
# for i in $(seq 5); do seq "$i" > "example-for.$i"; done
```

### 2. Input sources for parallel

Everything after `:::` is taken as input given on the command line.

- parallel echo ::: 1 2 3 4 5

Output:

```bash
1
2
3
4
5
```

- parallel wc ::: example.*

Here the inputs are file names:

```bash
1 1 2 example.1
2 2 4 example.2
3 3 6 example.3
4 4 8 example.4
5 5 10 example.5
```

How to read the default wc output:

```bash
wc example.3
3 3 6 example.3
# lines  words  bytes  file name
```

- parallel echo ::: S M L ::: Green Red

With several `:::` input sources, parallel runs the command once for every combination of the inputs:

```bash
S Green
S Red
M Green
M Red
L Green
L Red
```

- find example.* -print | parallel echo File

parallel can also read its input from standard input:

```bash
File example.1
File example.2
File example.3
File example.4
File example.5
```

### 3. Building the command line

```bash
# parallel echo counting lines';' wc -l ::: example.*
counting lines
1 example.1
counting lines
2 example.2
counting lines
3 example.3
counting lines
4 example.4
counting lines
5 example.5
```

Use `{}` as the replacement string for the current input, much as you would with xargs:

```bash
parallel echo test {}';' wc -l {} ::: example.*
test example.1
1 example.1
test example.2
2 example.2
test example.4
4 example.4
test example.3
3 example.3
test example.5
5 example.5
```

With several input sources, use the positional replacement strings `{1}` and `{2}`.

For example, to count the lines and the bytes of each example.* file separately:

```bash
# parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*
count -l in example.1
1 example.1
count -l in example.2
2 example.2
count -l in example.3
3 example.3
count -l in example.4
4 example.4
count -l in example.5
5 example.5
count -c in example.1
2 example.1
count -c in example.2
4 example.2
count -c in example.3
6 example.3
count -c in example.4
8 example.4
count -c in example.5
10 example.5
```

Use `--dry-run` to print the commands that would be run without executing them:

```bash
# parallel --dry-run echo count {1} {2} ';' wc {1} {2} ::: -c -l ::: example.*
# Note that the output is no longer in input order
echo count -c example.1 ; wc -c example.1
echo count -c example.2 ; wc -c example.2
echo count -c example.3 ; wc -c example.3
echo count -c example.5 ; wc -c example.5
echo count -c example.4 ; wc -c example.4
echo count -l example.1 ; wc -l example.1
echo count -l example.2 ; wc -l example.2
echo count -l example.3 ; wc -l example.3
echo count -l example.4 ; wc -l example.4
```

### 4. Output

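### 5. Number of parallel jobs

Since the jobs run in parallel, how many should run at once?

By default, parallel runs one job per CPU core. To keep parallel from taking all the CPU resources, limit the concurrency with `--jobs`; passing the value in as a script parameter is a common pattern.

`--jobs 0` runs as many jobs in parallel as possible.

A quick test:

```bash
# With one job slot the jobs run one after another: 5+4+3+1+2 = 15 s in theory
time parallel --jobs 1 sleep {}';' echo {} done ::: 5 4 3 1 2

# With --jobs 0 the total time depends on the slowest sleep (about 5 s)
time parallel --jobs 0 sleep {}';' echo {} done ::: 5 4 3 1 2
```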

With five job slots, every job gets its own slot (each digit stands for one second of that job's sleep):

```bash
Job slot 1: 55555
Job slot 2: 4444
Job slot 3: 333
Job slot 4: 1
Job slot 5: 22
```

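### 6. Processing large text data

With `--pipe`, parallel splits its standard input into blocks and passes each block to a job on that job's standard input.

```bash
#seq 1000000 | parallel --pipe wc
165668 165668 1048571
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
149796 149796 1048572
85352 85352 597465
```

Each job receives a block of roughly 1 MB.

Each output line shows the line count, word count, and byte count of one such block.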

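## In practice: concurrent docker run

Starting Docker containers in parallel to migrate Redis keys improves throughput substantially.

The script below shows what makes parallel attractive:

- It replaces the loop you would otherwise write in the shell.

- The number of concurrent jobs is controllable (hard to do with a plain shell loop).

- Output from concurrent jobs can be kept in order, for example with `--keep-order` (hard to do with a plain shell loop).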

```bash
#!/bin/bash
# date 2023-02-09 17:57:20
# author ninesun
# desc parallel docker run

set -e
set -o pipefail

# Absolute path of this script
SCRIPT="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"

# Number of parallel jobs
JOBS=${JOBS:-5}

ERRORS="$(pwd)/errors"
INFO="$(pwd)/info"

# Run the migration container for a single index file
dockerrun() {
    f=$1
    mkdir -p "$INFO"
    docker rm -f "redis-img-${f}"
    # echo ${f}
    # Append to the log so that concurrent jobs do not overwrite each other
    docker run --name "redis-img-${f}" 10.50.10.185/harbortest/redis-mig:1.2 python3 redisMigrate.py 10.50.10.45 19000 10.50.10.170 7100 \
        "${f}" >> "$INFO/dockerrun.log"
}

echo
echo

main() {
    # Collect the index files
    IFS=$'\n'
    mapfile -t files < <(find ./ -name "st*.txt.*" -o -name "line*.txt.*" | sed 's|^\./||' | sort)
    unset IFS

    # docker run all jobs
    echo "Running in parallel with ${JOBS} jobs."
    # Run up to $JOBS concurrent invocations of:
    #   /opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun xxx.txt
    parallel --tag --verbose --ungroup -j"${JOBS}" "$SCRIPT" dockerrun {1} ::: "${files[@]}"

    if [[ ! -f "$ERRORS" ]]; then
        echo "No errors, hooray!"
    else
        echo "[ERROR] Some images did not build correctly, see below." >&2
        echo "These images failed: $(cat "$ERRORS")" >&2
        exit 1
    fi
}

# Without arguments run main; otherwise call the named function,
# which is how parallel re-invokes this script as "<script> dockerrun <file>"
run() {
    if [[ $# -eq 0 ]]; then
        main
    else
        "$@"
    fi
}

run "$@"
```
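
The script relies on a self-dispatch pattern: parallel calls the script again with a function name and an argument, and `run` forwards that call to the function. The script also checks an `errors` file, although the excerpt does not show what writes to it. The sketch below illustrates both ideas without Docker; `worker`, `dispatch`, the `*bad*` failure rule and the file names are all hypothetical stand-ins.

```bash
ERRORS="$(mktemp -d)/errors"   # hypothetical location of the error list

# Hypothetical stand-in for dockerrun: "fails" for names containing "bad".
worker() {
    local name=$1
    if [[ "$name" == *bad* ]]; then
        echo "$name" >> "$ERRORS"   # record the failed input
        return 1
    fi
    echo "migrated $name"
}

# Same idea as run() in the script: no arguments means "run main",
# otherwise the first argument names the function to call.
dispatch() {
    if [[ $# -eq 0 ]]; then
        echo "no arguments: would run main"
    else
        "$@"
    fi
}

out_ok=$(dispatch worker part.000)
dispatch worker part.bad || true
failed=$(cat "$ERRORS")

echo "$out_ok"
echo "failed: $failed"
```

Calling the script itself from parallel avoids having to export shell functions to the jobs, at the cost of re-reading the script for every job.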

Test run with 10 concurrent jobs (that is, with `JOBS=10` set in the environment):

```bash
./mig-v2.sh
Running in parallel with 10 jobs.
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.003
lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.004
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.005
lineurl.txt.001
lineurl.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.006
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.007
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.008
lineurl.txt.004
lineurl.txt.003
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.009
lineurl.txt.008
lineurl.txt.005
lineurl.txt.007
lineurl.txt.006
lineurl.txt.009
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.010
lineurl.txt.010
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.000
startline.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.001
startline.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.002
startline.txt.002
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.003
startline.txt.003
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.004
startline.txt.004
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.005
startline.txt.005
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.006
startline.txt.006
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.007
startline.txt.007
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.008
startline.txt.008
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.009
startline.txt.009
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun startline.txt.010
startline.txt.010
No errors, hooray!
```

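Performance with 10 concurrent jobs

Test machine: a local virtual machine with 2 GB of RAM and 2 CPU cores

The migration finished in about 10 minutes.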

![image](https://img-blog.csdnimg.cn/86817486d0be4d2385b30dad6702f413.png)

```bash
# bash -x mig-v2.sh
+ set -e
+ set -o pipefail
+++ dirname mig-v2.sh
++ cd .
++ pwd
++ basename mig-v2.sh
+ SCRIPT=/opt/redis-mig/redis_key_mig/mig-v2.sh
+ JOBS=1
++ pwd
+ ERRORS=/opt/redis-mig/redis_key_mig/errors
++ pwd
+ INFO=/opt/redis-mig/redis_key_mig/info
+ echo
+ echo
+ run
+ args=
+ f=
+ [[ '' == '' ]]
+ main ''
+ IFS='
'
+ mapfile -t files
++ find ./ -name 'st*.txt.*' -o -name 'line*.txt.*'
++ sed 's|./||'
++ sort
+ unset IFS
+ echo
+ echo 'Running in parallel with 1 jobs.'
Running in parallel with 1 jobs.
+ parallel --tag --verbose --ungroup -j1 /opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun '{1}' ::: lineurl.txt.000 lineurl.txt.001 lineurl.txt.002 lineurl.txt.003 lineurl.txt.004 lineurl.txt.005 lineurl.txt.006 lineurl.txt.007 lineurl.txt.008 lineurl.txt.009 lineurl.txt.010 startline.txt.000 startline.txt.001 startline.txt.002 startline.txt.003 startline.txt.004 startline.txt.005 startline.txt.006 startline.txt.007 startline.txt.008 startline.txt.009 startline.txt.010
Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:
O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
;login: The USENIX Magazine, February 2011:42-47.
This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.
To silence the citation notice: run 'parallel --bibtex'.
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.000
lineurl.txt.000
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.001
lineurl.txt.001
/opt/redis-mig/redis_key_mig/mig-v2.sh dockerrun lineurl.txt.002
```

## References

GNU_Parallel_2018.pdf

https://www.gnu.org/software/parallel/