天天看點

Linux四劍客之awk數組指派操作(因為awk是一行一行讀入的,相當是循環了整個檔案中的内容)指派完成後,需要通過循環的方式将其索引的次數周遊出來統計狀态碼出現的次數!/bin/bash

1.Awk數組概述

1.什麼是awk數組

數組其實也算是變量, 傳統的變量隻能存儲一個值,但數組可以存儲多個值。

2.awk數組應用場景

通常用來統計、比如:統計網站通路TOP10、網站Url通路TOP10等等等

3.awk數組統計技巧

1.在awk中,使用數組時,不僅可以使用1 2 3 ..n作為數組索引,也可以使用字元串作為數組索引。

2.要統計某個字段的值,就将該字段作為數組的索引,然後對索引進行周遊。

4.awk數組的文法

array_name[index]=value

5.awk數組示例

1.統計/etc/passwd中各種類型shell的數量

[root@test ~]# awk -F ":" '{a[$NF]++}END{for (i in a){printf "%-20s%s\n",i,a[i]}}' /etc/passwd

/bin/sync 1

/bin/bash 3

/sbin/nologin 21

/sbin/halt 1

/sbin/shutdown 1

或者寫到檔案裡

[root@test ~/2021-07-02]# cat jishu.awk

BEGIN{

FS=":"           

}

指派操作(因為awk是一行一行讀入的,相當是循環了整個檔案中的内容)

{

hosts[$NF]++           

指派完成後,需要通過循環的方式将其索引的次數周遊出來

END {

for (item in hosts) {
    print item,
    hosts[item]
}           

[root@test ~/2021-07-02]# awk -f jishu.awk /etc/passwd

/bin/sync 1

/bin/bash 3

/sbin/nologin 21

/sbin/halt 1

/sbin/shutdown 1

2.統計主機上所有的tcp連結狀态數,按照每個tcp狀态分類

[root@test ~]# netstat -an | grep tcp | awk '{arr[$6]++}END{for (i in arr) print i,arr[i]}'

LISTEN 4

ESTABLISHED 3

3.統計目前系統22端口連接配接狀态資訊。<目前時實狀态ss>

[root@test ~]# ss -an|awk '/:22/{tcp[$2]++} END {for(i in tcp){print i,tcp[i]}}'

LISTEN 2

ESTAB 3

4.統計目前通路的每個IP的數量<目前時實狀态 netstat,ss>

[root@sfy ~]# ss -an|awk '$5~/:443/{ips[$5]++} END {for(i in ips){print i,ips[i]}}'

172.17.188.85:443 324

*:443 1

2.Awk數組示例

Nginx日志分析,日志格式如下:

log_format main '$remote_addr - $remote_user [$time_local] "$request" '

'$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for"';           

52.55.21.59 - - [25/Jan/2018:14:55:36 +0800] "GET /feed/ HTTP/1.1" 404 162 "https://www.buyao007.icu/" "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; de) Presto/2.9.168 Version/11.52" "-"

1.僅使用awk統計一天内通路最多的10個IP

[root@test ~/2021-07-02]# cat access_ip.awk

ip[$1]++           

END{

for (i=30000;i>0;i--){
    for ( b in ip ){
        if (i==ip[b]){
            printf "%-15s%-10s%s\n",b,"出現的次數為",ip[b]
            c++
            break
        }
    #    if (c==10){exit}
    }
    if (c==10){break}
}           

[root@test ~/2021-07-02]# awk -f access_ip.awk access.log

58.220.223.62 出現的次數為 12049

112.64.171.98 出現的次數為 10856

114.83.184.139 出現的次數為 1982

117.136.66.10 出現的次數為 1662

115.29.245.13 出現的次數為 1318

223.104.5.197 出現的次數為 961

116.216.0.60 出現的次數為 957

180.111.48.14 出現的次數為 939

223.104.5.202 出現的次數為 871

223.104.4.139 出現的次數為 869

2.統計通路大于10000次的IP

[root@test ~/2021-07-02]# cat access_ip\>10000.awk

ip[$1]++           
for (i=30000;i>=10000;i--){
    for ( b in ip ){
        if (i==ip[b]){
            printf "%-15s%-10s%s\n",b,"出現的次數為",ip[b]
            c++
            break
        }
    #    if (c==10){exit}
    }
    if (c==10){break}
}           

[root@test ~/2021-07-02]# awk -f access_ip\>10000.awk access.log

3.統計通路最多的10個頁面($request top 10)

[root@test ~/2021-07-02]# cat ngx_request_top_10.awk

ip[$7]++           
for (i=5000;i>=1;i--){
    for ( b in ip ){
        if (i==ip[b]){
            printf "%-15s%-10s%s\n",b,"出現的次數為",ip[b]
            c++
            break
        }
    }
    if (c==10){break}
}           

[root@test ~]# awk -f ngx_request_top_10.awk access.log

/online/api/mc/cart/new/getCart.json出現的次數為 4838

/online/api/mc/sys/nowTime.json出現的次數為 3859

/online/mc/crm/integration/ance.json出現的次數為 2445

/online/api/mc/cart/save.json出現的次數為 1872

/ccbs/global/ontextPath.jsp出現的次數為 1797

/mobile/ account/tpl/footerTpl.html出現的次數為 1548

/online/aguage=zh_CNECIAL_MENU出現的次數為 1344

/mobile/tpl/productCategoryTpl.html出現的次數為 912

/ccbs/ plugins/images/loading.gif出現的次數為 838

/favicon.ico 出現的次數為 810

4.統計每個URL通路内容總大小($body_bytes_sent)

[root@test ~/2021-07-02]# cat ngx_request_body.awk

ip[$7]=$10+ip[$7]           
for ( b in ip ){
    printf "%-15s%-10s%s\n",b,"大小為",ip[b]/1024/1024"MB"
}           

[root@test ~/2021-07-02]# awk -f ngx_request_body.awk access.log|sort -k2 -nr|head

/mobe/the/odu/home/images/11/03.png大小為 109.826MB

/mobe/the/odu/home/images/11/04.png大小為 98.5827MB

/mobe/the/odu/home/images/11/02.png大小為 98.2485MB

/mobe/the/odu/home/images/11/05.png大小為 78.9912MB

/online/images/pro _90601.png大小為 63.8436MB

/online/odbproduct_90602.png大小為 63.4223MB

/online/oges/prodduct_90600.png大小為 62.222MB

/mobe/the/odu/11/4.png?v=21020大小為 59.6233MB

/mobe/the/odu/home/images/11/2.png大小為 59.1334MB

/mobe/the/odu/home/images/11/01.png大小為 57.6912MB

5.統計11月22日,每個IP通路狀态碼數量($status)

[root@test ~/2021-07-02]# cat ngx_ip_code.awk

$3~/22/Nov/

ip_code[$1" ""狀态碼"" "$9]++           
for ( item in ip_code ) {
    print item,ip_code[item]
}           

[root@test ~/2021-07-02]# awk -f ngx_ip_code.awk access.log |sort -k4 -rn|head|sort -k1.1,1.13 -nrk4|column -t

223.104.5.197 狀态碼 200 958

117.136.66.10 狀态碼 200 1655

116.216.0.60 狀态碼 200 951

114.83.184.139 狀态碼 200 1975

112.64.171.98 狀态碼 200 5803

112.64.171.98 狀态碼 304 3792

112.64.171.98 狀态碼 404 1148

58.220.223.62 狀态碼 304 6412

58.220.223.62 狀态碼 200 4421

58.220.223.62 狀态碼 404 956

6.統計每個ip通路狀态碼為404及出現的次數($status)

[root@test ~/2021-07-02]# cat ngx_status_top_404.awk

ip_code[$1" ""狀态碼"" "$9]++           
for ( item in ip_code ) {
if (item~/404$/){
    print item,ip_code[item]
}
}           

[root@test ~/2021-07-02]# awk -f ngx_status_top_404.awk access.log|sort -nrk4|head|sort -k1.1,1.13 -r -nrk4|column -t

219.145.184.196 狀态碼 404 51

182.34.127.126 狀态碼 404 51

175.25.171.26 狀态碼 404 46

140.206.89.150 狀态碼 404 45

115.29.245.13 狀态碼 404 89

114.141.164.19 狀态碼 404 51

114.111.166.19 狀态碼 404 60

112.64.171.98 狀态碼 404 1148

58.220.223.62 狀态碼 404 956

10.8.4.6 狀态碼 404 57

7.統計各種狀态碼數量

統計狀态碼出現的次數

[root@test ~/2021-07-02]# awk '{code[$9]++} END {for(i in code){print i,code[i]}}' access.log

408 13

301 146

302 789

304 18712

400 242

403 37

200 142666

413 50

404 3863

500 7

499 418

[root@test ~]# awk '{if ($9>=200 && $9<300) {j++}\

else if ($9>=300 && $9<400) {k++}\

else if ($9>=400 && $9<500) {n++}\

else if($9>=500) {p++}}\

END{\

printf "%s%s%s%s%s\n",j"\n",k"\n",n"\n",p"\n",i+j+k+n+p}' access.log

3.Awk數組案例

1.模拟生産環境資料腳本

[root@test ~/2021-07-02]# cat insert.sh

!/bin/bash

function create_random()

min=$1
max=$(($2-$min+1))
num=$(date +%s%N)
echo $(($num%$max+$min))           

INDEX=1

i=1

while [ $i -le 3000 ]

do

for user in liu guan zhang sun fuyang
do
    COUNT=$RANDOM
    NUM1=`create_random 1 $COUNT`
    NUM2=`expr $COUNT - $NUM1`
    echo "`date '+%Y-%m-%d %H:%M:%S'` $INDEX user: $user insert $COUNT records into datebase:product table:detail, insert $NUM1 records successfully,failed $NUM2 records" >> ./db.log.`date +%Y%m%d`
    INDEX=`expr $INDEX + 1`
done           

i=$[$i+1]

done

資料格式如下:

2021-07-04 16:11:15 5000 user: fuyang insert 4704 records into datebase:product table:detail, insert 1243 records successfully,failed 3461 records

需求1:統計每個人分别插入了多少條records進資料庫

[root@test ~/2021-07-02]# cat db1.awk

BEGIN {

printf "%-20s%-20s\n","User","Total records"           

}

success[$5]+=$7
#success[$5]=success[$5]+$7           
for (u in success)
printf "%-20s%-20d\n",u,success[u]           

[root@test ~/2021-07-02]# awk -f db1.awk db.log.20210704

User Total records

guan 16355784

liu 16838450

fuyang 16517171

sun 16383219

zhang 16444839

需求2:統計每個人分别插入成功了多少record,失敗了多少record

[root@test ~/2021-07-02]# cat db2.awk

printf "%-20s%-20s%-20s\n","User","Success","Failed"           
success[$5]+=$13
failed[$5]+=$16           
for (u in success) 
printf "%-20s%-20d%-20d\n",u,success[u],failed[u]           

[root@test ~/2021-07-02]# awk -f db2.awk db.log.20210704

User Success Failed

guan 8090983 8264801

liu 7972402 8866048

fuyang 8281769 8235402

sun 8102644 8280575

zhang 8408980 8035859

需求3:将需求1和需求2結合起來,一起輸出,輸出每個人分别插入多少條資料,多少成功,多少失敗,并且要格式化輸出,加上标題

[root@test ~/2021-07-02]# cat db3.awk

printf "%-10s%-10s%-10s%-10s\n","User","Total","Success","Failed"           
success[$5]+=$13
failed[$5]+=$16           
for (u in success) 
printf "%-10s%-10s%-10d%-10d\n",u,success[u]+failed[u],success[u],failed[u]           

[root@test ~/2021-07-02]# awk -f db3.awk db.log.20210704

User Total Success Failed

guan 16355784 8090983 8264801

liu 16838450 7972402 8866048

fuyang 16517171 8281769 8235402

sun 16383219 8102644 8280575

zhang 16444839 8408980 8035859

需求4:在例子3的基礎上,加上結尾,統計全部插入記錄數,成功記錄數,失敗記錄數。

[root@test ~/2021-07-02]# cat db4.awk

printf "%-10s%-10s%-10s%-10s\n","User","Total","Success","Failed"           
total[$5]+=$7
success[$5]+=$13
failed[$5]+=$16

#在原始資料進行統計累計
total_sum+=$7
success_sum+=$13
failed_sum+=$16             
for (u in success) {
printf "%-10s%-10s%-10d%-10d\n",u,total[u],success[u],failed[u]
}
printf "%-10s%-10s%-10d%-10d\n","total",total_sum,success_sum,failed_sum           

[root@test ~/2021-07-02]# awk -f db4.awk db.log.20210704

zhang 16444839 8408980 8035859

total 82539463 40856778 41682685