天天看點

Must-Know for Testers: awk, One of the Linux "Three Musketeers"

awk = the initials of its three authors' surnames: Aho, Weinberger, and Kernighan

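awk is a Linux command, and it is also a programming language with its own interpreter.

awk has the features of a complete programming language, for example running external commands (via system() or pipes) and, in GNU awk, even opening network connections.

Mastering awk is an essential skill for anyone who works with Linux.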

Syntax: awk 'pattern{action}'

awk pattern syntax

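  • In principle, awk can replace grep
  • awk 'pattern{action}': input is split into fields on whitespace by default. The part outside the braces is the pattern (usually a regular expression); the part inside the braces is the action. Several pattern{action} pairs may be written, but they must all be inside one pair of single quotes.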
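A minimal sketch of these forms, run on made-up input (the sample() helper and its data are hypothetical): a pattern alone behaves like grep, a pattern plus an action prints selected fields, and several pattern{action} pairs share one pair of single quotes.

```shell
# sample() is a hypothetical helper emitting three whitespace-separated lines
sample() {
  printf 'web1 Running 200\nweb2 Stopped 500\nweb3 Running 200\n'
}

# Pattern only: like grep Running, prints the whole matching lines
sample | awk '/Running/'

# Pattern plus action: prints only the first field of the matching lines
sample | awk '/Running/{print $1}'

# Several pattern{action} pairs inside one pair of single quotes;
# the END block prints both counters after the last line has been read
sample | awk '/Running/{up++} /Stopped/{down++} END{print "up=" up, "down=" down}'
```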

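Common built-in variables

FS                 input field separator; equivalent to the -F command-line option
NF                 number of fields (columns) in the current record
NR                 number of records (lines) read so far
ARGC               number of command-line arguments
OFS                output field separator
ORS                output record separator
RS                 input record separator
ARGV               array of the command-line arguments
ENVIRON            array of the system environment variables
FILENAME           name of the file awk is currently reading
FNR                record number within the current file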
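A small sketch exercising several of these variables. The helper demo_vars and the environment variable DEMO_VAR are hypothetical names made up for the illustration; the two input files are throwaway files created with mktemp.

```shell
# demo_vars is a hypothetical helper; it writes two throwaway files with mktemp
demo_vars() {
  f1=$(mktemp) && f2=$(mktemp) || return 1
  printf 'a b c\nd e\n' > "$f1"
  printf 'x y z w\n' > "$f2"
  # NR keeps counting across files, FNR restarts at 1 in each file,
  # NF is the number of fields on the current line
  awk '{print NR, FNR, NF}' "$f1" "$f2"
  # FILENAME is the file being read; ARGV[1] is the first file operand
  awk 'FNR == 1 {print ((FILENAME == ARGV[1]) ? "first file" : "second file")}' "$f1" "$f2"
  rm -f "$f1" "$f2"
}
demo_vars

# ARGC counts ARGV[0] (the program name) plus each operand: prints 3
awk 'BEGIN{print ARGC}' one two

# OFS is written between the comma-separated arguments of print: prints a-b
echo 'a b' | awk 'BEGIN{OFS="-"}{print $1, $2}'

# ENVIRON maps environment variable names to values (DEMO_VAR is made up): prints hello
DEMO_VAR=hello awk 'BEGIN{print ENVIRON["DEMO_VAR"]}'
```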
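awk 'BEGIN{}END{}'   start and end blocks: BEGIN runs before any input is read, END after the last line

awk '/Running/'      regular-expression match: selects the lines containing Running

awk '/aa/,/bb/'      range selection: from a line matching /aa/ through the next line matching /bb/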
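The three pattern forms above, sketched on hypothetical input from a made-up sample() helper:

```shell
# sample() is a hypothetical helper producing six lines of test input
sample() {
  printf 'start\naa\nRunning one\nbb\nRunning two\nend\n'
}

# BEGIN runs before the first line is read; END runs after the last one,
# when NR still holds the total number of lines
sample | awk 'BEGIN{print "header"} END{print NR " lines"}'

# Regular-expression pattern: only the lines containing Running
sample | awk '/Running/'

# Range pattern: from the line matching /aa/ through the next line matching /bb/, inclusive
sample | awk '/aa/,/bb/'
```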

           

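Field processing in awk

  • The -F option specifies the field separator
  • BEGIN{FS="_"} sets the separator as well (awk strings take double quotes)

$0 is the whole current line

$1 is the first field

$N is the Nth field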
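A quick sketch of these field operations on a made-up record that uses "_" as its separator:

```shell
line='alice_30_beijing'   # made-up sample record

# -F sets the field separator on the command line: prints alice
echo "$line" | awk -F '_' '{print $1}'

# Assigning FS in BEGIN works too, because BEGIN runs before the line is split: prints 30
echo "$line" | awk 'BEGIN{FS="_"}{print $2}'

# $0 is the whole line; NF is the field count, so $NF is the last field
echo "$line" | awk -F '_' '{print $0; print NF, $NF}'

# $N where N comes from a variable passed in with -v: prints beijing
echo "$line" | awk -F '_' -v n=3 '{print $n}'
```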

           

An example: setting the record separator RS to "|" prints each |-separated part on its own line

chenshifengdeMacBook-Pro:~ chenshifeng$ echo "111 222|333 444|555 666"|awk 'BEGIN{RS="|"}{print $0}'
111 222
333 444
555 666           
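A short sketch continuing this example: RS controls how the input is split into records, while OFS and ORS control what print writes between fields and after each record. printf '%s' is used instead of echo because echo appends a newline, which would otherwise become part of the last record.

```shell
input='111 222|333 444|555 666'

# RS="|" makes each |-separated part a record; OFS="," joins the printed fields
printf '%s' "$input" | awk 'BEGIN{RS="|"; OFS=","}{print $1, $2}'

# ORS=";" replaces the newline normally written after each record;
# the END block adds one final newline
printf '%s' "$input" | awk 'BEGIN{RS="|"; ORS=";"}{print $1} END{printf "\n"}'
```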

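Below, we use a concrete task to demonstrate basic awk usage: counting the requests in nginx.log whose response status code is not 200.

Suppose we have an nginx.log file whose contents look like this: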

220.181.108.111 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/15225/show_wechat HTTP/1.1" 200 1684 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" 0.029 0.029 .
216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10052/replies/85845/reply_suggest HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.016 0.016 .
216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10040?order_by=created_at HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.002 0.002 .
216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10043/replies/85544/reply_suggest HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.001 0.001 .
216.244.66.241 - - [05/Dec/2018:00:11:44 +0000] "GET /topics/10075/replies/89029/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.001 0.001 .
216.244.66.241 - - [05/Dec/2018:00:11:44 +0000] "GET /topics/10075/replies/89631/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.001 0.001 .
216.244.66.241 - - [05/Dec/2018:00:11:45 +0000] "GET /topics/10075?order_by=created_at HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.000 0.000 .
216.244.66.241 - - [05/Dec/2018:00:11:45 +0000] "GET /topics/10075?order_by=like HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.001 0.001 .
223.71.41.98 - - [05/Dec/2018:00:11:46 +0000] "GET /cable HTTP/1.1" 101 60749 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:63.0) Gecko/20100101 Firefox/63.0" 2608.898 2608.898 .
113.87.161.17 - - [05/Dec/2018:00:11:39 +0000] "GET /cable HTTP/1.1" 101 3038 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36" 112.418 112.418 .
216.244.66.241 - - [05/Dec/2018:00:11:46 +0000] "GET /topics/10079/replies/119591/edit HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.001 0.001 .
216.244.66.241 - - [05/Dec/2018:00:11:46 +0000] "GET /topics/10089?locale=zh-TW HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])" 0.002 0.002 .           

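Looking at the log, we can see that when each line is split on spaces, the status code is the ninth field. So we use awk to select the lines whose ninth field does not match 200 and print that field. Command: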

chenshifengdeMacBook-Pro:~ chenshifeng$ awk '$9!~/200/{print $9}' nginx.log
301
301
301
301
301
301
301
301
301
......  # rest of the output omitted
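To try the filter without the full log, here is a sketch on three lines copied from the sample above, cut off after the status and size fields and wrapped in a hypothetical log_sample helper. !~ means "does not match the regular expression". A numeric comparison, $9 != 200, is shown as an alternative that tests the whole field rather than searching for the substring 200.

```shell
# log_sample is a hypothetical helper; the lines are shortened copies of the nginx.log sample
log_sample() {
  printf '%s\n' \
    '220.181.108.111 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/15225/show_wechat HTTP/1.1" 200 1684' \
    '216.244.66.241 - - [05/Dec/2018:00:11:42 +0000] "GET /topics/10040?order_by=created_at HTTP/1.1" 301 5' \
    '223.71.41.98 - - [05/Dec/2018:00:11:46 +0000] "GET /cable HTTP/1.1" 101 60749'
}

# Split on whitespace: $6 is "GET, $7 the path, $8 HTTP/1.1", so $9 is the status code
log_sample | awk '$9 !~ /200/ {print $9}'

# Same selection with a numeric comparison on the whole field
log_sample | awk '$9 != 200 {print $9}'
```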

           

Next, sort the extracted values, count the duplicates, and sort the result numerically in descending order. Command:

awk '$9!~/200/{print $9}' nginx.log | sort | uniq -c | sort -nr

           

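What each part does:

sort: sorts the lines in ascending order

uniq -c: merges adjacent identical lines into one and prefixes it with the number of occurrences (adjacent lines only, which is why sort runs first)

sort -nr: sorts numerically in descending order

-n: sorts numerically (-r reverses the order)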
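Because uniq -c only merges adjacent duplicates, sorting first matters. The sketch below uses a hypothetical codes() helper with made-up status codes to show uniq -c without sort, the pipeline from the text, and the same tally done inside awk with an associative array. for (c in count) visits keys in an unspecified order, so the awk version is still piped through sort -nr.

```shell
# codes() is a hypothetical helper emitting a few made-up status codes
codes() {
  printf '301\n404\n301\n101\n301\n404\n'
}

# Without sort, equal codes are not adjacent, so uniq -c reports six runs of 1
codes | uniq -c

# The pipeline from the text: group, count, then sort by count (highest first)
codes | sort | uniq -c | sort -nr

# The same tally inside awk: count[$1]++ builds a code -> occurrences table
codes | awk '{count[$1]++} END{for (c in count) print count[c], c}' | sort -nr
```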

           

Result:

chenshifengdeMacBook-Pro:~ chenshifeng$ awk '$9!~/200/{print $9}' nginx.log | sort | uniq -c | sort -nr
    433 101
    304 301
    266 404
    152 302
      7 401
      5 304
      2 499
      2 422
      1 500           
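The stated goal was the number of non-200 responses, so a single total may be more useful than the table. Two sketches: sum the first column of the tally (the numbers are the counts shown above), or count directly in awk with a counter printed in END. count_non200 is a hypothetical helper that reads a log in the nginx.log layout on standard input; the two log lines fed to it are made up and only mimic that field layout.

```shell
# Summing the first column of the tally printed above: prints 1172
printf '%s\n' '433 101' '304 301' '266 404' '152 302' '7 401' '5 304' '2 499' '2 422' '1 500' |
  awk '{total += $1} END{print total}'

# Counting directly: n++ for every non-200 line, one print at the end.
# n+0 forces a number, so 0 is printed (not an empty line) when nothing matches.
count_non200() {
  awk '$9 != 200 {n++} END{print n+0}'
}

# Usage against the real file: count_non200 < nginx.log
printf '%s\n' 'h - - [t z] "GET / HTTP/1.1" 200 1' 'h - - [t z] "GET / HTTP/1.1" 404 1' | count_non200
```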

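Next, building on awk 'BEGIN{}END{}', we use the example of counting the users on this machine to show how the command is used.

Running cat /etc/passwd lists the users on this machine. We need to extract the user names and display them with sequence numbers, producing output like this: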

1 nobody
2 root
3 daemon
4 _uucp
5 _taskgated
6 _networkd
7 _installassistant
8 _lp
9 _postfix
......
           

The user information:

chenshifengdeMacBook-Pro:~ chenshifeng$ cat /etc/passwd 
##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_uucp:*:4:4:Unix to Unix Copy Protocol:/var/spool/uucp:/usr/sbin/uucico
_taskgated:*:13:13:Task Gate Daemon:/var/empty:/usr/bin/false
_networkd:*:24:24:Network Services:/var/networkd:/usr/bin/false
_installassistant:*:25:25:Install Assistant:/var/empty:/usr/bin/false
_lp:*:26:26:Printing Services:/var/spool/cups:/usr/bin/false
_postfix:*:27:27:Postfix Mail Server:/var/spool/postfix:/usr/bin/false
_scsd:*:31:31:Service Configuration Service:/var/empty:/usr/bin/false
_ces:*:32:32:Certificate Enrollment Service:/var/empty:/usr/bin/false
_appstore:*:33:33:Mac App Store Service:/var/db/appstore:/usr/bin/false
_mcxalr:*:54:54:MCX AppLaunch:/var/empty:/usr/bin/false
_appleevents:*:55:55:AppleEvents Daemon:/var/empty:/usr/bin/false
_geod:*:56:56:Geo Services Daemon:/var/db/geod:/usr/bin/false
_devdocs:*:59:59:Developer Documentation:/var/empty:/usr/bin/false
......(omitted)

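Approach:

* Use sed to delete the first 10 lines, which are comments
* Use awk to print the line number and the first column (the user name)
* Note: the comment lines at the top of the cat /etc/passwd output must be skipped
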
chenshifengdeMacBook-Pro:~ chenshifeng$  sed '1,10d' /etc/passwd| awk -F ':' '{print NR,$1}' 
1 nobody
2 root
3 daemon
4 _uucp
5 _taskgated
6 _networkd
7 _installassistant
8 _lp
9 _postfix
10 _scsd
11 _ces
12 _appstore
13 _mcxalr
14 _appleevents
.......           
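sed '1,10d' only works while the comment header is exactly 10 lines long. A sketch of an alternative in which awk itself skips every line starting with #; passwd_sample is a hypothetical helper holding a shortened copy of the file above. A separate counter n is used because NR would also count the skipped comment lines, and an END block prints the total number of users.

```shell
# passwd_sample is a hypothetical helper: a shortened copy of the /etc/passwd output above
passwd_sample() {
  printf '%s\n' '##' '# User Database' '##' \
    'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false' \
    'root:*:0:0:System Administrator:/var/root:/bin/sh' \
    'daemon:*:1:1:System Services:/var/root:/usr/bin/false'
}

# /^#/{next} skips comment lines; ++n numbers only the remaining entries
passwd_sample | awk -F ':' '/^#/{next} {print ++n, $1}'

# Negated pattern, same numbering; the END block prints the total number of users
passwd_sample | awk -F ':' '!/^#/{print ++n, $1} END{print n " users"}'

# On a real machine: awk -F ':' '!/^#/{print ++n, $1}' /etc/passwd
```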

The following file is a dump of a WeChat Moments page (an Android UI hierarchy):

<node index="0" text="随風" resource-id="com.tencent.mm:id/e3x" class="android.widget.TextView" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,743][1040,805]" /></node>
<node index="1" text="哈哈哈哈哈哈" resource-id="com.tencent.mm:id/b_e" class="android.widget.LinearLayout" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,813][1048,867]">           

Extract the user name and the content of the post:

$ demo='<node index="0" text="随風" resource-id="com.tencent.mm:id/e3x" class="android.widget.TextView" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="false" enabled="true" focusable="false" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,743][1040,805]" /></node>
> <node index="1" text="哈哈哈哈哈哈" resource-id="com.tencent.mm:id/b_e" class="android.widget.LinearLayout" package="com.tencent.mm" content-desc="" checkable="false" checked="false" clickable="true" enabled="true" focusable="true" focused="false" scrollable="false" long-clickable="false" password="false" selected="false" bounds="[166,813][1048,867]">'
$ echo $demo|sed 's#><#>|<#g' |awk 'BEGIN{RS="|"}{print $0}' |awk -F\" 'BEGIN{OFS="\t"}/e3x/{name=$4}/b_e/{msg=$4;print name,"|",msg}'
随風    |    哈哈哈哈哈哈
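The pipeline above works in three steps: sed inserts a | between adjacent tags (><), the first awk uses RS="|" to put each piece on its own line, and the second awk splits on double quotes (-F\") so that $4 is the value of the text attribute, remembering the name from the e3x node and printing it alongside the b_e node's text. Below is a sketch of a single-awk variant under two assumptions: attribute values never contain >, and the text attribute may appear anywhere in the tag. RS=">" makes each tag a record, and match() locates text="..." instead of relying on its field position. ui_dump and extract_posts are hypothetical names, and the dump is trimmed to the attributes the example needs.

```shell
# ui_dump is a hypothetical, trimmed single-line copy of the two nodes above
ui_dump='<node index="0" text="随風" resource-id="com.tencent.mm:id/e3x" class="android.widget.TextView" /></node><node index="1" text="哈哈哈哈哈哈" resource-id="com.tencent.mm:id/b_e" class="android.widget.LinearLayout">'

# extract_posts (hypothetical) reads the dump on stdin and prints name, "|", post, tab-separated
extract_posts() {
  awk 'BEGIN { RS = ">"; OFS = "\t" }
       { text = "" }
       # RSTART/RLENGTH locate text="..."; skip the 6 chars of text=" and the closing quote
       match($0, /text="[^"]*"/) { text = substr($0, RSTART + 6, RLENGTH - 7) }
       /id\/e3x"/ { name = text }
       /id\/b_e"/ { print name, "|", text }'
}

printf '%s' "$ui_dump" | extract_posts
```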