Finding the top 100 IPs by request count in an access log

2021-11-16 06:00:39

Given a 10 GB Apache access log, find the top 100 IPs that requested the /stat.php page more than 10,000 times.

Log sample: 211.11.129.181 - - [26/Mar/2015:03:00:01 +0800] "GET /stat.php?pid=016 HTTP/1.1" 302 "-" "-" "Mozilla/4.0(compatible;MSIE 6.0;Windows NT 5.1)"

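Analysis: a 10 GB log is large. Running grep or awk over it directly will be slow, and the sorting that follows also consumes a lot of memory. If the server is low-spec, consider splitting the log, for example into 100 files of 100 MB each, computing the top 100 IPs for each file separately, merging those results into a single file, and then analyzing that file once more.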
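The split-then-count idea is easy to try on a tiny input before running it on the real log. The sketch below is only an illustration: the six-line log, the temporary directory, and the chunk prefix `chunk_` are made up for the demo, and it splits by line count (`split -l`) so that no log line is cut in half.

```shell
# Hypothetical demo data: a six-line stand-in for the real access log.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
cat > access.log <<'EOF'
1.1.1.1 - - [26/Mar/2015:03:00:01 +0800] "GET /stat.php?pid=016 HTTP/1.1" 302
2.2.2.2 - - [26/Mar/2015:03:00:02 +0800] "GET /stat.php?pid=017 HTTP/1.1" 302
1.1.1.1 - - [26/Mar/2015:03:00:03 +0800] "GET /index.php HTTP/1.1" 200
1.1.1.1 - - [26/Mar/2015:03:00:04 +0800] "GET /stat.php?pid=018 HTTP/1.1" 302
2.2.2.2 - - [26/Mar/2015:03:00:05 +0800] "GET /stat.php?pid=019 HTTP/1.1" 302
2.2.2.2 - - [26/Mar/2015:03:00:06 +0800] "GET /stat.php?pid=020 HTTP/1.1" 302
EOF

# Split into chunks of 3 lines each: chunk_aa and chunk_ab.
split -l 3 access.log chunk_

# Count /stat.php requests per IP inside each chunk separately.
for f in chunk_*; do
    echo "== $f"
    grep '/stat.php' "$f" | awk '{print $1}' | sort | uniq -c | sort -n
done
```

Each chunk yields its own small `count ip` list, which is what the later merge step works on.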

So we use a shell script:

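#!/bin/bash

# Print the 100 most frequent IPs requesting /stat.php in file $1, with counts.
sta() {
    grep '/stat.php' "$1" | awk '{print $1}' | sort | uniq -c | sort -n | tail -100
}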

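logfile=/data/logs/access.log
mkdir -p /data/logs/tmp
cd /data/logs || exit 1
# -C (GNU split) caps each chunk at 100M without cutting a log line in half,
# which -b would do at chunk boundaries.
split -C 100M "$logfile" smallfile
mv smallfile* tmp
cd tmp || exit 1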

# Collect the per-chunk top 100 lists in one file.
for f in smallfile*; do
    sta "$f" >> top100.txt
done

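# Count the /stat.php requests from IP $1 across all chunks.
count_sum() {
    sum=0
    for f in smallfile*; do
        n=$(grep '/stat.php' "$f" | awk -v ip="$1" '$1 == ip' | wc -l)
        sum=$((sum + n))
    done
    echo "$sum $1"
}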

# Recount every candidate IP that appeared in any per-chunk top 100 list.
for ip in $(awk '{print $2}' top100.txt | sort -u); do
    count_sum "$ip" >> ip.txt
done

awk '$1 > 10000' ip.txt | sort -nr | head -100
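Because the per-file results are plain `count ip` lines, the merge step can also be sketched as a single awk pass that sums the counts for each IP. This is an illustration under stated assumptions, not part of the script above: the two `counts_*.txt` files and the threshold of 3 are invented so the example is small enough to check by hand.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical per-chunk results in "count ip" form, as uniq -c would print them.
printf '%s\n' '      2 1.1.1.1' '      1 2.2.2.2' > counts_aa.txt
printf '%s\n' '      3 1.1.1.1' '      1 3.3.3.3' > counts_ab.txt

threshold=3   # stand-in for the 10000 used in the article

# Sum the counts per IP across all chunk results, keep the IPs above the
# threshold, and list the largest totals first.
merged=$(awk '{total[$2] += $1} END {for (ip in total) print total[ip], ip}' counts_*.txt |
    awk -v min="$threshold" '$1 > min' | sort -nr | head -100)
printf '%s\n' "$merged"
```

Summing like this is only exact when each per-chunk file lists every IP. If the lists were truncated to the top 100 per chunk, the sums are lower bounds, which is why the script recounts each candidate IP over all the chunks.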
