天天看點

Big Data Platform Operations: Pig

Big Data Series: Operations (Self-Built Big Data Platform)

(5) Pig Operations

  1. On the master node, install the Pig client, open a Linux shell, and start Pig's Grunt shell in MapReduce mode.
[root@master ~]# pig
           
  2. On the master node, install the Pig client, open a Linux shell, and start Pig's Grunt shell in local mode.
[root@master ~]# pig -x local
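The two start-up commands differ only in the -x option; plain pig starts in MapReduce mode by default. As a minimal sketch, assuming you want to launch Pig from a Python script instead of typing the command, the argument list can be built as below. The helper name pig_command and the script name ip_count.pig are hypothetical.

```python
import shlex

# Execution modes used in the two steps above (values of Pig's -x option).
PIG_MODES = {"local", "mapreduce"}


def pig_command(mode="mapreduce", script=None):
    """Return the argv list that starts Pig in the given mode.

    Without a script Pig opens the interactive Grunt shell;
    with a script path it runs the script in batch mode.
    """
    if mode not in PIG_MODES:
        raise ValueError(f"unsupported Pig mode: {mode!r}")
    argv = ["pig", "-x", mode]
    if script is not None:
        argv.append(script)  # hypothetical script file
    return argv


print(shlex.join(pig_command("local")))                      # pig -x local
print(shlex.join(pig_command("mapreduce", "ip_count.pig")))  # pig -x mapreduce ip_count.pig
```

The returned list can be passed to subprocess.run(...) if Pig is on the PATH.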
           
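  3. Using Pig in local mode, count the hits per IP address in the system log access-log.txt. Use a GROUP BY statement to group the records by IP, use the FOREACH operator to iterate over the relation's columns and count the rows in each group, and finally use a DUMP statement to display the results.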
[root@master ~]# pig -x local
grunt> copyFromLocal /root/tiku/Pig/access-log.txt /user/root/access-log.txt
grunt> A = LOAD '/user/root/access-log.txt' USING PigStorage('\t') AS (ip,others);
grunt> group_ip = group A by ip;
grunt> result = foreach group_ip generate group,COUNT(A);
grunt> dump result;
(59.61.216.4 - - [11/May/2016:14:33:10 -0400] "GET /assets/fonts/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572 "http://gs.1daoyun.com/assets/stylesheets/light-theme.css" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0" "-",1)
(61.160.71.250 - - [11/May/2016:16:03:25 -0400] "GET /lms/myexam.action HTTP/1.1" 200 10762 "http://gs.1daoyun.com/lms/competionlist.action" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.87 Safari/537.36" "-",1)
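To make the semantics of GROUP ... BY and COUNT concrete, here is a rough Python equivalent of the three Pig statements above. It only illustrates the logic and is not how Pig executes the job; the function name count_by_ip and the sample lines are hypothetical.

```python
from collections import Counter


def count_by_ip(lines, sep="\t"):
    """Approximate:
        A = LOAD ... USING PigStorage(sep) AS (ip, others);
        group_ip = GROUP A BY ip;
        result = FOREACH group_ip GENERATE group, COUNT(A);
    Each line is split on sep and the first field is the grouping key.
    Blank lines are skipped (a simplification of this sketch).
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line:
            counts[line.split(sep, 1)[0]] += 1
    return sorted(counts.items())


# Hypothetical tab-separated sample, not the real access-log.txt.
sample = [
    "192.0.2.1\tGET /index.html",
    "198.51.100.7\tGET /login",
    "192.0.2.1\tGET /logout",
]
print(count_by_ip(sample))
# [('192.0.2.1', 2), ('198.51.100.7', 1)]
```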

           

In the command copyFromLocal /root/tiku/Pig/access-log.txt /user/root/access-log.txt:

/root/tiku/Pig/access-log.txt is where my practice question bank (tiku) is stored on the local disk.

/user/root/ is the destination directory the file is copied to; the LOAD statement that follows reads the file from there.


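Because Pig was started in local mode here, both paths are on the local file system (file:///) and belong to the root user.

If Pig is started in MapReduce mode instead, the destination path refers to HDFS, where /user/root is the root user's home directory.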
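The same path can therefore point to different file systems depending on the mode. The sketch below models that resolution under simplifying assumptions: the address hdfs://master:8020 is hypothetical (use the fs.defaultFS value from your core-site.xml), and relative paths are resolved against the shell's working directory in local mode and against /user/<user> on HDFS.

```python
import posixpath


def resolve_pig_path(path, mode, user="root", cwd="/root",
                     hdfs_uri="hdfs://master:8020"):
    """Return the fully qualified location a Pig path refers to (simplified).

    hdfs_uri is a hypothetical NameNode address; cwd is the local working
    directory used for relative paths in local mode.
    """
    if mode == "local":
        base, prefix = cwd, "file://"
    elif mode == "mapreduce":
        base, prefix = f"/user/{user}", hdfs_uri  # HDFS home directory
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # join() keeps absolute paths as-is and anchors relative ones at base
    return prefix + posixpath.normpath(posixpath.join(base, path))


print(resolve_pig_path("/user/root/access-log.txt", "local"))
print(resolve_pig_path("/user/root/access-log.txt", "mapreduce"))
```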

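In LOAD '/user/root/access-log.txt' USING PigStorage('\t') AS (ip,others);:

The argument of PigStorage must be the same delimiter the text file actually uses. In the output above, each group key is an entire log line with a count of 1, which indicates that access-log.txt is not tab-separated: its fields are separated by spaces, so PigStorage('\t') puts the whole line into the ip field. To group by IP address, load the file with PigStorage(' ') so that the first field is the IP.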
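The effect of the delimiter can be checked outside Pig. This sketch splits one line from the dump output above (shortened) on a given delimiter and keeps only the first field, which is roughly what becomes the ip column; first_field is a hypothetical helper and ignores other PigStorage details.

```python
def first_field(line, sep):
    """Return the first field of line when split on sep (simplified PigStorage)."""
    return line.split(sep, 1)[0]


# One line from the dump output above, shortened.
line = '59.61.216.4 - - [11/May/2016:14:33:10 -0400] "GET /assets/fonts/fontawesome-webfont.woff?v=3.2.1 HTTP/1.1" 200 43572'

print(first_field(line, "\t"))  # no tab in the line, so the whole line comes back
print(first_field(line, " "))   # 59.61.216.4
```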

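  4. Use Pig to compute the highest temperature of each year in the weather dataset temperature.txt. Use a GROUP BY statement to group the records by year, use the FOREACH operator to iterate over the relation's columns and compute the maximum of each group, and finally use a DUMP statement to display the results.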
grunt> copyFromLocal /root/tiku/Pig/temperature.txt /user/root/temperature.txt
grunt> A = LOAD '/user/root/temperature.txt' USING PigStorage(' ') AS (year:int,temp:int);
grunt> group_year = group A by year;
grunt> result = foreach group_year generate group,MAX(A.temp);
grunt> dump result;
(1990,38)
(1991,38)
(1992,38)
(1993,39)
(1994,39)
(1995,39)
(1996,40)
(1997,33)
(1998,37)
(1999,38)
(2000,38)
(2001,40)
(2002,38)
(2003,40)
(2004,40)
(2005,38)
(2006,34)
(2007,39)
(2008,38)
(2009,39)
(2010,39)
(2011,36)
(2012,40)
(2013,36)
(2014,37)
(2015,39)
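A rough Python equivalent of the GROUP BY year and MAX(A.temp) statements, for checking the logic on a small file. It assumes each line holds a year and a temperature separated by a single space, as the PigStorage(' ') call implies; the sample values are made up and unrelated to the real temperature.txt.

```python
def max_temp_by_year(lines, sep=" "):
    """Approximate:
        A = LOAD ... USING PigStorage(' ') AS (year:int, temp:int);
        group_year = GROUP A BY year;
        result = FOREACH group_year GENERATE group, MAX(A.temp);
    Assumes exactly one separator between year and temperature.
    """
    best = {}  # year -> highest temperature seen so far
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        year_s, temp_s = line.split(sep)[:2]
        year, temp = int(year_s), int(temp_s)
        if year not in best or temp > best[year]:
            best[year] = temp
    return sorted(best.items())  # sorted by year for readability


sample = ["1990 30", "1990 38", "1991 25"]  # made-up values
print(max_temp_by_year(sample))
# [(1990, 38), (1991, 25)]
```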

           
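  5. Use Pig to count the IP addresses of each country in the dataset ip_to_country. Use a GROUP BY statement to group the records by country, use the FOREACH operator to iterate over the relation's columns and count the IP addresses in each group, and finally store the results in the /data/pig/output directory and view them.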
grunt> copyFromLocal /root/tiku/Pig/ip_to_country.txt /user/root/ip_to_country.txt
grunt> A = LOAD '/user/root/ip_to_country.txt' USING PigStorage('\t') AS (ip:chararray,country:chararray);
grunt> group_country = group A by country;
grunt> result = foreach group_country generate flatten(group),COUNT(A) as counts;
grunt> store result into '/data/pig/output';
grunt> dump result;

           

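About the output path in store result into '/data/pig/output';:

The directory does not need to be created on HDFS beforehand. In fact it must not exist yet; otherwise Pig fails with an error saying the output path already exists.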

(Iraq,1)
(Oman,1)
(Peru,3)
(Chile,7)
(China,252)
(Egypt,6)
(Gabon,1)
(India,30)
(Italy,43)
(Japan,177)
(Macau,1)
(Nepal,1)
(Qatar,1)
(Spain,21)
(Yemen,2)
(Angola,2)
(Brazil,38)
(Canada,75)
(Europe,34)
(France,58)
(Greece,6)
(Israel,6)
(Kuwait,5)
(Latvia,1)
(Mexico,23)
(Norway,18)
(Poland,15)
(Serbia,1)
(Sweden,17)
(Taiwan,26)
(Turkey,16)
(Albania,1)
(Algeria,2)
(Austria,14)
(Bahrain,1)
(Belarus,1)
(Belgium,14)
(Croatia,2)
(Denmark,11)
(Ecuador,3)
(Estonia,2)
(Finland,13)
(Germany,89)
(Hungary,2)
(Iceland,1)
(Ireland,5)
(Morocco,19)
(Nigeria,1)
(Romania,13)
(Senegal,1)
(Tunisia,3)
(Ukraine,10)
(Uruguay,2)
(Vietnam,13)
(Barbados,1)
(Botswana,1)
(Bulgaria,6)
(Colombia,21)
(Malaysia,8)
(Pakistan,4)
(Portugal,3)
(Slovenia,2)
(Thailand,10)
(Argentina,13)
(Australia,68)
(Guatemala,1)
(Hong Kong,8)
(Indonesia,29)
(Lithuania,6)
(Macedonia,1)
(Mauritius,10)
(Singapore,5)
(Venezuela,4)
(Azerbaijan,1)
(Costa Rica,2)
(Kazakhstan,3)
(Martinique,1)
(Uzbekistan,1)
(Netherlands,28)
(New Zealand,9)
(Philippines,7)
(Switzerland,15)
(Saudi Arabia,4)
(South Africa,20)
(United States,1379)
(Czech Republic,7)
(United Kingdom,93)
(Anonymous Proxy,1)
(Dominican Republic,1)
(Korea, Republic of,70)
(Russian Federation,36)
(Satellite Provider,2)
(Moldova, Republic of,1)
(Syrian Arab Republic,1)
(United Arab Emirates,2)
(Bosnia and Herzegovina,1)
(Iran, Islamic Republic of,2)
(Tanzania, United Republic of,1)
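STORE writes the relation as one or more text files inside /data/pig/output rather than as a single file. A minimal sketch for reading the result back in Python, assuming the directory is on the local file system (for example after hdfs dfs -get /data/pig/output), that the default PigStorage() tab delimiter was used, and that the data files are named part-*; read_pig_output is a hypothetical helper.

```python
import glob
import os


def read_pig_output(output_dir):
    """Read (country, count) rows from the part-* files of a STORE directory.

    Splits each line on its LAST tab so names containing commas, such as
    "Korea, Republic of", stay intact. Marker files like _SUCCESS are skipped
    because they do not match part-*.
    """
    rows = []
    for part in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(part, encoding="utf-8") as fh:
            for line in fh:
                line = line.rstrip("\n")
                if line:
                    country, count = line.rsplit("\t", 1)
                    rows.append((country, int(count)))
    return rows
```

The commas in the dump output above come from Pig's tuple display; in the stored file the two columns are separated by a tab.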

           
Messages when running Pig jobs:
2020-04-02 13:02:59,410 [main] INFO  org.apache.hadoop.ipc.Client - Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-02 13:02:59,521 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
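The likely cause is that the MapReduce JobHistory server (historyserver) is not running: 10020 is its default port, and once the application completes the client is redirected to it.
Solution (run from the sbin directory of the Hadoop installation):
[root@master sbin]# ./mr-jobhistory-daemon.sh start historyserver
This starts the historyserver service.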
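After starting the history server, you can check that something is listening on the JobHistory port before rerunning the Pig job. A minimal sketch using only Python's socket module; the host 127.0.0.1 assumes you run it on the same node that printed the 0.0.0.0:10020 messages, and port_open is a hypothetical helper name.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    # 10020 is the JobHistory port from the log messages above.
    print("JobHistory server reachable:", port_open("127.0.0.1", 10020))
```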
           

Thanks to 先電雲 for providing the question bank.

Thanks to Apache for its open-source technology and support.

Thanks to the bloggers 抛物線, mn525520, and 菜鳥一枚2019 for their related blog posts.
