Recently our monitoring showed that the number of connections from HiveServer2 to ZooKeeper kept climbing, which was odd. HiveServer2 does support concurrency and uses ZooKeeper to manage read/write locks on Hive tables, but our environment needs none of that: we had already disabled concurrency in the production configuration, and had even marked the relevant properties as final.
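The original config listing isn't reproduced here, but as a minimal sketch, disabling ZooKeeper-based locking in hive-site.xml looks like this (hive.support.concurrency is the standard Hive property; the exact set of values we pinned in production may differ):

<!-- hive.support.concurrency toggles ZooKeeper-based table locking;
     <final>true</final> keeps clients from overriding the value -->
<property>
  <name>hive.support.concurrency</name>
  <value>false</value>
  <final>true</final>
</property>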

Yet the ZooKeeper connection count kept rising. Thinking it over: the tables we query are Hive mappings onto HBase, so when does HiveServer2 connect to ZooKeeper at all, and what for? I started with the logs, switched the production log level to DEBUG, and found entries like these in hiveserver2.log:
2016-02-23 14:03:30,271 DEBUG [HiveServer2-Background-Pool: Thread-598-SendThread(hadoop002:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x252fd37100600d2 after 0ms
2016-02-23 14:03:30,325 DEBUG [HiveServer2-Background-Pool: Thread-797-SendThread(hadoop003:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x352fd3707b600e3 after 0ms
2016-02-23 14:03:30,626 DEBUG [HiveServer2-Background-Pool: Thread-1138-SendThread(hadoop003:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x352fd3707b600e8 after 0ms
2016-02-23 14:03:30,768 DEBUG [HiveServer2-Background-Pool: Thread-730-SendThread(hadoop001:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x152fd3707c800db after 0ms
2016-02-23 14:03:32,751 DEBUG [HiveServer2-Background-Pool: Thread-461-SendThread(hadoop001:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x152fd3707c800d5 after 0ms
2016-02-23 14:03:33,057 DEBUG [HiveServer2-Background-Pool: Thread-1211-SendThread(hadoop002:2181)]: zookeeper.ClientCnxn (ClientCnxn.java:readResponse(717)) - Got ping response for sessionid: 0x252fd37100600dd after 0ms
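(As a side note, you can confirm the leak without logs: ZooKeeper's built-in cons four-letter command lists every open connection on a server, so counting the ones coming from the HiveServer2 host shows the climb directly. The hostnames below are our quorum node and a placeholder for the HiveServer2 host; substitute your own.)

echo cons | nc hadoop001 2181 | grep hiveserver2-host | wc -l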
These threads come from a pool created by the SessionManager, but the logs alone didn't make it obvious when the connections were being created. So I attached a remote debugger to HiveServer2 in our test environment. To enable remote debugging:
At the top of hive-env.sh under /etc/hive/conf/conf.server, add:

#add by lidong for remote debug
export HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8888 -XX:NewRatio=12 -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
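After restarting HiveServer2 with this in place, the JVM listens for a debugger on port 8888 (suspend=n means it starts without waiting for one). You can attach from any IDE's remote-debug configuration, or, as a quick sketch, with jdb (hostname is a placeholder):

jdb -connect com.sun.jdi.SocketAttach:hostname=hiveserver2-host,port=8888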
After nearly two more days of digging, I finally worked out that the ZooKeeper connection is created in the execute(DriverContext driverContext) method of MapRedTask in the Hive source:
...
if (!runningViaChild) { // this check is the key; the fix hinges on it
  // we are not running this mapred task via child jvm
  // so directly invoke ExecDriver
  return super.execute(driverContext); // this call goes through Hadoop's JobClient
                                       // to submitJob(job), and that is where the
                                       // ZooKeeper connection gets created
}
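For context, super.execute here resolves to ExecDriver.execute. Heavily abridged and paraphrased (names follow the Hive 1.x source, but treat this as a sketch, not the verbatim method), the relevant part boils down to:

// inside ExecDriver.execute, abridged
JobClient jc = new JobClient(job); // the job conf carries the HBase storage handler
RunningJob rj = jc.submitJob(job); // submission consults the HBase-backed input
                                   // format, which opens a ZooKeeper session
                                   // inside the HiveServer2 process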
Nothing in HiveServer2 ever cleans that ZooKeeper connection up afterwards, so it just sticks around, one per job. With the cause clear, I went for the simpler fix: flip the parameter that controls runningViaChild to true, so that every job in HiveServer2 is submitted from a child process; when the child exits, all of its resources, the ZooKeeper connection included, are released with it.
The fix, then: in hive-site.xml, set hive.exec.submitviachild to true.
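Concretely, the property entry looks like this (the same <final> trick as above applies if you want to pin it):

<property>
  <name>hive.exec.submitviachild</name>
  <value>true</value>
</property>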
Keeping the stack trace from the debugging session here for the record: