Problem:
After Storm stopped abnormally, I restarted it, but the nimbus process disappeared within a few seconds. The log showed this error: nimbus [ERROR] Error when processing event
java.io.FileNotFoundException: File '/data/storm/nimbus/stormdist/risk_topo-1-1574555563/stormconf.ser' does not exist
at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) ~[analyzer-storm-dependency.jar:na]
at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) ~[analyzer-storm-dependency.jar:na]
at backtype.storm.daemon.nimbus$read_storm_conf.invoke(nimbus.clj:89) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.nimbus$compute_executors.invoke(nimbus.clj:419) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.nimbus$compute_topology__GT_executors$iter__3324__3328$fn__3329.invoke(ni…
…seq.invoke(core.clj:133) ~[clojure-1.5.1.jar:na]
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:30) ~[clojure-1.5.1.jar:na]
at clojure.core.protocols$fn__6026.invoke(protocols.clj:54) ~[clojure-1.5.1.jar:na]
at clojure.core.protocols$fn__5979$G__5974__5992.invoke(protocols.clj:13) ~[clojure-1.5.1.jar:na]
at clojure.core$reduce.invoke(core.clj:6177) ~[clojure-1.5.1.jar:na]
at clojure.core$into.invoke(core.clj:6229) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.nimbus$compute_topology__GT_executors.i…
…compute_new_topology__GT_executor__GT_node_PLUS_port.invoke(nimbus.clj:550) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:662) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.nimbus$fn__3724$exec_fn__1103__auto____…$fn__3730$fn__3731.invoke(ni…
…fn__3724$exec_fn__1103__auto____…fn__3730.invoke(nimbus.clj:908) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.timer$schedule_recurring$this__1807.invoke(timer.clj:99) ~[storm-core-0.9.5.jar:0.9.5]
at backtype.storm.timer$mk_timer$fn__1790$fn__1791.invoke(ti…
…mk_timer$fn__1790.invoke(timer.clj:42) ~[storm-core-0.9.5.jar:0.9.5]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
2019-11-24T10:43:46.368+0800 b.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
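Analysis:
Nimbus goes down shortly after restarting because ZooKeeper still holds state left over from the Storm instance that crashed. On every restart, nimbus reads that topology's information from ZooKeeper; if the files under {storm.local.dir} have already been deleted by then, it fails with the file-not-found error above. The Storm data in ZooKeeper should therefore be deleted as well, so that the two are back in sync.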
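To check which topologies are in this inconsistent state, the topology ids known to ZooKeeper can be compared with what is actually on disk. The following is only a minimal sketch: the function name is hypothetical, and it assumes the ids were copied by hand from ZooKeeper (with the default storm.zookeeper.root they are typically listed by "ls /storm/storms" in zkCli.sh) and that nimbus keeps each topology under {storm.local.dir}/nimbus/stormdist, as the path in the error suggests.

```shell
# Print every topology id (arguments after the local dir) whose stormconf.ser
# is missing under <storm.local.dir>/nimbus/stormdist.
# Hypothetical helper: the ids are assumed to come from ZooKeeper,
# for example from "ls /storm/storms" in zkCli.sh.
missing_topologies() {
    storm_local_dir=$1
    shift
    for topo_id in "$@"; do
        conf="$storm_local_dir/nimbus/stormdist/$topo_id/stormconf.ser"
        if [ ! -f "$conf" ]; then
            echo "$topo_id"
        fi
    done
}

# Example, using the path and id from the error above:
#   missing_topologies /data/storm risk_topo-1-1574555563
```

Any id it prints is one that nimbus will fail on when it tries to read stormconf.ser.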
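Solution:
If you do not know where ZooKeeper is installed, search for it:
find / -name zkCli.sh
On my machine ZooKeeper is installed at:
/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/zookeeper/bin
The fix is then:
1. Change to that directory:
cd /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/zookeeper/bin
2. Start the client: ./zkCli.sh
3. List the root znodes: ls /
4. Delete Storm's znode: rmr /storm
5. Restart Storm.
Also remember to delete the supervisor and nimbus directories under the path configured as {storm.local.dir}.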
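The cleanup can also be run non-interactively. The sketch below is not a definitive script: the function names, the ZKCLI variable, and the host, port and path in the example are all assumptions. It relies on zkCli.sh running a single command when one is passed after the connection options. Note that rmr belongs to the ZooKeeper 3.4 line; in ZooKeeper 3.5 and later it is deprecated in favor of deleteall.

```shell
# Sketch of the cleanup as functions; nothing runs until they are called.
# ZKCLI defaults to ./zkCli.sh; set it to the full path if needed.
ZKCLI="${ZKCLI:-./zkCli.sh}"

# Delete Storm's znode (step 4). On ZooKeeper 3.5+ replace "rmr" with
# "deleteall".
remove_storm_znode() {
    zk_server=${1:?usage: remove_storm_znode host:port}
    "$ZKCLI" -server "$zk_server" rmr /storm
}

# Remove the nimbus and supervisor directories under storm.local.dir.
remove_storm_local_state() {
    storm_local_dir=${1:?usage: remove_storm_local_state storm_local_dir}
    rm -rf "$storm_local_dir/nimbus" "$storm_local_dir/supervisor"
}

# Example, with Storm stopped (host, port and path are assumptions):
#   remove_storm_znode localhost:2181
#   remove_storm_local_state /data/storm
```

Because this wipes both copies of the cluster state, topologies have to be resubmitted after Storm is started again.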
PS: command to kill the Storm processes:
ps -ef | grep storm | grep -v 'grep' | awk '{print $2}' | xargs kill -9
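The one-liner above matches every process whose command line contains "storm" (an editor open on a Storm config file, for instance) and goes straight to kill -9. A more conservative variant might look like the sketch below. The function names are hypothetical, and the pattern assumes the daemons run classes under backtype.storm.daemon or backtype.storm.ui, as the backtype.storm.daemon.nimbus frames in the stack trace suggest for 0.9.5; adjust it for other versions.

```shell
# Read "ps -ef" output on stdin and print the PIDs (column 2) of Storm daemons.
# Assumption: daemon command lines contain backtype.storm.daemon.* or
# backtype.storm.ui.* class names.
storm_daemon_pids() {
    awk '/backtype\.storm\.(daemon|ui)\./ { print $2 }'
}

# Send SIGTERM first, wait briefly, then force-kill whatever is still running.
stop_storm_daemons() {
    pids=$(ps -ef | storm_daemon_pids)
    [ -n "$pids" ] || return 0
    kill $pids 2>/dev/null
    sleep 5
    pids=$(ps -ef | storm_daemon_pids)
    [ -z "$pids" ] || kill -9 $pids 2>/dev/null
    return 0
}

# Example:
#   stop_storm_daemons
```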