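Environment:
HDP-2.1 is installed
Ambari-1.7
The cluster originally ran Ambari 1.6, the version released together with HDP 2.1; Ambari was later upgraded on its own.
Problem description:
Because of growing business load, I wanted to tune some YARN parameters. After changing them, restarting YARN always failed on the "App Timeline Server".
Error messages: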
stderr: /var/lib/ambari-agent/data/errors-3526.txt
2015-08-05 15:59:52,609 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py", line 42, in start
    service('timelineserver', action='start')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/service.py", line 59, in service
    initial_wait=5
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1' returned 1.
stdout: /var/lib/ambari-agent/data/output-3526.txt
2015-08-05 15:59:44,814 - Group['hadoop'] {'ignore_failures': False}
2015-08-05 15:59:44,816 - Modifying group hadoop
2015-08-05 15:59:44,878 - Group['nobody'] {'ignore_failures': False}
2015-08-05 15:59:44,879 - Modifying group nobody
2015-08-05 15:59:44,914 - Group['users'] {'ignore_failures': False}
2015-08-05 15:59:44,915 - Modifying group users
2015-08-05 15:59:44,950 - Group['nagios'] {'ignore_failures': False}
2015-08-05 15:59:44,951 - Modifying group nagios
2015-08-05 15:59:44,988 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:44,988 - Modifying user hive
2015-08-05 15:59:45,015 - User['oozie'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,016 - Modifying user oozie
2015-08-05 15:59:45,042 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-08-05 15:59:45,043 - Modifying user nobody
2015-08-05 15:59:45,068 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,069 - Modifying user nagios
2015-08-05 15:59:45,095 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,096 - Modifying user ambari-qa
2015-08-05 15:59:45,122 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,123 - Modifying user hdfs
2015-08-05 15:59:45,148 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,149 - Modifying user storm
2015-08-05 15:59:45,175 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,176 - Modifying user mapred
2015-08-05 15:59:45,202 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,203 - Modifying user hbase
2015-08-05 15:59:45,228 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,229 - Modifying user tez
2015-08-05 15:59:45,255 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,256 - Modifying user zookeeper
2015-08-05 15:59:45,282 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,282 - Modifying user sqoop
2015-08-05 15:59:45,308 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,309 - Modifying user yarn
2015-08-05 15:59:45,335 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,336 - Modifying user hcat
2015-08-05 15:59:45,362 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-08-05 15:59:45,365 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-08-05 15:59:45,389 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-08-05 15:59:45,390 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-08-05 15:59:45,392 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/apps/hadoop/hbase 2>/dev/null'] {'not_if': 'test $(id -u hbase) -gt 1000'}
2015-08-05 15:59:45,415 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/apps/hadoop/hbase 2>/dev/null'] due to not_if
2015-08-05 15:59:45,416 - Directory['/etc/hadoop/conf.empty'] {'owner': 'root', 'group': 'root', 'recursive': True}
2015-08-05 15:59:45,417 - Link['/etc/hadoop/conf'] {'not_if': 'ls /etc/hadoop/conf', 'to': '/etc/hadoop/conf.empty'}
2015-08-05 15:59:45,439 - Skipping Link['/etc/hadoop/conf'] due to not_if
2015-08-05 15:59:45,476 - File['/etc/hadoop/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs'}
2015-08-05 15:59:45,500 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-08-05 15:59:45,546 - Directory['/var/log/hadoop'] {'owner': 'root', 'group': 'hadoop', 'mode': 0775, 'recursive': True}
2015-08-05 15:59:45,547 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True}
2015-08-05 15:59:45,548 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True}
2015-08-05 15:59:45,560 - File['/etc/hadoop/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,565 - File['/etc/hadoop/conf/health_check'] {'content': Template('health_check-v2.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,566 - File['/etc/hadoop/conf/log4j.properties'] {'content': '...', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2015-08-05 15:59:45,579 - File['/etc/hadoop/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,581 - File['/etc/hadoop/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2015-08-05 15:59:45,582 - File['/etc/hadoop/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2015-08-05 15:59:45,867 - Directory['/var/run/hadoop-yarn/yarn'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,870 - Directory['/var/log/hadoop-yarn/yarn'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,871 - Directory['/var/run/hadoop-mapreduce/mapred'] {'owner': 'mapred', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,872 - Directory['/var/log/hadoop-mapreduce/mapred'] {'owner': 'mapred', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,873 - Directory['/var/log/hadoop-yarn'] {'owner': 'yarn', 'ignore_failures': True, 'recursive': True}
2015-08-05 15:59:45,873 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'hdfs', 'configurations': ...}
2015-08-05 15:59:45,908 - Generating config: /etc/hadoop/conf/core-site.xml
2015-08-05 15:59:45,909 - File['/etc/hadoop/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,912 - Writing File['/etc/hadoop/conf/core-site.xml'] because contents don't match
2015-08-05 15:59:45,913 - XmlConfig['mapred-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:45,939 - Generating config: /etc/hadoop/conf/mapred-site.xml
2015-08-05 15:59:45,940 - File['/etc/hadoop/conf/mapred-site.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,943 - Writing File['/etc/hadoop/conf/mapred-site.xml'] because contents don't match
2015-08-05 15:59:45,944 - Changing owner for /etc/hadoop/conf/mapred-site.xml from 1022 to yarn
2015-08-05 15:59:45,944 - XmlConfig['yarn-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:45,970 - Generating config: /etc/hadoop/conf/yarn-site.xml
2015-08-05 15:59:45,971 - File['/etc/hadoop/conf/yarn-site.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,975 - Writing File['/etc/hadoop/conf/yarn-site.xml'] because contents don't match
2015-08-05 15:59:45,976 - XmlConfig['capacity-scheduler.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:46,002 - Generating config: /etc/hadoop/conf/capacity-scheduler.xml
2015-08-05 15:59:46,003 - File['/etc/hadoop/conf/capacity-scheduler.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,005 - Writing File['/etc/hadoop/conf/capacity-scheduler.xml'] because contents don't match
2015-08-05 15:59:46,006 - Changing owner for /etc/hadoop/conf/capacity-scheduler.xml from 1021 to yarn
2015-08-05 15:59:46,006 - Directory['/var/log/hadoop-yarn/timeline'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:46,007 - File['/etc/hadoop/conf/yarn.exclude'] {'owner': 'yarn', 'group': 'hadoop'}
2015-08-05 15:59:46,015 - File['/etc/security/limits.d/yarn.conf'] {'content': Template('yarn.conf.j2'), 'mode': 0644}
2015-08-05 15:59:46,020 - File['/etc/security/limits.d/mapreduce.conf'] {'content': Template('mapreduce.conf.j2'), 'mode': 0644}
2015-08-05 15:59:46,033 - File['/etc/hadoop/conf/yarn-env.sh'] {'content': InlineTemplate(...), 'owner': 'yarn', 'group': 'hadoop', 'mode': 0755}
2015-08-05 15:59:46,038 - File['/etc/hadoop/conf/mapred-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs'}
2015-08-05 15:59:46,044 - File['/etc/hadoop/conf/taskcontroller.cfg'] {'content': Template('taskcontroller.cfg.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:46,045 - XmlConfig['mapred-site.xml'] {'owner': 'mapred', 'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'configuration_attributes': ..., 'configurations': ...}
2015-08-05 15:59:46,071 - Generating config: /etc/hadoop/conf/mapred-site.xml
2015-08-05 15:59:46,072 - File['/etc/hadoop/conf/mapred-site.xml'] {'owner': 'mapred', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,075 - Writing File['/etc/hadoop/conf/mapred-site.xml'] because contents don't match
2015-08-05 15:59:46,076 - Changing owner for /etc/hadoop/conf/mapred-site.xml from 1020 to mapred
2015-08-05 15:59:46,076 - XmlConfig['capacity-scheduler.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'configuration_attributes': ..., 'configurations': ...}
2015-08-05 15:59:46,103 - Generating config: /etc/hadoop/conf/capacity-scheduler.xml
2015-08-05 15:59:46,104 - File['/etc/hadoop/conf/capacity-scheduler.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,106 - Changing owner for /etc/hadoop/conf/capacity-scheduler.xml from 1020 to hdfs
2015-08-05 15:59:46,106 - File['/etc/hadoop/conf/ssl-client.xml.example'] {'owner': 'mapred', 'group': 'hadoop'}
2015-08-05 15:59:46,107 - File['/etc/hadoop/conf/ssl-server.xml.example'] {'owner': 'mapred', 'group': 'hadoop'}
2015-08-05 15:59:46,110 - File['/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid'] {'action': ['delete'], 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1'}
2015-08-05 15:59:46,209 - Deleting File['/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid']
2015-08-05 15:59:46,210 - Execute['ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver'] {'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1', 'user': 'yarn'}
2015-08-05 15:59:47,381 - Execute['ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1'] {'initial_wait': 5, 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1', 'user': 'yarn'}
2015-08-05 15:59:52,609 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py", line 42, in start
    service('timelineserver', action='start')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/service.py", line 59, in service
    initial_wait=5
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1' returned 1.
Cause:
Ambari 1.7 changed how the "App Timeline Server" is handled, which makes it incompatible with HDP 2.1.
Quoted from the web:
In Ambari 1.7.0, the pid file /var/run/hadoop-yarn/yarn/yarn-yarn-historyserver.pid was deprecated
in favor of /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid to be consistent.
You may want to run ps on the timeline server process, kill it, and delete both pid files
above, then attempt to start it again.
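Before killing anything, it can help to see what state the two pid files from the quote are in. The following is only a sketch: the function names pid_file_alive and report_pid_files are made up for illustration, and the liveness test uses kill -0 instead of the ps call from the Ambari log, so it should be run as root or as the yarn user.

```shell
#!/bin/sh
# Sketch (hypothetical helper names). Succeeds only if the pid file
# exists and the pid stored in it belongs to a running process.
pid_file_alive() {
    local pid_file="$1"
    [ -f "$pid_file" ] || return 1
    local pid
    pid="$(cat "$pid_file")"
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    kill -0 "$pid" 2>/dev/null
}

# Print one status line per pid file passed as an argument.
report_pid_files() {
    local f
    for f in "$@"; do
        if pid_file_alive "$f"; then
            echo "$f: process $(cat "$f") is running"
        elif [ -f "$f" ]; then
            echo "$f: stale"
        else
            echo "$f: missing"
        fi
    done
}
```

For the two files named in the quote, this could be called as report_pid_files /var/run/hadoop-yarn/yarn/yarn-yarn-historyserver.pid /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid. A "stale" file is one that can be deleted safely.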
Solution:
# su -s /bin/bash - yarn -c "ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer"
# jps | grep ApplicationHistoryServer
53018 ApplicationHistoryServer
# echo 53018 > /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid
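The last two steps (look up the pid with jps, then write it into the pid file) can be combined so the pid is not copied by hand. This is a minimal sketch, not part of Ambari or Hadoop; ahs_pid_from_jps and record_ahs_pid are hypothetical names, and the code assumes jps prints lines of the form "<pid> <main class name>" as in the output above.

```shell
#!/bin/sh
# Sketch (hypothetical helper names). Reads jps output on stdin and
# prints the pid of the first ApplicationHistoryServer entry; fails if
# there is none.
ahs_pid_from_jps() {
    awk '$2 == "ApplicationHistoryServer" { print $1; found = 1; exit }
         END { exit found ? 0 : 1 }'
}

# Reads jps output on stdin and writes the pid into the file given as $1.
# The file is left untouched when no matching process is listed.
record_ahs_pid() {
    local pid_file="$1"
    local pid
    pid="$(ahs_pid_from_jps)" || {
        echo "ApplicationHistoryServer is not running" >&2
        return 1
    }
    echo "$pid" > "$pid_file"
}
```

Run as root it would be used as: jps | record_ahs_pid /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid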
How the fix was found:
Look at the same program running normally on another cluster.
On ambari1.6+hdp2.1, the [normal] startup command line is:
# ps -ef | grep ApplicationHistoryServer
/usr/jdk64/jdk1.7.0_45/bin/java -Dproc_historyserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/ahs-config/log4j.properties org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
On the failing cluster (ambari1.7+hdp2.1), after repeatedly trying to start the "App Timeline Server", the startup command line I managed to capture is:
yarn 22398 22363 0 16:56 ? 00:00:00 /usr/java/default//bin/java -Dproc_timelineserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/* timelineserver
Error log:
# cat /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-UHVDATA013.out
Error: Could not find or load main class timelineserver
ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032173
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Tracing the call chain further gives:
The actual startup stack (note the parent/child relationships between the processes):
root 47604 42940 24 17:32 ? 00:00:00 /usr/bin/python2.6 /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py START /var/lib/ambari-agent/data/command-3530.json /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package /var/lib/ambari-agent/data/structured-out-3530.json INFO /var/lib/ambari-agent/data/tmp
root 47624 47604 1 17:32 ? 00:00:00 su -s /bin/bash - yarn -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
yarn 47625 47624 1 17:32 ? 00:00:00 -bash -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
yarn 47642 47625 3 17:32 ? 00:00:00 bash /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
yarn 47677 47642 59 17:32 ? 00:00:00 /usr/java/default//bin/java -Dproc_timelineserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/* timelineserver
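Of these, the su line (shown in red in the original post) is the command that actually performs the start:
su -s /bin/bash - yarn -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
Comparing the command line of the normally running process with the error log, the word timelineserver ends up in the position where java expects the main class, presumably because the yarn scripts shipped with HDP 2.1 do not know timelineserver as a subcommand (the working cluster runs with -Dproc_historyserver). On that guess I replaced timelineserver with org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer and tried again; the process started normally.
I then wrote the process id into the pid file /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid, after which the Ambari management UI showed the component as running.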
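The comparison above comes down to looking at the last word of each java command line. As a rough sketch of that check (last_token and looks_like_class are hypothetical helper names, and "contains a dot" is only a heuristic for a fully qualified class name):

```shell
#!/bin/sh
# Sketch (hypothetical helper names). Prints the last whitespace-separated
# token of each non-empty input line. For the java command lines shown
# above, which pass no program arguments, that token is the main class.
last_token() {
    awk 'NF { print $NF }'
}

# Succeeds if the argument contains a dot, which is how a fully qualified
# Java class name usually looks. Heuristic only.
looks_like_class() {
    case "$1" in
        *.*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

On the failing cluster, ps -ef | grep '[D]proc_timelineserver' | last_token would print timelineserver, which looks_like_class rejects; on the working cluster the last token is the full ApplicationHistoryServer class name.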
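Tracing script:
# Poll every 0.5 s. Once the pid file exists, print the matching process
# and append a full process snapshot to the trace log.
while : ; do
    sleep 0.5
    cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid 2>/dev/null \
        && { ps -ef | grep -v grep | grep "$(cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid)"
             ps -ef >> /tmp/qiyfongtestimel.log; }
done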
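The parent/child chain shown earlier can be read out of such a ps -ef snapshot mechanically. A minimal sketch, assuming the usual ps -ef column order (UID PID PPID C STIME TTY TIME CMD); the function name ancestors is made up for illustration:

```shell
#!/bin/sh
# Sketch (hypothetical helper name). Reads ps -ef text on stdin and prints
# the line of the given pid, then its parent, grandparent and so on, until
# a parent is no longer present in the input.
ancestors() {
    awk -v pid="$1" '
        { ppid[$2] = $3; line[$2] = $0 }   # index every line by its PID column
        END {
            while (pid in line) {
                print line[pid]
                if (ppid[pid] == pid) break   # guard against a self-parented entry
                pid = ppid[pid]
            }
        }'
}
```

For example, ancestors 47677 < /tmp/qiyfongtestimel.log would walk from the java process up through yarn-daemon.sh, bash and su to the Ambari agent script. Because the log holds many appended snapshots, a pid seen more than once resolves to its most recent line.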