"App Timeline Server" fails to restart: a workaround

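Environment:

HDP 2.1

Ambari 1.7

The cluster originally ran Ambari 1.6, which was released together with HDP 2.1; Ambari was later upgraded on its own.

Problem description:

As the workload grew, I wanted to tune some YARN parameters. After changing them, every restart of YARN failed at the "App Timeline Server".

Error output: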

stderr:   /var/lib/ambari-agent/data/errors-3526.txt 
2015-08-05 15:59:52,609 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py", line 42, in start
    service('timelineserver', action='start')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/service.py", line 59, in service
    initial_wait=5
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1' returned 1.
stdout:   /var/lib/ambari-agent/data/output-3526.txt 
2015-08-05 15:59:44,814 - Group['hadoop'] {'ignore_failures': False}
2015-08-05 15:59:44,816 - Modifying group hadoop
2015-08-05 15:59:44,878 - Group['nobody'] {'ignore_failures': False}
2015-08-05 15:59:44,879 - Modifying group nobody
2015-08-05 15:59:44,914 - Group['users'] {'ignore_failures': False}
2015-08-05 15:59:44,915 - Modifying group users
2015-08-05 15:59:44,950 - Group['nagios'] {'ignore_failures': False}
2015-08-05 15:59:44,951 - Modifying group nagios
2015-08-05 15:59:44,988 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:44,988 - Modifying user hive
2015-08-05 15:59:45,015 - User['oozie'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,016 - Modifying user oozie
2015-08-05 15:59:45,042 - User['nobody'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'nobody']}
2015-08-05 15:59:45,043 - Modifying user nobody
2015-08-05 15:59:45,068 - User['nagios'] {'gid': 'nagios', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,069 - Modifying user nagios
2015-08-05 15:59:45,095 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,096 - Modifying user ambari-qa
2015-08-05 15:59:45,122 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,123 - Modifying user hdfs
2015-08-05 15:59:45,148 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,149 - Modifying user storm
2015-08-05 15:59:45,175 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,176 - Modifying user mapred
2015-08-05 15:59:45,202 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,203 - Modifying user hbase
2015-08-05 15:59:45,228 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-08-05 15:59:45,229 - Modifying user tez
2015-08-05 15:59:45,255 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,256 - Modifying user zookeeper
2015-08-05 15:59:45,282 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,282 - Modifying user sqoop
2015-08-05 15:59:45,308 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,309 - Modifying user yarn
2015-08-05 15:59:45,335 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-08-05 15:59:45,336 - Modifying user hcat
2015-08-05 15:59:45,362 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-08-05 15:59:45,365 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] {'not_if': 'test $(id -u ambari-qa) -gt 1000'}
2015-08-05 15:59:45,389 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa 2>/dev/null'] due to not_if
2015-08-05 15:59:45,390 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-08-05 15:59:45,392 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/apps/hadoop/hbase 2>/dev/null'] {'not_if': 'test $(id -u hbase) -gt 1000'}
2015-08-05 15:59:45,415 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/apps/hadoop/hbase 2>/dev/null'] due to not_if
2015-08-05 15:59:45,416 - Directory['/etc/hadoop/conf.empty'] {'owner': 'root', 'group': 'root', 'recursive': True}
2015-08-05 15:59:45,417 - Link['/etc/hadoop/conf'] {'not_if': 'ls /etc/hadoop/conf', 'to': '/etc/hadoop/conf.empty'}
2015-08-05 15:59:45,439 - Skipping Link['/etc/hadoop/conf'] due to not_if
2015-08-05 15:59:45,476 - File['/etc/hadoop/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs'}
2015-08-05 15:59:45,500 - Execute['/bin/echo 0 > /selinux/enforce'] {'only_if': 'test -f /selinux/enforce'}
2015-08-05 15:59:45,546 - Directory['/var/log/hadoop'] {'owner': 'root', 'group': 'hadoop', 'mode': 0775, 'recursive': True}
2015-08-05 15:59:45,547 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True}
2015-08-05 15:59:45,548 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True}
2015-08-05 15:59:45,560 - File['/etc/hadoop/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,565 - File['/etc/hadoop/conf/health_check'] {'content': Template('health_check-v2.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,566 - File['/etc/hadoop/conf/log4j.properties'] {'content': '...', 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2015-08-05 15:59:45,579 - File['/etc/hadoop/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:45,581 - File['/etc/hadoop/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2015-08-05 15:59:45,582 - File['/etc/hadoop/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2015-08-05 15:59:45,867 - Directory['/var/run/hadoop-yarn/yarn'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,870 - Directory['/var/log/hadoop-yarn/yarn'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,871 - Directory['/var/run/hadoop-mapreduce/mapred'] {'owner': 'mapred', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,872 - Directory['/var/log/hadoop-mapreduce/mapred'] {'owner': 'mapred', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:45,873 - Directory['/var/log/hadoop-yarn'] {'owner': 'yarn', 'ignore_failures': True, 'recursive': True}
2015-08-05 15:59:45,873 - XmlConfig['core-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'hdfs', 'configurations': ...}
2015-08-05 15:59:45,908 - Generating config: /etc/hadoop/conf/core-site.xml
2015-08-05 15:59:45,909 - File['/etc/hadoop/conf/core-site.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,912 - Writing File['/etc/hadoop/conf/core-site.xml'] because contents don't match
2015-08-05 15:59:45,913 - XmlConfig['mapred-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:45,939 - Generating config: /etc/hadoop/conf/mapred-site.xml
2015-08-05 15:59:45,940 - File['/etc/hadoop/conf/mapred-site.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,943 - Writing File['/etc/hadoop/conf/mapred-site.xml'] because contents don't match
2015-08-05 15:59:45,944 - Changing owner for /etc/hadoop/conf/mapred-site.xml from 1022 to yarn
2015-08-05 15:59:45,944 - XmlConfig['yarn-site.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:45,970 - Generating config: /etc/hadoop/conf/yarn-site.xml
2015-08-05 15:59:45,971 - File['/etc/hadoop/conf/yarn-site.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:45,975 - Writing File['/etc/hadoop/conf/yarn-site.xml'] because contents don't match
2015-08-05 15:59:45,976 - XmlConfig['capacity-scheduler.xml'] {'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'mode': 0644, 'configuration_attributes': ..., 'owner': 'yarn', 'configurations': ...}
2015-08-05 15:59:46,002 - Generating config: /etc/hadoop/conf/capacity-scheduler.xml
2015-08-05 15:59:46,003 - File['/etc/hadoop/conf/capacity-scheduler.xml'] {'owner': 'yarn', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': 0644, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,005 - Writing File['/etc/hadoop/conf/capacity-scheduler.xml'] because contents don't match
2015-08-05 15:59:46,006 - Changing owner for /etc/hadoop/conf/capacity-scheduler.xml from 1021 to yarn
2015-08-05 15:59:46,006 - Directory['/var/log/hadoop-yarn/timeline'] {'owner': 'yarn', 'group': 'hadoop', 'recursive': True}
2015-08-05 15:59:46,007 - File['/etc/hadoop/conf/yarn.exclude'] {'owner': 'yarn', 'group': 'hadoop'}
2015-08-05 15:59:46,015 - File['/etc/security/limits.d/yarn.conf'] {'content': Template('yarn.conf.j2'), 'mode': 0644}
2015-08-05 15:59:46,020 - File['/etc/security/limits.d/mapreduce.conf'] {'content': Template('mapreduce.conf.j2'), 'mode': 0644}
2015-08-05 15:59:46,033 - File['/etc/hadoop/conf/yarn-env.sh'] {'content': InlineTemplate(...), 'owner': 'yarn', 'group': 'hadoop', 'mode': 0755}
2015-08-05 15:59:46,038 - File['/etc/hadoop/conf/mapred-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs'}
2015-08-05 15:59:46,044 - File['/etc/hadoop/conf/taskcontroller.cfg'] {'content': Template('taskcontroller.cfg.j2'), 'owner': 'hdfs'}
2015-08-05 15:59:46,045 - XmlConfig['mapred-site.xml'] {'owner': 'mapred', 'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'configuration_attributes': ..., 'configurations': ...}
2015-08-05 15:59:46,071 - Generating config: /etc/hadoop/conf/mapred-site.xml
2015-08-05 15:59:46,072 - File['/etc/hadoop/conf/mapred-site.xml'] {'owner': 'mapred', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,075 - Writing File['/etc/hadoop/conf/mapred-site.xml'] because contents don't match
2015-08-05 15:59:46,076 - Changing owner for /etc/hadoop/conf/mapred-site.xml from 1020 to mapred
2015-08-05 15:59:46,076 - XmlConfig['capacity-scheduler.xml'] {'owner': 'hdfs', 'group': 'hadoop', 'conf_dir': '/etc/hadoop/conf', 'configuration_attributes': ..., 'configurations': ...}
2015-08-05 15:59:46,103 - Generating config: /etc/hadoop/conf/capacity-scheduler.xml
2015-08-05 15:59:46,104 - File['/etc/hadoop/conf/capacity-scheduler.xml'] {'owner': 'hdfs', 'content': InlineTemplate(...), 'group': 'hadoop', 'mode': None, 'encoding': 'UTF-8'}
2015-08-05 15:59:46,106 - Changing owner for /etc/hadoop/conf/capacity-scheduler.xml from 1020 to hdfs
2015-08-05 15:59:46,106 - File['/etc/hadoop/conf/ssl-client.xml.example'] {'owner': 'mapred', 'group': 'hadoop'}
2015-08-05 15:59:46,107 - File['/etc/hadoop/conf/ssl-server.xml.example'] {'owner': 'mapred', 'group': 'hadoop'}
2015-08-05 15:59:46,110 - File['/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid'] {'action': ['delete'], 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1'}
2015-08-05 15:59:46,209 - Deleting File['/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid']
2015-08-05 15:59:46,210 - Execute['ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver'] {'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1', 'user': 'yarn'}
2015-08-05 15:59:47,381 - Execute['ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1'] {'initial_wait': 5, 'not_if': 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1', 'user': 'yarn'}
2015-08-05 15:59:52,609 - Error while executing command 'start':
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 123, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py", line 42, in start
    service('timelineserver', action='start')
  File "/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/service.py", line 59, in service
    initial_wait=5
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 148, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 149, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 115, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 241, in action_run
    raise ex
Fail: Execution of 'ls /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid >/dev/null 2>&1 && ps `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` >/dev/null 2>&1' returned 1.      
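The command quoted in the Fail line is the liveness check that runs after the start attempt: the pid file must exist, and the process it names must be running. The sketch below wraps the same idea in a function so the check can be repeated by hand. The function name pid_file_alive is made up for illustration, and it uses kill -0 where the logged command uses ps (a swap-in that only works when run as root or as the owner of the process).

```shell
#!/bin/sh
# Hypothetical helper, not part of Ambari.
# Returns 0 only if the pid file exists AND names a live process.
pid_file_alive() {
    pid_file=$1
    [ -f "$pid_file" ] || return 1      # no pid file at all
    pid=$(cat "$pid_file" 2>/dev/null)
    [ -n "$pid" ] || return 1           # pid file is empty
    kill -0 "$pid" 2>/dev/null          # signal 0: existence test only
}

# Example, using the path from the log above:
# pid_file_alive /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid; echo $?
```

A return value of 1 here corresponds to the "returned 1" in the Ambari error: either the file is missing or the process it points to has already exited.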

Cause:

Ambari 1.7 changed how the "App Timeline Server" is handled, which makes it incompatible with HDP 2.1.

Quoted from the web:

In Ambari 1.7.0, the pid file /var/run/hadoop-yarn/yarn/yarn-yarn-historyserver.pid was deprecated
in favor of /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid to be consistent.
You may want to run ps on the timeline server process, kill it, and delete both pid files
above, then attempt to start it again.      
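The advice quoted above (kill the process, delete both pid files) can be sketched as a small shell function. This is a minimal sketch, not an Ambari tool: the function name clear_stale_pid_file is hypothetical, and it assumes it is run as root or as the yarn user so that kill is permitted.

```shell
#!/bin/sh
# Hypothetical helper: stop the process a pid file names, then remove the file.
clear_stale_pid_file() {
    pid_file=$1
    [ -f "$pid_file" ] || return 0              # nothing to clean up
    pid=$(cat "$pid_file" 2>/dev/null)
    case $pid in
        ''|*[!0-9]*) ;;                         # empty or not a number: skip kill
        *) kill -0 "$pid" 2>/dev/null && kill "$pid" ;;
    esac
    rm -f "$pid_file"
}

# Usage with the two pid files named in the quote:
# clear_stale_pid_file /var/run/hadoop-yarn/yarn/yarn-yarn-historyserver.pid
# clear_stale_pid_file /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid
```

Note that on the cluster described here, cleaning up the pid files alone would not be expected to help, because the start command itself fails, as the rest of this post shows.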

Solution:

Start the server by hand under its full class name, then record its PID in the pid file that Ambari checks (all commands run as root):

# su -s /bin/bash - yarn -c "ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer"

# jps | grep ApplicationHistoryServer

53018 ApplicationHistoryServer

# echo 53018 > /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid
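The last two steps (look up the PID with jps, then write it into the pid file) can be combined so the number does not have to be copied by hand. A minimal sketch, assuming jps prints one "<pid> <name>" pair per line as in the output shown; the function name ahs_pid is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and print the PID of
# ApplicationHistoryServer. Exits non-zero if no such line is present.
ahs_pid() {
    awk '$2 == "ApplicationHistoryServer" { print $1; found = 1; exit }
         END { exit !found }'
}

# Usage on the affected host, as root:
# pid=$(jps | ahs_pid) && echo "$pid" > /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid
```

Because the function fails when the server is not running, the pid file is only written when there is a real PID to put in it.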

Reasoning behind the fix:

First, look at the same process running normally on another cluster.

On Ambari 1.6 + HDP 2.1, the [working] launch command is:

# ps -ef | grep ApplicationHistoryServer

/usr/jdk64/jdk1.7.0_45/bin/java -Dproc_historyserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.log.file=yarn-yarn-historyserver-UHDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/*:/etc/hadoop/conf/ahs-config/log4j.properties org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer

On the faulty cluster (Ambari 1.7 + HDP 2.1), after repeated attempts to start the "App Timeline Server", the launch command I managed to capture was:

yarn 22398 22363 0 16:56 ? 00:00:00 /usr/java/default//bin/java -Dproc_timelineserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/* timelineserver

Error log:

# cat /var/log/hadoop-yarn/yarn/yarn-yarn-timelineserver-UHVDATA013.out

Error: Could not find or load main class timelineserver

ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032173
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65535
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Tracing the process call chain further gives the following.

The actual launch stack (note the parent/child relationships between the processes):

root 47604 42940 24 17:32 ? 00:00:00 /usr/bin/python2.6 /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package/scripts/application_timeline_server.py START /var/lib/ambari-agent/data/command-3530.json /var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/package /var/lib/ambari-agent/data/structured-out-3530.json INFO /var/lib/ambari-agent/data/tmp

root 47624 47604 1 17:32 ? 00:00:00 su -s /bin/bash - yarn -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver

yarn 47625 47624 1 17:32 ? 00:00:00 -bash -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver

yarn 47642 47625 3 17:32 ? 00:00:00 bash /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver

yarn 47677 47642 59 17:32 ? 00:00:00 /usr/java/default//bin/java -Dproc_timelineserver -Xmx1024m -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir= -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn -Dyarn.log.dir=/var/log/hadoop-yarn/yarn -Dhadoop.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.log.file=yarn-yarn-timelineserver-UHVDATA013.log -Dyarn.home.dir=/usr/lib/hadoop-yarn -Dhadoop.home.dir=/usr/lib/hadoop-yarn -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native/Linux-amd64-64:/usr/lib/hadoop/lib/native -classpath /etc/hadoop/conf:/etc/hadoop/conf:/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*::/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/share/java/mysql-connector-java-5.1.17.jar:/usr/share/java/mysql-connector-java.jar:/usr/lib/hadoop-mapreduce/*:/usr/lib/tez/*:/usr/lib/tez/lib/*:/etc/tez/conf:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-yarn/lib/* timelineserver

Of these, the su line is the command that actually launches the server:

su -s /bin/bash - yarn -c ulimit -c unlimited; export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec && /usr/lib/hadoop-yarn/sbin/yarn-daemon.sh --config /etc/hadoop/conf start timelineserver
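The parent/child chain was read off ps -ef output by matching each process's PPID against another process's PID. A minimal sketch of automating that step, assuming the usual ps -ef column order (UID, PID, PPID, ...); the function name ancestors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the chain of
# PIDs from the given PID up to the oldest ancestor present in the input.
ancestors() {
    awk -v pid="$1" '
        { ppid[$2] = $3 }                   # column 2 = PID, column 3 = PPID
        END {
            line = ""
            # The seen[] guard stops the walk if the input ever loops.
            while ((pid in ppid) && !(pid in seen)) {
                seen[pid] = 1
                line = (line == "" ? pid : line " <- " pid)
                pid = ppid[pid]
            }
            print line
        }'
}

# Usage: ps -ef | ancestors 47677
```

Fed the five ps lines from this post, it walks from the java process (47677) back to the Ambari agent's Python script (47604).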

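Comparing the working command line with the error log, I guessed that timelineserver should be replaced with org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer. I tried it, and the server started normally.

I then wrote the process ID into the pid file /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid, after which the Ambari management UI showed the component as running.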
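One plausible explanation for the "Could not find or load main class timelineserver" error, offered as an assumption rather than something verified in this post: the yarn launcher shipped with HDP 2.1 appears to know this server only under the older command name historyserver (compare -Dproc_historyserver in the working command line), and a command word it does not recognize ends up passed to java as if it were a class name. The sketch below imitates that kind of dispatch. It is a simplified illustration, not the real launcher script, and the function name resolve_class is made up.

```shell
#!/bin/sh
# Simplified illustration of a command-to-class dispatch (not the real launcher).
resolve_class() {
    case $1 in
        historyserver)
            echo org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer ;;
        *)
            # Unknown command word: used as the class name as-is.
            echo "$1" ;;
    esac
}

resolve_class historyserver    # known name: resolves to the full class
resolve_class timelineserver   # unknown name: printed back unchanged
```

Under that assumption, passing the fully qualified class name works for the same reason timelineserver fails: the word falls through unchanged, but this time it names a class that exists on the classpath.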

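Tracing script:

# Poll twice a second. Whenever the pid file exists, print the PID and the
# matching process, then append a full process snapshot to a log file.
while : ; do
    sleep 0.5
    cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid 2>/dev/null \
        && { ps -ef | grep -v grep | grep `cat /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid` ;
             ps -ef >> /tmp/qiyfongtestimel.log ; }
done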
