Impala Heartbeat Timeout

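One day, the logs of a production Impala job started reporting errors intermittently:

ERROR: No backends configured

Could not execute command: xxx

The errors were not constant. The statestore log showed the following: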

<code>I0223 16:18:34.338698 10924 state-store.cc:194] Creating new topic: ''impala-membership' on behalf of subscriber: 'xxxhostname:22000</code>

<code>I0223 16:18:34.338739 10924 state-store.cc:200] Registering: xxxhostname:22000</code>

<code>I0223 16:18:34.339864 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)</code>

<code>I0223 16:18:34.840463 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)</code>

<code>I0223 16:18:35.341156 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)</code>

<code>I0223 16:18:36.843724 10904 state-store.cc:365] Subscriber: xxxhostname:22000 has either failed or disconnected.</code>

<code>I0223 16:18:47.536650 10911 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)</code>

<code>I0223 16:18:54.343158 10924 state-store.cc:200] Registering: xxxhostname:22000</code>

<code>W0223 16:18:54.343209 10924 state-store.cc:215] Duplicate registration of subscriber: xxxhostname:22000, possible duplicate subscriber IDs or recovering subscriber</code>
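The spacing between these log entries can be measured rather than eyeballed. The following is only a rough sketch: the helper `timestamp_us` and the variable names are made up for illustration, and the sample lines are copied from the statestore log above.

```python
import re

# glog prefix: severity letter, MMDD, then HH:MM:SS.microseconds.
# The whitespace after the date is optional so that lines whose space was
# lost during copy and paste still parse.
GLOG_TIME = re.compile(r"^[IWEF]\d{4}\s*(\d{2}):(\d{2}):(\d{2})\.(\d{6})")


def timestamp_us(line):
    """Return the time of day of a glog line in microseconds, or None."""
    m = GLOG_TIME.match(line)
    if m is None:
        return None
    hh, mm, ss, us = (int(g) for g in m.groups())
    return ((hh * 60 + mm) * 60 + ss) * 1_000_000 + us


LOG = """\
I0223 16:18:34.339864 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:34.840463 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:35.341156 10904 state-store.cc:355] Unable to update subscriber at xxxhostname:23000, received error Couldn't open transport for xxxhostname:23000 (connect() failed: Connection refused)
I0223 16:18:36.843724 10904 state-store.cc:365] Subscriber: xxxhostname:22000 has either failed or disconnected.
"""

lines = LOG.splitlines()

# Times of the failed update attempts, in log order.
failed = [timestamp_us(l) for l in lines if "Unable to update subscriber" in l]

# Gap between consecutive failed attempts, truncated to whole milliseconds.
gaps_ms = [(b - a) // 1000 for a, b in zip(failed, failed[1:])]

# Time from the first failed attempt until the subscriber is declared failed.
declared = next(timestamp_us(l) for l in lines if "failed or disconnected" in l)
window_ms = (declared - failed[0]) // 1000

print(gaps_ms)    # [500, 500]
print(window_ms)  # 2503
```

The failed updates are about 500 ms apart, and the subscriber is declared failed roughly 2.5 s after the first failed update.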

The log on xxxhostname confirmed that it was repeatedly failing to register:

<code>I0224 21:32:38.173282 33088 state-store-subscriber.cc:169] Trying to register...</code>

<code>I0224 21:32:38.173743 33088 state-store-subscriber.cc:172] Reconnected to state-store. Exiting recovery mode</code>

<code>I0224 21:32:48.173959 33088 state-store-subscriber.cc:166] xxxhostname:22000: Connection with state-store lost, entering recovery mode</code>

<code>I0224 21:32:48.174018 33088 state-store-subscriber.cc:169] Trying to register...</code>

<code>W0224 21:32:48.174432 33088 state-store-subscriber.cc:181] Failed to re-register with state-store: Duplicate registration of subscriber: xxxhostname:22000</code>

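This looked like a heartbeat timeout. I could not find any documentation that explains the relevant parameters in detail, so I went through the source code instead. In the end I set statestore_subscriber_timeout_seconds to 60 seconds and restarted for the change to take effect. The relevant parameters, with their default values and descriptions as they appear in the code, are: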

<code>statestore_subscriber_timeout_seconds, 10, "The amount of time (in seconds) that may elapse before the connection with the statestore is considered lost.";</code>

<code>statestore_subscriber_cnxn_attempts, 10, "The number of times to retry an RPC connection to the statestore. A setting of 0 means retry indefinitely";</code>

<code>statestore_subscriber_cnxn_retry_interval_ms, 3000, "The interval, in ms, to wait between attempts to make an RPC connection to the statestore.";</code>

<code>statestore_max_missed_heartbeats, 5, "Maximum number of consecutive heartbeats an impalad can miss before being declared failed by the statestore.";</code>

<code>statestore_suspect_heartbeats, 2, "(Advanced) Number of consecutive heartbeats an impalad can miss before being suspected of failure by the statestore";</code>

<code>statestore_num_heartbeat_threads, 10, "(Advanced) Number of threads used to send heartbeats in parallel to all registered subscribers.";</code>

<code>statestore_heartbeat_frequency_ms, 500, "(Advanced) Frequency (in ms) with which the statestore sends heartbeats to subscribers.";</code>
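To see how these defaults relate to the timings in the logs, here is some back-of-the-envelope arithmetic in Python. It is a simplified sketch, not Impala's actual logic: it assumes heartbeats go out at a fixed interval and that each failed attempt returns immediately, and the function name `missed_heartbeat_window_ms` is made up for illustration.

```python
# Default values copied from the flag list above.
statestore_heartbeat_frequency_ms = 500
statestore_max_missed_heartbeats = 5
statestore_suspect_heartbeats = 2
statestore_subscriber_timeout_seconds = 10


def missed_heartbeat_window_ms(frequency_ms, missed_heartbeats):
    """Time taken to miss `missed_heartbeats` consecutive heartbeats,
    assuming one heartbeat every `frequency_ms` and no extra delay per
    failed attempt (a simplification)."""
    return frequency_ms * missed_heartbeats


# Statestore side: time until an impalad is suspected, then declared failed.
suspect_ms = missed_heartbeat_window_ms(
    statestore_heartbeat_frequency_ms, statestore_suspect_heartbeats)
failed_ms = missed_heartbeat_window_ms(
    statestore_heartbeat_frequency_ms, statestore_max_missed_heartbeats)

# Subscriber side: how long the impalad waits before it considers the
# connection lost, with the default and with the value used in the end.
default_timeout_ms = statestore_subscriber_timeout_seconds * 1000
tuned_timeout_ms = 60 * 1000

print(suspect_ms)          # 1000
print(failed_ms)           # 2500
print(default_timeout_ms)  # 10000
print(tuned_timeout_ms)    # 60000
```

Under these assumptions the numbers line up with the logs: the statestore declares the subscriber failed about 2.5 s after the first failed update, and on xxxhostname the connection is reported lost 10 s after the reconnect, which matches the 10 s default of statestore_subscriber_timeout_seconds.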

This article is reposted from MIKE老毕's 51CTO blog. Original link: http://blog.51cto.com/boylook/1367252. For reprints, please contact the original author.