
[Azure Event Hubs] MirrorMaker sending data to Azure Event Hub fails after running for a while with Timeout Exception: Expiring 18 record(s) for xxxxxxx: 79823 ms has passed since last append


Problem Description

Following the article "Use Apache Kafka MirrorMaker with Event Hubs", MirrorMaker was successfully configured to send data to an Event Hub. Why does it run successfully for only a while (roughly 10 minutes to 2 hours) before a Timeout Exception occurs and Kafka MirrorMaker aborts and exits?

The exception message is:

[2022-05-25 14:29:21,683] INFO [Producer clientId=mirror_maker_producer] Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-25 14:29:21,683] DEBUG [Producer clientId=mirror_maker_producer] Kafka producer has been closed (org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-25 14:29:21,683] ERROR Error when sending message to topic xxxxxxxxxxxx with key: 16 bytes, value: 875 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 18 record(s) for xxxxxxxxxxxx-4: 79823 ms has passed since last append
[2022-05-25 14:29:21,683] INFO Closing producer due to send failure. (kafka.tools.MirrorMaker$)
[2022-05-25 14:29:21,683] INFO [Producer clientId=mirror_maker_producer] Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-25 14:29:21,683] INFO [Producer clientId=mirror_maker_producer] Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. 
(org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-25 14:29:21,683] DEBUG [Producer clientId=mirror_maker_producer] Kafka producer has been closed (org.apache.kafka.clients.producer.KafkaProducer)
[2022-05-25 14:29:21,683] ERROR Error when sending message to topic xxxxxxxxxxxx with key: 16 bytes, value: 875 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Expiring 18 record(s) for xxxxxxxxxxxx-4: 79823 ms has passed since last append
[2022-05-25 14:29:21,683] INFO Closing producer due to send failure. (kafka.tools.MirrorMaker$)
[2022-05-25 14:29:21,683] INFO [Producer clientId=mirror_maker_producer] Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer)      

Problem Analysis

Searching the web for the error message "Expiring 18 record(s) for xxxxxxxxxxxx-4: 79823 ms has passed since last append" turns up the following explanations:

1) Explanation on Stack Overflow: https://stackoverflow.com/questions/56807188/how-to-fix-kafka-common-errors-timeoutexception-expiring-1-records-xxx-ms-has

The error indicates that some records are put into the queue at a faster rate than they can be sent from the client.

That is, records are being put into MirrorMaker's send queue faster than the client can deliver them to the server side (here, Event Hubs).

When your Producer sends messages, they are stored in buffer (before sending them to the target broker) and the records are grouped together into batches in order to increase throughput. When a new record is added to the batch, it must be sent within a -configurable- time window which is controlled by request.timeout.ms (the default is set to 30 seconds). If the batch is in the queue for longer time, a TimeoutException is thrown and the batch records will then be removed from the queue and won't be delivered to the broker.

Applied to this scenario, MirrorMaker is the producer and Event Hubs is the target broker: records are buffered and grouped into batches to increase throughput, and once a record is added to a batch, that batch must be sent within the window controlled by request.timeout.ms (default 30 seconds). A batch that sits in the queue longer than that triggers a TimeoutException, and its records are removed from the queue and never delivered to the broker.

Increasing the value of request.timeout.ms should do the trick for you.
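For context, the settings involved are ordinary Kafka producer configs. Below is a minimal sketch of a plain Java producer (not MirrorMaker itself; the broker address and topic name are placeholders) showing where batch.size and request.timeout.ms are set and where a TimeoutException from expired batches would surface in the send callback:

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class ProducerTimeoutDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            // Records are buffered and grouped into batches of up to batch.size bytes.
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
            // In older clients (as described above), a batch that waits longer than this expires.
            props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 60000);

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("demo-topic", "key".getBytes(), "value".getBytes()),
                        (metadata, exception) -> {
                            if (exception != null) {
                                // Expired batches surface here as org.apache.kafka.common.errors.TimeoutException
                                System.err.println("Send failed: " + exception);
                            }
                        });
            }
        }
    }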

2) Explanation from a CSDN blog post: https://blog.csdn.net/weixin_43432984/article/details/109180842

When a batch of messages is full (batch.size) and requestTimeoutMs < (now - this.lastAppendTime), the batch is marked as expired and is no longer kept in the RecordAccumulator (it will not be retried).

Increasing the batch.size and request.timeout.ms parameters resolves the problem.

3) Why does MirrorMaker stop running as soon as the exception occurs?

Because MirrorMaker's configuration parameter abort.on.send.failure defaults to true, the handling for a producer send failure is to abort, i.e. stop sending and exit.

The root of the exception message can be seen in the source code of the maybeExpire function:

(Screenshot of the maybeExpire source code, where the "... ms has passed since last append" message is generated.)
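As a rough illustration of that root cause, here is a minimal, self-contained sketch paraphrased from the ProducerBatch.maybeExpire logic in Kafka 1.x clients; the names and signature below are simplified and are not the actual Kafka source:

    // A simplified, self-contained sketch of the batch-expiration check (NOT the actual Kafka source).
    public class BatchExpiryCheck {

        // Returns an expiry message if a full batch has waited longer than requestTimeoutMs since
        // the last record was appended to it, otherwise null. The real method also covers the
        // "batch creation plus linger time" and retry-backoff cases.
        static String maybeExpire(boolean inRetry, boolean isFull,
                                  long requestTimeoutMs, long lastAppendTimeMs, long nowMs) {
            if (!inRetry && isFull && requestTimeoutMs < (nowMs - lastAppendTimeMs)) {
                // This branch produces the message seen in the MirrorMaker log.
                return (nowMs - lastAppendTimeMs) + " ms has passed since last append";
            }
            return null;
        }

        public static void main(String[] args) {
            // A full batch whose last append was 79823 ms ago, with request.timeout.ms = 60000:
            String msg = maybeExpire(false, true, 60_000, 0L, 79_823L);
            System.out.println(msg == null ? "batch can still be sent" : "Expiring record(s): " + msg);
            // Expired batches are dropped from the RecordAccumulator and reported to MirrorMaker as a
            // send failure, which (with abort.on.send.failure=true) makes MirrorMaker shut down.
        }
    }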

So, based on the information above, it is enough to adjust the batch.size and request.timeout.ms parameters.

Before:
    request.timeout.ms = 60000
    batch.size = 16384

After:
    request.timeout.ms=180000
    batch.size=50000      
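
For completeness, these two settings belong in the producer config file that is passed to kafka-mirror-maker.sh via --producer.config. A sketch of what that file might look like, based on the Azure tutorial referenced below (the file name, namespace FQDN and connection string are placeholders that must be replaced):

    # mirror-eventhub.config (illustrative name) -- MirrorMaker producer pointing at Event Hubs
    bootstrap.servers={YOUR.EVENTHUBS.FQDN}:9093
    client.id=mirror_maker_producer
    security.protocol=SASL_SSL
    sasl.mechanism=PLAIN
    sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="{YOUR.EVENTHUBS.CONNECTION.STRING}";

    # Values increased to avoid the batch-expiration TimeoutException
    request.timeout.ms=180000
    batch.size=50000

MirrorMaker is then started along the lines of bin/kafka-mirror-maker.sh --consumer.config source-kafka.config --producer.config mirror-eventhub.config --num.streams 1 --whitelist=".*" (the config file names here are illustrative).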

Running MirrorMaker with the modified parameters, data is sent to Azure Event Hub normally; it has now run continuously for 2 days without any Timeout exception. Problem solved!

References

Troubleshooting a Kafka Producer TimeoutException: https://blog.csdn.net/weixin_43432984/article/details/109180842

How to fix kafka.common.errors.TimeoutException: Expiring 1 record(s) xxx ms has passed since batch creation plus linger time: https://stackoverflow.com/questions/56807188/how-to-fix-kafka-common-errors-timeoutexception-expiring-1-records-xxx-ms-has

Use Apache Kafka MirrorMaker with Event Hubs: https://docs.azure.cn/zh-cn/event-hubs/event-hubs-kafka-mirror-maker-tutorial

The kafka-mirror-maker.sh script: https://blog.csdn.net/qq_41154944/article/details/108282641

When facing problems in a complex environment, the way to investigate is: let the turbid settle, and it slowly becomes clear; set the still in motion, and it slowly comes to life. In the cloud, it is just so!
