文章目錄
-
- 問題
-
- 現象
- 原因
- 解決
- 其他處理方式
-
- 單表配置方式
- 指令行方式
- 配置檔案方式
- 驗證配置是否生效
- 參考
問題
現象
- 機房斷電,伺服器重新開機服務啟動後,發現資料寫入報錯,檢視clickhouse報錯日志
- 關鍵提示為
和TOO_MANY_PARTS
,大概意思是太多資料碎片,要移除的損壞碎片為12,但允許的最大數量為10DB::Exception: Suspiciously many (12 parts, 0.00 B in total) broken parts to remove while maximum allowed broken parts count is 10.
- 後面也給了修複意見
。大緻意思是,修改表參數You can change the maximum value with merge tree setting 'max_suspicious_broken_parts' in <merge_tree> configuration section or in table settings in .sql file (don't forget to return setting back to default value)
,但是别忘了恢複後改回來max_suspicious_broken_parts
- 也指明了出錯的表的part
Cannot attach table radar.signal_status from metadata file /var/lib/clickhouse/store/422/4222e684-3a04-4de6-bacc-879c855ef94c/signal_status.sql from query ATTACH TABLE radar.signal_status
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xaebed1a in /usr/bin/clickhouse
1. DB::Exception::Exception<unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long&) @ 0xaf096b2 in /usr/bin/clickhouse
2. DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*) const @ 0x15488280 in /usr/bin/clickhouse
3. ? @ 0x15c0a4e9 in /usr/bin/clickhouse
4. DB::ExceptionKeepingTransform::work() @ 0x15c09c94 in /usr/bin/clickhouse
5. DB::ExecutionThreadContext::executeTask() @ 0x15a60ca3 in /usr/bin/clickhouse
6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x15a54b7e in /usr/bin/clickhouse
7. DB::PipelineExecutor::executeStep(std::__1::atomic<bool>*) @ 0x15a54280 in /usr/bin/clickhouse
8. DB::PushingPipelineExecutor::start() @ 0x15a68d14 in /usr/bin/clickhouse
9. DB::SystemLog<DB::MetricLogElement>::flushImpl(std::__1::vector<DB::MetricLogElement, std::__1::allocator<DB::MetricLogElement> > const&, unsigned long) @ 0x14d9b618 in /usr/bin/clickhouse
10. DB::SystemLog<DB::MetricLogElement>::savingThreadFunction() @ 0x14d99ab5 in /usr/bin/clickhouse
11. ? @ 0xaf80cb1 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xaf62837 in /usr/bin/clickhouse
13. ? @ 0xaf662fd in /usr/bin/clickhouse
14. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
15. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
(version 22.2.2.1)
2022.12.08 15:07:08.647866 [ 18188 ] {} <Error> void DB::SystemLog<DB::MetricLogElement>::flushImpl(const std::vector<LogElement> &, uint64_t) [LogElement = DB::MetricLogElement]: Code: 252. DB::Exception: Too many parts (300). Merges are processing significantly slower than inserts. (TOO_MANY_PARTS), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, bool) @ 0xaebed1a in /usr/bin/clickhouse
1. DB::Exception::Exception<unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long&) @ 0xaf096b2 in /usr/bin/clickhouse
2. DB::MergeTreeData::delayInsertOrThrowIfNeeded(Poco::Event*) const @ 0x15488280 in /usr/bin/clickhouse
3. ? @ 0x15c0a4e9 in /usr/bin/clickhouse
4. DB::ExceptionKeepingTransform::work() @ 0x15c09c94 in /usr/bin/clickhouse
5. DB::ExecutionThreadContext::executeTask() @ 0x15a60ca3 in /usr/bin/clickhouse
6. DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x15a54b7e in /usr/bin/clickhouse
7. DB::PipelineExecutor::executeStep(std::__1::atomic<bool>*) @ 0x15a54280 in /usr/bin/clickhouse
8. DB::PushingPipelineExecutor::start() @ 0x15a68d14 in /usr/bin/clickhouse
9. DB::SystemLog<DB::MetricLogElement>::flushImpl(std::__1::vector<DB::MetricLogElement, std::__1::allocator<DB::MetricLogElement> > const&, unsigned long) @ 0x14d9b618 in /usr/bin/clickhouse
10. DB::SystemLog<DB::MetricLogElement>::savingThreadFunction() @ 0x14d99ab5 in /usr/bin/clickhouse
11. ? @ 0xaf80cb1 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xaf62837 in /usr/bin/clickhouse
13. ? @ 0xaf662fd in /usr/bin/clickhouse
14. start_thread @ 0x7ea5 in /usr/lib64/libpthread-2.17.so
15. __clone @ 0xfe96d in /usr/lib64/libc-2.17.so
(version 22.2.2.1)
2022.12.08 15:07:09.448149 [ 18117 ] {} <Error> Application: DB::Exception: Suspiciously many (12 parts, 0.00 B in total) broken parts to remove while maximum allowed broken parts count is 10. You can change the maximum value with merge tree setting 'max_suspicious_broken_parts' in <merge_tree> configuration section or in table settings in .sql file (don't forget to return setting back to default value): Cannot attach table `radar`.`signal_status` from metadata file /var/lib/clickhouse/store/422/4222e684-3a04-4de6-bacc-879c855ef94c/signal_status.sql from query ATTACH TABLE radar.signal_status UUID '85937e46-5e81-4d48-bcfc-dbd2a738c086' (`time_stamp` DateTime COMMENT '信控上報時間(目前燈色開始時間)', `intersection_number` Int32 COMMENT '交叉口編号', `pattern_number` Int16 COMMENT '方案編号', `working_mode` String COMMENT '信号機工作模式', `stage_index` Int8 COMMENT '目前階段序号', `phase_number` Int8 COMMENT '目前執行相位', `stage_status` String COMMENT '目前階段燈色', `len_lamp_light` Int16 COMMENT '目前燈色的持續時間', `cycle` Int16 COMMENT '方案周期(秒)', `green_movement` Array(String) COMMENT '目前相位綠燈可通行進口流向', `green_lane` Array(String) COMMENT '目前相位綠燈可通行進口車道編号', `stage_end_time` DateTime COMMENT '目前燈色結束時間', `cycle_start_time` DateTime COMMENT '本周期開始時間') ENGINE = MergeTree PARTITION BY toYYYYMM(time_stamp) PRIMARY KEY time_stamp ORDER BY (time_stamp, intersection_number) TTL time_stamp + toIntervalMonth(6) SETTINGS index_granularity = 8192, old_parts_lifetime = 300
原因
- 這個是發生在機器斷電場景下的報錯,查找原因是說因為寫入資料造成的中繼資料和資料不一緻問題
- clickhouse在重新開機服務的時候會重新加載MergeTree表引擎資料,資料可能存在損壞情況
-
clickhouse配置原因
配置參數當中包含一個參數max_suspicious_broken_parts,預設值是10,可選值範圍是任意正整數,如果單個分區中的損壞部分數量超過max_suspicious_broken_parts 配置的值,則拒絕自動修複或者拒絕删除損壞部分的資料,并且服務啟動時候直接報錯退出
- 目前需要盡量避免該錯誤以免服務啟動失敗,推薦把該參數配置為1000或者更大的值
解決
- 我是按照提示,找到對應檔案
/var/lib/clickhouse/store/422/4222e684-3a04-4de6-bacc-879c855ef94c/signal_status.sql
- 修改了表配置,設定max_suspicious_broken_parts=20(大于損壞檔案數量)
- 重新開機資料庫就好了
其他處理方式
單表配置方式
在建立MergeTree表的時候特别配置一下max_suspicious_broken_parts參數
CREATE TABLE foo
(
`A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;
指令行方式
使用ALTER TABLE … MODIFY SETTING指令修改
ALTER TABLE foo
MODIFY SETTING max_suspicious_broken_parts = 1000;
-- 恢複預設值
-- reset to default (use value from system.merge_tree_settings)
ALTER TABLE foo
RESET SETTING max_suspicious_broken_parts;
配置檔案方式
如果服務起不來了,就隻能使用這個方式解決
- 建立檔案max_suspicious_broken_parts.xml寫入如下内容
<?xml version="1.0"?>
<yandex>
<merge_tree>
<max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
</merge_tree>
</yandex>
- clickhouse的配置檔案推薦放置在/etc/clickhouse-server/config.d/檔案夾下生效
- 如果是在Ubuntu或者Centos上面以DEB或RPM安裝包的形式啟動的,需要把該檔案放到/etc/clickhouse-server/config.d/,最後重新開機clickhouse就可以了
-
如果是docker compose方式啟動
修改compose.yaml配置如下,主要也是把對應檔案挂載進入容器内部相應位置
services:
clickhouse:
image: clickhouse/clickhouse-server
ulimits:
nofile:
soft: 262144
hard: 262144
restart: always
container_name: demo-clickhouse
environment:
- CLICKHOUSE_USER=demo
- CLICKHOUSE_PASSWORD=demo-pass
- CLICKHOUSE_DB=demo
ports:
- "8123:8123"
- "9000:9000"
volumes:
- ./max_suspicious_broken_parts.xml:/etc/clickhouse-server/config.d/max_suspicious_broken_parts.xml
- demo-clickhouse:/var/lib/clickhouse
healthcheck:
test: 'wget -O - http://127.0.0.1:8123 || exit 1'
volumes:
demo-clickhouse: {}
驗證配置是否生效
- 連接配接clickhouse,查詢
SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
參考
- 解決clickhouse伺服器啟動異常Suspiciously many broken parts to remove
- 官方文檔:MergeTree table settings