
Fixing "There is an overlap in the region chain"

ERROR: (region day_hotstatic,860010-2355010000_20140417_12_entry_00000000321,1400060700465.fda3b0aca340570aeb64410c97e3cb73.) Multiple regions have the same startkey: 860010-2355010000_20140417_12_entry_00000000321

ERROR: (region day_hotstatic,860010-2355010000_20140417_12_entry_00000000321,1398674475358.0dc205736ec1e890bd2d37a2e3220acc.) Multiple regions have the same startkey: 860010-2355010000_20140417_12_entry_00000000321

ERROR: (regions day_hotstatic,860010-2355010000_20140417_12_entry_00000000321,1398674475358.0dc205736ec1e890bd2d37a2e3220acc. and day_hotstatic,860010-2368000000_20140413_14_visit_00000001964,1400060700465.a590268ef714ef76779486a62fe837a3.) There is an overlap in the region chain.

14/05/15 15:35:16 WARN util.HBaseFsck: reached end of problem group: 860010-2368010000_20140417_14_exit_00000000390

ERROR: Found inconsistency in table day_hotstatic

14/05/15 15:35:16 WARN util.HBaseFsck: Naming new problem group: 860010-2155000000_201404_4_entry_00000001763

ERROR: (region month_hotstatic,860010-2155000000_201404_4_entry_00000001763,1399568279705.1edc38d93e59257da8f1b3dadf68ac0b.) Multiple regions have the same startkey: 860010-2155000000_201404_4_entry_00000001763

ERROR: (region month_hotstatic,860010-2155000000_201404_4_entry_00000001763,1399958842442.ffdf1bbbbf06c0a4ecfb3a1f67568128.) Multiple regions have the same startkey: 860010-2155000000_201404_4_entry_00000001763

ERROR: (region month_hotstatic,860010-2288000000_201405_5_exit_00000047486,1399568279705.b323293466c60bcda712421657c43d5d.) Multiple regions have the same startkey: 860010-2288000000_201405_5_exit_00000047486

ERROR: (region month_hotstatic,860010-2288000000_201405_5_exit_00000047486,1399958848239.fb5eb32a3d25471b61dded04012de31f.) Multiple regions have the same startkey: 860010-2288000000_201405_5_exit_00000047486

14/05/15 15:35:16 WARN util.HBaseFsck: reached end of problem group: null

ERROR: Found inconsistency in table month_hotstatic

Fix: find the regions that share the same start_key and end_key and delete them from HDFS. Then rebuild the meta table with add_table (note that this loses data).
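The first step of the fix, finding which regions share a start key, can be read straight out of the hbck output. The following is a minimal sketch that assumes the "Multiple regions have the same startkey" lines look exactly like the ones shown above; the function name `duplicate_start_keys` and the `SAMPLE` constant are made up for illustration. It only reports candidates and deletes nothing.

```python
import re
from collections import defaultdict

# Matches hbck lines of the form shown in the log above (assumed format):
# ERROR: (region <table>,<startkey>,<ts>.<encoded>.) Multiple regions have the same startkey: <startkey>
DUP_LINE = re.compile(
    r"^ERROR: \(region (?P<name>.+)\) "
    r"Multiple regions have the same startkey: (?P<key>.*)$"
)


def duplicate_start_keys(hbck_output):
    """Group encoded region names by (table, start key)."""
    groups = defaultdict(list)
    for line in hbck_output.splitlines():
        match = DUP_LINE.match(line.strip())
        if not match:
            continue
        name = match.group("name")
        table = name.split(",", 1)[0]
        # The encoded region name is the last dot-separated field of the region name.
        encoded = name.rstrip(".").rsplit(".", 1)[-1]
        groups[(table, match.group("key"))].append(encoded)
    return dict(groups)


SAMPLE = """\
ERROR: (region day_hotstatic,860010-2355010000_20140417_12_entry_00000000321,1400060700465.fda3b0aca340570aeb64410c97e3cb73.) Multiple regions have the same startkey: 860010-2355010000_20140417_12_entry_00000000321
ERROR: (region day_hotstatic,860010-2355010000_20140417_12_entry_00000000321,1398674475358.0dc205736ec1e890bd2d37a2e3220acc.) Multiple regions have the same startkey: 860010-2355010000_20140417_12_entry_00000000321
"""

if __name__ == "__main__":
    for (table, key), regions in duplicate_start_keys(SAMPLE).items():
        print(table, key, regions)
```

In HBase releases of this era the encoded name is normally also the name of the region's directory under the table directory on HDFS, so the grouped output gives a list of directories to inspect before anything is removed.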

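This situation is itself caused by an HBase bug that is triggered during restart. It is easy to reproduce with the following steps:

1 Find a region server (RS) that hosts a region currently being split

2 Kill that RS

3 Restart the whole cluster, or fail over the master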

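Root cause analysis:

When the HBase master fails over or restarts, one of the steps is that the new master must run processDeadRegion on every region of the region servers that went down, that is, bring those regions back online.

Before 0.90.4 this step had a bug: it also processed every region that the meta table showed as being in a split. The split state in the meta table does not tell you whether a region has finished splitting or is still splitting (marking split state is quite complicated, so the current code does not record it and can only rely on auxiliary means, such as checking the state of the daughter regions, to tell whether a region is in the middle of a split), but a region that has already finished splitting must never be brought online. So a region may have finished splitting and still be brought online again by the new master during this step, which leaves the parent and daughter regions serving at the same time. Once online, the parent may go on to split again and make things worse: the same range of data is served by two regions, and so on.

The correct approach is to check the state of these regions' daughter regions at restart. The concrete check was provided in hbase-0.90.4; see HBASE-3946. Note: after applying the 3946 patch you must also apply the 3995 patch, otherwise the unit tests will not pass.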
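The idea of checking the daughter regions before bringing a parent back online can be illustrated with a toy model. This is not the code from HBASE-3946. `RegionInfo`, `should_reassign`, and the dictionary standing in for the meta table are all hypothetical names, and the rule "a split parent whose daughters are all present in meta has finished splitting" is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class RegionInfo:
    """Hypothetical stand-in for one row of the meta table."""
    name: str
    split: bool = False              # parent is marked as split in meta
    daughters: Tuple[str, ...] = ()  # names of the daughter regions, if any


def should_reassign(region: RegionInfo, meta: Dict[str, RegionInfo]) -> bool:
    """Decide whether a region from a dead server may be brought online."""
    if not region.split:
        return True  # ordinary region: always bring it back
    # Assumption: if every daughter is already in meta, the split has finished,
    # so the parent must stay offline to avoid parent and daughters both serving.
    finished = bool(region.daughters) and all(d in meta for d in region.daughters)
    return not finished


# Example meta contents (all names are made up).
META = {
    "parent_done": RegionInfo("parent_done", split=True, daughters=("a", "b")),
    "a": RegionInfo("a"),
    "b": RegionInfo("b"),
    "parent_mid": RegionInfo("parent_mid", split=True, daughters=("c", "d")),
    "plain": RegionInfo("plain"),
}

if __name__ == "__main__":
    for name in ("parent_done", "parent_mid", "plain"):
        print(name, should_reassign(META[name], META))
```

Under this model the pre-0.90.4 behaviour corresponds to returning True unconditionally, which is what lets a fully split parent come back online next to its daughters.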
