HBase Meta Table Repair Practices

HBase is an open-source, highly reliable, highly scalable, high-performance distributed non-relational database that is widely used for big data processing, real-time computing, and data storage and retrieval. In a distributed cluster, hardware failure is the norm, and it can lead to node-level or cluster-level service interruptions, meta table corruption, RIT (Region In Transition), region holes, region overlaps, and other problems.

1. Background

As a leading distributed non-relational database, HBase is stable and also simple to install and scale out. However, it lacks a mature monitoring system, which makes troubleshooting difficult, and without a thorough understanding of HBase it is easy to feel helpless when dealing with day-to-day failures. We operate more than 20 HBase clusters of various sizes across versions 1.x to 2.x, and have run into online problems such as meta table corruption that prevented the table from coming online, region overlaps, region holes, and permission loss. For many of these problems we had to look for the right answer in the HBase source code.

2. The HBase meta table

The accuracy of the data in the meta table is critical to the normal operation of an HBase cluster. If the meta table data is inconsistent, regions can get stuck in RIT (Region In Transition), or the cluster can even fail to start because HMaster cannot finish initializing.

2.1 Meta table structure

The meta table consists of three column families: info, table, and rep_barrier. The info family records region information, the table family records table state, and the rep_barrier family records replication barriers.
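To make the layout more concrete, here is a minimal Java sketch that splits a meta row key into its parts. It assumes the usual region-name layout `<table>,<startKey>,<regionId>.<encodedName>.` and treats the key as plain text, whereas real row keys are byte arrays. The class name MetaRowKey is hypothetical and is not part of HBase.

```java
// Sketch only: splits an hbase:meta row key (a region name) into its parts.
// Assumed layout: <table>,<startKey>,<regionId>.<encodedName>.
final class MetaRowKey {
    final String table;
    final String startKey;
    final long regionId;
    final String encodedName;

    private MetaRowKey(String table, String startKey, long regionId, String encodedName) {
        this.table = table;
        this.startKey = startKey;
        this.regionId = regionId;
        this.encodedName = encodedName;
    }

    static MetaRowKey parse(String row) {
        int firstComma = row.indexOf(',');
        int lastComma = row.lastIndexOf(',');
        if (firstComma < 0 || lastComma == firstComma || !row.endsWith(".")) {
            throw new IllegalArgumentException("not a region name: " + row);
        }
        // The start key may itself contain commas, so it is everything
        // between the first and the last comma.
        String startKey = row.substring(firstComma + 1, lastComma);
        // The tail is "<regionId>.<encodedName>" once the trailing dot is dropped.
        String tail = row.substring(lastComma + 1, row.length() - 1);
        int dot = tail.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("missing encoded name: " + row);
        }
        return new MetaRowKey(
            row.substring(0, firstComma),
            startKey,
            Long.parseLong(tail.substring(0, dot)),
            tail.substring(dot + 1));
    }
}
```

For example, parsing `migrate:test_hole2,06,1700000000000.c8662e08f6ae705237e390029161f58f.` (the regionId here is made up for illustration) yields the table migrate:test_hole2 and the start key 06.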

2.2 Meta table loading process

With the meta table structure above in mind, we have an overall picture of the table. Anyone who has operated HBase has probably seen that some clusters start quickly and others slowly, and that sometimes, after an improper operation, a restarted cluster stays stuck in the meta table loading stage and cannot continue. An overall understanding of the meta table loading process gives us a realistic expectation of how long each cluster takes to start. The meta table loading process is as follows:

From the meta table loading flow chart, we can see why some clusters start slowly and why some fail to start.

Slow cluster startup:

New clusters, or clusters with few tables, usually start quickly, while clusters with many tables start relatively slowly; on some clusters HMaster takes 15 to 30 minutes to start. When startup takes this long, it is natural to wonder whether something is wrong with the cluster. Two parts of the loading process account for most of the time.

Preloading all table descriptors: HMaster scans every HBase data directory, parses the files under each .tabledesc directory, and keeps them in memory. If the number of tables is large (more than 10,000), this step often takes about ten minutes. If the HMaster page shows "Pre-loading table descriptors", the cluster is still in the preloading stage and has not yet reached meta table loading, so we only need to wait patiently.

Opening regions: to speed this up, you can increase the value of hbase.master.executor.openregion.threads (the default is 5).

Cluster startup failure:

The meta table fails to come online: after the HRegionServers in the default resource group go down, the startcode of each machine changes, so the server (identified by its startcode) that hosts the meta region can no longer be found and the cluster fails to start.
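The startcode mismatch can be illustrated with a small Java sketch. It assumes the textual server identity `host,port,startcode` and compares a recorded location against a list of live servers. The class names ServerId and MetaLocationCheck and the result labels are hypothetical; this is not how HMaster implements the check.

```java
import java.util.List;

// Sketch only: a server identity in the textual form "host,port,startcode".
final class ServerId {
    final String host;
    final int port;
    final long startCode;

    ServerId(String host, int port, long startCode) {
        this.host = host;
        this.port = port;
        this.startCode = startCode;
    }

    static ServerId parse(String s) {
        String[] parts = s.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + s);
        }
        return new ServerId(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
    }

    boolean sameHostAndPort(ServerId other) {
        return host.equals(other.host) && port == other.port;
    }
}

final class MetaLocationCheck {
    // Returns "LIVE" if the recorded server is still running with the same startcode,
    // "RESTARTED" if the same host and port is running with a different startcode,
    // and "MISSING" if no live server matches the host and port at all.
    static String classify(ServerId recorded, List<ServerId> liveServers) {
        for (ServerId live : liveServers) {
            if (live.sameHostAndPort(recorded)) {
                return live.startCode == recorded.startCode ? "LIVE" : "RESTARTED";
            }
        }
        return "MISSING";
    }
}
```

A recorded location that classifies as RESTARTED corresponds to the situation described above: the machine is back, but the startcode stored for the meta region no longer matches any live server.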

3. How to repair the meta table

The state of an HBase cluster is maintained mainly through the meta table, so if the meta table is corrupted or incorrect, the cluster becomes unavailable and is at risk of data loss. Given how important meta table consistency is, what causes the data to become inconsistent? (For the HBase 2.4.8 repair commands, see hbase-operator-tools.)

RegionServer crash or exception: when a RegionServer crashes or behaves abnormally, the region and RegionServer information stored in the meta table may become incorrect or be lost.
Data corruption or errors: if the data in the meta table is corrupted or incorrect, the HBase cluster may become unavailable and data may be lost.
Illegal operations: illegal operations on the meta table, such as deleting or modifying its data, can leave the meta table with incorrect or missing data.

"Meta table failure" is a broad term. By type, we can roughly divide such failures into long-lived RIT, region holes, region overlaps, missing table descriptor files, an empty meta table HDFS path, meta table data loss, and so on. The following sections analyze and repair these types of failures in turn:

3.1 RIT

Region In Transition (RIT) means that a region in the HBase cluster is in the middle of a state transition. It appears whenever an event changes the state of a region, for example when a RegionServer goes down, a region is being split, or a region is being merged.

To make region state changes clearer, we can group them by operation type into assign, unassign, split, and merge. If a RegionServer goes down or misbehaves, or data is corrupted, during one of these operations, the region stays in RIT. RIT is a common problem in HBase operations, but it is much easier to handle once the underlying logic is clear. HBase can repair RIT by itself and in most cases recovers without manual intervention; manual intervention is needed only when a region stays in RIT for a long time. So why does long-lived RIT occur?

Anyone who has used both HBase 1.x and HBase 2.x will have noticed that RIT is less common in 2.x. Region operations are carried out mainly by the AssignmentManager class, and the default value of hbase.assignment.maximum.attempts (the number of assign retries) differs between the two versions: in HBase 2.4.8 it is Integer.MAX_VALUE, whereas in HBase 1.x it is 10. This is why long-lived RIT is less common in HBase 2.x.

How to handle RIT:

RIT often occurs when large tables are created or deleted. The main cause is the large number of regions and the resulting pressure on the cluster, which makes assign and unassign slow to respond.
If the cluster runs 1.x, you can raise hbase.assignment.maximum.attempts to increase the number of retries; states such as FAILED_OPEN and FAILED_CLOSE can then usually heal by themselves. Alternatively, run the assign command manually to bring the regions online one by one (if there are many regions, switch HMaster instead).
If a region fails to be assigned and its RegionServer is shown as the nonexistent node bogus.example.com,1,1, it can only be recovered by switching HMaster.

Food for thought:

Why can a region that cannot be brought online by manual intervention be recovered by switching HMaster? (See TransitRegionStateProcedure in the HMaster startup process and the source code of the HMaster class.)

3.2 Region holes

When we create an HBase table and look closely at its regions, we find that the startkey and endkey of the regions form a chain of contiguous left-closed, right-open intervals. A region hole is a gap in this chain: a key range that no region covers.

If you check such a table with hbase hbck, you will see an error message such as "ERROR: There is a hole in the region chain between 01 and 02. You need to create a new .regioninfo and region dir in hdfs to plug the hole". An HBase cluster usually cannot heal a hole by itself, so manual intervention is needed. The usual practice is to add the missing empty region back, check that the meta table information is correct, and then bring the region online. Doing this series of operations by hand is error-prone and slow. The repair methods for different HBase versions are described below; the commands differ slightly between versions, but the overall procedure is the same.
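The kind of check that reports a hole can be sketched in a few lines of Java. This is a simplified illustration, not hbck's implementation: keys are compared as Java strings (HBase compares row keys as bytes), an empty string stands for the open start or end of the table, and the class name RegionHoleCheck is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: finds gaps in a chain of [startKey, endKey) regions.
final class RegionHoleCheck {
    // A region covering [startKey, endKey). An empty startKey means the start
    // of the table and an empty endKey means the end of the table.
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    // Returns every uncovered key range, formatted as "[start,end)".
    static List<String> findHoles(List<Region> regions) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((Region r) -> r.startKey));
        List<String> holes = new ArrayList<>();
        String covered = "";          // every key below this one is covered so far
        boolean reachedEnd = false;   // true once a region extends to the table end
        for (Region r : sorted) {
            if (reachedEnd) {
                break;
            }
            if (r.startKey.compareTo(covered) > 0) {
                holes.add("[" + covered + "," + r.startKey + ")");
            }
            if (r.endKey.isEmpty()) {
                reachedEnd = true;
            } else if (r.endKey.compareTo(covered) > 0) {
                covered = r.endKey;
            }
        }
        if (!reachedEnd) {
            holes.add("[" + covered + ",)");
        }
        return holes;
    }
}
```

With the regions [,01), [02,03), and [03,), findHoles returns the single range [01,02), which matches the hole between 01 and 02 in the hbck message.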

How to handle region holes:

(1) Repair method for HBase 1.x

hbase hbck -fixHdfsHoles: creates an empty region directory on HDFS
hbase hbck -fixMeta: repairs the meta table data for that region
hbase hbck -fixAssignments: brings the repaired region online
Alternatively, hbase hbck -repairHoles is equivalent to combining -fixHdfsHoles, -fixMeta, and -fixAssignments

(2) Repair method for HBase 2.4.8 (see the hbase-operator-tools section below)

HBase 2.4.8 provides no command for adding a region directory, so the repair is more involved. However, many utility classes in HBase 2.4.8 provide methods for creating a region; for example, the HBaseTestingUtility class in the hbase-server-2.4.8-test package provides entry points for region operations.

extraRegionsInMeta -fix: first delete the meta table records whose HDFS directories do not exist
HBaseTestingUtility.createLocalHRegion: create the HDFS directory so that the region chain is contiguous
addFsRegionsMissingInMeta: add the new region information to the meta table (the region id is returned once the addition succeeds)
assigns: finally, bring the newly added region online

3.3 Region overlaps

If regions can have holes, can several regions also share the same startkey and endkey? The answer is yes: when multiple regions have the same startkey and endkey, we call it a region overlap. Region overlaps are hard to simulate in HBase and also hard to handle. An hbck check reports them with a log entry such as "ERROR: Multiple regions have the same startkey: 02".

In another kind of overlap, a region intersects the rowkey ranges of one or two adjacent regions. Both kinds are collectively called the overlap problem. For this difficult scenario, we use a self-developed tool to reproduce overlaps and to fix overlaps and holes in one step.

Simulating the overlap problem

For example, if Region01 has the range [01,03) and another region, Region02, has the range [01,02), the two regions intersect on [01,02), and the hbck check reports an overlap.
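The intersection in this example can be worked out with a short Java sketch. It is an illustration only: keys are compared as Java strings rather than bytes, an empty endKey stands for the end of the table, and the class name RegionOverlapCheck is hypothetical.

```java
// Sketch only: overlap test for two half-open key ranges [start, end).
final class RegionOverlapCheck {
    // Two half-open ranges intersect when each one starts before the other ends.
    // An empty end key is treated as unbounded (end of table).
    static boolean overlaps(String startA, String endA, String startB, String endB) {
        boolean aStartsBeforeBEnds = endB.isEmpty() || startA.compareTo(endB) < 0;
        boolean bStartsBeforeAEnds = endA.isEmpty() || startB.compareTo(endA) < 0;
        return aStartsBeforeBEnds && bStartsBeforeAEnds;
    }

    // Returns the shared range formatted as "[start,end)", or null if there is none.
    static String intersection(String startA, String endA, String startB, String endB) {
        if (!overlaps(startA, endA, startB, endB)) {
            return null;
        }
        // The shared range starts at the larger start key ...
        String start = startA.compareTo(startB) >= 0 ? startA : startB;
        // ... and ends at the smaller end key, ignoring unbounded ends.
        String end;
        if (endA.isEmpty()) {
            end = endB;
        } else if (endB.isEmpty()) {
            end = endA;
        } else {
            end = endA.compareTo(endB) <= 0 ? endA : endB;
        }
        return "[" + start + "," + end + ")";
    }
}
```

For Region01 [01,03) and Region02 [01,02), intersection returns [01,02). Two adjacent regions such as [01,02) and [02,03) only touch at 02, so overlaps returns false for them.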

In a production environment, overlaps occur only when a region split and a machine failure happen at the same time. The conditions are demanding, so the problem is hard to reproduce, yet being able to reproduce it is very important for later repair work and fault drills.

Reproducing the overlap problem

1) Generate a Region shard with overlapping rowkey ranges:

java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=createRegion -DRegion.startkey=06 -DRegion.endkey=07   hbase-meta-tool-0.0.1.jar

2) Move the overlap region to the table directory:

sudo -uhdfs hdfs dfs -mv /tmp/.tmp/data/migrate/test_hole2/c8662e08f6ae705237e390029161f58f /hbase/data/migrate/test_hole2

3) Delete the normal migrate:test_hole2 meta table information:

java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=delete  hbase-meta-tool-0.0.1.jar

4) Reconstruct the metadata information of the overlap problem table:

java -jar -Drepair.tableName=migrate:test_hole2 -Dfix.operator=fixFromHdfs  hbase-meta-tool-0.0.1.jar

5) After the cluster is restarted, hbck reports that region c8662e08f6ae705237e390029161f58f overlaps, so the overlap problem has been reproduced.

Method 1: Fix overlap and hole with one click

When there are no more than 64 overlaps, the self-developed tool hbase-meta-tool can merge adjacent regions whose rowkey ranges intersect and create new regions for the holes and missing ranges, which fixes the problem.

1) Fix the overlaps and holes in the cluster:

java -jar -Dfix.operator=fixOverlapAndHole hbase-meta-tool-0.0.1.jar
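The idea behind this kind of repair, merging intersecting ranges and plugging the gaps, can be sketched in Java as follows. This is not the source of hbase-meta-tool, whose implementation is not shown here. It only computes the repaired key ranges, compares keys as Java strings, uses an empty string for the open start or end of the table, and leaves out everything a real repair must do with HFiles and the meta table. The class name RegionChainRepair is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch only: turns a list of {startKey, endKey} ranges into one contiguous chain.
final class RegionChainRepair {
    static List<String[]> repair(List<String[]> regions) {
        List<String[]> sorted = new ArrayList<>(regions);
        sorted.sort(Comparator.comparing((String[] r) -> r[0]));
        List<String[]> chain = new ArrayList<>();
        for (String[] r : sorted) {
            if (chain.isEmpty()) {
                // Plug a hole at the very start of the table.
                if (!r[0].isEmpty()) {
                    chain.add(new String[] {"", r[0]});
                }
            } else {
                String[] last = chain.get(chain.size() - 1);
                boolean startsInsideLast = last[1].isEmpty() || r[0].compareTo(last[1]) < 0;
                if (startsInsideLast) {
                    // Overlap: merge by extending the previous range if needed.
                    boolean reachesFurther = !last[1].isEmpty()
                        && (r[1].isEmpty() || r[1].compareTo(last[1]) > 0);
                    if (reachesFurther) {
                        last[1] = r[1];
                    }
                    continue;
                }
                // Hole: add a new empty range between the two regions.
                if (r[0].compareTo(last[1]) > 0) {
                    chain.add(new String[] {last[1], r[0]});
                }
            }
            chain.add(new String[] {r[0], r[1]});
        }
        if (chain.isEmpty()) {
            chain.add(new String[] {"", ""});
        } else {
            String[] last = chain.get(chain.size() - 1);
            // Plug a hole at the very end of the table.
            if (!last[1].isEmpty()) {
                chain.add(new String[] {last[1], ""});
            }
        }
        return chain;
    }

    // Formats a chain as "[a,b) [b,c) ..." for printing.
    static String format(List<String[]> chain) {
        StringBuilder sb = new StringBuilder();
        for (String[] r : chain) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append('[').append(r[0]).append(',').append(r[1]).append(')');
        }
        return sb.toString();
    }
}
```

Given the ranges [,01), [01,03), [01,02), and [04,), the sketch merges [01,02) into [01,03) and adds [03,04), producing the chain [,01) [01,03) [03,04) [04,).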

Method 2: Repairing large-scale overlaps

This method suits large-scale overlaps, with thousands or tens of thousands of overlapping regions, and cases where the server reports an exception. Take the following repair steps:

1) Clear the metadata of the table with overlap problems in one step:

java -jar -Drepair.tableName=migrate:test1 -Dzookeeper.address=zkAddress -Dfix.operator=delete     hbase-meta-tool-0.0.1.jar

2) Back up the original table data:

hdfs dfs -mv /hbase/data/migrate/test/ /back

3) Delete the original table and bulk-load the backed-up data of each region:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /back/test/region01-regionN  migrate:test1

3.4 Meta table data repair

In online HBase clusters we may run into the following troublesome problems:

A coprocessor is misconfigured on a table: the coprocessor path or jar cannot be found while regions are being loaded, so the cluster crashes repeatedly and the table cannot be removed with the drop command.
The metadata in the HBase meta table is incorrect, for example a wrong startcode, so the server cannot be found when the table is brought online.

We need to repair the problematic table without stopping the service and without affecting other tables in the cluster.

Repairing the metadata of a problematic table

1) If the migrate:test1 table has a problem, delete its metadata in one step:

java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=delete  hbase-meta-tool-0.0.1.jar

2) Read the .regioninfo files of the table on HDFS and rebuild the correct metadata in one step:

java -jar -Drepair.tableName=migrate:test1 -Dfix.operator=fixFromHdfs  hbase-meta-tool-0.0.1.jar

3.5 Meta table corruption

All of the repairs above assume that the meta table itself is online. If the meta table data is damaged and the table cannot come online, how do we repair it? While the cluster is in this state, the HBase shell or the HBase API may not be able to create the table.

Analyzing InitMetaProcedure, the class that initializes the meta table, we find that creating the meta table takes roughly two steps:

1) Write the file system layout: create the .tabledesc file and the meta region directory.

2) Assign the meta region and bring it online.

InitMetaProcedure source code:

// Excerpt from InitMetaProcedure; imports and the surrounding class are omitted.
protected Flow executeFromState(MasterProcedureEnv env, InitMetaState state)
    throws ProcedureSuspendedException, ProcedureYieldException, InterruptedException {
    try {
      switch (state) {
        case INIT_META_WRITE_FS_LAYOUT:
          // Step 1: write the .tabledesc file and the meta region directory.
          Configuration conf = env.getMasterConfiguration();
          Path rootDir = CommonFSUtils.getRootDir(conf);
          TableDescriptor td = writeFsLayout(rootDir, conf);
          env.getMasterServices().getTableDescriptors().update(td, true);
          setNextState(InitMetaState.INIT_META_ASSIGN_META);
          return Flow.HAS_MORE_STATE;
        case INIT_META_ASSIGN_META:
          // Step 2: schedule the assignment of the first meta region.
          addChildProcedure(env.getAssignmentManager()
              .createAssignProcedures(Arrays.asList(RegionInfoBuilder.FIRST_META_REGIONINFO)));
          return Flow.NO_MORE_STATE;
        default:
          throw new UnsupportedOperationException("unhandled state=" + state);
      }
    } catch (IOException e) {
      // The body of this handler is elided in this excerpt.
    }
}

private static TableDescriptor writeFsLayout(Path rootDir, Configuration conf) throws IOException {
    LOG.info("BOOTSTRAP: creating hbase:meta region");
    FileSystem fs = rootDir.getFileSystem(conf);
    Path tableDir = CommonFSUtils.getTableDir(rootDir, TableName.META_TABLE_NAME);
    // Remove any partially created meta table directory before recreating it.
    if (fs.exists(tableDir) && !fs.delete(tableDir, true)) {
      LOG.warn("Can not delete partial created meta table, continue...");
    }
    // Write the meta table descriptor, then create and close the meta region.
    TableDescriptor metaDescriptor = FSTableDescriptors.tryUpdateAndGetMetaTableDescriptor(conf, fs, rootDir);
    HRegion.createHRegion(RegionInfoBuilder.FIRST_META_REGIONINFO, rootDir, conf, metaDescriptor, null).close();
    return metaDescriptor;
}

Once the meta table is online, we only need to write the region information of every table into meta and assign all regions to bring the cluster back to normal. The meta table repair process is therefore not that complicated, but if the production environment has many tables, or large tables with tens of thousands of regions, adding the entries by hand becomes very time-consuming.

Repair method for HBase 1.x

Stop the HBase cluster
sudo -u hbase hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair -fix
Restart the cluster to complete the repair.

Repair method for HBase 2.4.8 (using hbase-operator-tools)

1) Automatically rebuild the meta table from the HDFS paths

Stop the HBase cluster
sudo -u hbase hbase org.apache.hbase.hbck1.OfflineMetaRepair -fix
Restart the cluster to complete the repair.

2) Single-table repair method

Step 1: Delete the HBase root directory in ZooKeeper.
Step 2: Delete the WALs directories of HMaster and the RegionServers on HDFS.
Step 3: Restart the cluster. At this point meta contains no data, so the cluster cannot enter the normal state.
Step 4: Run the add-region command to add the four tables hbase:namespace, hbase:quota, hbase:rsgroup, and hbase:acl to the cluster. When it finishes, the log prints an assigns command followed by the regions of these tables; record these regions for the assign operation in the next step.

sudo -u hbase hbase --config /etc/hbase/conf hbck -j hbase-tools.jar addFsRegionsMissingInMeta hbase:namespace hbase:quota hbase:rsgroup hbase:acl

Step 5: Assign the regions printed in the previous step to bring them online.

sudo -u hbase hbase --config /etc/hbase/conf hbck -j hbase-hbck2.jar assigns regionid

Step 6: Bring the business tables online (repeat steps 4 and 5 to bring the business tables online one by one).

Precautions

(If a business table has many regions and the assign in step 5 fails to bring them online, disable and then enable the table to bring it online normally.)

Note: the OfflineMetaRepair tool in hbase-operator-tools has the following bugs that need to be fixed.

1. HBaseFsck createNewMeta creates the meta table without the .tabledesc file

Before:

TableDescriptor td = new FSTableDescriptors(getConf()).get(TableName.META_TABLE_NAME);

Modified:

FileSystem fs = rootdir.getFileSystem(conf);
TableDescriptor metaDescriptor = FSTableDescriptors.tryUpdateAndGetMetaTableDescriptor(getConf(), fs, rootdir);

2. HBaseFsck generatePuts writes the region state as CLOSED by default. Because HMaster only brings regions in the OFFLINE state online when it restarts, regions left in CLOSED have to be brought online manually one by one, which is a very large workload.

Before:

addRegionStateToPut(p, org.apache.hadoop.hbase.master.RegionState.State.CLOSED);

Modified:

addRegionStateToPut(p, org.apache.hadoop.hbase.master.RegionState.State.OFFLINE);

Drawbacks

1) Offline repair requires stopping the cluster service, and the downtime depends on how long the repair takes (about 10 to 15 minutes).

2) If there are problems such as region overlaps or holes, you need to fix them manually first and then run the OfflineMetaRepair offline repair command.

4. hbase-operator-tools

hbase-operator-tools is a set of HBase tools that help administrators manage and maintain HBase clusters. It provides a series of tools, including backup and restoration tools, region management tools, and data compression and migration tools, which help administrators manage clusters and improve their stability and reliability. You need to compile it from source before you can use it. Common commands are as follows:

5. Summary

The accuracy of the data in the HBase meta table is critical to the normal operation of an HBase cluster, which raises two questions: how to keep the meta table data correct, and how to repair it quickly once it is corrupted. This article covers the meta table structure, the loading process, common problems, and the corresponding repairs. The repairs fall roughly into two categories:

Online repair: the meta table is repaired with hbck and self-developed tools to keep the data complete.
Offline repair: the meta table cannot be brought online, so it is rebuilt from the region information in HDFS to restore the HBase service.

If the cluster is large, offline repair takes a long time and the cluster has to stop serving for that whole period, which the business usually cannot tolerate. Depending on the situation, you can repair at the table level instead (unless the meta table files are damaged and the table cannot come online). We recommend running hbck checks on the cluster regularly and repairing any inconsistency in the meta information as soon as it is found, before the problem spreads.

If the meta information of a business table is found to be inconsistent, you can back up the meta table, delete that table's entries, and then add the regions back to the meta table from the HDFS path information (the addFsRegionsMissingInMeta command adds regions to the meta table correctly based on the HDFS paths).

References:

Apache HBase ™ Reference Guide
Apache HBase HBCK2 Tool
Appendix C: hbck In Depth

Authors: vivo Internet Big Data Team - Huang Guihu, Chen Shengzun

Source-WeChat public account: vivo Internet Technology

Source: https://mp.weixin.qq.com/s/KBp5FFI5ylDRiKsHYJTN7w