Custom Region Pre-Splitting Algorithms in HBase

2023-06-06 15:46:17

Reference: http://www.soso.io/article/69759.html

1. Write the FileSplit.java file.

2. Compile the Java file.

$ javac -Djava.ext.dirs=/home/test/Desktop/  FileSplit.java

3. Copy the split-keys file, which contains the split key information, into the directory that holds the compiled FileSplit class.
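The article does not show how FileSplit reads the split-keys file, so the following is only a minimal sketch of that step using the plain JDK. The class name `SplitKeyFileReader` and the file format (one key per line, blank lines ignored) are assumptions made for illustration, not something taken from the original FileSplit code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.TreeSet;

// Hypothetical helper: the name and the one-key-per-line format are assumptions.
class SplitKeyFileReader {

    // Reads split keys from a text file and returns them as UTF-8 byte arrays,
    // de-duplicated and in ascending order.
    static byte[][] readSplitKeys(Path file) throws IOException {
        // TreeSet sorts and removes duplicates. String order matches
        // byte-wise row key order only for plain ASCII keys.
        TreeSet<String> keys = new TreeSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String key = line.trim();
            if (!key.isEmpty()) {
                keys.add(key);
            }
        }
        byte[][] result = new byte[keys.size()][];
        int i = 0;
        for (String key : keys) {
            result[i++] = key.getBytes(StandardCharsets.UTF_8);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "split-keys" in the working directory, as in step 3.
        Path file = Paths.get(args.length > 0 ? args[0] : "split-keys");
        for (byte[] key : readSplitKeys(file)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

A custom split algorithm could return an array like this from its method that produces the initial split points. N split keys give N + 1 regions.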

4. Run the following command to pre-create regions when the table is created.

$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.util.RegionSplitter test_table FileSplit -c 2 -f f1

5. Use the HBase web UI to confirm that the table and its predefined regions were created correctly.

Running RegionSplitter on the command line without arguments prints the following help:

~/Desktop/hbase-1.0.1.1$ ./bin/hbase org.apache.hadoop.hbase.util.RegionSplitter

usage: RegionSplitter <TABLE> <SPLITALGORITHM>
       SPLITALGORITHM is a java class name of a class
       implementing SplitAlgorithm, or one of the special
       strings HexStringSplit or UniformSplit, which are
       built-in split algorithms. HexStringSplit treats
       keys as hexadecimal ASCII, and UniformSplit treats
       keys as arbitrary bytes.
 -c <region count>        Create a new table with a pre-split number of
                          regions
 -D <property=value>      Override HBase Configuration Settings
 -f <family:family:...>   Column Families to create with new table.
                          Required with -c
    --firstrow <arg>      First Row in Table for Split Algorithm
 -h                       Print this usage help
    --lastrow <arg>       Last Row in Table for Split Algorithm
 -o <count>               Max outstanding splits that have unfinished
                          major compactions
 -r                       Perform a rolling split of an existing region
    --risky               Skip verification steps to complete
                          quickly.STRONGLY DISCOURAGED for production
                          systems.
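To make "treats keys as hexadecimal ASCII" concrete, here is a rough sketch of how a hex key range can be divided evenly into a given number of regions. It is written in the spirit of HexStringSplit but is not HBase's source code. The class name `HexSplitSketch` and its method are hypothetical, and the 8-digit range used in `main` is an assumption chosen for the example.

```java
import java.math.BigInteger;

// Hypothetical illustration; not the HBase implementation.
class HexSplitSketch {

    // Returns regionCount - 1 boundary keys that divide [firstRowHex, lastRowHex]
    // into regionCount roughly equal ranges. Keys are zero-padded hex strings.
    static String[] splitKeys(String firstRowHex, String lastRowHex, int regionCount) {
        if (regionCount < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        BigInteger first = new BigInteger(firstRowHex, 16);
        BigInteger last = new BigInteger(lastRowHex, 16);
        // Number of distinct keys in the inclusive range.
        BigInteger range = last.subtract(first).add(BigInteger.ONE);
        BigInteger step = range.divide(BigInteger.valueOf(regionCount));
        if (step.signum() <= 0) {
            throw new IllegalArgumentException("range too small for region count");
        }
        int width = lastRowHex.length();
        String[] keys = new String[regionCount - 1];
        for (int i = 1; i < regionCount; i++) {
            BigInteger boundary = first.add(step.multiply(BigInteger.valueOf(i)));
            keys[i - 1] = leftPad(boundary.toString(16), width);
        }
        return keys;
    }

    // Pads with leading zeros so all keys have the same length and sort correctly.
    private static String leftPad(String s, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        for (String key : splitKeys("00000000", "FFFFFFFF", 4)) {
            System.out.println(key);
        }
    }
}
```

For the range 00000000 to FFFFFFFF and 4 regions, the sketch produces the three boundaries 40000000, 80000000 and c0000000. A file-based algorithm like FileSplit replaces this arithmetic with keys chosen by hand.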

By studying the RegionSplitter source code, we can plug in our own region split algorithm. When using a custom algorithm, the following error may occur:

Exception in thread "main" java.io.IOException: Problem loading split algorithm:

at org.apache.hadoop.hbase.util.RegionSplitter.newSplitAlgoInstance(RegionSplitter.java:673)

at org.apache.hadoop.hbase.util.RegionSplitter.main(RegionSplitter.java:349)

Caused by: java.lang.InstantiationException: Test.splitTest

at java.lang.Class.newInstance(Class.java:359)

at org.apache.hadoop.hbase.util.RegionSplitter.newSplitAlgoInstance(RegionSplitter.java:671)

... 1 more

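The cause is that the custom split algorithm declares a constructor that takes a String parameter. RegionSplitter instantiates the class without passing any arguments, and because the class has no no-argument constructor, Class.newInstance throws InstantiationException.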
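The failure can be reproduced without HBase. The sketch below uses two hypothetical stand-in classes (they are not the article's `Test.splitTest`) and calls `Class.newInstance()`, the same call that appears in the stack trace above.

```java
// Hypothetical demo; the nested classes are stand-ins for a custom split algorithm.
class NoArgConstructorDemo {

    // Only constructor takes a String, so there is no no-argument constructor.
    static class NeedsArgument {
        final String keyFile;

        public NeedsArgument(String keyFile) {
            this.keyFile = keyFile;
        }
    }

    // Has an explicit public no-argument constructor.
    static class HasDefault {
        public HasDefault() {
        }
    }

    // Returns "ok" on success, otherwise the simple name of the exception thrown.
    @SuppressWarnings("deprecation") // Class.newInstance is deprecated since Java 9
    static String tryInstantiate(Class<?> type) {
        try {
            type.newInstance();
            return "ok";
        } catch (InstantiationException | IllegalAccessException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("NeedsArgument: " + tryInstantiate(NeedsArgument.class));
        System.out.println("HasDefault: " + tryInstantiate(HasDefault.class));
    }
}
```

One way to avoid patching RegionSplitter at all is to give the custom algorithm a public no-argument constructor and obtain its input some other way, for example from a file at a fixed location as in step 3.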

First, locate the static main function in the RegionSplitter source code.

It defines the command-line options, and this is where we can add any extra options we need.

Next, note that the split algorithm is instantiated by this statement:

SplitAlgorithm splitAlgo = newSplitAlgoInstance(conf, splitClass); // instantiate the custom SplitAlgorithm object

Inside that function, pass the arguments required for instantiation to the custom algorithm class.

With this change, the error above is resolved.
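The modification described above can be sketched with plain Java reflection. This is not RegionSplitter's actual code: the class `SplitAlgoFactorySketch`, the stand-in `FileBackedSplit`, and the extra `arg` parameter are all hypothetical, and the way the argument reaches the factory (for example through a newly added command-line option) is left out.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch of instantiating a class by name with an optional String argument.
class SplitAlgoFactorySketch {

    // Stand-in for a custom split algorithm that needs a String at construction time.
    static class FileBackedSplit {
        final String keyFile;

        public FileBackedSplit(String keyFile) {
            this.keyFile = keyFile;
        }
    }

    // Loads the class by name. With a null argument the no-argument constructor
    // is used; otherwise the constructor taking a single String is looked up.
    static Object newInstance(String className, String arg)
            throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        if (arg == null) {
            return type.getDeclaredConstructor().newInstance();
        }
        Constructor<?> ctor = type.getDeclaredConstructor(String.class);
        return ctor.newInstance(arg);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object algo = newInstance(FileBackedSplit.class.getName(), "split-keys");
        System.out.println(((FileBackedSplit) algo).keyFile);
    }
}
```

Looking up the constructor explicitly also gives a clearer failure: a class without a matching constructor raises NoSuchMethodException naming the missing signature, rather than a bare InstantiationException.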
