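1 What Is HBase

HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system. With HBase, a large-scale structured storage cluster can be built on inexpensive commodity PC servers.

HBase is designed to store and process very large data sets. More specifically, it can handle tables made up of thousands upon thousands of rows and columns using only ordinary hardware.

HBase is an open-source implementation of Google Bigtable, though the two differ in several ways. For example: Bigtable uses GFS as its file storage system, while HBase uses Hadoop HDFS; Google runs MapReduce to process the massive data in Bigtable, and HBase likewise uses Hadoop MapReduce; Bigtable uses Chubby as its coordination service, while HBase uses ZooKeeper for the same role.

Unlike relational databases such as MySQL, Oracle, DB2, and SQL Server, HBase is a NoSQL (non-relational) database.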

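The HBase table model differs from the relational table model:
An HBase table has no fixed column (field) definitions;
Each row of an HBase table stores a set of key-value pairs;
An HBase table is divided into column families, and the user specifies which column family each key-value pair is inserted into;
Physically, an HBase table is split by column family: data in different column families is always stored in different files;
Every row in an HBase table has exactly one row key, and row keys must be unique within the table;
All data in HBase, including row keys, keys, and values, is of type byte[]; HBase does not track data types on the user's behalf;
HBase has very limited transaction support.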

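Compared with other NoSQL databases (MongoDB, Redis, Cassandra, Hazelcast), the distinguishing feature of HBase is:

HBase table data is stored in the HDFS file system.

As a result, HBase has the following properties: storage capacity scales linearly, and data is stored with a very high degree of safety and reliability.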

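2 Installing HBase

HBase is a distributed system.

It has one management role: HMaster (usually two machines, one active and one backup)

The other nodes take the data role: HRegionServer (many machines, depending on data volume)

2.1 Prerequisites

A Java environment is required.

First, an HDFS cluster must be up and running; each regionserver should be co-located with an HDFS datanode.

Second, a ZooKeeper cluster must also be up and running.

Then install HBase.

The roles are assigned as follows:

hdp01: namenode datanode regionserver hmaster zookeeper

hdp02: datanode regionserver zookeeper

hdp03: datanode regionserver zookeeper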

2.2 Installation Steps

Unpack the HBase installation package.

Edit hbase-env.sh:

export JAVA_HOME=/root/apps/jdk1.7.0_67

export HBASE_MANAGES_ZK=false

Edit hbase-site.xml:

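<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdp01:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hdp01:2181,hdp02:2181,hdp03:2181</value>
  </property>
</configuration>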

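Edit the regionservers file:

hdp01

hdp02

hdp03

2.3 Starting the HBase Cluster

bin/start-hbase.sh

Once the cluster is up, a backup master can also be started on any machine in the cluster:

bin/hbase-daemon.sh start master

The newly started master will be in the backup state.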

3 First Steps with HBase

3.1 Starting the HBase Shell

bin/hbase shell

hbase> list # list tables

hbase> status # show cluster status

hbase> version # show cluster version

3.2 Characteristics of the HBase Table Model

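A table has a table name.
A table can be divided into several column families (data in different column families is stored in different files).
Each row in the table has a row key (rowkey), and row keys must be unique within the table.
Each key-value pair in the table is called a cell.
HBase can keep multiple historical versions of a piece of data (the number of versions is configurable).
Because of its size, a table is split horizontally into several regions (each identified by a rowkey range), and the data of different regions is also stored in different files.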
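
The versioning rule in the list above can be pictured with a small in-memory model. The following is only a sketch in plain Java with no HBase dependency; the class `VersionedCell` and its methods are hypothetical names chosen for illustration, not part of the HBase API. It models one cell that keeps a bounded number of timestamped versions, newest first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of a single cell (row + family:qualifier) that keeps
// a bounded number of timestamped versions. Not HBase code.
final class VersionedCell {
    private final int maxVersions;
    // Timestamps are sorted in descending order, so the newest version comes first.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // A put never modifies an old value; it adds a new version.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry in descending order = oldest version
        }
    }

    // A plain read returns the newest version, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // All retained versions, newest first.
    List<String> history() {
        return new ArrayList<>(versions.values());
    }

    public static void main(String[] args) {
        VersionedCell age = new VersionedCell(2);
        age.put(100L, "18");
        age.put(200L, "19");
        age.put(300L, "20");
        System.out.println(age.latest());
        System.out.println(age.history());
    }
}
```

With a limit of two versions, the third put pushes the oldest value out, which is roughly what a column family configured with a maximum version count does.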

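HBase stores inserted data in sorted order:

Point 1: rows are sorted by row key first.

Point 2: within a row, key-value pairs are sorted by column family, then by key (column name).

3.3 What Data Types Can an HBase Table Store

HBase supports only byte[].

Here byte[] covers: rowkey, key, value, column family name, and table name.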
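
Because everything is stored as byte[], the client has to remember how each value was encoded. The sketch below uses only the Java standard library; `ByteEncodingDemo` and its helpers are hypothetical names. It shows that the number 18 written as a 4-byte big-endian int and the text "18" are different byte sequences, so a value must be decoded the same way it was encoded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: the store only sees bytes, so the reader must
// know whether a value was written as an int or as text.
final class ByteEncodingDemo {
    // Encode an int as 4 big-endian bytes.
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode 4 big-endian bytes back into an int.
    static int bytesToInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Encode text as UTF-8 bytes.
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asInt = intToBytes(18);
        byte[] asText = stringToBytes("18");
        System.out.println(asInt.length);   // length of the int encoding
        System.out.println(asText.length);  // length of the text encoding
        System.out.println(bytesToInt(asInt));
        System.out.println(new String(asText, StandardCharsets.UTF_8));
    }
}
```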

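3.4 HBase Shell Operations

Operation	Command
Create a table	create 'table', 'family1', 'family2', 'familyN'
List all tables	list
Describe a table	describe 'table'
Check whether a table exists	exists 'table'
Check whether a table is enabled or disabled	is_enabled 'table' / is_disabled 'table'
Add a record	put 'table', 'rowkey', 'family:column', 'value'
View all data under a rowkey	get 'table', 'rowkey'
Count the records in a table	count 'table'
Get one column family	get 'table', 'rowkey', 'family'
Get one column of a column family	get 'table', 'rowkey', 'family:column'
Delete a cell	delete 'table', 'rowkey', 'family:column'
Delete an entire row	deleteall 'table', 'rowkey'
Drop a table	The table must be disabled before it can be dropped: first disable 'table', then drop 'table'
Truncate a table	truncate 'table'
View all records	scan 'table'
View all data in one column of a table	scan 'table', {COLUMNS => 'family:column'}
Update a record	Put the value again to overwrite it; HBase has no in-place update, every write is an append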

3.4.1 Creating a Table

create 't_user_info','base_info','extra_info'

The arguments are: table name, column family, column family.

3.4.2 Inserting Data

hbase(main):011:0> put 't_user_info','001','base_info:username','zhangsan'

0 row(s) in 0.2420 seconds

hbase(main):012:0> put 't_user_info','001','base_info:age','18'

0 row(s) in 0.0140 seconds

hbase(main):013:0> put 't_user_info','001','base_info:sex','female'

0 row(s) in 0.0070 seconds

hbase(main):014:0> put 't_user_info','001','extra_info:career','it'

0 row(s) in 0.0090 seconds

hbase(main):015:0> put 't_user_info','002','extra_info:career','actoress'

0 row(s) in 0.0090 seconds

hbase(main):016:0> put 't_user_info','002','base_info:username','liuyifei'

0 row(s) in 0.0060 seconds

3.4.3 Query Method 1: scan

hbase(main):017:0> scan 't_user_info'

ROW                               COLUMN+CELL

001                              column=base_info:age, timestamp=1496567924507, value=18

001                              column=base_info:sex, timestamp=1496567934669, value=female

001                              column=base_info:username, timestamp=1496567889554, value=zhangsan

001                              column=extra_info:career, timestamp=1496567963992, value=it

002                              column=base_info:username, timestamp=1496568034187, value=liuyifei

002                              column=extra_info:career, timestamp=1496568008631, value=actoress

3.4.4 Query Method 2: get a Single Row

hbase(main):020:0> get 't_user_info','001'

COLUMN                            CELL

base_info:age                    timestamp=1496568160192, value=19

base_info:sex                    timestamp=1496567934669, value=female

base_info:username               timestamp=1496567889554, value=zhangsan

extra_info:career                timestamp=1496567963992, value=it

4 row(s) in 0.0770 seconds

3.4.5 Deleting a Single Key-Value

hbase(main):021:0> delete 't_user_info','001','base_info:sex'

0 row(s) in 0.0390 seconds

Deleting an entire row

hbase(main):024:0> deleteall 't_user_info','001'

0 row(s) in 0.0090 seconds

hbase(main):025:0> get 't_user_info','001'

COLUMN CELL

0 row(s) in 0.0110 seconds

3.4.6 Dropping a Table

hbase(main):028:0> disable 't_user_info'

0 row(s) in 2.3640 seconds

hbase(main):029:0> drop 't_user_info'

0 row(s) in 1.2950 seconds

hbase(main):030:0> list

TABLE

0 row(s) in 0.0130 seconds

=> []

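3.5 A Key HBase Feature: Sort Order (Row Keys)

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

1. By a single row key

2. By a range of row keys (range scan)

3. By a full table scan

A row key can be any string (maximum length 64 KB; in practice usually 10-100 bytes). Internally, HBase stores row keys as byte arrays. Data is stored sorted by row key in lexicographic (byte) order. When designing keys, take full advantage of this sorted storage and place rows that are often read together next to each other (locality).

Data inserted into HBase is sorted automatically when stored:

Sort rule: first by row key, then by column family name, then by column (key) name, all in lexicographic order.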
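
Lexicographic byte order is not numeric order, which often surprises people who use numbers as row keys. The following is a minimal sketch in plain Java (the class `RowKeyOrder` is a hypothetical name, not HBase code) that compares keys byte by byte as unsigned values and sorts a few sample keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of lexicographic (unsigned byte) ordering of row keys.
final class RowKeyOrder {
    // Compare two byte arrays byte by byte, treating each byte as unsigned.
    // If one array is a prefix of the other, the shorter one sorts first.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Sort string keys by the byte order of their UTF-8 encoding.
    static List<String> sort(String... keys) {
        List<String> out = new ArrayList<>(Arrays.asList(keys));
        out.sort((l, r) -> compare(l.getBytes(StandardCharsets.UTF_8),
                                   r.getBytes(StandardCharsets.UTF_8)));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort("1", "2", "10", "100"));         // plain numbers as text
        System.out.println(sort("001", "002", "010", "100"));    // zero-padded keys
    }
}
```

Unpadded numeric keys end up as 1, 10, 100, 2, while zero-padded keys of equal length keep their numeric order, which is one common reason to pad numeric row keys.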

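This property has a major effect on query efficiency.

For example, take a table that stores user information such as name, registered province, age, occupation, and so on.

Suppose the business system frequently needs to:

Query all users in a given province

Query all users with a given surname in a given province

Idea: if users from the same province are stored contiguously in HBase's store files, and users with the same surname within a province are also contiguous, then both queries become more efficient.

Approach: build the query conditions into the rowkey.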
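
One possible way to build the query conditions into the rowkey is sketched below in plain Java. The key layout `province_surname_userId`, the `_` separator, and the class `RowKeyDesign` are all assumptions made for illustration; a sorted `TreeMap` stands in for the table, and its string ordering matches byte order only for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical rowkey layout: province_surname_userId.
// A TreeMap stands in for a table sorted by row key.
final class RowKeyDesign {
    static String rowKey(String province, String surname, String userId) {
        return province + "_" + surname + "_" + userId;
    }

    // Prefix scan: all keys in [prefix, prefix + '\uffff'), which are contiguous
    // because the map is sorted.
    static List<String> prefixScan(TreeMap<String, String> table, String prefix) {
        return new ArrayList<>(
            table.subMap(prefix, true, prefix + Character.MAX_VALUE, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("hebei", "zhang", "001"), "user data");
        table.put(rowKey("hebei", "li", "002"), "user data");
        table.put(rowKey("hebei", "zhang", "003"), "user data");
        table.put(rowKey("shanxi", "zhang", "004"), "user data");

        System.out.println(prefixScan(table, "hebei_"));        // everyone in one province
        System.out.println(prefixScan(table, "hebei_zhang_"));  // one surname in that province
    }
}
```

Both queries become reads over one contiguous key range instead of a filter over the whole table.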

4 HBase Client API Operations

4.1 Concise Version

HbaseClientDDL

package cn.hbase.demo;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.regionserver.BloomType;
import org.junit.Before;
import org.junit.Test;
public class HbaseClientDDL {
    Connection conn = null;

    @Before
    public void getConn() throws Exception {
        // Build a connection object
        Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
        conf.set("hbase.zookeeper.quorum", "hdp-01:2181,hdp-02:2181,hdp-03:2181");
        conn = ConnectionFactory.createConnection(conf);
    }

    @Test
    public void testCreateTable() throws Exception {
        // Get a DDL operator (Admin) from the connection
        Admin admin = conn.getAdmin();
        // Create a table descriptor
        HTableDescriptor hTableDescriptor = new HTableDescriptor(TableName.valueOf("user_info"));
        // Create the column family descriptors
        HColumnDescriptor hColumnDescriptor_1 = new HColumnDescriptor("base_info");
        hColumnDescriptor_1.setMaxVersions(3); // maximum number of versions kept in this column family (default is 1)
        HColumnDescriptor hColumnDescriptor_2 = new HColumnDescriptor("extra_info");
        // Add the column family descriptors to the table descriptor
        hTableDescriptor.addFamily(hColumnDescriptor_1);
        hTableDescriptor.addFamily(hColumnDescriptor_2);
        // Create the table through the admin object
        admin.createTable(hTableDescriptor);
        // Close the connection
        admin.close();
        conn.close();
    }

    @Test
    public void testDropTable() throws Exception {
        Admin admin = conn.getAdmin();
        // Disable the table
        admin.disableTable(TableName.valueOf("user_info"));
        // Delete the table
        admin.deleteTable(TableName.valueOf("user_info"));
        admin.close();
        conn.close();
    }

    // Alter the table definition: add a column family
    @Test
    public void testAlterTable() throws Exception {
        Admin admin = conn.getAdmin();
        // Fetch the existing table descriptor
        HTableDescriptor tableDescriptor = admin.getTableDescriptor(TableName.valueOf("user_info"));
        // Build a new column family descriptor
        HColumnDescriptor hColumnDescriptor = new HColumnDescriptor("other_info");
        hColumnDescriptor.setBloomFilterType(BloomType.ROWCOL); // set the Bloom filter type for this column family
        // Add the column family descriptor to the table descriptor
        tableDescriptor.addFamily(hColumnDescriptor);
        // Submit the modified table descriptor through admin
        admin.modifyTable(TableName.valueOf("user_info"), tableDescriptor);
        admin.close();
        conn.close();
    }
}

HbaseClientDML

package cn.hbase.demo;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Before;
import org.junit.Test;
public class HbaseClientDML {
    Connection conn = null;

    @Before
    public void getConn() throws Exception {
        // Build a connection object
        Configuration conf = HBaseConfiguration.create(); // automatically loads hbase-site.xml
        conf.set("hbase.zookeeper.quorum", "hdp-01:2181,hdp-02:2181,hdp-03:2181");
        conn = ConnectionFactory.createConnection(conf);
    }

    @Test
    public void testPut() throws Exception {
        // Get a Table object for the target table to perform DML operations
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // Wrap the data to insert in Put objects (one Put corresponds to exactly one rowkey)
        Put put = new Put(Bytes.toBytes("001"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("張三"));
        put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("18"));
        put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
        Put put2 = new Put(Bytes.toBytes("002"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("李四"));
        put2.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes("28"));
        put2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("上海"));
        ArrayList<Put> puts = new ArrayList<>();
        puts.add(put);
        puts.add(put2);
        // Insert the batch
        table.put(puts);
        table.close();
        conn.close();
    }

    @Test
    public void testManyPuts() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));
        ArrayList<Put> puts = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("" + i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("username"), Bytes.toBytes("張三" + i));
            put.addColumn(Bytes.toBytes("base_info"), Bytes.toBytes("age"), Bytes.toBytes((18 + i) + ""));
            put.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"), Bytes.toBytes("北京"));
            puts.add(put);
        }
        table.put(puts);
        // Release the table and the connection, as the other tests do
        table.close();
        conn.close();
    }

    @Test
    public void testDelete() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // Build Delete objects describing what to remove
        Delete delete1 = new Delete(Bytes.toBytes("001")); // the whole row 001
        Delete delete2 = new Delete(Bytes.toBytes("002")); // only extra_info:addr of row 002
        delete2.addColumn(Bytes.toBytes("extra_info"), Bytes.toBytes("addr"));
        ArrayList<Delete> dels = new ArrayList<>();
        dels.add(delete1);
        dels.add(delete2);
        table.delete(dels);
        table.close();
        conn.close();
    }

    @Test
    public void testGet() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));
        Get get = new Get("002".getBytes());
        Result result = table.get(get);
        // Read the value of one specific key from the result
        byte[] value = result.getValue("base_info".getBytes(), "age".getBytes());
        System.out.println(new String(value));
        System.out.println("-------------------------");
        // Iterate over every key-value cell in the row
        CellScanner cellScanner = result.cellScanner();
        while (cellScanner.advance()) {
            Cell cell = cellScanner.current();
            byte[] rowArray = cell.getRowArray(); // backing array holding the row key of this cell
            byte[] familyArray = cell.getFamilyArray(); // backing array holding the column family name
            byte[] qualifierArray = cell.getQualifierArray(); // backing array holding the column name
            byte[] valueArray = cell.getValueArray(); // backing array holding the value
            System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
            System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
            System.out.println("column: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
            System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
        }
        table.close();
        conn.close();
    }

    @Test
    public void testScan() throws Exception {
        Table table = conn.getTable(TableName.valueOf("user_info"));
        // The start row key is inclusive and the stop row key is exclusive. To include
        // the last row key in the results, append an invisible zero byte (\000) to it.
        Scan scan = new Scan("10".getBytes(), "10000\000".getBytes());
        ResultScanner scanner = table.getScanner(scan);
        Iterator<Result> iterator = scanner.iterator();
        while (iterator.hasNext()) {
            Result result = iterator.next();
            // Iterate over every key-value cell in the row
            CellScanner cellScanner = result.cellScanner();
            while (cellScanner.advance()) {
                Cell cell = cellScanner.current();
                byte[] rowArray = cell.getRowArray(); // backing array holding the row key of this cell
                byte[] familyArray = cell.getFamilyArray(); // backing array holding the column family name
                byte[] qualifierArray = cell.getQualifierArray(); // backing array holding the column name
                byte[] valueArray = cell.getValueArray(); // backing array holding the value
                System.out.println("row key: " + new String(rowArray, cell.getRowOffset(), cell.getRowLength()));
                System.out.println("column family: " + new String(familyArray, cell.getFamilyOffset(), cell.getFamilyLength()));
                System.out.println("column: " + new String(qualifierArray, cell.getQualifierOffset(), cell.getQualifierLength()));
                System.out.println("value: " + new String(valueArray, cell.getValueOffset(), cell.getValueLength()));
            }
            System.out.println("----------------------");
        }
        scanner.close();
        table.close();
        conn.close();
    }

    // Scratch test: "000" and "000\0" print alike but differ by one trailing zero byte
    @Test
    public void test() {
        String a = "000";
        String b = "000\0";
        System.out.println(a);
        System.out.println(b);
        byte[] bytes = a.getBytes();
        byte[] bytes2 = b.getBytes();
        System.out.println("");
    }
}
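
The start-inclusive, stop-exclusive rule used in testScan can be tried out without a cluster. The sketch below is plain Java; `StopRowDemo` is a hypothetical name and a sorted `TreeMap` stands in for the table (its ordering matches byte order for these ASCII keys). It shows that appending a zero byte to the stop key is enough to pull the last row into the range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a scan over [startRow, stopRow).
final class StopRowDemo {
    // Start row is inclusive, stop row is exclusive.
    static List<String> scan(TreeMap<String, String> rows, String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        for (String key : new String[] { "10", "100", "1000", "10000", "100000", "2" }) {
            rows.put(key, "v");
        }
        // "10000" itself is excluded because the stop row is exclusive.
        System.out.println(scan(rows, "10", "10000"));
        // "10000\0" is the smallest key greater than "10000", so "10000" is now included.
        System.out.println(scan(rows, "10", "10000\0"));
    }
}
```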

4.2 Full Version

package com.zgcbank.hbase;
import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HConnection;
import org.apache.hadoop.hbase.client.HConnectionManager;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.ColumnPrefixFilter;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.FilterList.Operator;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
public class HbaseTest {
    static Configuration config = null;
    private Connection connection = null;
    private Table table = null;

    @Before
    public void init() throws Exception {
        config = HBaseConfiguration.create(); // configuration
        config.set("hbase.zookeeper.quorum", "master,work1,work2"); // ZooKeeper hosts
        config.set("hbase.zookeeper.property.clientPort", "2181"); // ZooKeeper port
        connection = ConnectionFactory.createConnection(config);
        table = connection.getTable(TableName.valueOf("user"));
    }

    @Test
    public void createTable() throws Exception {
        // Create the table admin object
        HBaseAdmin admin = new HBaseAdmin(config);
        // Create the table descriptor
        TableName tableName = TableName.valueOf("test3"); // table name
        HTableDescriptor desc = new HTableDescriptor(tableName);
        // Create the column family descriptors and add them to the table
        desc.addFamily(new HColumnDescriptor("info"));
        desc.addFamily(new HColumnDescriptor("info2"));
        // Create the table
        admin.createTable(desc);
        admin.close();
    }

    @Test
    @SuppressWarnings("deprecation")
    public void deleteTable() throws MasterNotRunningException,
            ZooKeeperConnectionException, Exception {
        HBaseAdmin admin = new HBaseAdmin(config);
        admin.disableTable("test3");
        admin.deleteTable("test3");
        admin.close();
    }

    @Test
    public void insertData() throws Exception {
        // Table.put(List) sends the whole batch. The autoflush and flushCommits calls
        // belong to the older HTable class and are not part of the Table interface.
        ArrayList<Put> arrayList = new ArrayList<Put>();
        for (int i = 21; i < 50; i++) {
            Put put = new Put(Bytes.toBytes("1234" + i));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("wangwu" + i));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes(1234 + i));
            arrayList.add(put);
        }
        // Insert the data
        table.put(arrayList);
    }

    @Test
    public void updateData() throws Exception {
        // An update is just another put on the same rowkey
        Put put = new Put(Bytes.toBytes("1234"));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("namessss"), Bytes.toBytes("lisi1234"));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("password"), Bytes.toBytes(1234));
        table.put(put);
    }

    @Test
    public void deleteData() throws Exception {
        // Delete the whole row with rowkey 1234
        Delete delete = new Delete(Bytes.toBytes("1234"));
        table.delete(delete);
    }

    @Test
    public void queryData() throws Exception {
        Get get = new Get(Bytes.toBytes("1234"));
        Result result = table.get(get);
        System.out.println(Bytes.toInt(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("password"))));
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("namessss"))));
        System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("sex"))));
    }

    @Test
    public void scanData() throws Exception {
        Scan scan = new Scan();
        //scan.addFamily(Bytes.toBytes("info"));
        //scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("password"));
        scan.setStartRow(Bytes.toBytes("wangsf_0"));
        scan.setStopRow(Bytes.toBytes("wangwu"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(Bytes.toInt(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("password"))));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }

    @Test
    public void scanDataByFilter1() throws Exception {
        // Create a full-table scan
        Scan scan = new Scan();
        // Filter: column value filter (info:name equals "zhangsan2")
        SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("info"),
                Bytes.toBytes("name"), CompareFilter.CompareOp.EQUAL,
                Bytes.toBytes("zhangsan2"));
        // Set the filter
        scan.setFilter(filter);
        // Print the result set
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(Bytes.toInt(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("password"))));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }

    @Test
    public void scanDataByFilter2() throws Exception {
        // Create a full-table scan
        Scan scan = new Scan();
        // Match rowkeys that start with 12341
        RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("^12341"));
        // Set the filter
        scan.setFilter(filter);
        // Print the result set
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            System.out.println(Bytes.toInt(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("password"))));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }

    @Test
    public void scanDataByFilter3() throws Exception {
        // Create a full-table scan
        Scan scan = new Scan();
        // Match column names that start with "na"
        ColumnPrefixFilter filter = new ColumnPrefixFilter(Bytes.toBytes("na"));
        // Set the filter
        scan.setFilter(filter);
        // Print the result set
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            printRow(result);
        }
    }

    @Test
    public void scanDataByFilter4() throws Exception {
        // Create a full-table scan
        Scan scan = new Scan();
        // Filter list: MUST_PASS_ALL (and), MUST_PASS_ONE (or)
        FilterList filterList = new FilterList(Operator.MUST_PASS_ONE);
        // Match rowkeys that start with wangsenfeng
        RowFilter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("^wangsenfeng"));
        // Match rows whose info:name equals zhangsan
        SingleColumnValueFilter filter2 = new SingleColumnValueFilter(Bytes.toBytes("info"),
                Bytes.toBytes("name"), CompareFilter.CompareOp.EQUAL,
                Bytes.toBytes("zhangsan"));
        filterList.addFilter(filter);
        filterList.addFilter(filter2);
        // Set the filter
        scan.setFilter(filterList);
        // Print the result set
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
            printRow(result);
        }
    }

    // Print name, age and sex from both column families, skipping columns that are absent.
    // As in the original code, age and sex are decoded as ints.
    private static void printRow(Result result) {
        System.out.println("rowkey: " + Bytes.toString(result.getRow()));
        for (String family : new String[] { "info", "info2" }) {
            byte[] f = Bytes.toBytes(family);
            byte[] name = result.getValue(f, Bytes.toBytes("name"));
            byte[] age = result.getValue(f, Bytes.toBytes("age"));
            byte[] sex = result.getValue(f, Bytes.toBytes("sex"));
            if (name != null) {
                System.out.println(family + ":name: " + Bytes.toString(name));
            }
            if (age != null) {
                System.out.println(family + ":age: " + Bytes.toInt(age));
            }
            if (sex != null) {
                System.out.println(family + ":sex: " + Bytes.toInt(sex));
            }
        }
    }

    @After
    public void close() throws Exception {
        table.close();
        connection.close();
    }
}

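4.3 MapReduce with HBase

4.3.1 Implementation Approach

HBase supports MapReduce through the TableMapper and TableReducer classes; we only need to extend these two classes.

1. Write a mapper that extends TableMapper<Text, IntWritable>

Type parameters: Text is the mapper's output key type; IntWritable is the mapper's output value type.

Its map method looks like this:

map(ImmutableBytesWritable key, Result value,Context context)

Parameters: key is the rowkey; value is a Result holding one row of data; context is the job context.

2. Write a reducer that extends TableReducer<Text, IntWritable, ImmutableBytesWritable>

Type parameters: Text is the reducer's input key type; IntWritable is the reducer's input value type;

ImmutableBytesWritable is the type of the rowkey the reducer writes to HBase.

Its reduce method looks like this:

reduce(Text key, Iterable<IntWritable> values,Context context)

Parameters: key is the reducer's input key; values are the reducer's input values.

4.3.2 Preparing the Tables

1. Create the source table 'word' with one column family, 'content'.

Add data to the table: put a column 'info' in the column family and store a short text in it. Insert several rows this way, each with a different row key.

2. Create the output table 'stat' with one column family, 'content'.

3. Use MapReduce to process the 'word' table: count word frequencies over the short texts in 'content:info' and write the results to 'content:info' in the 'stat' table, using each word as the row key.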
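
Before wiring this into MapReduce, the word-count logic itself can be checked in memory. The sketch below is plain Java with no Hadoop or HBase dependency; `WordCountSketch` is a hypothetical name and the sample sentences are made up. The inner loop plays the role of the mapper (emit each word with a count of 1) and the `merge` call plays the role of the reducer (sum the counts per word).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of the word count job:
// input = the texts stored in content:info, output = word -> total count.
final class WordCountSketch {
    static Map<String, Integer> count(List<String> texts) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String text : texts) {
            // "Map" step: split one row's text on spaces and emit (word, 1).
            for (String word : text.split(" ")) {
                if (word.isEmpty()) {
                    continue;
                }
                // "Reduce" step: sum the 1s emitted for the same word.
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("hello hbase", "hello hadoop");
        System.out.println(count(rows));
    }
}
```

In the real job each entry of the resulting map would become one row of the 'stat' table, with the word as the row key and the count stored in content:info.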

4.3.3 Implementation

package com.zgcbank.hbase;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
public class HBaseMr {
    static Configuration config = null;
    static {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "slave1,slave2,slave3");
        config.set("hbase.zookeeper.property.clientPort", "2181");
    }
    public static final String tableName = "word"; // source table
    public static final String colf = "content"; // column family
    public static final String col = "info"; // column
    public static final String tableName2 = "stat"; // output table

    // Drop and recreate both tables, then load a few sample rows into the source table
    public static void initTB() {
        HTable table = null;
        HBaseAdmin admin = null;
        try {
            admin = new HBaseAdmin(config); // table administration
            // Check each table separately: disabling a table that does not exist would throw
            for (String name : new String[] { tableName, tableName2 }) {
                if (admin.tableExists(name)) {
                    System.out.println("table " + name + " already exists, recreating it");
                    admin.disableTable(name);
                    admin.deleteTable(name);
                }
                HTableDescriptor desc = new HTableDescriptor(name);
                desc.addFamily(new HColumnDescriptor(colf));
                admin.createTable(desc);
            }
            table = new HTable(config, tableName);
            table.setAutoFlush(false);
            table.setWriteBufferSize(500);
            String[] texts = {
                "The Apache Hadoop software library is a framework",
                "The common utilities that support the other Hadoop modules",
                "Hadoop by reading the documentation",
                "Hadoop from the release page",
                "Hadoop on the mailing list"
            };
            // One row per text, with row keys "1" to "5"
            List<Put> lp = new ArrayList<Put>();
            for (int i = 0; i < texts.length; i++) {
                Put p = new Put(Bytes.toBytes(String.valueOf(i + 1)));
                p.add(colf.getBytes(), col.getBytes(), texts[i].getBytes());
                lp.add(p);
            }
            table.put(lp);
            table.flushCommits();
            lp.clear();
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (table != null) {
                    table.close();
                }
                if (admin != null) {
                    admin.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static class MyMapper extends TableMapper<Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);
        private static Text word = new Text();

        // Input: key is the rowkey; value is the Result holding one row of data
        @Override
        protected void map(ImmutableBytesWritable key, Result value,
                Context context) throws IOException, InterruptedException {
            // Read colf:col from the row (the table has only one column family)
            String words = Bytes.toString(value.getValue(Bytes.toBytes(colf), Bytes.toBytes(col)));
            if (words == null) {
                return; // the row has no text in this column
            }
            // Split on spaces
            String itr[] = words.split(" ");
            // Emit (word, 1) for every word
            for (int i = 0; i < itr.length; i++) {
                word.set(itr[i]);
                context.write(word, one);
            }
        }
    }

    public static class MyReducer extends
            TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            // Sum the counts emitted by the mappers
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            // Create a Put whose rowkey is the word
            Put put = new Put(Bytes.toBytes(key.toString()));
            // Store the count as text in colf:col
            put.add(Bytes.toBytes(colf), Bytes.toBytes(col), Bytes.toBytes(String.valueOf(sum)));
            // Write to HBase; both the rowkey and the Put are required
            context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
        }
    }

    public static void main(String[] args) throws IOException,
            ClassNotFoundException, InterruptedException {
        config.set("fs.default.name", "hdfs://master:9000/"); // default HDFS path
        config.set("hadoop.job.ugi", "hadoop,hadoop"); // user name, group
        config.set("mapred.job.tracker", "master:9001"); // where the jobtracker runs
        // Initialize the tables
        initTB();
        // Create the job
        Job job = new Job(config, "HBaseMr");
        job.setJarByClass(HBaseMr.class); // main class
        // Create the scan; it can be limited to a single column
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes(colf), Bytes.toBytes(col));
        // Set up the mapper that reads from HBase: table name, scan, mapper class, mapper output key and value types
        TableMapReduceUtil.initTableMapperJob(tableName, scan, MyMapper.class, Text.class, IntWritable.class, job);
        // Set up the reducer that writes to HBase: table name, reducer class, job
        TableMapReduceUtil.initTableReducerJob(tableName2, MyReducer.class, job);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

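5 How HBase Works

5.1 Responsibilities of the Master

1. Manages and monitors the HRegionServers and balances load across them.

2. Handles region assignment and reassignment, for example assigning the new HRegions after an HRegion split, or migrating the HRegions of an HRegionServer that has exited to other HRegionServers.

3. Handles metadata changes.

4. Manages namespace and table metadata (which is actually stored on HDFS).

5. Access control (ACL).

6. Monitors the state of all HRegionServers in the cluster (through heartbeats and by watching state in ZooKeeper).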

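5.2 Responsibilities of the Region Server

Manages reads and writes for the regions it is responsible for.
Reads from and writes to HDFS, managing the data in tables.
Clients read and write data directly through the HRegionServer (after looking up the metadata, via ZooKeeper and hbase:meta, to find the HRegion/HRegionServer that holds the RowKey).
Flushes caches to HDFS.
Maintains the HLog.
Performs compactions.
Handles region splits.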

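5.3 Role of the ZooKeeper Cluster

Stores metadata and state information for the whole HBase cluster.
Provides failover between the active and backup HMaster nodes.

Note: the HMaster monitors HRegionServers joining and going down by watching ephemeral nodes in ZooKeeper (by default under /hbase/rs/*).

When the first HMaster connects to ZooKeeper, it creates an ephemeral node (by default /hbase/master) to mark itself as the active HMaster; HMasters that join later watch that ephemeral node.

If the active HMaster goes down, the node disappears, the other HMasters are notified, and one of them turns itself into the active HMaster. Before becoming active, each HMaster creates its own ephemeral node under /hbase/backup-masters/.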

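5.4 HBase Read and Write Paths

5.4.1 Write Path

Suppose a client wants to insert a row with rowkey=r000001. Which region of the table should this row be written to?

1/ The client connects to ZooKeeper and, from the /hbase znode, finds the regionserver (host:port) that hosts the hbase:meta table;

2/ That regionserver scans the start row key of each region recorded in hbase:meta and determines which region's range contains r000001;

3/ The info:server key of the matching entry records which regionserver (host:port) is responsible for that region;

4/ The client sends its request directly to that regionserver;

5/ On receiving the request, the regionserver writes the data into the region.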
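
Steps 2 and 3 amount to finding the region whose start key is the greatest one not larger than the row key. A minimal sketch of that lookup in plain Java follows; `MetaLookupSketch`, the region boundaries, and the `host:port` values are all hypothetical, and a `TreeMap` keyed by region start key stands in for hbase:meta.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for hbase:meta: region start key -> "host:port".
final class MetaLookupSketch {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    // The first region of a table has an empty start key.
    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.addRegion("", "hdp01:16020");         // rows before r000500
        meta.addRegion("r000500", "hdp02:16020");  // rows from r000500 up to r001000
        meta.addRegion("r001000", "hdp03:16020");  // rows from r001000 onward
        System.out.println(meta.locate("r000001"));
        System.out.println(meta.locate("r000750"));
    }
}
```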

5.4.2 Read Path

Suppose a client wants to query the row with rowkey=r000001. What does the flow look like?

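1/ The client first connects to ZooKeeper and finds the regionserver that hosts the hbase:meta table;

2/ It asks that regionserver to scan hbase:meta and, using the namespace, table name, and rowkey, finds which regionserver is responsible for the region that contains r000001;

3/ It contacts the regionserver for that region;

4/ On receiving the request, the regionserver scans the region and returns the data to the client.

(The data is looked up in the MemStore first; if it is not there, the BlockCache is read; if the BlockCache does not have it either, the StoreFile is read. This ordering is for read efficiency.

Data read from a StoreFile is not returned to the client directly; it is first written into the BlockCache and then returned.)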
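
The lookup order described in the parentheses can be sketched as follows. This is plain Java with hypothetical names (`ReadPathSketch` and its fields); three hash maps stand in for the MemStore, the BlockCache, and the StoreFiles, which in reality are far more complex structures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the read order: MemStore, then BlockCache, then StoreFile.
final class ReadPathSketch {
    final Map<String, String> memStore = new HashMap<>();
    final Map<String, String> blockCache = new HashMap<>();
    final Map<String, String> storeFile = new HashMap<>();
    int storeFileReads = 0; // counts how often the slowest layer was consulted

    String get(String rowKey) {
        String value = memStore.get(rowKey);
        if (value != null) {
            return value;
        }
        value = blockCache.get(rowKey);
        if (value != null) {
            return value;
        }
        storeFileReads++;
        value = storeFile.get(rowKey);
        if (value != null) {
            // Populate the cache before returning, so the next read is served from memory.
            blockCache.put(rowKey, value);
        }
        return value;
    }

    public static void main(String[] args) {
        ReadPathSketch region = new ReadPathSketch();
        region.storeFile.put("r000001", "zhangsan");
        System.out.println(region.get("r000001")); // read from the store file, then cached
        System.out.println(region.get("r000001")); // served from the block cache
        System.out.println(region.storeFileReads);
    }
}
```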

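Note: the client caches this location information. At step 2, however, it caches only the location of the HRegion for the current RowKey, so if the next RowKey falls in a different HRegion, it has to query the HRegion hosting hbase:meta again. Over time the client caches more and more locations and no longer needs to consult hbase:meta, unless an HRegion has been moved because of a server failure or a split, in which case the client must look it up again and update its cache.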
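
The caching behaviour in this note can be modelled roughly as below. It is a simplified sketch in plain Java: `LocationCacheSketch` is a hypothetical name, the cache is keyed by a region name rather than by key range, and the `metaLookup` function stands in for a round trip to hbase:meta.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side cache of region locations.
final class LocationCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> metaLookup; // stands in for querying hbase:meta
    private int metaLookups = 0;

    LocationCacheSketch(Function<String, String> metaLookup) {
        this.metaLookup = metaLookup;
    }

    // Return the cached server, consulting "meta" only on a cache miss.
    String locate(String regionName) {
        String server = cache.get(regionName);
        if (server == null) {
            metaLookups++;
            server = metaLookup.apply(regionName);
            cache.put(regionName, server);
        }
        return server;
    }

    // Called when a region has moved (server failure or split): forget the stale entry.
    void invalidate(String regionName) {
        cache.remove(regionName);
    }

    int metaLookups() {
        return metaLookups;
    }

    public static void main(String[] args) {
        LocationCacheSketch client = new LocationCacheSketch(region -> "hdp02:16020");
        client.locate("region-a");
        client.locate("region-a");      // cache hit, no meta lookup
        client.invalidate("region-a");  // region moved
        client.locate("region-a");      // must ask meta again
        System.out.println(client.metaLookups());
    }
}
```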

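5.4.3 Flush Process

1) When the MemStore reaches its threshold (128 MB by default, 64 MB in older versions), its data is flushed to disk, the data is removed from memory, and the corresponding historical data in the HLog is removed;

2) The data is stored in HDFS;

3) A checkpoint marker is written in the HLog.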

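5.4.4 Data compaction process

1) When the number of store files reaches 3, a compaction is triggered (it runs on the HRegionServer that hosts the region): the region loads the files locally and merges them;

2) When the merged data exceeds 256 MB (the split threshold in old versions; current defaults are much larger, see hbase.hregion.max.filesize in section 6.5), the region is split and the resulting regions are assigned to different HRegionServers;

3) When an HRegionServer goes down, its HLog is split and the pieces are handed to different HRegionServers to replay, and .META. is updated;

4) Note: the HLog is synced to HDFS.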
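Step 1 can be pictured as merging several sorted files into one. The sketch below is a simplification under assumptions: CompactionSketch is a hypothetical class, each store file is modelled as a sorted map from row key to value, and later files in the list are treated as newer so their values win. Real compactions also handle versions, deletes and file selection policies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model: a store file is a sorted map rowKey -> value.
class CompactionSketch {
    static final int COMPACTION_THRESHOLD = 3; // the "3 files" mentioned in step 1

    // Merges all files into one once the threshold is reached; otherwise returns the list unchanged.
    static List<TreeMap<String, String>> maybeCompact(List<TreeMap<String, String>> storeFiles) {
        if (storeFiles.size() < COMPACTION_THRESHOLD) {
            return storeFiles;
        }
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file); // later (newer) files overwrite older values
        }
        List<TreeMap<String, String>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    // Convenience builder: file("r1", "v1", "r2", "v2") creates a two-row store file.
    static TreeMap<String, String> file(String... keyValues) {
        TreeMap<String, String> f = new TreeMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            f.put(keyValues[i], keyValues[i + 1]);
        }
        return f;
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> files = new ArrayList<>();
        files.add(file("r1", "v1", "r3", "v3"));
        files.add(file("r2", "v2"));
        files.add(file("r1", "v1-new"));
        System.out.println(maybeCompact(files));
    }
}
```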

5.5 The hbase:meta table

The hbase:meta table stores the location of every user HRegion:

Rowkey: tableName, regionStartKey, regionId, replicaId, and so on;

info column family: this family contains three columns:

info:regioninfo column: regionId, tableName, startKey, endKey, offline, split, replicaId;

info:server column: the server:port of the HRegionServer;

info:serverstartcode column: the start timestamp of the HRegionServer.


5.6 Region Server internals


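The WAL (Write Ahead Log), called HLog in early versions, is a file on HDFS. As the name suggests, every write is first guaranteed to be recorded in this log file before the MemStore is actually updated; the data is eventually written to an HFile. WAL files are stored under /hbase/WALs/${HRegionServer_Name}.
The BlockCache is a read cache. It relies on the principle of locality of reference (which also applies to CPUs and comes in two forms: spatial locality means that when the CPU needs some data at one moment, it will very likely need nearby data at the next moment; temporal locality means that data that has been accessed once is very likely to be accessed again soon) and prefetches data into memory to improve read performance.
An HRegion is the representation, inside an HRegionServer, of one Region of a Table. A Table can have one or more Regions, which may live on the same HRegionServer or be spread across different ones; an HRegionServer can host many HRegions belonging to different Tables. An HRegion consists of several Stores (HStore), and each HStore corresponds to one Column Family of the Table within that HRegion. Each Column Family is therefore a unit of storage in its own right, so columns with similar IO characteristics are best placed in the same Column Family for efficient reads (data locality improves the cache hit rate). HStore is the core of HBase storage: it implements reading from and writing to HDFS, and consists of one MemStore and zero or more StoreFiles.
The MemStore is a write cache (an in-memory sorted buffer). Once the WAL entry has been written, every write goes into the MemStore, which flushes its data to the underlying HDFS file (HFile) according to certain rules. Normally each Column Family in each HRegion has its own MemStore.
HFile (StoreFile) stores HBase data (Cell/KeyValue). Data in an HFile is sorted by RowKey, Column Family and Column; Cells for which all three are equal are ordered by timestamp, newest first.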
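The HFile sort order described above can be written as a comparator. A minimal sketch, assuming a simplified Cell with String fields; the class CellOrderSketch is hypothetical, and real HBase compares raw bytes rather than Strings:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplified cell; HBase itself works on byte arrays.
class CellOrderSketch {
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;

        Cell(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "@" + timestamp;
        }
    }

    // Row, then column family, then column ascending; equal coordinates are ordered newest first.
    static final Comparator<Cell> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.family.compareTo(b.family);
        if (c != 0) return c;
        c = a.qualifier.compareTo(b.qualifier);
        if (c != 0) return c;
        return Long.compare(b.timestamp, a.timestamp); // descending timestamp
    };

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("r2", "info", "name", 5L));
        cells.add(new Cell("r1", "info", "name", 1L));
        cells.add(new Cell("r1", "info", "name", 9L));
        cells.add(new Cell("r1", "info", "age", 3L));
        cells.sort(ORDER);
        System.out.println(cells);
    }
}
```

Placing the newest version first means a read for the latest value of a cell can stop at the first match.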
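Flush in detail

Every Put/Delete request is first written to the MemStore. When the MemStore fills up it is flushed into a new StoreFile (implemented as an HFile), so an HStore (Column Family) can have zero or more StoreFiles (HFiles).
When the combined size of all MemStores in one HRegion exceeds hbase.hregion.memstore.flush.size (128 MB by default), all MemStores of that HRegion are flushed to HDFS.
When the global MemStore size exceeds hbase.regionserver.global.memstore.upperLimit (40% of memory by default), the MemStores of all HRegions on the HRegionServer are flushed to HDFS in descending order of MemStore size (whether the size used for an HRegion is the sum of all its MemStores or its largest MemStore remains to be verified) until total MemStore usage falls below hbase.regionserver.global.memstore.lowerLimit (38% of memory by default).
When the size of the WAL on the HRegionServer exceeds hbase.regionserver.hlog.blocksize * hbase.regionserver.max.logs, the MemStores of all HRegions on the HRegionServer are flushed to HDFS in time order, oldest first, until the WAL drops below hbase.regionserver.hlog.blocksize * hbase.regionserver.max.logs. The product of these two is said to default to 2 GB, but according to the code hbase.regionserver.max.logs defaults to 32 and hbase.regionserver.hlog.blocksize defaults to 32 MB. Either way, a flush caused by exceeding this limit is not a good thing, because it can cause long delays.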
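The three triggers can be expressed as simple checks. The following sketch only restates the arithmetic from the text with the defaults quoted there; FlushTriggerSketch and its method names are hypothetical, and treating "exceeds" as a strict comparison is an assumption.

```java
// Hypothetical helper that restates the flush thresholds quoted in the text.
class FlushTriggerSketch {
    static final long MB = 1024L * 1024L;
    static final long REGION_FLUSH_SIZE = 128 * MB;  // hbase.hregion.memstore.flush.size
    static final double GLOBAL_UPPER_LIMIT = 0.40;   // hbase.regionserver.global.memstore.upperLimit

    // Region-level trigger: the sum of all MemStores in one HRegion exceeds the flush size.
    static boolean regionNeedsFlush(long... memstoreSizes) {
        long total = 0;
        for (long size : memstoreSizes) {
            total += size;
        }
        return total > REGION_FLUSH_SIZE;
    }

    // Global trigger: all MemStores on the server together exceed the upper limit share of memory.
    static boolean globalNeedsFlush(long totalMemstoreBytes, long heapBytes) {
        return totalMemstoreBytes > heapBytes * GLOBAL_UPPER_LIMIT;
    }

    // WAL trigger threshold: hlog.blocksize * max.logs.
    static long walLimitBytes(long blockSizeBytes, int maxLogs) {
        return blockSizeBytes * maxLogs;
    }

    public static void main(String[] args) {
        System.out.println(regionNeedsFlush(70 * MB, 70 * MB));
        System.out.println(walLimitBytes(32 * MB, 32) / MB + " MB");
    }
}
```

With the values the author found in the code (32 logs of 32 MB each) the WAL limit works out to 1024 MB, that is 1 GB rather than 2 GB.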

6. HBASE optimization

6.1 High availability

In HBase the HMaster monitors the life cycle of the RegionServers and balances their load. If the HMaster goes down, the whole HBase cluster falls into an unhealthy state that it cannot sustain for long. HBase therefore supports a high-availability configuration for the HMaster.

1. Stop the HBase cluster (skip this step if it is not running)

$ bin/stop-hbase.sh

2. Create a backup-masters file in the conf directory

$ touch conf/backup-masters

3. List the high-availability HMaster node in the backup-masters file

$ echo hadoop103 > conf/backup-masters

4. scp the whole conf directory to the other nodes

$ scp -r conf/ hadoop103:/opt/module/hbase/

$ scp -r conf/ hadoop104:/opt/module/hbase/

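6.2 Pre-splitting

Each region maintains a startRow and an endRowKey. If incoming data falls within the rowKey range of a region, that region takes care of it. Following this principle, we can roughly plan in advance which partitions the data will land in, which improves HBase performance.

1. Setting split points manually

hbase> create 'staff1','info','partition1',SPLITS => ['1000','2000','3000','4000']

2. Generating split points as a hexadecimal sequence

hbase> create 'staff2','info','partition2',{NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

3. Pre-splitting according to rules in a file

Create a splits.txt file with the following content:

aaaa

bbbb

cccc

dddd

Then run:

hbase> create 'staff3','partition3',SPLITS_FILE => 'splits.txt'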

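4. Creating pre-split regions with the Java API

// Use a custom algorithm to produce a series of hash values, stored in a two-dimensional array.
// computeSplitKeys() is a placeholder for whatever hash function you choose.

byte[][] splitKeys = computeSplitKeys();

// Create an HBaseAdmin instance

HBaseAdmin hAdmin = new HBaseAdmin(HBaseConfiguration.create());

// Create an HTableDescriptor instance

HTableDescriptor tableDesc = new HTableDescriptor(tableName);

// Use the HTableDescriptor instance and the two-dimensional array of hash values to create a pre-split HBase table

hAdmin.createTable(tableDesc, splitKeys);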
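The split-key array passed to createTable has to come from somewhere. One possible way to produce it, shown here as a sketch and not as what HexStringSplit does internally, is to cut the 8-digit hexadecimal key space into equal ranges. SplitKeySketch and hexSplitKeys are hypothetical names, and the code uses only the Java standard library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical generator of evenly spaced split points for rowkeys that are 8-digit hex strings.
class SplitKeySketch {
    // Returns numRegions - 1 boundaries dividing 00000000..ffffffff into numRegions equal ranges.
    static byte[][] hexSplitKeys(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long range = 0x100000000L; // 2^32 possible 8-digit hex values
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            long boundary = range * i / numRegions;
            keys[i - 1] = String.format("%08x", boundary).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] key : hexSplitKeys(4)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
    }
}
```

This only helps if the rowkeys really are uniformly distributed hex strings, for example hashed keys as discussed in the RowKey design section.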

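6.3 RowKey design

The rowkey uniquely identifies a row, and which partition the row is stored in depends on which pre-split range its rowkey falls into. The main goal of rowkey design is to spread data evenly across all regions and so, to some extent, prevent data skew. The following are commonly used rowkey design schemes.

1. Random numbers, hashes, digests

For example:

An original rowKey of 1001 becomes, after SHA1: dd01903921ea24941c26a48f2cec24e0bb0e8cc7

An original rowKey of 3001 becomes, after SHA1: 49042c54de64a1e9bf0b33e00245660ef92dc7bd

An original rowKey of 5001 becomes, after SHA1: 7b61dec07e02c188790670af43e717f0f46e8913

Before doing this, we usually draw a sample from the data set to decide which hashed rowKeys should serve as the boundary values of each partition.

2. String reversal

20170524000001 becomes 10000042507102

20170524000002 becomes 20000042507102

3. String concatenation

20170524000001_a12e

20170524000001_93i7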
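The three schemes above can be sketched with the Java standard library. RowKeySketch and its method names are hypothetical. The suffix used in withHashSuffix (the first four hex digits of the key's SHA-1) is an assumption made for illustration, since the text does not say how suffixes such as a12e are produced.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helpers for the three rowkey schemes.
class RowKeySketch {
    // Scheme 1: replace the key with the hex form of its SHA-1 digest.
    static String sha1Hex(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is available on every standard Java platform
        }
    }

    // Scheme 2: reverse the key so the fast-changing digits come first.
    static String reversed(String key) {
        return new StringBuilder(key).reverse().toString();
    }

    // Scheme 3: append a short suffix; here it is derived from the key's hash (an assumption).
    static String withHashSuffix(String key) {
        return key + "_" + sha1Hex(key).substring(0, 4);
    }

    public static void main(String[] args) {
        System.out.println(sha1Hex("1001"));
        System.out.println(reversed("20170524000001"));
        System.out.println(withHashSuffix("20170524000001"));
    }
}
```

Hashing gives the most even spread but makes range scans over the original key order impossible, while reversal keeps the original key recoverable.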

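6.4 Memory tuning

HBase needs a lot of memory while it runs, since tables can be cached in memory, and typically about 70% of the available memory is given to the HBase Java heap. A very large heap is not recommended, however, because long GC pauses leave the RegionServer unavailable for extended periods; 16~48 GB is generally enough. If the framework takes so much memory that the system itself runs short, the framework will be dragged down along with the system services.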

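6.5 Basic tuning

1. Allow appending to files in HDFS

hdfs-site.xml, hbase-site.xml

Property: dfs.support.append

Explanation: enabling HDFS append/sync works well with HBase data synchronization and persistence. The default is true.

2. Tune the maximum number of files a DataNode may have open

hdfs-site.xml

Property: dfs.datanode.max.transfer.threads

Explanation: HBase generally operates on a large number of files at the same time. Depending on the number and size of clusters and on the workload, set this to 4096 or higher. Default: 4096

3. Tune the wait time for high-latency data operations

hdfs-site.xml

Property: dfs.image.transfer.timeout

Explanation: if a particular data operation has very high latency and the socket needs to wait longer, raise this value (default 60000 ms) so that the socket does not time out.

4. Improve write efficiency

mapred-site.xml

Properties:

mapreduce.map.output.compress

mapreduce.map.output.compress.codec

Explanation: enabling these two properties can greatly improve write efficiency and reduce write time. Set the first to true and the second to org.apache.hadoop.io.compress.GzipCodec or another compression codec.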

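5. Set the number of RPC handlers

hbase-site.xml

Property: hbase.regionserver.handler.count

Explanation: the default is 30. It specifies the number of RPC handlers and can be adjusted to the client request volume; increase it when there are many read and write requests.

6. Tune the HStore file size

hbase-site.xml

Property: hbase.hregion.max.filesize

Explanation: the default is 10737418240 (10 GB). If you need to run MR jobs on HBase you can lower this value, because one region corresponds to one map task and an oversized region makes the map task run too long. The value means that when an HFile reaches this size, its region is split into two regions.

7. Tune the HBase client buffer

hbase-site.xml

Property: hbase.client.write.buffer

Explanation: specifies the HBase client write buffer. Increasing it reduces the number of RPC calls but uses more memory, and vice versa. A reasonably sized buffer is usually set in order to cut down on RPCs.

8. Set the number of rows fetched by scan.next

hbase-site.xml

Property: hbase.client.scanner.caching

Explanation: specifies the default number of rows fetched by the scan.next method; the larger the value, the more memory is used.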

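9. The flush, compact and split mechanisms

When a MemStore reaches its threshold, its data is flushed into a StoreFile. Compaction merges the small files produced by flushes into larger StoreFiles. A split divides a Region in two once it reaches its size threshold.

Properties involved:

hbase.hregion.memstore.flush.size: 134217728

That is, 128 MB is the default MemStore threshold. When the combined size of all MemStores in a single HRegion exceeds this value, all MemStores of that HRegion are flushed. The RegionServer handles flushes asynchronously by adding requests to a queue, following a producer-consumer model. This raises a problem: if the queue cannot be drained fast enough and requests pile up, memory usage can climb sharply, and in the worst case this triggers an OOM.

hbase.regionserver.global.memstore.upperLimit: 0.4

hbase.regionserver.global.memstore.lowerLimit: 0.38

That is, when total MemStore memory usage reaches the value given by hbase.regionserver.global.memstore.upperLimit, multiple MemStores are flushed to files, in descending order of size, until MemStore memory usage is slightly below lowerLimit.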
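The selection rule in the last paragraph (flush the largest MemStores first until usage drops below the lower limit) can be sketched as follows. GlobalFlushSketch and chooseFlushes are hypothetical names, sizes are plain numbers in arbitrary units, and the exact boundary handling (reaching the upper limit starts flushing, dropping strictly below the lower limit stops it) is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of choosing which MemStores to flush under global memory pressure.
class GlobalFlushSketch {
    // Returns the sizes chosen for flushing, largest first, until the remaining total is below lowerLimit.
    static List<Long> chooseFlushes(List<Long> memstoreSizes, long upperLimit, long lowerLimit) {
        long total = 0;
        for (long size : memstoreSizes) {
            total += size;
        }
        List<Long> flushed = new ArrayList<>();
        if (total < upperLimit) {
            return flushed; // upper limit not reached: nothing to flush
        }
        List<Long> bySizeDescending = new ArrayList<>(memstoreSizes);
        bySizeDescending.sort(Comparator.reverseOrder());
        for (long size : bySizeDescending) {
            if (total < lowerLimit) {
                break; // usage is now below the lower limit
            }
            flushed.add(size);
            total -= size;
        }
        return flushed;
    }

    public static void main(String[] args) {
        List<Long> sizes = new ArrayList<>();
        sizes.add(100L);
        sizes.add(300L);
        sizes.add(200L);
        sizes.add(50L);
        System.out.println(chooseFlushes(sizes, 600L, 400L));
    }
}
```

Flushing the largest MemStores first frees the most memory with the fewest flush operations, which is why a small gap between the two limits (0.4 and 0.38) usually requires only a few flushes.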


Source of this article: https://blog.csdn.net/zuochang_liu/article/details/81452124?ops_request_misc=&request_id=&biz_id=102&utm_term=Hbase&utm_medium=distribute.pc_search_result.none-task-blog-2~all~sobaiduweb~default-3-81452124.pc_search_result_before_js&spm=1018.2226.3001.4187
