
Hypertable Tools: csdump

Command: csdump [options] <filename>

This command dumps the contents of the CellStore file with the given name in the DFS.

Options:

  -a [ --all ]   Dump everything, including key/value pairs

  -c [ --compact ] Only prints the cellstore name and a status ('ok' or 'corrupt')

  -c [ --count ]   Count the number of key/value pairs

  --column-id-map arg Column family id to name map, format = <id>=<name>[,<id>=<name>...]

  --end-key arg   Ignore keys that are greater than <arg>

  --start-key arg Ignore keys that are less than or equal to <arg>

  --tsv-format Output data in TSV format

  --dfs arg DFS client endpoint in <host:port> format

  --dfs-timeout arg  Timeout in milliseconds for DFS client connections

  -h [ --help ] Show this help message and exit

  --help-config Show help message for config properties

  --version Show version information and exit

  -v [ --verbose ] Show more verbose output

  --debug Show debug output (shortcut of --logging-level debug)

  --quiet Negate verbose

  --silent Show as little output as possible

  -l [ --logging-level ] arg (=info)  Logging level: debug, info, notice, warn, error, crit, alert, fatal

  --config arg (=/home/cloudil/work/hypertable/0.9.7.3/conf/hypertable.cfg) Configuration file.

  --induce-failure arg Arguments for inducing failure

  --workers arg Number of worker threads

  --reactors arg Number of reactor threads

  -t [ --timeout ] arg System wide timeout in milliseconds

1 The all option

Dumps everything in the CellStore file, including the cells (K/V pairs), BLOCK INDEX, TRAILER, BLOOM FILTER and other information.

Example: csdump --all /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1; the output looks like this:

……

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908151 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908149 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908147 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908145 INSERT

……

BLOCK INDEX:

0: offset=0 size=5632 row=key2450

1: offset=5632 size=65024 row=key2456

2: offset=70656 size=66560 row=key246

……

582: offset=38261248 size=65536 row=key996

583: offset=38326784 size=65536 row=key997

584: offset=38392320 size=65536 row=key999

585: offset=38457856 size=6144 row=key999

sizeof(OffsetT) = 4

BLOOM FILTER SIZE: 0

REPLACED FILES: 

TRAILER:

[CellStoreTrailerV6]

  trailer_checksum: fee18b28

  fix_index_offset: 38464000

  var_index_offset: 38466560

  filter_offset: 38477312

  replaced_files_offset: 38482944

  index_entries: 586

  total_entries: 4609454

  filter_length: 44177

  filter_items_estimate: 4609

  filter_items_actual: 4608455

  replaced_files_length: 0

  replaced_files_entries: 0

  blocksize: 65536

  revision: 1367142964771348076

  timestamp_min: 1845830851000000003

  timestamp_max: 1845830851000003083

  expiration_time: -9223372036854775807

  create_time: 1367142969901644000

  expirable_data: 0

  delete_count: 0

  key_bytes: 156446558

  value_bytes: 308833418

  table_id: 33

  table_generation: 1

  flags=6

  alignment=512

  compression_ratio: 0.0816307

  compression_type: 5

  key_compression_scheme: 1

  bloom_filter_mode=ROWS

  bloom_filter_hash_count=6

  version: 6

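Experiments show:

This option gives a clear view of the internal structure of a CellStore file. The K/V pairs shown are the cells, sorted lexicographically by row key, and their number matches the result of the command's count option.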
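To check the cell count against the count option, the cell lines can be picked out of the --all output and counted. The following is a minimal Python sketch. The regular expression is inferred from the sample lines above and is an assumption, not an official format specification; it does not handle row keys or qualifiers that themselves contain a single quote.

```python
import re

# Pattern inferred from the sample output; not an official format specification.
CELL_RE = re.compile(
    r"control=\((?P<control>[^)]*)\)\s+"
    r"row='(?P<row>[^']*)'\s+"
    r"family=(?P<family>\d+)\s+"
    r"qualifier='(?P<qualifier>[^']*)'\s+"
    r"ts=(?P<ts>-?\d+)\s+"
    r"rev=(?P<rev>-?\d+)\s+"
    r"(?P<flag>\w+)"
)


def parse_cell(line):
    """Return a dict for one cell line, or None if the line is not a cell."""
    m = CELL_RE.match(line.strip())
    if m is None:
        return None
    cell = m.groupdict()
    for field in ("family", "ts", "rev"):
        cell[field] = int(cell[field])
    return cell


sample = """\
control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908151 INSERT
control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908149 INSERT
BLOCK INDEX:
0: offset=0 size=5632 row=key2450
"""

# Lines that are not cells (section headers, index entries) are skipped.
cells = [c for c in map(parse_cell, sample.splitlines()) if c is not None]
print(len(cells))
print(cells[0]["row"], cells[0]["rev"])
```

Running the same filter over a complete dump gives a number that can be compared with total_entries in the TRAILER section.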

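The BLOCK INDEX section lists the block index of the CellStore file. offset is the offset of each block within the file; row is the last row key in each block, that is, the largest row key in the block.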
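Because every index entry records the largest row key of its block, a lookup can binary-search the index for the first block whose last row is not smaller than the wanted row. The following Python sketch shows the idea on three entries from the sample output. The helper names parse_block_index and find_block are made up for this illustration and are not part of Hypertable; the sketch also assumes row keys compare as plain ASCII strings.

```python
import bisect
import re

# Line shape taken from the sample BLOCK INDEX output.
INDEX_RE = re.compile(r"(\d+): offset=(\d+) size=(\d+) row=(.*)")


def parse_block_index(text):
    """Return a list of (block_no, offset, size, last_row) tuples."""
    entries = []
    for line in text.splitlines():
        m = INDEX_RE.fullmatch(line.strip())
        if m:
            entries.append(
                (int(m.group(1)), int(m.group(2)), int(m.group(3)), m.group(4))
            )
    return entries


def find_block(entries, row):
    """Return the first block whose last row is >= row, or None if row is past the end."""
    last_rows = [e[3] for e in entries]
    i = bisect.bisect_left(last_rows, row)
    return entries[i] if i < len(entries) else None


sample = """\
0: offset=0 size=5632 row=key2450
1: offset=5632 size=65024 row=key2456
2: offset=70656 size=66560 row=key246
"""

entries = parse_block_index(sample)

# In the sample, each block starts where the previous one ends.
for prev, cur in zip(entries, entries[1:]):
    assert prev[1] + prev[2] == cur[1]

print(find_block(entries, "key2453"))
```

Note that "key2456" sorts before "key246" because the comparison is character by character, not numeric. When several consecutive blocks end with the same row (as blocks 584 and 585 do in the sample), this lookup returns the first of them and the row may continue into the following blocks.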

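Each CellStore file can hold data from several adjacent ranges, but only the data of a single access group within a range. In other words, a CellStore file can span ranges but cannot span access groups. The BlockCount field in the METADATA table gives the number of blocks a range occupies in the CellStore file.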

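2 The tsv-format option

Similar to the all option, but dumps only the cells (K/V pairs) and not the BLOCK INDEX, TRAILER, BLOOM FILTER and other information. The cells are written in TSV format.

Example: csdump --tsv-format /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1; the output looks like this: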

#timestamp      row     column  value

1845830851000002450     key2450 1:key2450       abc

1845830851000002450     key2450 1:key2450       abc

1845830851000002450     key2450 1:key2450       abc

……
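The TSV output is easy to post-process. The following Python sketch reads it with the standard csv module. It assumes the fields are separated by tab characters and that the column field has the form "<family id>:<qualifier>", as the sample suggests; the function name read_tsv_cells is made up for this illustration.

```python
import csv
import io


def read_tsv_cells(text):
    """Yield one dict per data row of tab-separated csdump output."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if header is None:
            # The first non-empty line is the header; drop the leading '#'.
            header = [name.lstrip("#") for name in fields]
            continue
        cell = dict(zip(header, fields))
        cell["timestamp"] = int(cell["timestamp"])
        # Assumption: the column is "<family id>:<qualifier>", as in the sample.
        family, _, qualifier = cell["column"].partition(":")
        cell["family"], cell["qualifier"] = family, qualifier
        yield cell


sample = (
    "#timestamp\trow\tcolumn\tvalue\n"
    "1845830851000002450\tkey2450\t1:key2450\tabc\n"
    "1845830851000002450\tkey2450\t1:key2450\tabc\n"
)

for cell in read_tsv_cells(sample):
    print(cell["row"], cell["family"], cell["qualifier"], cell["value"])
```

The family is kept as a string here because the column-id-map option can replace the numeric id with a name.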

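3 The start-key and end-key options

start-key: dumps only the cells (K/V pairs) whose key is greater than the given value; end-key: dumps only the cells whose key is less than or equal to the given value. Both options must be combined with either the all or the tsv-format option.

Example: csdump --tsv-format --start-key key2452 --end-key key2454 /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1; this prints the cells with key2452 < row <= key2454.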
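The range is therefore open on the left and closed on the right. The following Python sketch mimics that filter on a few row keys. It is not taken from the csdump source, and it assumes the keys are compared as plain ASCII strings; the row keys other than key2452 and key2454 are invented for the illustration.

```python
def in_dump_range(row, start_key=None, end_key=None):
    """Mimic the documented filter: keep rows with start_key < row <= end_key."""
    if start_key is not None and row <= start_key:
        return False  # --start-key ignores keys less than or equal to it
    if end_key is not None and row > end_key:
        return False  # --end-key ignores keys greater than it
    return True


rows = ["key2451", "key2452", "key2453", "key24530", "key2454", "key24541", "key2455"]
kept = [r for r in rows if in_dump_range(r, "key2452", "key2454")]
print(kept)  # ['key2453', 'key24530', 'key2454']
```

The start key itself is excluded while the end key is included. Because the ordering is lexicographic, a longer key such as key24530 also falls inside the range, whereas key24541 sorts after key2454 and is dropped.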

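4 The compact option

Prints the name of the CellStore file and its status (ok or corrupt), so this option can be used to check whether a CellStore file is damaged.

Example: csdump --compact /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1; the output looks like this: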

/hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1: ok
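When many CellStore files are checked in a script, each result line can be split into path and status. The following Python sketch assumes the line always has the form "<path>: <status>" shown above, with the status being one of the two values named in the help text; the function name parse_compact_line is made up for this illustration.

```python
def parse_compact_line(line):
    """Split '<cellstore path>: <status>' into (path, status)."""
    # rpartition splits on the last ': ', so a colon inside the path is preserved.
    path, sep, status = line.strip().rpartition(": ")
    if not sep or status not in ("ok", "corrupt"):
        raise ValueError(f"unexpected csdump --compact line: {line!r}")
    return path, status


sample = "/hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1: ok"
path, status = parse_compact_line(sample)
print(path, status == "ok")
```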

5 Other options

The DFS-related options and the remaining options are used in the same way as for the dumplog command.