Hypertable Tools: csdump

Command: csdump [options] <filename>

This command dumps the contents of the named CellStore file stored in the DFS.

Options:

  -a [ --all ]   Dump everything, including key/value pairs

  -c [ --compact ] Only prints the cellstore name and a status ('ok' or 'corrupt')

  -c [ --count ]   Count the number of key/value pairs

  --column-id-map arg Column family id to name map, format = <id>=<name>[,<id>=<name>...]

  --end-key arg   Ignore keys that are greater than <arg>

  --start-key arg Ignore keys that are less than or equal to <arg>

  --tsv-format Output data in TSV format

  --dfs arg DFS client endpoint in <host:port> format

  --dfs-timeout arg  Timeout in milliseconds for DFS client connections

  -h [ --help ] Show this help message and exit

  --help-config Show help message for config properties

  --version Show version information and exit

  -v [ --verbose ] Show more verbose output

  --debug Show debug output (shortcut of --logging-level debug)

  --quiet Negate verbose

  --silent Show as little output as possible

  -l [ --logging-level ] arg (=info)  Logging level: debug, info, notice, warn, error, crit, alert, fatal

  --config arg (=/home/cloudil/work/hypertable/0.9.7.3/conf/hypertable.cfg) Configuration file.

  --induce-failure arg Arguments for inducing failure

  --workers arg Number of worker threads

  --reactors arg Number of reactor threads

  -t [ --timeout ] arg System wide timeout in milliseconds

1 The all option

Dumps everything in the CellStore file, including the cells (key/value pairs), the BLOCK INDEX, the TRAILER, and the BLOOM FILTER information.

Example: csdump --all /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1 produces:

……

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908151 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908149 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908147 INSERT

control=(REV|TS) row='key2451' family=1 qualifier='key2451' ts=1845830851000002451 rev=1367140356873908145 INSERT

……

BLOCK INDEX:

0: offset=0 size=5632 row=key2450

1: offset=5632 size=65024 row=key2456

2: offset=70656 size=66560 row=key246

……

582: offset=38261248 size=65536 row=key996

583: offset=38326784 size=65536 row=key997

584: offset=38392320 size=65536 row=key999

585: offset=38457856 size=6144 row=key999

sizeof(OffsetT) = 4

BLOOM FILTER SIZE: 0

REPLACED FILES: 

TRAILER:

[CellStoreTrailerV6]

  trailer_checksum: fee18b28

  fix_index_offset: 38464000

  var_index_offset: 38466560

  filter_offset: 38477312

  replaced_files_offset: 38482944

  index_entries: 586

  total_entries: 4609454

  filter_length: 44177

  filter_items_estimate: 4609

  filter_items_actual: 4608455

  replaced_files_length: 0

  replaced_files_entries: 0

  blocksize: 65536

  revision: 1367142964771348076

  timestamp_min: 1845830851000000003

  timestamp_max: 1845830851000003083

  expiration_time: -9223372036854775807

  create_time: 1367142969901644000

  expirable_data: 0

  delete_count: 0

  key_bytes: 156446558

  value_bytes: 308833418

  table_id: 33

  table_generation: 1

  flags=6

  alignment=512

  compression_ratio: 0.0816307

  compression_type: 5

  key_compression_scheme: 1

  bloom_filter_mode=ROWS

  bloom_filter_hash_count=6

  version: 6

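Observations from the experiment:

This option gives a clear view of the internal structure of a CellStore file. The key/value pairs shown are the cells sorted lexicographically by row key, and their number matches the result of this command's count option.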
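
To cross-check that count from a saved dump, the cell lines can be parsed and tallied. The following is a minimal sketch, assuming each cell is printed on one line in exactly the shape shown in the sample output above; the regular expression and the name parse_cells are illustrative and not part of Hypertable, and row keys or qualifiers containing a single quote would need extra handling.

```python
import re

# Shape of one cell line, taken from the sample output above (an assumption
# for any other csdump version).
CELL_RE = re.compile(
    r"control=\((?P<control>[^)]*)\)\s+"
    r"row='(?P<row>[^']*)'\s+"
    r"family=(?P<family>\d+)\s+"
    r"qualifier='(?P<qualifier>[^']*)'\s+"
    r"ts=(?P<ts>-?\d+)\s+"
    r"rev=(?P<rev>-?\d+)\s+"
    r"(?P<flag>\w+)"
)

def parse_cells(lines):
    """Yield one dict per line that looks like a cell; skip everything else."""
    for line in lines:
        m = CELL_RE.match(line.strip())
        if m:
            cell = m.groupdict()
            for key in ("family", "ts", "rev"):
                cell[key] = int(cell[key])
            yield cell

sample = [
    "control=(REV|TS) row='key2451' family=1 qualifier='key2451' "
    "ts=1845830851000002451 rev=1367140356873908151 INSERT",
    "control=(REV|TS) row='key2451' family=1 qualifier='key2451' "
    "ts=1845830851000002451 rev=1367140356873908149 INSERT",
    "BLOCK INDEX:",  # non-cell lines are ignored
]
cells = list(parse_cells(sample))
print(len(cells), cells[0]["row"], cells[0]["flag"])
```

Feeding the whole --all output through parse_cells and taking the length gives a number that can be compared with the count option and with total_entries in the trailer.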

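The BLOCK INDEX section lists the block index entries of the CellStore file. offset is each block's offset within the file; row is the last row key in the block, that is, the largest row key in that block.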
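
In the sample above, each block's offset plus its size equals the next block's offset, and the last block ends at 38464000, which is the fix_index_offset reported in the trailer. The sketch below checks that property on dumped index lines. It is only an illustration based on this one sample: the helper names are hypothetical, and back-to-back block layout is an observation here, not a documented guarantee.

```python
import re

# "<index>: offset=<n> size=<n> row=<key>", as printed in the sample above.
BLOCK_RE = re.compile(r"^(\d+): offset=(\d+) size=(\d+) row=(.*)$")

def parse_block_index(lines):
    """Return (index, offset, size, row) tuples for lines that match."""
    entries = []
    for line in lines:
        m = BLOCK_RE.match(line.strip())
        if m:
            idx, offset, size = (int(g) for g in m.groups()[:3])
            entries.append((idx, offset, size, m.group(4)))
    return entries

def find_gaps(entries):
    """Return (a, b) index pairs where block a does not end at block b's offset.

    Only pairs with consecutive index numbers are compared, so an elided
    dump (such as the excerpt above) does not produce false alarms.
    """
    gaps = []
    for (i, off, size, _), (j, next_off, _, _) in zip(entries, entries[1:]):
        if j == i + 1 and off + size != next_off:
            gaps.append((i, j))
    return gaps

sample = [
    "0: offset=0 size=5632 row=key2450",
    "1: offset=5632 size=65024 row=key2456",
    "2: offset=70656 size=66560 row=key246",
    "582: offset=38261248 size=65536 row=key996",
    "583: offset=38326784 size=65536 row=key997",
    "584: offset=38392320 size=65536 row=key999",
    "585: offset=38457856 size=6144 row=key999",
]
entries = parse_block_index(sample)
print(find_gaps(entries))                 # gaps among the listed blocks
print(entries[-1][1] + entries[-1][2])    # end of the last block
```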

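Each CellStore file can hold data for several adjacent ranges, but only for a single Access Group of those ranges; in other words, a CellStore file can span ranges but cannot span Access Groups. The BlockCount field in the METADATA table gives the number of blocks a range occupies in the CellStore file.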

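2 The tsv-format option

Similar to the all option, but dumps only the cells (key/value pairs), in TSV format, and omits the BLOCK INDEX, TRAILER, BLOOM FILTER, and other information.

Example: csdump --tsv-format /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1 produces: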

#timestamp      row     column  value

1845830851000002450     key2450 1:key2450       abc

1845830851000002450     key2450 1:key2450       abc

1845830851000002450     key2450 1:key2450       abc

……
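
The TSV output is easy to load into a script. The following is a minimal sketch, assuming the fields are separated by a tab character, that every data line has exactly the four fields named in the header, and that the column field has the form <family id>:<qualifier>; the function name parse_tsv is illustrative.

```python
import csv
import io

def parse_tsv(text):
    """Parse csdump --tsv-format output into a list of dicts."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for rec in reader:
        if not rec or rec[0].startswith("#"):
            continue  # skip blank lines and the "#timestamp ..." header
        ts, row, column, value = rec  # assumes exactly four fields
        family, _, qualifier = column.partition(":")
        rows.append({
            "ts": int(ts),
            "row": row,
            "family": family,
            "qualifier": qualifier,
            "value": value,
        })
    return rows

sample = (
    "#timestamp\trow\tcolumn\tvalue\n"
    "1845830851000002450\tkey2450\t1:key2450\tabc\n"
    "1845830851000002450\tkey2450\t1:key2450\tabc\n"
)
rows = parse_tsv(sample)
print(len(rows), rows[0])
```

The family field stays a string because it is a numeric id here but would be a name if --column-id-map were used.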

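3 The start-key and end-key options

start-key: dump only the cells (key/value pairs) whose key is greater than the argument; end-key: dump only the cells whose key is less than or equal to the argument. Both options must be combined with the all or tsv-format option.

Example: csdump --tsv-format --start-key key2452 --end-key key2454 /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1 outputs the cells with key2452 < row <= key2454.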
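
The interval is open at the start and closed at the end, and the comparison is on key strings rather than numbers. The sketch below mimics that filter on a handful of made-up row keys; it assumes a plain byte-wise comparison of row keys, which is an assumption about csdump's internals, and in_dump_range is a hypothetical helper.

```python
def in_dump_range(row, start_key=None, end_key=None):
    """True if start_key < row <= end_key (either bound may be omitted)."""
    if start_key is not None and row <= start_key:
        return False  # keys less than or equal to start-key are ignored
    if end_key is not None and row > end_key:
        return False  # keys greater than end-key are ignored
    return True

# Made-up row keys, for illustration only.
rows = [b"key2451", b"key2452", b"key24520", b"key2453",
        b"key2454", b"key24540", b"key2455"]
selected = [r for r in rows if in_dump_range(r, b"key2452", b"key2454")]
print(selected)  # → [b'key24520', b'key2453', b'key2454']
```

Under byte-wise ordering, key24520 sorts after key2452 and is kept, while key24540 sorts after key2454 and is dropped.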

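4 The compact option

Prints the name of the CellStore file and its status (ok or corrupt), so this option can be used to check whether a CellStore file is corrupted.

Example: csdump --compact /hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1 produces: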

/hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1: ok
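
When many CellStore files are checked this way, the one-line results can be collected and filtered. The following is a minimal sketch, assuming every result line has the form "<path>: <status>" as in the example above; the helper names and the second input line are hypothetical.

```python
def parse_compact_line(line):
    """Split a 'path: status' line into (path, status)."""
    path, sep, status = line.strip().rpartition(": ")
    if not sep or status not in ("ok", "corrupt"):
        raise ValueError(f"unexpected csdump --compact line: {line!r}")
    return path, status

def corrupt_files(lines):
    """Return the paths whose reported status is 'corrupt'."""
    return [p for p, s in map(parse_compact_line, lines) if s == "corrupt"]

results = [
    "/hypertable/tables/2/33/f/qyoNKN5rd__dbHKv/cs1: ok",
    "/hypothetical/path/cs2: corrupt",  # made-up line, for illustration only
]
print(corrupt_files(results))
```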

5 Other options

The DFS-related options and the remaining options are used in the same way as in the dumplog command.