天天看點

MySQL · myrocks · collation 限制

背景

MyRocks中的資料是按索引列以memcmp方式進行排序的。對于一些數字類型,需要進行轉化才能直接通過memcmp進行比較, 例如有符号數在計算機中是用補碼表示的,那麼如果負數和正數直接按位元組比較,結果負數會比正數大,實際存儲時會将符号會反轉存儲,讀取時再轉化回來。對于字元類型,處理更加複雜,涉及到字元集的轉換。 記錄格式可以參考

[1]

,

[2]

MyRocks索引字段如果包含字元類型,預設隻支援binary collation,binary、latin1_bin、 utf8_bin其中的一種

## Error
create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set gbk;
ERROR 3046 (HY000): Unsupported collation on string indexed column test.t1.c2 Use binary collation (binary, latin1_bin, utf8_bin).

## Error
create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set utf8 collate utf8_general_ci;
ERROR 3046 (HY000): Unsupported collation on string indexed column test.t1.c2 Use binary collation (binary, latin1_bin, utf8_bin).

## OK
create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set utf8 collate utf8_bin;
Query OK, 0 rows affected (0.00 sec)
           

通過設定rocksdb_strict_collation_check參數為OFF可以跳出binary collation的限制

set global rocksdb_strict_collation_check=OFF;

## OK
create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set gbk;
Query OK, 0 rows affected (0.01 sec)

           

問題

MyRocks和InnoDB一樣支援covering index. MyRocks在使用二級索引查詢的時候,應盡量使用

covering index

, 因為MyRocks回表通過主鍵随機查詢資料的開銷比較大。

例如以下場景,idx1作為convering index被使用

create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set utf8 collate utf8_bin;

insert into t1 values(1,'ab');
insert into t1 values(2,'cd');

## covering index
explain select c2 from t1 where c2='ab';
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | const | c2            | c2   | 33      | const |    1 | Using index |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
           

然而設定set global rocksdb_strict_collation_check=OFF;在某些情況下會導緻我們無法使用covering index.

set global rocksdb_strict_collation_check=ON;

## Error
create table t1(c1 int primary key, c2 int, c3 varchar(10), key idx1(c2,c3)) engine =rocksdb  character set utf8 collate utf8_general_ci;
ERROR 3046 (HY000): Unsupported collation on string indexed column test.t1.c3 Use binary collation (binary, latin1_bin, utf8_bin).

set global rocksdb_strict_collation_check=OFF;

## OK
create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set utf8 collate utf8_general_ci;
Query OK, 0 rows affected (0.00 sec)

insert into t1 values(1,1,'ab');
insert into t1 values(2,2,'cd');


insert into t1 values(1,'ab');
insert into t1 values(2,'cd');

## non-covering index
explain select c2 from t1 where c2='ab';
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | t1    | const | c2            | c2   | 33      | const |    1 | NULL  |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
1 row in set (0.00 sec)
           

原因

MyRocks二級索引由于collation的關系導緻查詢沒有走covering index. MyRocks中索引列需要轉化為memcomparable的形式,轉化分為以下三種情況

  • 1) 直接轉換,不需要存儲額外資訊

    例如 binary、latin1_bin、 utf8_bin三種collation可以直接轉換

    這種情況二級索引列資料可以完整的從二級索引中取到,不影響covering index使用

  • 2) 間接轉換,需在value中增加unpack_info

    例如latin1_general_ci,latin2_general_ci, ascii_general_ci,greek_general_ci等collation,具體可以參考函數rdb_is_collation_supported

    這種情況二級索引列資料可以從key和unpack_info中解析取到,也不影響covering index使用

create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set latin1 collate latin1_general_ci;

insert into t1 values(1,'ab');
insert into t1 values(2,'cd');

## covering index
explain select c2 from t1 where c2='ab';
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | t1    | const | c2            | c2   | 13      | const |    1 | Using index |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)
           
  • 3) 無法轉換

    除1,2兩種情況外的collation, 例如utf8_general_ci

    此時從二級索引中擷取不到key的完整資訊,需要從主鍵索引上擷取,是以不能走covering index

create table t1(c1 int primary key, c2 varchar(10) unique) engine =rocksdb character set utf8 collate utf8_general_ci;

 insert into t1 values(1,'ab');
 insert into t1 values(2,'cd');

## non-covering index
explain select c2 from t1 where c2='ab';
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
| id | select_type | table | type  | possible_keys | key  | key_len | ref   | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
|  1 | SIMPLE      | t1    | const | c2            | c2   | 33      | const |    1 | NULL  |
+----+-------------+-------+-------+---------------+------+---------+-------+------+-------+
1 row in set (0.00 sec)
           

此時的資料擷取路徑如下

0  myrocks::ha_rocksdb::convert_record_from_storage_format 
1  myrocks::ha_rocksdb::get_row_by_rowid
2  myrocks::ha_rocksdb::get_row_by_rowid
3  myrocks::ha_rocksdb::read_row_from_secondary_key 
4  myrocks::ha_rocksdb::index_read_map_impl
5  myrocks::ha_rocksdb::index_read_map
6  handler::index_read_idx_map
7  handler::ha_index_read_idx_map
8  join_read_const
9  join_read_const_table
10 make_join_statistics
11 JOIN::optimize
12 mysql_execute_select
13 mysql_select
14 handle_select
15 execute_sqlcom_select
16 mysql_execute_command
           

轉換的具體實作可以參考函數Rdb_field_packing::setup

MyRocks從索引讀取資料時,不能僅通過keyread_only來判斷是否可以使用covering index, 還需要判斷是否存在collation列資料轉換的問題,如果通路的列無法轉換就不能使用covering index.

MyRocks會在value中存儲covered_bitmap,表示索引列是否可以轉換, read_row_from_secondary_key/secondary_index_read讀取時會根據covered_bitmap來決定是否能使用covering index

bool Rdb_key_def::covers_lookup(TABLE *const table,
                                const rocksdb::Slice *const unpack_info,
                                const MY_BITMAP *const lookup_bitmap) const {
  DBUG_ASSERT(lookup_bitmap != nullptr);
  if (!use_covered_bitmap_format() || lookup_bitmap->bitmap == nullptr) {
    return false;
  }

  Rdb_string_reader unp_reader = Rdb_string_reader::read_or_empty(unpack_info);

  // Check if this unpack_info has a covered_bitmap
  const char *unpack_header = unp_reader.get_current_ptr();
  const bool has_covered_unpack_info =
      unp_reader.remaining_bytes() &&
      unpack_header[0] == RDB_UNPACK_COVERED_DATA_TAG;
  if (!has_covered_unpack_info ||
      !unp_reader.read(RDB_UNPACK_COVERED_HEADER_SIZE)) {
    return false;
  }

  MY_BITMAP covered_bitmap;
  my_bitmap_map covered_bits;
  bitmap_init(&covered_bitmap, &covered_bits, MAX_REF_PARTS, false);
  covered_bits = rdb_netbuf_to_uint16((const uchar *)unpack_header +
                                      sizeof(RDB_UNPACK_COVERED_DATA_TAG) +
                                      RDB_UNPACK_COVERED_DATA_LEN_SIZE);

  return bitmap_is_subset(lookup_bitmap, &covered_bitmap);
}
           

總結

MyRocks在字元collation上的限制需要引起我們關注,使用不當會影響查詢效率。