WiredTiger from a User's Perspective -- Storage options (1) -- Schema, Column Groups, Indices, and Projections

This article covers schemas, columns, column groups, indices, and projections.

WiredTiger supports not only simple key/value tables but also tables with a defined schema.

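Tables, rows and columns: WiredTiger supports both row-oriented and column-oriented storage. Row-oriented storage keeps all the columns of a row together, then moves on to the next row; column-oriented storage keeps all the values of one column together, then moves on to the next column. The two can also be mixed. A table can be split into one or more column groups; every column of the table must appear in at least one column group, and each column group is keyed by the table's primary key (the unique key). A table can have zero or more indices, which make it fast to look up rows in an order other than that of the primary key.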
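To make the difference between the two layouts concrete, here is a minimal sketch in plain C. It does not use WiredTiger at all: the `employee` struct and both function names are hypothetical, and the flat `int32_t` buffer only stands in for "storage order", not for WiredTiger's on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-column row, used only for this illustration. */
struct employee {
    int32_t salary;
    int32_t year_started;
};

/*
 * Row-oriented order: all columns of row 0, then all columns of row 1, ...
 * The out buffer must have room for 2 * n values.
 */
void
layout_row_major(const struct employee *rows, size_t n, int32_t *out)
{
    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[2 * i] = rows[i].salary;
        out[2 * i + 1] = rows[i].year_started;
    }
}

/*
 * Column-oriented order: every salary first, then every year_started.
 * The out buffer must have room for 2 * n values.
 */
void
layout_column_major(const struct employee *rows, size_t n, int32_t *out)
{
    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = rows[i].salary;
        out[n + i] = rows[i].year_started;
    }
}
```

With the column-oriented order, a scan that only needs `salary` touches one contiguous run of values, which is the intuition behind splitting a table into column groups.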

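Column types: by default, WiredTiger works as a traditional key/value store in which keys and values are raw byte arrays accessed through the WT_ITEM structure. Keys and values can also be given types, and each can be made up of one or more columns.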

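Key/Value pairs: underneath, WiredTiger files store data as key/value pairs, and a file cursor can traverse the pairs in a file directly. An index, for example, is stored as key/value pairs in its own file; reading that file directly avoids the errors that arise when the index is inconsistent with the table's data (which can happen because, for performance reasons, an index can be configured so that updates do not touch its entries). In a row store, keys and values are variable-length byte strings, each up to 4GB - 512B long. In a column store, the key is a 64-bit record number and the value is either a variable-length byte string (up to 4GB - 512B) or a fixed-length bit field of 1 to 8 bits.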

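Format types: the type of each column is described with a format string similar to those of Python's struct module. The 'r' type is used for record-number keys in column stores; apart from that it is identical to the 'Q' type, which represents a uint64_t. The remaining types are omitted here.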

Packing and Unpacking Data: this section describes how data is packed according to a format string. The packed form sorts naturally in lexicographic order, and it uses a variable-sized encoding of values to reduce the data size. Data can be packed and unpacked with wiredtiger_struct_pack and wiredtiger_struct_unpack (see ex_pack.c for an example).
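Why byte layout matters for lexicographic sorting can be shown with a small plain-C sketch. Note the assumption: this uses a fixed-width big-endian encoding, which is not WiredTiger's actual packed format (that one is variable-sized). The function names are hypothetical; the point is only that when the most significant byte comes first, a bytewise comparison agrees with numeric order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode v with the most significant byte first (big-endian). */
void
encode_be32(uint32_t v, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/*
 * Compare two integers by comparing their encoded bytes only.
 * The sign of the result matches the numeric comparison of a and b.
 */
int
compare_encoded(uint32_t a, uint32_t b)
{
    uint8_t ea[4], eb[4];

    encode_be32(a, ea);
    encode_be32(b, eb);
    return (memcmp(ea, eb, sizeof(ea)));
}
```

A storage engine that keeps keys in such a form can order them with a plain byte comparison and never needs to decode them first.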

The WT_COLLATOR struct provides an interface for defining a custom sort order for the records in a table.

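Key and value formats: as noted under Column types above, a key or a value can consist of one or more columns. The column types are set by passing the key_format and value_format parameters to WT_SESSION::create. For example, a row-store table whose key and value are each a single variable-length string is created with the configuration "key_format=S,value_format=S".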

A column-store table whose value is a single variable-length string is created with the configuration "key_format=r,value_format=S".

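Cursor formats: a cursor's key format is the same as the table's. WT_CURSOR::set_key sets the cursor's key and WT_CURSOR::get_key reads it. Both take one argument per column in the key, so the argument count varies from table to table, and they are used much like printf and scanf respectively. A cursor's value format is also the same as the table's, unless a projection was given to WT_SESSION::open_cursor. WT_CURSOR::set_value and WT_CURSOR::get_value set and read the cursor's value in the same way as their key counterparts.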

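Columns: column names are assigned by passing the columns parameter to WT_SESSION::create. Names are assigned first to the columns in key_format and then to the columns in value_format. Every column must be given a name, and names must be unique. For example, to create a column-store table: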
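The rule that names go to the key columns first and then to the value columns can be sketched in plain C. This is a deliberately simplified illustration, not WiredTiger's parser: it assumes every non-digit character in a format string is one column and that digits are a size prefix to be skipped, which holds for the formats used in this article (r, SiH, 5sHQ) but ignores repeat counts and byte-order prefixes. Both function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the columns described by a format string.
 * Simplification: digits are skipped, every other character is one column.
 */
size_t
count_format_columns(const char *format)
{
    size_t n = 0;

    assert(format != NULL);
    for (; *format != '\0'; format++)
        if (!isdigit((unsigned char)*format))
            n++;
    return (n);
}

/*
 * Names in the columns list are assigned to key columns first, so the
 * name at column_index belongs to the key iff it falls within the
 * number of columns in key_format.
 */
int
is_key_column(const char *key_format, size_t column_index)
{
    return (column_index < count_format_columns(key_format));
}
```

For key_format=r and value_format=SiH with columns=(id,department,salary,year-started), the first name (id) lands on the key and the other three on the value. The same count is also the number of arguments that set_key or get_key expects for that key format.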

    /*
     * Create a table with columns: keys are record numbers, values are
     * (string, signed 32-bit integer, unsigned 16-bit integer).
     */
    error_check(session->create(session, "table:mytable",
      "key_format=r,value_format=SiH,"
      "columns=(id,department,salary,year-started)"));

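Once a table has been created, WT_SESSION::create does not need to be called again. It can still be called, however, to verify that the table exists and that its schema matches. This is governed by the exclusive parameter, which defaults to false: when true, the call fails if the table already exists; when false, the call checks that the existing table's configuration matches the one given.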

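Column groups: once column names are in use, column groups can be configured. Column groups are mainly used to define the storage layout in order to tune cache behavior, because each column group is stored separately in its own file. Creating column groups takes two steps: 1. when creating the table with WT_SESSION::create (URI table:table_name), list the names of the column groups in the colgroups parameter; 2. for each column group, call WT_SESSION::create with the URI colgroup:table_name:column_group_name and use the columns configuration to name the columns it holds. Every column must appear in at least one column group, and a column that appears in several column groups is stored several times. An example: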

    /*
     * Create the population table. Keys are record numbers, the format for values is (5-byte
     * string, uint16_t, uint64_t). See ::wiredtiger_struct_pack for details of the format strings.
     */
    error_check(session->create(session, "table:poptable",
      "key_format=r,"
      "value_format=5sHQ,"
      "columns=(id,country,year,population),"
      "colgroups=(main,population)"));
    /*
     * Create two column groups: a primary column group with the country code, year and population
     * (named "main"), and a population column group with the population by itself (named
     * "population").
     */
    error_check(
      session->create(session, "colgroup:poptable:main", "columns=(country,year,population)"));
    error_check(session->create(session, "colgroup:poptable:population", "columns=(population)"));

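A column group's key is always the same as the table's key. This is particularly useful for column-store tables: the record number (the key) is not stored explicitly on disk, so the key is not duplicated across files. In a row store, by contrast, the key is stored again in every column group. A cursor on a column group is opened by passing the column group's URI to WT_SESSION::open_cursor; in this example the URIs are colgroup:poptable:main and colgroup:poptable:population.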
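The requirement that every column appear in at least one column group is easy to get wrong when a table has many columns. The following plain-C sketch checks a planned layout before any WiredTiger call is made. It is only an illustration: the `colgroup_def` struct and both functions are hypothetical helpers, not part of the WiredTiger API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one column group: a name plus its columns. */
struct colgroup_def {
    const char *name;
    const char *const *columns;
    size_t ncolumns;
};

/* Return true if the column is listed in at least one column group. */
bool
column_is_covered(const char *column, const struct colgroup_def *groups, size_t ngroups)
{
    assert(column != NULL);
    for (size_t g = 0; g < ngroups; g++)
        for (size_t c = 0; c < groups[g].ncolumns; c++)
            if (strcmp(groups[g].columns[c], column) == 0)
                return (true);
    return (false);
}

/* Return true if every value column is covered by some column group. */
bool
all_columns_covered(const char *const *value_columns, size_t nvalues,
  const struct colgroup_def *groups, size_t ngroups)
{
    for (size_t i = 0; i < nvalues; i++)
        if (!column_is_covered(value_columns[i], groups, ngroups))
            return (false);
    return (true);
}
```

For the poptable example, the groups main (country, year, population) and population (population) together cover every value column, whereas the population group on its own would not.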

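Indices: column names are also used to create and configure indices. Indices are updated automatically whenever the table is updated. A table's indices are read-only; users cannot update them directly. An index is created by calling WT_SESSION::create with the URI index:table_name:index_name and a columns configuration naming one or more columns. For example: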

    /* Create an index with a simple key. */
    error_check(session->create(session, "index:poptable:country", "columns=(country)"));
    /* Create an index with a composite key (country,year). */
    error_check(session->create(session, "index:poptable:country_plus_year", "columns=(country,year)"));

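An index is opened by passing its URI to WT_SESSION::open_cursor. The index key consists of the one or more columns named in the columns configuration when the index was created, and the value is the table's value. For example: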

    /* Search in a composite index. */
    error_check(session->open_cursor(session, "index:poptable:country_plus_year", NULL, NULL, &cursor));
    cursor->set_key(cursor, "USA\0\0", (uint16_t)1900);
    error_check(cursor->search(cursor));
    error_check(cursor->get_value(cursor, &country, &year, &population));
    printf("US 1900: country %s, year %" PRIu16 ", population %" PRIu64 "\n", country, year, population);

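Immutable indices: an index can be created with the immutable setting, which tells WiredTiger not to update the index when the value stored under a primary key is updated. This saves the remove and insert that an update would otherwise perform on the index. The cost is that, if an update does change an indexed column, the index no longer agrees with the actual data.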

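Index cursor projections: by default, an index cursor returns all of the table's value columns. Appending a parenthesized list of column names to the URI passed to WT_SESSION::open_cursor makes get_value return only those columns (a projection). If every projected column is available in the index (including the primary key columns, which are the index's values), the data is read from the index alone without touching any column group. This makes it possible to add extra columns to an index key purely to avoid reading the column groups, as long as the columns that form the real key come first.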
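The shape of a projection URI can be sketched with a small plain-C helper. The helper name `projection_uri` is hypothetical and is not part of WiredTiger; it only builds the string. The resulting URI is what would be passed as the second argument of WT_SESSION::open_cursor, and that call is deliberately left out so the sketch stays self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<base_uri>(<columns>)" into buf, where columns is a
 * comma-separated list of column names.
 * Returns 0 on success, -1 if buf is too small.
 */
int
projection_uri(char *buf, size_t buflen, const char *base_uri, const char *columns)
{
    size_t need;

    assert(buf != NULL && base_uri != NULL && columns != NULL);

    /* Two parentheses plus the terminating NUL. */
    need = strlen(base_uri) + strlen(columns) + 3;
    if (need > buflen)
        return (-1);

    (void)snprintf(buf, buflen, "%s(%s)", base_uri, columns);
    return (0);
}
```

For example, the base URI index:poptable:country_plus_year with the column list country,id gives index:poptable:country_plus_year(country,id). Both columns are available from the index itself (country is part of the index key, id is the table's primary key), so by the rule above a cursor opened on that URI would not need to read any column group.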

For column-store tables there is no need to create an index on the record number (the primary key); such an index would serve no purpose.

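Code Examples: the examples above come from ex_schema.c. ex_call_center.c also shows column groups and indices in use, including how indices can be used to emulate SQL-style accelerated lookups.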
