Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index, which stores the row data. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.
-
When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If no column or set of columns is logically unique and non-null, add a new auto-increment column, whose values are filled in automatically, and use it as the primary key.
-
If you do not define a PRIMARY KEY for your table, InnoDB uses the first UNIQUE index whose key columns are all NOT NULL as the clustered index.
-
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, rows ordered by row ID are physically in insertion order.
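The three cases above can be sketched as a small Python function. This is an illustrative model of the selection rule only, not InnoDB code: the `UniqueIndex` type, the `choose_clustered_index` function, and the example index names are hypothetical. Only the name GEN_CLUST_INDEX comes from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UniqueIndex:
    """Hypothetical description of a UNIQUE index, for illustration only."""
    name: str
    all_columns_not_null: bool


def choose_clustered_index(primary_key: Optional[str],
                           unique_indexes: Sequence[UniqueIndex]) -> str:
    """Return the name of the index that would act as the clustered index."""
    # Case 1: an explicit PRIMARY KEY is always used.
    if primary_key is not None:
        return primary_key
    # Case 2: otherwise, the first UNIQUE index whose columns are all NOT NULL.
    for index in unique_indexes:
        if index.all_columns_not_null:
            return index.name
    # Case 3: otherwise, the hidden index built on row ID values.
    return "GEN_CLUST_INDEX"


# Hypothetical index definitions, listed in creation order.
indexes = [UniqueIndex("uk_email", False), UniqueIndex("uk_code", True)]
print(choose_clustered_index("PRIMARY", indexes))  # PRIMARY
print(choose_clustered_index(None, indexes))       # uk_code
print(choose_clustered_index(None, []))            # GEN_CLUST_INDEX
```

Note that `uk_email` is skipped in the second call because it allows NULL values, even though it was defined first.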
How the Clustered Index Speeds Up Queries
Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data on a different page from the index record (the MyISAM engine, for example).
How Secondary Indexes Relate to the Clustered Index
All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
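As a rough illustration of why primary key length matters, the following Python sketch estimates the space taken by the leaf entries of one secondary index. All figures are assumptions chosen for the example: a 4-byte INT indexed column, an 8-byte BIGINT primary key, and a 36-byte CHAR(36) primary key in a single-byte character set. The function `secondary_index_bytes` is hypothetical, and the estimate deliberately ignores record headers, page overhead, and non-leaf pages, so real sizes will be larger.

```python
def secondary_index_bytes(rows: int, key_bytes: int, pk_bytes: int) -> int:
    """Lower-bound size of the leaf entries of one secondary index.

    Each entry stores the indexed columns plus the primary key columns.
    Record headers and page overhead are deliberately ignored.
    """
    return rows * (key_bytes + pk_bytes)


ROWS = 10_000_000
INT_KEY = 4        # assumed: secondary index on one INT column
BIGINT_PK = 8      # assumed: BIGINT primary key
UUID_TEXT_PK = 36  # assumed: CHAR(36) primary key, single-byte charset

short_pk = secondary_index_bytes(ROWS, INT_KEY, BIGINT_PK)
long_pk = secondary_index_bytes(ROWS, INT_KEY, UUID_TEXT_PK)
print(short_pk)  # 120000000
print(long_pk)   # 400000000
```

Under these assumptions the longer key more than triples the size of the index, and the cost is paid again for every additional secondary index on the table.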
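In a secondary index, the leaf nodes do not contain the full row data. Besides the key value, each index entry in a leaf node also holds a bookmark: the clustered index key of the corresponding row.
Suppose a lookup goes through a secondary index tree of height 3. It takes 3 page accesses, one per level, to find the clustered index key. If the clustered index tree also has height 3, another 3 accesses are needed to reach the page that holds the complete row, so fetching that one data page takes 6 logical I/Os in total.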
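The two-step lookup can be sketched with a toy Python model. This is not how InnoDB is implemented: the `ToyTable` class and its methods are hypothetical, plain dictionaries stand in for the B-trees, each tree level is assumed to cost exactly one page access, and the indexed values are assumed to be unique.

```python
class ToyTable:
    """Toy model of a table with a clustered index and one secondary index."""

    def __init__(self, clustered_height: int, secondary_height: int):
        self.clustered_height = clustered_height
        self.secondary_height = secondary_height
        self.clustered = {}   # primary key -> full row
        self.secondary = {}   # indexed value -> primary key (the bookmark)
        self.page_reads = 0   # logical I/O counter

    def insert(self, pk, indexed_value, row):
        self.clustered[pk] = row
        self.secondary[indexed_value] = pk

    def find_by_pk(self, pk):
        # One page access per level of the clustered index tree.
        self.page_reads += self.clustered_height
        return self.clustered[pk]

    def find_by_indexed_value(self, value):
        # Step 1: descend the secondary index to get the primary key.
        self.page_reads += self.secondary_height
        pk = self.secondary[value]
        # Step 2: descend the clustered index to get the full row.
        return self.find_by_pk(pk)


table = ToyTable(clustered_height=3, secondary_height=3)
table.insert(1, "alice@example.com", {"id": 1, "email": "alice@example.com"})
row = table.find_by_indexed_value("alice@example.com")
print(row["id"], table.page_reads)  # 1 6
```

Calling `find_by_pk(1)` on a fresh table would count only 3 page accesses, which is the saving the clustered index gives to primary key lookups.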