sql查詢長度大于8_為什麼 HashMap 中連結清單長度大于 8 才轉化為紅黑樹？

Java 中的 HashMap 采用連結清單法來解決哈希沖突（HashMap 原理），即具有相同桶下标的鍵值對使用一個連結清單儲存。當連結清單變長時，查找和添加（需要确定 key 是否已經存在）都需要周遊這個連結清單，速度會變慢。JDK 1.8 後加入了連結清單轉換為紅黑樹的機制，但是紅黑樹的轉換并不是一個廉價的操作，隻有當連結清單長度大于等于 TREEIFY_THRESHOLD 才會 treeify。

/**
     * The bin count threshold for using a tree rather than list for a
     * bin.  Bins are converted to trees when adding an element to a
     * bin with at least this many nodes. The value must be greater
     * than 2 and should be at least 8 to mesh with assumptions in
     * tree removal about conversion back to plain bins upon
     * shrinkage.
     */
    static final int TREEIFY_THRESHOLD = 8;

TREEIFY_THRESHOLD 預設為 8，

putVal()

源碼如下：

if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
        treeifyBin(tab, hash);

實際上并不是隻要連結清單長度大于 8 就會 treeify。當 table.length（桶的個數）小于 MIN_TREEIFY_CAPACITY 時會優先擴容而不是轉換為紅黑樹。

/**
     * The smallest table capacity for which bins may be treeified.
     * (Otherwise the table is resized if too many nodes in a bin.)
     * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
     * between resizing and treeification thresholds.
     */
    static final int MIN_TREEIFY_CAPACITY = 64;

MIN_TREEIFY_CAPACITY 預設為 64，

treeifyBin()

源碼大緻是這樣：

final void treeifyBin(Node<K,V>[] tab, int hash) {
        int n, index; Node<K,V> e;
        if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) // 大小不夠，優先擴容
            resize();
        else if ((e = tab[index = (n - 1) & hash]) != null) {
            // ...
        }
    }

為什麼是 8 ？

TREEIFY_THRESHOLD 的預設值 8 是如何确定的呢？源碼中的注解其實給出了答案：

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

treeify 過程會把原本的 Node 對象轉化為 TreeNode 對象，而 TreeNode 大小是 Node 的兩倍；除去記憶體的損耗，treeify 本身也是一個耗時的過程，并且在紅黑樹節點數小于等于 UNTREEIFY_THRESHOLD（預設為 6）時，紅黑樹又會重新轉換為連結清單，如果出現頻繁的互相轉化，這是一筆不小的開銷。

是以我們需要确定一個 k 值，連結清單長度大于等于 k 時，會轉化為紅黑樹。k 不能太大，因為連結清單的長度過大會影響插入和查找效率；k 也不能太小，treeify 并不是一個廉價的操作，我們希望連結清單長度大于等于 k 的機率要足夠小，這樣就可以盡量避免 treeify。

連結清單長度為 k 的機率就是出現 k 個鍵值對都在同一個桶中的機率，假設 table 的長度為 m，也就是有 m 個桶，一個鍵值對落入每一個桶都是等機率的，求不同 k 對應的機率就是标準的泊松分布問題：