天天看點

最通俗易懂的HashMap底層原理圖文詳解

HashMap的底層原理面試必考題。為什麼面試官如此青睐這道題?

HashMap裡面涉及了很多的知識點,可以比較全面考察面試者的基本功,想要拿到一個好offer,這是一個邁不過的坎,接下來我們用最通俗易懂的語言帶着大家揭開HashMap的神秘面紗。

資料結構

HashMap的資料結構為 : 數組+(連結清單或紅黑樹)。

/**
     * The table, initialized on first use, and resized as
     * necessary. When allocated, length is always a power of two.
     * (We also tolerate length zero in some operations to allow
     * bootstrapping mechanics that are currently not needed.)
     */
    transient Node<K,V>[] table;      

HashMap的資料模型:

最通俗易懂的HashMap底層原理圖文詳解
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {

    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;
    }

    ...

    /**
     * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
     * extends Node) so can be used as extension of either regular or
     * linked node.
     */
    static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
        TreeNode<K,V> parent;  // red-black tree links
        TreeNode<K,V> left;
        TreeNode<K,V> right;
        TreeNode<K,V> prev;    // needed to unlink next upon deletion
        boolean red;
        ...
    }

}      
最通俗易懂的HashMap底層原理圖文詳解

Node的模型:

最通俗易懂的HashMap底層原理圖文詳解

TreeNode的模型:

最通俗易懂的HashMap底層原理圖文詳解
最通俗易懂的HashMap底層原理圖文詳解

算法

連結清單轉紅黑樹

紅黑樹插入節點、删除節點、查詢節點算法

紅黑樹的五個性質

最通俗易懂的HashMap底層原理圖文詳解

紅黑樹,一種二叉查找樹,但在每個結點上增加一個存儲位表示結點的顔色,可以是Red或Black。

  

通過對任何一條從根到葉子的路徑上各個結點着色方式的限制,紅黑樹確定沒有一條路徑會比其他路徑長出倆倍,因而是接近平衡的。

紅黑樹,作為一棵二叉查找樹,滿足二叉查找樹的一般性質。下面,來了解下 二叉查找樹的一般性質。

  

二叉查找樹,也稱有序二叉樹(ordered binary tree),或已排序二叉樹(sorted binary tree),是指一棵空樹或者具有下列性質的二叉樹:

  • 若任意節點的左子樹不空,則左子樹上所有結點的值均小于它的根結點的值;
  • 若任意節點的右子樹不空,則右子樹上所有結點的值均大于它的根結點的值;
  • 任意節點的左、右子樹也分别為二叉查找樹。
  • 沒有鍵值相等的節點(no duplicate nodes)。

因為一棵由n個結點随機構造的二叉查找樹的高度為lgn,是以順理成章,二叉查找樹的一般操作的執行時間為O(lgn)。但二叉查找樹若退化成了一棵具有n個結點的線性鍊後,則這些操作最壞情況運作時間為O(n)。

  

紅黑樹雖然本質上是一棵二叉查找樹,但它在二叉查找樹的基礎上增加了着色和相關的性質使得紅黑樹相對平衡,進而保證了紅黑樹的查找、插入、删除的時間複雜度最壞為O(log n)。

  

但它是如何保證一棵n個結點的紅黑樹的高度始終保持在logn的呢?

這就引出了紅黑樹的5個性質:

  

1.每個結點要麼是紅的要麼是黑的。

2.根結點是黑的。

3.每個葉結點(葉結點即指樹尾端NIL指針或NULL結點)都是黑的。

4.如果一個結點是紅的,那麼它的兩個兒子都是黑的。

5.對于任意結點而言,其到葉結點樹尾端NIL指針的每條路徑都包含相同數目的黑結點。

正是紅黑樹的這5條性質,使一棵n個結點的紅黑樹始終保持了logn的高度,進而也就解釋了上面所說的:

紅黑樹的查找、插入、删除的時間複雜度最壞為O(log n)

這一結論成立的原因。

這個源碼有點複雜,後面再分析。

為什麼節點數大于8的時候轉紅黑樹?

/*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.  In
     * usages with well-distributed user hashCodes, tree bins are
     * rarely used.  Ideally, under random hashCodes, the frequency of
     * nodes in bins follows a Poisson distribution
     * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
     * parameter of about 0.5 on average for the default resizing
     * threshold of 0.75, although with a large variance because of
     * resizing granularity. Ignoring variance, the expected
     * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
     * factorial(k)). The first values are:
     *
     * 0:    0.60653066
     * 1:    0.30326533
     * 2:    0.07581633
     * 3:    0.01263606
     * 4:    0.00157952
     * 5:    0.00015795
     * 6:    0.00001316
     * 7:    0.00000094
     * 8:    0.00000006
     * more: less than 1 in ten million
     *
     * The root of a tree bin is normally its first node.  However,
     * sometimes (currently only upon Iterator.remove), the root might
     * be elsewhere, but can be recovered following parent links
     * (method TreeNode.root()).
     *
     * All applicable internal methods accept a hash code as an
     * argument (as normally supplied from a public method), allowing
     * them to call each other without recomputing user hashCodes.
     * Most internal methods also accept a "tab" argument, that is
     * normally the current table, but may be a new or old one when
     * resizing or converting.
     *
     * When bin lists are treeified, split, or untreeified, we keep
     * them in the same relative access/traversal order (i.e., field
     * Node.next) to better preserve locality, and to slightly
     * simplify handling of splits and traversals that invoke
     * iterator.remove. When using comparators on insertion, to keep a
     * total ordering (or as close as is required here) across
     * rebalancings, we compare classes and identityHashCodes as
     * tie-breakers.
     *
     * The use and transitions among plain vs tree modes is
     * complicated by the existence of subclass LinkedHashMap. See
     * below for hook methods defined to be invoked upon insertion,
     * removal and access that allow LinkedHashMap internals to
     * otherwise remain independent of these mechanics. (This also
     * requires that a map instance be passed to some utility methods
     * that may create new nodes.)
     *
     * The concurrent-programming-like SSA-based coding style helps
     * avoid aliasing errors amid all of the twisty pointer operations.
     */      

LinkedHashMap

public class LinkedHashMap<K,V>
extends HashMap<K,V>
implements Map<K,V>      

Hash table and linked list implementation of the Map interface, with predictable iteration order. This implementation differs from HashMap in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is normally the order in which keys were inserted into the map (insertion-order). Note that insertion order is not affected if a key is re-inserted into the map. (A key k is reinserted into a map m if m.put(k, v) is invoked when m.containsKey(k) would return true immediately prior to the invocation.)

This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashMap (and Hashtable), without incurring the increased cost associated with TreeMap. It can be used to produce a copy of a map that has the same order as the original, regardless of the original map's implementation:

void foo(Map m) {
     Map copy = new LinkedHashMap(m);
     ...
 }      

This technique is particularly useful if a module takes a map on input, copies it, and later returns results whose order is determined by that of the copy. (Clients generally appreciate having things returned in the same order they were presented.)

最通俗易懂的HashMap底層原理圖文詳解

參考資料

​​https://www.bilibili.com/read/cv369222?from=category_34​​

Kotlin開發者社群