Jdk8 HashMap源碼閱讀

版本 Oracle 1.8

目标

了解了以下問題，本篇筆記的目的就達到了

1. 哈希基本原理？（答：散清單、hash碰撞、連結清單、紅黑樹）

2. hashmap查詢的時間複雜度，影響因素和原理？（答：最好O（1），最差O（n），如果是紅黑O（logn））

3. resize如何實作的，記住已經沒有rehash了！！！（答：拉鍊entry根據高位bit散列到目前位置i和size+i位置）

4. get put 的過程。

資料結構

HashMap 基本就是哈希表，其底層實作就是圍繞哈希表展開的，源碼閱讀也是在了解了哈希表的前提下進行為上策。

哈希表的理念就是Key值與存儲位置有固定的對應關系，而這種關系需要通過某中函數來構造出來，這個函數就是哈希函數。

哈希函數有6種實作：

摘錄自百度百科–哈希表

1. 直接尋址法：取關鍵字或關鍵字的某個線性函數值為散列位址。即H(key)=key或H(key) = a·key + b，其中a和b為常數（這種散列函數叫做自身函數）。若其中H(key）中已經有值了，就往下一個找，直到H>(key）中沒有值了，就放進去。

2. 數字分析法：分析一組資料，比如一組員工的出生年月日，這時我們發現出生年月日的前幾位數字大體相同，這樣的話，出現沖突的幾率就會很大，但是我們發現年月日的後幾位表示月份和具體日期的數字差别很>大，如果用後面的數字來構成散列位址，則沖突的幾率會明顯降低。是以數字分析法就是找出數字的規律，盡可能利用這些資料來構造沖突幾率較低的散列位址。

3. 平方取中法：當無法确定關鍵字中哪幾位分布較均勻時，可以先求出關鍵字的平方值，然後按需要取平方值的中間幾位作為哈希位址。這是因為：平方後中間幾位和關鍵字中每一位都相關，故不同關鍵字會以較高的機率産生不同的哈希位址。

4. 折疊法：将關鍵字分割成位數相同的幾部分，最後一部分位數可以不同，然後取這幾部分的疊加和（去除進位）作為散列位址。數位疊加可以有移位疊加和間界疊加兩種方法。移位疊加是将分割後的每一部分的最低位對齊，然後相加；間界疊加是從一端向另一端沿分割界來回折疊，然後對齊相加。

5. 随機數法：選擇一随機函數，取關鍵字的随機值作為散列位址，通常用于關鍵字長度不同的場合。

6. 除留餘數法：取關鍵字被某個不大于散清單表長m的數p除後所得的餘數為散列位址。即 H(key) = key MOD p,p<=m。不僅可以對關鍵字直接取模，也可在折疊、平方取中等運算之後取模。對p的選擇很重要，一般取素數或m，若p選的不好，容易産生同義詞。

除留餘數法最常用，也是Java

HashMap

使用的方法，有關哈希沖突參考另一篇筆記哈希沖突處理 http://www.idiary.cc/2017/07/11/Hash-%E5%86%B2%E7%AA%81%E5%A4%84%E7%90%86/。

其中鍊位址的處理方法正式

HashMap

處理沖突的基本方法。

下圖是經典的hashtable結構,JDK 1.8 中連結清單也可以由紅黑樹代替。

Jdk8 HashMap源碼閱讀

1.8中的結構(參考：http://www.cnblogs.com/leesf456/p/5242233.html)

Jdk8 HashMap源碼閱讀

HashMap 類總覽

首先看兩段注釋文檔

/** 
 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap. Most methods try to use normal bins, but
 * relay to TreeNode methods when applicable (simply by checking
 * instanceof a node).  Bins of TreeNodes may be traversed and
 * used like any others, but additionally support faster lookup
 * when overpopulated. However, since the vast majority of bins in
 * normal use are not overpopulated, checking for existence of
 * tree bins may be delayed in the course of table methods.
 */

其中

but when bins get too large, they are transformed into bins of TreeNodes

這句說明了當bin過大時就會采用紅黑樹進行存儲。

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = ;
/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = ;
/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = ;

上面這段指明了連結清單在大于8的情況下會使用紅黑樹，在小于6的情況下重新使用連結清單和轉化為紅黑樹對應的table的最小大小（64）。

HashMap 類圖

Jdk8 HashMap源碼閱讀

類屬性說明

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // 預設的初始容量是16，必須是2的幂
    static final int DEFAULT_INITIAL_CAPACITY =  << ;   
    // 最大容量
    static final int MAXIMUM_CAPACITY =  << ; 
    // 預設的填充因子
    static final float DEFAULT_LOAD_FACTOR = f;
    // 當桶(bucket)上的結點數大于這個值時會轉成紅黑樹
    static final int TREEIFY_THRESHOLD = ; 
    // 當桶(bucket)上的結點數小于這個值時樹轉連結清單
    static final int UNTREEIFY_THRESHOLD = ;
    // 桶中結構轉化為紅黑樹對應的table的最小大小
    static final int MIN_TREEIFY_CAPACITY = ;
    // 存儲元素的數組，總是2的幂
    transient Node<k,v>[] table; 
    // 存放具體元素的集
    transient Set<map.entry<k,v>> entrySet;
    // 存放元素的個數，注意這個不等于數組的長度。
    transient int size;
    // 每次擴容和更改map結構的計數器
    transient int modCount;   
    // 臨界值 當實際大小(容量*填充因子)超過臨界值時，會進行擴容
    int threshold;
    // 填充因子
    final float loadFactor;
}

函數詳解

構造函數

HashMap一共提供了四個構造函數。

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
  * Constructs an empty <tt>HashMap</tt> with the specified initial
  * capacity and the default load factor (0.75).
  *
  * @param  initialCapacity the initial capacity.
  * @throws IllegalArgumentException if the initial capacity is negative.
  */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
  * Constructs an empty <tt>HashMap</tt> with the specified initial
  * capacity and load factor.
  *
  * @param  initialCapacity the initial capacity
  * @param  loadFactor      the load factor
  * @throws IllegalArgumentException if the initial capacity is negative
  *         or the load factor is nonpositive
  */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < )
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                            initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <=  || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                            loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
  * Constructs a new <tt>HashMap</tt> with the same mappings as the
  * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
  * default load factor (0.75) and an initial capacity sufficient to
  * hold the mappings in the specified <tt>Map</tt>.
  *
  * @param   m the map whose mappings are to be placed in this map
  * @throws  NullPointerException if the specified map is null
  */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

這裡着重看下

HashMap(int initialCapacity, float loadFactor)

函數中的

tableSizeFor(initialCapacity)

。注釋翻譯過來的意思就是傳回大于

cap

的最小二的幂數，這個函數的目的就是通過無符号右移動操作先取到比最小幂小1的數（後面連續幾個零的二進制的數）。至于為何在開始的時候減一是因為在假設

cap

已經是二的幂數了，這個時候就是得到

cap

2倍的幂，更好的解釋參見HashMap源碼注解之靜态工具方法hash()、tableSizeFor()（四）。

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    return (n < ) ?  : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + ;
}

最後一個構造函數是把一個已有的Map對象最為參數，其中主要是調用

putMapEntries

方法，這個函數中最後是調用了

putVal

,這個方法在下面介紹。

/**
 * Implements Map.putAll and Map constructor
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > ) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

PUT

有了對象，就要開始往裡面放資料，現在就看下

put

這個方法。

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

這個方法隻是

putVal

方法的封裝

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == ) //如果原來的table為空則resize
        n = (tab = resize()).length;
    if ((p = tab[i = (n - ) & hash]) == null)      //對應的位置中沒有值，則直接放進去
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k)))) //如果hash值相同、key相同（equals判斷）則進行更新操作
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {              //hash沖突處理
            for (int binCount = ; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   //如果為目前連結清單的最後一個元素則直接将新元素放到最後
                    if (binCount >= TREEIFY_THRESHOLD - ) // -1 for 1st    //超過轉為黑紅樹的上限時轉換為黑紅樹
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k)))) //連結清單中有hash值相同、key相同（equals判斷）則進行更新操作
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

這裡取得對象在table中索引的代碼就是

i = (n - 1) & hash

，這個實際上就是取模運算

n%hash

，這裡為什麼采用前一種方法，看一下說明

這個方法非常巧妙，它通過h & (table.length -1)來得到該對象的儲存位，而HashMap底層數組的長度總是2的n次方，這是HashMap在速度上的優化。當length總是2的n次方時，h& (length-1)運算等價于對length取模，也就是h%length，但是&比%具有更高的效率。

https://zhuanlan.zhihu.com/p/21673805

上面的源碼中當

table

為

null

或長度為

時需要重新初始化表格

n = (tab = resize()).length;

，看下

resize

方法：

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold. 初始化或者翻倍table大小，當原始table為空時使用threshold中存儲的值初始化table。
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 * 
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ?  : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = ;
    if (oldCap > ) {
        if (oldCap >= MAXIMUM_CAPACITY) {   // 超過最大值就不再擴充了
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << ) < MAXIMUM_CAPACITY &&   //容量擴充到原來的兩倍
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << ; // double threshold       // 臨界值擴充到原來的兩倍
    }
    else if (oldThr > ) //當原來的table長度為0時，并且指定了初始容量（HashMap(int initialCapacity, float loadFactor)），進行到這裡
        newCap = oldThr;
    else {               // 沒有指定初始容量時采用預設
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == ) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = ; j < oldCap; ++j) {  
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - )] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == ) {   //索引為低位的元素
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {                          //索引為高位的元素
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

方法的文檔說明中有這樣一句話

Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.

是說，因為使用的是2的幂數作為擴充的方式，這裡就出現了一種情況舊table中的元素對應（重新進行hash後）到新table中的索引位置必然在原位置或2的幂的位移（移動舊table的長度）

經過觀測可以發現，我們使用的是2次幂的擴充(指長度擴為原來2倍)，是以，元素的位置要麼是在原位置，要麼是在原位置再移動2次幂的位置。看下圖可以明白這句話的意思，n為table的長度，圖（a）表示擴容前的key1和key2兩種key确定索引位置的示例，圖（b）表示擴容後key1和key2兩種key确定索引位置的示例，其中hash1是key1對應的哈希與高位運算結果。

Jdk8 HashMap源碼閱讀
元素在重新計算hash之後，因為n變為2倍，那麼n-1的mask範圍在高位多1bit(紅色)，是以新的index就會發生這樣的變化：

Jdk8 HashMap源碼閱讀

是以，我們在擴充HashMap的時候，不需要像JDK1.7的實作那樣重新計算hash，隻需要看看原來的hash值新增的那個bit是1還是0就好了，是0的話索引沒變，是1的話索引變成“原索引+oldCap”。

https://zhuanlan.zhihu.com/p/21673805

put的總體流程可參考下圖

Jdk8 HashMap源碼閱讀
http://idiary.oss-cn-zhangjiakou.aliyuncs.com/images/2017-07-12–HashMap-%E6%BA%90%E7%A0%81%E9%98%85%E8%AF%BB/hashmap-put.jpg

參考

哈希沖突處理 http://www.idiary.cc/2017/07/11/Hash-%E5%86%B2%E7%AA%81%E5%A4%84%E7%90%86/
HashMap面試問題http://blog.csdn.net/song19890528/article/details/16891015
【集合架構】JDK1.8源碼分析之HashMap（一） http://www.cnblogs.com/leesf456/p/5242233.html
HashMap源碼分析（JDK1.8）- 你該知道的都在這裡了 http://blog.csdn.net/brycegao321/article/details/52527236
Java基礎系列之(三) - HashMap深度分析 http://www.cnblogs.com/wuhuangdi/p/4175991.html
HashMap源碼注解之靜态工具方法hash()、tableSizeFor()（四）
Java8系列之重新認識HashMap

Jdk8 HashMap源碼閱讀

目标

資料結構

HashMap 類總覽

函數詳解

構造函數

PUT

參考

繼續閱讀

關于Gradle配置的小結

Java小案例——随機數猜測随機數猜測

nginx location中斜線的位置的重要性

27 Best Free Eclipse Plug-ins for Java Developer to be ProductiveCode Quality PluginsText Editor PluginsDependency ManagementVersion Control Integration PluginsFramework Development Continuous Integration Related PluginsOther Utility Plugins

Java String.format方法的簡單使用

neo4j之cypher使用文檔

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

spark/scala關于【資源檔案】加載方法概述外部檔案加載方案測試資源檔案打包入jar包中小結

mybatis_入門程式Mybatis入門

AOP程式設計_Android優雅權限架構(1)概念基礎，2021金三銀四前言正文大綱正文

Effective Java 8:通用程式設計

OOM三種類型

工廠模式-三種類型

【遞歸】高效率求2的n次幂

win10本地scala和spark安裝安裝scala安裝spark

scala (3) Function 和 Method