22.源碼閱讀（jdk1.6 HashMap源碼和原理分析）

HashMap 底層采用數組 + 連結清單的的實作方式來降低資料插入和查詢的時間複雜度,理想狀态下可以實作時間複雜度位O(1)，今天就從源碼的角度看一下它是如何實作的。我們從它的兩個關鍵方法put和get入手。

put方法

public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        //數組中i位置存在Entry對象
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        //不存在則建立一個新的Entry加入數組，并将計數器加1
        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

這裡有兩個關鍵方法hash和indexFor，hash()方法的作用是對key的hashCode值進行一系列的位運算，看下邊的源碼

/**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     * 注釋的意思大概是說下邊代碼是對給定的哈希值應用補充散列函數
     * 我們隻需要知道一點，經過這些位運算之後，hash值的散列效果會
     * 得到優化即可
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

再看indexFor方法,将得到的hash值與hashMap中數組的長度-1的值進行與運算，得到的就是這個元素将會被放置在數組中的哪個位置的index，并且絕不用擔心發生數組越界之類的問題，原因請自行發現

/**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

我們再回到上邊的put方法中，擷取到目前元素将要插入的index後，會從目前的這個内置的數組中查找這個index位置是否已經存放了Entry，如果有則變量這個index位置的連結清單，判斷是否有相同key值的元素存在（目前的這個數組指的就是在new HashMap的時候預設建立出來的一個Entry數組，它有一個預設長度，此時在沒有進行put操作的時候，數組中元素都是null）如果此時是第一次put，那麼數組中是空的，直接繞過這個for循環往下執行。如果進行put操作的時候，數組中存這個index，那麼會進入這個for循環，拿到i這個位置的單連結清單，周遊連結清單中所有的Entry,如果有相同key的元素，則找到它，用新的value值替換舊value，并傳回舊value

for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

繼續往下，如果不存在，那麼建立新的Entry對象

addEntry(hash, key, value, i)

hash：key值計算得到的hash值

i: 經過hash和indexFor方法計算得到的被put的key，value即将被放入數組中的位置

/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        //拿到數組中bucketIndex位置的Entry對象，第一次執行，他是null
    Entry<K,V> e = table[bucketIndex];
        //new了一個Entry對象放在這個bucketIndex位置上
        table[bucketIndex] = new Entry<K,V>(hash, key, value, e);
        //判斷是否已經達到了需要擴容的條件，如果是則對HashMap進行擴容
        if (size++ >= threshold)
            resize(2 * table.length);
    }

Entry構造方法中第四個參數很重要，我們知道HashMap是基于數組和單連結清單的，數組就是指的Entry數組，而單連結清單呢？就在Entry對象的這個next屬性上

static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next;
        final int hash;
}

添加一個Entry到數組中某一位置後，會同時給它指定它的next，這個next就是它所指向的下一個Entry元素，這樣，每次添加一個Entry都會把這個Entry加入到這個連結清單的第一個位置，它的next就是原來在第一個位置的Entry對象，這樣就連成了一個鍊子

添加了第一個Entry到數組中某一index後，它的next指向的是一個null，因為在沒有添加的時候，這個index的位置存放的就是null值

此時我們的數組中添加了第一個Entry，此時會判斷目前數組的長度是否已經達到了極值（每次添加都會進行一次判斷），如果是則對數組進行擴容，執行 resize(2 * table.length)可以看到，每次會擴容一倍的長度

/**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;
        //判斷目前數組的長度是否已經達到了設定的MAXIMUM_CAPACITY
        //如果是則将threshold （達到這個值就滿足擴容的條件），設定為int的最大值
        //那麼此時hashMap停止擴容，可見，hashMap容量是有極限的
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;
        }
        //建立一個新的Entry數組
        Entry[] newTable = new Entry[newCapacity];
        //将舊的數組中的值經過轉換放入新的數組中
        transfer(newTable);
        //使用擴容後的數組
        table = newTable;
        //擴容門檻值更新
        threshold = (int)(newCapacity * loadFactor);
    }

HashMap擴容的原理也很簡單，因為它是基于數組實作的，是以所謂的擴容并不是将原有的數組長度擴大，而是建立了一個新的數組，這個數組的長度等于擴容後的長度，然後将舊數組中的元素轉移到新數組中，舊的數組廢棄，這個轉移的過程仍然是經過了一系列的變換的，我們知道之前的數組中每一個put進來的元素的index都是經過一系列的位運算得到的，那麼這裡在進行轉移的時候同樣要再次進行運算得到新的index，因為這個index是基于hash值和數組長度生成的，hash值不會改變但是數組長度經過擴容後是不同了，是以要重新計算，否則會導緻查詢資料失敗

/**
     * Transfers all entries from current table to newTable.
     */
    void transfer(Entry[] newTable) {
        Entry[] src = table;
        int newCapacity = newTable.length;
        //周遊舊的數組中所有的Entry對象
        for (int j = 0; j < src.length; j++) {
            Entry<K,V> e = src[j];
            //拿到index位置的Entry後，如果這個Entry不為null，則循環周遊這個連結清單
            if (e != null) {
                src[j] = null;
                do {
                    //拿到連結清單上每一個Entry
                    Entry<K,V> next = e.next;
                    //計算這個Entry在擴容後數組中的index值
                    int i = indexFor(e.hash, newCapacity);
                    //設定這個Entry的next值
                    e.next = newTable[i];
                    //把這個Entry放在計算得到的index位置上
                    newTable[i] = e;
                    //指派擷取下一個Entry，直到這個連結清單周遊完成
                    e = next;
                } while (e != null);
            }
        }
    }

源碼是這樣的先後順序放置的，如果你把順序調換一下會更好了解。這裡我們先調整一下代碼的順序如下（不影響結果，但是更易了解）。大概過程就是假設我第一個找到了index=1的位置的這個單連結清單（這個連結清單可能隻有一個元素，也可能有多個），我會擷取到這個位置的第一個Entry,計算得到它在新數組中的index,然後把這個Entry放在這個數組的index位置上，将它的next指定為原本這個位置的值，其實就是null,這是第一步，第二步就是擷取到第一個Entry的next值，也就是連結清單中第二個位置的Entry，把這個Entry重新指定為e，重複上邊的操作，這樣一整個過程下來，舊的數組中的所有元素都被正确的放置在了新數組中的正确位置上，達到了擴容的效果

do {
        //計算這個Entry在擴容後數組中的index值
        int i = indexFor(e.hash, newCapacity);
        //把這個Entry放在計算得到的index位置上
        newTable[i] = e;
       //設定這個Entry的next值
        e.next = newTable[i];

        //拿到連結清單上每一個Entry
        Entry<K,V> next = e.next;
        //指派擷取下一個Entry，直到這個連結清單周遊完成
        e = next;
  } while (e != null);

get方法

put看完之後get就簡單多了。根據傳入的key值，進行hash和indexFor運算，得到key在數組中的index，然後拿到數組中index位置的entry對象，再從這個位置的單連結清單中查找key值相同的value傳回

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
                return e.value;
        }
        return null;
    }

總結一下：

1.HashMap底層基于數組和單連結清單

2.hash()和indexFor()方法作用很關鍵

3.如果你去看看native曾hashCode值擷取的方式，你會得到兩個結論(hashCode值并不是簡單的位址值，而是位址值進行右移16位之後的一個值，參閱ndk底層c++代碼實作)：

兩個不同的對象 hashCode 值可能會相等，hashCode 值不相等的兩個對象肯定不是同一對象。

附上手寫的HashMap代碼

public class MyHashMap<K,V> {
    
    public MapEntry<K,V>[] table;
    
    int size;
    
    /**
     * 擴容門檻值,達到這個值時就要擴容
     */
    int threshold = 8;
    
    /**
     * 擴容因子
     */
    final float loadFactor = 0.75f; 
    
    public class MapEntry<K,V>{
        private K key;
        private V value;
        MapEntry<K,V> next;
        int hash;
        
        public MapEntry(int hash,K key,V value,MapEntry<K,V> next) {
            this.key = key;
            this.value = value;
            this.hash =hash;
            this.next = next;
        }
    }

    public V put(K key,V value) {
        if(table == null) {
            table = new MapEntry[8];
        }
        
        //判斷key為null
        if(key == null) {
            return null;
        }
        
        int hash = hash(key);
        int index = getIndex(hash,table.length);
        System.out.println("key = "+key+" hash="+hash+" index="+ index);
        //判斷這個key是否存在
        for (MapEntry<K,V> e = table[index]; e!=null;e = e.next) {
            Object k = null;
            if(e.hash == hash && ((k = e.key) == key) || (key.equals(k))){
                V oldV = e.value;
                e.value = value;
                return oldV;
            }
        }
        
        //添加新的
        addEntry(hash,key,value,index);
        return null;
    }
    
    private void addEntry(int hash, K key, V value, int index) {
        //hash值相等，兩個對象不一定相等，兩個對象不相等，hash值有可能相等
        //hashCode值 從native看，mask_bits(value()>> hash_shift,hash_mask))
        //是位址右移16位的結果
        if(size >= threshold && table[index] != null) {
            //右移一位，擴容一倍
            resize(size << 1);
            //重新計算index
            index = getIndex(hash,table.length);
        }
        //添加
        createEntry(hash,key,value,index);
        size++;
    }

    private void createEntry(int hash, K key, V value, int index) {
        MapEntry<K, V> newEntry = new MapEntry<K, V>(hash, key, value, table[index]);
        table[index] = newEntry;
    }

    private void resize(int newCapacity) {
        MapEntry<K, V> [] newTable = new MapEntry[newCapacity];
        //将原來數組中的内容經過轉換之後放入新數組
        transform(newTable);
        table = newTable;
        threshold = (int) (newCapacity*loadFactor);
        System.out.println("擴容之後：newCapacity="+newCapacity+" threshold="+threshold);
    }

    /**
     * 重新計算散列
     * @param newTable
     */
    private void transform(MyHashMap<K, V>.MapEntry<K, V>[] newTable) {
        int newCapacity = newTable.length;
        for(MapEntry<K,V> e:table) {
            while(null != e) {
                MapEntry<K, V> next = e.next;
                int index = getIndex(e.hash,newCapacity);
                //把原來數組中的e放在新的數組中index位置
                newTable[index] = e;
                //将這個e插入到index位置的第一個，将原來index位置的e指定給e的next
                e.next =  newTable[index];
                //把原來數組中某index位置這條連結清單上的每一個元素都重新歸位
                e = next;
            }
        }
    }

    /**
     * 通過hash值找到table的index
     * @param hash
     * @return
     * 
     * & : 兩位同時為“1”，結果才為“1”，否則為0
     */ 
    private int getIndex(int hash,int length) {
        return hash & length-1;
    }

    /**
     * ^ : 就是相同為0不同為1
     * @param key
     * @return
     */
    private int hash(K key) {
        int h = 0;
        h = key.hashCode();
        int h16 = h>>>16;
        return (key == null)?0:(h^(h16));
    }

    public V get(K key) {
        if(key == null) {
            return null;
        }
        MapEntry<K, V> entry = getEntry(key);
        return entry == null?null:entry.value;
    }

    private MyHashMap<K, V>.MapEntry<K, V> getEntry(K key) {
        int hash = hash(key);
        int index = getIndex(hash,table.length);
        
        for (MapEntry<K,V> e = table[index]; e!=null;e = e.next) {
            Object k = null;
            if(e.hash == hash && ((k = e.key) == key) || (key.equals(k))){
                return e;
            }
        }
        return null;
    }

    public int getSize() {
        // TODO Auto-generated method stub
        return size;
    }
}

22.源碼閱讀（jdk1.6 HashMap源碼和原理分析）

put方法

get方法

繼續閱讀

C語言第四章自述2第四章選擇結構程式設計

面試題:vector和map的差別，異同。空間分布，100萬資料存哪個比較合适。一、疊代器差別二、vector三、Map、Set四、vector_map 為什麼比map效率高五、如何選擇六、容器選擇原則七、效率對比

C++ 多線程用條件變量确定線程的執行順序而不是使用 sleep(1)

POJ 1284 Primitive Roots (歐拉函數&原根定理)

CQ V1.0分詞bates(基于雙數組tire樹)—應該是目前最快的中文分詞算法

成員函數初始化清單

2021-08-13c++——類之操作符重載

swmm與lisflood-fp源碼如何一起編譯 CMake指令

Windows下VS開發環境環境安裝工程項目設定關于Debug和Release的提示

一文看懂字元串的加減乘除

C++ 第十五周報告1--《冒泡法排序》

C++實作簡單順序表

C經典書籍筆記——C陷阱與缺陷②(文法陷阱之優先級)一、錯誤案列二、優先級規律

線性表之順序表的實作

C++判斷素數、求最大公約數代碼判斷一個數是否為素數求兩個數的最大公約數

SequoiaDB巨杉資料庫C++驅動概述