Jdk8 HashMap源码阅读

版本 Oracle 1.8

目标

理解了以下问题，本篇笔记的目的就达到了

1. 哈希基本原理？（答：散列表、hash碰撞、链表、红黑树）

2. hashmap查询的时间复杂度，影响因素和原理？（答：最好O（1），最差O（n），如果是红黑O（logn））

3. resize如何实现的，记住已经没有rehash了！！！（答：拉链entry根据高位bit散列到当前位置i和size+i位置）

4. get put 的过程。

数据结构

HashMap 基本就是哈希表，其底层实现就是围绕哈希表展开的，源码阅读也是在理解了哈希表的前提下进行为上策。

哈希表的理念就是Key值与存储位置有固定的对应关系，而这种关系需要通过某中函数来构造出来，这个函数就是哈希函数。

哈希函数有6种实现：

摘录自百度百科–哈希表

1. 直接寻址法：取关键字或关键字的某个线性函数值为散列地址。即H(key)=key或H(key) = a·key + b，其中a和b为常数（这种散列函数叫做自身函数）。若其中H(key）中已经有值了，就往下一个找，直到H>(key）中没有值了，就放进去。

2. 数字分析法：分析一组数据，比如一组员工的出生年月日，这时我们发现出生年月日的前几位数字大体相同，这样的话，出现冲突的几率就会很大，但是我们发现年月日的后几位表示月份和具体日期的数字差别很>大，如果用后面的数字来构成散列地址，则冲突的几率会明显降低。因此数字分析法就是找出数字的规律，尽可能利用这些数据来构造冲突几率较低的散列地址。

3. 平方取中法：当无法确定关键字中哪几位分布较均匀时，可以先求出关键字的平方值，然后按需要取平方值的中间几位作为哈希地址。这是因为：平方后中间几位和关键字中每一位都相关，故不同关键字会以较高的概率产生不同的哈希地址。

4. 折叠法：将关键字分割成位数相同的几部分，最后一部分位数可以不同，然后取这几部分的叠加和（去除进位）作为散列地址。数位叠加可以有移位叠加和间界叠加两种方法。移位叠加是将分割后的每一部分的最低位对齐，然后相加；间界叠加是从一端向另一端沿分割界来回折叠，然后对齐相加。

5. 随机数法：选择一随机函数，取关键字的随机值作为散列地址，通常用于关键字长度不同的场合。

6. 除留余数法：取关键字被某个不大于散列表表长m的数p除后所得的余数为散列地址。即 H(key) = key MOD p,p<=m。不仅可以对关键字直接取模，也可在折叠、平方取中等运算之后取模。对p的选择很重要，一般取素数或m，若p选的不好，容易产生同义词。

除留余数法最常用，也是Java

HashMap

使用的方法，有关哈希冲突参考另一篇笔记哈希冲突处理 http://www.idiary.cc/2017/07/11/Hash-%E5%86%B2%E7%AA%81%E5%A4%84%E7%90%86/。

其中链地址的处理方法正式

HashMap

处理冲突的基本方法。

下图是经典的hashtable结构,JDK 1.8 中链表也可以由红黑树代替。

Jdk8 HashMap源码阅读

1.8中的结构(参考：http://www.cnblogs.com/leesf456/p/5242233.html)

Jdk8 HashMap源码阅读

HashMap 类总览

首先看两段注释文档

/** 
 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap. Most methods try to use normal bins, but
 * relay to TreeNode methods when applicable (simply by checking
 * instanceof a node).  Bins of TreeNodes may be traversed and
 * used like any others, but additionally support faster lookup
 * when overpopulated. However, since the vast majority of bins in
 * normal use are not overpopulated, checking for existence of
 * tree bins may be delayed in the course of table methods.
 */

其中

but when bins get too large, they are transformed into bins of TreeNodes

这句说明了当bin过大时就会采用红黑树进行存储。

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = ;
/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = ;
/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = ;

上面这段指明了链表在大于8的情况下会使用红黑树，在小于6的情况下重新使用链表和转化为红黑树对应的table的最小大小（64）。

HashMap 类图

Jdk8 HashMap源码阅读

类属性说明

public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
    // 默认的初始容量是16，必须是2的幂
    static final int DEFAULT_INITIAL_CAPACITY =  << ;   
    // 最大容量
    static final int MAXIMUM_CAPACITY =  << ; 
    // 默认的填充因子
    static final float DEFAULT_LOAD_FACTOR = f;
    // 当桶(bucket)上的结点数大于这个值时会转成红黑树
    static final int TREEIFY_THRESHOLD = ; 
    // 当桶(bucket)上的结点数小于这个值时树转链表
    static final int UNTREEIFY_THRESHOLD = ;
    // 桶中结构转化为红黑树对应的table的最小大小
    static final int MIN_TREEIFY_CAPACITY = ;
    // 存储元素的数组，总是2的幂
    transient Node<k,v>[] table; 
    // 存放具体元素的集
    transient Set<map.entry<k,v>> entrySet;
    // 存放元素的个数，注意这个不等于数组的长度。
    transient int size;
    // 每次扩容和更改map结构的计数器
    transient int modCount;   
    // 临界值 当实际大小(容量*填充因子)超过临界值时，会进行扩容
    int threshold;
    // 填充因子
    final float loadFactor;
}

函数详解

构造函数

HashMap一共提供了四个构造函数。

/**
 * Constructs an empty <tt>HashMap</tt> with the default initial capacity
 * (16) and the default load factor (0.75).
 */
public HashMap() {
    this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted
}

/**
  * Constructs an empty <tt>HashMap</tt> with the specified initial
  * capacity and the default load factor (0.75).
  *
  * @param  initialCapacity the initial capacity.
  * @throws IllegalArgumentException if the initial capacity is negative.
  */
public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

/**
  * Constructs an empty <tt>HashMap</tt> with the specified initial
  * capacity and load factor.
  *
  * @param  initialCapacity the initial capacity
  * @param  loadFactor      the load factor
  * @throws IllegalArgumentException if the initial capacity is negative
  *         or the load factor is nonpositive
  */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < )
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                            initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <=  || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                            loadFactor);
    this.loadFactor = loadFactor;
    this.threshold = tableSizeFor(initialCapacity);
}

/**
  * Constructs a new <tt>HashMap</tt> with the same mappings as the
  * specified <tt>Map</tt>.  The <tt>HashMap</tt> is created with
  * default load factor (0.75) and an initial capacity sufficient to
  * hold the mappings in the specified <tt>Map</tt>.
  *
  * @param   m the map whose mappings are to be placed in this map
  * @throws  NullPointerException if the specified map is null
  */
public HashMap(Map<? extends K, ? extends V> m) {
    this.loadFactor = DEFAULT_LOAD_FACTOR;
    putMapEntries(m, false);
}

这里着重看下

HashMap(int initialCapacity, float loadFactor)

函数中的

tableSizeFor(initialCapacity)

。注释翻译过来的意思就是返回大于

cap

的最小二的幂数，这个函数的目的就是通过无符号右移动操作先取到比最小幂小1的数（后面连续几个零的二进制的数）。至于为何在开始的时候减一是因为在假设

cap

已经是二的幂数了，这个时候就是得到

cap

2倍的幂，更好的解释参见HashMap源码注解之静态工具方法hash()、tableSizeFor()（四）。

/**
 * Returns a power of two size for the given target capacity.
 */
static final int tableSizeFor(int cap) {
    int n = cap - ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    n |= n >>> ;
    return (n < ) ?  : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + ;
}

最后一个构造函数是把一个已有的Map对象最为参数，其中主要是调用

putMapEntries

方法，这个函数中最后是调用了

putVal

,这个方法在下面介绍。

/**
 * Implements Map.putAll and Map constructor
 *
 * @param m the map
 * @param evict false when initially constructing this map, else
 * true (relayed to method afterNodeInsertion).
 */
final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) {
    int s = m.size();
    if (s > ) {
        if (table == null) { // pre-size
            float ft = ((float)s / loadFactor) + F;
            int t = ((ft < (float)MAXIMUM_CAPACITY) ?
                     (int)ft : MAXIMUM_CAPACITY);
            if (t > threshold)
                threshold = tableSizeFor(t);
        }
        else if (s > threshold)
            resize();
        for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) {
            K key = e.getKey();
            V value = e.getValue();
            putVal(hash(key), key, value, false, evict);
        }
    }
}

PUT

有了对象，就要开始往里面放数据，现在就看下

put

这个方法。

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

这个方法只是

putVal

方法的封装

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == ) //如果原来的table为空则resize
        n = (tab = resize()).length;
    if ((p = tab[i = (n - ) & hash]) == null)      //对应的位置中没有值，则直接放进去
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k)))) //如果hash值相同、key相同（equals判断）则进行更新操作
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {              //hash冲突处理
            for (int binCount = ; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   //如果为当前链表的最后一个元素则直接将新元素放到最后
                    if (binCount >= TREEIFY_THRESHOLD - ) // -1 for 1st    //超过转为黑红树的上限时转换为黑红树
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k)))) //链表中有hash值相同、key相同（equals判断）则进行更新操作
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}

这里取得对象在table中索引的代码就是

i = (n - 1) & hash

，这个实际上就是取模运算

n%hash

，这里为什么采用前一种方法，看一下说明

这个方法非常巧妙，它通过h & (table.length -1)来得到该对象的保存位，而HashMap底层数组的长度总是2的n次方，这是HashMap在速度上的优化。当length总是2的n次方时，h& (length-1)运算等价于对length取模，也就是h%length，但是&比%具有更高的效率。

https://zhuanlan.zhihu.com/p/21673805

上面的源码中当

table

为

null

或长度为

时需要重新初始化表格

n = (tab = resize()).length;

，看下

resize

方法：

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold. 初始化或者翻倍table大小，当原始table为空时使用threshold中存储的值初始化table。
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 * 
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ?  : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = ;
    if (oldCap > ) {
        if (oldCap >= MAXIMUM_CAPACITY) {   // 超过最大值就不再扩充了
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << ) < MAXIMUM_CAPACITY &&   //容量扩展到原来的两倍
                    oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << ; // double threshold       // 临界值扩展到原来的两倍
    }
    else if (oldThr > ) //当原来的table长度为0时，并且指定了初始容量（HashMap(int initialCapacity, float loadFactor)），进行到这里
        newCap = oldThr;
    else {               // 没有指定初始容量时采用默认
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == ) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                    (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = ; j < oldCap; ++j) {  
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - )] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == ) {   //索引为低位的元素
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {                          //索引为高位的元素
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

方法的文档说明中有这样一句话

Otherwise, because we are using power-of-two expansion, the elements from each bin must either stay at same index, or move with a power of two offset in the new table.

是说，因为使用的是2的幂数作为扩展的方式，这里就出现了一种情况旧table中的元素对应（重新进行hash后）到新table中的索引位置必然在原位置或2的幂的位移（移动旧table的长度）

经过观测可以发现，我们使用的是2次幂的扩展(指长度扩为原来2倍)，所以，元素的位置要么是在原位置，要么是在原位置再移动2次幂的位置。看下图可以明白这句话的意思，n为table的长度，图（a）表示扩容前的key1和key2两种key确定索引位置的示例，图（b）表示扩容后key1和key2两种key确定索引位置的示例，其中hash1是key1对应的哈希与高位运算结果。

Jdk8 HashMap源码阅读
元素在重新计算hash之后，因为n变为2倍，那么n-1的mask范围在高位多1bit(红色)，因此新的index就会发生这样的变化：

Jdk8 HashMap源码阅读

因此，我们在扩充HashMap的时候，不需要像JDK1.7的实现那样重新计算hash，只需要看看原来的hash值新增的那个bit是1还是0就好了，是0的话索引没变，是1的话索引变成“原索引+oldCap”。

https://zhuanlan.zhihu.com/p/21673805

put的总体流程可参考下图

Jdk8 HashMap源码阅读
http://idiary.oss-cn-zhangjiakou.aliyuncs.com/images/2017-07-12–HashMap-%E6%BA%90%E7%A0%81%E9%98%85%E8%AF%BB/hashmap-put.jpg

参考

哈希冲突处理 http://www.idiary.cc/2017/07/11/Hash-%E5%86%B2%E7%AA%81%E5%A4%84%E7%90%86/
HashMap面试问题http://blog.csdn.net/song19890528/article/details/16891015
【集合框架】JDK1.8源码分析之HashMap（一） http://www.cnblogs.com/leesf456/p/5242233.html
HashMap源码分析（JDK1.8）- 你该知道的都在这里了 http://blog.csdn.net/brycegao321/article/details/52527236
Java基础系列之(三) - HashMap深度分析 http://www.cnblogs.com/wuhuangdi/p/4175991.html
HashMap源码注解之静态工具方法hash()、tableSizeFor()（四）
Java8系列之重新认识HashMap

Jdk8 HashMap源码阅读

目标

数据结构

HashMap 类总览

函数详解

构造函数

PUT

参考

继续阅读

关于Gradle配置的小结

Java小案例——随机数猜测随机数猜测

nginx location中斜线的位置的重要性

27 Best Free Eclipse Plug-ins for Java Developer to be ProductiveCode Quality PluginsText Editor PluginsDependency ManagementVersion Control Integration PluginsFramework Development Continuous Integration Related PluginsOther Utility Plugins

Java String.format方法的简单使用

neo4j之cypher使用文档

GitHub连夜封杀！这份阿里 10W 字内部 Java 字面试手册到底有多强？

spark/scala关于【资源文件】加载方法概述外部文件加载方案测试资源文件打包入jar包中小结

mybatis_入门程序Mybatis入门

AOP编程_Android优雅权限框架(1)概念基础，2021金三银四前言正文大纲正文

Effective Java 8:通用程序设计

OOM三种类型

工厂模式-三种类型

【递归】高效率求2的n次幂

win10本地scala和spark安装安装scala安装spark

scala (3) Function 和 Method