快速存取

值存取

在 HashMap 中，我们最常用的两个操作就是：put(Key,Value) 和 get(Key)。HashMap 中的 Key 是唯一的，那它是如何保证唯一性的呢？最简单的就是用 equals 比较，但随着元素的增多，put 和 get 的效率将越来越低，这里的时间复杂度是 O(n)。实际上，HashMap 很少会用到 equals 方法，因为其内通过一个哈希表管理所有元素，利用哈希算法可以快速的存取元素。当我们调用 put 方法存值时，HashMap 首先会调用 Key 的 hashCode 方法，然后基于此获取 Key 哈希码，通过哈希码快速找到某个桶，这个位置可以被称之为 bucketIndex。

理论上，hashCode 可能存在碰撞的情况，当碰撞发生时，这时会取出 bucketIndex 桶内已存储的元素，并通过 hashCode() 和 equals() 来逐个比较以判断 Key 是否已存在。如果已存在，则使用新 Value 值替换旧 Value 值，并返回旧 Value 值；如果不存在，则存放新的键值对 <Key, Value> 到桶中。因此，在 HashMap 中，equals() 方法只有在哈希码碰撞时才会被用到。

put

下面这段代码是 java.util.HashMap 的中 put 方法的具体实现：

/**
    * Associates the specified value with the specified key in this map.
    * If the map previously contained a mapping for the key, the old
    * value is replaced.
    *
    * @param key key with which the specified value is to be associated
    * @param value value to be associated with the specified key
    * @return the previous value associated with key, or null if there was no mapping for key.
    *  Note that a null return can also indicate that the map previously associated null with key.
    */
public V put(K key, V value) {

    //当key为null时，调用putForNullKey方法，并将该键值对保存到table的第一个位置
    if (key == null)
        return putForNullKey(value);

    //根据key的hashCode计算hash值
    int hash = hash(key.hashCode());             //  ------- (1)

    //计算该键值对在数组中的存储位置（哪个桶）
    int i = indexFor(hash, table.length);              // ------- (2)

    //在table的第i个桶上进行迭代，寻找 key 保存的位置
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {      // ------- (3)
        Object k;
        //判断该条链上是否存在hash值相同且key值相等的映射，若存在，则直接覆盖 value，并返回旧value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;    // 返回旧值
        }
    }

    modCount++; //修改次数增加1，快速失败机制

    //原HashMap中无该映射，将该添加至该链的链头
    addEntry(hash, key, value, i);
    return null;
}

通过上述源码我们可以清楚了解到 HashMap 保存数据的过程。首先，判断 key 是否为 null，若为 null，则直接调用 putForNullKey 方法；若不为空，则先计算 key 的 hash 值，然后根据 hash 值搜索在 table 数组中的索引位置，如果 table 数组在该位置处有元素，则查找是否存在相同的 key，若存在则覆盖原来 key 的 value，否则将该元素保存在链头（最先保存的元素放在链尾）。此外，若 table 在该处没有元素，则直接保存。

源码中的 (3) 处，此处迭代原因就是为了防止存在相同的 key 值。如果发现两个 hash 值（key）相同时，HashMap 的处理方式是用新 value 替换旧 value，这里并没有处理 key，这正好解释了 HashMap 中没有两个相同的 key。

对 NULL 键的特别处理：putForNullKey()

    /**
     * Offloaded version of put for null keys
     */
    private V putForNullKey(V value) {
        // 若key==null，则将其放入table的第一个桶，即 table[0]
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null) {   // 若已经存在key为null的键，则替换其值，并返回旧值
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }
        modCount++;        // 快速失败
        addEntry(0, null, value, 0);       // 否则，将其添加到 table[0] 的桶中
        return null;
    }

通过上述源码我们可以清楚知到，HashMap 中可以保存键为 NULL 的键值对，且该键值对是唯一的。若再次向其中添加键为 NULL 的键值对，将覆盖其原值。此外，如果 HashMap 中存在键为 NULL 的键值对，那么一定在第一个桶中。

HashMap 中的哈希策略（算法）

在上述的 put(key,vlaue) 方法的源码中，我们标出了 HashMap 中的哈希策略（即(1)、(2)两处），hash() 方法用于对 Key 的 hashCode 进行重新计算，而 indexFor() 方法用于生成这个 Entry 对象的插入位置。当计算出来的 hash 值与 hashMap 的(length-1)做了&运算后，会得到位于区间[0，length-1]的一个值。特别地，这个值分布的越均匀， HashMap 的空间利用率也就越高，存取效率也就越好。

我们首先看(1)处的 hash() 方法，该方法为一个纯粹的数学计算，用于进一步计算 key 的 hash 值，源码如下：

    /**
     * Applies a supplemental hash function to a given hashCode, which
     * defends against poor quality hash functions.  This is critical
     * because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits.
     *
     * Note: Null keys always map to hash 0, thus index 0.
     */
    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

正如 JDK 官方对该方法的描述那样，使用 hash()方法对一个对象的 hashCode 进行重新计算是为了防止质量低下的 hashCode()函数实现。由于 hashMap 的支撑数组长度总是 2 的幂次，通过右移可以使低位的数据尽量的不同，从而使 hash 值的分布尽量均匀。

通过上述 hash()方法计算得到 Key 的 hash 值后，怎么才能保证元素均匀分布到 table 的每个桶中呢？我们会想到取模，但是由于取模的效率较低，HashMap 是通过调用(2)处的 indexFor()方法处理的，其不但简单而且效率很高，对应源码如下所示：

/**
* Returns index for hash code h.
*/
static int indexFor(int h, int length) {
    return h & (length-1);  // 作用等价于取模运算，但这种方式效率更高
}

我们知道，HashMap 的底层数组长度总是 2 的 n 次方。当 length 为 2 的 n 次方时，h&(length - 1)就相当于对 length 取模，而且速度比直接取模要快得多，这是 HashMap 在速度上的一个优化。

总而言之，上述的 hash()方法和 indexFor()方法的作用只有一个：保证元素均匀分布到 table 的每个桶中以便充分利用空间。

HashMap 中键值对的添加：addEntry()

/**
* Adds a new entry with the specified key, value and hash code to
* the specified bucket.  It is the responsibility of this
* method to resize the table if appropriate.
*
* Subclass overrides this to alter the behavior of put method.
*
* 永远都是在链表的表头添加新元素
*/
void addEntry(int hash, K key, V value, int bucketIndex) {

    //获取bucketIndex处的链表
    Entry<K,V> e = table[bucketIndex];

    //将新创建的 Entry 链入 bucketIndex处的链表的表头
    table[bucketIndex] = new Entry<K,V>(hash, key, value, e);

    //若HashMap中元素的个数超过极限值 threshold，则容量扩大两倍
    if (size++ >= threshold)
        resize(2 * table.length);
}

通过上述源码我们可以清楚地了解到链的产生时机。HashMap 总是将新的 Entry 对象添加到 bucketIndex 处，若 bucketIndex 处已经有了 Entry 对象，那么新添加的 Entry 对象将指向原有的 Entry 对象，并形成一条新的以它为链头的 Entry 链；但是，若 bucketIndex 处原先没有 Entry 对象，那么新添加的 Entry 对象将指向 null，也就生成了一条长度为 1 的全新的 Entry 链了。HashMap 永远都是在链表的表头添加新元素。此外，若 HashMap 中元素的个数超过极限值 threshold，其将进行扩容操作，一般情况下，容量将扩大至原来的两倍。

HashMap 的扩容：resize()

随着 HashMap 中元素的数量越来越多，发生碰撞的概率将越来越大，所产生的子链长度就会越来越长，这样势必会影响 HashMap 的存取速度。为了保证 HashMap 的效率，系统必须要在某个临界点进行扩容处理，该临界点就是 HashMap 中元素的数量在数值上等于 threshold（table 数组长度 * 加载因子）。但是，不得不说，扩容是一个非常耗时的过程，因为它需要重新计算这些元素在新 table 数组中的位置并进行复制处理。所以，如果我们能够提前预知 HashMap 中元素的个数，那么在构造 HashMap 时预设元素的个数能够有效的提高 HashMap 的性能。我们直接看其源码：

     /**
     * Rehashes the contents of this map into a new array with a
     * larger capacity.  This method is called automatically when the
     * number of keys in this map reaches its threshold.
     *
     * If current capacity is MAXIMUM_CAPACITY, this method does not
     * resize the map, but sets threshold to Integer.MAX_VALUE.
     * This has the effect of preventing future calls.
     *
     * @param newCapacity the new capacity, MUST be a power of two;
     *        must be greater than current capacity unless current
     *        capacity is MAXIMUM_CAPACITY (in which case value
     *        is irrelevant).
     */
    void resize(int newCapacity) {
        Entry[] oldTable = table;
        int oldCapacity = oldTable.length;

        // 若 oldCapacity 已达到最大值，直接将 threshold 设为 Integer.MAX_VALUE
        if (oldCapacity == MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return;             // 直接返回
        }

        // 否则，创建一个更大的数组
        Entry[] newTable = new Entry[newCapacity];

        //将每条Entry重新哈希到新的数组中
        transfer(newTable);

        table = newTable;
        threshold = (int)(newCapacity * loadFactor);  // 重新设定 threshold
    }

HashMap 的重哈希：transfer()

重哈希的主要是一个重新计算原 HashMap 中的元素在新 table 数组中的位置并进行复制处理的过程，我们直接看其源码：

/**
* Transfers all entries from current table to newTable.
*/
void transfer(Entry[] newTable) {

    // 将原数组 table 赋给数组 src
    Entry[] src = table;
    int newCapacity = newTable.length;

    // 将数组 src 中的每条链重新添加到 newTable 中
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;   // src 回收

            // 将每条链的每个元素依次添加到 newTable 中相应的桶中
            do {
                Entry<K,V> next = e.next;

                // e.hash指的是 hash(key.hashCode())的返回值;
                // 计算在newTable中的位置，注意原来在同一条子链上的元素可能被分配到不同的子链
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}

特别需要注意的是，在重哈希的过程中，原属于一个桶中的 Entry 对象可能被分到不同的桶，因为 HashMap 的容量发生了变化，那么 h&(length - 1) 的值也会发生相应的变化。极端地说，如果重哈希后，原属于一个桶中的 Entry 对象仍属于同一桶，那么重哈希也就失去了意义。

get

相对于 HashMap 的存储而言，读取就显得比较简单了。因为，HashMap 只需通过 key 的 hash 值定位到 table 数组的某个特定的桶，然后查找并返回该 key 对应的 value 即可，源码如下：

/**
* Returns the value to which the specified key is mapped,
* or {@code null} if this map contains no mapping for the key.
*
* <p>More formally, if this map contains a mapping from a key
* {@code k} to a value {@code v} such that {@code (key==null ? k==null
* key.equals(k))}, then this method returns {@code v}; otherwise
* it returns {@code null}.  (There can be at most one such mapping.)
*
* <p>A return value of {@code null} does not <i>necessarily</i>
* indicate that the map contains no mapping for the key; it's also
* possible that the map explicitly maps the key to {@code null}.
* The {@link #containsKey containsKey} operation may be used to
* distinguish these two cases.
*
* @see #put(Object, Object)
*/
public V get(Object key) {
    // 若为null，调用getForNullKey方法返回相对应的value
    if (key == null)
        // 从table的第一个桶中寻找 key 为 null 的映射；若不存在，直接返回null
        return getForNullKey();

    // 根据该 key 的 hashCode 值计算它的 hash 码
    int hash = hash(key.hashCode());
    // 找出 table 数组中对应的桶
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
            e != null;
            e = e.next) {
        Object k;
        //若搜索的key与查找的key相同，则返回相对应的value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}

在这里能够根据 key 快速的取到 value，除了和 HashMap 的数据结构密不可分外，还和 Entry 有莫大的关系。在前面就已经提到过，HashMap 在存储过程中并没有将 key，value 分开来存储，而是当做一个整体 key-value 来处理的，这个整体就是 Entry 对象。特别地，在 Entry 对象中，value 的地位要比 key 低一些，相当于是 key 的附属。

其中，针对键为 NULL 的键值对，HashMap 提供了专门的处理：getForNullKey()，其源码如下：

 /**
     * Offloaded version of get() to look up null keys.  Null keys map
     * to index 0.  This null case is split out into separate methods
     * for the sake of performance in the two most commonly used
     * operations (get and put), but incorporated with conditionals in
     * others.
     */
    private V getForNullKey() {
        // 键为NULL的键值对若存在，则必定在第一个桶中
        for (Entry<K,V> e = table[0]; e != null; e = e.next) {
            if (e.key == null)
                return e.value;
        }
        // 键为NULL的键值对若不存在，则直接返回 null
        return null;
    }

因此，调用 HashMap 的 get(Object key)方法后，若返回值是 NULL，则存在如下两种可能：

该 key 对应的值就是 null;
HashMap 中不存在该 key。

最近更新于 0001-01-01