天天看點

Collection Framework的Iterator實作

在比較

HashMap

ConcurrentHashMap

的不同之處發現了一個細節——關于

Iterator

的實作的不同,其實

HashMap

ConcurrentHashMap

還有更多不同的地方,這也是面試經常問到的問題,有一篇文章我覺得講的很好了, Java進階(六)從ConcurrentHashMap的演進看Java多線程核心技術

Iterator是一種設計模式,在

Java Collection Framework

中經常作為容器的視圖(view),大多數時候隻支援删除、不支援增加,提供統一的接口方法等特點。在

Java Collection Framework

Iterator

實作中大多數是

fast-fail

方式的,而支援并發的容器資料結構則沒有這個限制。

非并發資料結構的情況

常見的使用方法

1)使用Iterator周遊字元串清單

List<String> lists = Arrays.asList("a","b","c");
Iterator<String> iterator = lists.iterator();
while (iterator.hasNext()) {
    String val = iterator.next();
    System.out.println(val);
}           

這種做法是for..each的文法的展開形式

for(String val: lists){
    //sout
}           

2)使用Iterator周遊LinkedList

LinkedList<String> linkedList = new LinkedList<>(lists);
iterator = linkedList.iterator();
while (iterator.hasNext()) {
    String val = iterator.next();
    System.out.println(val);
}           

3) 使用Iterator周遊HashMap

Map<String,Integer> hmap = new HashMap<>(3);
hmap.put("a",1);
hmap.put("b",2);
hmap.put("c",3);

Iterator<Map.Entry<String,Integer>> mapIterator = hmap.entrySet().iterator();
while (mapIterator.hasNext()) {
    Map.Entry<String,Integer> entry = mapIterator.next();
    System.out.println(entry.getKey() + ":" + entry.getValue());
}           

非并發資料結構Iterator的實作

1)ArrayList中的Iterator

list中的結構是順序的,Iterator既然是List的視圖,那它也表現了相同的順序。

ArrayList獲得Iterator,

/**
* Returns an iterator over the elements in this list in proper sequence.
*
* <p>The returned iterator is <a href="#fail-fast"><i>fail-fast</i></a>.
*
* @return an iterator over the elements in this list in proper sequence
*/
public Iterator<E> iterator() {
    return new Itr();
}           

源碼,

/**
* An optimized version of AbstractList.Itr
*/
private class Itr implements Iterator<E> {
    int cursor;       // index of next element to return
    int lastRet = -1; // index of last element returned; -1 if no such
    int expectedModCount = modCount;

    public boolean hasNext() {
        return cursor != size;
    }

    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1;
        return (E) elementData[lastRet = i];
    }

    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();

        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}           

Itr

ArrayList

的一個内部類,它能使用宿主類的成員變量,事實上

Itr

反映了ArrayList的内部情況,使用了

size

expectedModCount

elementData

等屬性。通過遊标cursor的方式不斷往前遞進,隻要遊标小于size就說明依然還有元素可以通路。

應該看到的是,在調用了

new Iterator()

之後,可以看做

Itr

ArrayList

做了快照,這裡的快照并不是很嚴格,是基于

modCount

比較來實作的。它在初始化時備份了

modCount

的值,儲存為私有的變量

expectedModCount

首先

Iterator

接口并沒有諸如add的方法,即不能通過Iterator來為容器增加元素;

其次,如果有其他線程變化了容器的結構(structural modification),那麼

ArrayList.this.modCount

的值會發生改變,那麼在

Itr

執行next或者remove方法時會判斷出來

modCount != expectedModCount

的情況,進而抛出異常

fast-fail

再次,如果執行了

Itr

的remove方法,它能夠調用

ArrayList.this.remove

的方法,然後修正遊标和

expectedModCount

等。

ArrayList.this.remove(lastRet);
cursor = lastRet;
lastRet = -1;
expectedModCount = modCount;           

2)LinkedList中的Iterator

LinkedList

Iterator

ArrayList

中的有一些類似的地方。

首先,

LinkedList

的iterator入口方法其實是

AbstractSequentialList

抽象類中,

/**
* Returns an iterator over the elements in this list (in proper
* sequence).<p>
*
* This implementation merely returns a list iterator over the list.
*
* @return an iterator over the elements in this list (in proper sequence)
*/
public Iterator<E> iterator() {
    return listIterator();
}

/**
* Returns a list iterator over the elements in this list (in proper
* sequence).
*
* @param  index index of first element to be returned from the list
*         iterator (by a call to the <code>next</code> method)
* @return a list iterator over the elements in this list (in proper
*         sequence)
* @throws IndexOutOfBoundsException {@inheritDoc}
*/
public abstract ListIterator<E> listIterator(int index);           

而這個

ListIterator

是一個接口,它被

LinkedList$ListItr

實作,

private class ListItr implements ListIterator<E> {
    private Node<E> lastReturned = null;
    private Node<E> next;
    private int nextIndex;
    private int expectedModCount = modCount;

    ListItr(int index) {
        // assert isPositionIndex(index);
        next = (index == size) ? null : node(index);
        nextIndex = index;
    }

    public boolean hasNext() {
        return nextIndex < size;
    }

    public E next() {
        checkForComodification();
        if (!hasNext())
            throw new NoSuchElementException();

        lastReturned = next;
        next = next.next;
        nextIndex++;
        return lastReturned.item;
    }

    public boolean hasPrevious() {
        return nextIndex > 0;
    }

    public E previous() {
        checkForComodification();
        if (!hasPrevious())
            throw new NoSuchElementException();

        lastReturned = next = (next == null) ? last : next.prev;
        nextIndex--;
        return lastReturned.item;
    }

    public int nextIndex() {
        return nextIndex;
    }

    public int previousIndex() {
        return nextIndex - 1;
    }

    public void remove() {
        checkForComodification();
        if (lastReturned == null)
            throw new IllegalStateException();

        Node<E> lastNext = lastReturned.next;
        unlink(lastReturned);
        if (next == lastReturned)
            next = lastNext;
        else
            nextIndex--;
        lastReturned = null;
        expectedModCount++;
    }

    public void set(E e) {
        if (lastReturned == null)
            throw new IllegalStateException();
        checkForComodification();
        lastReturned.item = e;
    }

    public void add(E e) {
        checkForComodification();
        lastReturned = null;
        if (next == null)
            linkLast(e);
        else
            linkBefore(e, next);
        nextIndex++;
        expectedModCount++;
    }

    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}           

LinkedList

Iterator

要比

ArrayList

中的複雜一些,它更支援了add等方法;

類似原來遊标的周遊方式,基于

size

expectedModCount

等比較邏輯依然存在,隻不過周遊的方式不是原來的下标增進,而是節點之間的next指針來實作。

3)HashMap中的Iterator

HashMap

有多個view視圖,

keySet

values

entrySet

,這裡分析下

entrySet

這個視圖,另外兩個原理和

entrySet

視圖的差不多。

private final class EntrySet extends AbstractSet<Map.Entry<K,V>> {
    public Iterator<Map.Entry<K,V>> iterator() {
        return newEntryIterator();
    }
    public boolean contains(Object o) {
        if (!(o instanceof Map.Entry))
            return false;
        Map.Entry<K,V> e = (Map.Entry<K,V>) o;
        Entry<K,V> candidate = getEntry(e.getKey());
        return candidate != null && candidate.equals(e);
    }
    public boolean remove(Object o) {
        return removeMapping(o) != null;
    }
    public int size() {
        return size;
    }
    public void clear() {
        HashMap.this.clear();
    }
}           

EntrySet的iterator方法中調用了

newEntryIterator

,将構造

EntryIterator

執行個體,

EntryIterator

源碼

private final class EntryIterator extends HashIterator<Map.Entry<K,V>> {
    public Map.Entry<K,V> next() {
        return nextEntry();
    }
}           

EntryIterator

繼承了

HashIterator

類,複用了父類的大部分方法,隻是覆寫了next方法。

HashIterator

private abstract class HashIterator<E> implements Iterator<E> {
    Entry<K,V> next;        // next entry to return
    int expectedModCount;   // For fast-fail
    int index;              // current slot
    Entry<K,V> current;     // current entry

    HashIterator() {
        expectedModCount = modCount;
        if (size > 0) { // advance to first entry
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
    }

    public final boolean hasNext() {
        return next != null;
    }

    final Entry<K,V> nextEntry() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Entry<K,V> e = next;
        if (e == null)
            throw new NoSuchElementException();

        if ((next = e.next) == null) {
            Entry[] t = table;
            while (index < t.length && (next = t[index++]) == null)
                ;
        }
        current = e;
        return e;
    }

    public void remove() {
        if (current == null)
            throw new IllegalStateException();
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
        Object k = current.key;
        current = null;
        HashMap.this.removeEntryForKey(k);
        expectedModCount = modCount;
    }
}           

由于HashMap的結構并不是順序的,在執行Iterator.next方法時不能通過next指針或下标的方式直接找到下一個元素,

HashIterator

為了能達到這個目的,在構造函數和

nextEntry

方法中預先做了

advance

處理。

//構造函數中
if (size > 0) { // advance to first entry
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)
        ;
}
//nextEntry中
if ((next = e.next) == null) {
    Entry[] t = table;
    while (index < t.length && (next = t[index++]) == null)
        ;
}            

構造函數中預先在HashMap的table數組找到第一個頭結點不為null的元素;

(next = t[index++]) == null

的寫法有點迷惑性,不考慮HashMap為空的情況,index自增停在

next != null

的情況,即 next = t[index-1], index已經往前一步了;

在nextEntry中如果發現e.next是null,此時表示table這個數組元素的連結清單周遊結束了,需要跳到下一個頭節點不為空的元素繼續周遊,而index剛好往前一步了,此時繼續執行

next = t[index++]           

假設next[index]不為空,那麼下一個周遊的數組元素頭節點找到,并且index已經自增了。

并發資料結構的情況

ConcurrentHashMap

為例,看

ConcurrentHashMap$HashInteraotr

的實作

abstract class HashIterator {
    int nextSegmentIndex;
    int nextTableIndex;
    HashEntry<K,V>[] currentTable;
    HashEntry<K, V> nextEntry;
    HashEntry<K, V> lastReturned;

    HashIterator() {
        nextSegmentIndex = segments.length - 1;
        nextTableIndex = -1;
        advance();
    }

    /**
    * Set nextEntry to first node of next non-empty table
    * (in backwards order, to simplify checks).
    */
    final void advance() {
        for (;;) {
            if (nextTableIndex >= 0) {
                if ((nextEntry = entryAt(currentTable,
                                            nextTableIndex--)) != null)
                    break;
            }
            else if (nextSegmentIndex >= 0) {
                Segment<K,V> seg = segmentAt(segments, nextSegmentIndex--);
                if (seg != null && (currentTable = seg.table) != null)
                    nextTableIndex = currentTable.length - 1;
            }
            else
                break;
        }
    }

    final HashEntry<K,V> nextEntry() {
        HashEntry<K,V> e = nextEntry;
        if (e == null)
            throw new NoSuchElementException();
        lastReturned = e; // cannot assign until after null check
        if ((nextEntry = e.next) == null)
            advance();
        return e;
    }

    public final boolean hasNext() { return nextEntry != null; }
    public final boolean hasMoreElements() { return nextEntry != null; }

    public final void remove() {
        if (lastReturned == null)
            throw new IllegalStateException();
        ConcurrentHashMap.this.remove(lastReturned.key);
        lastReturned = null;
    }
}           

這裡能看到ConcurrentHashMap的segment分段因素所在,在構造函數中指定了最後一個segment數組元素,然後做advance處理,也是從後往前處理的。首先找到不為null的分段segment,然後才是在segment的table數組中找到不為null的元素,這都是從後往前“前進”的。

而與HashMap不同的地方,ConcurrentHashMap的Iterator并不是

fast-fail

的,它并沒有判斷modCount;除此之外還應該看到它對

nextEntry

的處理,在advance的方法調用以下兩個方法,

/**
* Gets the jth element of given segment array (if nonnull) with
* volatile element access semantics via Unsafe. (The null check
* can trigger harmlessly only during deserialization.) Note:
* because each element of segments array is set only once (using
* fully ordered writes), some performance-sensitive methods rely
* on this method only as a recheck upon null reads.
*/
@SuppressWarnings("unchecked")
static final <K,V> Segment<K,V> segmentAt(Segment<K,V>[] ss, int j) {
    long u = (j << SSHIFT) + SBASE;
    return ss == null ? null :
        (Segment<K,V>) UNSAFE.getObjectVolatile(ss, u);
}
/**
* Gets the ith element of given table (if nonnull) with volatile
* read semantics. Note: This is manually integrated into a few
* performance-sensitive methods to reduce call overhead.
*/
@SuppressWarnings("unchecked")
static final <K,V> HashEntry<K,V> entryAt(HashEntry<K,V>[] tab, int i) {
    return (tab == null) ? null :
        (HashEntry<K,V>) UNSAFE.getObjectVolatile
        (tab, ((long)i << TSHIFT) + TBASE);
}           

它們都是調用了

UNSAFE.getObjectVolatile

方法,利用了volatile access的方式,相較于上鎖的方式性能更好。

番外篇

JavaScript實作的Iterator的例子

這個例子來自MDN的文檔,做法比較簡潔,

疊代器
function makeIterator(array){
    var nextIndex = 0;

    return {
       next: function(){
           return nextIndex < array.length ?
               {value: array[nextIndex++], done: false} :
               {done: true};
       }
    };
}
var it = makeIterator(['yo', 'ya']);
console.log(it.next().value); // 'yo'
console.log(it.next().value); // 'ya'
console.log(it.next().done);  // true           

可以考慮給這個

makeIterator

的傳回值加上

hasNext

屬性,

return {
    next: ...,
    hasNext: function() {
        return nextIndex < array.length;
    }
}           

JavaScript利用了閉包實作了Iterator和Java利用内部類實作有相似的地方。

總結

Iterator的主要目的還是為了表現底層資料結構的所有元素,提供一種統一的周遊方式。在不同的資料結構需要針對不同語義做出改動,像

LinkedList

的支援add方法,像

ConcurrentHashMap

HashMap

advance

處理,像

ConcurrentHashMap

那樣不判斷

modeCount

而使用

volatile access