重學c#系列——字典(十一)

前言

重學c#系列繼續更新，簡單看一下字典的源碼。

看源碼主要是解釋一下江湖中的兩個傳言:

字典foreach 順序是字典添加的順序
字典删除元素後，字典順序将會改變

正文

那麼就從執行個體化開始看起，這裡我們假定key 是string 情況下開始看。

一般我們之間執行個體化:

Dictionary<string, string> keys = new Dictionary<string, string>();

那麼看下内部的執行個體化是怎麼樣的。

public Dictionary() : this(0, null) { }

看下這個this 是什麼:

public Dictionary(int capacity, IEqualityComparer<TKey>? comparer)
{
  if (capacity < 0)
  {
    ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity);
  }

  if (capacity > 0)
  {
    Initialize(capacity);
  }

  if (comparer != null && comparer != EqualityComparer<TKey>.Default) // first check for null to avoid forcing default comparer instantiation unnecessarily
  {
    _comparer = comparer;
  }

  // Special-case EqualityComparer<string>.Default, StringComparer.Ordinal, and StringComparer.OrdinalIgnoreCase.
  // We use a non-randomized comparer for improved perf, falling back to a randomized comparer if the
  // hash buckets become unbalanced.

  if (typeof(TKey) == typeof(string))
  {
    if (_comparer is null)
    {
      _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer;
    }
    else if (ReferenceEquals(_comparer, StringComparer.Ordinal))
    {
      _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinal;
    }
    else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
    {
      _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
    }
  }
}

首先有個參數，capacity 這個表示容量，然後有另外一個參數comparer，從名字上看來是用來比較排序的。

那麼這時候可以大膽猜測一下字典的周遊循環是否和比較器（comparer）有關呢？不過這裡預設是空的，暫時就不關心了。

繼續往下看:

if (typeof(TKey) == typeof(string))
{
  if (_comparer is null)
  {
    _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer;
  }
  else if (ReferenceEquals(_comparer, StringComparer.Ordinal))
  {
    _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinal;
  }
  else if (ReferenceEquals(_comparer, StringComparer.OrdinalIgnoreCase))
  {
    _comparer = (IEqualityComparer<TKey>)NonRandomizedStringEqualityComparer.WrappedAroundStringComparerOrdinalIgnoreCase;
  }
}

一般來說我們的key一般都是string，那麼預設就是NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer。

如果周遊和_comparer 有關的話，那麼NonRandomizedStringEqualityComparer.WrappedAroundDefaultComparer 就很重要，但是現在還不能确定，先不看。

先看一下添加元素的情況。

public void Add(TKey key, TValue value)
{
  bool modified = TryInsert(key, value, InsertionBehavior.ThrowOnExisting);
  Debug.Assert(modified); // If there was an existing key and the Add failed, an exception will already have been thrown.
}

TryInsert 比較長，那麼就一部分一部分看。

if (key == null)
{
  ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
}

if (_buckets == null)
{
  Initialize(0);
}
Debug.Assert(_buckets != null);
Entry[]? entries = _entries;
Debug.Assert(entries != null, "expected entries to be non-null");

_buckets 一開始是是null，那麼先來看一下Initialize。

private int Initialize(int capacity)
{
  int size = HashHelpers.GetPrime(capacity);
  int[] buckets = new int[size];
  Entry[] entries = new Entry[size];

  // Assign member variables after both arrays allocated to guard against corruption from OOM if second fails
  _freeList = -1;
#if TARGET_64BIT
  _fastModMultiplier = HashHelpers.GetFastModMultiplier((uint)size);
#endif
  _buckets = buckets;
  _entries = entries;

  return size;
}

那麼看一下初始化的時候dic 的size是多少，通過 HashHelpers.GetPrime(capacity)。

我們知道如果我們沒有指定capacity，那麼capacity預設是0；

public static int GetPrime(int min)
{
  if (min < 0)
    throw new ArgumentException(SR.Arg_HTCapacityOverflow);

  foreach (int prime in s_primes)
  {
    if (prime >= min)
      return prime;
  }

  // Outside of our predefined table. Compute the hard way.
  for (int i = (min | 1); i < int.MaxValue; i += 2)
  {
    if (IsPrime(i) && ((i - 1) % HashPrime != 0))
      return i;
  }
  return min;
}

這個s_primes 我貼一下。

private static readonly int[] s_primes =
{
  3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
  1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
  17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
  187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
  1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369
};

把0帶入進去，那麼結果應該是3，看來一開始是3個存儲桶，那麼我們可以思考一下這個存儲桶數量是否和性能有關，如果動态指定後性能是否更好，後面會在細節篇介紹。

TryInsert 這個方法繼續往下看:

IEqualityComparer<TKey>? comparer = _comparer;
uint hashCode = (uint)((comparer == null) ? key.GetHashCode() : comparer.GetHashCode(key));

uint collisionCount = 0;
ref int bucket = ref GetBucket(hashCode);
int i = bucket - 1; // Value in _buckets is 1-based

在TryInsert 出現了_comparer，每個comparer 有自己的hashcode 方式，那麼這個hashcode 會決定存儲到哪個存儲桶裡面嗎。

這裡我們假設是在string情況下, 如果我們在不填寫comparer的情況下，預設是OrdinalComparer。

private sealed class OrdinalComparer : NonRandomizedStringEqualityComparer
{
  internal OrdinalComparer(IEqualityComparer<string> wrappedComparer)
  : base(wrappedComparer)
  {
  }

  public override bool Equals(string x, string y)
  {
  return string.Equals(x, y);
  }

  public override int GetHashCode(string obj)
  {
  return obj.GetNonRandomizedHashCode();
  }
}

通過下面這個comparer.GetHashCode，擷取了hashcode。

介紹一下GetNonRandomizedHashCode 這個哈，這個就是說每個string 每次生成的都是确定的hashcode,而且其碰撞的可能性非常低,如果想看的話可以看一下源碼，這裡非重點，或許會在後續的介紹hash中說明一下。

uint hashCode = (uint)((comparer == null) ? key.GetHashCode() : comparer.GetHashCode(key));

然後通過hashcode 擷取在bucket:

ref int bucket = ref GetBucket(hashCode);
int i = bucket - 1; // Value in _buckets is 1-based

檢視:GetBucket

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private ref int GetBucket(uint hashCode)
{
  int[] buckets = _buckets!;
#if TARGET_64BIT
  return ref buckets[HashHelpers.FastMod(hashCode, (uint)buckets.Length, _fastModMultiplier)];
#else
  return ref buckets[hashCode % (uint)buckets.Length];
#endif
}

通過上面這兩段代碼得知，如果是第一次添加元素，那麼bucket 是0，i 為-1；

繼續在tryinsert 往下看。

while (true)
{
  // Should be a while loop https://github.com/dotnet/runtime/issues/9422
  // Test uint in if rather than loop condition to drop range check for following array access
  if ((uint)i >= (uint)entries.Length)
  {
    break;
  }

  if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
  {
    if (behavior == InsertionBehavior.OverwriteExisting)
    {
      entries[i].value = value;
      return true;
    }

    if (behavior == InsertionBehavior.ThrowOnExisting)
    {
      ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException(key);
    }

    return false;
  }

  i = entries[i].next;

  collisionCount++;
  if (collisionCount > (uint)entries.Length)
  {
    // The chain of entries forms a loop; which means a concurrent update has happened.
    // Break out of the loop and throw, rather than looping forever.
    ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
  }
}

這裡i 為-1 然後entries.Length 是3。

是以(uint)i >= (uint)entries.Length 是true。

第一次我們添加的時候就直接break了。

值得一提的是另外一件事情，那就是當我們反編譯的時候，看到的是這樣的。

那麼這裡就有人問了，i = -1的時候，那麼這個時候不是會出現異常嗎？

實際上并不會，要知道當我們運作的時候比較的是二進制代碼，故而建議編譯後的調試的時候把16進制打開。

if (_freeCount > 0)
{
  index = _freeList;
  Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow");
  _freeList = StartOfFreeList - entries[_freeList].next;
  _freeCount--;
}
else
{
  int count = _count;
  if (count == entries.Length)
  {
    Resize();
    bucket = ref GetBucket(hashCode);
  }
  index = count;
  _count = count + 1;
  entries = _entries;
}

ref Entry entry = ref entries![index];
entry.hashCode = hashCode;
entry.next = bucket - 1; // Value in _buckets is 1-based
entry.key = key;
entry.value = value; // Value in _buckets is 1-based
bucket = index + 1;
_version++;

// Value types never rehash
if (!typeof(TKey).IsValueType && collisionCount > HashHelpers.HashCollisionThreshold && comparer is NonRandomizedStringEqualityComparer)
{
  // If we hit the collision threshold we'll need to switch to the comparer which is using randomized string hashing
  // i.e. EqualityComparer<string>.Default.
  Resize(entries.Length, true);
}

這裡_freeCount 是0，從這個命名(空閑數量)來看結合别人說的那個remove 之後順序會變，猜測一下，是不是删除的時候保留了一個空位，然後如果有空位然後就往裡面添加呢？

這個先不管隻有知道如果一直添加那麼其一直是0。

那麼就按照_freeCount 是0的節奏走，那麼就是判斷是count == entries.Length 判斷是否滿了，如果滿了肯定就是擴容一下，調整一下之類的了。

然後總量加1，同時設定了左邊是最後一位，那麼猜測一下，如果一直這樣加的話，如果周遊的時候沒有使用到comparer，那麼就是按照添加的順序了。

那麼看一下字典的疊代步驟:

public bool MoveNext()
{
  if (_version != _dictionary._version)
  {
    ThrowHelper.ThrowInvalidOperationException_InvalidOperation_EnumFailedVersion();
  }

  // Use unsigned comparison since we set index to dictionary.count+1 when the enumeration ends.
  // dictionary.count+1 could be negative if dictionary.count is int.MaxValue
  while ((uint)_index < (uint)_dictionary._count)
  {
    ref Entry entry = ref _dictionary._entries![_index++];

    if (entry.next >= -1)
    {
      _current = new KeyValuePair<TKey, TValue>(entry.key, entry.value);
      return true;
    }
  }

  _index = _dictionary._count + 1;
  _current = default;
  return false;
}

_index 一開始預設是0，那麼我們存儲的值是一個數組，那麼如果按照上面這樣邏輯的周遊，那麼是按照添加順序來的。

這裡有一個周遊出來的條件，entry.next >= -1，那麼是否是如果删除的話那麼就做一個标記呢？因為如果移動數組，那麼的确是一件艱難的事情（性能）。

那麼看remove了。

public bool Remove(TKey key)
{
  // The overload Remove(TKey key, out TValue value) is a copy of this method with one additional
  // statement to copy the value for entry being removed into the output parameter.
  // Code has been intentionally duplicated for performance reasons.

  if (key == null)
  {
    ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);
  }

  if (_buckets != null)
  {
    Debug.Assert(_entries != null, "entries should be non-null");
    uint collisionCount = 0;
    uint hashCode = (uint)(_comparer?.GetHashCode(key) ?? key.GetHashCode());
    ref int bucket = ref GetBucket(hashCode);
    Entry[]? entries = _entries;
    int last = -1;
    int i = bucket - 1; // Value in buckets is 1-based
    while (i >= 0)
    {
      ref Entry entry = ref entries[i];

      if (entry.hashCode == hashCode && (_comparer?.Equals(entry.key, key) ?? EqualityComparer<TKey>.Default.Equals(entry.key, key)))
      {
        if (last < 0)
        {
          bucket = entry.next + 1; // Value in buckets is 1-based
        }
        else
        {
          entries[last].next = entry.next;
        }

        Debug.Assert((StartOfFreeList - _freeList) < 0, "shouldn't underflow because max hashtable length is MaxPrimeArrayLength = 0x7FEFFFFD(2146435069) _freelist underflow threshold 2147483646");
        entry.next = StartOfFreeList - _freeList;

        if (RuntimeHelpers.IsReferenceOrContainsReferences<TKey>())
        {
          entry.key = default!;
        }

        if (RuntimeHelpers.IsReferenceOrContainsReferences<TValue>())
        {
          entry.value = default!;
        }

        _freeList = i;
        _freeCount++;
        return true;
      }

      last = i;
      i = entry.next;

      collisionCount++;
      if (collisionCount > (uint)entries.Length)
      {
        // The chain of entries forms a loop; which means a concurrent update has happened.
        // Break out of the loop and throw, rather than looping forever.
        ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
      }
    }
  }
  return false;
}

這裡就直接說明一下這個思路哈。

首先給出一張概念圖:

這張是什麼意思呢？字典實際上預設情況下實際上是3個資料桶。

但是資料本身是存在數組中的，然後通過資料key的hashcode 來決定資料放在哪個資料桶中。

且同一個資料桶中是鍊狀結構的，那麼删除步驟就是下面這個概念圖。

就是進行連結清單删除一樣，同樣把這個資料的key和value 清空，然後把next進行弄為比-1還小的數StartOfFreeList - _freeList，StartOfFreeList 預設是-3，_freeList 最小是-1，這樣就在數組中标記删除了。

那麼我們再來看添加的時候，如果有删除的位置的時候的代碼。

if (_freeCount > 0)
{
  index = _freeList;
  Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow");
  _freeList = StartOfFreeList - entries[_freeList].next;
  _freeCount--;
}

如果有删除的位置那麼會頂替删除的位置。

那麼删除的模型是怎麼樣的？看StartOfFreeList - entries[_freeList].next。

這裡删除邏輯其實是這樣的。

删除的我們指定隻是做了标記，但是這個标記可不是這麼簡單的标記，而是形成了鍊狀結構，每次如果添加的時候就替補了最後一個删除為，然後最後一個删除位又得到了更新。

這裡給出一個例子。

static void Main(string[] args)
{
  Console.WriteLine("Hello World!");
  Dictionary<string, string> keys = new Dictionary<string, string>();
  keys.Add("aaa","aaa");

  keys.Add("cccc", "cccc");

  keys.Add("ddddd", "ddddd");

  keys.Remove("aaa");

  keys.Remove("cccc");

  keys.Add("xxxx", "bbbbb");

  keys.Add("zzzz", "bbbbb");

  foreach (var a in keys)
  {
    Console.WriteLine(a.Key);
  }

  Console.ReadLine();
}

結

那麼得出結論。

1. 字典foreach 順序是字典添加的順序

2. 字典删除元素後，字典順序将會改變

如果字典一直隻添加，那麼會foreach 會是原來的順序。

如果字典進行了删除，那麼這個最後一個删除位将會成為下一個添加位。

整理查找我就不介紹了，也是通過定位在哪個存儲桶，然後進行鍊狀查詢。

在該系列，在後面應該也會介紹一下list源碼。

重學c#系列——字典(十一)

前言

正文

結

繼續閱讀

ZOJ 1041 Transmitters

CSU 1562 Fun House

CodeChef PALPROB Palindromeness

UVA 10344- 23 out of 5

ZOJ 1104 Leaps Tall Buildings

ZOJ 3700 Ever Dream

HDU 2821 Pusher

ZOJ 1199 Point of Intersection

UVA 1401 Remember the Word

ZOJ 2748 Free Kick

CSU 1567 Reverse Rot

JAVA 系列——>開發工具IntelliJ IDEA的安裝以及配置、快捷鍵IDEA 簡介

UVA 519 Puzzle (II)

磁盤結構及在Linux中的命名

數組和類數組

詳解STM32單片機的堆棧