How to get the sizes of the L1, L2 and L3 caches on an x86 CPU

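A CPU cache is a set of static RAM (SRAM) blocks that sit between the CPU cores and main memory, which is dynamic RAM (DRAM). SRAM is much faster to access than DRAM, and unlike DRAM, whose cells leak charge, it does not need to be refreshed periodically. Acting as a cache in front of DRAM, SRAM greatly improves memory access performance; but because SRAM is expensive, caches are kept small, typically somewhere between tens of KB and tens of MB.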

Each CPU socket contains L1, L2 (also called MLC, Mid-Level Cache) and L3 (also called LLC, Last Level Cache) caches that speed up memory access. The basic structure is shown below:

[Figure: basic structure of the L1/L2/L3 cache hierarchy in a CPU socket]

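Each CPU core has its own L1 cache, which is usually split into an L1 data cache (DL1) and an L1 instruction cache (IL1) that cache data and instructions respectively. The L2 cache may be private to a single core or shared among several cores (on Intel Core processors such as the i5-8500 used below, each core has its own L2). The L3 cache is shared by all (or, on some designs, some) of the cores in a CPU socket/package.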

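By long x86 tradition, CPU features are reported through the CPUID instruction, and cache properties are no exception. However, the information CPUID returns does not state the size of each cache level directly.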

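According to the description of CPUID in Volume 2 of the Intel SDM (Software Developer's Manual), executing CPUID with EAX=02H (leaf 2) returns TLB (Translation Lookaside Buffer) and cache information in EAX, EBX, ECX and EDX, encoded one descriptor per byte, as shown in the figure below: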

[Figure: CPUID leaf 2 register and descriptor-byte encoding, from Intel SDM Vol. 2]

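Each returned byte has to be looked up in a descriptor table to get the details. This method mostly applies to older CPUs: with one-byte descriptors the table can hold at most 256 entries, which limits what it can describe. Current CPUs rarely rely on it and report cache information through CPUID leaf 4 instead.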
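To make the table lookup concrete, here is a minimal Python sketch that extracts the descriptor bytes from one set of leaf-2 register values. Python cannot execute CPUID itself, so the sketch assumes EAX..EDX were obtained some other way. It follows the rules described in the SDM: AL is not a descriptor, a register with bit 31 set carries no descriptors, and 00H is the null descriptor. `DESCRIPTORS`, `leaf2_descriptors` and `describe` are hypothetical names, and the three table entries are only a small excerpt of the SDM table.

```python
# Hypothetical excerpt of the CPUID leaf 2 descriptor table (Intel SDM Vol. 2).
# The real table has many more entries; only three are reproduced here.
DESCRIPTORS = {
    0x2C: "L1 data cache: 32 KB, 8-way, 64-byte line",
    0x30: "L1 instruction cache: 32 KB, 8-way, 64-byte line",
    0xFF: "leaf 2 reports no cache descriptors; use CPUID leaf 4",
}


def leaf2_descriptors(eax: int, ebx: int, ecx: int, edx: int) -> list:
    """Collect the non-null descriptor bytes from one CPUID(EAX=02H) result."""
    found = []
    for reg_index, reg in enumerate((eax, ebx, ecx, edx)):
        if reg & 0x80000000:       # bit 31 set: register holds no valid descriptors
            continue
        for byte_index in range(4):
            if reg_index == 0 and byte_index == 0:
                continue           # AL (always 01H) is not a descriptor
            byte = (reg >> (8 * byte_index)) & 0xFF
            if byte:               # 00H is the null descriptor
                found.append(byte)
    return found


def describe(descriptors: list) -> list:
    """Look each descriptor byte up in the (partial) table above."""
    return [DESCRIPTORS.get(d, f"{d:#04x}: not in this excerpt") for d in descriptors]


if __name__ == "__main__":
    # Hypothetical register values, not captured from a real CPU.
    for text in describe(leaf2_descriptors(0x00000001, 0x0000302C, 0x80000000, 0x000000FF)):
        print(text)
```

With those hypothetical values, `leaf2_descriptors` returns `[0x2C, 0x30, 0xFF]`; the last entry is the "use leaf 4" hint that some newer CPUs return instead of real cache descriptors.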

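When CPUID leaf 4 is supported, you must not only set EAX to 04H but also put an index in ECX, because a CPU has caches of several types and levels. The index starts at 0 and is incremented until the Cache Type Field returned in EAX (bits 4:0) is 0, which means there are no more caches.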

[Figures: CPUID leaf 4 output register fields (EAX, EBX, ECX, EDX), from Intel SDM Vol. 2]
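The enumeration loop can be sketched in Python as follows. Because Python has no CPUID instruction, the sketch takes the query as a parameter: `cpuid(leaf, subleaf)` is a hypothetical callable returning `(eax, ebx, ecx, edx)`, and `recorded_cpuid` stands in for it with register values reconstructed by hand from the decoded i5-8500 output shown later in this article. On real hardware you would pass a function that actually executes CPUID with EAX=4 and ECX=subleaf.

```python
# Cache Type Field values in CPUID leaf 4, EAX[4:0] (other non-zero values are reserved)
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}


def enumerate_leaf4(cpuid):
    """Walk CPUID leaf 4 subleaves until the Cache Type Field is 0.

    `cpuid` is any callable (leaf, subleaf) -> (eax, ebx, ecx, edx).
    Returns a list of (subleaf, level, type name).
    """
    caches = []
    subleaf = 0
    while True:
        eax, ebx, ecx, edx = cpuid(4, subleaf)
        cache_type = eax & 0x1F          # EAX[4:0]; 0 means "no more caches"
        if cache_type == 0:
            break
        level = (eax >> 5) & 0x7          # EAX[7:5]: cache level, starting at 1
        caches.append((subleaf, level, CACHE_TYPES.get(cache_type, "reserved")))
        subleaf += 1
    return caches


# Hypothetical stand-in for the real instruction: (eax, ebx, ecx, edx) per subleaf,
# reconstructed from the decoded i5-8500 fields, not read from a live CPU.
RECORDED_LEAF4 = {
    0: (0x1C004121, 0x01C0003F, 0x0000003F, 0x0),
    1: (0x1C004122, 0x01C0003F, 0x0000003F, 0x0),
    2: (0x1C004143, 0x00C0003F, 0x000003FF, 0x0),
    3: (0x1C03C163, 0x02C0003F, 0x00002FFF, 0x6),
}


def recorded_cpuid(leaf, subleaf):
    assert leaf == 4, "this stand-in only knows leaf 4"
    return RECORDED_LEAF4.get(subleaf, (0, 0, 0, 0))  # past the end: type 0


if __name__ == "__main__":
    for subleaf, level, kind in enumerate_leaf4(recorded_cpuid):
        print(f"subleaf {subleaf}: L{level} {kind} cache")
```

Passing `cpuid` in as a parameter keeps the loop independent of how the instruction is actually issued (inline assembly, a compiler intrinsic, or recorded values as here).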

The information returned by CPUID leaf 4 does not give the cache size directly; it has to be computed with the following formula:

Cache Size = (Ways + 1) * (Partitions + 1) * (Line_Size + 1) * (Sets + 1), i.e.

Cache Size = (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)
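The formula maps directly onto bit-field extraction. A minimal Python sketch, assuming the EBX and ECX values of one leaf-4 subleaf are already at hand (`leaf4_cache_size` is a hypothetical helper name):

```python
def leaf4_cache_size(ebx: int, ecx: int) -> int:
    """Cache size in bytes from the EBX/ECX values of one CPUID leaf 4 subleaf."""
    ways       = ((ebx >> 22) & 0x3FF) + 1   # EBX[31:22] + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1   # EBX[21:12] + 1
    line_size  = (ebx & 0xFFF) + 1           # EBX[11:0]  + 1
    sets       = (ecx & 0xFFFFFFFF) + 1      # ECX        + 1
    return ways * partitions * line_size * sets


if __name__ == "__main__":
    # EBX/ECX reconstructed from the i5-8500 L3 fields (not a live CPUID call)
    size = leaf4_cache_size(0x02C0003F, 0x2FFF)
    print(size, "bytes =", size // (1024 * 1024), "MB")
```

The example register values are rebuilt from the i5-8500 L3 fields in the cpuid output below (ways 11, partitions 0, line size 63, sets 12287), so they only illustrate the arithmetic.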

On Linux, the cpuid utility can display the output of the CPUID instruction. On a system with an Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz, its CPUID leaf 4 output looks like this:

deterministic cache parameters (4):
      --- cache 0 ---
      cache type                           = data cache (1)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 1 ---
      cache type                           = instruction cache (2)
      cache level                          = 0x1 (1)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x7 (7)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 63
      --- cache 2 ---
      cache type                           = unified cache (3)
      cache level                          = 0x2 (2)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0x1 (1)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0x3 (3)
      ways of associativity                = 0x0 (0)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = false
      complex cache indexing               = false
      number of sets - 1 (s)               = 1023
      --- cache 3 ---
      cache type                           = unified cache (3)
      cache level                          = 0x3 (3)
      self-initializing cache level        = true
      fully associative cache              = false
      extra threads sharing this cache     = 0xf (15)
      extra processor cores on this die    = 0x7 (7)
      system coherency line size           = 0x3f (63)
      physical line partitions             = 0x0 (0)
      ways of associativity                = 0xb (11)
      ways of associativity                = 0x6 (6)
      WBINVD/INVD behavior on lower caches = false
      inclusive to lower caches            = true
      complex cache indexing               = true
      number of sets - 1 (s)               = 12287
           

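As the output shows, this CPU has an L1 data cache, an L1 instruction cache, an L2 cache and an L3 cache. Applying the formula above (using the first "ways of associativity" value, 0xb), the L3 cache size is: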

(11 + 1) * (0 + 1) * (63 + 1) * (12287 + 1) = 9437184 bytes = 9 MB

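The other caches can be computed the same way: the L1 data and L1 instruction caches each come to (7 + 1) * (0 + 1) * (63 + 1) * (63 + 1) = 32 KB, and the L2 cache to (3 + 1) * (0 + 1) * (63 + 1) * (1023 + 1) = 256 KB.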

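The encoding is this indirect because, to improve cache utilization and hit rates while keeping cost in check, cache designs partition and manage the cache at a fine granularity (ways, partitions, line size and sets), so CPUID reports that geometry rather than a single size.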

If you just want a quick, direct look at the CPU cache details, on Windows you can use the CPU-Z utility:

[Screenshot: CPU-Z cache information]

On Linux, you can use the lstopo command from the hwloc package, as shown below:

[Screenshot: lstopo output]
