天天看點

risc-v 文檔翻譯: RVWMO Memory ConsistencyModel

前言

多核處理器通過共享記憶體進行通信,如果記憶體操作亂序,相當于通信順序亂序,而沒有編寫過多線程的程式員可能意識不到這一點,假設了通信是順序執行的,這會對程式正确性産生影響。

這部分在非特權級 14 章部分,目前版本為 0.1,我們開始吧。

總覽

本章定義了RISC-V記憶體一緻性模型。記憶體一緻性模型是一組規則,指定 load 記憶體傳回的值。RISC-V使用一種稱為“RVWMO”(RISC-V弱記憶體序)的記憶體模型,該模型旨在為架構師提供建構高性能可伸縮設計的靈活性,同時支援可處理的程式設計模型。

This chapter defines the RISC-V memory consistency model. A memory consistency model is a set of rules specifying the values that can be returned by loads of memory. RISC-V uses a memory model called “RVWMO” (RISC-V Weak Memory Ordering) which is designed to provide flexibility for architects to build high-performance scalable designs while simultaneously supporting a tractable programming model.

在RVWMO下,從同一 hart 中的其他記憶體指令的角度來看,在單個hart上運作的代碼是按順序執行的,但來自另一個hart的記憶體指令可能觀察到來自第一個 hart 的記憶體指令以不同的順序執行。是以,多線程代碼可能需要顯式同步來保證來自不同hart的記憶體指令之間的順序。基本的RISC-V ISA為此提供了一個FENCE指令,如2.7節所述,而原子擴充“A”額外定義了負載保留/存儲條件和原子讀-修改-寫指令

Under RVWMO, code running on a single hart appears to execute in order from the perspective of other memory instructions in the same hart, but memory instructions from another hart may observe the memory instructions from the first hart being executed in a different order. Therefore, multithreaded code may require explicit synchronization to guarantee ordering between memory instructions from different harts. The base RISC-V ISA provides a FENCE instruction for this purpose, described in Section 2.7, while the atomics extension “A” additionally defines load reserved/store-conditional and atomic read-modify-write instructions

用于非對齊原子通路的标準 ISA 擴充“Zam”(第22章)和用于 TSO 記憶體模型的标準ISA擴充“Ztso”(第23章)用特定的附加規則增強了 RVWMO。

The standard ISA extension for misaligned atomics “Zam” (Chapter 22) and the standard ISA extension for total store ordering “Ztso” (Chapter 23) augment RVWMO with additional rules specific to those extensions.

該規範的附錄提供了記憶體一緻性模型的公理和操作語義形式,以及額外的解釋性材料。

The appendices to this specification provide both axiomatic and operational formalizations of the memory consistency model as well as additional explanatory material.

本章為正常的記憶體操作定義了記憶體模型。而記憶體模型與I/O記憶體、指令擷取、FENCE.I,頁表周遊(page table walks),和SFENCE.VMA的互動尚未正式形式化。上述部分或全部可能在本規範的未來修訂中正式确定。RV128基礎ISA和未來的ISA擴充(如“V”向量、“T”事務記憶體和“J”JIT擴充)也需要合并到未來的修訂版中。

同時支援不同寬度的重疊記憶體通路的記憶體一緻性模型仍然是學術研究的一個活躍領域,但尚未得到充分了解。關于不同大小的記憶體通路在RVWMO下如何互動的細節已經盡了我們目前的最大能力,但是如果發現了新的問題,它們可能會進行修訂。

This chapter defines the memory model for regular main memory operations. The interaction of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not (yet) formalized. Some or all of the above may be formalized in a future revision of this specification. The RV128 base ISA and future ISA extensions such as the “V” vector, “T” transactional memory, and “J” JIT extensions will need to be incorporated into a future revision as well.
同時支援不同寬度的重疊記憶體通路的記憶體一緻性模型仍然是學術研究的一個活躍領域,但尚未得到充分了解。關于不同大小的記憶體通路在RVWMO下如何互動的細節已經盡了我們目前的最大能力,但是如果發現了新的問題,它們可能會進行修訂。
Memory consistency models supporting overlapping memory accesses of different widths simultaneously remain an active area of academic research and are not yet fully understood. The specifics of how memory accesses of different sizes interact under RVWMO are specified to the best of our current abilities, but they are subject to revision should new issues be uncovered.

14.1 RVWMO 的定義

RFC: 對于global memory order, 具體内容和《Time, Clocks, and the Ordering of Events in a Distributed System》這篇論文提到的 partial order 挺像, 應該指的是一個東西

RVWMO記憶體模型是根據 global memory order 定義的,即所有hart産生的記憶體操作的總順序。一般來說,多線程程式有許多不同的可能執行,每個執行都有自己相應的全局記憶體順序。全局記憶體順序是在由記憶體指令生成的基本加載和存儲操作上定義的。然後,它受制于本章其餘部分定義的限制。任何滿足所有記憶體模型限制的執行都是合法的執行(就記憶體模型而言)。

The RVWMO memory model is defined in terms of the global memory order, a total ordering of the memory operations produced by all harts. In general, a multithreaded program has many different possible executions, with each execution having its own corresponding global memory order. The global memory order is defined over the primitive load and store operations generated by memory instructions. It is then subject to the constraints defined in the rest of this chapter. Any execution satisfying all of the memory model constraints is a legal execution (as far as the memory model is concerned).

記憶體模型原語

記憶體操作的程式順序反映了生成每次加載和存儲的指令在該hart的動态指令流中的邏輯布局順序;也就是說,一個簡單的按順序處理器執行該hart指令的順序。記憶體通路指令産生記憶體操作。記憶體操作可以是加載操作、存儲操作,或者同時是加載操作和存儲操作。所有的記憶體操作都是單拷貝原子:它們永遠不會在部分完成狀态下被觀察到。在RV32GC和RV64GC中的指令中,每個對齊的記憶體指令隻産生一個記憶體操作,隻有兩個例外。首先,不成功的SC指令不會引起任何記憶體操作。第二,如果XLEN<64, FLD和FSD指令都可能引發多個記憶體操作,如12.3節所述并在下面澄清。對齊的AMO會産生一個記憶體操作,它同時是加載操作和存儲操作。

The program order over memory operations reflects the order in which the instructions that generate each load and store are logically laid out in that hart’s dynamic instruction stream; i.e., the order in which a simple in-order processor would execute the instructions of that hart. Memory-accessing instructions give rise to memory operations. A memory operation can be either a load operation, a store operation, or both simultaneously. All memory operations are single-copy atomic: they can never be observed in a partially-complete state. Among instructions in RV32GC and RV64GC, each aligned memory instruction gives rise to exactly one memory operation, with two exceptions. First, an unsuccessful SC instruction does not give rise to any memory operations. Second, FLD and FSD instructions may each give rise to multiple memory operations if XLEN<64, as stated in Section 12.3 and clarified below. An aligned AMO gives rise to a single memory operation that is both a load operation and a store operation simultaneously.
RV128基本指令集和未來ISA擴充中的指令,如V (vector)和 P (SIMD)可能産生多個記憶體操作。然而,記憶體模型為這些擴充尚未正式确定。
Instructions in the RV128 base instruction set and in future ISA extensions such as V (vector) and P (SIMD) may give rise to multiple memory operations. However, the memory model for these extensions has not yet been formalized.

錯誤的加載或存儲指令可以被分解成任意粒度的一組元件記憶體操作。XLEN<64的FLD或FSD指令也可以分解為任意粒度的一組元件記憶體操作。由這些指令産生的記憶體操作彼此之間不是按程式順序排序的,但它們是按程式順序按照前一個和後一個指令産生的記憶體操作正常排序的。原子擴充“A”根本不需要執行環境來支援錯位的原子指令;然而,如果通過“Zam”擴充支援錯位原子,則LRs、sc和AMOs可以根據錯位原子的原子性公理進行分解,該公理在第22章中定義。

A misaligned load or store instruction may be decomposed into a set of component memory operations of any granularity. An FLD or FSD instruction for which XLEN<64 may also be decomposed into a set of component memory operations of any granularity. The memory operations generated by such instructions are not ordered with respect to each other in program order, but they are ordered normally with respect to the memory operations generated by preceding and subsequent instructions in program order. The atomics extension “A” does not require execution environments to support misaligned atomic instructions at all; however, if misaligned atomics are supported via the “Zam” extension, LRs, SCs, and AMOs may be decomposed subject to the constraints of the atomicity axiom for misaligned atomics, which is defined in Chapter 22.
将錯誤對齊的記憶體操作分解到位元組粒度,有助于對不支援錯誤對齊通路的實作進行仿真。例如,這樣的實作可能隻是逐個周遊未對齊的通路的位元組。
The decomposition of misaligned memory operations down to byte granularity facilitates emulation on implementations that do not natively support misaligned accesses. Such implementations might, for example, simply iterate over the bytes of a misaligned access one by one.

如果一個LR指令在程式順序上在SC之前,并且在兩者之間沒有其他LR或SC指令,則稱LR指令和SC指令成對;相應的記憶體操作也被認為是成對的(除非SC失敗,沒有生成存儲操作)。決定SC是否必須成功、可能成功或必須失敗的完整條件清單在第8.2節中定義。

An LR instruction and an SC instruction are said to be paired if the LR precedes the SC in program order and if there are no other LR or SC instructions in between; the corresponding memory operations are said to be paired as well (except in case of a failed SC, where no store operation is generated). The complete list of conditions determining whether an SC must succeed, may succeed, or must fail is defined in Section 8.2.

加載和存儲操作還可以從以下集合中攜帶一個或多個排序注釋:" acquire-RCpc “、” acquire-RCsc “、” release-RCpc “和” release-RCsc "。帶有aq集的AMO或LR指令有一個“acquire-RCsc”注釋。帶有rl集的AMO或SC指令具有“release RCsc”注釋。一個同時具有aq和rl集的AMO、LR或SC指令具有“擷取- rcsc”和“釋放- rcsc”注釋。

Load and store operations may also carry one or more ordering annotations from the following set: “acquire-RCpc”, “acquire-RCsc”, “release-RCpc”, and “release-RCsc”. An AMO or LR instruction with aq set has an “acquire-RCsc” annotation. An AMO or SC instruction with rl set has a “release-RCsc” annotation. An AMO, LR, or SC instruction with both aq and rl set has both “acquire-RCsc” and “release-RCsc” annotations.

為了友善起見,我們使用術語“擷取注釋(acquire annotation)”來指代acquire-rcpc注釋或acquire-RCsc注釋。同樣,“釋出注釋(release annotation)”指的是release-rcpc注釋或release-rcsc注釋。“RCpc注釋”是指 acquire-RCpc注釋或釋出release-RCpc注釋。“RCsc注釋”是指acquire-RCsc注釋或release-RCsc注釋。

For convenience, we use the term “acquire annotation” to refer to an acquire-RCpc annotation or an acquire-RCsc annotation. Likewise, a “release annotation” refers to a release-RCpc annotation or a release-RCsc annotation. An “RCpc annotation” refers to an acquire-RCpc annotation or a releaseRCpc annotation. An “RCsc annotation” refers to an acquire-RCsc annotation or a release-RCsc annotation.
在記憶體模型文獻中,術語“RCpc”表示與處理器一緻同步操作的釋出一緻性,術語“RCsc”表示與順序一緻同步操作[5]的釋出一緻性。雖然文獻中對擷取和釋出注釋有許多不同的定義,但在RVWMO的背景下,這些術語由保留程式指令第5-7條定義,簡潔而完整。“RCpc”注釋目前僅在隐式配置設定給每個記憶體通路時使用,每個标準擴充“Ztso”(第23章)。此外,盡管ISA目前不包含本機負載擷取或存儲釋放指令,也不包含RCpc的變體,但RVWMO模型本身被設計為向前相容,可以在未來的擴充中将上述任何或所有指令添加到ISA中。
In the memory model literature, the term “RCpc” stands for release consistency with processor consistent synchronization operations, and the term “RCsc” stands for release consistency with sequentially-consistent synchronization operations [5]. While there are many different definitions for acquire and release annotations in the literature, in the context of RVWMO these terms are concisely and completely defined by Preserved Program Order rules 5–7. “RCpc” annotations are currently only used when implicitly assigned to every memory access per the standard extension “Ztso” (Chapter 23). Furthermore, although the ISA does not currently contain native load-acquire or store-release instructions, nor RCpc variants thereof, the RVWMO model itself is designed to be forwards-compatible with the potential addition of any or all of the above into the ISA in a future extension.

文法依賴

RVWMO記憶體模型的定義部分依賴于文法依賴的概念,定義如下。在定義依賴項的上下文中,“寄存器”指的是整個通用寄存器、CSR的一部分或整個CSR。通過CSR跟蹤依賴關系的粒度特定于每個CSR,在第14.2節中定義。文法依賴是根據指令的源寄存器、指令的目标寄存器以及指令将依賴從源寄存器攜帶到目标寄存器的方式來定義的。本節提供所有這些術語的一般定義;但是,第14.3節提供了每條指令的詳細細節的完整清單。一般來說,如果存在以下任何一種情況,除x0外的寄存器r是指令i的源寄存器:

  • 在i的操作碼中,rs1、rs2或rs3被設定為r
  • i是一個CSR指令,在i的操作碼中,CSR被設定為r,除非i是CSRRW或CSRRWI, rd被設定為x0
  • r是一個CSR和i的隐式源寄存器,如14.3節所定義
  • r是一個CSR,它與i的另一個源寄存器别名

The definition of the RVWMO memory model depends in part on the notion of a syntactic dependency, defined as follows. In the context of defining dependencies, a “register” refers either to an entire general-purpose register, some portion of a CSR, or an entire CSR. The granularity at which dependencies are tracked through CSRs is specific to each CSR and is defined in Section 14.2. Syntactic dependencies are defined in terms of instructions’ source registers, instructions’ destination registers, and the way instructions carry a dependency from their source registers to their destination registers. This section provides a general definition of all of these terms; however, Section 14.3 provides a complete listing of the specifics for each instruction. In general, a register r other than x0 is a source register for an instruction i if any of the following hold:

• In the opcode of i, rs1, rs2, or rs3 is set to r

• i is a CSR instruction, and in the opcode of i, csr is set to r, unless i is CSRRW or CSRRWI and rd is set to x0

• r is a CSR and an implicit source register for i, as defined in Section 14.3

• r is a CSR that aliases with another source register for i

記憶體指令還進一步指定哪些源寄存器是位址源寄存器,哪些是資料源寄存器。

Memory instructions also further specify which source registers are address

source registers and which are data source registers.

一般來說,如果存在以下任何一種情況,則除x0之外的寄存器r是指令i的目标寄存器:

  • 在i的操作碼中,rd被設定為r
  • i是一個CSR指令,在i的操作碼中,CSR被設定為r,除非i是CSRRS或CSRRC, rs1被設定為x0,或者i是CSRRSI或CSRRCI, uimm[4:0]被設定為0。
  • r是一個CSR和i的隐式目标寄存器,如14.3節所定義
  • r是一個CSR,它與i的另一個目标寄存器别名

In general, a register r other than x0 is a destination register for an instruction i if any of the following hold:

• In the opcode of i, rd is set to r

• i is a CSR instruction, and in the opcode of i, csr is set to r, unless i is CSRRS or CSRRC and rs1 is set to x0 or i is CSRRSI or CSRRCI and uimm[4:0] is set to zero.

• r is a CSR and an implicit destination register for i, as defined in Section 14.3

• r is a CSR that aliases with another destination register for i

大多數非記憶體指令從它們的每個源寄存器攜帶一個依賴項到它們的每個目标寄存器。然而,這條規則也有例外;參見14.3節

Most non-memory instructions carry a dependency from each of their source registers to each of their destination registers. However, there are exceptions to this rule; see Section 14.3

指令 j 通過 i 的目标寄存器 s 和 j 的源寄存器 r 對指令 i 有文法依賴,如果下列任何一種保出現:

  • s與r相同,在I和j之間排序的指令程式都沒有r作為目标寄存器
  • 在i和j之間有一個指令m程式,它的順序是:
  1. J通過目标寄存器q和源寄存器r對m有文法依賴性
  2. M通過目标寄存器s和源寄存器p對I有文法依賴性
  3. M攜帶從p到q的依賴關系

Instruction j has a syntactic dependency on instruction i via destination register s of i and source register r of j if either of the following hold:

• s is the same as r, and no instruction program-ordered between i and j has r as a destination register

• There is an instruction m program-ordered between i and j such that all of the following hold:

  1. j has a syntactic dependency on m via destination register q and source register r
  2. m has a syntactic dependency on i via destination register s and source register p
  3. m carries a dependency from p to q

最後,在下面的定義中,設a和b是兩個記憶體操作,設i和j分别是生成a和b的指令。b依賴文法位址注冊如果r是一個位址來源j和句法依賴 i 通過源寄存器r b有一個文法資料依賴如果b是一個存儲操作,r是一個資料源注冊j和j句法依賴 i 通過源寄存器r b有文法控制依賴如果之間有一條指令m program-ordered i和j, m是一個分支或間接跳轉和m有文法對 i 的依賴。

Finally, in the definitions that follow, let a and b be two memory operations, and let i and j be the instructions that generate a and b, respectively. b has a syntactic address dependency on a if r is an address source register for j and j has a syntactic dependency on i via source register r b has a syntactic data dependency on a if b is a store operation, r is a data source register for j, and j has a syntactic dependency on i via source register r b has a syntactic control dependency on a if there is an instruction m program-ordered between i and j such that m is a branch or indirect jump and m has a syntactic dependency on i.
一般來說,非amo load 指令沒有資料源寄存器,無條件的非amo store指令沒有目标寄存器。然而,一條成功的SC指令被認為具有rd中指定的寄存器作為目标寄存器,是以,一條指令可能在文法上依賴于在它之前的一個成功的SC指令。
Generally speaking, non-AMO load instructions do not have data source registers, and unconditional non-AMO store instructions do not have destination registers. However, a successful SC instruction is considered to have the register specified in rd as a destination register, and hence it is possible for an instruction to have a syntactic dependency on a successful SC instruction that precedes it in program order.

Preserved Program Order

任何給定程式執行的全局記憶體順序都尊重部分但不是全部的hart的程式順序。必須被全局記憶體順序尊重的程式順序子集稱為保留程式順序。

The global memory order for any given execution of a program respects some but not all of each hart’s program order. The subset of program order that must be respected by the global memory order is known as preserved program order.

儲存程式順序的完整定義如下(注意AMOs同時是加載和存儲):記憶體操作a在儲存程式順序中先于記憶體操作b(是以也在全局記憶體順序中),如果a在程式順序中先于b, a和b都通路正常的主記憶體(而不是I/O區域),并且以下任何一個都成立:

Overlapping-Address序:
  1. b是一個 store,a和b通路重疊的記憶體位址
  2. a 和 b 是 load,x 是 a 和 b 都讀的位元組,a和b之間沒有按程式順序 store 到x, a 和 b 傳回由不同記憶體操作寫入的x的值
  3. a是由AMO或SC指令生成的,b是一個load, b傳回由a寫入的值
明确同步:
  1. 有一個FENCE指令在b之前指令a
  2. A有一個acquire注釋
  3. B有一個釋出注釋
  4. a和b都有RCsc注釋
  5. A與b配對
文法的依賴性
  1. B對a有一個文法位址依賴性
  2. B對a有文法資料依賴關系
  3. B是一個store,B對a有一個文法控制依賴性
流水線依賴
  1. B是一個load,在a和b之間有一個存儲m,按照程式順序,m有一個位址或資料依賴于a, b傳回一個由m寫入的值
  2. b是一個load,在a和b之間有一個指令m,在程式順序中,m對a有位址依賴性
The complete definition of preserved program order is as follows (and note that AMOs are simultaneously both loads and stores): memory operation a precedes memory operation b in preserved program order (and hence also in the global memory order) if a precedes b in program order, a and b both access regular main memory (rather than I/O regions), and any of the following hold:
Overlapping-Address Orderings:
  1. b is a store, and a and b access overlapping memory addresses
  2. a and b are loads, x is a byte read by both a and b, there is no store to x between a and b in program order, and a and b return values for x written by different memory operations
  3. a is generated by an AMO or SC instruction, b is a load, and b returns a value written by a
Explicit Synchronization
  1. There is a FENCE instruction that orders a before b
  2. a has an acquire annotationfy
  3. b has a release annotation
  4. a and b both have RCsc annotations
  5. a is paired with b
Syntactic Dependencies
  1. b has a syntactic address dependency on a
  2. b has a syntactic data dependency on a
  3. b is a store, and b has a syntactic control dependency on a
Pipeline Dependencies
  1. b is a load, and there exists some store m between a and b in program order such that m has an address or data dependency on a, and b returns a value written by m
  2. b is a store, and there exists some instruction m between a and b in program order such that m has an address dependency on a

Memory Model Axioms

RISC-V程式的執行服從RVWMO記憶體一緻性模型,前提是存在一個符合保留程式順序的全局記憶體順序,并且滿足 load value 公理、原子性公理和進展公理。

An execution of a RISC-V program obeys the RVWMO memory consistency model only if there exists a global memory order conforming to preserved program order and satisfying the load value axiom, the atomicity axiom, and the progress axiom.

Load value 公理每個Load i的每個位元組傳回由以下存儲中全局記憶體順序最新的存儲寫入該位元組的值:

  1. 存儲在全局記憶體順序中寫入該位元組和i之前的位元組的存儲
  2. 存儲按程式順序寫入該位元組和i之前的位元組的資料
Load Value Axiom Each byte of each load i returns the value written to that byte by the store that is the latest in global memory order among the following stores:
  1. Stores that write that byte and that precede i in the global memory order
  2. Stores that write that byte and that precede i in program order

原子性公理(Atomicity Axiom)如果r和w是由hart h中對齊的LR和SC指令生成的配對加載和存儲操作,s是對位元組x的存儲,r傳回s寫入的值,那麼s必須在全局記憶體順序中位于w的前面,并且在全局記憶體順序中,除了h之外不能有從hart到位元組x的存儲。

Atomicity Axiom If r and w are paired load and store operations generated by aligned LR and SC instructions in a hart h, s is a store to byte x, and r returns a value written by s, then s must precede w in the global memory order, and there can be no store from a hart other than h to byte x following s and preceding w in the global memory order.
原子性公理理論上支援LR/SC對的不同寬度和不比對的位址,因為實作允許SC操作在這種情況下成功。然而,在實踐中,我們認為這樣的模式很少,并且不鼓勵使用它們。
The Atomicity Axiom theoretically supports LR/SC pairs of different widths and to mismatched addresses, since implementations are permitted to allow SC operations to succeed in such cases. However, in practice, we expect such patterns to be rare, and their use is discouraged.

進展公理:在全局記憶體順序中,任何記憶體操作都不能被無限序列的其他記憶體操作先于。

Progress Axiom No memory operation may be preceded in the global memory order by an infinite sequence of other memory operations.

後記

後面的都是表格居多,RVWMO 内容還是非常多的,目前的計劃是後續會将其分成幾部分,并且加上更多的 RFC。文中引用了相當多的外部資料,會給這些資料一些簡單的解釋。

繼續閱讀