天天看点

risc-v 文档翻译: RVWMO Memory ConsistencyModel

前言

多核处理器通过共享内存进行通信,如果内存操作乱序,相当于通信顺序乱序,而没有编写过多线程的程序员可能意识不到这一点,假设了通信是顺序执行的,这会对程序正确性产生影响。

这部分在非特权级 14 章部分,目前版本为 0.1,我们开始吧。

总览

本章定义了RISC-V内存一致性模型。内存一致性模型是一组规则,指定 load 内存返回的值。RISC-V使用一种称为“RVWMO”(RISC-V弱内存序)的内存模型,该模型旨在为架构师提供构建高性能可伸缩设计的灵活性,同时支持可处理的编程模型。

This chapter defines the RISC-V memory consistency model. A memory consistency model is a set of rules specifying the values that can be returned by loads of memory. RISC-V uses a memory model called “RVWMO” (RISC-V Weak Memory Ordering) which is designed to provide flexibility for architects to build high-performance scalable designs while simultaneously supporting a tractable programming model.

在RVWMO下,从同一 hart 中的其他内存指令的角度来看,在单个hart上运行的代码是按顺序执行的,但来自另一个hart的内存指令可能观察到来自第一个 hart 的内存指令以不同的顺序执行。因此,多线程代码可能需要显式同步来保证来自不同hart的内存指令之间的顺序。基本的RISC-V ISA为此提供了一个FENCE指令,如2.7节所述,而原子扩展“A”额外定义了负载保留/存储条件和原子读-修改-写指令

Under RVWMO, code running on a single hart appears to execute in order from the perspective of other memory instructions in the same hart, but memory instructions from another hart may observe the memory instructions from the first hart being executed in a different order. Therefore, multithreaded code may require explicit synchronization to guarantee ordering between memory instructions from different harts. The base RISC-V ISA provides a FENCE instruction for this purpose, described in Section 2.7, while the atomics extension “A” additionally defines load reserved/store-conditional and atomic read-modify-write instructions

用于非对齐原子访问的标准 ISA 扩展“Zam”(第22章)和用于 TSO 内存模型的标准ISA扩展“Ztso”(第23章)用特定的附加规则增强了 RVWMO。

The standard ISA extension for misaligned atomics “Zam” (Chapter 22) and the standard ISA extension for total store ordering “Ztso” (Chapter 23) augment RVWMO with additional rules specific to those extensions.

该规范的附录提供了内存一致性模型的公理和操作语义形式,以及额外的解释性材料。

The appendices to this specification provide both axiomatic and operational formalizations of the memory consistency model as well as additional explanatory material.

本章为常规的内存操作定义了内存模型。而内存模型与I/O内存、指令获取、FENCE.I,页表遍历(page table walks),和SFENCE.VMA的交互尚未正式形式化。上述部分或全部可能在本规范的未来修订中正式确定。RV128基础ISA和未来的ISA扩展(如“V”向量、“T”事务内存和“J”JIT扩展)也需要合并到未来的修订版中。

同时支持不同宽度的重叠内存访问的内存一致性模型仍然是学术研究的一个活跃领域,但尚未得到充分理解。关于不同大小的内存访问在RVWMO下如何交互的细节已经尽了我们目前的最大能力,但是如果发现了新的问题,它们可能会进行修订。

This chapter defines the memory model for regular main memory operations. The interaction of the memory model with I/O memory, instruction fetches, FENCE.I, page table walks, and SFENCE.VMA is not (yet) formalized. Some or all of the above may be formalized in a future revision of this specification. The RV128 base ISA and future ISA extensions such as the “V” vector, “T” transactional memory, and “J” JIT extensions will need to be incorporated into a future revision as well.
同时支持不同宽度的重叠内存访问的内存一致性模型仍然是学术研究的一个活跃领域,但尚未得到充分理解。关于不同大小的内存访问在RVWMO下如何交互的细节已经尽了我们目前的最大能力,但是如果发现了新的问题,它们可能会进行修订。
Memory consistency models supporting overlapping memory accesses of different widths simultaneously remain an active area of academic research and are not yet fully understood. The specifics of how memory accesses of different sizes interact under RVWMO are specified to the best of our current abilities, but they are subject to revision should new issues be uncovered.

14.1 RVWMO 的定义

RFC: 对于global memory order, 具体内容和《Time, Clocks, and the Ordering of Events in a Distributed System》这篇论文提到的 partial order 挺像, 应该指的是一个东西

RVWMO内存模型是根据 global memory order 定义的,即所有hart产生的内存操作的总顺序。一般来说,多线程程序有许多不同的可能执行,每个执行都有自己相应的全局内存顺序。全局内存顺序是在由内存指令生成的基本加载和存储操作上定义的。然后,它受制于本章其余部分定义的约束。任何满足所有内存模型约束的执行都是合法的执行(就内存模型而言)。

The RVWMO memory model is defined in terms of the global memory order, a total ordering of the memory operations produced by all harts. In general, a multithreaded program has many different possible executions, with each execution having its own corresponding global memory order. The global memory order is defined over the primitive load and store operations generated by memory instructions. It is then subject to the constraints defined in the rest of this chapter. Any execution satisfying all of the memory model constraints is a legal execution (as far as the memory model is concerned).

内存模型原语

内存操作的程序顺序反映了生成每次加载和存储的指令在该hart的动态指令流中的逻辑布局顺序;也就是说,一个简单的按顺序处理器执行该hart指令的顺序。内存访问指令产生内存操作。内存操作可以是加载操作、存储操作,或者同时是加载操作和存储操作。所有的内存操作都是单拷贝原子:它们永远不会在部分完成状态下被观察到。在RV32GC和RV64GC中的指令中,每个对齐的内存指令只产生一个内存操作,只有两个例外。首先,不成功的SC指令不会引起任何内存操作。第二,如果XLEN<64, FLD和FSD指令都可能引发多个内存操作,如12.3节所述并在下面澄清。对齐的AMO会产生一个内存操作,它同时是加载操作和存储操作。

The program order over memory operations reflects the order in which the instructions that generate each load and store are logically laid out in that hart’s dynamic instruction stream; i.e., the order in which a simple in-order processor would execute the instructions of that hart. Memory-accessing instructions give rise to memory operations. A memory operation can be either a load operation, a store operation, or both simultaneously. All memory operations are single-copy atomic: they can never be observed in a partially-complete state. Among instructions in RV32GC and RV64GC, each aligned memory instruction gives rise to exactly one memory operation, with two exceptions. First, an unsuccessful SC instruction does not give rise to any memory operations. Second, FLD and FSD instructions may each give rise to multiple memory operations if XLEN<64, as stated in Section 12.3 and clarified below. An aligned AMO gives rise to a single memory operation that is both a load operation and a store operation simultaneously.
RV128基本指令集和未来ISA扩展中的指令,如V (vector)和 P (SIMD)可能产生多个内存操作。然而,内存模型为这些扩展尚未正式确定。
Instructions in the RV128 base instruction set and in future ISA extensions such as V (vector) and P (SIMD) may give rise to multiple memory operations. However, the memory model for these extensions has not yet been formalized.

错误的加载或存储指令可以被分解成任意粒度的一组组件内存操作。XLEN<64的FLD或FSD指令也可以分解为任意粒度的一组组件内存操作。由这些指令产生的内存操作彼此之间不是按程序顺序排序的,但它们是按程序顺序按照前一个和后一个指令产生的内存操作正常排序的。原子扩展“A”根本不需要执行环境来支持错位的原子指令;然而,如果通过“Zam”扩展支持错位原子,则LRs、sc和AMOs可以根据错位原子的原子性公理进行分解,该公理在第22章中定义。

A misaligned load or store instruction may be decomposed into a set of component memory operations of any granularity. An FLD or FSD instruction for which XLEN<64 may also be decomposed into a set of component memory operations of any granularity. The memory operations generated by such instructions are not ordered with respect to each other in program order, but they are ordered normally with respect to the memory operations generated by preceding and subsequent instructions in program order. The atomics extension “A” does not require execution environments to support misaligned atomic instructions at all; however, if misaligned atomics are supported via the “Zam” extension, LRs, SCs, and AMOs may be decomposed subject to the constraints of the atomicity axiom for misaligned atomics, which is defined in Chapter 22.
将错误对齐的内存操作分解到字节粒度,有助于对不支持错误对齐访问的实现进行仿真。例如,这样的实现可能只是逐个遍历未对齐的访问的字节。
The decomposition of misaligned memory operations down to byte granularity facilitates emulation on implementations that do not natively support misaligned accesses. Such implementations might, for example, simply iterate over the bytes of a misaligned access one by one.

如果一个LR指令在程序顺序上在SC之前,并且在两者之间没有其他LR或SC指令,则称LR指令和SC指令成对;相应的内存操作也被认为是成对的(除非SC失败,没有生成存储操作)。决定SC是否必须成功、可能成功或必须失败的完整条件列表在第8.2节中定义。

An LR instruction and an SC instruction are said to be paired if the LR precedes the SC in program order and if there are no other LR or SC instructions in between; the corresponding memory operations are said to be paired as well (except in case of a failed SC, where no store operation is generated). The complete list of conditions determining whether an SC must succeed, may succeed, or must fail is defined in Section 8.2.

加载和存储操作还可以从以下集合中携带一个或多个排序注释:" acquire-RCpc “、” acquire-RCsc “、” release-RCpc “和” release-RCsc "。带有aq集的AMO或LR指令有一个“acquire-RCsc”注释。带有rl集的AMO或SC指令具有“release RCsc”注释。一个同时具有aq和rl集的AMO、LR或SC指令具有“获取- rcsc”和“释放- rcsc”注释。

Load and store operations may also carry one or more ordering annotations from the following set: “acquire-RCpc”, “acquire-RCsc”, “release-RCpc”, and “release-RCsc”. An AMO or LR instruction with aq set has an “acquire-RCsc” annotation. An AMO or SC instruction with rl set has a “release-RCsc” annotation. An AMO, LR, or SC instruction with both aq and rl set has both “acquire-RCsc” and “release-RCsc” annotations.

为了方便起见,我们使用术语“获取注释(acquire annotation)”来指代acquire-rcpc注释或acquire-RCsc注释。同样,“发布注释(release annotation)”指的是release-rcpc注释或release-rcsc注释。“RCpc注释”是指 acquire-RCpc注释或发布release-RCpc注释。“RCsc注释”是指acquire-RCsc注释或release-RCsc注释。

For convenience, we use the term “acquire annotation” to refer to an acquire-RCpc annotation or an acquire-RCsc annotation. Likewise, a “release annotation” refers to a release-RCpc annotation or a release-RCsc annotation. An “RCpc annotation” refers to an acquire-RCpc annotation or a releaseRCpc annotation. An “RCsc annotation” refers to an acquire-RCsc annotation or a release-RCsc annotation.
在内存模型文献中,术语“RCpc”表示与处理器一致同步操作的发布一致性,术语“RCsc”表示与顺序一致同步操作[5]的发布一致性。虽然文献中对获取和发布注释有许多不同的定义,但在RVWMO的背景下,这些术语由保留程序命令第5-7条定义,简洁而完整。“RCpc”注释目前仅在隐式分配给每个内存访问时使用,每个标准扩展“Ztso”(第23章)。此外,尽管ISA目前不包含本机负载获取或存储释放指令,也不包含RCpc的变体,但RVWMO模型本身被设计为向前兼容,可以在未来的扩展中将上述任何或所有指令添加到ISA中。
In the memory model literature, the term “RCpc” stands for release consistency with processor consistent synchronization operations, and the term “RCsc” stands for release consistency with sequentially-consistent synchronization operations [5]. While there are many different definitions for acquire and release annotations in the literature, in the context of RVWMO these terms are concisely and completely defined by Preserved Program Order rules 5–7. “RCpc” annotations are currently only used when implicitly assigned to every memory access per the standard extension “Ztso” (Chapter 23). Furthermore, although the ISA does not currently contain native load-acquire or store-release instructions, nor RCpc variants thereof, the RVWMO model itself is designed to be forwards-compatible with the potential addition of any or all of the above into the ISA in a future extension.

语法依赖

RVWMO内存模型的定义部分依赖于语法依赖的概念,定义如下。在定义依赖项的上下文中,“寄存器”指的是整个通用寄存器、CSR的一部分或整个CSR。通过CSR跟踪依赖关系的粒度特定于每个CSR,在第14.2节中定义。语法依赖是根据指令的源寄存器、指令的目标寄存器以及指令将依赖从源寄存器携带到目标寄存器的方式来定义的。本节提供所有这些术语的一般定义;但是,第14.3节提供了每条指令的详细细节的完整列表。一般来说,如果存在以下任何一种情况,除x0外的寄存器r是指令i的源寄存器:

  • 在i的操作码中,rs1、rs2或rs3被设置为r
  • i是一个CSR指令,在i的操作码中,CSR被设置为r,除非i是CSRRW或CSRRWI, rd被设置为x0
  • r是一个CSR和i的隐式源寄存器,如14.3节所定义
  • r是一个CSR,它与i的另一个源寄存器别名

The definition of the RVWMO memory model depends in part on the notion of a syntactic dependency, defined as follows. In the context of defining dependencies, a “register” refers either to an entire general-purpose register, some portion of a CSR, or an entire CSR. The granularity at which dependencies are tracked through CSRs is specific to each CSR and is defined in Section 14.2. Syntactic dependencies are defined in terms of instructions’ source registers, instructions’ destination registers, and the way instructions carry a dependency from their source registers to their destination registers. This section provides a general definition of all of these terms; however, Section 14.3 provides a complete listing of the specifics for each instruction. In general, a register r other than x0 is a source register for an instruction i if any of the following hold:

• In the opcode of i, rs1, rs2, or rs3 is set to r

• i is a CSR instruction, and in the opcode of i, csr is set to r, unless i is CSRRW or CSRRWI and rd is set to x0

• r is a CSR and an implicit source register for i, as defined in Section 14.3

• r is a CSR that aliases with another source register for i

内存指令还进一步指定哪些源寄存器是地址源寄存器,哪些是数据源寄存器。

Memory instructions also further specify which source registers are address

source registers and which are data source registers.

一般来说,如果存在以下任何一种情况,则除x0之外的寄存器r是指令i的目标寄存器:

  • 在i的操作码中,rd被设置为r
  • i是一个CSR指令,在i的操作码中,CSR被设置为r,除非i是CSRRS或CSRRC, rs1被设置为x0,或者i是CSRRSI或CSRRCI, uimm[4:0]被设置为0。
  • r是一个CSR和i的隐式目标寄存器,如14.3节所定义
  • r是一个CSR,它与i的另一个目标寄存器别名

In general, a register r other than x0 is a destination register for an instruction i if any of the following hold:

• In the opcode of i, rd is set to r

• i is a CSR instruction, and in the opcode of i, csr is set to r, unless i is CSRRS or CSRRC and rs1 is set to x0 or i is CSRRSI or CSRRCI and uimm[4:0] is set to zero.

• r is a CSR and an implicit destination register for i, as defined in Section 14.3

• r is a CSR that aliases with another destination register for i

大多数非内存指令从它们的每个源寄存器携带一个依赖项到它们的每个目标寄存器。然而,这条规则也有例外;参见14.3节

Most non-memory instructions carry a dependency from each of their source registers to each of their destination registers. However, there are exceptions to this rule; see Section 14.3

指令 j 通过 i 的目标寄存器 s 和 j 的源寄存器 r 对指令 i 有语法依赖,如果下列任何一种保出现:

  • s与r相同,在I和j之间排序的指令程序都没有r作为目标寄存器
  • 在i和j之间有一个指令m程序,它的顺序是:
  1. J通过目标寄存器q和源寄存器r对m有语法依赖性
  2. M通过目标寄存器s和源寄存器p对I有语法依赖性
  3. M携带从p到q的依赖关系

Instruction j has a syntactic dependency on instruction i via destination register s of i and source register r of j if either of the following hold:

• s is the same as r, and no instruction program-ordered between i and j has r as a destination register

• There is an instruction m program-ordered between i and j such that all of the following hold:

  1. j has a syntactic dependency on m via destination register q and source register r
  2. m has a syntactic dependency on i via destination register s and source register p
  3. m carries a dependency from p to q

最后,在下面的定义中,设a和b是两个内存操作,设i和j分别是生成a和b的指令。b依赖语法地址注册如果r是一个地址来源j和句法依赖 i 通过源寄存器r b有一个语法数据依赖如果b是一个存储操作,r是一个数据源注册j和j句法依赖 i 通过源寄存器r b有语法控制依赖如果之间有一条指令m program-ordered i和j, m是一个分支或间接跳转和m有语法对 i 的依赖。

Finally, in the definitions that follow, let a and b be two memory operations, and let i and j be the instructions that generate a and b, respectively. b has a syntactic address dependency on a if r is an address source register for j and j has a syntactic dependency on i via source register r b has a syntactic data dependency on a if b is a store operation, r is a data source register for j, and j has a syntactic dependency on i via source register r b has a syntactic control dependency on a if there is an instruction m program-ordered between i and j such that m is a branch or indirect jump and m has a syntactic dependency on i.
一般来说,非amo load 指令没有数据源寄存器,无条件的非amo store指令没有目标寄存器。然而,一条成功的SC指令被认为具有rd中指定的寄存器作为目标寄存器,因此,一条指令可能在语法上依赖于在它之前的一个成功的SC指令。
Generally speaking, non-AMO load instructions do not have data source registers, and unconditional non-AMO store instructions do not have destination registers. However, a successful SC instruction is considered to have the register specified in rd as a destination register, and hence it is possible for an instruction to have a syntactic dependency on a successful SC instruction that precedes it in program order.

Preserved Program Order

任何给定程序执行的全局内存顺序都尊重部分但不是全部的hart的程序顺序。必须被全局内存顺序尊重的程序顺序子集称为保留程序顺序。

The global memory order for any given execution of a program respects some but not all of each hart’s program order. The subset of program order that must be respected by the global memory order is known as preserved program order.

保存程序顺序的完整定义如下(注意AMOs同时是加载和存储):内存操作a在保存程序顺序中先于内存操作b(因此也在全局内存顺序中),如果a在程序顺序中先于b, a和b都访问常规的主内存(而不是I/O区域),并且以下任何一个都成立:

Overlapping-Address序:
  1. b是一个 store,a和b访问重叠的内存地址
  2. a 和 b 是 load,x 是 a 和 b 都读的字节,a和b之间没有按程序顺序 store 到x, a 和 b 返回由不同内存操作写入的x的值
  3. a是由AMO或SC指令生成的,b是一个load, b返回由a写入的值
明确同步:
  1. 有一个FENCE指令在b之前命令a
  2. A有一个acquire注释
  3. B有一个发布注释
  4. a和b都有RCsc注释
  5. A与b配对
语法的依赖性
  1. B对a有一个语法地址依赖性
  2. B对a有语法数据依赖关系
  3. B是一个store,B对a有一个语法控制依赖性
流水线依赖
  1. B是一个load,在a和b之间有一个存储m,按照程序顺序,m有一个地址或数据依赖于a, b返回一个由m写入的值
  2. b是一个load,在a和b之间有一个指令m,在程序顺序中,m对a有地址依赖性
The complete definition of preserved program order is as follows (and note that AMOs are simultaneously both loads and stores): memory operation a precedes memory operation b in preserved program order (and hence also in the global memory order) if a precedes b in program order, a and b both access regular main memory (rather than I/O regions), and any of the following hold:
Overlapping-Address Orderings:
  1. b is a store, and a and b access overlapping memory addresses
  2. a and b are loads, x is a byte read by both a and b, there is no store to x between a and b in program order, and a and b return values for x written by different memory operations
  3. a is generated by an AMO or SC instruction, b is a load, and b returns a value written by a
Explicit Synchronization
  1. There is a FENCE instruction that orders a before b
  2. a has an acquire annotationfy
  3. b has a release annotation
  4. a and b both have RCsc annotations
  5. a is paired with b
Syntactic Dependencies
  1. b has a syntactic address dependency on a
  2. b has a syntactic data dependency on a
  3. b is a store, and b has a syntactic control dependency on a
Pipeline Dependencies
  1. b is a load, and there exists some store m between a and b in program order such that m has an address or data dependency on a, and b returns a value written by m
  2. b is a store, and there exists some instruction m between a and b in program order such that m has an address dependency on a

Memory Model Axioms

RISC-V程序的执行服从RVWMO内存一致性模型,前提是存在一个符合保留程序顺序的全局内存顺序,并且满足 load value 公理、原子性公理和进展公理。

An execution of a RISC-V program obeys the RVWMO memory consistency model only if there exists a global memory order conforming to preserved program order and satisfying the load value axiom, the atomicity axiom, and the progress axiom.

Load value 公理每个Load i的每个字节返回由以下存储中全局内存顺序最新的存储写入该字节的值:

  1. 存储在全局内存顺序中写入该字节和i之前的字节的存储
  2. 存储按程序顺序写入该字节和i之前的字节的数据
Load Value Axiom Each byte of each load i returns the value written to that byte by the store that is the latest in global memory order among the following stores:
  1. Stores that write that byte and that precede i in the global memory order
  2. Stores that write that byte and that precede i in program order

原子性公理(Atomicity Axiom)如果r和w是由hart h中对齐的LR和SC指令生成的配对加载和存储操作,s是对字节x的存储,r返回s写入的值,那么s必须在全局内存顺序中位于w的前面,并且在全局内存顺序中,除了h之外不能有从hart到字节x的存储。

Atomicity Axiom If r and w are paired load and store operations generated by aligned LR and SC instructions in a hart h, s is a store to byte x, and r returns a value written by s, then s must precede w in the global memory order, and there can be no store from a hart other than h to byte x following s and preceding w in the global memory order.
原子性公理理论上支持LR/SC对的不同宽度和不匹配的地址,因为实现允许SC操作在这种情况下成功。然而,在实践中,我们认为这样的模式很少,并且不鼓励使用它们。
The Atomicity Axiom theoretically supports LR/SC pairs of different widths and to mismatched addresses, since implementations are permitted to allow SC operations to succeed in such cases. However, in practice, we expect such patterns to be rare, and their use is discouraged.

进展公理:在全局内存顺序中,任何内存操作都不能被无限序列的其他内存操作先于。

Progress Axiom No memory operation may be preceded in the global memory order by an infinite sequence of other memory operations.

后记

后面的都是表格居多,RVWMO 内容还是非常多的,目前的计划是后续会将其分成几部分,并且加上更多的 RFC。文中引用了相当多的外部资料,会给这些资料一些简单的解释。

继续阅读