Ethereum EVM Source Code Analysis: Data Structures

Overall Structure of the EVM Code

Directory layout of the EVM-related source code:

~/go-ethereum-master/core/vm# tree
.
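├── analysis.go                     // analyzes contract bytecode and marks valid jump destinations (JUMPDEST)
├── analysis_test.go
├── common.go                       // shared helper functions
├── contract.go                     // smart contract data structure
├── contracts.go                    // precompiled contracts
├── contracts_test.go
├── doc.go
├── eips.go                         // implementations of some EIPs
├── errors.go                       // execution-time errors
├── evm.go                          // the executor; exposes the external interface
├── gas.go                          // gas cost calculation for calls; gas tiers for basic instructions
├── gas_table.go                    // gas calculation functions for individual instructions
├── gas_table_test.go
├── gen_structlog.go
├── instructions.go                 // execution functions for the instructions
├── instructions_test.go
├── interface.go                    // the StateDB interface and CallContext, the basic EVM calling-convention interface
├── interpreter.go                  // the interpreter; the core of execution
├── intpool.go                      // pool of ints used to speed up big.Int allocation (helper object)
├── intpool_test.go
├── int_pool_verifier_empty.go
├── int_pool_verifier.go
├── jump_table.go                   // table mapping each instruction to its operation (execution, cost, validation)
├── logger.go                       // logger and Tracer (helper objects)
├── logger_json.go
├── logger_test.go
├── memory.go                       // memory model and its access functions
├── memory_table.go                 // EVM memory table: computes the memory size an instruction needs
├── opcodes.go                      // the EVM instruction set
├── stack.go                        // the stack and its methods
└── stack_table.go                  // stack helper functions such as minStack and maxStack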

~/go-ethereum-master/core# tree
.
├── evm.go                          // functions used by the EVM
├── gaspool.go                      // GasPool tracks the gas available in a block while its transactions execute
├── state_processor.go              // processes state transitions
├── state_transition.go             // state transition model
└── types                           // core data structures
    ├── block.go                    // block, blockHeader
    ├── log.go                      // log
    ├── receipt.go                  // receipt
    ├── transaction.go              // transaction, message
    └── transaction_signing.go

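[Figure: overall structure of the EVM module]

The figure above, found online, shows the overall structure of the EVM module; parts of it are out of date. As of 2020-02-25 there are seven versions of the EVM instruction set, and the fields of operation have also changed somewhat.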

core/vm/jump_table.go

// Instruction sets: seven are defined below, one for each of seven Ethereum versions.
var (
	frontierInstructionSet         = newFrontierInstructionSet()
	homesteadInstructionSet        = newHomesteadInstructionSet()
	tangerineWhistleInstructionSet = newTangerineWhistleInstructionSet()
	spuriousDragonInstructionSet   = newSpuriousDragonInstructionSet()
	byzantiumInstructionSet        = newByzantiumInstructionSet()
	constantinopleInstructionSet   = newConstantinopleInstructionSet()
	istanbulInstructionSet         = newIstanbulInstructionSet()
)

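The EVM's code structure is simpler than one might expect. The core objects involved are the interpreter (Interpreter), the interpreter's configuration options (Config), the execution context that supplies the EVM with auxiliary information (Context), and the EVM database used for full state queries (StateDB).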

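As the figure shows, the EVM runs smart contracts through the interpreter, and the interpreter depends on the core structure in Config: JumpTable [256]operation. JumpTable is indexed by opcode; the operation object at JumpTable[opCode] stores the instruction's execution logic, its gas calculation function, its stack validation bounds, the memory size it needs, and a few flags.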
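To make the indexing concrete, here is a minimal sketch of an opcode-indexed table. The operation type below is a stripped-down stand-in with hypothetical fields (name, valid), and lookup is a helper invented for this example; only the [256]operation shape and the idea of indexing by opcode come from the text above.

```go
package main

import "fmt"

// OpCode is a single byte, so a 256-entry array covers every possible value.
type OpCode byte

const (
	STOP OpCode = 0x00
	ADD  OpCode = 0x01
)

// operation is a simplified stand-in for the real struct (hypothetical fields).
type operation struct {
	name  string
	valid bool
}

// JumpTable maps every opcode to its operation; unset slots stay invalid.
type JumpTable [256]operation

// lookup returns the operation for op and whether that opcode is defined.
func lookup(jt *JumpTable, op OpCode) (operation, bool) {
	o := jt[op]
	return o, o.valid
}

func main() {
	var jt JumpTable
	jt[STOP] = operation{name: "STOP", valid: true}
	jt[ADD] = operation{name: "ADD", valid: true}

	for _, op := range []OpCode{ADD, 0xef} {
		if o, ok := lookup(&jt, op); ok {
			fmt.Printf("0x%02x -> %s\n", byte(op), o.name)
		} else {
			fmt.Printf("0x%02x -> undefined opcode\n", byte(op))
		}
	}
}
```

Because the table is a fixed-size array, a lookup is a plain index operation with no bounds failure possible for a byte-sized opcode, which is presumably why an array is used rather than a map.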

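Different Ethereum versions correspond to different JumpTables. Only the initializer for frontierInstructionSet builds the operation objects for the basic instructions; every later version patches the one before it. It first generates the previous version's instruction set, then applies some EIPs, adding its own instructions or modifying existing ones. For example, the latest version, Istanbul: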

core/vm/jump_table.go

// newIstanbulInstructionSet returns the frontier, homestead
// byzantium, contantinople and petersburg instructions.
// Initialize the previous version's (Constantinople) instruction set first, then apply some EIPs.
func newIstanbulInstructionSet() JumpTable {
	instructionSet := newConstantinopleInstructionSet()

	enable1344(&instructionSet) // ChainID opcode - https://eips.ethereum.org/EIPS/eip-1344
	enable1884(&instructionSet) // Reprice reader opcodes - https://eips.ethereum.org/EIPS/eip-1884
	enable2200(&instructionSet) // Net metered SSTORE - https://eips.ethereum.org/EIPS/eip-2200

	return instructionSet
}
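The layering pattern above can be sketched in isolation. This is a simplified illustration, not the go-ethereum code: the two-field operation struct, the function names newBaseSet, enableNewOpcode, reprice and newNextSet, and the gas numbers are all assumptions chosen for the example.

```go
package main

import "fmt"

// operation is a reduced stand-in for the real struct.
type operation struct {
	constantGas uint64
	valid       bool
}

type JumpTable [256]operation

const (
	opBalance byte = 0x31 // BALANCE
	opChainID byte = 0x46 // CHAINID
)

// newBaseSet stands in for the previous fork's instruction set.
func newBaseSet() JumpTable {
	var jt JumpTable
	jt[opBalance] = operation{constantGas: 400, valid: true} // illustrative cost
	return jt
}

// enableNewOpcode adds an instruction the previous fork did not have.
func enableNewOpcode(jt *JumpTable) {
	jt[opChainID] = operation{constantGas: 2, valid: true} // illustrative cost
}

// reprice changes the cost of an existing instruction in place.
func reprice(jt *JumpTable) {
	jt[opBalance].constantGas = 700 // illustrative cost
}

// newNextSet builds the next fork's set from the previous one.
func newNextSet() JumpTable {
	jt := newBaseSet() // arrays are values in Go, so this is an independent copy
	enableNewOpcode(&jt)
	reprice(&jt)
	return jt
}

func main() {
	base, next := newBaseSet(), newNextSet()
	fmt.Println("base: CHAINID valid =", base[opChainID].valid, "BALANCE gas =", base[opBalance].constantGas)
	fmt.Println("next: CHAINID valid =", next[opChainID].valid, "BALANCE gas =", next[opBalance].constantGas)
}
```

Since each constructor returns the array by value, patching the newer table never affects the older one, so every fork keeps its own independent instruction set.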

Contract

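The EVM is the runtime environment for smart contracts, so it is worth looking at the structure of a contract and its more important methods.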

core/vm/contract.go

// ContractRef is a reference to the contract's backing object
type ContractRef interface {
	Address() common.Address // returns the contract address
}

// Contract represents an ethereum contract in the state database. It contains
// the contract code, calling arguments. Contract implements ContractRef
type Contract struct {
	// CallerAddress is the result of the caller which initialised this
	// contract. However when the "call method" is delegated this value
	// needs to be initialised to that of the caller's caller.
	CallerAddress common.Address
	caller        ContractRef // the caller
	self          ContractRef // the contract itself

	// Aggregated result of JUMPDEST analysis. In practice this records, for each
	// byte of contract bytecode, whether it is an instruction or plain data;
	// only an instruction byte can serve as a jump destination.
	jumpdests map[common.Hash]bitvec // Aggregated result of JUMPDEST analysis.
	// JUMPDEST analysis for this contract's own code, cached locally and
	// not stored in the caller's context.
	analysis bitvec // Locally cached result of JUMPDEST analysis

	Code     []byte          // code
	CodeHash common.Hash     // code hash
	CodeAddr *common.Address // code address
	Input    []byte          // input arguments to the contract

	Gas   uint64   // amount of gas
	value *big.Int // value carried with the call, e.g. the amount transferred by the transaction
}

Constructor

core/vm/contract.go

// NewContract returns a new contract environment for the execution of EVM.
func NewContract(caller ContractRef, object ContractRef, value *big.Int, gas uint64) *Contract {
	// Initialize the Contract object.
	c := &Contract{CallerAddress: caller.Address(), caller: caller, self: object}

	// Type-assert the ContractRef interface to the concrete *Contract type.
	// ok is true when the caller is itself a Contract; otherwise the caller
	// is some other implementation of the interface.
	if parent, ok := caller.(*Contract); ok {
		// Reuse JUMPDEST analysis from parent context if available.
		c.jumpdests = parent.jumpdests
	} else {
		// Initialize a fresh jumpdests map.
		c.jumpdests = make(map[common.Hash]bitvec)
	}

	// Gas should be a pointer so it can safely be reduced through the run
	// This pointer will be off the state transition
	c.Gas = gas
	// ensures a value is set
	c.value = value

	return c // return a pointer to the contract
}
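The reuse of the parent's jumpdests relies on Go maps being reference types: assigning the map copies a handle, not the contents. The following sketch isolates that behaviour. The frame type, newFrame, and the string keys are hypothetical simplifications of Contract, NewContract, and common.Hash.

```go
package main

import "fmt"

// bitvec stands in for the analysis result type.
type bitvec []byte

// frame is a hypothetical, reduced Contract that keeps only the analysis cache.
type frame struct {
	jumpdests map[string]bitvec
}

// newFrame mirrors the branch in NewContract: reuse the parent's map when
// there is a parent frame, otherwise start with a fresh map.
func newFrame(parent *frame) *frame {
	if parent != nil {
		return &frame{jumpdests: parent.jumpdests}
	}
	return &frame{jumpdests: make(map[string]bitvec)}
}

func main() {
	root := newFrame(nil)
	child := newFrame(root)

	// An analysis stored by the child is visible through the root's map,
	// because both fields refer to the same underlying map.
	child.jumpdests["codehash-A"] = bitvec{0x80}
	_, seen := root.jumpdests["codehash-A"]
	fmt.Println("root sees child's analysis:", seen)
}
```

The effect is that every contract frame in one call chain shares a single cache, so code that is called repeatedly only needs to be analyzed once per chain.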

Methods

core/vm/contract.go

// Methods on Contract.
// validJumpdest reports whether dest is a valid jump destination.
func (c *Contract) validJumpdest(dest *big.Int) bool {
	udest := dest.Uint64() // convert the destination to uint64
	// PC cannot go beyond len(code) and certainly can't be bigger than 63bits.
	// Don't bother checking for JUMPDEST in that case.
	if dest.BitLen() >= 63 || udest >= uint64(len(c.Code)) {
		return false
	}
	// Only JUMPDESTs allowed for destinations
	if OpCode(c.Code[udest]) != JUMPDEST {
		return false
	}

	// The code below checks that the byte at the destination is an
	// instruction rather than plain data.

	// Do we have a contract hash already?
	if c.CodeHash != (common.Hash{}) { // not the empty hash
		// Does parent context have the analysis?
		// Recall that c.jumpdests = parent.jumpdests. Maps are reference types
		// in Go, so c.jumpdests and parent.jumpdests refer to the same
		// underlying structure.
		analysis, exist := c.jumpdests[c.CodeHash] // look up the entry
		if !exist {
			// Do the analysis and save in parent context
			// We do not need to store it in c.analysis
			analysis = codeBitmap(c.Code)
			// Because maps are reference types, writing to c.jumpdests also
			// updates parent.jumpdests; each entry is keyed by its code hash.
			c.jumpdests[c.CodeHash] = analysis
		}
		// Check that the destination lies in a code segment and return the result.
		return analysis.codeSegment(udest)
	}
	// We don't have the code hash, most likely a piece of initcode not already
	// in state trie. In that case, we do an analysis, and save it locally, so
	// we don't have to recalculate it for every JUMP instruction in the execution
	// However, we don't save it within the parent context
	// This usually happens during contract creation, before the new contract
	// has been written to the state database.
	if c.analysis == nil {
		c.analysis = codeBitmap(c.Code)
	}
	return c.analysis.codeSegment(udest)
}

// AsDelegate sets the contract to be a delegate call and returns the current
// contract (for chaining calls)
func (c *Contract) AsDelegate() *Contract {
	// NOTE: caller must, at all times be a contract. It should never happen
	// that caller is something other than a Contract.
	parent := c.caller.(*Contract)         // the caller's Contract object
	c.CallerAddress = parent.CallerAddress // use the address of the caller's caller
	c.value = parent.value                 // likewise for the value

	return c // return the current contract
}
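Why validJumpdest needs an analysis on top of the opcode check can be shown with a small sketch: the byte 0x5b (JUMPDEST) may also appear inside the immediate data of a PUSH instruction, where it is not an instruction at all. The code below is a simplified model under stated assumptions: it uses a []bool instead of the packed bitvec, recomputes the map on every call instead of caching it, and the names dataMap and validJumpdest (as a free function) are chosen for this example.

```go
package main

import "fmt"

const (
	opJumpdest byte = 0x5b
	opPush1    byte = 0x60
	opPush32   byte = 0x7f
)

// dataMap marks every byte that is PUSH immediate data rather than an
// instruction. PUSHn is followed by n data bytes, which are skipped.
func dataMap(code []byte) []bool {
	isData := make([]bool, len(code))
	for pc := 0; pc < len(code); {
		op := code[pc]
		pc++
		if op >= opPush1 && op <= opPush32 {
			n := int(op-opPush1) + 1
			for i := 0; i < n && pc < len(code); i++ {
				isData[pc] = true
				pc++
			}
		}
	}
	return isData
}

// validJumpdest accepts dest only if it is in range, holds the JUMPDEST
// opcode, and is an instruction byte rather than PUSH data.
func validJumpdest(code []byte, dest uint64) bool {
	if dest >= uint64(len(code)) {
		return false
	}
	if code[dest] != opJumpdest {
		return false
	}
	return !dataMap(code)[dest]
}

func main() {
	// PUSH1 0x5b; JUMPDEST; STOP
	code := []byte{0x60, 0x5b, 0x5b, 0x00}
	fmt.Println("offset 1 (0x5b inside PUSH1 data):", validJumpdest(code, 1))
	fmt.Println("offset 2 (real JUMPDEST):", validJumpdest(code, 2))
}
```

Offset 1 is rejected even though the byte there equals the JUMPDEST opcode, which is the case the bitmap analysis exists to catch.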

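How Contract Is Used in the EVM

A Transaction is converted into a Message and passed to the EVM. When EVM.Call or EVM.Create is invoked, the Message is converted into a Contract object for subsequent execution. The conversion is shown in the figure: the contract code is fetched from the corresponding address in the state database and then loaded into the Contract object.

[Figure: converting a Transaction into a Message and then into a Contract]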

EVM

core/vm/evm.go

// EVM is the Ethereum Virtual Machine base object and provides
// the necessary tools to run a contract on the given state with
// the provided context. It should be noted that any error
// generated through any of the calls should be considered a
// revert-state-and-consume-all-gas operation, no checks on
// specific errors should ever be performed. The interpreter makes
// sure that any errors generated are to be considered faulty code.
//
// The EVM should never be reused and is not thread safe.
type EVM struct {
	// Context provides auxiliary blockchain related information
	// (functions and data for accessing the current chain and mining environment)
	Context
	// StateDB gives access to the underlying state
	StateDB StateDB
	// Depth is the current call stack
	depth int

	// chainConfig contains information about the current chain
	chainConfig *params.ChainConfig
	// chain rules contains the chain rules for the current epoch
	chainRules params.Rules
	// virtual machine configuration options used to initialise the
	// evm.
	vmConfig Config
	// global (to this context) ethereum virtual machine
	// used throughout the execution of the tx.
	interpreters []Interpreter
	interpreter  Interpreter
	// abort is used to abort the EVM calling operations
	// NOTE: must be set atomically
	abort int32
	// callGasTemp holds the gas available for the current call. This is needed because the
	// available gas is calculated in gasCall* according to the 63/64 rule and later
	// applied in opCall*.
	//
	// In other words, this is the gas the child contract may use: what is left
	// for executing it after the parent has paid its miscellaneous costs such
	// as memory expansion.
	callGasTemp uint64
}
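The 63/64 rule mentioned in the callGasTemp comment can be illustrated with a short calculation. This is a hedged sketch, not the go-ethereum function: forwardedGas is a hypothetical name, it works on plain uint64 values, and returning 0 when the base cost exceeds the available gas is a simplification made for this example.

```go
package main

import "fmt"

// forwardedGas caps the gas passed to a child call. After the call's own
// base cost is deducted, at most 63/64 of the remaining gas may be forwarded;
// a smaller explicit request is honoured as is.
func forwardedGas(availableGas, base, requested uint64) uint64 {
	if availableGas < base {
		return 0 // simplification for this sketch
	}
	remaining := availableGas - base
	limit := remaining - remaining/64 // all but one 64th
	if requested > limit {
		return limit
	}
	return requested
}

func main() {
	// 7100 gas available, 700 spent on the call itself, 6400 remaining.
	fmt.Println(forwardedGas(7100, 700, 10000)) // request exceeds the cap
	fmt.Println(forwardedGas(7100, 700, 1000))  // request fits under the cap
}
```

With 6400 gas remaining, one 64th is 100, so the first call is capped at 6300 while the second receives the 1000 it asked for. The retained fraction guarantees the caller always has some gas left to finish after the child returns.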

Constructor

core/vm/evm.go

// NewEVM returns a new EVM. The returned EVM is not thread safe and should
// only ever be used *once*.
func NewEVM(ctx Context, statedb StateDB, chainConfig *params.ChainConfig, vmConfig Config) *EVM {
	// Initialize the fields.
	evm := &EVM{
		Context:      ctx,
		StateDB:      statedb,
		vmConfig:     vmConfig,
		chainConfig:  chainConfig,
		chainRules:   chainConfig.Rules(ctx.BlockNumber),
		interpreters: make([]Interpreter, 0, 1), // length 0, capacity 1
	}

	if chainConfig.IsEWASM(ctx.BlockNumber) {
		// to be implemented by EVM-C and Wagon PRs.
		// The commented-out code would add further interpreters to the list,
		// either an EVMVCInterpreter or an EWASMInterpreter.
		// if vmConfig.EWASMInterpreter != "" {
		//  extIntOpts := strings.Split(vmConfig.EWASMInterpreter, ":")
		//  path := extIntOpts[0]
		//  options := []string{}
		//  if len(extIntOpts) > 1 {
		//    options = extIntOpts[1..]
		//  }
		//  evm.interpreters = append(evm.interpreters, NewEVMVCInterpreter(evm, vmConfig, options))
		// } else {
		// 	evm.interpreters = append(evm.interpreters, NewEWASMInterpreter(evm, vmConfig))
		// }
		panic("No supported ewasm interpreter yet.")
	}

	// vmConfig.EVMInterpreter will be used by EVM-C, it won't be checked here
	// as we always want to have the built-in EVM as the failover option.
	evm.interpreters = append(evm.interpreters, NewEVMInterpreter(evm, vmConfig))
	// So far only one interpreter has been added. The current source contains
	// a single interpreter implementation, EVMInterpreter.
	evm.interpreter = evm.interpreters[0]

	return evm
}

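This function creates a new virtual machine object, initializes the EVM fields, then calls NewEVMInterpreter to create the interpreter object and adds it to the list. There is currently only one interpreter implementation, EVMInterpreter; the commented-out code outlines how the next-generation ewasm interpreter would be added. Note that the vmConfig argument does not have vmConfig.JumpTable populated at this point; that table is filled in inside NewEVMInterpreter.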

Context

core/vm/evm.go

// Context provides the EVM with auxiliary information. Once provided
// it shouldn't be modified.
type Context struct {
	// CanTransfer returns whether the account contains
	// sufficient ether to transfer the value
	CanTransfer CanTransferFunc
	// Transfer transfers ether from one account to the other
	Transfer TransferFunc
	// GetHash returns the hash corresponding to n
	// (the hash of block number n in the chain)
	GetHash GetHashFunc

	// Message information
	// Origin is the address of the transaction sender.
	Origin common.Address // Provides information for ORIGIN
	// Gas price.
	GasPrice *big.Int // Provides information for GASPRICE

	// Block information

	// Beneficiary, usually the miner's address.
	Coinbase common.Address // Provides information for COINBASE
	// Gas limit of the block; miners can vote to adjust it.
	GasLimit uint64 // Provides information for GASLIMIT
	// Block number.
	BlockNumber *big.Int // Provides information for NUMBER
	// Block timestamp.
	Time *big.Int // Provides information for TIME
	// Difficulty of the puzzle currently being mined.
	Difficulty *big.Int // Provides information for DIFFICULTY
}

Constructor

NewEVMContext determines the beneficiary of the block that packages the transaction, then fills in each field.

core/evm.go

// NewEVMContext creates a new context for use in the EVM.
func NewEVMContext(msg Message, header *types.Header, chain ChainContext, author *common.Address) vm.Context {
	// If we don't have an explicit author (i.e. not mining), extract from the header
	var beneficiary common.Address
	if author == nil {
		beneficiary, _ = chain.Engine().Author(header) // Ignore error, we're past header validation
	} else {
		beneficiary = *author
	}
	return vm.Context{
		CanTransfer: CanTransfer,
		Transfer:    Transfer,
		GetHash:     GetHashFn(header, chain),
		Origin:      msg.From(),
		Coinbase:    beneficiary,
		BlockNumber: new(big.Int).Set(header.Number),
		Time:        new(big.Int).SetUint64(header.Time),
		Difficulty:  new(big.Int).Set(header.Difficulty),
		GasLimit:    header.GasLimit,
		GasPrice:    new(big.Int).Set(msg.GasPrice()),
	}
}

StateDB

core/vm/interface.go

// StateDB is an EVM database for full state querying.
type StateDB interface {
	CreateAccount(common.Address)

	SubBalance(common.Address, *big.Int)
	AddBalance(common.Address, *big.Int)
	GetBalance(common.Address) *big.Int

	GetNonce(common.Address) uint64
	SetNonce(common.Address, uint64)

	GetCodeHash(common.Address) common.Hash
	GetCode(common.Address) []byte
	SetCode(common.Address, []byte)
	GetCodeSize(common.Address) int

	AddRefund(uint64)
	SubRefund(uint64)
	GetRefund() uint64

	GetCommittedState(common.Address, common.Hash) common.Hash
	GetState(common.Address, common.Hash) common.Hash
	SetState(common.Address, common.Hash, common.Hash)

	Suicide(common.Address) bool
	HasSuicided(common.Address) bool

	// Exist reports whether the given account exists in state.
	// Notably this should also return true for suicided accounts.
	Exist(common.Address) bool
	// Empty returns whether the given account is empty. Empty
	// is defined according to EIP161 (balance = nonce = code = 0).
	Empty(common.Address) bool

	RevertToSnapshot(int)
	Snapshot() int

	AddLog(*types.Log)
	AddPreimage(common.Hash, []byte)

	ForEachStorage(common.Address, func(common.Hash, common.Hash) bool) error
}

core/state/statedb.go

// StateDBs within the ethereum protocol are used to store anything
// within the merkle trie. StateDBs take care of caching and storing
// nested states. It's the general query interface to retrieve:
// * Contracts
// * Accounts
type StateDB struct {
	db   Database // backing database
	trie Trie     // main account trie

	// This map holds 'live' objects, which will get modified while processing a state transition.
	stateObjects map[common.Address]*stateObject
	// State objects finalized but not yet written to the trie
	stateObjectsPending map[common.Address]struct{}
	// State objects modified in the current execution
	stateObjectsDirty map[common.Address]struct{}

	// DB error.
	// State objects are used by the consensus core and VM which are
	// unable to deal with database-level errors. Any error that occurs
	// during a database read is memoized here and will eventually be returned
	// by StateDB.Commit.
	dbErr error

	// The refund counter, also used by state transitioning.
	refund uint64

	thash, bhash common.Hash                  // hashes of the current transaction and block
	txIndex      int                          // index of the current transaction
	logs         map[common.Hash][]*types.Log // logs, keyed by transaction hash
	logSize      uint                         // log size

	preimages map[common.Hash][]byte // SHA3 preimages: maps each hash computed by the EVM to its original bytes

	// Journal of state modifications. This is the backbone of
	// Snapshot and RevertToSnapshot.
	journal        *journal
	validRevisions []revision
	nextRevisionId int

	// Measurements gathered during execution for debugging purposes
	AccountReads   time.Duration
	AccountHashes  time.Duration
	AccountUpdates time.Duration
	AccountCommits time.Duration
	StorageReads   time.Duration
	StorageHashes  time.Duration
	StorageUpdates time.Duration
	StorageCommits time.Duration
}
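The comment on journal says it is the backbone of Snapshot and RevertToSnapshot. The sketch below shows the general idea of journal-based rollback on a toy key-value state. It is an illustration under assumptions, not the real implementation: the state type, its methods, and the use of undo closures are invented for this example, and the snapshot id here is simply the journal length rather than a separate revision id.

```go
package main

import "fmt"

// state is a toy balance table with an undo journal (hypothetical type).
type state struct {
	balances map[string]uint64
	journal  []func() // undo closures, newest last
}

func newState() *state {
	return &state{balances: make(map[string]uint64)}
}

// setBalance records how to undo the change before applying it.
func (s *state) setBalance(addr string, v uint64) {
	prev, existed := s.balances[addr]
	s.journal = append(s.journal, func() {
		if existed {
			s.balances[addr] = prev
		} else {
			delete(s.balances, addr)
		}
	})
	s.balances[addr] = v
}

// snapshot returns an identifier for the current point in the journal.
func (s *state) snapshot() int {
	return len(s.journal)
}

// revertToSnapshot undoes journal entries in reverse order back to id.
func (s *state) revertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		s.journal[i]()
	}
	s.journal = s.journal[:id]
}

func main() {
	s := newState()
	s.setBalance("alice", 10)

	id := s.snapshot()
	s.setBalance("alice", 3)
	s.setBalance("bob", 7)
	s.revertToSnapshot(id)

	_, bobExists := s.balances["bob"]
	fmt.Println("alice:", s.balances["alice"], "bob exists:", bobExists)
}
```

Recording an undo step for every modification makes a revert proportional to the number of changes since the snapshot, with no need to copy the whole state when a snapshot is taken.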

Constructor

core/state/statedb.go

// New is the constructor for StateDB.
// Typical usage: statedb, _ := state.New(common.Hash{}, state.NewDatabase(db))

// Create a new state from a given trie.
func New(root common.Hash, db Database) (*StateDB, error) {
	tr, err := db.OpenTrie(root)
	if err != nil {
		return nil, err
	}
	return &StateDB{
		db:                  db,
		trie:                tr,
		stateObjects:        make(map[common.Address]*stateObject),
		stateObjectsPending: make(map[common.Address]struct{}),
		stateObjectsDirty:   make(map[common.Address]struct{}),
		logs:                make(map[common.Hash][]*types.Log),
		preimages:           make(map[common.Hash][]byte),
		journal:             newJournal(),
	}, nil
}

Config

core/vm/interpreter.go

// Config are the configuration options for the Interpreter
type Config struct {
	Debug                   bool   // Enables debugging
	Tracer                  Tracer // Opcode logger
	NoRecursion             bool   // Disables call, callcode, delegate call and create
	EnablePreimageRecording bool   // Enables recording of SHA3/keccak preimages

	// Each time the interpreter fetches a new instruction to execute, it reads
	// the instruction's information, the operation object, from JumpTable.
	JumpTable [256]operation // EVM instruction table, automatically populated if unset

	EWASMInterpreter string // External EWASM interpreter options
	EVMInterpreter   string // External EVM interpreter options

	ExtraEips []int // Additional EIPS that are to be enabled
}

core/vm/jump_table.go

// operation holds the functions an instruction needs; one operation corresponds
// to one instruction. It stores the instruction's execution logic, gas cost,
// stack validation bounds, required memory size, and so on.
type operation struct {
	// execute is the operation function
	execute     executionFunc
	constantGas uint64  // fixed gas cost
	dynamicGas  gasFunc // function computing the instruction's dynamic gas cost
	// minStack tells how many stack items are required
	minStack int
	// maxStack specifies the max length the stack can have for this operation
	// to not overflow the stack.
	// In other words, as long as the stack is no longer than this, the
	// operation cannot overflow it.
	maxStack int

	// memorySize returns the memory size required for the operation
	memorySize memorySizeFunc

	// Whether the interpreter stops once the instruction has executed.
	halts bool // indicates whether the operation should halt further execution
	// For a jump instruction the pc is not incremented; it is set directly to the jump target.
	jumps bool // indicates whether the program counter should not increment
	// Whether this is a write instruction, i.e. one that modifies data in the StateDB.
	writes bool // determines whether this a state modifying operation
	// Whether this is a valid opcode.
	valid bool // indication whether the retrieved operation is valid and known
	// Whether execution is interrupted and the state database rolled back after the instruction.
	reverts bool // determines whether the operation reverts state (implicitly halts)
	// Whether the operation produces return data.
	returns bool // determines whether the operations sets the return data content
}

Interpreter

core/vm/interpreter.go

// Interpreter is used to run Ethereum based contracts and will utilise the
// passed environment to query external sources for state information.
// The Interpreter will run the byte code VM based on the passed
// configuration.
type Interpreter interface {
	// Run loops and evaluates the contract's code with the given input data and returns
	// the return byte-slice and an error if one occurred.
	Run(contract *Contract, input []byte, static bool) ([]byte, error)
	// CanRun tells if the contract, passed as an argument, can be
	// run by the current interpreter. This is meant so that the
	// caller can do something like:
	//
	// ```golang
	// for _, interpreter := range interpreters {
	//   if interpreter.CanRun(contract.code) {
	//     interpreter.Run(contract.code, input)
	//   }
	// }
	// ```
	CanRun([]byte) bool
}

// EVMInterpreter represents an EVM interpreter
// (it implements the Interpreter interface)
type EVMInterpreter struct {
	evm *EVM
	cfg Config

	intPool *intPool

	hasher    keccakState // Keccak256 hasher instance shared across opcodes
	hasherBuf common.Hash // Keccak256 hasher result array shared across opcodes

	readOnly   bool   // Whether to throw on stateful modifications
	returnData []byte // Last CALL's return data for subsequent reuse
}

Constructor

It first initializes vmConfig.JumpTable to the instruction set of the matching version, then creates and returns the EVMInterpreter object.

core/vm/interpreter.go

// NewEVMInterpreter returns a new instance of the Interpreter.
func NewEVMInterpreter(evm *EVM, cfg Config) *EVMInterpreter {
	// We use the STOP instruction whether to see
	// the jump table was initialised. If it was not
	// we'll set the default jump table.
	if !cfg.JumpTable[STOP].valid {
		var jt JumpTable
		switch {
		case evm.chainRules.IsIstanbul:
			jt = istanbulInstructionSet
		case evm.chainRules.IsConstantinople:
			jt = constantinopleInstructionSet
		case evm.chainRules.IsByzantium:
			jt = byzantiumInstructionSet
		case evm.chainRules.IsEIP158:
			jt = spuriousDragonInstructionSet
		case evm.chainRules.IsEIP150:
			jt = tangerineWhistleInstructionSet
		case evm.chainRules.IsHomestead:
			jt = homesteadInstructionSet
		default:
			jt = frontierInstructionSet
		}
		for i, eip := range cfg.ExtraEips {
			if err := EnableEIP(eip, &jt); err != nil {
				// Disable it, so caller can check if it's activated or not
				cfg.ExtraEips = append(cfg.ExtraEips[:i], cfg.ExtraEips[i+1:]...)
				log.Error("EIP activation failed", "eip", eip, "error", err)
			}
		}
		cfg.JumpTable = jt
	}

	return &EVMInterpreter{
		evm: evm,
		cfg: cfg,
	}
}

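Input Data Structure

The Contract Application Binary Interface (ABI) is the standard way to interact with contracts in the Ethereum ecosystem, both from outside the blockchain and for contract-to-contract interaction. Data is encoded according to its type. The encoding is not self-describing, so a schema is required to decode it.

We assume that a contract's interface functions are strongly typed, known at compile time, and static. We also assume that every contract has, at compile time, the interface definitions of the contracts it calls. [7]

Function Selector and Argument Encoding

The first 4 bytes of a function call's Input specify the function to be called. They are the first (left, high-order in big-endian) 4 bytes of the Keccak-256 (SHA-3) hash of the function signature. The function signature is the function name followed by the parenthesized list of parameter types, separated by commas with no spaces. The signature does not include the function's return type. The encoded arguments follow from the fifth byte onward. Consider the following contract: [7]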

pragma solidity >=0.4.16 <0.7.0;


contract Foo {
    function bar(bytes3[2] memory) public pure {}
    function baz(uint32 x, bool y) public pure returns (bool r) { r = x > 32 || y; }
    function sam(bytes memory, bool, uint[] memory) public pure {}
}

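To call the baz method with the arguments 69 and true, we pass 68 bytes of Input, which break down as follows:

0xcdcd77c0 : the function selector, also called the method ID. The signature of baz is baz(uint32,bool); hashing it and taking the first 4 bytes gives the selector: KeccakHash("baz(uint32,bool)")[0:4] => 0xcdcd77c0
0x0000000000000000000000000000000000000000000000000000000000000045 : the first argument, the uint32 value 69, padded to 32 bytes.
0x0000000000000000000000000000000000000000000000000000000000000001 : the second argument, the boolean true, padded to 32 bytes.

So the Input sent when calling the Foo contract is: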

0xcdcd77c000000000000000000000000000000000000000000000000000000000000000450000000000000000000000000000000000000000000000000000000000000001
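The byte layout described above can be reproduced with a few lines of Go. This is a minimal sketch under assumptions: the helper names word and encodeBazCall are invented for the example, and the selector 0xcdcd77c0 is copied from the text rather than computed, since Keccak-256 hashing is outside the scope of this sketch.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"math/big"
)

// word left-pads a non-negative integer to a 32-byte big-endian ABI word.
func word(v *big.Int) []byte {
	return v.FillBytes(make([]byte, 32))
}

// encodeBazCall builds the Input for baz(uint32,bool): the 4-byte selector
// followed by one 32-byte word per argument.
func encodeBazCall(x uint32, y bool) []byte {
	input := []byte{0xcd, 0xcd, 0x77, 0xc0} // selector taken from the text
	input = append(input, word(new(big.Int).SetUint64(uint64(x)))...)

	flag := new(big.Int) // false encodes as 0
	if y {
		flag.SetInt64(1) // true encodes as 1
	}
	input = append(input, word(flag)...)
	return input
}

func main() {
	input := encodeBazCall(69, true)
	fmt.Println("length:", len(input))
	fmt.Println("0x" + hex.EncodeToString(input))
}
```

Both arguments are static types, so each occupies exactly one 32-byte word and the total length is 4 + 32 + 32 bytes. Dynamic types such as the bytes and uint[] parameters of sam use a more involved offset-based layout that this sketch does not cover.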

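The function selector tells the EVM which contract method we want to call. Together with the argument data, it is encoded into the transaction's data field. The Input delivered to the interpreter along with the contract code is supplied by that data field. Parsing the selector and the arguments is not done by the EVM itself; it is done by code that the contract compiler inserts at compile time.

When a smart contract is compiled, the compiler automatically prepends function-selection logic to the generated bytecode. It first uses the CALLDATALOAD instruction to push the 4-byte signature onto the stack, then compares it in turn against each function the contract contains. On a match, a JUMPI instruction jumps into that function's code and execution continues there.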
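The selection logic the compiler emits can be modelled at a high level in Go. This is an illustrative sketch only: real contracts do this in EVM bytecode, and the entry type and dispatch function are hypothetical names. The only selector used is the one for baz given in the text.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// entry pairs a 4-byte selector with the function it identifies.
type entry struct {
	selector uint32
	name     string
}

// dispatch mimics the compiler-generated prologue: read the first four bytes
// of the input as a big-endian selector, then compare it with each known
// selector in turn.
func dispatch(input []byte, table []entry) (string, error) {
	if len(input) < 4 {
		return "", errors.New("input shorter than a selector")
	}
	sel := binary.BigEndian.Uint32(input[:4])
	for _, e := range table {
		if e.selector == sel {
			return e.name, nil // in bytecode this is where JUMPI takes the branch
		}
	}
	return "", errors.New("no matching function")
}

func main() {
	table := []entry{{selector: 0xcdcd77c0, name: "baz"}}

	name, err := dispatch([]byte{0xcd, 0xcd, 0x77, 0xc0, 0x00}, table)
	fmt.Println(name, err)

	_, err = dispatch([]byte{0xde, 0xad, 0xbe, 0xef}, table)
	fmt.Println(err)
}
```

The sequential comparison matches the description above; what a real contract does when nothing matches depends on the compiler and on whether a fallback function is defined, which this sketch reduces to an error.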

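Data-Loading Instructions

CALLDATALOAD : loads input data onto the Stack
CALLDATACOPY : copies input data into Memory
CODECOPY : copies the current contract's code into Memory
EXTCODECOPY : copies an external contract's code into Memory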
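The first two instructions in the list can be approximated in Go as follows. This is a hedged model rather than the go-ethereum implementation: callDataLoad and callDataCopy are names chosen here, offsets are plain uint64 values, and memory is assumed to have been sized by the caller beforehand. Reads past the end of the input yield zero bytes.

```go
package main

import "fmt"

// callDataLoad returns the 32-byte word starting at offset, zero-padded on
// the right when the input ends early. The EVM pushes this word onto the stack.
func callDataLoad(input []byte, offset uint64) [32]byte {
	var word [32]byte
	if offset < uint64(len(input)) {
		copy(word[:], input[offset:])
	}
	return word
}

// callDataCopy copies size bytes of input, starting at dataOffset, into
// memory at memOffset. Bytes beyond the end of the input are written as zero.
// The caller must have sized memory to at least memOffset+size.
func callDataCopy(memory, input []byte, memOffset, dataOffset, size uint64) {
	dst := memory[memOffset : memOffset+size]
	for i := range dst {
		dst[i] = 0
	}
	if dataOffset < uint64(len(input)) {
		copy(dst, input[dataOffset:])
	}
}

func main() {
	input := []byte{0xcd, 0xcd, 0x77, 0xc0, 0x01, 0x02}

	w := callDataLoad(input, 4)
	fmt.Printf("word: %x\n", w)

	memory := make([]byte, 8)
	callDataCopy(memory, input, 2, 0, 4)
	fmt.Printf("memory: %x\n", memory)
}
```

CODECOPY and EXTCODECOPY follow the same copy-into-memory pattern, with the contract's own code or another account's code as the source instead of the input.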

[Figure: the operations performed by the data-loading instructions]

Appendix A

Stack: Structure and Operations

core/vm/stack.go

// Stack is an object for basic stack operations. Items popped to the stack are
// expected to be changed and modified. stack does not take care of adding newly
// initialised objects.
type Stack struct {
	data []*big.Int // slice of pointers: the stack stores pointers, not values
}

func newstack() *Stack {
	return &Stack{data: make([]*big.Int, 0, 1024)} // initial length 0, capacity 1024
}

// ----------- Stack methods -----------

// Data returns the underlying big.Int array.
func (st *Stack) Data() []*big.Int {
	return st.data
}

func (st *Stack) push(d *big.Int) {
	// NOTE push limit (1024) is checked in baseCheck
	//stackItem := new(big.Int).Set(d)
	//st.data = append(st.data, stackItem)
	st.data = append(st.data, d) // the end of the slice is the top of the stack
}
func (st *Stack) pushN(ds ...*big.Int) { // push several items at once
	st.data = append(st.data, ds...)
}

// pop removes and returns the top item.
func (st *Stack) pop() (ret *big.Int) {
	ret = st.data[len(st.data)-1]      // the item being popped
	st.data = st.data[:len(st.data)-1] // stack depth decreases by one
	return
}

func (st *Stack) len() int { // stack length (depth)
	return len(st.data)
}

// swap exchanges the n-th item from the top (counting the top as item 1) with the top item.
func (st *Stack) swap(n int) {
	st.data[st.len()-n], st.data[st.len()-1] = st.data[st.len()-1], st.data[st.len()-n]
}

// dup copies the n-th item from the top (counting the top as item 1) and pushes the copy.
func (st *Stack) dup(pool *intPool, n int) {
	st.push(pool.get().Set(st.data[st.len()-n]))
}

// peek returns the top item without popping it.
func (st *Stack) peek() *big.Int {
	return st.data[st.len()-1]
}

// Back returns the n'th item in stack
func (st *Stack) Back(n int) *big.Int {
	return st.data[st.len()-n-1]
}

core/vm/stack_table.go

// Stack helper functions

// maxStack specifies the max length the stack can have for this operation
// to not overflow the stack.
// In other words, as long as the stack is no longer than this, the
// operation cannot overflow it.
// pop is the number of items the operation pops; push is the number it pushes.
func maxStack(pop, push int) int {
	return int(params.StackLimit) + pop - push
}

// minStack tells how many stack items are required
// pops is the number of items the operation pops; push is the number it pushes.
// The number of required items is simply the number of pops.
func minStack(pops, push int) int {
	return pops
}
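A short worked example shows how these two bounds are meant to be used before an instruction runs. The sketch assumes a stack limit of 1024, the capacity used by newstack above; checkStack and its error messages are hypothetical and only illustrate the comparison.

```go
package main

import "fmt"

const stackLimit = 1024 // assumed to equal params.StackLimit

// minStack and maxStack follow the definitions shown above.
func minStack(pops, push int) int { return pops }
func maxStack(pops, push int) int { return stackLimit + pops - push }

// checkStack reports whether an operation with the given bounds may run
// when the stack currently holds depth items.
func checkStack(depth, min, max int) error {
	switch {
	case depth < min:
		return fmt.Errorf("stack underflow (%d < %d)", depth, min)
	case depth > max:
		return fmt.Errorf("stack limit reached (%d > %d)", depth, max)
	}
	return nil
}

func main() {
	// ADD pops 2 items and pushes 1.
	fmt.Println("ADD bounds:", minStack(2, 1), maxStack(2, 1))
	fmt.Println("ADD with 1 item:", checkStack(1, minStack(2, 1), maxStack(2, 1)))

	// PUSH1 pops nothing and pushes 1, so it needs one free slot.
	fmt.Println("PUSH1 bounds:", minStack(0, 1), maxStack(0, 1))
	fmt.Println("PUSH1 with full stack:", checkStack(1024, minStack(0, 1), maxStack(0, 1)))
}
```

For ADD the upper bound is 1025, so it can never overflow a stack of at most 1024 items, while PUSH1 has an upper bound of 1023 and is rejected on a full stack. Precomputing both bounds per instruction lets the check be two integer comparisons.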

Memory: Structure and Operations

core/vm/memory.go

// Memory implements a simple memory model for the ethereum virtual machine.
type Memory struct {
	store       []byte // byte slice backing the memory
	lastGasCost uint64 // gas already paid for the allocated memory; used to price further expansion
}

// NewMemory returns a new memory model.
func NewMemory() *Memory {
	return &Memory{}
}

// Set sets offset + size to value
// (writes value into the region [offset, offset+size))
func (m *Memory) Set(offset, size uint64, value []byte) {
	// It's possible the offset is greater than 0 and size equals 0. This is because
	// the calcMemSize (common.go) could potentially return 0 when size is zero (NO-OP)
	if size > 0 {
		// length of store may never be less than offset + size.
		// The store should be resized PRIOR to setting the memory
		// So if offset+size exceeds len(store) here, the memory has not been allocated yet.
		if offset+size > uint64(len(m.store)) {
			panic("invalid memory: store empty")
		}
		copy(m.store[offset:offset+size], value)
	}
}

// Set32 sets the 32 bytes starting at offset to the value of val, left-padded with zeroes to
// 32 bytes.
func (m *Memory) Set32(offset uint64, val *big.Int) {
	// length of store may never be less than offset + size.
	// The store should be resized PRIOR to setting the memory
	if offset+32 > uint64(len(m.store)) {
		panic("invalid memory: store empty")
	}
	// Zero the memory area
	copy(m.store[offset:offset+32], []byte{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0})
	// Fill in relevant bits
	math.ReadBits(val, m.store[offset:offset+32])
}

// Resize resizes the memory to size
// (it only grows the memory; a smaller size leaves it unchanged)
func (m *Memory) Resize(size uint64) {
	if uint64(m.Len()) < size {
		m.store = append(m.store, make([]byte, size-uint64(m.Len()))...)
	}
}

// GetCopy returns offset + size as a new slice
// (a copy of the memory region [offset, offset+size))
func (m *Memory) GetCopy(offset, size int64) (cpy []byte) {
	if size == 0 {
		return nil
	}

	if len(m.store) > int(offset) {
		cpy = make([]byte, size)
		copy(cpy, m.store[offset:offset+size])

		return
	}

	return
}

// GetPtr returns the offset + size
// (a slice that refers to the memory region directly, without copying)
func (m *Memory) GetPtr(offset, size int64) []byte {
	if size == 0 {
		return nil
	}

	if len(m.store) > int(offset) {
		return m.store[offset : offset+size]
	}

	return nil
}

// Len returns the length of the backing slice
func (m *Memory) Len() int {
	return len(m.store)
}

// Data returns the backing slice
func (m *Memory) Data() []byte {
	return m.store
}
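The contract between Resize and Set32, resize first and write second, can be seen in a small standalone sketch. It is a reduced model under assumptions: the lowercase memory type and the demo function are invented for the example, and big.Int.FillBytes from the standard library is used in place of the math.ReadBits helper; FillBytes zero-extends the value into the slice, which gives the same left padding.

```go
package main

import (
	"fmt"
	"math/big"
)

// memory is a reduced stand-in for the Memory type above.
type memory struct {
	store []byte
}

// resize grows the store to size bytes; it never shrinks it.
func (m *memory) resize(size uint64) {
	if uint64(len(m.store)) < size {
		m.store = append(m.store, make([]byte, size-uint64(len(m.store)))...)
	}
}

// set32 writes val as a 32-byte big-endian word at offset, left-padded with
// zeros. It panics if the store was not resized beforehand.
func (m *memory) set32(offset uint64, val *big.Int) {
	if offset+32 > uint64(len(m.store)) {
		panic("invalid memory: store empty")
	}
	val.FillBytes(m.store[offset : offset+32])
}

// demo resizes to two words and stores 0x45 in the second one.
func demo() *memory {
	m := &memory{}
	m.resize(64)
	m.set32(32, big.NewInt(0x45))
	return m
}

func main() {
	m := demo()
	fmt.Println("length:", len(m.store))
	fmt.Printf("second word: %x\n", m.store[32:64])
}
```

Keeping the growth in a separate resize step matches the design described in the comments above: expansion is where gas is charged, so the write functions can assume the space already exists and treat a missing allocation as a programming error.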

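intPool: Structure and Operations

intPool is a pool of big.Int objects capped at roughly 256 entries (poolLimit). It speeds up big.Int allocation by avoiding the cost of repeatedly creating and destroying big.Int objects.

core/vm/intpool.go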

var checkVal = big.NewInt(-42) // why -42?

const poolLimit = 256

// intPool is a pool of big integers that
// can be reused for all big.Int operations.
type intPool struct {
	pool *Stack // the pool of big.Ints is kept as a stack
}

// newIntPool is the constructor for intPool.
func newIntPool() *intPool {
	return &intPool{pool: newstack()}
}

// get retrieves a big int from the pool, allocating one if the pool is empty.
// Note, the returned int's value is arbitrary and will not be zeroed!
func (p *intPool) get() *big.Int {
	if p.pool.len() > 0 { // not empty: take one from the pool
		return p.pool.pop()
	}
	return new(big.Int) // empty: allocate a new big.Int on the spot, which is slower
}

// getZero retrieves a big int from the pool, setting it to zero or allocating
// a new one if the pool is empty.
func (p *intPool) getZero() *big.Int {
	if p.pool.len() > 0 {
		return p.pool.pop().SetUint64(0)
	}
	return new(big.Int)
}

// put returns an allocated big int to the pool to be later reused by get calls.
// Note, the values as saved as is; neither put nor get zeroes the ints out!
func (p *intPool) put(is ...*big.Int) {
	if len(p.pool.data) > poolLimit { // the pool is already full: do nothing
		return
	}
	for _, i := range is {
		// verifyPool is a build flag. Pool verification makes sure the integrity
		// of the integer pool by comparing values to a default value.
		if verifyPool {
			i.Set(checkVal)
		}
		p.pool.push(i)
	}
}

// The intPool pool's default capacity
const poolDefaultCap = 25

// intPoolPool manages a pool of intPools.
type intPoolPool struct {
	pools []*intPool // slice of intPools
	lock  sync.Mutex // mutex guarding pools
}

// Package-level instance.
var poolOfIntPools = &intPoolPool{
	pools: make([]*intPool, 0, poolDefaultCap),
}

// get is looking for an available pool to return.
func (ipp *intPoolPool) get() *intPool {
	ipp.lock.Lock()         // acquire the lock
	defer ipp.lock.Unlock() // release it on return

	if len(poolOfIntPools.pools) > 0 {
		ip := ipp.pools[len(ipp.pools)-1]        // take the last pool
		ipp.pools = ipp.pools[:len(ipp.pools)-1] // remove it from the pool of pools
		return ip
	}
	return newIntPool() // none available: create a new intPool on the spot
}

// put a pool that has been allocated with get.
func (ipp *intPoolPool) put(ip *intPool) {
	ipp.lock.Lock()
	defer ipp.lock.Unlock()

	if len(ipp.pools) < cap(ipp.pools) {
		ipp.pools = append(ipp.pools, ip)
	}
}
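The warning that get does not zero the returned value is easy to overlook, so here is a minimal sketch that demonstrates it. The intPool below is a reduced version that uses a plain slice instead of *Stack and omits the verifyPool logic; these simplifications are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/big"
)

const poolLimit = 256

// intPool is a reduced pool backed by a slice (the original uses *Stack).
type intPool struct {
	pool []*big.Int
}

// get pops a big.Int from the pool, or allocates one if the pool is empty.
// A pooled value still holds whatever its previous user left in it.
func (p *intPool) get() *big.Int {
	if n := len(p.pool); n > 0 {
		v := p.pool[n-1]
		p.pool = p.pool[:n-1]
		return v
	}
	return new(big.Int)
}

// put hands big.Ints back for reuse unless the pool is already over its limit.
func (p *intPool) put(is ...*big.Int) {
	if len(p.pool) > poolLimit {
		return
	}
	p.pool = append(p.pool, is...)
}

func main() {
	p := &intPool{}

	a := p.get().SetUint64(7) // freshly allocated, then set to 7
	p.put(a)                  // returned to the pool with its value intact

	b := p.get() // the same object comes back
	fmt.Println("same object:", a == b, "stale value:", b)

	b.SetUint64(0) // callers must overwrite or zero before use
	fmt.Println("after reset:", b)
}
```

A caller that needs a zero value should therefore either overwrite the result immediately (as dup does with Set) or use getZero.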

References

1. Ethereum Yellow Paper
   ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER
   https://ethereum.github.io/yellowpaper/paper.pdf
2. Ethereum White Paper
   A Next-Generation Smart Contract and Decentralized Application Platform
   https://github.com/ethereum/wiki/wiki/White-Paper
3. Ethereum EVM Illustrated
   https://github.com/takenobu-hs/ethereum-evm-illustrated
4. Go Ethereum Code Analysis
   https://github.com/ZtesoftCS/go-ethereum-code-analysis
5. 以太坊源码解析：evm (Ethereum source code analysis: EVM)
   https://yangzhe.me/2019/08/12/ethereum-evm/
6. 以太坊 - 深入浅出虚拟机 (Ethereum: the virtual machine explained simply)
   https://learnblockchain.cn/2019/04/09/easy-evm/
7. Contract ABI Specification
   https://solidity.readthedocs.io/en/v0.5.10/abi-spec.html?highlight=selector#function-selector
8. 认识以太坊智能合约 (Getting to know Ethereum smart contracts)
   https://yangzhe.me/2019/08/01/ethereum-cognition-and-deployment/#%E8%B0%83%E7%94%A8%E5%90%88%E7%BA%A6
