天天看點

用python寫語言的解釋器

我花了一下午的時間完成了一個簡單語言的解釋器,我會在最後帖出所有代碼,但是今天不打算詳細解釋程式的每一個步驟,最近考慮找實習、做論文,要花一些時間。有時間了我會解釋每一部分,在這裡隻提醒一下讀者,程式的寫作過程和它呈現出來的不一樣,總體來說我的寫作過程是先寫一個隻包含一條指令的解釋器,然後逐漸加入其他指令。ps:我是多麼的想看看高手們寫程式的過程,而不隻是結果,但是就像graham說的“寫的過程往往顯得冗長”,是以這方面的書籍不多。我覺得比較接近的書包括《clean code》、《paip》、《software tools》。隻可惜現在隻看完了一本半。

想法的來源

昨天聖誕節,坐在不能上網的實驗室,覺得好無聊。然後想起了學習《可計算性與計算複雜性》的時候,裡面用到了一種含有5條指令的原語言,老師也要求我們用這個語言寫程式,是以我就想寫一個解釋器,來執行用該語言,并把這個解釋器作為自己的聖誕禮物。

原語言描述

書中說它是一個fortran式的簡單語言,是以我就給它起了個簡單的名字Metafor,表示meta fortran。下面來看一下它的具體描述,然後貼上我在代碼裡更詳細的描述,我為每一種指令都起了個名字(注意其中把指派語句命名為setf是由于我最近在寫common lisp多一些)。

+-------------------------------------------------------------------------------------------+

指令                         描述

x = x + 1                   變量x的值加1

x = x - 1                    變元x的值減1。若x的值為0,則結果仍為0

TO A IF x≠0             若x≠0,則轉标号為A的指令;否則執行下一條指令

TO A                        無條件轉到标号為A的指令

y=x                           把x的值賦給變元y,x值保持不變

+-------------------------------------------------------------------------------------------+

1. Metafor has five instructions:

name instruction description

inc: x = x + 1, Increase by 1 the value of the variable x.
dec: x = x - 1, If the value of x is 0, leave it unchanged; otherwise
                decrease by 1 the value of x.
con_goto: TO A IF x != 0, If the value of x is nonzero, perform the instruction
                          with the label A next; otherwise proceed to the next
                          instruction in the list.
goto: TO A, Perform the instruction with the label A next.
setf: y = x, Change the value variable y to the value of variable x.

2. Some specification:

(1) Input variables are represented as:
x1, x2, x3, ...

(2) Local variables are represented as:
z1, z2, z3, ...

(3) Output variable is represented as:
y

note: the num 1 is often omitted(i.e., x stands for x1 and z stands
for z1).

3. Labeled instructions:

Instructions may or may not have labels. When an instruction is labeled,
the label is written to its left in square brackets. For example,

[B] Z = Z - l

4. A smaple program:

"Program to compute y = x1+ x2"

    y = x1
[B] TO A IF x2 != 0
    TO E
[A] x2 = x2 - 1
    y = y + 1
    TO B

4. For more information, please refer to 《可計算性與計算複雜性》,
周長林、李占山著.
           

整體思路

由于該語言中含有goto語句,是以我不能一句一句的解釋執行,我需要把整個程式讀進來,轉換成一種方面的内部格式後,作為整體執行。例如,這是計算y = x + x2的語言程式:

y = x
[B] TO A IF x2 != 0
    TO E
[A] x2 = x2 - 1
    y = y + 1
    TO B
           

它轉換為内部格式後,為:

[['setf', 'y', 'x'],
 ['labeled_exp', 'B', ['con_goto', 'A', 'x2']],
 ['goto', 'E'],
 ['labeled_exp', 'A', ['dec', 'x2']],
 ['inc', 'y'],
 ['goto', 'B']]
           

運作示範——從檔案獲得程式

還是前面那個y = x + x2的程式,首先我需要為兩個輸入參數(x, x2)給定值,如果沒有為其指派,那麼它們的值預設為0。另外,我們還可以檢視程式的内部表示結構,以及簡單調試(就檢視其他環境變量的值)。其示範結果如下:

======================================================
*Metafor 1.0, Welecome to Metafor shell environment. *
*Author: Zhu zhaolong([email protected])              *
======================================================

Metafor> 2 3
5
Metafor> 5 6
11
Metafor> code
[['setf', 'y', 'x'],
 ['labeled_exp', 'B', ['con_goto', 'A', 'x2']],
 ['goto', 'E'],
 ['labeled_exp', 'A', ['dec', 'x2']],
 ['inc', 'y'],
 ['goto', 'B']]
>>> ========================================== RESTART ==========================================
>>> 
======================================================
*Metafor 1.0, Welecome to Metafor shell environment. *
*Author: Zhu zhaolong([email protected])              *
======================================================

Metafor> 234 5
239
Metafor> debug
debug> x
234
debug> x2
0
debug> y
239
debug> exit
>>> 
           

運作示範——從終端使用者直接輸入原語言程式

我寫了一個簡單的repl互動環境,這樣我們可以友善的測試簡單的原語言程式,例如下面給出了一個y =  x + 1的程式,示例如下:

>>> repl()
======================================================
*Metafor 1.0, Welecome to Metafor shell environment. *
*Author: Zhu zhaolong([email protected])              *
======================================================



Input your program:
x = x + 1
y = x

inputs> 5
=>6


Input your program:
           

全部代碼

在這裡我貼出所有python代碼:

""" Metafor: meta fortran, is a small language used in
"Computabilit and Complexity" course in Jilin University.

@Author: Zhu Zhaolong([email protected])
@Date: 2011.12.25


1. Metafor has five instructions:

name instruction description

inc: x = x + 1, Increase by 1 the value of the variable x.
dec: x = x - 1, If the value of x is 0, leave it unchanged; otherwise
                decrease by 1 the value of x.
con_goto: TO A IF x != 0, If the value of x is nonzero, perform the instruction
                          with the label A next; otherwise proceed to the next
                          instruction in the list.
goto: TO A, Perform the instruction with the label A next.
setf: y = x, Change the value variable y to the value of variable x.

2. Some specification:

(1) Input variables are represented as:
x1, x2, x3, ...

(2) Local variables are represented as:
z1, z2, z3, ...

(3) Output variable is represented as:
y

note: the num 1 is often omitted(i.e., x stands for x1 and z stands
for z1).

3. Labeled instructions:

Instructions may or may not have labels. When an instruction is labeled,
the label is written to its left in square brackets. For example,

[B] Z = Z - l

4. A smaple program:

"Program to compute y = x1+ x2"

    y = x1
[B] TO A IF x2 != 0
    TO E
[A] x2 = x2 - 1
    y = y + 1
    TO B

4. For more information, please refer to 《可計算性與計算複雜性》,
周長林、李占山著.

"""
###========================================================
import re
from pprint import pprint

###===========================parse========================
def parse(program):
    return [read(e) for e in program]

def read(s):
    """Read a metafor insctruction from a string. and change
    it to internal represtation."""
    return read_from(tokenize(s))

def tokenize(s):
    "Convert a string into alist of tokens."
    return s.split()

def read_from(tokens):
    "Read an exp from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError("uncexpected EOF while reading.")

    if is_setf(tokens):
        return make_setf(tokens[0], tokens[2])
    elif is_inc(tokens):
        return make_inc(tokens[0])
    elif is_dec(tokens):
        return make_dec(tokens[0])
    elif is_goto(tokens):
        return make_goto(tokens[1])
    elif is_con_goto(tokens):
        return make_con_goto(tokens[1], tokens[3])
    elif is_labeled_exp(tokens):
        return make_labeled_exp(get_label(tokens[0]), read_from(tokens[1:]))
    else:
        raise SyntaxError("unexpected instructions: %s" % " ".join(tokens))


#=======================================================

def is_variable(token):
    if token.startswith("x") or \
       token.startswith("z") or \
       token == "y":
        return True
    else:
        return False

def get_label(token):
    "Token is like [A1]. we want get A1."
    return token.replace('[', '').replace(']', '')

## setf    
def is_setf(tokens):
    if tokens[1] == "=" and len(tokens) == 3:
        if is_variable(tokens[0]) and is_variable(tokens[2]):
            return True
        else:
            raise SyntaxError("unexpected setf instruction: %s" % " ".join(tokens))
    else:
        return False

def make_setf(des_var, src_var):
    return ["setf", des_var, src_var]

## inc 
def is_inc(tokens):
    if len(tokens) == 5 and tokens[1] == "=" \
       and tokens[3] == "+" and tokens[4] == "1":
        if tokens[0] == tokens[2] and is_variable(tokens[0]):
            return True
        else:
            raise SyntaxError("unexpected inc instruction: %s" % " ".join(tokens))
    else:
        return False
def make_inc(var):
    return ["inc", var]

## dec
def is_dec(tokens):
    if len(tokens) == 5 and tokens[1] == "=" \
       and tokens[3] == "-" and tokens[4] == "1":
        if tokens[0] == tokens[2] and is_variable(tokens[0]):
            return True
        else:
            raise SyntaxError("unexpected dec instruction: %s" % " ".join(tokens))
    else:
        return False
def make_dec(var):
    return ["dec", var]

## goto
def is_goto(tokens):
    if len(tokens) == 2 and tokens[0] == "TO":
        return True
    else:
        return False

def make_goto(lable):
    return ["goto", lable]

## con_goto
def is_con_goto(tokens):
    if len(tokens) == 6 and tokens[0] == "TO" \
       and tokens[2] == "IF" and is_variable(tokens[3]) \
       and tokens[4] == "!=" and tokens[5] == "0":
        return True
    else:
        return False

def make_con_goto(lable, var):
    return ["con_goto", lable, var]

## labeled exp
def is_labeled_exp(tokens):
    return tokens[0].startswith('[')

def make_labeled_exp(label, exp):
    return ["labeled_exp", label, exp]


###==========================CodeSeg======================

def is_label(s):
    return s == "labeled_exp"
        
class CodeSeg:
    """ To store internal instructions (exps).
    N : the number of instructions.
    exps: the list of instructions.
    label_2_exps: a dict of {'label': num} pairs."""
    
    def __init__(self, exps):
        self.N = len(exps)
        self.exps = []
        self.label_2_exp={}
        for (i, e) in enumerate(exps):
            if is_label(e[0]):
                (_, label, exp) = e
                self.label_2_exp[label] = i
                self.exps.append(exp)
            else:
                self.exps.append(e)

    def get_instruction_num(self, label):
        "Return instruction num with the label."
        if label in self.label_2_exp.keys():
            return self.label_2_exp[label]
        # IF there is no instruction with that label,
        # return a instruction num that doesn't exit.
        else: 
            return self.N

    def get_instruction(self, pc):
        return self.exps[pc]
    
        
###=============================Env======================

class Env():
    "An enviromnet: a dict of {'var': val} paris, var's default value is 0."
    def __init__(self, args):
        self.pairs = {}
        for (i, v) in enumerate(args):
            if i == 0:
                self.pairs['x'] = int(v)
            else:
                self.pairs['x'+str(i+1)] = int(v)

    def get_v(self, var):
        if var in self.pairs.keys():
            return self.pairs[var]
        else:
            return 0
        
    def set_v(self, var, value):
        self.pairs[var] = value

###=========================mainloop=======================

def mainloop(codeseg, env):
    "pc refers to the current instruction." 
    pc = 0
    while pc < codeseg.N:
        e = codeseg.get_instruction(pc)
        if e[0] == 'inc':
            (_, v) = e
            env.set_v(v, env.get_v(v) + 1)
            pc += 1
        elif e[0] == 'dec':
            (_, v) = e
            env.set_v(v, max(env.get_v(v) - 1, 0))
            pc += 1
        elif e[0] == 'setf':
            (_, x, y) = e
            env.set_v(x, env.get_v(y))
            pc += 1
        elif e[0] == 'goto':
            (_, label) = e
            pc = codeseg.get_instruction_num(label)
        elif e[0] == "con_goto":
            (_, label, var) = e
            val = env.get_v(var)
            if val != 0:
                pc = codeseg.get_instruction_num(label)
            else:
                pc += 1
        else:
            raise SyntaxError("unexpected instructions: %s" % " ".join(e))
    return env.get_v('y')

###========================repl================================

def repl(input_prompt='inputs> '):
    "A prompt-read-eval-print loop."
    print_info()
    while True:
        program = get_prog_from_shell()
        if program == []:
            return "Return successfully!"
        codes = CodeSeg(parse(program))
        inputs = raw_input(input_prompt)
        env = Env(inputs.split())
        print("=>" + str(mainloop(codes, env)))
            

def get_prog_from_shell():
    program = []
    exp = raw_input("\n\nInput your program:\n")
    while exp:
        program.append(exp)
        exp = raw_input()
    return program

def print_info():
    print("======================================================\n"
          "*Metafor 1.0, Welecome to Metafor shell environment. *\n"
          "*Author: Zhu zhaolong([email protected])              *\n"
          "======================================================\n")

###========================run=============================

def run(filepath, prompt='Metafor> '):
    "This function can run Metafor program from file."
    print_info()
    program = get_prog_from_file(filepath)
    parsed_prog = parse(program)
    codes = CodeSeg(parsed_prog)

    inputs = raw_input(prompt)
    while True:
        env = Env(inputs.split())
        print(mainloop(codes, env))
        
        inputs = raw_input(prompt)
        if inputs == "exit":
            return "Return successfully!"
        if inputs == "debug":
            while True:
                var = raw_input('debug> ')
                if var == "exit":
                    return None
                print(env.get_v(var))
        if inputs == "code":
            pprint(parsed_prog)
            return None

def get_prog_from_file(filepath):
    f = open(filepath, "r")
    program = f.readlines()
    f.close()

    return program

###===========================test==================================

# y = x + x2
test_file_1 = "metafor_test.mf"
# y = 2 * x
test_file_2 = "metafor_test2.mf"

if __name__ == "__main__": 
    run(test_file_1)