天天看點

Monads for the Curious Programmer, Part 2 (中英文對照版) Challenges of Functional Programming函數式程式設計的挑戰Error Propagation and Exceptions錯誤傳播和異常Non-deterministic Computations非确定性計算Conclusions結論Bibliography文獻

Monads for the Curious Programmer: Part 2

In my previous post I talked about categories, functors, endofunctors, and a little about monads. Why didn’t I just start with a Haskell definition of a monad and skip the math? Because I believe that monads are bigger than that. I don’t want to pigeonhole monads based on some specific language or application. If you know what to look for you’ll find monads everywhere. Some languages support them better, as a single general abstraction, others require you to re-implement them from scratch for every use case. I want you to be able to look at a piece of C++ code and say, “Hey, that looks like a monad.” That’s why I started from category theory, which sits above all programming languages and provides a natural platform to talk about monads.

在我的上一篇部落格中,我講到了範疇,函子,自函子以及一點單子。為什麼我不直接從Haskell關于單子的定義直接開講且跳過數學部分呢?因為我覺得單子的概念要更重要。我不想從一些特定的語言或者應用來限制單子。如果你知道你要找什麼,那單子到處都是。一些語言支援的較好,就像一個單一的一般化抽象;而另一些則要求你對每一種情況都從頭開始實作。我期望你可以看到一些C++代碼就說“嘿,這不就是單子嘛!”。這就是為什麼我從要從範疇論開始,範疇論提供了讨論單子的一個自然的平台而居于所有的程式設計語言之上。

In this installment I will talk about some specific programming challenges that, on the surface, seem to have very little in common. I will show how they can be solved and how from these disparate solutions a pattern emerges that can be recognized as a monad. I’ll talk in some detail about the

Maybe

monad and the

List

monad, since they are simple and illustrate the basic ideas nicely. I will leave the really interesting examples, like the state monad and its variations, to the next installment.

這部分中,我将讨論幾個特定的程式設計問題。這些問題初看起來幾乎完全不同。我将向你展示如何解決它們,他們的不同解決方案中孕育一個可以被識别為單子的模式。我會詳細讨論Maybe單子以及List單子,因為它們不僅簡單而且優雅地展示了基本的思想。我把真正有趣的單子,如狀态單子及其變體留待下一部分。

Challenges of Functional Programming

函數式程式設計的挑戰

You know what a function is: called with the same argument it always returns the same result. A lot of useful computations map directly to functions. Still, a lot don’t. We often have to deal with some notions of computations that are not functions. We can describe them in words or implement as procedures with side effects. In non-functional languages such computations are often called “functions” anyway– I’ll enclose such usage with quotation marks. Eugenio Moggi, the guy who introduced monads to computing, listed a few examples:

你已經知道什麼是函數:用同樣的參數調用,總是傳回同樣的記過。許多有用的計算可以直接對應到函數,同樣也有許多不能。我們常常處理那些不是函數的計算概念。我們可以把它們描述為帶副作用的過程。在非函數式程式設計語言中,這樣的計算也被叫做“函數”——我會把這種用法用引号括起來。把單子的概念引入到計算中的Eugenio Moggi列出了一些例子:

  • Partial “functions”: For some arguments they never terminate (e.g., go into an infinite loop). These are not really functions in the mathematical sense.
  • 不完整“函數”:對于有些參數,它們永不停止(例如進入無窮循環)。數學的意義上這不是真正的函數。
  • Nondeterministic “functions”: They don’t return a single result but a choice of results. Non-deterministic parsers are like this. When given an input, they return a set of possible parses. Which of them is right depends on the context. These are not really functions because a function must return a single value.
  • 非确定性“函數”:它們不是傳回一個值,而是一些值的其中之一。非确定性解析就是這樣的。對于給定的輸入,它們傳回一個可能解析的集合。那一個有效取決于上下文環境。這些不是真正的函數因為函數隻能傳回一個值。
  • “Functions” with side effects: They access and modify some external state and, when called repeatedly, return different results depending on that state.
  • 帶副作用的“函數”:它們通路和修改一些外部狀态。這樣重複調用的時候,傳回值依賴于這些狀态。
  • “Functions” that throw exceptions: They are not defined for certain arguments (or states) and they throw exceptions instead of returning a result.
  • 抛出異常的“函數”:對于某些參數(或狀态),它們未定義。同時抛出一個異常而不是傳回一個值。
  • Continuations.
  • 延續體。
  • Interactive input:

    getchar

    is a good example. It’s result depends on what hits the keyboard.
  • 互動式輸入:getchar就是個例子。其結果取決于鍵盤上敲什麼;
  • Interactive output:

    putchar

    is a good example. It’s not a function because it doesn’t return anything (if it returns an error code, it doesn’t depend on the argument). Still, it can’t be just optimized away if we want any output to appear on the screen.
  • 互動式輸出:putchar就是個例子。它不傳回任何東西(它可以傳回錯誤嗎,但是這不是決定于輸入參數),所有不是函數。另外,如果你想在螢幕上看到任何輸出的,這個也絕對不能被優化掉。

The amazing thing is that, using some creative thinking, all these problems can be solved using pure functional methods. I won’t discuss all of them, but I’ll show you a series of progressively more interesting examples.

最令人興奮的事情是,使用一些創新型的思維,所有這些問題都可用純函數是的方法解決。我不會全部讨論這些,而是給出一系列漸進而且有趣的例子。

Error Propagation and Exceptions

錯誤傳播和異常

Who doesn’t deal with error propagation? Let’s dissect the problem: You are defining a computation that can return a valid result only for a subset of possible arguments. So what happens when the argument is “wrong”? Well, you can return an error.

誰不會遇到錯誤的傳播問題呢?我們來讨論這個問題。你可以定義一個計算過程,對僅僅有效的的輸入參數子集才傳回一個有效的值。那麼當參數是“錯誤”的時候怎麼辦呢?你可以傳回一個錯誤。

In the simplest case you might add one extra bit, success or failure, to your result. Sometimes you can spare a bit in the result type: If the correct result is always a non-negative number, you may return negative one as an error. Or, if the result is a pointer, you might return a null pointer instead. But these are just small hacks introduced for the sake of performance. And, like most hacks, they can lead to dangerous code. Consider this (likely incorrect) code:

簡單的情況下,可以在傳回值中增加額外的bit來表征成功和失敗。有時候可以在傳回值中設定一些備用位。如果結果總是非負數字,你可以傳回負一作為錯誤;如果結果是指針,那麼你可以傳回空指針。但是這不過是小點的蠻幹罷了。和大多數的蠻幹類似,這也會導緻危險的代碼,看下面的(看似正确)的例子:

size_t off = fileName.find('.');      
string ext = fileName.substr(off, fileName.length() - off);      

If

fileName

contains no dot, the result is a special value

npos

signifying an error. The problem is that

npos

is of the same type as a non-error result, so it can be passed quietly to

string::substr

causing undefined behavior.

如果fileName不包含點,傳回特殊值npos來表征錯誤/問題在于npos和非錯誤的傳回值是一樣類型的。是以可以毫不聲響地傳給string::subst,最後導緻為定義的行為。

A more general and safer solution is to change– extend– the type of the result to include an error bit. You can do it for any type of result. In Haskell, you just define a type constructor called

Maybe

:

一個更通用,更安全的方案是去改變——擴充——傳回值的類型來包含錯誤位。對任何的錯誤都可以這樣做。在Haskell中,定義一個叫做Maybe的類型構造器:

data Maybe a = Nothing | Just a      

It takes an arbitrary type

a

and defines a new type that adds the special value

Nothing

to the set of possible values of

a

. In C++ that would be equivalent to a template:

它接受任意類型a然後通過把a的可能取值的集合加一個特殊值Nothing來定義一個新的類型。C++中,它等價于如下模闆:

template<class T>      
struct Maybe {      
    T just; // valid if 'nothing' is false      
    bool nothing;      
};      

(This is just an example. I’m not arguing that

Maybe

would be very useful in C++ which has other mechanisms for error propagation.)

(這僅僅用作例子,我并不是争辯說Maybe在C++中非常有用,C++有處理錯誤傳播的其他機制。)

So here we have a way to turn a computation that’s not defined for all values into a function that’s defined over the whole domain, but which returns a new richer type.

是以現在我們有了一種方法,可以把并不是對所有的值都有定義的函數轉換成一個在所有的域上都有定義的函數,不過傳回一個新的更豐富的類型。

The next question is, do these new things– functions returning

Maybe

– compose? What should the caller do with the result of such a function when it’s part of a larger computation? One thing is for sure, this result cannot be passed directly to an unsuspecting function–the error cannot be easily ignored– which is good. If we replace

find

by a new function

safe_find

that returns a

Maybe<size_t>

, the client won’t call

substr

with it. The types wouldn’t match. Instead, the result of

safe_find

must be unpacked and (much more likely than before) tested.

接下來的問題是,這個傳回Maybe的新東西可以組合嗎?當調用者本身是更大的計算的一部分的時候如何這樣的函數的傳回值呢?可以确定的一件事是,傳回值不能直接傳遞給一個對此一無所知的函數,否則錯誤很容易被忽略。這對我們來說是好事。如果我們用傳回Maybe<size_t>的safe_find來替換find的話,因為類型不比對,客戶代碼就不能調用substr了。以此,我們需要解包safe_find的傳回值并且做測試判斷(和先前很類似)。

Maybe<size_t> off = safe_find(fileName, '.');      
string ext;      
if (!off.nothing)      
    ext = fileName.substr(off.just, fileName.length() - off.just);      

Notice what happened here: By changing the return type of a function we broke the natural way functions compose– the result of one becoming the input of the next. On the other hand, we have come up with a new way of composing such functions. Let me chain a few of such compositions for better effect:

注意這兒的情況:由于改變了函數的傳回類型,我們也破壞了原先組合函數的自然方法——把一個函數的輸出作為另一函數的輸入。但是與此同時,我們也得到了另一種新的組合這樣函數的方式。我們把這樣的組合串起來使其效果明顯。

Maybe<Foo> foo = f(x);      
if (!foo.nothing) {      
    Maybe<Bar> bar = g(foo.just);      
    if (!bar.nothing) {      
        Maybe baz = h(bar.just);      
        if (!baz.nothing) {      
            ...      
        }      
    }      
}      

I have highlighted the elements of the “glue,” which is used to compose our new, more error-conscious functions. Notice how this boilerplate glue gets in the way of code clarity. Ideally, we would like to write something like this:

我把“膠水”代碼高亮顯示,這些代碼用來組合我們這些包含錯誤處理的新函數。注意那些樣闆代碼是如何直覺地把事情搞定的。理想情況下,我們最好可以寫如下的代碼

DO      
{      
    auto y = f(x);      
    auto v = g(y);      
    auto z = h(v);      
    return z;      
}      

where

DO

would magically provide the glue that’s implicit in the chaining of

f

,

g

, and

h

.

這裡DO會魔術般地把f,g和h隐式地串聯起來。

It’s not easy to do this– abstract the glue away– in C++. I’m not even going to try. But in Haskell it’s a different story. Let’s start with the almost direct translation of the C++ code into Haskell:

在C++中把這些膠水代碼抽象出去不大容易,我甚至不想去嘗試。不過在Haskell中情況就不同了。我們先從最把C++代碼直接翻譯成Haskell代碼開始:

compose n =      
    let m = f n in      
    case m of      
    Nothing -> Nothing      
    Just n1 ->      
        let m1 = g n1 in      
        case m1 of      
        Nothing -> Nothing      
        Just n2 ->      
            let n3 = h n2 in      
            n3      

The if statements are replaced by Haskell’s pattern matching (

case x of

). The

m

s are the

Maybe

s and the

n

s art their contents (if they aren’t

Nothing

s).

If語句被Haskell的模式比對(case x of)所取代。ms表示一些Maybe而ns表示其内容(如果内容不是Nothing)。

Notice the characteristic cascading style of this code– the nested conditionals or pattern matches. Let’s analyze one level of such a cascade. We start with a

Maybe

value (one returned by

f n

). We unpack it and examine the contents. If the result is not an error (

Just n1

), we continue with the rest of the body of

compose

. I have highlighted the “rest of the body” in blue (the “glue” is still in red).

注意這些代碼非常明顯的層疊式風格,無論是嵌套的條件判斷還是模式比對。我們分析一層這樣的層疊。從一個Maybe值開始(由f n所傳回),我們首先解包,然後處理其内容。如果不是錯誤(Just n1)則繼續compose函數的後面部分。我們後面的這部分用藍色高亮顯示(膠水代碼依舊使用紅色)。

What’s also important is that, if the result is an error (the

Nothing

branch of the pattern match) the whole “rest of the body” is skipped. In order to abstract the glue, we have to abstract the “rest of the body” as well. Such an abstraction is called a continuation. Let’s write this continuation as a lambda (lambdas in Haskell are written using the backslash, which is nothing but the Greek letter λ missing one leg):

另一個重要點在于,如果結果是錯誤(模式比對的Nothing分支),整個“後面部分”都直接挑過。要想抽象這些輔助代碼,我們先得抽象“後面部分”。這樣的抽象稱為“延續體”。現在把這個延續體寫為lambda運算(Haskell中lambda使用反斜線開始,也就是少了一條腿的希臘字母λ):

/ n1 ->      
    let m1 = g n1 in      
    case m1 of      
    Nothing -> Nothing      
    Just n2 ->      
        let n3 = h n2 in      
        n3      

And here’s the trick: We can abstract the glue as a (higher-order) function that takes a

Maybe

value and a continuation. Let’s call this new function

bind

and rewrite

compose

with its help:

技巧性在于,我們可以把輔助的膠水代碼抽象為一個帶有Maybe值的(高階)函數。現在把這個函數叫做bind,然後使用它重寫compose:

compose n =      
    let m = f n in      
    -- the first argument is m, the second is the whole continuation      
    bind m (/n1 ->      
                let m1 = g n1 in      
                case m1 of      
                Nothing -> Nothing      
                Just n2 ->      
                    let n3 = h n2 in      
                    n3)      

Here’s how

bind

is implemented:

下面的bind的實作:

bind m f =      
    case m of      
    Nothing -> Nothing  -- bypass the continuation      
    Just n -> f n       -- pass n to the continuation      

or, more compactly,

或者更精簡的形式,

bind Nothing cont  = Nothing      
bind (Just n) cont = cont n      

Figure 1 illustrates the complete code transformation. The result of

f n

, which is a

Maybe

, is passed to

bind

, represented by a blue box. Inside

bind

the

Maybe

is unpacked. If its value is

Nothing

, nothing happens. If its value is

Just n1

, the rest of the code, the continuation, is called with the argument

n1

. The continuation itself is rewritten using

bind

, and so on. The final continuation calls

return

, which I will explain shortly.

圖1展示了完整的代碼轉換。f n的調用結果是Maybe,然後傳給bind,bind用藍色的框表示。在bind内部,Maybe首先解包。如果是Nothing,一切早就;如果是Just n1,就會用參數n1調用餘下部分代碼,也就是延續體。最後一個延續體調用return,這個稍後解釋。

Monads for the Curious Programmer, Part 2 (中英文對照版) Challenges of Functional Programming函數式程式設計的挑戰Error Propagation and Exceptions錯誤傳播和異常Non-deterministic Computations非确定性計算Conclusions結論Bibliography文獻

Fig 1. Composition of functions that return Maybe results

圖1 傳回Maybe結果的函數組合

The position of

bind

, the blue box in Figure 1, between its

Maybe

argument and the continuation suggests that infix notation might be more appropriate. Indeed, in Haskell

bind

is represented by the infix operator,

>>=

:

Bind的位置,圖一中藍色的框,處于其參數Maybe和延續體之間,這暗示着中綴表示法可能更合适。事實上,Haskell中bind就是用中綴操作符>>=來表達的:

Nothing  >>= cont = Nothing      
(Just x) >>= cont = cont x      

(The left-hand side of the equal sign is the operator between its two arguments [actually, patterns], and the right-hand side is the body of the function.) We can express the type signature of the bind function as:

(等号左邊的是其兩個參數『亦即模式』的操作符,等号右邊的是函數體)我們可以把bind函數的類型簽名表示為:

(>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b      

It takes a

Maybe a

and a function from

a

to

Maybe b

(that’s our continuation). It returns a

Maybe b

. When dealing with higher order functions I find type signatures extremely useful; so much so that I often use them as comments in my C++ meta-code.

它帶兩個參數,Maybe a和從a到Maybe b的映射的函數(延續體),傳回Maybe b。處理高階函數的時候,我發現類型簽名極其有用,以至于我常常把它們用作C++元代碼的注釋。

The complete rewrite of

compose

using

>>=

looks like this:

使用>>=完全重寫compose後是這個樣子

compose1 n =      
    f n >>= /n1 ->      
        g n1 >>= /n2 ->      
            h n2 >>= /n3 ->      
                return n3      

Now is the time to explain the mysterious

return

at the end of the function. No, it’s not the keyword for returning a result from a function. It’s a function that takes an argument of type

a

and turns it into a

Maybe a

:

顯示可以解釋函數末尾那個神秘的return了。這不是一個從函數中傳回值的關鍵字。它是一個函數,以類型a為參數,把它轉換為Maybe a:

return n = Just n      

We need

return

because the result of

compose

is a

Maybe

. If any of the functions returns

Nothing

,

compose

returns

Nothing

(it’s part of the definition of

>>=

). Only when all functions succeed do we call

return

with the correct result,

n3

. It turns

n3

into

Just n3

, thus announcing the success of the computation and encapsulating the final result.

Return之是以必要是因為compose的結果是Maybe類型。如果其中任何一個傳回Nothing,compose就傳回Nothing(這是>>=定義的一部分)。僅當所有的函數調用成功,我們才會和n3調用return。它把n3轉化為Just n3然後宣稱計算成功并封裝了最終的結果。

You might say that using

return

instead of

Just n3

is an overkill, but there are good reasons to do that. One is to encapsulate and hide direct access to the implementation of

Maybe

. The other has to do with the ways this pattern is generalized beyond

Maybe

.

你可能會說使用return而不是直接的Just n3有點小題大做,但是這樣做是有充分理由的。其一是封裝并隐藏對Maybe實作的直接通路;其二是這是種通用的處理方式,适用于除了Maybe的其他情況。

The Maybe Monad

Maybe單子

What does

Maybe

have to do with monads? Let’s see what we have done so far from a more general perspective.

這個Maybe和單子有什麼關聯呢?我們從更一般的角度看一下我們所做的一切。

We started from a computation which didn’t behave like a function– here, it was not defined for all arguments. We had found a clever way to turn it into a function by enriching its return type. Such general “enriching” of types can be expressed as a type constructor.

我們從一個行為不像函數(并沒有對所有的參數都有定義)的計算開始。我們找到一種巧妙的辦法,通過強化它的傳回值類型來使之變成一個函數。這種類型強化方法可以使用類型構造器來表達。

From a category-theoretical point of view, defining a type constructor, which is a mapping from types to types, gets us half way towards defining an endofunctor. The other part is the mapping of functions (morphisms). For any function that takes values of type

a

and returns values of type

b

we need a corresponding function that acts between the enriched types. In Haskell, the mapping of functions to functions can be expressed as a higher-order function. When this mapping is part of an endofunctor, the corresponding higher-order function is called

fmap

.

從範疇論的角度看,定義類型構造器(比一個類型對應到另一個類型)已經讓我們走在了通往自函子的大道上。另一部分是函數(映射)的對應。對于任何的帶類型為a的參數,并且傳回類型為b的函數,我們需要一個相應的作用在增強的類型上的函數。Haskell中,函數到函數的對應可以表達為稿件函數。當這種對應是自函子的一部分的時候,對應的高階函數被稱為fmap。

Let’s call the type constructor

M

. It constructs the enriched type

M a

from any type

a

. The function

fmap

has the following signature:

我們把類型構造器稱為M。它從任意的類型a可以構造一個增強的類型M a。函數fmap有如下簽名:

fmap :: (a -> b) -> (M a -> M b)      

It maps a function from

a

to

b

to a function from

M a

to

M b

. In the case of

Maybe

we already know the type constructor part, and

fmap

will follow soon.

它把一個從a到b的函數對應到一個從M a到M b的函數。在Maybe的例子中,已經知道類型構造器部分,fmap部分下面給出。

Let’s go back to our computation turned function returning an enriched type. In general, its signature is:

我們先回到傳回增強類型的函數。它的簽名一般是:

f :: a -> M b      

Since we care about composability of computations, we need a way to glue the enriched functions together. In the process of gluing we usually have to inspect the enriched type and make decisions based on its value. One of such decisions might be whether to continue with the rest of the computation or not. So the general

bind

operation must take two arguments: the result of the previous enriched function and the rest of the computation– a continuation:

因為我們隻關心計算的可組合性,我們需要一種機制把增強的函數黏在一起。在黏結的過程中,我們一般要檢查增強的類型,并且依據其值作出判斷。其中的一種判斷是要不要繼續執行餘下的計算。是以泛化的bind操作符必須帶兩個參數:前一個增強函數的傳回值以及後面的計算部分——延續體。

bind :: M a -> (a -> M b) -> M b      

If you squint enough, you can see a similarity between

bind

and

fmap

.

多看幾眼,就會發現bind和fmap的相似之處。

Here’s how you do it: Imagine that you supply

bind

with its second argument, the continuation. There is still one free parameter of the type

M a

. So what’s left is a function that takes

M a

and returns

M b

. (This is not exactly currying, since we are binding the second argument rather than the first, but the idea is the same.) That’s exactly the type of function that occurs on the right hand side of the signature of

fmap

. Seen from that point of view,

bind

maps functions into functions:

你這一這麼做。想象你僅僅提供第二個參數,就是延續體那部分。現在還有一個空餘的類型為M a的參數。是以我們得到的是一個帶一個M a類型的參數,傳回M b類型的函數。(這不是真正意義上的Currying,我們綁定了第二個而非第一個參數,不過思想是一緻的)這恰恰是fmap簽名右邊的函數的類型。從這個觀點看,bind把函數對應到函數。

(a -> M b) -> (M a -> M b)      

The function of the left hand side is our continuation, the one on the right hand side is

bind

with the second argument plugged by the continuation.

左邊的函數就是延續體,右邊的一個是bind,其第二個參數已經被延續體占據。

This signature is really close to that of

fmap

, but not quite. Luckily, we still have a second family of functions at our disposal, the polymorphic

return

. The signature of

return

is:

這個簽名和fmap非常相近,但是還有點不一樣。幸運的是,我們還有另一簇沒有提及的函數,那就是多态的return。Return的簽名是

return :: a -> M a      

Using both

bind

and

return

we can lift any function

f

:

把bind和return聯合起來,我們可以提升任何函數f:

f :: a -> b      

to a function

g

:

為函數g:

g :: M a -> M b      

Here’s the magic formula that defines

g

in terms of

f

,

bind

, and

return

(the dot denotes regular function composition):

這就是使用f,bind和return來定義g的神奇公式(點表示一般的函數組合)

g ma = bind ma (return . f)      

Or, using infix notation:

使用中綴形式:

g ma = ma >>= (return . f)      

This mapping of

f

into

g

becomes our definition of

fmap

. Of course,

fmap

must obey some axioms, in particular it must preserve function composition and interact correctly with unit (of which shortly). These axioms translate back into corresponding conditions on

bind

and

return

, which I’m not going to discuss here.

從f到g的對應變成了fmap的定義。當然,fmap必須遵循一些基本的原則,特别它必須保證函數的可組合性以及與(其)單元的正确互動。這些公理翻譯成相應的關于bind和return的條件TODO。 這些這裡不會讨論。

Now that we have recovered the functor, we need the two other parts of a monad: unit and join. Guess what,

return

is just another name for unit. For any type, it lifts an element of that type to an element of the enriched type in the most natural manner (incidentally, the word natural has a very precise meaning in category theory; both unit and join are natural transformation of the functor (M, fmap)).

現在我們找回了函子,我們還需要單子的另外兩部分:單元和聯合。猜猜怎麼樣,return就是單元的另一個名字。對于任何類型,它以一種最自然的方式把該類型元素提升為增強類型的元素(很是巧合,自然這個詞在範疇論中有非常準确的含義,單元和聯合是函子(M, fmap)的自然轉換)。

To recover join, let’s look at its type signature in Haskell:

要找回聯合,先看看Haskell中它的類型簽名:

join :: M (M a) -> M a      

It collapses one layer of the type constructor. In the case of

Maybe

it should map a double

Maybe

to a single

Maybe

. It’s easy to figure out that it should map

Just (Just x)

into

Just x

and anything else to

Nothing

. But there is a more general way of defining

join

using

bind

combined with the identity function:

這是重疊的類型構造器。在Maybe的例子中,它把雙Maybe對應到單Maybe。很容易發現它會把just (Just x)對應到Just x,把其他所有都對應到Nothing。但是還有一種更通用的适應bind和機關元函數定義聯合的方式

join :: M (M a) -> M a      
join mmx = mmx >>= id      

We are binding a value of type

M (M a)

to a function

id

. This function, when acting on

M a

returns

M a

. You can convince yourself that these signatures match the general signature of

bind

; and that this definition, when applied to

Maybe

, produces the correct result.

我們把類型為M (M a)的類型綁定到函數id。該函數對M a傳回M a。現在可以确信這些簽名可以等價于bind的一般簽名。該定義作用于Maybe的時候,也得到了正确的結果。

To summarize: What we have done here is to show that

Maybe

together with

bind

and

return

is a monad in the category-theory sense. Actually, we have shown something far more general: Any triple that consists of a type constructor and two functions

bind

and

return

that obey certain identities (axioms) define a monad. In category theory this triple is called a Kleisli triple and can be used as an alternative definition of a monad.

總結:我們這裡展示了Maybe,再加上bind和return就是一個範疇論意義上的單子。實際上,我們展示更為一般的東西,任何由類型構造器以及兩個函數bind和return組成的滿足機關元公理的三元組都是一個單子。範疇論中這樣的三元組叫做Kleisli三元組,當作單子的另一種定義。

Syntactic Sugar

文法糖衣

So far you’ve seen only the

Maybe

example, so it might seem like we are pulling out a monadic cannon to kill a mosquito. But, as you’ll see later, this same pattern pops up in a lot of places in various guises. In fact it’s so common in functional languages that it acquired its own syntactic sugar. In Haskell, this sugar is called the do notation. Let’s go back to the implementation of our function

compose

:

目前我們隻看了Maybe的例子,這有點像用單子大炮隻為打死一隻蚊子。不過,稍後依舊會看到同樣的模式會在許多地方以不同的方式出現。事實上這種情況在函數式語言中如此普遍,需要專門的簡化文法。Haskell中,這樣的文法糖衣叫做do記号。回過頭去看compose的實作:

compose n =      
    f n >>= /n1 ->      
        g n1 >>= /n2 ->      
            h n2 >>= /n3 ->      
                return n3      

Here’s exactly the same code in do notation:

這是使用do記号的相同的代碼:

compose n = do      
    n1 <- f n      
    n2 <- g n1      
    n3 <- h n2      
    return n3      

This looks deceptively similar to an imperative program. In fact here’s a C++ version of it (for the sake of the argument let’s assume that

f

takes a pointer to

Foo

and

h

returns an integer) :

非常有欺騙性,這個看起來很像指令式語言。實際上C++的版本就是(假設f帶一個指向Foo的指針而h傳回一個整數):

int compose(Foo * n)      
{      
    auto n1 = f(n);      
    auto n2 = g(n1);      
    auto n3 = h(n2);      
    return n3;      
}      

Uh, one more thing. This is how you would call it:

哦,還有一點。你應該這樣調用:

try {      
    compose(pFoo);      
}      

In C++ you get virtually the same functionality not by modifying the return type, but by throwing an exception. (Now you may utter your first cry, “Hey, that looks like a monad!”, when looking at C++ code.)

C++中你實際上并不是通過修改傳回值類型而是抛出異常而得到相同的結果(現在你看到C++代碼的時候,就可以驚歎說“嗨,這不就是單子嘛!”)。

Just like in our Haskell example, once any of the functions reports an error (throws an exception), the rest of the body of 

compose

 is skipped. You might say that C++ exceptions offer more power than the 

Maybe

 monad and you’ll be right. But that’s because I haven’t shown you the 

Error

 monad and the 

Exception

 monad.

和Haskell的例子一樣。一旦其中一個函數報告一個錯誤(抛出異常),compose的餘下部分就會被跳過。你可能要說C++異常要比Maybe單子更強大,這一點是你的對的。僅僅因為我還沒有給你展示錯誤單子和異常單子。

Where Haskell’s 

Exception

 monad beats C++ exceptions is in type checking. Remember the unfortunate attempt at adding exceptions specification to the C++ type system? (Java went the same way.) Here’s what the C++ standard says:

Haskell異常單子勝過C++異常在于其類型檢查。還記得曾經往C++中增加異正常範的不幸的嘗試嗎(Java重蹈了C++的覆轍)?看看C++标準是怎麼說的:

An implementation shall not reject an expression merely because when executed it throws or might throw an exception that the containing function does not allow.

實作不應該僅僅因為一個表達式在執行的時抛出或者可能抛出一個其容器函數不允許的異常而拒絕它。

In other words, exceptions specifications are bona fide comments. Not so in Haskell! If a function returns a 

Maybe

 or 

Exception

, that becomes part of its type signature which is checked both at the call site and at the function definition site. No cheating is allowed, period.

換句話說,異正常範不過是一個誠實的注釋而已。Haskell卻不是這樣!當一個函數傳回Maybe或異常的時候,這就是其類型簽名的一部分,在調用和定義的時候都會被檢查。沒有協商的餘地,句号。

But the major difference between Haskell’s and C++’s approach is that the 

do

 notation generalizes to all monads, whereas C++ neat 

try/catch

 syntax applies only to exceptions.

不過Haskell和C++的方法的最大的差別是do記号一般化了所有的單子,而C++齊整的try/catch文法僅僅适用于異常。

A word of caution when reading Haskell monadic code. Despite similarities in usage, Haskell’s left arrow is not an assignment. The left hand side identifier corresponds to the argument of the continuation (which is the rest of the 

do

 block). It is the result of unpacking the outcome of the right hand side expression. This unpacking always happens inside 

bind

.

閱讀Haskell單子型的代碼的要注意一點。除去用法上的相似,Haskell的左箭頭并不是指派。左邊的辨別符對應于延續體的參數(也就是do代碼塊餘下的部分);它是右邊的表達式執行後的解包結果。解包過程總是在隐藏的bind中實作。

Non-deterministic Computations

非确定性計算

In the previous post I introduced the list monad by defining a functor and two families of functions. The type constructor part of the functor mapped any type 

a

 into the list of 

a

[a]

. The part that acted on functions (now we know that, in general, it’s called 

fmap

; but for lists it’s just 

map

) worked by applying the function to each element of the list. The 

unit

 and the 

join

 were polymorphic functions. 

unit

 was defined as:

上一篇文章中,通過定義一個函子以及兩簇函數,我引出了清單單子。函子的類型構造器部分把任意類型a對應到a的清單[a]。作用在函數上的那部分(我們現在知道,一般來說叫做fmap,但是對于清單,就叫map)通過在清單中的每一個元素上調用起作用。機關和聯合是多态函數。機關定義如下:

 unit x = [x]      

and 

join

 as 

concat

.

而聯合就是concat。

Now I can tell you that the list monad provides the solution to the problem of implementing non-deterministic computations. Imagine parsing a non-deterministic grammar. A production might offer multiple alternative parse trees for the same input.

現在我和你說清單單子提供了處理非确定性計算問題的辦法。想想非确定性文法解析。對于同樣的輸入,一個過程可能産生多個可互換的解析樹。

These types of computations may be simulated with functions returning lists of possible results. Here’s the same trick again: whatever type is returned by a deterministic parser (e.g., a parse-tree type), it’s now turned into an enriched type (a parse tree list).

這種類型的計算可以通過傳回一個可能結果的清單來模拟。還是那個技巧,确定性的解析器傳回的任何結果類型(如解析樹類型),都被轉化為強化的類型(解析樹清單)。

We’ve already seen the category-theory construction of the list monad but here we are facing a slightly different problem: how to string together functions that return lists. We know that we have to define 

bind

. It’s signature, in this case, would be:

我們已經看到清單的單子的範疇論構造,不過稍有不同:如何把傳回清單的函數鍊在一起。我們知道必須定義bind。這種情況下,其簽名應該是:

bind :: [a] -> (a -> [b]) -> [b]      

It takes a list (which is the result of a previous function) and a continuation, 

(a -> [b])

, and must produce another list. The obvious thing is to apply the continuation to each element of the input list. But since the continuation also produces a list, we’ll end up with a list of lists. To get a single list as the result, we’ll simply concatenate the sublists. Here’s the final implementation:

它帶一個清單(這是上一個函數的結果)和一個延續體參數(a -> [b]),産生另一個清單。很明顯,要在輸入清單的每一個函數上應用延續體。不過由于延續體的輸出是清單,我們最後得到的是清單的清單。要得到一個簡單清單,我們需要把子清單連接配接在一起。這是最後的實作:

xs >>= cont = concat (map cont xs)      

The application of the continuation to each element of the input list is done using 

map

.

在輸入清單上應用延續體的操作是通過map來完成的。

Is this a coincidence that we were able to define bind in therms of join and fmap? Not at all. The general formula for converting the functor definition of a monad to a Kleisli triple is:

使用聯合和fmap來定義并bind是一個巧合嗎?完全不是。把單子的函子定義轉換為Kleisli三元組的一般公式是:

  • Take the object-mapping part of the functor (the type constructor)
  • 取函子的映射對象的部分(類型構造器)
  • Define bind as
  • 把bind定義為
bind x f = join ((fmap f) x))      

where 

fmap

 is the part of the functor that maps morphisms.

這兒fmap是函子中把映射對應起來的那部分

  • Define return as unit
  • 把return定義為單元

Now we know how to move between those two definitions back and forth.

現在我們可以在這兩種定義之間來回移動了。

Just as in the case of 

Maybe

, we can apply the do notation to functions returning lists:

就和Maybe的情況一樣,對于傳回清單的函數也可以使用do記号:

toss2Dice = do      
    n <- tossDie      
    m <- tossDie      
    return n + m      

Here, 

tossDie

 returns the list of all possible outcomes of a die toss, and 

toss2Dice

 returns the list of all possible sums of the outcomes of a two-die toss.

這裡tossDie傳回仍一個色子的所有可能輸出的清單,而toss2Dice傳回仍兩個色子的結果的所有可能和的清單。

An interesting observation is that the list monad is closely related to list comprehensions, down to the use of left arrows (in Haskell). For instance, the above example is equivalent to:

一個有趣的特點是清單單子和清單推斷很相關。例如,上面的例子等價于:

toss2Dice = [n + m | n <- tossDie, m <- tossDie]      

Conclusions

結論

There is a large class of computations that convert input to output in a non-functional way. Many of those can be expressed as functions that return “enriched” output. This enrichment of the output can be expressed in terms of a type constructor. This type constructor defines the first component of a monad.

存在大量的以非函數的方式對輸入進行處理的計算。其中很多都可以表達為傳回一個“增強”的傳回類型的函數。這種輸出類型的增強可以使用類型構造器來比表達,它定義了單子的第一個元件。

Computations have to be composable, so we need a way of composing the enriched functions. This introduces the second component of the monad, the 

bind

 family of higher-order functions.

計算必須可組合,是以我們需要要中這增強的函數方法。這引出了單子的第二個元件,高階函數的bind函數簇。

Finally, we should be able to construct functions that return enriched types, and for that we need the third component of the monad, the 

return

 family of functions. They convert regular values into their enriched counterparts.

最後,我們應當構造傳回增強類型的函數,為此,我們需要單子的第三個元件,函數的return函數簇。它們把普通的類型轉換為增強的對應類型。

I’ve shown you two examples of Haskell monads in action, but the real treat will come in the third installment of my mini-series. I’ll show you how to define and compose functions that, instead of returning regular values, return functions that calculate those values.

我想你展示了兩個Haskell單子的實際例子,不過真正的重點是在我這裡迷你系列的第三部分。我将向你展示如何定群組合函數,不是傳回正常類型,而是傳回計算這些類型值的函數。

Bibliography

文獻

  1. Eugenio Moggi, Notions of Computation and Monads. This is a hard core research paper that started the whole monad movement in functional languages. 這是函數式語言中開啟單子運動的核心論文。
  2. Philip Wadler, Monads for Functional Programming. The classic paper introducing monads into Haskell.該經典論文在Haskell中引入了單子
  3. Tom Adams, What Does Monad Mean?
  4. Brian Beckman, Don’t fear the Monad
  5. Graham Hutton, Erik Meijer, Monadic Parsing in Haskell.
  6. Brian McNamara, Yannis Smaragdakis, Syntax sugar for FC++: lambda, infix, monads, and more. A C++ template library for functional programming that imitates Haskell pretty closely.極好地模拟了Haskell的用于函數式程式設計的C++模闆庫

繼續閱讀