An In-Depth Comparison of String Concatenation Methods in Go

Preface

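The main ways to concatenate strings in Go are `"+"`, `fmt.Sprintf` with `%s`, and `strings.Join`. Many people have already compared these approaches by measuring elapsed time. This article instead analyzes how each one is implemented at the source level, and then compares their performance on that basis.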

Ways to Concatenate Strings

`"+"`

`"+"` is the most direct string concatenation operator that Go supports.

var str = "a" + "b" + "c"

// concat joins every element of list using repeated "+=".
func concat(list []string) string {
    r := ""
    for _, v := range list {
        r += v
    }
    return r
}

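For `"+"`, the related function declarations can be found in runtime.go. The details of how the operator is lowered are in cmd/compile/internal/gc/walk.go: the corresponding operation is `OADDSTR`, and the function that processes it is `addstr`. When five or fewer strings are concatenated, the compiler calls the function matching that count, `concatstring%n` (that is, `concatstring2` through `concatstring5`). These all live in /runtime/string.go and in turn call `concatstrings`. When more than five strings are concatenated, `concatstrings` is called directly. Interested readers can follow the call chain in detail. Here we focus on `concatstrings`, which performs the actual concatenation.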

// The constant is known to the compiler.
// There is no fundamental theory behind this number.
const tmpStringBufSize = 32

type tmpBuf [tmpStringBufSize]byte
// concatstrings implements a Go string concatenation x+y+z+...
// The operands are passed in the slice a.
// If buf != nil, the compiler has determined that the result does not
// escape the calling function, so the string data can be stored in buf
// if small enough.
func concatstrings(buf *tmpBuf, a []string) string {
    idx := 0
    l := 0
    count := 0
    for i, x := range a {
        n := len(x)
        if n == 0 {
            continue
        }
        if l+n < l {
            throw("string concatenation too long")
        }
        l += n
        count++
        idx = i
    }
    if count == 0 {
        return ""
    }

    // If there is just one string and either it is not on the stack
    // or our result does not escape the calling frame (buf != nil),
    // then we can return that string directly.
    if count == 1 && (buf != nil || !stringDataOnStack(a[idx])) {
        return a[idx]
    }
    s, b := rawstringtmp(buf, l)
    for _, x := range a {
        copy(b, x)
        b = b[len(x):]
    }
    return s
}
func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) {
    if buf != nil && l <= len(buf) {
        b = buf[:l]
        s = slicebytetostringtmp(b)
    } else {
        s, b = rawstring(l)
    }
    return
}
func slicebytetostringtmp(b []byte) string {
    ...
    return *(*string)(unsafe.Pointer(&b))
}
func rawstring(size int) (s string, b []byte) {
    p := mallocgc(uintptr(size), nil, false)

    stringStructOf(&s).str = p
    stringStructOf(&s).len = size

    *(*slice)(unsafe.Pointer(&b)) = slice{p, size, size}

    return
}

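As the function's comment says, `concatstrings` is the function that implements `"+"`. Its parameter `a []string` is the slice into which the strings joined by `"+"` are packed before the call.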

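The process is as follows:

Compute the total length l of all strings, count the non-empty strings, and record the index of the last non-empty one; throw if the total length overflows.
If there are no non-empty strings, return the empty string "".
If there is exactly one non-empty string, and either the result does not escape the calling function (buf != nil) or that string's data is not on the current goroutine's stack, return that string directly using its recorded index.
Otherwise, create the string s together with the byte slice b that shares its memory, so that writing to b changes the contents of s.

If buf != nil and the total length is at most 32 bytes, take b = buf[:l], which can hold all the data, and let s point at the same bytes as b;
otherwise, allocate memory for the total length, create the string there, and point b at the same address.

Copy the strings into b one by one, then return s.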
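The steps above can be sketched in ordinary Go. This is only an illustration under stated assumptions: `concatSketch` is a hypothetical name, the overflow check and the stack-buffer path are left out, and the final `string(b)` conversion makes an extra copy that the real runtime function avoids by writing directly into the string's memory.

```go
package main

import "fmt"

// concatSketch mirrors the steps of concatstrings in user-level Go.
// Hypothetical helper for illustration; it is not the runtime's code.
func concatSketch(a []string) string {
	total, count, idx := 0, 0, 0
	for i, x := range a {
		if len(x) == 0 {
			continue // empty operands contribute nothing
		}
		total += len(x)
		count++
		idx = i // index of the last non-empty operand
	}
	if count == 0 {
		return ""
	}
	if count == 1 {
		return a[idx] // a single non-empty operand is returned as is
	}
	b := make([]byte, 0, total) // one allocation sized for the whole result
	for _, x := range a {
		b = append(b, x...) // capacity is sufficient, so append never reallocates
	}
	return string(b)
}

func main() {
	fmt.Println(concatSketch([]string{"a", "", "bc", "def"})) // abcdef
	fmt.Println(concatSketch([]string{"", "only", ""}))       // only
}
```

The point of the sketch is the order of work: measure first, allocate once, then copy.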

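Points to note:

When a single expression contains several `"+"` operators, the operands are packed into one slice and `concatstrings` is called once, rather than once per `"+"`.

Constant strings are merged at compile time: for example, `str := x + "a" + "b" + "c"` is treated as `str := x + "abc"`.

`buf` is non-nil only when the result does not escape the calling function, and it is only 32 bytes long, so it can hold only short strings.

`concatstrings` allocates memory at most once per call.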
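One way to observe the difference between a single expression and one `+=` per operand is to count allocations with `testing.AllocsPerRun`. The sketch below is an assumption-laden illustration: `joinOnce`, `joinLoop`, and `sink` are hypothetical names, the operands are built at run time so the compiler cannot fold them, and the exact counts printed may vary with the Go version.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the results escape and the 32-byte
// stack buffer cannot be used.
var sink string

// joinOnce concatenates four operands in a single expression.
func joinOnce(p [4]string) string {
	return p[0] + p[1] + p[2] + p[3]
}

// joinLoop concatenates the same operands with one "+=" per operand.
func joinLoop(p [4]string) string {
	r := ""
	for _, v := range p {
		r += v
	}
	return r
}

func main() {
	var parts [4]string
	for i := range parts {
		// Built at run time and longer than 32 bytes each.
		parts[i] = strings.Repeat("x", 40+i)
	}
	once := testing.AllocsPerRun(100, func() { sink = joinOnce(parts) })
	loop := testing.AllocsPerRun(100, func() { sink = joinLoop(parts) })
	fmt.Printf("single expression: %.0f allocs, loop: %.0f allocs\n", once, loop)
}
```

Both functions return the same string; only the number of `concatstrings` calls, and therefore allocations, differs.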

`fmt.Sprintf`

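`fmt.Sprintf` is the function in the fmt package that converts data to a string according to a format specifier. For concatenation, the verb used is `%s`, which joins string arguments.

The relevant source is shown below. This article only looks at the `%s` path; unrelated parts of the source are omitted.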

// Sprintf formats according to a format specifier and returns the resulting string.
func Sprintf(format string, a ...interface{}) string {
    p := newPrinter()
    p.doPrintf(format, a)
    s := string(p.buf)
    p.free()
    return s
}

func (p *pp) doPrintf(format string, a []interface{}) {
    end := len(format)
    argNum := 0         // we process one argument per non-trivial format
    afterIndex := false // previous item in format was an index like [3].
    p.reordered = false
formatLoop:
    for i := 0; i < end; {
        p.goodArgNum = true
        lasti := i
        for i < end && format[i] != '%' {
            i++
        }
        if i > lasti {
            p.buf.writeString(format[lasti:i]) // write the text that precedes '%'
        }
        if i >= end { // reached the end of the format string
            // done processing format string
            break
        }

        // Process one verb
        i++

        // Do we have flags?
        p.fmt.clearflags()
    simpleFormat:
        for ; i < end; i++ {
            c := format[i]
            switch c {
            ...
            default:
                // Fast path for common case of ascii lower case simple verbs
                // without precision or width or argument indices.
                if 'a' <= c && c <= 'z' && argNum < len(a) {
                    if c == 'v' {
                        // Go syntax
                        p.fmt.sharpV = p.fmt.sharp
                        p.fmt.sharp = false
                        // Struct-field syntax
                        p.fmt.plusV = p.fmt.plus
                        p.fmt.plus = false
                    }
                    p.printArg(a[argNum], rune(c))
                    argNum++
                    i++
                    continue formatLoop
                }
                // Format is more complex than simple flags and a verb or is malformed.
                break simpleFormat
            }
        }
    ...
}

func (p *pp) printArg(arg interface{}, verb rune) {
    ...
        case string:
        p.fmtString(f, verb)
    ...
}

func (p *pp) fmtString(v string, verb rune) {
    switch verb {
    ...
    case 's':
        p.fmt.fmtS(v)
    ...
    }
}

func (f *fmt) fmtS(s string) {
    s = f.truncateString(s) // truncates s to the precision if one was given; a bare %s has none, so s is unchanged
    f.padString(s)
}

// padString appends s to f.buf, padded on left (!f.minus) or right (f.minus).
func (f *fmt) padString(s string) {
    if !f.widPresent || f.wid == 0 { // a bare %s sets no width, so this branch is taken
        f.buf.writeString(s)
        return
    }
    width := f.wid - utf8.RuneCountInString(s) // only reached when a width was given, e.g. %5s
    if !f.minus { // f.minus is set by the '-' flag, which requests left-justification
        // left padding
        f.writePadding(width)
        f.buf.writeString(s)
    } else {
        // right padding
        f.buf.writeString(s) // write the string first
        f.writePadding(width) // then pad on the right; nothing is written if width <= 0
    }
}

func (b *buffer) writeString(s string) {
    *b = append(*b, s...)
}

// writePadding generates n bytes of padding.
func (f *fmt) writePadding(n int) {
    if n <= 0 { // No padding bytes needed.
        return
    }
    ...
}

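When only strings are being concatenated, the process is:

Scan for each '%' in turn, appending the text before it to buf.
Use the verb that follows to decide how to handle the argument. Concatenation uses %s, and each %s consumes one string argument.
Append that string to buf (which may have to grow repeatedly).
Convert buf to a string.

Note: `fmt.Sprintf` does not compute the total length up front. It handles each `%s` separately, and each one ends in a call to `append`. An `append` may need to grow the buffer, so with many strings the buffer can be reallocated several times.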
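The cost described in the note can be made visible with a small experiment. The sketch below does not use fmt's internals; `appendAll` is a hypothetical helper that imitates the same append-per-piece pattern and counts how often the backing array has to be replaced, with and without knowing the total length in advance.

```go
package main

import "fmt"

// appendAll appends every part to buf and reports how many times the
// backing array was replaced (detected as a change in capacity).
func appendAll(buf []byte, parts []string) (string, int) {
	grows := 0
	for _, p := range parts {
		before := cap(buf)
		buf = append(buf, p...)
		if cap(buf) != before {
			grows++
		}
	}
	return string(buf), grows
}

func main() {
	parts := make([]string, 64)
	total := 0
	for i := range parts {
		parts[i] = "0123456789"
		total += len(parts[i])
	}
	// No length known up front: append grows the buffer as it goes.
	_, unsized := appendAll(nil, parts)
	// Length computed first: the buffer is allocated once.
	_, presized := appendAll(make([]byte, 0, total), parts)
	fmt.Println("growths without pre-sizing:", unsized)
	fmt.Println("growths with pre-sizing:", presized)
}
```

With a pre-sized buffer the growth count is zero; without one it depends on the runtime's growth policy, so the exact number is not stated here.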

`strings.Join`

`strings.Join` is the strings package function for concatenating a slice of strings. It also lets you specify a separator to place between the elements.

// Join concatenates the elements of a to create a single string. The separator string
// sep is placed between elements in the resulting string.
func Join(a []string, sep string) string {
    switch len(a) {
    case 0:
        return ""
    case 1:
        return a[0]
    }
    n := len(sep) * (len(a) - 1)
    for i := 0; i < len(a); i++ {
        n += len(a[i])
    }

    var b Builder
    b.Grow(n)
    b.WriteString(a[0])
    for _, s := range a[1:] {
        b.WriteString(sep)
        b.WriteString(s)
    }
    return b.String()
}
// A Builder is used to efficiently build a string using Write methods.
// It minimizes memory copying. The zero value is ready to use.
// Do not copy a non-zero Builder.
type Builder struct {
    addr *Builder // of receiver, to detect copies by value
    buf  []byte
}
// Grow grows b's capacity, if necessary, to guarantee space for
// another n bytes. After Grow(n), at least n bytes can be written to b
// without another allocation. If n is negative, Grow panics.
func (b *Builder) Grow(n int) {
    b.copyCheck()
    if n < 0 {
        panic("strings.Builder.Grow: negative count")
    }
    if cap(b.buf)-len(b.buf) < n {
        b.grow(n)
    }
}
// grow copies the buffer to a new, larger buffer so that there are at least n
// bytes of capacity beyond len(b.buf).
func (b *Builder) grow(n int) {
    buf := make([]byte, len(b.buf), 2*cap(b.buf)+n)
    copy(buf, b.buf)
    b.buf = buf
}
// WriteString appends the contents of s to b's buffer.
// It returns the length of s and a nil error.
func (b *Builder) WriteString(s string) (int, error) {
    b.copyCheck()
    b.buf = append(b.buf, s...)
    return len(s), nil
}

// String returns the accumulated string.
func (b *Builder) String() string {
    return *(*string)(unsafe.Pointer(&b.buf))
}

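How Join works:

Check the number of strings: return the empty string for 0, and the first string for 1.
Compute the total length of the separators, then the total length of the joined result.
If the capacity of buf cannot hold all the strings, grow it (create a new slice with capacity 2*cap(b.buf)+n and copy the old data into it). After that buf can hold all the data, so the later appends never need to grow it.
Append the elements and separators to buf in order.
Convert buf to a string through a pointer cast.

The buffer is allocated only once, in Grow.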
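The same steps can be followed with the public `strings.Builder` API, which also lets us check that the capacity does not change after `Grow`. This is a sketch, not the library's code: `joinTracked` is a hypothetical helper, and it relies on `Builder.Cap`, which is available in Go 1.12 and later.

```go
package main

import (
	"fmt"
	"strings"
)

// joinTracked joins elems like strings.Join and also reports the builder's
// capacity right after Grow and again after the last write.
func joinTracked(elems []string, sep string) (s string, capAfterGrow, capAtEnd int) {
	if len(elems) == 0 {
		return "", 0, 0
	}
	// Total length: all separators plus all elements.
	n := len(sep) * (len(elems) - 1)
	for _, e := range elems {
		n += len(e)
	}
	var b strings.Builder
	b.Grow(n) // the only allocation
	capAfterGrow = b.Cap()
	b.WriteString(elems[0])
	for _, e := range elems[1:] {
		b.WriteString(sep)
		b.WriteString(e)
	}
	return b.String(), capAfterGrow, b.Cap()
}

func main() {
	s, before, after := joinTracked([]string{"a", "b", "c"}, ", ")
	fmt.Println(s)               // a, b, c
	fmt.Println(before == after) // true
}
```

Because `Grow(n)` guarantees room for n more bytes and exactly n bytes are written, none of the `WriteString` calls reallocates.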

Comparison

The pros and cons of the three approaches are compared below:

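`"+"` concatenation

Pros:

Simple to use.
Has a performance edge for short strings: when the result does not escape and the total length is at most 32 bytes, a 32-byte buf is set aside in advance and the data can be stored in it.
Several "+" operators in one expression are still handled in a single pass (the operands are packed into a slice and passed to `concatstrings`).

Cons:

With a lot of data, long chains of "+" can make the code less tidy.
When the strings can only be joined across several expressions, each expression calls `concatstrings` again, which recomputes the length and allocates new memory. With a lot of data, performance degrades.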

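`fmt.Sprintf` concatenation

Pros:

Widely applicable: it can convert other types to strings.
More readable when the data carries meaning, especially when it has a descriptive prefix.

Cons:

The processing is relatively complex: it has to switch on many types and may even fall back to reflection, which costs efficiency.
The total length is not computed in advance. Every piece is added with `append`, and whenever `append` has to grow the buffer it reallocates memory and copies the data, so performance suffers noticeably with a lot of data.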

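`strings.Join` concatenation

Pros:

The total length is computed once, so memory is allocated once and never reallocated.
Very convenient when every element is joined with the same separator.

Cons:

Scattered values must first be assembled into a slice.
It cannot directly handle different separators between elements.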

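Overall Comparison

From the implementations, we can conclude the following:

If all the strings can be joined in a single `"+"` expression, `"+"` performs about as well as `strings.Join`; otherwise it is slower than `strings.Join`. `fmt.Sprintf` has to go through reflection and repeated `append` calls, so it is likely to be the slowest of the three.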

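The reason is that, when joining many strings or long strings, `strings.Join` allocates memory only once, `"+"` allocates once or several times depending on how it is used, and `fmt.Sprintf` calls `append` once for each `%s` and may therefore allocate several times. Every reallocation also copies the existing data, which hurts performance.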

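Of course, when the amount of data is small or the strings are short, and especially when the values are scattered (`strings.Join` needs them assembled into a slice first), the three differ little in efficiency, and you can choose whichever suits your needs.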

Overall, in terms of performance: `strings.Join` ≈ a single `"+"` expression, and both are well ahead of repeated `"+"` expressions and `fmt.Sprintf`.
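The ranking is derived from reading the source, so it is worth checking on your own machine. The sketch below uses `testing.Benchmark` from a plain program to time `strings.Join`, `+=` in a loop, and `fmt.Sprintf` on the same input. The helper names (`viaJoin`, `viaPlusLoop`, `viaSprintf`, `sprintfArgs`) and the input size are assumptions chosen for illustration, a single `"+"` expression is not included because it cannot be written for a slice of arbitrary length, and no results are quoted because they depend on the Go version and hardware.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps the results alive so the compiler cannot discard the work.
var sink string

func viaJoin(parts []string) string { return strings.Join(parts, "") }

func viaPlusLoop(parts []string) string {
	r := ""
	for _, p := range parts {
		r += p
	}
	return r
}

// sprintfArgs prepares a "%s%s..." format and matching arguments once,
// so the benchmark measures only the Sprintf call itself.
func sprintfArgs(parts []string) (string, []interface{}) {
	args := make([]interface{}, len(parts))
	for i, p := range parts {
		args[i] = p
	}
	return strings.Repeat("%s", len(parts)), args
}

func viaSprintf(format string, args []interface{}) string {
	return fmt.Sprintf(format, args...)
}

func main() {
	parts := make([]string, 100)
	for i := range parts {
		parts[i] = "0123456789"
	}
	format, args := sprintfArgs(parts)
	cases := []struct {
		name string
		fn   func() string
	}{
		{"strings.Join", func() string { return viaJoin(parts) }},
		{"+= in a loop", func() string { return viaPlusLoop(parts) }},
		{"fmt.Sprintf", func() string { return viaSprintf(format, args) }},
	}
	for _, c := range cases {
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sink = c.fn()
			}
		})
		fmt.Printf("%-14s %8d ns/op %6d allocs/op\n", c.name, res.NsPerOp(), res.AllocsPerOp())
	}
}
```

Varying the number and length of the parts shows how the gap changes between small and large inputs.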

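Summary

This article analyzed three common ways to concatenate strings from the angle of their implementation, looking at the pros and cons of each, to help choose a suitable approach for a given situation.

As recommendations:

For a small number of scattered values, use `"+"`.
For a small amount of data with explanatory prefixes or suffixes between the values, use `fmt.Sprintf`.
For a lot of data, or data already in a slice, use `strings.Join`.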
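The three recommendations might look like this in practice. The function names and sample values are hypothetical and chosen only to show one typical use of each approach.

```go
package main

import (
	"fmt"
	"strings"
)

// fullName joins a few scattered values with "+".
func fullName(first, last string) string {
	return first + " " + last
}

// describe mixes types and descriptive labels, which suits fmt.Sprintf.
func describe(name string, age int) string {
	return fmt.Sprintf("name=%s age=%d", name, age)
}

// csvLine joins data that is already in a slice.
func csvLine(fields []string) string {
	return strings.Join(fields, ",")
}

func main() {
	fmt.Println(fullName("Ada", "Lovelace"))      // Ada Lovelace
	fmt.Println(describe("Ada", 36))              // name=Ada age=36
	fmt.Println(csvLine([]string{"a", "b", "c"})) // a,b,c
}
```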

