
Golang's practice on the Jike backend


Background

As the business evolved, a large amount of obsolete code accumulated in the Jike backend services and maintenance costs grew, so refactoring or even rewriting was put on the agenda. Compared to Node.js, Golang has certain advantages. Since the Jike backend had already been split into services, and other business lines had some practice with Go, rewriting part of the Jike services directly in Go was a feasible option. In the process, we could verify how the two languages differ on the same business and improve our Go-related supporting infrastructure.

Transformation results

So far, some non-core services have been rewritten in Go and launched. Compared with the original services, the new versions have significantly lower overhead:

50% reduction in interface response time


Legacy service response time


New service response time

95% reduction in memory footprint


Trend of memory consumption before and after service replacement

90% reduction in CPU usage


Trend of CPU consumption before and after service replacement

Note: The performance data above comes from the user-filtering service, a service with a single responsibility and far more reads than writes. Since the original implementation was also optimized during the rewrite, the data is for reference only and does not fully represent the real performance gap between Go and Node.

Migration plan

Step 1: Rewrite the service

To keep the external interfaces unchanged, the core logic of the whole business had to be rewritten. In the process, a few problems surfaced:

  1. In the past, most Node services did not explicitly declare interface input and output types, so all the relevant fields had to be tracked down during the rewrite.
  2. Most of the old code had no unit tests; after rewriting, we had to understand the business requirements and design unit tests ourselves.
  3. The old code used the any type heavily, and figuring out every possible type took a lot of work. Many types can stay loose in Node, but Go tolerates no deviation.

In short, rewriting is not translation; it requires a deep understanding of the business and a fresh set of code.

Step 2: Verify the correctness

Since many services lack full regression tests, unit tests alone are not enough to guarantee correctness.

Generally speaking, a read-only interface can be verified by data matching: give the old and new services the same input and compare their outputs. For small datasets you can test by launching both services locally. But once the data is large enough, complete local testing is impossible; one way out is traffic replication testing.

[Diagram: traffic replication testing flow]

Because cross-environment calls between services are cumbersome and hurt performance, a message queue is used to replicate requests for asynchronous matching.

  1. On each response, the original service packages the input and output into a message and sends it to the message queue.
  2. A consumer service in the test environment receives the message and resends the input to the new version of the service.
  3. After the new service responds, the consumer service compares the two response bodies and logs any differences.
  4. Finally, download the logs and fix the code case by case against the test data.

Step 3: Gray-release and gradually replace the old service

Once you are confident in the correctness of the business logic, you can gradually roll out the new version. Thanks to service splitting, we can replace a service without upstream or downstream services noticing; we only need to gradually swap in new containers for the corresponding services.

Warehouse structure

The project structure is a monorepo based on the Standard Go Project Layout:

.
├── build: build-related files; can be a symbolic link to an external location
├── tools: custom project tools
├── pkg: shared code
│   ├── util
│   └── ...
├── app: microservices directory
│   ├── hello: example service
│   │   ├── cmd
│   │   │   ├── api
│   │   │   │   └── main.go
│   │   │   ├── cronjob
│   │   │   │   └── main.go
│   │   │   └── consumer
│   │   │       └── main.go
│   │   ├── internal: all business code goes inside internal to prevent it from being imported by other services
│   │   │   ├── config
│   │   │   ├── controller
│   │   │   ├── service
│   │   │   └── dao
│   │   └── Dockerfile
│   ├── user: example of a large business split into several sub-services
│   │   ├── internal: code shared between the sub-services
│   │   ├── account: account service
│   │   │   ├── main.go
│   │   │   └── Dockerfile
│   │   └── profile: user profile service
│   │       ├── main.go
│   │       └── Dockerfile
│   └── ...
├── .drone.yml
├── .golangci.yaml
├── go.mod
└── go.sum
  • The app directory contains all service code and can be organized into arbitrary hierarchies.
  • Code shared by all services lives in pkg at the repository root.
  • All external dependencies are declared in go.mod at the repository root.
  • Each service (or group of services) keeps its code under an internal directory so that it cannot be imported by other services.

Benefits of this model:

  • During development you only need to care about a single code repository, which improves efficiency.
  • All services' code lives together, from the full service set of a major feature down to a one-off campaign service, and can be kept clearly organized under the app directory with a reasonable hierarchy.
  • When shared code is modified, all services that depend on it are guaranteed to stay compatible. Even for incompatible changes, the refactoring tools of an IDE make the replacement easy.

Continuous integration and build

Static checks

The project uses golangci-lint for static checks. On every code push, a GitHub Action runs golangci-lint automatically; it is fast and convenient, and any warnings are commented directly on the PR.

golangci-lint ships with no lint rules of its own, but it integrates a wide range of linters, enabling very detailed static checks that nip potential errors in the bud.

Test + build the image

For faster builds, we first tried building images on GitHub Actions, whose matrix feature supports monorepos well. However, image builds are relatively time-consuming, and building on GitHub Actions consumes a lot of quota; once it runs out, normal development work is affected.

In the end we chose to self-host Drone, where complex build strategies can be customized through the Drone Configuration Extension.

Ideally, a CI build strategy is smart enough to automatically work out which code needs building and which needs testing. Early on I tried exactly that: scripts that analyze the dependency topology of the whole project, combine it with the file changes, find all affected packages, and then test and build them. It looks great, but in reality, once shared code changes, almost every service is rebuilt, which is a nightmare. That approach is better suited to unit testing than to packaging.

So I settled on a more radical strategy that uses the Dockerfile as a build flag: a directory containing a Dockerfile is "buildable", and if any file under that directory is added or modified, its Dockerfile is "to be built". Drone starts a pipeline for each Dockerfile to be built.

A few points are worth noting:

  • Since a build must copy not only the current service's code but also the shared code, the build context must be the repository root, with the service directory passed in as a build argument:

    docker build --file app/hello/Dockerfile --build-arg TARGET="./app/hello" .

  • The image name defaults to the concatenation of directory names from the inside out; for example, ./app/live/chat/Dockerfile produces an image of the form {registry}/chat-live-app:{branch}-{commitId} after building.
  • All build steps (including dependency downloads and compilation) are defined in the Dockerfiles, which avoids baking too much logic into the main CI pipeline and losing flexibility. Docker's native layer caching also keeps builds blazing fast.
  • One remaining issue: when shared code outside a service directory changes, Drone cannot detect it and build the affected services. The workaround is a specific field in the git commit message that tells Drone to run the corresponding builds.
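Each buildable directory's Dockerfile then takes the repository root as its context and the service path via a build argument. A minimal multi-stage sketch (the base images, output path, and cmd/api entrypoint are illustrative assumptions, not the production file):

```dockerfile
# Build from the repository root:
#   docker build --file app/hello/Dockerfile --build-arg TARGET="./app/hello" .
FROM golang:1.16 AS builder
ARG TARGET
WORKDIR /src
# Cache module downloads across builds via Docker layer caching
COPY go.mod go.sum ./
RUN go mod download
# Copy the whole monorepo so shared pkg/ code is available
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server "${TARGET}/cmd/api"

FROM alpine:3.13
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```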

Configuration management

In Node projects we usually used node-config to provide different configurations for different environments. There is no ready-made tool in the Go ecosystem that does exactly the same thing, but this was a good opportunity to ditch that approach.

As the Twelve-Factor methodology advocates, we want to configure services through environment variables as much as possible rather than through multiple configuration files. In fact, even in the Node projects, everything except local development was mostly configured dynamically via environment variables, and most test.json/beta.json files simply referenced production.json.

We divide the configuration into two parts:

  • A single configuration file

    A complete configuration defined as a file inside the service; it serves as the base configuration and can be used for local development.

  • Dynamic environment variables

    When the service is deployed online, environment variables are injected on top of the base configuration.

We can write a config.toml in the service directory (pick any configuration format you like) containing the base configuration for local development.

# config.toml
port=3000
sentryDsn="https://[email protected]"


[mongodb]
url="mongodb://localhost:27017"
database="db"           

When running online, we also need to inject environment variables into the configuration. Netflix/go-env can inject environment variables into the configuration struct:

type MongoDBConfig struct {
    URL      string `toml:"url" env:"MONGO_URL,MONGO_URL_ACCOUNT"`
    Database string `toml:"database"`
}


type Config struct {
    Port      int            `toml:"port" env:"PORT,default=3000"`
    SentryDSN string         `toml:"sentryDsn"`
    MongoDB   *MongoDBConfig `toml:"mongodb"`
}


//go:embed config.toml
var configToml string


func ParseConfig() (*Config, error) {
    var cfg Config
    if _, err := toml.Decode(configToml, &cfg); err != nil {
        return nil, err
    }
    if _, err := env.UnmarshalFromEnviron(&cfg); err != nil {
        return nil, err
    }
    return &cfg, nil
}

The code above also uses the embed feature new in Go 1.16: a single compiler directive packages any file into the final binary, so the build image only needs to copy one executable, reducing the complexity of building and publishing.

Service calls

Code management

The backend runs services in multiple languages (Node/Java/Go); defining types repeatedly for each service wastes manpower and invites inconsistency. So types are defined once with ProtoBuf, the corresponding code is generated with protoc, and each language's client is maintained in a single repository.

.
├── go
│   ├── internal: internal implementation, e.g. the http client wrapper
│   ├── service
│   │   ├── user
│   │   │   ├── api.go: interface definition and implementation
│   │   │   ├── api_mock.go: interface mocks generated by gomock
│   │   │   └── user.pb.go: types generated by protoc
│   │   ├── hello
│   │   └── ...
│   ├── go.mod
│   ├── go.sum
│   └── Makefile
├── java
├── proto
│   ├── user
│   │   └── user.proto
│   ├── hello
│   │   └──  hello.proto
│   └── ...
└── Makefile           

Each service exposes its interface through an independent package, and each service consists of four parts:

  • Interface definitions
  • Concrete call code implementing those interface definitions
  • Mock implementations generated from the interface definitions by gomock
  • Type code generated from the proto files

ProtoBuf

As mentioned above, to reduce the cost of wiring up and maintaining internal interfaces, we chose ProtoBuf to define types and generate Go code. But although the types are defined with ProtoBuf, data is still passed between services as JSON, which makes serialization and deserialization tricky.

To simplify the conversion between ProtoBuf and JSON, Google provides a package called jsonpb. It converts between an enum's name (string) and value (int32) for compatibility with traditional string enums, and it also supports oneof types. None of this is possible with Go's native json: serializing proto types with the standard library leaves enums unable to output as strings and oneof fields unable to output at all.

So, wouldn't it be nice to replace the native JSON with JSONPB in all code? No, JSONPB only supports serialization of proto types:

func Marshal(w io.Writer, m proto.Message) error
           

Unless every externally read or written type is defined with ProtoBuf, jsonpb cannot be used everywhere.

Fortunately there is a way out: Go's native json defines two interfaces:

// Marshaler is the interface implemented by types that
// can marshal themselves into valid JSON.
type Marshaler interface {
    MarshalJSON() ([]byte, error)
}

// Unmarshaler is the interface implemented by types
// that can unmarshal a JSON description of themselves.
// The input can be assumed to be a valid encoding of
// a JSON value. UnmarshalJSON must copy the JSON data
// if it wishes to retain the data after returning.
//
// By convention, to approximate the behavior of Unmarshal itself,
// Unmarshalers implement UnmarshalJSON([]byte("null")) as a no-op.
type Unmarshaler interface {
    UnmarshalJSON([]byte) error
}
           

Any type that implements these two interfaces can call its own logic when it is (de)serialized, similar to a hook function. That way, you only need to implement these two interfaces for all proto types: when JSON tries to (de)serialize itself, it does so instead.

func (msg *Person) MarshalJSON() ([]byte, error) {
    var buf bytes.Buffer
    err := (&jsonpb.Marshaler{
        EnumsAsInts:  false,
        EmitDefaults: false,
        OrigName:     false,
    }).Marshal(&buf, msg)
    return buf.Bytes(), err
}

func (msg *Person) UnmarshalJSON(b []byte) error {
    return (&jsonpb.Unmarshaler{
        AllowUnknownFields: true,
    }).Unmarshal(bytes.NewReader(b), msg)
}
           

After some searching, I finally found a protoc plugin, protoc-gen-go-json, which generates proto types that implement json.Marshaler and json.Unmarshaler. With it there is no need to compromise for serialization compatibility, and the code is not intrusive at all.

Publish

Since the clients live in an independently maintained repository, they are introduced into projects as a Go module. Thanks to the Go module design, releases integrate seamlessly with GitHub, which is very efficient.

  • Beta version

    Go modules support pulling a branch's code directly as a dependency. There is no need to manually publish alpha versions; just run go get -u github.com/iftechio/rpc/go@{branch} in the caller's module directory to download the latest version of the corresponding development branch.

  • Official version

    When changes are merged into the main branch, a stable version can be released through GitHub Release (or a local git tag), and the corresponding repository snapshot can be pulled by its version number: go get github.com/iftechio/rpc/go@{version}

Since go get essentially downloads code, and our code is hosted on GitHub, pulling dependencies may fail for network reasons when building on Alibaba Cloud in China (and private modules cannot be pulled through a public goproxy). So we adapted goproxy and deployed one inside the cluster:

  • For public repositories, they are pulled via goproxy.cn.
  • For private repositories, you can pull them directly from GitHub through a proxy, and goproxy will also handle the authentication of GitHub's private repositories.

We just need to execute the following code to download the dependencies via the internal goproxy:

GOPROXY="http://goproxy.infra:8081" \
GONOSUMDB="github.com/iftechio" \
go mod download           

Context

Context provides a means of transmitting deadlines, caller cancellations, and other request-scoped values across API boundaries and between processes.

Context occupies a special place in Go. Like a bridge, it connects an entire service so that data and signals can be passed between the upstream and downstream of a request chain. Our project uses context in several ways:

Cancel Signal

Every HTTP request carries a context. Once the request times out or the client closes the connection, the outermost layer passes a cancel signal down the entire chain through the context, and all downstream calls end immediately. If the whole chain follows this convention, then as soon as the upstream closes the request, every service cancels its current operation, saving a great deal of unnecessary work.

When developing, you need to pay attention to:

  • When most tasks are canceled, they throw a context.Canceled error so the caller can sense the exception and exit. But the RPC circuit breaker also catches this error and records it as a failure. In the extreme case where a client repeatedly initiates requests and immediately cancels them, circuit breakers along the chain can trip one after another, destabilizing the services. The solution is to modify the circuit breaker to still rethrow this specific error but not record it as a failure.
  • In a distributed scenario, the vast majority of writes cannot use transactions, so you must consider whether eventual consistency still holds if an operation is canceled partway through. For operations with high consistency requirements, actively block the cancel signal before executing:
// Returns a context that implements only the Value interface:
// it keeps the data stored in the context but ignores cancel signals.

func DetachedContext(ctx context.Context) context.Context {
	return &detachedContext{Context: context.Background(), orig: ctx}
}

type detachedContext struct {
	context.Context
	orig context.Context
}

func (c *detachedContext) Value(key interface{}) interface{} {
	return c.orig.Value(key)
}

func storeUserInfo(ctx context.Context, info interface{}) {
	// Block the cancel signal so the two writes are not interrupted midway.
	ctx = DetachedContext(ctx)
	saveToDB(ctx, info)
	updateCache(ctx, info)
}
           

Transparent context propagation

When a request comes in, its http context carries various pieces of request-scoped information, such as the traceId and user information. These are transparently propagated along the entire business chain with the context, and the monitoring data collected along the way is associated with them, making aggregation easy.

Context.Value should inform, not control.

The most important rule when passing data through context: context data is only for observability, not for business logic. Explicit is better than implicit; because a context does not directly expose its internal data, using it to pass business data makes the program inelegant and hard to test. In other words, passing an emptyCtx to any function should not affect its correctness.

Error collection

Errors are just values.

Go's error is an ordinary value (from the outside, just a string), which makes error collection a bit of a problem: we need to know not only the error message itself but also the context in which it occurred.

Go 1.13 introduced error wrapping: through the Wrap/Unwrap design, an error becomes a singly linked list in which each node can store custom context information, and any error can serve as the list head from which all subsequent error nodes can be read.

The stacktrace is one of the most important pieces of information about a single error. Go collects stacktraces via runtime.Callers:

Callers fills the slice pc with the return program counters of function invocations on the calling goroutine's stack.

As you can see, Callers can only collect the call stack within a single goroutine; to collect a complete error trace, you need to embed the stacktrace in the error when passing errors across goroutines. Here the third-party library pkg/errors helps: errors.WithStack and errors.Wrap create a new error node that stores the call stack at that moment:

// WithStack annotates err with a stack trace at the point WithStack was called.
// If err is nil, WithStack returns nil.
func WithStack(err error) error {
    if err == nil {
        return nil
    }
    return &withStack{
        err,
        callers(),
    }
}

func main() {
    ch := make(chan error)
    go func() {
        err := doSomething()
        ch <- errors.WithStack(err) // record this goroutine's stack before crossing the channel
    }()
    err := <-ch
    fmt.Printf("%+v", err) // %+v prints the stacktrace collected by pkg/errors
}
           

For the final error collection (often on the root web middleware), you can use Sentry directly:

sentry.CaptureException(errors.WithStack(err)) // collect the stacktrace one more time at the final upload
           

Sentry peels off each layer of error via the errors.Unwrap interface and automatically extracts each layer's stacktrace. Since stacktraces are not an official standard, Sentry has adapted to several mainstream stacktrace implementations, including pkg/errors.

This lets you view the complete error information in the Sentry backend. As shown in the figure below, each large section is one layer of error, and each section contains that error's context information.

[Figure: Sentry issue detail view, one section per error layer]

Reference Links

  • TJ talks about the productivity benefits of Go over Node
  • https://qr.ae/pNdNhU
  • Standard Go Project Layout
  • https://github.com/golang-standards/project-layout
  • The Twelve-Factor App
  • https://12factor.net/
  • Go Wiki - Modules: Releasing Modules (v2 or Higher)
  • https://github.com/golang/go/wiki/Modules#releasing-modules-v2-or-higher
  • How to correctly use context.Context in Go 1.7
  • https://medium.com/@cep21/how-to-correctly-use-context-context-in-go-1-7-8f2c0fafdf39
  • Don’t just check errors, handle them gracefully
  • https://dave.cheney.net/2016/04/27/dont-just-check-errors-handle-them-gracefully

Author: sorcererxw

Source: WeChat official account "Jike technical team"

Source: https://mp.weixin.qq.com/s/cepoYJR5Xeloan31-D1iQg