Translated from The Developer's Guide to Database Proxies: When to Use Them and How to Create One, by Alex Pliutau.
Imagine a complex distributed system that is highly dependent on data, where each microservice or team is individually connected to a database (which can be a shared database or a specific/isolated database). Such a complex platform requires centralized monitoring, query validation, alerting, custom sharding, better security, and more. While you can get a lot of these features from a database server, implementing a database proxy may be a better approach if you're ready to invest.
The main advantage of using a database proxy is that it isolates the database topology from the application layer, so developers don't need to know the clusters, nodes, and internals of the data layer (to some extent, of course).
Database proxy use cases
Let's dive into the various ways database proxies can empower your development team, enhance security, and optimize database performance.
- Intercept SQL queries from your application and dynamically route them to the correct database/table (e.g. custom sharding). Figma is doing exactly that using their in-house Postgres proxy.
- Parse/analyze/validate SQL queries from developers and enrich responses with additional information, e.g. warning the application that a table it queries is scheduled for deprecation.
- Scalability and architectural changes do not affect the application. Platform/database teams can independently change the schema without having to rewrite hundreds of microservices. Ability to transparently add or remove nodes from a DB cluster without reconfiguring or restarting applications.
- Enforce security policies and perform authentication and authorization checks to ensure that only authorized clients can access the database. It is also possible to prohibit direct access to the database.
- Improve the performance of database communication by centralizing the management of connection pools, leveraging caching technology, and more.
- Centralized observability. Get notified when an application uses a deprecated table, and so on.
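To illustrate the custom-sharding use case, a proxy could hash a routing key extracted from the query (such as a tenant ID) and pick a backend shard accordingly. A minimal sketch, assuming a simple hash-modulo scheme; the shardDSN helper and the DSN list are hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardDSN picks a backend address for a routing key by hashing it
// modulo the number of shards. The key-extraction step is omitted.
func shardDSN(key string, dsns []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return dsns[h.Sum32()%uint32(len(dsns))]
}

func main() {
	dsns := []string{"mysql-shard-0:3306", "mysql-shard-1:3306"}
	// The same key always routes to the same shard.
	fmt.Println(shardDSN("tenant-42", dsns))
}
```

Real systems like Figma's proxy use far more deliberate routing (consistent hashing, shard maps), but the principle is the same: the application stays unaware of the topology.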
When to use a database proxy
Not all systems require a database proxy, especially in the early stages. Here are some general guidelines on when you might need one:
- You have multiple development teams divided by different disciplines: e.g. multiple backend teams, data engineering teams.
- You have a platform/database team that can own the proxy, although another team can own it as well.
- Your system is distributed, and you maintain many microservices and many databases.
- Your system has a lot of data.
- You need better security and observability.
The cost of using a database proxy
There is a real cost associated with using a database proxy:
- A database proxy is a new element in your infrastructure, with complexity of its own.
- It can be a single point of failure, so it must be very stable and battle-tested.
- Additional network latency.
Database proxy types
You can deploy a database proxy in several ways:
- Custom proxy service (below I'll provide a simple Go example)
- Managed cloud solutions, such as Amazon RDS Proxy
- Sidecars, such as Cyral
- Commercial and open-source products, such as ProxySQL, or dbpack
Use Go to write a custom database proxy service
Now we'll implement our own MySQL proxy in Go. Keep in mind that this is only an experiment meant to illustrate the ideas.
Our proxy will solve a very simple use case: intercepting SQL queries and rewriting table names when a pattern is matched.
-- Application-generated query
SELECT * FROM orders_v1;
-- Rewritten query
SELECT * FROM orders_v2;
The implementation is divided into two parts:
- A basic proxy that routes queries from the client to the MySQL server.
- A SQL parser, which has some logic to manipulate a query before sending it.
You can view the full source code in this GitHub repository.
TCP proxy from the client to the MySQL server
Our TCP proxy is implemented in a very simple way that is definitely not suitable for a production environment, but is good enough to demonstrate how TCP transport works:
- Create a proxy TCP server
- Accept the connection
- Create a TCP connection to MySQL
- Use a pipeline to proxy the byte stream from the client to the MySQL server and vice versa
main.go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"os"
)
func main() {
	// the proxy listens on port 3307
	proxy, err := net.Listen("tcp", ":3307")
	if err != nil {
		log.Fatalf("failed to start proxy: %s", err.Error())
	}

	for {
		conn, err := proxy.Accept()
		if err != nil {
			log.Printf("failed to accept connection: %s", err.Error())
			continue
		}
		log.Printf("new connection: %s", conn.RemoteAddr())
		go transport(conn)
	}
}
func transport(conn net.Conn) {
	defer conn.Close()

	mysqlAddr := fmt.Sprintf("%s:%s", os.Getenv("MYSQL_HOST"), os.Getenv("MYSQL_PORT"))
	mysqlConn, err := net.Dial("tcp", mysqlAddr)
	if err != nil {
		log.Printf("failed to connect to mysql: %s", err.Error())
		return
	}
	defer mysqlConn.Close()

	// from proxy to mysql
	go pipe(mysqlConn, conn, true)

	// from mysql to proxy; runs in this goroutine so that both
	// connections are closed (via the defers) when the server side ends
	pipe(conn, mysqlConn, false)

	log.Printf("connection closed: %s", conn.RemoteAddr())
}
func pipe(dst, src net.Conn, send bool) {
	if send {
		// the client → mysql direction is handled entirely by intercept,
		// which forwards (and later rewrites) each packet itself
		intercept(src, dst)
		return
	}

	_, err := io.Copy(dst, src)
	if err != nil {
		log.Printf("connection error: %s", err.Error())
	}
}
func intercept(src, dst net.Conn) {
	buffer := make([]byte, 4096)
	for {
		n, err := src.Read(buffer)
		if err != nil {
			return
		}
		dst.Write(buffer[:n])
	}
}
Function descriptions:
- transport - handles a client TCP connection and transmits bytes in both directions.
- pipe - copies the bytes through; for the client → mysql direction it delegates to intercept so queries can be inspected.
- intercept - for now it just forwards bytes; we'll extend it later to parse the query.
Dockerfile
FROM golang:1.22 AS builder
WORKDIR /
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o proxy main.go
FROM alpine:latest
COPY --from=builder /proxy .
EXPOSE 3307
CMD ["./proxy"]
docker-compose.yaml
services:
  proxy:
    restart: always
    build:
      context: .
    ports:
      - 3307:3307
    environment:
      - MYSQL_HOST=mysql
      - MYSQL_PORT=3306
    links:
      - mysql
  mysql:
    restart: always
    image: mysql:5.7
    platform: linux/amd64
    ports:
      - 3306:3306
    environment:
      - MYSQL_ROOT_PASSWORD=root
    command: --init-file /data/application/init.sql
    volumes:
      - ./init.sql:/data/application/init.sql
init.sql
CREATE DATABASE IF NOT EXISTS packagemain;
CREATE TABLE IF NOT EXISTS packagemain.orders_v2 (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
INSERT INTO packagemain.orders_v2 (name) VALUES ('order1');
SQL parsing
Our intercept function already receives the raw byte packets. It helps to first understand the structure of a MySQL packet. I won't go into the details, but you can read about it here.
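In brief, a MySQL packet starts with a 3-byte little-endian payload length and a 1-byte sequence number; for COM_QUERY, the first payload byte is the command code (0x03) and the rest is the query text. A minimal sketch of decoding that header (the decodePacket helper is my own naming, not part of the proxy code):

```go
package main

import "fmt"

// decodePacket splits a raw MySQL packet into its header fields and payload.
// Layout: 3-byte little-endian payload length, 1-byte sequence id, payload.
func decodePacket(buf []byte) (length int, seq byte, payload []byte) {
	length = int(buf[0]) | int(buf[1])<<8 | int(buf[2])<<16
	seq = buf[3]
	payload = buf[4 : 4+length]
	return
}

func main() {
	query := "SELECT 1"
	// A COM_QUERY packet: payload = command byte 0x03 + query text.
	pkt := append([]byte{byte(len(query) + 1), 0, 0, 0, 0x03}, query...)
	length, seq, payload := decodePacket(pkt)
	fmt.Println(length, seq, string(payload[1:])) // prints: 9 0 SELECT 1
}
```

This is why the proxy code below inspects buffer[4] for the command byte and reads the query starting at offset 5.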
In our intercept function, we do the following:
- Look for the COM_QUERY client command, which has a numeric code of 3.
- Get the original query.
- Do a very basic table renaming.
There's a great package sqlparser from YouTube's Vitess project that we can use to parse SQL queries. However, to simplify the presentation, we'll use string matching and substitution.
main.go
const COM_QUERY = byte(0x03)
func intercept(src, dst net.Conn) {
	buffer := make([]byte, 4096)
	for {
		n, err := src.Read(buffer)
		if err != nil {
			return
		}
		if n > 5 {
			switch buffer[4] {
			case COM_QUERY:
				clientQuery := string(buffer[5:n])
				newQuery := rewriteQuery(clientQuery)
				fmt.Printf("client query: %s\n", clientQuery)
				fmt.Printf("server query: %s\n", newQuery)
				writeModifiedPacket(dst, buffer[:5], newQuery)
				continue
			}
		}
		dst.Write(buffer[:n])
	}
}
// rewriteQuery performs a naive, lowercased string substitution; the
// replacement keeps the query length unchanged, so the original packet
// length header remains valid
func rewriteQuery(query string) string {
	return strings.NewReplacer("from orders_v1", "from orders_v2").Replace(strings.ToLower(query))
}
// writeModifiedPacket reuses the original 5-byte header, which is only
// correct while the rewritten query has the same length as the original
func writeModifiedPacket(dst net.Conn, header []byte, query string) {
	newBuffer := make([]byte, 5+len(query))
	copy(newBuffer, header)
	copy(newBuffer[5:], []byte(query))
	dst.Write(newBuffer)
}
Run the proxy and connect to it
Here we connect to the proxy running on port 3307 instead of the MySQL server itself (port 3306). As you can see, we can use a regular MySQL client, which simplifies the use of the proxy.
As the logs show, queries against the orders_v1 table are transparently redirected to orders_v2. Proxy logs:
client query: select * from orders_v1;
server query: select * from orders_v2;
Conclusion
In summary, database proxies provide a powerful layer of abstraction between the application and the underlying database. They simplify development by isolating database complexity, let platform/database teams make schema changes independently, and enhance security with centralized access control. While they come with real costs, such as infrastructure overhead and added latency, database proxies can be a worthwhile investment for complex distributed systems with multiple teams and data-intensive needs.