
Database Management Through GitOps on Kubernetes


Use the Operator pattern to migrate databases.

Translated from GitOps for Databases on Kubernetes by Rotem Tamir, co-founder and CTO of Ariga. Co-creator and maintainer of Atlas, an open-source tool that manages database schemas as code. Co-maintainer of Ent, a Go entity framework supported by the Linux Foundation. Ex-infra team lead at ironSource, ex-data...

As an application evolves, so does its database schema. As modern DevOps principles took hold, the practice of automating the deployment of database schema changes evolved into what is known as database migration.

As part of this evolution, hundreds of "migration tools" have been created to help developers manage database migrations. These tools range from ORM (object-relational mapping) and language-specific tools such as Python's Alembic, to language-agnostic tools such as Flyway and Liquibase.

Migration on Kubernetes: Current state

When Kubernetes came along and teams started containerizing their applications, their first instinct was to wrap traditional migration tools in containers and run them as part of the application deployment process.

As often happens when an old tool is projected onto a new platform, the result is a collection of gaps that need to be addressed. Let's review some of these common practices and discuss them.

Run the migration within the app

The easiest way to run migrations is to invoke them directly during application startup. This doesn't require any special Kubernetes features. We just need to make sure that the migration tool, the migration files, and the database credentials are available inside the application container, and then change the startup logic to first attempt the migration and, if it succeeds, start the application.
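As a rough illustration, a Deployment for this pattern might look like the following sketch. The image, command, and secret names are hypothetical, and Flyway stands in for whichever migration tool you use:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          # The runtime image has to bundle the migration tool, the migration
          # files, and the database credentials alongside the application.
          image: ghcr.io/example/myapp:latest
          # Run the migration first; start the application only if it succeeds.
          command: ["sh", "-c", "flyway migrate && exec /app/server"]
          env:
            - name: FLYWAY_URL
              valueFrom:
                secretKeyRef:
                  # Privileged DDL credentials end up inside every runtime pod.
                  name: myapp-db-credentials
                  key: url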

This is considered an anti-pattern for several reasons. First, from a security perspective, it is best to reduce the attack surface of the runtime environment and not include anything that is not strictly needed at runtime. With this pattern, the migration tool and the elevated database credentials required to run DDL statements remain in the runtime environment, where an attacker could exploit them.

Second, if an application runs multiple replicas for redundancy and availability, making the migration part of application startup forces the replicas to start sequentially rather than in parallel. Applying the same database changes from multiple places at the same time is dangerous, which is why almost all tools acquire (or make the user responsible for) some kind of lock or other synchronization mechanism. In practice, this means a new pod cannot start until it has excluded every other pod from running the migration.

If you have only a few replicas, you may not feel the difference, but consider what happens when hundreds of replicas have to compete with one another at startup (with all the retries, backoffs, and so on that entails).

Run the migration as an init container

A slight improvement to this technique is the use of init containers. Kubernetes makes it possible to define an "init container", a container that runs before the primary container in a PodSpec. With this approach, teams can bring in standalone tools like Liquibase or Flyway and run them before the application starts.

In addition, the migration files themselves (the SQL files describing the schema revisions) must also be made available to that container in some way, either by building a custom image or by mounting them from some external source.
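As a rough sketch, assuming Flyway as the migration tool, a ConfigMap holding the SQL files, and hypothetical image and secret names, the manifest might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      initContainers:
        - name: migrate
          # The standalone migration tool runs to completion before the
          # application container starts.
          image: flyway/flyway:10
          args: ["migrate"]
          env:
            - name: FLYWAY_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials
                  key: url
          volumeMounts:
            # The migration SQL files are mounted rather than baked into
            # the application image.
            - name: migrations
              mountPath: /flyway/sql
      containers:
        - name: myapp
          # The runtime image no longer carries the migration tool or the
          # DDL credentials.
          image: ghcr.io/example/myapp:latest
      volumes:
        - name: migrations
          configMap:
            name: myapp-migrations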

This approach is better than running the migration in-app because it removes the migration tool and credentials from the runtime environment, but it suffers from the same synchronization issues we described for in-app migrations.

Also, consider what happens if the migration fails. Migrations can fail for a variety of reasons, ranging from invalid SQL to constraint conflicts to unstable network connections. When a migration is coupled to the application runtime, any failure in the migration step results in a large number of pods in a crash loop, which can mean reduced application availability or even downtime.

Run the migration as a Kubernetes job

Kubernetes allows you to run one-off programs using the Jobs API. As with the init container approach, teams can containerize the migration tool, mount the migration files in some way, and run the migration as a Job before the application starts.

The advantage of this approach is that by using a Job, you can ensure the migration runs as a separate step before the rolling update of the new application pods begins. Teams often use Helm pre-upgrade hooks or ArgoCD pre-sync hooks to implement this technique.
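A minimal sketch of such a Job, wired up as an ArgoCD PreSync hook, might look like the following (with Helm, the equivalent would be a "helm.sh/hook": pre-upgrade annotation). The image, ConfigMap, and secret names are hypothetical:

apiVersion: batch/v1
kind: Job
metadata:
  name: myapp-migrate
  annotations:
    # ArgoCD runs this Job to completion before syncing the rest of the
    # application manifests.
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  # On failure, Kubernetes simply retries the pod, up to this limit.
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: flyway/flyway:10
          args: ["migrate"]
          env:
            - name: FLYWAY_URL
              valueFrom:
                secretKeyRef:
                  name: myapp-db-credentials
                  key: url
          volumeMounts:
            - name: migrations
              mountPath: /flyway/sql
      volumes:
        - name: migrations
          configMap:
            name: myapp-migrations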

Combined, the result is that the migration runs exactly once, avoiding the chaotic migration race that init containers exhibit, and it is isolated from the runtime environment, reducing the application's attack surface as described above.

GitOps principles and migration

"We can encapsulate existing schema management solutions into containers and run them as jobs in Kubernetes. But that's stupid. That's not how we work in Kubernetes." - Viktor Farcic, DevOps Toolkit

Overall, running the migration as a Job using ArgoCD or Helm hooks is a workable solution. But viewed through the lens of modern GitOps principles, more problems become apparent.

GitOps is a software development and deployment methodology that uses Git as a central repository for code and infrastructure configurations, enabling automated and audited deployments.

In this context, let's consider how the migration techniques we describe map to two commonly accepted GitOps principles:

  • Declarative: A system managed by GitOps must have its desired state expressed declaratively.
  • Continuous reconciliation: Software agents continuously observe the actual system state and attempt to apply the desired state.

Source: https://opengitops.dev/

Declarative - Almost all migration tools used in the industry today take an imperative, versioned approach. The desired state of the database is never described directly; it can only be inferred by applying all migration scripts in order. This means these tools cannot handle unforeseen or manual changes to the target environment in the way a GitOps-managed system should.

Continuous reconciliation - The way Kubernetes Jobs handle failures is very simple: brute-force retries. If the migration fails, the Job's pod crashes and Kubernetes tries to run it again (with a backoff policy). This may work, but in most cases the migration tool is not designed to recover from partial failures, and the retries become a futile endeavor.

The Operator pattern

If running migrations as Jobs is a strategy ill-equipped to satisfy GitOps principles, what is the missing piece?

Kubernetes is a great solution for managing stateless resources. However, for many stateful resources, such as databases, reconciling the desired state with the actual state can be a complex task that requires specific domain knowledge. Operators were introduced to the Kubernetes ecosystem to help users manage complex stateful resources by encoding this domain knowledge in a Kubernetes controller.

At a high level, Operators work by introducing CRDs (Custom Resource Definitions) that extend the Kubernetes API to describe new types of resources, together with controllers: specialized software that runs in the cluster and is responsible for declaratively managing those resources through reconciliation loops.

What if we could use the right Kubernetes Operator to manage the database schema of the application?

Atlas Operator

The Atlas Kubernetes Operator is a Kubernetes controller that uses Atlas to manage your database schemas. It allows you to define the desired schema and apply it to your database through the Kubernetes API.

The Atlas Operator supports a fully declarative workflow, in which the user defines the desired state of the database and the Operator reconciles the actual state of the database with it (planning and executing the necessary CREATE, ALTER, and DROP statements).

A more classic versioned workflow is also supported, in which the desired migration version is provided to the Operator, which is responsible for reconciling the actual state of the database with that version.
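For example, the declarative workflow is driven by a resource along these lines. This is a sketch based on the Atlas Operator's AtlasSchema CRD; the exact fields may differ between versions, and the schema and secret shown here are hypothetical:

apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: myapp
spec:
  # Connection string for the target database, read from a Secret rather
  # than stored in Git.
  urlFrom:
    secretKeyRef:
      name: myapp-db-credentials
      key: url
  # The desired state of the schema. The Operator plans and executes the
  # CREATE, ALTER, and DROP statements needed to reach it.
  schema:
    sql: |
      create table users (
        id int not null,
        name varchar(255) not null,
        primary key (id)
      );

Applying this manifest (with kubectl apply, or by syncing it through a GitOps tool) is all that is needed; the Operator's reconciliation loop takes care of planning and executing the changes.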


There are a number of advantages to using the Kubernetes Operator to manage our databases:

  • It makes schema management a declarative process. - Not only does this satisfy GitOps principles, but it's also simpler for end users - they just need to define what they want and don't have to think much about how to implement it.
  • It is continuously reconciled. - As we have shown, a Job's robustness is limited to a very basic retry mechanism, whereas an Operator with a long-running reconciliation loop has more means and more opportunities to drive the application toward its desired state.
  • It is semantically richer. - Jobs are a very opaque way to manage resources. Their specifications mostly deal with how they run rather than the resource they represent, and the state they expose doesn't contain any meaningful information about this resource. CRDs, on the other hand, can be managed and operated using standard Kubernetes tools, and their state can be used programmatically to build more advanced workflows.

Conclusion

In this article, we reviewed some of the existing practices for managing database schemas in Kubernetes applications and discussed their drawbacks. Finally, we showed how the Operator pattern can be used to satisfy GitOps principles and improve database management.
