
Let's talk about high-availability storage architectures: clusters and partitions

Author: Imagine 008

Master/standby, master-slave, and master-master architectures all rest on a common premise: a single host must be able to store all of the data. However, the storage and processing capacity of any one host is limited. Taking historical development as an example, a server in the Intel 386 era could store only a few hundred MB of data; in the Intel Pentium era, tens of gigabytes; and by the Intel Core multi-core era, server storage capacity had grown to several terabytes. Although hardware capacity has grown rapidly, it still lags the growth rate of business data. For example, as of 2013, Facebook had stored 250 billion photos with a total capacity of 250 PB (250×1024 TB), with an average of 350 million new images uploaded per day. Obviously, such a huge volume of data cannot be stored and processed by a single server, so a cluster architecture of multiple servers is required.

In short, a cluster is a unified system composed of multiple machines, where "multiple" usually means at least three. Compared with the two machines of a master/standby or master-slave architecture, a cluster offers greater scalability. According to the roles the machines play, clusters can be divided into two types: data-centralized clusters and data-dispersed clusters.

1. Data-centralized clusters

Data-centralized clusters are similar to the master/standby and master-slave architectures; we can describe them as 1 master with multiple standbys, or 1 master with multiple slaves. Whether it is 1 master and 1 standby, 1 master and 1 slave, 1 master and multiple standbys, or 1 master and multiple slaves, data can only be written to the master, while read operations can be directed flexibly, following the same options as the master/standby and master-slave architectures. The following diagram shows an architecture in which both reads and writes go to the master:
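The write path described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical classes, not a real product API: all writes go through the master, which pushes each update over a separate replication channel to every slave.

```python
# Minimal sketch of a 1-master, multiple-slave data-centralized cluster.
# Classes and method names are hypothetical, for illustration only.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # Writes are accepted only by the master...
        self.master.data[key] = value
        # ...then pushed over one replication channel per slave.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Here reads also go to the master; they could instead be
        # load-balanced across slaves, as in a master-slave architecture.
        return self.master.data[key]

cluster = Cluster(Node("master"), [Node("slave1"), Node("slave2")])
cluster.write("user:1", "alice")
print(cluster.read("user:1"))  # alice
```

Note that the replication loop runs once per slave, which is exactly why the master's replication burden grows with the number of standbys, as discussed next.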

In master/standby and master-slave architectures, data is replicated from the master to a single standby over one replication channel. In a data-centralized cluster, however, there are multiple replication channels, which increases the replication burden on the master. In some cases it is necessary to reduce that burden, or to reduce the impact of replication on normal read and write traffic.

In addition, multiple replication channels may lead to data inconsistencies between different standbys. In that case, the data on the standby machines must be checked and reconciled.

As for determining the master's status: in the master/standby and master-slave architectures, only a single standby needs to judge the master's state. In a data-centralized cluster, however, multiple standbys must each judge the master's status, and their judgments may disagree, so handling these inconsistent judgments is a complex problem.
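One common way to reconcile inconsistent judgments is a simple majority (quorum) vote: the master is treated as failed only if more than half of the standbys think so. This is a deliberately simplified sketch of the idea that protocols like ZAB formalize much more carefully:

```python
# Simplified quorum check: votes is one boolean per standby,
# True meaning "I believe the master has failed".

def master_is_down(votes):
    down_votes = sum(votes)
    return down_votes > len(votes) / 2  # strict majority required

print(master_is_down([True, True, False]))   # True: 2 of 3 agree
print(master_is_down([True, False, False]))  # False: no majority
```

Requiring a strict majority also prevents two disagreeing halves of the cluster from each declaring a different outcome, since at most one side can hold more than half the votes.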

When the master fails, deciding on a new master is also a key issue. In a master-slave architecture, the single standby is simply promoted. In a data-centralized cluster, however, there are multiple standbys that could be promoted, so the system must decide which standby is best suited to become the new master, and how the standbys coordinate that decision.
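One widely used election policy (illustrative here, not any specific product's algorithm) is to promote the standby whose replication position is most up to date, breaking ties deterministically by server id so every participant reaches the same answer:

```python
# Pick the new master: most replicated data wins; ties broken by lower id.
# Fields are hypothetical: (server_id, replicated_offset).

def elect_new_master(standbys):
    return max(standbys, key=lambda s: (s[1], -s[0]))[0]

standbys = [(1, 980), (2, 1000), (3, 1000)]
print(elect_new_master(standbys))  # 2: offsets tie at 1000, lower id wins
```

The deterministic tie-break matters: because every standby evaluates the same rule on the same inputs, they all agree on the new master without further negotiation.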

ZooKeeper is a typical open-source data-centralized clustering solution; it solves these problems with the ZAB algorithm, although ZAB is quite complex.

2. Data-dispersed clusters

In a data-dispersed cluster, multiple servers each store part of the data, and each also holds backups of data belonging to other servers. The complexity of a data-dispersed cluster lies in how to distribute the data across different servers properly, which involves the following design elements:

Balance: the allocation algorithm must keep data roughly evenly distributed across servers, avoiding a situation where one server holds significantly more data than the others.

Fault tolerance: when some servers fail, the algorithm must be able to redistribute the affected data partitions to other servers.

Scalability: when cluster capacity needs to be expanded, the algorithm should be able to migrate data to the new servers automatically and keep the distribution even after expansion.
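Consistent hashing is one well-known family of allocation algorithms that addresses all three goals; the sketch below is illustrative only (it is not Hadoop's or Elasticsearch's actual algorithm). Virtual nodes smooth out the distribution (balance), removing a failed server remaps only its keys (fault tolerance), and adding a server migrates only a fraction of the data (scalability).

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring; not a production implementation."""

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server)
        for s in servers:
            self.add(s)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, server):
        # Scalability: a new server takes over only the keys that
        # hash near its virtual-node positions.
        for i in range(self.vnodes):  # virtual nodes improve balance
            self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    def remove(self, server):
        # Fault tolerance: dropping a failed server remaps only its keys.
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def locate(self, key):
        h = self._hash(key)
        i = bisect.bisect(self.ring, (h, ""))
        return self.ring[i % len(self.ring)][1]

ring = HashRing(["s1", "s2", "s3"])
owner = ring.locate("photo-42")
ring.remove(owner)                       # simulate a server failure
print(ring.locate("photo-42") != owner)  # True: key remapped to a survivor
```

Keys whose owner did not fail stay where they were, which is the property that makes this family of algorithms attractive for large clusters.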

Unlike a data-centralized cluster, every server in a data-dispersed cluster can handle read and write requests, so there is no dedicated master role for writes. However, a data-dispersed cluster does need a specific role responsible for executing the data allocation algorithm, either a standalone server or a server elected from within the cluster. In the latter case this server is also often called the master, but its responsibilities differ from those of the master in a data-centralized cluster.

Hadoop's implementation uses a separate server, called the Namenode, to manage the distribution of data partitions. Hadoop's data partition management architecture is as follows:

Unlike Hadoop, an Elasticsearch cluster elects one of its own servers to allocate data partitions; this server is called the master node. Its data partition management architecture is:

Within cluster architectures, a data-centralized cluster allows clients to write data only to the master node, while a data-dispersed cluster allows clients to read and write on any server. This key difference makes the two architectures suitable for different scenarios. Data-centralized clusters fit smaller deployments: a ZooKeeper cluster, for example, typically uses about 5 servers, and the data volume on each server is modest. Conversely, data-dispersed clusters, with their superior scalability, are better suited to large volumes of business data and large server farms: Hadoop and HBase clusters can contain hundreds or even thousands of servers.

Data partitioning

When considering a storage HA architecture, we typically focus on keeping the system running in the event of a hardware failure. However, for major disasters that can take out all hardware at once, such as the New Orleans flooding, a widespread power outage in the United States, or a major earthquake in Los Angeles, a high-availability architecture designed only around hardware failure may not be sufficient. In such cases you need a highly available architecture that can withstand geographic-level failures, which is where the data partitioning architecture comes in.

A data partitioning architecture avoids the heavy impact of geographic-level failures by distributing data across geographic regions according to specific rules. This ensures that even if a major disaster strikes one region, only some of the data is affected, not all of it. Once a region fails, backup data in other regions can be used to quickly restore business operations in the affected region.

Designing an effective data partitioning architecture requires a combination of several aspects:

1. The amount of data

The size of the data determines partition complexity.

For example, if each MySQL server can store 500 GB, then 2 TB of data requires at least 4 servers. But for 200 TB of data, simply scaling up to 400 MySQL servers greatly increases management complexity: with that many machines there may be server failures every week, and pinpointing the one or two failed servers among 400 is not straightforward, so operational complexity rises sharply. Geographically, if all the data is concentrated in one city, the risk from a major disaster is extremely high.
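The server counts above are just ceiling division of total data volume by per-server capacity, which is easy to check:

```python
import math

def servers_needed(total_gb, per_server_gb=500):
    """Minimum servers to hold total_gb at per_server_gb each."""
    return math.ceil(total_gb / per_server_gb)

print(servers_needed(2_000))    # 2 TB  -> 4 servers
print(servers_needed(200_000))  # 200 TB -> 400 servers
```

This counts raw capacity only; replication for fault tolerance would multiply the figure further.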

2. Zoning rules

Zoning can be at the intercontinental, national, or city level, depending on business needs and cost considerations. Intercontinental partitions suit users spread across different continents, and due to high network latency are often used for data backup rather than real-time service. Country partitions suit countries with different language and legal requirements, and are likewise used primarily for data backup. City partitions suit low-latency service within the same country or region, and fit requirements such as multi-site active-active (geo-redundant) deployments.
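A city-level zoning rule can be as simple as a lookup table that routes each user to their city's partition, with a fallback for cities that have no local partition yet. The partition names below are hypothetical:

```python
# Hypothetical city-level zoning rule: map a user's city to a partition,
# falling back to a default partition when no local one exists.

PARTITIONS = {
    "beijing": "bj-dc",
    "shanghai": "sh-dc",
    "guangzhou": "gz-dc",
}

def route(user_city, default="bj-dc"):
    return PARTITIONS.get(user_city, default)

print(route("shanghai"))  # sh-dc
print(route("chengdu"))   # bj-dc (no local partition yet)
```

Real zoning rules also account for legal residency requirements and capacity, but the core mechanism is this kind of deterministic mapping.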

3. Replication rules

Even with a data partition architecture, each partition still needs to process a large amount of data. Data corruption or loss of a single partition is still unacceptable. Therefore, even in a partitioned schema, data replication policies must be implemented to ensure the security and high availability of data.

There are three common partition replication rules: centralized backup, mutual backup, and independent backup.

Centralized backup

A centralized backup system has a main backup center to which all partitions transfer their data for backup. The advantages of this architecture include the simplicity of the design, as the partitions do not have a direct connection to each other and operate independently without interfering with each other. In addition, if you need to add a new partition, such as the Wuhan partition, you only need to back up its data to the existing Xi'an backup center, and it will not affect other partitions. However, the disadvantage of this approach is that it is relatively costly, as a separate backup center needs to be established and maintained.

Mutual backup

Mutual backup requires each partition to back up data from another partition. This design is more complex, because each partition must handle not only its own business data but also backups for another partition, creating mutual influence and dependencies between partitions. Scaling such a system is difficult: introducing a Wuhan partition, for example, may require redirecting the Guangzhou partition's backups to Wuhan, while the original backup data in Beijing and Guangzhou must also be dealt with, posing challenges for both data migration and retention of historical data. The cost, however, is lower, because this approach directly leverages existing facilities.

Independent backup

In independent backup, each partition has its own backup center, and the backup center is not located in the same city as the primary data center. For example, the Beijing partition's backup is in Tianjin, Shanghai's is in Hangzhou, and Guangzhou's is in Shantou; the main purpose is to prevent a disaster in one city or location from taking out both the primary data center and its backup. The advantages of this architecture are a simple design, partitions that do not interfere with each other, and easy expansion: a new partition only needs to establish its own backup center. The disadvantage is very high cost, since each partition must build and maintain a backup center separately, and site rental and facility costs are the main financial burden, making independent backup far more expensive than centralized backup.
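The three replication rules can be summarized as backup-target maps. The city names follow the article's examples; the topologies themselves are illustrative:

```python
partitions = ["beijing", "shanghai", "guangzhou"]

# Centralized: every partition backs up to one shared backup center.
centralized = {p: "xian" for p in partitions}

# Mutual: each partition backs up to the next partition in a ring.
mutual = {p: partitions[(i + 1) % len(partitions)]
          for i, p in enumerate(partitions)}

# Independent: each partition has its own nearby but separate backup center.
independent = {"beijing": "tianjin", "shanghai": "hangzhou",
               "guangzhou": "shantou"}

print(centralized["guangzhou"])  # xian
print(mutual["guangzhou"])       # beijing
print(independent["beijing"])    # tianjin
```

Seen this way, the trade-offs are visible in the map shapes: centralized backup concentrates cost and risk in one target, mutual backup couples partitions together (inserting a new one rewires the ring), and independent backup doubles the number of sites to operate.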

