Mycat Advanced

First, Mycat configuration detailed explanation

1. Two configuration modes supported by Mycat

Mycat supports two configurations starting from version 1.5: ZooKeeper and native XML. Mycat starts by default by loading XML locally, and if it needs to be configured to start as ZooKeeper, the loadfromzk parameter in the zk.conf file in the conf directory should be set to true.

2. The server .xml configuration file

The server .xml configuration file contains Mycat's system configuration information

2.1, user tag

<user name="user">
    <property name="password">user</property>
    <property name="schemas">TESTDB</property>
    <property name="readOnly">true</property>
    <property name="defaultSchema">TESTDB</property>
</user>

The user tag is primarily used to define the users and permissions to log in to Mycat. In the above configuration, the user name and password are defined as test, and the user can access the schema only TESTDB.

To define TESTDB in a schema .xml, TESTDB must first be defined in the server .xml, otherwise the user will not be able to access the TESTDB. If the use command is used, Mycat will have the following error message:

Error 1044 (HY000):Access denied for user 'test' to database 'xxx'

You can modify the name attribute of the user tag to specify the user name, modify the value of the password to change the password, and modify the value of readOnly to true or false to restrict the user's read and write permissions. If multiple schemas need to be accessed at the same time, they are separated by commas, for example:

<property name="schemas">TESTDB,db1,db2</property>

Benchmark property

Limit the overall number of connections to the front end by setting the value of the benchmark property, which is not limited if its value is 0 or not set. For example:

<property name="benchmark">1000</property>

2.2, system tag

2.2.1, charset property

<system>
  <property name="charset">utf8</property>
</system>

2.2.2, processors attribute

The processors property specifies the number of threads available to the system, the default value is the number of threads running per core × the machine CPU core, and the processors value affects the processorBufferPool, processorBufferLocalPercent, processorExecutor properties. The number of NIOProcessors is also determined by the processor property, so the processor value can be modified appropriately when tuning.

2.2.3, processorBufferChunk attribute

The processorBufferChunk property specifies that the default value of each allocation of Socket Direct Buffer is 4096 bytes, which will also affect the length of BufferPool, if too many bytes are obtained at one time and the Buffer is not enough, there will often be a warning, and the processorBufferChunk value can be adjusted appropriately.

2.2.4, sequnceHandlerType property

The sequnceHandlerType property specifies the type of Mycat global sequence: 0 for local file mode, 1 for database mode, and 2 for timestamp sequence mode. By default, the local file mode is used, which is mainly used for testing.

2.2.5 Properties related to MySQL connections

Some of the properties involved in initializing a MySQL front-end connection are as follows.

· packetHeaderSize: Specifies the header length in the MySQL protocol, the default value is 4 bytes.

· maxPacketSize: Specifies the maximum size of data that the MySQL protocol can carry, with a default value of 16MB.

· idleTimeout: Specifies the length of timeout for the connection's idle time. If a connection is idle for more than the value of idleTimeout, the connection closes the resource and reclaims it in milliseconds, with a default of 30 minutes.

· charset: Initializes the connection character set, defaulting to utf8.

· txIsolation: Initializes the isolation level of the front-end connection transaction, and the subsequent txIsolation value is the configuration value of the client. The default value is REPEATED_READ, corresponding to the number 3.

· sqlExecuteTimeout: The timeout time of executing an SQL statement, if the execution time of the SQL statement exceeds this value, the connection will be closed directly, in seconds, the default value is 300 seconds.

2.2.6. Heartbeat properties

· processorCheckPeriod: The interval between cleaning up nioProcessor front and back end idle, timeout, and closing connections, in milliseconds, the default is 1 second.

· dataNodeIdleCheckPeriod: The interval between idle and timeout checks for backend connections in milliseconds, with a default of 300 seconds.

· dataNodeHeartbeatPeriod: The interval between heartbeats initiated for all read and write libraries on the backend, in milliseconds, is 10 seconds by default.

2.2.7 Distributed Transaction Switch Properties

HandleDistributedTransactions are distributed transaction switches: 0 is not filtered distributed transactions; 1 is filtered distributed transactions (if only global tables are involved in a distributed transaction, it is not filtered); 2 is not filtered distributed transactions, but the distributed transaction log is logged. Primarily used to control whether cross-library transactions are allowed, the configuration is as follows:

<property name="handleDistributedTransactions">0</property>

2.2.8, useOffHeapForMerge attribute

This property is used to configure whether non-heap memory processing is enabled for cross-shard result sets, with 1 on and 0 off, which is supported by Mycat starting with version 1.6. The configuration is as follows:

<property name="useOffHeapForMerge">0</property>

2.2.9Global table consistency detection

The principle is to perform consistency detection by adding _MYCAT_OP_TIME field to the global table, which is of type BIGINT. This field is automatically added when the create statement is executed through Mycat, and in other cases it needs to be added manually. 1 is on and 0 is off, and Mycat supports this property from version 1.6. The configuration is as follows:

<property name="useGlobleTableCheck">0</property>  <!-- 1为开启全加班一致性检测、0为关闭 -->

The global table consistency detection function is described and follows the steps below.

(1) Add an internal column of type BIGINT to all global tables named_mycat_op_time alter table t add column_mycat_op_time bigint [not null default 0], and it is recommended to index the column (alter table t add index_op_idx (_mycat_op_time)).

(2) When cruding a global table, the internal column can be treated as non-existent, and it is recommended not to perform operations such as update and insert on the internal column, otherwise a warning statement "Do not operate the internal column" will appear in the Log log.

(3) Because there is an extra inner column in the global table, the column name must be included when inserting the global table, which means that the STATEMENT inserted by SQL must be insert into t(id,name)values(xx, xx, xx), and cannot use insert into t values(xx, xx), otherwise an exception with the wrong number of columns will be reported. Such an operation may be inconvenient for development engineers and will improve this problem in the future.

2.2.10, useSqlStat property

Enable SQL real-time statistics, 1 is on and 0 is off. The configuration is as follows:

<property name="useSqlStat">0</property>  <!-- 1为开启实时统计、0为关闭 -->

3. Schema .xml configuration file

As one of the important configuration files in Mycat, schema .xml covers Mycat's logical libraries, tables, sharding rules, shard nodes, and data sources.

3.1. Schema tags

<schema name="TESTDB" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">


  </schema>

Schema tags are used to define logical libraries in Mycat instances. Mycat can have multiple libraries, each with its own associated configuration. You can use schema tags to divide different libraries, and if you don't configure schema tags, all table configurations will belong to the same default library.

<schema name="TESTDB" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">
    <!-- auto sharding by id (long) -->
    <!--splitTableNames 启用<table name 属性使用逗号分割配置多个表,即多个表使用这个配置-->
<!--fetchStoreNodeByJdbc 启用ER表使用JDBC方式获取DataNode-->
    <table name="customer" primaryKey="id" dataNode="dn1,dn2" rule="sharding-by-intfile" autoIncrement="true" fetchStoreNodeByJdbc="true">
      <childTable name="customer_addr" primaryKey="id" joinKey="customer_id" parentKey="id"> </childTable>
    </table>
    <!-- <table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate"
      /> -->
  </schema>

As shown above, two different logical libraries are configured, the concept of logical libraries is equivalent to the Database concept in the MySQL database, and when we query the tables in the logical library, we need to switch to the logical library to query the tables in them.

3.1.1, dataNode attribute

This property is used to bind the library to a specific Database.

<schema name="TESTDB" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">
    <!-- auto sharding by id (long) -->
    <!--splitTableNames 启用<table name 属性使用逗号分割配置多个表,即多个表使用这个配置-->
<!--fetchStoreNodeByJdbc 启用ER表使用JDBC方式获取DataNode-->
    <table name="customer" primaryKey="id" dataNode="dn1,dn2" rule="sharding-by-intfile" autoIncrement="true" fetchStoreNodeByJdbc="true">
      <childTable name="customer_addr" primaryKey="id" joinKey="customer_id" parentKey="id"> </childTable>
    </table>
    <!-- <table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate"
      /> -->
  </schema>
  <!-- <dataNode name="dn1$0-743" dataHost="localhost1" database="db$0-743"
    /> -->
  <dataNode name="dn1" dataHost="localhost1" database="db1" />
  <dataNode name="dn2" dataHost="localhost1" database="db2" />
  <dataNode name="dn3" dataHost="localhost1" database="db3" />

3.1.2 CheckSQLschema property

When the value is set to true, if we execute the statement ** select*from TESTDB.travelrecord; Mycat will remove the schema character and modify the SQL statement to **select*from travelrecord;** to avoid sending to the back-end database to execute the times" (ERROR 1146(42S02): Table'testdb.travelrecord'doesn't exist)" error.

However, even if the value is set to true, if the schema name in the statement is not the name specified by schema, for example, select*from db1.travelrecord;**, then Mycat does not delete the db1 string. If the library is not defined, an error will be reported, preferably without this field in the SQL statement.

3.1.3, sqlMaxLimit property

When this property is set to a numeric value, myCat will automatically add the corresponding value after the limit statement if the SQL statement is not added to each SQL statement executed. For example, if you set a value of 100, the effect of select*from TESTDB.travelrecord is the same as that of select*from TESTDB.travelrecord limit 100;**

If you do not set this value, Mycat will return all the queried information by default, so it is recommended to set this value in the normal use process to avoid too much data return.

Of course, if the size of the limit is also explicitly specified in the SQL statement, it is not bound by this property. It should be noted that if the schema running the SQL statement is non-split library, then the property will not take effect, and you need to manually add limit after the SQL statement.

3.2. Table tag

table标签定义了Mycat中的逻辑表，所有需要拆分的表都需要在table标签中定义。




<table name="customer" primaryKey="id" dataNode="dn1,dn2" rule="sharding-by-intfile" autoIncrement="true" fetchStoreNodeByJdbc="true">
  <childTable name="customer_addr" primaryKey="id" joinKey="customer_id" parentKey="id"> </childTable>
</table>

3.2.1, name attribute

Define the name of the logical table, just as we would the table name of a create table statement in the database, and the name of the table defined in the same schema tag must be unique.

3.2.2, dataNode attribute

Defines the dataNode to which the logical table belongs, and the value of the property needs to correspond to the value of the name attribute in the dataNode tag. If you need to define too many DNs, you can reduce the configuration using the following method

<table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate"> </table>

3.2.3, rule attribute

This property is used to specify the name of the rule to be used by the logical table, the name of the rule is defined in the rule .xml and must correspond to the value of the name attribute in the tableRule tag.

3.2.4 RuleRequired attribute

This property is used to specify whether the table is bound to a sharding rule, and if it is configured as true, but no specific rule is configured, the program will report an error.

3.2.5, primaryKey property

Logical table corresponds to the primary key of the real table, for example: the rule of sharding is to use a non-primary key for sharding, then when using the primary key query, the query statement will be sent to all configured dn; if the primary key of the real table is configured with this property, then Mycat will cache the primary key and the specific dn information, and when the primary key is used again for query, there will be no broadcast query, but the SQL statement will be directly sent to the specific dn. However, despite configuring this property, if the cache is not hit, the SQL statement is still sent to all dn executions to obtain data.

3.2.6, type attribute

This property defines the type of logical table, which currently has only two types: Global Table and Normal Table.

· Global table: The value of type is global, which represents the global table.

· Normal Tables: All tables that do not specify a value of global.

3.2.7, autoIncrement property

MySQL does not return a result using last_insert_id() for non-self-growing primary keys, only 0. Therefore, the value of the primary key can only be returned if the last_insert_id () is used for tables that have a self-growing primary key defined. Mycat currently provides a self-growing primary key feature, but if the table on the corresponding MySQL node does not have a auto_increment defined, the last_insert_id() called in the Mycat layer will not return a result.

Since the insert operation does not bring in the shard key, Mycat will first take down the global sequence corresponding to the table and then assign the shard key.

If you want to use this feature, it is best to cooperate with the global sequence of the database schema. Use autoIncrement="true" to specify that the table uses a self-growing primary key so that Mycat does not throw a "shard key not found" exception. To disable this feature using autoIncrement="false", the value of autoIncrement defaults to false.

3.2.8 NeedAddLimit attribute

Specifies whether the table needs to automatically append a limit after each statement. Due to the use of sharding tables, the amount of data can sometimes be particularly large. If you happen to forget to add a limit to the number, it will take a certain amount of time to query all the data.

So, after adding this property Mycat will automatically add LIMIT 100 to the end of the query statement. If there is a limit limit in the statement, it will not be added repeatedly. This property defaults to true, and you can also set the value to false to disable the default behavior.

3.3, childTable tag

The childTable tag is used to define the child table of the E-R shard, which is associated with the parent table by the attributes on the tag.

<schema name="TESTDB" checkSQLschema="true" sqlMaxLimit="100" randomDataNode="dn1">
  <!-- auto sharding by id (long) -->
  <!--splitTableNames 启用<table name 属性使用逗号分割配置多个表,即多个表使用这个配置-->
<!--fetchStoreNodeByJdbc 启用ER表使用JDBC方式获取DataNode-->
  <table name="customer" primaryKey="id" dataNode="dn1,dn2" rule="sharding-by-intfile" autoIncrement="true" fetchStoreNodeByJdbc="true">
    <childTable name="customer_addr" primaryKey="id" joinKey="customer_id" parentKey="id"> </childTable>
  </table>
  <!-- <table name="oc_call" primaryKey="ID" dataNode="dn1$0-743" rule="latest-month-calldate"
      /> -->
</schema>

3.3.1, name attribute

Defines the name of the child table.

3.3.2, joinKey property

This value is used when inserting a child table to find the data node stored in the parent table.

3.3.3, parentKey property

The parentKey is the column name that establishes an association with the parent table. The program first obtains the value of the joinKey, and then generates a query statement by using the column name specified by the parentKey property, and executes the statement to know which shard the parent table is stored on, thereby determining the location of the child table storage.

3.3.4, primaryKey property

As described by the table tag.

3.3.5 NeedAddLimit attribute

3.4. DataNode tag

The dataNode tag defines the data nodes in Mycat, which is what we usually call data sharding. A dataNode tag is an independent piece of data.

As shown below, a data shard is formed using the db1 physical database on a DB instance named localhost1, which we identify by the name dn1.

<dataNode name="dn1" dataHost="localhost1" database="db1" />

The relevant attributes of the dataNode tag are shown in the table:

3.4.1, name attribute

Define the unique name of the data node, we need to apply this name on the table tag to establish the correspondence between the table and the shard.

3.4.2, dataHost attribute

This property defines the DB instance to which the shard belongs, and the property value is referenced from the name attribute defined on the dataHost tag.

3.4.3, database attribute

This property is used to define the specific library on the DB instance to which the shard belongs, where two dimensions are used to define the shard: instance + concrete library. Because the table structure is the same on each library, it is easy to split the table horizontally.

3.5. DataHost tag

As the last tag in the schema .xml, the tag exists as the underlying tag in the Mycat library, directly defining the specific DB instance, read/write splitting, and heartbeat statements.

<dataHost name="localhost1" maxCon="1000" minCon="10" balance="0"
      writeType="0" dbType="mysql" dbDriver="jdbc" switchType="1"  slaveThreshold="100">
  <heartbeat>select user()</heartbeat>
  <!-- can have multi write hosts -->
  <writeHost host="hostM1" url="jdbc:mysql://localhost:3306" user="root"
         password="root">
  </writeHost>
  <!-- <writeHost host="hostM2" url="localhost:3316" user="root" password="123456"/> -->
</dataHost>

3.5.1, name attribute

Uniquely identifies the dataHost tag for use by the upper tag.

3.5.2, maxCon property

Specifies the maximum number of connections per read/write instance connection pool. Both the inline tags writeHost and readHost use the value of this property to instantiate the maximum number of connections in the connection pool.

3.5.3, minCon property

Specifies the minimum number of connections per read/write instance connection pool, initializing the size of the connection pool.

3.5.4, balance property

There are four types of load balancing types:

· balance="0": Without turning on the read/write splitting mechanism, all read operations are sent to the currently available writeHost.

· balance="1": All readHost and stand by writeHost participate in load balancing of select statements, in short, when it is a dual-master-slave mode (M1→ S1, M2→ S2, and M1 and M2 are mutually standby), under normal circumstances, M2, S1, and S2 are involved in load balancing of select statements.

· balance="2": All read operations are randomly distributed on writeHost, readHost.

· balance="3": All read requests are randomly distributed to the readHost corresponding to writeHost, writeHost does not bear the read pressure, note that balance=3 is only available in Mycat 1.4 and later versions, not in Mycat 1.3.

3.5.5, writeType property

There are currently two types of load balancing types.

· writeType="0": All write operations are sent to the first writeHost configured, writeHost1 hangs up to writeHost2, after re-resuming the writeHost1 node, it will not be cut back, or writeHost2 prevails, the switch is recorded in the configuration file dnindex.properties.

· writeType="1": All writes are randomly sent to the configured writeHost, and this value is no longer recommended after Mycat 1.5.

3.5.6, switchType property

·-1 indicates that it does not switch automatically.

· 1 is the default value, which means automatic switching.

· 2 indicates that the status of the MySQL master-slave synchronization determines whether to switch or not, and the heartbeat statement is as follows:

· 3 indicates a switching mechanism based on MySQL Galary Cluster (suitable for clusters, supported by Mycat 1.4.1 and above), and the heartbeat statement is as follows:

3.5.7, tempReadHostAvailable attribute

If the writeHost property is configured, the following readHost is still available, the default value is 0.

3.6 Heartbeat tags

This tab indicates the statement used for heartbeat checking in the back-end database.

3.7, writeHost tag, readHost tag

Both tags specify the configuration of the Mycat back-end database for instantiating the back-end connection pool. The only difference is that writeHost specifies a write instance, and readHost specifies a read instance, which is composed of these read and write instances to meet the requirements of the system.

Multiple writeHosts and readHosts can be defined within a dataHost. However, if the back-end database specified by writeHost goes down, then all readHost bound by this writeHost will also be unavailable; on the other hand, Mycat will automatically detect the writeHost downtime and switch to the standby writeHost.

3.7.1, host attribute

For identifying different instances, we generally use *M1 for writeHost; for readHost, we generally use *S1.

3.7.2, url attribute

The connection address of the backend instance, which is generally address:port if you use native dbDriver, or special specified if you use JDBC or other dbDriver. When using JDBC, it can be written as jdbc:MySQL://localhost:3306/.

3.7.3, user attribute

The user name of the back-end storage instance.

3.7.4, password attribute

The password for the backend storage instance.

3.7.5, weight attribute

Weighted as a read node in readHost (Mycat only existed after version 1.4).

3.7.6, usingDecrypt attribute

Same as the configuration of usingDecrypt in server .xml.

4. Sequence configuration file

In the case of database sharding and table splitting, the database self-increment primary key can no longer guarantee that it is the only global primary key in the cluster, so Mycat provides global sequence, and provides local configuration, database configuration and other implementations.

4.1. Local file mode

In this way, Mycat configures the sequence to the sequence_conf.properties file in the classpath directory.

Configure the sequence_conf.properties file as follows:

HOTNEWS.HISIDS=
HOTNEWS.MINID=1001
HOTNEWS.MAXID=2000
HOTNEWS.CURID=1000

Where HISIDS represents the used historical segment (generally no special needs can be configured), MINID represents the smallest ID value, MAXID represents the largest ID value, and CURID represents the current ID value.

To enable this approach, you first need to configure the following parameters in the server .xml:

<property name="sequnceHandlerType">0</property>

Note: The sequnceHandlerType is configured to 0, which indicates the use of local file mode.

insert into table1(id,name) values(next value for MYCATSEQ_GLOBAL,'test');

The disadvantage of taking this approach is that after The Mycat is republished, the sequence in the configuration file reverts to its original value; the advantage is that it loads locally and reads faster.

4.2. Database mode

Create a table in the database called sequence, with information such as the current value of the sequence (current_value), the increment int type, and the number of sequences read at a time, assuming K.

The steps to obtain the sequence are as follows.

(1) When using sequence for the first time, read the current_value and increment from the database table into Mycat according to the incoming sequence name, and modify the current_value in the database to the value of current_value + increment.

(2) Mycat will read the current_value + increment as the sequence value used this time, the next time it is used, the sequence automatically adds 1, and when the insertion is used, perform the same operation as step 1.

(3) Mycat is responsible for maintaining this table, and when those sequences are used, only one record needs to be inserted in this table. If the system goes down without using up the sequence of a read, the unused value that has been read will be discarded.

To enable this method, you need to configure the following parameters in the server .xml:

<property name="sequnceHandlerType">1</property>

Note: The sequnceHandlerType needs to be configured to 1, which means that the sequence is generated using the database mode.

The database configuration is as follows:

(1) Create a table to hold the MYCAT_SEQUENCE:

create table MYCAT_SEQUENCE (name varchar(50) not null, current_value int not null , increment int not  null default 100,primary key(name) )engine = InnoDB;

Name, current_value, and increment are the name of sequence, the current value, and the growth step, respectively. Increment can be understood as Mycat reading 100 (default) sequences in batches from the database to use, and then reading from the database after using these values.

Insert a sequence statement:

insert into MYCAT_SEQUENCE (name ,current_value ,increment ) values ('GLOBAL',100000,100)

(2) Create a related function

4.3. Local timestamp mode

4.4. Other means

4.4.1. Use the catlet annotation method

4.4.2, you can also use ZooKeeper to implement

4.5, self-growth primary key

Mycat self-growing primary key and returning the implementation of generating primary key ID are as follows:

(1) MySQL itself uses last_insert_id () for non-autogrowing primary keys will only return 0.

(2) MySQL pairs can define self-growing primary keys to return the value of the primary key with last_insert_id().

Mycat currently provides a self-growing primary key function, but if the data table on the corresponding MySQL node does not have a defined auto_increment, then calling last_insert_id() at the Mycat layer will not return a result.

The correct configuration is as follows:

(1) MySQL defines a self-growing primary key.

(2) Mycat defines the primary key auto-increment.

Add autoIncrement="true" to the table tag:

(3) Mycat corresponds to sequence_db_conf.properties to increase the corresponding settings.

(4) Add the sequence record of table1 table to the mycat_sequence table of the database.

5. zk-create.yaml configuration file

Before introducing the configuration, let's introduce a few concepts. Mycat Zone refers to the Mycat Cluster distributed in different regions (Zone), and the naming of zone is recommended to be identified by geographical location, such as Beijing Unicom Computer Room 1. A Cluster is a Mycat cluster, a Cluster containing one or more Mycat Servers. Generally speaking, a Zone has a set of active and standby Mycat load balancer LB, LB and the Mycat Cluster in the same center to form a one-to-many relationship, that is, an LB can serve all the Load Balancing requests of all Cluster in a center, or it can be multiple LB, each bearing different Mycat Cluster traffic. In addition, it is recommended that each LB have a Backup, Backup usually does not connect to the Mycat Cluster, but after detecting that the LB Master is offline, it immediately starts to connect to the Mycat Cluster and starts working. Their relationship can probably be represented by a set of arrows: Zone→ Mycat Cluster→ Mycat Server→ MySQL, as shown in the figure: