
After Java development changed the MySQL table, all the indexes disappeared!

Author: DBAplus Community

I. Preface

Who would have thought? As a Java developer, I used to believe MySQL was just something you wrote CRUD against, and that I had it well in hand.

Sprinkle some freeze-dried treats on the keyboard, and my cat could come to work in my place. - Porridge Master

Then a production incident hit, one thing led to another, and not only could the cat not come to work, it had to wait for me to finish overtime before I got home, and it didn't get its freeze-dried treats either.


(Just you wait and see if I don't hammer you for this.)

1. Table structure

A brief introduction to the table where the problem occurred.

It is a metadata table; extracting the key parts, the abstracted structure is as follows:

(id, group, code, name, property1, property2, ...)


Primary key: id

Unique key: group + code,

that is, code is unique within a group.

In addition, a DataWorks offline task writes records to the table every day using insert ignore into: rows that conflict on an existing group + code are simply not written.
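For reference, the abstracted structure and the daily write path might look roughly like this. The column types, index name, and sample values are illustrative guesses, not the real DDL; note that group is a reserved word in MySQL and must be backquoted:

```sql
-- Hypothetical sketch of the abstracted metadata table.
CREATE TABLE MY_TABLE (
  id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
  `group` VARCHAR(64)  NOT NULL,
  code    VARCHAR(64)  NOT NULL,
  name    VARCHAR(255) NOT NULL,
  -- property1, property2, ... omitted
  PRIMARY KEY (id),
  UNIQUE KEY uk_group_code (`group`, code)
);

-- The daily DataWorks task writes like this; a row whose (group, code)
-- already exists is silently skipped rather than rejected.
INSERT IGNORE INTO MY_TABLE (`group`, code, name)
VALUES ('some_group', 'some_code', 'example');
```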

The overall logic is fairly clear and straightforward. The data volume is also small: hundreds to thousands of rows per group, and fewer than 100,000 rows in total.

2. Problem troubleshooting and repair process

1. Initial question

One day, users reported errors in an online product. We quickly found that a new service had been connected through DataWorks (call its group bad_group), and that its synchronization task had unexpectedly imported tens of millions of rows into the MySQL table that day (only a few thousand of which were actually valid; the rest was dirty data), making online queries slow and causing errors. After locating the problem, the first reaction was to clear out all the bad_group data while keeping the other groups' data, restore online queries, and then slowly work out how to re-import the correct data.

Incidentally, all the SQL below was executed through the DMS platform.

2. Initial ideas

Clean up the error data v1

DELETE FROM MY_TABLE
WHERE `group` = 'bad_group';

Is it feasible to just execute the SQL above as an ordinary data change? No. Experienced readers will know that at the tens-of-millions scale, deleting that much data in one go will exceed the binlog size limit, and the SQL cannot be executed.
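One common workaround, and essentially what a lock-free change automates, is to delete in bounded batches so that no single transaction produces an oversized binlog entry. A sketch (the batch size of 10,000 is an arbitrary example):

```sql
-- Repeat this statement until it reports 0 rows affected; each run
-- deletes at most 10,000 rows, keeping every transaction small.
DELETE FROM MY_TABLE WHERE `group` = 'bad_group' LIMIT 10000;
```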

So we went straight to another option: a lock-free data change, with the same SQL as above. A description of lock-free changes can be found in the platform's introduction.


I assumed the lock-free change would solve the problem, but during execution it turned out that, with so much data, the batched SQL of the lock-free change was very inefficient; clearing the tens of millions of dirty rows was estimated to take more than two hours.

3. Take a different path

So another path had to be taken. Reconsidering the problem: of the 10 million rows, we only needed to keep the fewer than 100,000 rows outside bad_group. So rather than deleting the bad_group data, it is easier to copy the valid data into a temporary table, drop the original table, and then put the temporary table back in its place. Why dropping a table is faster than deleting its rows is also worth knowing.

After Java development changed the MySQL table, all the indexes disappeared!

To take a rough analogy: a landlord rents a house out, and when the lease ends finds it full of garbage. DELETE is picking up the garbage piece by piece, leaving the original clean furniture; TRUNCATE is a fire that burns everything inside the house; DROP means not wanting the house at all.

Either the drop or the truncate scheme would do here; we went with "not wanting the house" and dropped the table directly:

Clean up erroneous data v2

-- Copy the valid data into a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';

-- Drop the original table
DROP TABLE MY_TABLE;

-- Rename the temporary table to the original name
RENAME TABLE TEMP_TABLE TO MY_TABLE;

After it executed successfully, a count(*) showed the row count was indeed back to the normal level, so the problem was solved for the moment. But if it had really been that easy, this would never have been written up on ATA. The SQL above leaves a huge pit; experienced readers may spot it at a glance. If you don't, read on.

4. The table is broken

Everything was fine that day. But the good times didn't last. The next day, a colleague importing data found a problem: without an id being specified, every imported row ended up with id = 0. A big question mark appeared over my head.

Isn't id the auto-increment primary key? How could this be? I opened the table definition again and looked: all the indexes were gone!

My heart sank, and I immediately realized something must be wrong with this statement:

-- Copy the valid data into a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';

Quickly asked GPT:


Sure enough: CREATE TABLE ... AS SELECT copies only the table's column definitions and data; it does not copy the indexes, primary key, or other table metadata.
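This is easy to reproduce by hand. A minimal sketch (table and column names made up for illustration):

```sql
-- t1 has a primary key, a unique key, and AUTO_INCREMENT.
CREATE TABLE t1 (
  id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  code VARCHAR(32) NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY uk_code (code)
);

-- t2 gets only the column definitions and the data.
CREATE TABLE t2 AS SELECT * FROM t1;

-- The output lists bare columns: no PRIMARY KEY, no UNIQUE KEY,
-- and no AUTO_INCREMENT attribute on id.
SHOW CREATE TABLE t2;
```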

In other words, the table had been wrecked! Looking back at the problem, there were at least two ways to avoid it:

1) Don't use the DROP statement; use TRUNCATE instead, which keeps the original table structure.

Clean up the error data v3

-- Copy the valid data into a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';

-- Empty the original table's data without dropping the table
TRUNCATE TABLE MY_TABLE;

-- Insert the temporary table's data back into the original table
INSERT INTO MY_TABLE SELECT * FROM TEMP_TABLE;

2) Use a CREATE TABLE ... LIKE statement to create the temporary table, copying the original table's structure.

Clean up the error data v4

-- Create a temporary table with the same structure as the original
CREATE TABLE TEMP_TABLE LIKE MY_TABLE;

-- Copy the valid data into the temporary table
INSERT INTO TEMP_TABLE SELECT * FROM MY_TABLE WHERE `group` <> 'bad_group';

-- Drop the original table
DROP TABLE MY_TABLE;

-- Rename the temporary table to the original name
RENAME TABLE TEMP_TABLE TO MY_TABLE;

5. Maybe there's still a chance to salvage it

The situation being what it was, all we could do was see what could be rescued!


The missing primary key meant that many rows with id = 0 had been inserted. However, the application does not rely on MySQL's auto-increment id, so online query results were not affected for the time being. The luck within the misfortune: the product side had not yet felt anything, so we had to find a remedy quickly.

The table is synchronized in such a way that rows conflicting on the unique key are ignored and all others are imported. Because the primary key and unique key were missing, the newly imported batch of rows all had id = 0 and contained duplicates; in fact only part of them should be kept, and the rest deduplicated according to the unique key.

There were two things to accomplish at this point:

  • Rebuild the table's primary key, unique key, and query indexes while keeping the existing data.
  • Re-import the newly imported id = 0 data according to the original unique-key rule.

However, we know that when a statement adding a unique key is executed, MySQL checks whether any existing rows violate the key, and if so the statement is rejected. So with this batch of duplicate new rows interfering, we could not directly run an alter table add unique key.
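Before rebuilding the key, the conflicting rows can be inspected with a grouping query of this shape (a sketch against the abstracted structure):

```sql
-- Lists every (group, code) pair that appears more than once,
-- i.e. the rows that would block ADD UNIQUE KEY.
SELECT `group`, code, COUNT(*) AS cnt
FROM MY_TABLE
GROUP BY `group`, code
HAVING COUNT(*) > 1;
```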

In a flash of inspiration, we adopted the same temporary-table scheme as the day before: first copy the id = 0 rows to a temporary table, delete all id = 0 rows from the original table, then rebuild the indexes, and finally use insert ignore into to import the id = 0 rows back. The corresponding SQL:

Rebuild the table

-- 1. Copy the id = 0 rows to a temporary table
CREATE TABLE TEMP_TABLE AS SELECT * FROM MY_TABLE WHERE id = 0;

-- 2. Delete the id = 0 rows from the original table
DELETE FROM MY_TABLE WHERE id = 0;

-- 3. Rebuild the indexes
ALTER TABLE MY_TABLE ADD INDEX ...;

-- 4. Import the new id = 0 rows back
INSERT IGNORE INTO MY_TABLE SELECT * FROM TEMP_TABLE;

On reflection, using CREATE TABLE AS is fine this time, because this temporary table does not matter. The DELETE has no performance issue given the small amount of data. Out of caution, the four statements above were submitted and executed as four separate tickets, making it easy to observe the intermediate results. The thinking was clear; this time it should be fine!

When the second statement had finished and the id = 0 rows were deleted, I ran a quick select count(*) to confirm. Unexpectedly, this check really did turn something up: the row count had not changed after the DELETE. Some background: our online MySQL runs as primary/standby, and ticketed SQL executes on the primary; but to avoid impacting normal online use, the DMS console queries the standby by default, and the two are normally synchronized in real time. When a time-consuming SQL statement runs, however, replication lag can appear. To verify this, you can run select count(*) against the primary; DMS provides a toggle for choosing the instance, though it defaults to the standby.
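For what it's worth, replication lag can also be confirmed directly on the standby (in MySQL 8.0.22+ the statement and field were renamed to SHOW REPLICA STATUS and Seconds_Behind_Source):

```sql
-- Run on the standby instance: a large or NULL Seconds_Behind_Master
-- indicates that replication is lagging behind the primary.
SHOW SLAVE STATUS\G
```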


Consulting the DBA confirmed it: there was indeed replication delay.


We continued to rebuild the indexes, including the primary key, unique key, and ordinary query indexes. No problems there.
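Given the structure described earlier, the rebuild would look roughly like this (the index names are made up, and the real ordinary indexes depend on the actual query patterns):

```sql
-- Rebuild primary key, unique key, and an ordinary index in one pass.
ALTER TABLE MY_TABLE
  ADD PRIMARY KEY (id),
  ADD UNIQUE KEY uk_group_code (`group`, code),
  ADD INDEX idx_name (name);
```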

The last step: import the id = 0 rows from the temporary table back into the original table, and then I could go home and feed the cat. But the ticket kept failing with an error:

[ERROR] Duplicate entry '0' for key 'PRIMARY' [Solution]: https://help.aliyun.com/document_detail/198139.html
TraceId : 0b8464d617047224212725080d867f           

I couldn't understand why. Ideally, after re-importing, the ids should auto-increment from the current maximum (if the table holds 10,000 rows, newly inserted rows should start at id = 10001). Why were they still 0, and duplicated at that? (Normally, inserting an explicit 0 into an AUTO_INCREMENT column generates the next value, unless the NO_AUTO_VALUE_ON_ZERO SQL mode is set.) Could the earlier CREATE TABLE AS statement have reset the auto-increment counter to 0?

Following this idea, I remembered that when faking data in the daily environment, if you insert a row with a relatively large id, subsequent rows continue from that id (for example, with only 10 rows in the table and max id = 10, inserting a row with id = 100 makes later rows continue from id = 101). I tried that, and still got the error.

I'm getting a little sweaty.

Why didn't it work? I asked GPT how to set a table's auto-increment value:

ALTER TABLE MY_TABLE AUTO_INCREMENT = 10001;           

But this same error was still reported.

Despair.

By then it was almost ten o'clock at night. There was no one around, the hum of the air conditioner's hot air faded toward quiet, and I watched the lights flicker on and off in the building opposite. The January night was a little cold, and I suddenly remembered Li Qingzhao's line, "so cold, so clear, so sad, so dreary, so grieving"; wasn't it depicting exactly this scene?

Finally, I compared it once more against the normal table structure in the daily-environment database, and found that the AUTO_INCREMENT on the id column had also disappeared. It was yet another pit left by CREATE TABLE AS, which is why the earlier reset of the auto-increment value had not taken effect. Why hadn't I spotted this right away? Because, according to GPT's answer above, the statement copies the "column structure" normally and loses only the indexes, primary key, and similar metadata, and I had assumed AUTO_INCREMENT was part of the id column's definition. Apparently it is not.


To restore the auto-increment id:

ALTER TABLE MY_TABLE MODIFY COLUMN `id` bigint unsigned NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID';

At this point, the problem is solved.

3. Summary

All the problems stemmed from unfamiliarity with the CREATE TABLE AS statement, whose use cost the table its primary key, indexes, and AUTO_INCREMENT.

Unfamiliar SQL statements should not be allowed to run loose.

Looking back, I have also been reflecting on whether using drop and truncate online was a bit aggressive. At the time, though, the consideration was that this was an internal application and queries were already unavailable. Readers are welcome to think it over and give feedback on whether there are better ways to handle such scenarios.

Incidentally, we have since added restrictions on imports from ODPS into MySQL to prevent this kind of thing from happening again.

Author丨Warm porridge

Source丨Official Account: Ali Technology (ID: gh_438062fa21b1)

The DBAPLUS community welcomes contributions from technical personnel at [email protected]
