天天看點

大量删除mysql優化,MySQL删除語句優化

大量删除mysql優化,MySQL删除語句優化

i have some delete queries to run against some pretty huge table (~100 GB), and i want to optimize them as much as possible:

delete from table1 where column1 < date_sub(now(), interval 100 hour);

column1 is a datetime column, i assume making an index for this column will speed up the deletions. besides that, anything i can do here? will using the date_sub() function slow down the query? should i calculate that value before running the query?

delete from table2 where column2 = x;

column2 is the primary key for table2, so it's already an index according to the mysql documentation. my question is: the index kind is PRIMARY, is that same as the INDEX? do i have to make another index of the kind INDEX for speeding up?

delete from table3 where column3 = y;

table3 has a compound primary key, which is column3 and column4. so i have an primary key index, but since the delete query doesn't use column4, should i make a separate index just for column3? or the combined primary key would do it?

i guess these are pretty basic questions, but i couldn't find a definite answer specific to my situation, so any help would be appreciated!

解決方案

If your DELETE is intended to eliminate a great majority of the rows in that table, one thing that people often do is copy just the rows you want to keep to a duplicate table, and then use DROP TABLE or TRUNCATE to wipe out the original table much more quickly.

An index may help to find the rows you need to delete, but deleting requires updating the index. After deleting a lot of rows, the index may be imbalanced and requires some maintenance with OPTIMIZE TABLE.

The DATE_SUB() function is a constant expression (it does not vary row by row) so the query optimizer should be smart enough to factor it out and perform the calculation once.

You don't need to create an extra index for a primary key. The primary key constraint implicitly creates an index that gives the same benefit as a non-primary key index.

A compound index is probably just as useful as a single-column index, provided your search criteria reference the leftmost column(s) of the index. The "probably" caveat is due to the individual index nodes being larger and so it takes more memory to cache the index, but this is a small enough factor that I wouldn't create a whole other single-column index.