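An application that goes down because it can no longer write to a table whose auto-increment column has hit its maximum allowed value may be one of a DBA’s worst nightmares. In MySQL, the problem typically surfaces as a duplicate-key error (ERROR 1062) on the maximum value, for example 2147483647 for a signed INT column, or as ERROR 1467, “Failed to read auto-increment value from storage engine”.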

While the solution can be easy and fairly quick for small tables, it may be really daunting for big ones. A common table reaching two billion rows may range in size from a few to hundreds of gigabytes, depending on the number and size of its columns.
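To put the two-billion figure in context, the ceiling of each integer type follows directly from its storage size. The snippet below is a small illustrative sketch; the helper name `max_value` is made up for this example and is not part of any MySQL tooling.

```python
# Storage size in bytes of MySQL integer types.
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def max_value(column_type: str, unsigned: bool = False) -> int:
    """Return the largest value an integer column of this type can store."""
    bits = INT_BYTES[column_type] * 8
    # A signed column spends one bit on the sign.
    return 2**bits - 1 if unsigned else 2 ** (bits - 1) - 1


for name in ("int", "bigint"):
    print(f"{name:>6} signed:   {max_value(name):,}")
    print(f"{name:>6} unsigned: {max_value(name, unsigned=True):,}")
```

A signed INT tops out at 2,147,483,647, while the unsigned variant reaches 4,294,967,295 using the same four bytes.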

ALTER TABLE problem

Suppose your big table has reached the maximum auto-increment value for INT (signed by default). You’d think the immediate fix would be to change the type to unsigned, which doubles the available value range without changing the column’s storage requirements. So, for a simple example table with such a column, we try to change it quickly using the online method (ALGORITHM=INPLACE, LOCK=NONE), and MySQL rejects the request.

That’s right: a seemingly simple operation, which we could expect to be a metadata-only change, is not allowed in a non-blocking manner in MySQL, and this limitation still applies in version 8.4.2. It is handled as a general column data type change, so the operation rebuilds the whole table and does not allow any writes until it is done, as documented here:

https://dev.mysql.com/doc/refman/8.4/en/innodb-online-ddl-operations.html#online-ddl-column-operations

When you decide to run the ALTER anyway, all writes will wait, typically showing a state such as “Waiting for table metadata lock” in the process list.

The community filed an improvement request a long time ago, but it has yet to be addressed: https://bugs.mysql.com/bug.php?id=86695
Similar work is planned for MariaDB: https://jira.mariadb.org/browse/MDEV-16291

This certainly can be a huge problem in a situation where the ALTER command is going to take many hours.

Production down problem

So, if your application depends on the ability to insert data, exhausting auto-increment values virtually means the system is down. Given the lack of a quick and online ALTER feature to extend the data type, there is no quick way out!

Will pt-online-schema-change or gh-ost help to get the table back into production quickly? These tools could be helpful only if the application can function without INSERTs but would still benefit from the ability to run UPDATE and DELETE queries or when the affected table can be taken offline for a while. This is because using one of these tools would give better control over the overall performance impact as both can pause in case of high load or replication lag.

The two most popular approaches in this hopeless situation are either to declare downtime and simply run ALTER TABLE to extend the data type, or to switch to a new table.

The second solution typically involves creating a new table with an extended auto-increment column and swapping it with the original one via the RENAME command. This assumes production can work for some time without the historical data.

When creating the new table, we need to give it a starting auto-increment point higher than the highest value used in the original table, to avoid data conflicts once the historical rows are synced back.

Sync the historical data

Now, the next step is to sync the data from the old table. I highly discourage copying everything with a single INSERT ... SELECT statement.

It would create a huge transaction, causing all sorts of performance problems and likely even failing, depending on the table size and environment. Convenient tools let you do it in a more controlled way.
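The general idea behind a more controlled copy is to split the work into bounded primary-key ranges, so that each batch is a small transaction of its own. The sketch below only illustrates that idea; it is not how the tools described next are implemented, and the table names `t1` and `t1_old` as well as the helper `id_chunks` are hypothetical.

```python
from typing import Iterator, Tuple


def id_chunks(min_id: int, max_id: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) primary-key ranges covering min_id..max_id."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield start, end
        start = end + 1


# Hypothetical table names; each statement would run as its own transaction.
for start, end in id_chunks(1, 25, 10):
    print(f"INSERT INTO t1 SELECT * FROM t1_old WHERE id BETWEEN {start} AND {end};")
```

A real copy job would also pause between batches when load or replication lag grows, which is exactly what the tools below can take care of.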

MySQL shell dump/import utils

This tool will allow you to import the table rows very fast by utilizing bulk export and import via a non-SQL format (TSV) and automatically splitting the data into configurable chunks for fast multi-threaded processing. This is one of the fastest logical dump and restore tools out there, so it’s a great candidate for syncing the data from an old table as fast as possible.

Example export and import sessions may look like this:

And import to the new table:

This method is very fast, but it offers no option to throttle the import to avoid causing too much load or replication lag in a production database. Therefore, I submitted this feature request: https://bugs.mysql.com/bug.php?id=116886

Pt-archiver

With this tool from the Percona Toolkit suite, you can not only archive old data but also migrate it. In this case, we can copy old table rows to the newly recreated table:

This tool does the job in a single thread, so it is slower, but it can monitor the replication lag and pause importing if needed. To allow this, use --check-slave-lag pointing to the replica’s connection details (DSN).

Better safe than sorry!

It is always better to avoid the problem rather than fight with a system-down situation under time pressure!

You can quickly check how close your tables are to reaching the maximum auto-increment value with one query (credits to the openark.org blog). I verified that it works on MySQL 5.7, 8.0, and 8.4, as well as MariaDB 10.x and 11.x. Unfortunately, such a query can be quite expensive if the database instance holds many tables, such as hundreds of thousands or more.

The query and an example result are as follows:

The ratio value allows us to see how close it is to reaching the maximum value. In the above example, two tables have already reached it.
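The ratio itself is simple arithmetic: the table’s next auto-increment value divided by the maximum of the column’s data type. The following sketch reproduces that calculation on made-up sample data; the table names, values, and the 0.8 alert threshold are assumptions for illustration only.

```python
# Storage size in bytes of MySQL integer types.
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def max_value(column_type: str, unsigned: bool = False) -> int:
    """Return the largest value an integer column of this type can store."""
    bits = INT_BYTES[column_type] * 8
    return 2**bits - 1 if unsigned else 2 ** (bits - 1) - 1


def auto_increment_ratio(next_value: int, column_type: str, unsigned: bool = False) -> float:
    """Fraction of the column's value range that is already used up."""
    return next_value / max_value(column_type, unsigned)


# Hypothetical sample data: (table, next auto-increment value, type, unsigned).
tables = [
    ("orders", 2147483647, "int", False),
    ("events", 2147483647, "int", True),
    ("audit_log", 1200000, "bigint", False),
]
ALERT_THRESHOLD = 0.8  # assumed alerting threshold, tune to your growth rate

for table, next_value, column_type, unsigned in tables:
    ratio = auto_increment_ratio(next_value, column_type, unsigned)
    flag = "ALERT" if ratio >= ALERT_THRESHOLD else "ok"
    print(f"{table:<10} {ratio:.4f} {flag}")
```

Note how the same counter value that exhausts a signed INT column leaves an unsigned one only about half used.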

It is a good idea to set up continuous monitoring of this ratio and, ideally, alerting. Percona Monitoring and Management (PMM) already has a dashboard showing the auto-increment usage:

PMM Dashboard

Virtually synchronous clustering accelerates the problem!

The default increment step for traditional MySQL replication is 1. But when MySQL Group Replication is used in multi-master mode, it is 7! Check the group_replication_auto_increment_increment variable for details. This is another of the many reasons for using the single-primary mode. Auto-increment values will get depleted insanely quickly in multiple writers mode:

Similarly, in PXC / Galera, the auto-increment step gets automatically adjusted to the number of cluster nodes. As there is no built-in single-primary mode, you have to turn off this automation (the wsrep_auto_increment_control variable) and set the increment to 1 explicitly whenever you use a single-writer scenario, for instance, via an external proxy.
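A quick back-of-the-envelope calculation shows the effect of the increment step on a signed INT column. This is a rough sketch: the helper `ids_per_writer` is hypothetical, and it assumes a single writer that starts at offset 1 and skips ahead by the increment on every insert.

```python
SIGNED_INT_MAX = 2**31 - 1


def ids_per_writer(max_value: int, increment: int, offset: int = 1) -> int:
    """Count the values offset, offset+increment, ... that do not exceed max_value."""
    if offset > max_value:
        return 0
    return (max_value - offset) // increment + 1


for increment in (1, 3, 7):
    count = ids_per_writer(SIGNED_INT_MAX, increment)
    print(f"increment={increment}: {count:,} values per writer")
```

With a step of 7, one writer gets roughly 307 million usable values instead of 2.1 billion, so a cluster where most inserts land on a single node burns through the range about seven times faster.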

Foreign constraints nightmare

When the auto-increment column is referenced in other tables via foreign key constraints, the situation gets way more complicated. MySQL will not allow even slightly different data types between the referencing and referenced columns, and an attempt to modify the type on one table triggers an error.

The ALTER command does not support changing multiple tables simultaneously either.

To address this problem, you will need to:

  • stop writes
  • drop the related referential constraints
  • modify the auto-increment column as well as all the referencing columns in the related tables to match the data type
  • re-create the FK constraints
  • enable writes
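The ordering of the DDL steps above can be sketched as a small statement generator. Everything here is illustrative: the table, column, and constraint names are hypothetical, the referencing columns are assumed to be NOT NULL, and any ON DELETE / ON UPDATE clauses of the original constraints would have to be carried over by hand. Stopping and re-enabling writes happens outside of this sketch.

```python
from typing import List, NamedTuple


class ForeignKey(NamedTuple):
    child_table: str
    child_column: str
    constraint: str


def widen_statements(parent: str, parent_column: str, new_type: str,
                     foreign_keys: List[ForeignKey]) -> List[str]:
    """Return the ALTER statements in the order they need to run."""
    stmts = []
    # 1. Drop every constraint that references the auto-increment column.
    for fk in foreign_keys:
        stmts.append(f"ALTER TABLE {fk.child_table} DROP FOREIGN KEY {fk.constraint};")
    # 2. Widen the auto-increment column itself.
    stmts.append(
        f"ALTER TABLE {parent} MODIFY {parent_column} {new_type} NOT NULL AUTO_INCREMENT;"
    )
    # 3. Widen every referencing column to the same type (assumed NOT NULL).
    for fk in foreign_keys:
        stmts.append(
            f"ALTER TABLE {fk.child_table} MODIFY {fk.child_column} {new_type} NOT NULL;"
        )
    # 4. Re-create the constraints.
    for fk in foreign_keys:
        stmts.append(
            f"ALTER TABLE {fk.child_table} ADD CONSTRAINT {fk.constraint} "
            f"FOREIGN KEY ({fk.child_column}) REFERENCES {parent} ({parent_column});"
        )
    return stmts


fks = [ForeignKey("order_items", "order_id", "fk_items_order")]
for stmt in widen_statements("orders", "id", "BIGINT UNSIGNED", fks):
    print(stmt)
```

Each of these ALTER statements that changes a data type still rebuilds its table, so the total downtime grows with the number and size of the related tables.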

Summary

  • Start monitoring your tables’ auto-inc use if you are not doing it already.
  • Check if your deployments should use unsigned datatype (to double the available range at no cost).
  • Verify if your applications can temporarily operate without the historical data when considering a quick table re-creation+rename+sync solution.
  • If any table is close to reaching the maximum, consider using the pt-online-schema-change tool to extend the datatype as soon as possible.
  • Use single-primary settings for clustering.

 
