Recently, I read an article about relational databases and why they are bad. I have my own thoughts about it and I want to share my experience with you.
A simple alter table, right?
Some time ago, in one of the projects I was working on, we had a simple table with 5 columns. The columns were 5 ints. But, the time goes on and in this small table, we had millions of records. Only this table took gigabytes on a hard drive. One of the columns were not used anymore and most of the time it contained a zero (default value). To save space on the hard drive, we decided to remove the column. It sounds like a simple task, isn’t it? But it wasn’t…
As the database engine, we used Percona, a MySQL opensource port with few extra tools (I wrote a post about indexes in MySQL and Percona). The situation becomes more complicated when we read how altering tables work in MySQL. Direct MySQL ALTER table locks the table for the duration of the change. It means, you can still read data from the table but any INSERT, UPDATE or DELETE command will have to wait until the process ends. In our case, it could take even 2-3 months! To clarify – our production would be down up to 3 months only because this one ALTER command.
Fortunately, Percona has his toolkit and we used it as the workaround for the problem. The tool does not lock any table but creates a copy of the table with the new schema in the background and makes every query twice – in both new and old table. This solution has one disadvantage. If your table takes 100 gigabytes on the hard drive, you have to have at least an extra 100 gigabytes of free space on the hard drive.
I know it’s an antipattern to have so huge tables but we had one 🙂
When the integer ends
After some time we had another problem. One of the columns (with the record ID) was the primary key and its type was int(4) (autoincrement). We had so many records that the integer almost riched his maximum value which was: 4294967295. The solution we found was changing its type to bigint int(8) (max value: 9223372036854775807). Do you remember how long it took to alter the table? 🙂
A very similar problem had youtube when Gangnam Style overflowed their counter.
What makes it even more difficult, we had less than 3 months until it reaches the maximum value. We were lucky that someone noticed that we do not use every value in the primary key. There were some gaps. Why? MySQL database during executing INSERT command with some JOINs creates a temporary table to make his life easier. We used this fact to get more time and at the end of the day – we saved the world.
Why stuff like that happens? In my opinion, because the world does not work this way. Not every record should have an identifier. In the MySQL database when you create a table without any index, MySQL will create for you but you won’t see it. The schema evolves over time and its change should not affect existing records. NoSQL databases like MongoDB or Cassandra do not have such problems. They have their own 🙂
Have you had similar adventures with relational or NoSQL databases? Share with us your story in the comments below.