This is awesome! We’ve been having multiple issues with data loads recently. Especially one file we get monthly, which is not very wide (few columns, with small data types), but has over 27 millions rows. Even though our server is now huge, and I split the file into 7M row files, one of the “chunks” still almost brought down our database this month. About an hour into the load (the first two chunks, which loaded with no issues, took about 2.5 hours), all inserts started timing out. Looking at the monitoring graphs on RightScale, the cpu’s were pegged all waiting for disk response. Selects were still being returned, but all inserts had stopped. Weirdly, while new inserts were timing out, two inserts and one update were just hanging. Unfortunately, killing the load didn’t have any immediate affect as it needed to rollback the hour of changes it had made (don’t ask me why MySQL does a “load data infile” within a transaction; seems much smarter if it weren’t, but anyway …). Luckily, killing the three “hanging” processes did allow other inserts to start up again. And, after about 30 minutes, the data load process finally went away.
So, I went a-googling and found this article. We tried it on one file made up of the two remaining “chunks”, so around 13M rows (1.3GB). We set the fifo to 500K, and a delay of 30 seconds between each. Not only was the load on the db amazingly lower, but (most likely since memory wasn’t filling forcing the system to start swapping) the load was amazingly fast, loading everything in just over an hour. Yes, that’s right, from 2.5 hours for 7M rows to (I’ll round up and say) 1.5 hours for 13M rows.
Needless to say, I’m geeked-out about this.
Now, we just need to figure how to turn this into a repeatable monthly process.
Comments are closed.