Percona Toolkit: an all-around solution to all of your MySQL troubles

Shobhit Sahoo, SDE-II, Backend, Shadowfax

“Behind every successful application is a clean and well-maintained database.”

Why pt-archiver?

“In the realm of database management, Percona Toolkit is the Swiss Army knife for administrators, unlocking unparalleled efficiency.”

Clean-up activities such as purging or moving data are costly and time-consuming when done with hand-written SQL queries. That is where Percona Toolkit comes in handy. It can perform complex tasks for MySQL, MariaDB, MongoDB, and PostgreSQL servers and systems, and was originally designed for advanced database and system administration tasks that one would not want to do manually. Percona Toolkit is free and open source; you can download packages from the Percona website. It is derived from Maatkit and Aspersa, two of the best-known toolkits for MySQL server administration.

Maintaining a database at an organizational scale takes a lot of effort. While looking for ways to reduce the pain of moving and purging data, we came across Percona Toolkit. We used one tool in particular, pt-archiver, for all the database-related work to be done on our MySQL databases.

The tool we have used extensively for data purging and data migration is pt-archiver. It either removes rows from a table entirely or migrates them from one table to another.

Setting up and installing the toolkit is easy; the trouble begins when running the script for the data migration or purge. We struggled there at first because we knew too little about the variety of options that ship with the tool to mitigate the risks of system-wide stress on the database, which would cause the application to fail.

On Debian-based systems, the installation is a single command:

apt-get install percona-toolkit
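
Percona also publishes packages for RPM-based distributions (yum install percona-toolkit), although you may need to set up the Percona repository first. Once installed, a quick way to confirm the tools are on your PATH:

pt-archiver --version
pt-archiver --help

The --help output lists all of the options discussed in this post.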

Why did we need the pt-archiver?

We needed the pt-archiver tool for the following purposes:

  1. We needed to purge certain tables no longer used by our application. Simple MySQL DELETE statements won't suffice: they are very slow and put the database machine under stress, causing the application to break on the end user's side.
  2. We needed to back up major tables whose row counts run into the millions or even billions. Those backups are stored in Amazon Web Services (AWS) S3 (a sketch of this flow appears after this list).
  3. We also needed to create new partitioned versions of old unpartitioned tables and migrate the data from the old tables to the new ones.
  4. Finally, once the tables are partitioned, we do not need all of the data in the databases all the time, so we wanted to automate a process that takes chunked backups of the partitioned tables, stores them in S3, and purges that data from the tables.

All four of these use cases were met very satisfyingly using pt-archiver. It made the data manipulation effective, easy, and efficient, without causing system-wide downtime.
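
For the backup use cases (2 and 4), pt-archiver can write the archived rows to a flat file instead of a destination table via its --file option; per the tool's documentation, the file name accepts strftime-style placeholders plus %D and %t for the database and table names. A minimal sketch, assuming a configured AWS CLI and hypothetical paths and bucket names:

# write matching rows to a dated file; add --no-delete to keep them in the source table
pt-archiver --source h=<host_name>,D=<db_name>,t=<table_name>,u=<username>,p=<password> --file '/backups/%Y-%m-%d-%D.%t' --where '<condition>' --limit 1000 --commit-each --no-check-charset

# upload the resulting file to S3
aws s3 cp /backups/<date>-<db_name>.<table_name> s3://<bucket>/<prefix>/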

How do we use the pt-archiver?

Purging table data using pt-archiver

The command we used for purging data from a table was as follows:

pt-archiver --source h=<host_name>,D=<db_name>,t=<table_name>,u=<username>,p=<password> --purge --where '<condition for which section of the table to choose from based on the primary key index>' --bulk-delete --limit 100 --commit-each --sleep 2 --dry-run --no-check-charset

The above command deletes rows from the specified table in bulk, in batches of 100 at a time. Options such as --limit, --sleep, and --dry-run exist so that the command does not disrupt the normal functioning of the system.

--limit caps how many rows satisfying the --where condition are fetched and processed per batch.

--sleep 2 makes the tool sleep for 2 seconds after every batch purge.

--bulk-delete removes each batch with a single DELETE statement instead of row-by-row deletes.

--commit-each commits after each batch is fetched and archived.

--dry-run prints the queries the script would execute for the data purge, then exits without running them.

Make sure the column or field on which the condition is placed is the primary key or part of the primary key. Also make sure the user has all the permissions necessary to run the script.
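
As a concrete illustration with hypothetical host, table, and user names, purging an orders table by its auto-increment primary key could look like this; keep --dry-run while reviewing, then drop it once the printed queries look right:

# id is the table's primary key, so the --where condition can use the index
pt-archiver --source h=db.example.com,D=shop,t=orders,u=archiver,p=secret --purge --where 'id < 5000000' --bulk-delete --limit 100 --commit-each --sleep 2 --dry-run --no-check-charset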

Moving data between a source and a destination

Here we move data from a source table in one database to a destination table in another, which requires the connection details of both sides. Such a move can produce a good amount of replica lag, and since multiple business scripts run against our replicas, we cannot afford a large lag between the master and its slave replicas. The script must always run against the master, not a replica, since these are all write queries. The command used for moving the data is as follows:

pt-archiver --source h=<source_hostname>,D=<source_database_name>,t=<source_table_name>,u=<source_username>,p=<source_password> --dest h=<destination_hostname>,D=<destination_database_name>,t=<destination_table_name>,u=<destination_username>,p=<destination_password> --where '<where condition>' --bulk-insert --check-slave-lag h=<slave_hostname>,u=<slave_username>,p=<slave_password> --max-lag 2 --limit 10000 --commit-each --sleep-coef 1.0 --no-check-charset

Here we specify the details of the source and of the destination to which the table data is to be moved. The original table was unpartitioned; it was renamed to <table_name_old>, and a new partitioned table named <table_name> with the updated schema was created in its place.

The rider_location table, which stores the riders' locations, held around 2-3 billion rows, and running a partitioning query on it in MySQL would have brought the systems down. What we did instead was create a new partitioned table that keeps the past month's partitions, with a new partition created every day, and then moved the data from the old table to the new one. That saved us time, money, and resources.
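
A minimal sketch of what such a daily-partitioned table could look like; the column names here are hypothetical, and MySQL requires the partitioning column to be part of every unique key, which is why created_at appears in the primary key:

mysql -h <host_name> -u <username> -p <db_name> <<'SQL'
CREATE TABLE rider_location (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  rider_id   BIGINT UNSIGNED NOT NULL,
  latitude   DECIMAL(9,6)    NOT NULL,
  longitude  DECIMAL(9,6)    NOT NULL,
  created_at DATETIME        NOT NULL,
  PRIMARY KEY (id, created_at)
)
PARTITION BY RANGE (TO_DAYS(created_at)) (
  -- a scheduled job adds the next day's partition and drops those older than a month
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03'))
);
SQL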

In the above command, you can see the --check-slave-lag option, which takes care of the replication lag. Hitting the master DB with a resource-intensive query through pt-archiver causes a significant lag on the slave replicas, which affected the app used by the rider fleet; any such instance stresses the system's performance and the end user suffers. In the script above, the replication lag has been capped at a maximum of 2 seconds: with --max-lag 2, pt-archiver pauses between batches until the replica named in --check-slave-lag catches up, and --sleep-coef 1.0 additionally makes it sleep for as long as the previous SELECT took. Over the whole monitoring period, the lag peaked at 3 seconds and never went beyond that. For the slave user and password to work, the user needs the REPLICATION CLIENT privilege.
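
Granting that privilege is a one-liner from the mysql client; the user name below is whatever you pass to --check-slave-lag:

mysql -h <slave_hostname> -u root -p -e "GRANT REPLICATION CLIENT ON *.* TO '<slave_username>'@'%';"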

For more content on pt-archiver and the other tools in the Percona Toolkit, please refer to:

https://docs.percona.com/percona-toolkit/pt-archiver.html

https://docs.percona.com/percona-toolkit/index.html

Results

“A clean database is a silent guardian, ensuring the integrity of your data.”

  1. With the above use cases handled, our database's storage is now well managed.
  2. We no longer have stale, unused data lying around in our tables.
  3. All major tables are backed up, and the purging of old partitions and the creation of new ones are fully automated.
  4. Alerts are in place, so any shortcoming is reported to the technology team as soon as it occurs.
