These days I am migrating one of my sites to a new platform, and I needed to download some data from a MySQL database. My hosting provider gave me cPanel access to the server, so I could go there, open the phpMyAdmin page, and manually pick the tables and data I want to download, but, …

I want to automate this task so that I can perform it again later (in case some more data is collected on my old server).

Installing necessary libraries

To start, I needed to install puppeteer and coffeescript. Installing puppeteer involves downloading the Chromium browser, so I prefer to install it globally once rather than installing it anew every time I need to automate some browser-based task.
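Assuming npm is available, the global install is a one-liner:

```shell
npm install -g puppeteer coffeescript
```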

Now, to import the globally installed puppeteer, we need to find the path to its location. This can be achieved with the following snippet:

Accessing cPanel

Using puppeteer, we first instantiate a browser and then log in to cPanel using the following function:
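A sketch of that function in plain JavaScript (the post's code is CoffeeScript; the form field ids #user, #pass, and #login_submit are assumptions — inspect your cPanel login page and adjust them):

```javascript
// page is the puppeteer Page created earlier, e.g.:
//   const browser = await puppeteer.launch({ headless: false });
//   const page = await browser.newPage();

async function login(url, user, password) {
  // Open the cPanel url and wait until the page has loaded
  await page.goto(url, { waitUntil: 'networkidle0' });

  // Runs inside the web page context: fill the login form and click Log in
  // (element ids are guesses -- adjust to your cPanel login page)
  const loginfunc = (user, password) => {
    document.querySelector('#user').value = user;
    document.querySelector('#pass').value = password;
    document.querySelector('#login_submit').click();
  };

  // Evaluate loginfunc in the page, then wait for the post-login navigation
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0' }),
    page.evaluate(loginfunc, user, password),
  ]);
}
```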

It takes three string arguments:

  • the cPanel url,
  • the user name,
  • the password.

It opens the url, waits until the page is loaded, fills the input fields on the page with the given user name and password, and finally clicks the login button. These actions are performed by the loginfunc function, evaluated inside the web page context using the page.evaluate method. Finally, we wait for the page to finish loading.

At the end, the page object holds the main cPanel page, with the given user logged in.

Getting phpMyAdmin url and token

On the main cPanel page, we can open phpMyAdmin using the relative url sql/PhpMyAdmin.html. But after loading this page, our browser is redirected to a dynamically generated url which depends on the current session and a randomly generated token. Here is the function to extract those two values:
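A sketch in plain JavaScript (the original is CoffeeScript), assuming the redirected url — as returned by page.url() — looks something like https://host:2083/cpsessXXXX/3rdparty/phpMyAdmin/index.php?…&token=YYYY; check your own redirect and adjust:

```javascript
// Python-like partition: split on the FIRST occurrence of sep
function partition(s, sep) {
  const i = s.indexOf(sep);
  return i < 0 ? [s, '', ''] : [s.slice(0, i), sep, s.slice(i + sep.length)];
}

// Python-like rpartition: split on the LAST occurrence of sep
function rpartition(s, sep) {
  const i = s.lastIndexOf(sep);
  return i < 0 ? ['', '', s] : [s.slice(0, i), sep, s.slice(i + sep.length)];
}

// Extract the phpMyAdmin base url and the security token
// from the url the browser was redirected to
function extractUrlAndToken(currentUrl) {
  // Everything before the last '/' is the phpMyAdmin base url
  const [base] = rpartition(currentUrl, '/');
  // The token is the value of the 'token' query parameter
  const [, found, rest] = partition(currentUrl, 'token=');
  const token = found ? partition(rest, '&')[0] : '';
  return { url: base, token };
}
```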

This function uses two helper functions, partition and rpartition, written after their Python namesakes. Both take two strings as input: a string to be partitioned and a separator. They return exactly three strings which, when concatenated, reproduce the original input string. The first returned string is the part of the input before the first occurrence of the separator, the second is the separator itself (if it was found), and the third is the part of the input after that first occurrence.

The rpartition function has the same meaning, except that it splits on the last occurrence of the separator in the input string.

Downloading table rows

Using the phpMyAdmin url and token values we just found, we can now open the export page. Its relative url is tbl_export.php, and it needs a few query parameters:

  • db - database name
  • table - table name
  • token - the security token

Using all these values, we construct the url, navigate our page there, and wait for the page to load.

After the page has loaded, we fill in some input fields to select the export options (gzip compression, all rows from the table, and JSON format). If you need different options, adjust this function to your needs.

At the end, the Go button is clicked to start the download.
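Putting these steps together, a sketch in plain JavaScript (the post's code is CoffeeScript; the form selectors and option values are assumptions — inspect the tbl_export.php page of your phpMyAdmin version and adjust):

```javascript
// Build the export url from the base url, token, and table coordinates
function buildExportUrl(baseUrl, token, db, table) {
  return `${baseUrl}/tbl_export.php?db=${encodeURIComponent(db)}` +
         `&table=${encodeURIComponent(table)}&token=${encodeURIComponent(token)}`;
}

// page is the puppeteer Page used earlier
async function export_table(baseUrl, token, db, table) {
  // Navigate to the export page and wait for it to load
  await page.goto(buildExportUrl(baseUrl, token, db, table),
                  { waitUntil: 'networkidle0' });

  // Select the export options inside the page: JSON format, all rows,
  // gzip compression (element ids below are guesses)
  await page.evaluate(() => {
    document.querySelector('#plugins').value = 'json';
    document.querySelector('#radio_dump_all_rows').checked = true;
    document.querySelector('#compression_gzip').checked = true;
  });

  // Click the Go button to start the download
  await page.click('#buttonGo');
}
```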

Tables one by one

If we have a list of tables we want to download, we have to process them one by one. If we simply looped over all table names and called our export_table function for each, we would change the table name parameter before the previous download had finished, and we would end up with only the last table downloaded. We must make sure not to touch the page object again before a download finishes. So we need to process the table names one by one, waiting for each download to finish before starting the next. One way to do this is to take a single table name, remove it from the list, and, once its download has finished, start a new iteration, repeating until the list of table names is empty.

In order to use the await keyword, this code needs to live inside an async function. So we create one and immediately execute it. Here is the code to achieve this:
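A sketch of that wrapper in plain JavaScript (the post uses CoffeeScript; download_all and the table names are illustrative, and export_one stands for a call to the export function with the url, token, and database already bound):

```javascript
// Process the tables one by one: take a name off the list, export it,
// and only then start the next iteration
async function download_all(tables, export_one) {
  while (tables.length > 0) {
    const table = tables.shift();  // remove one table name from the list
    await export_one(table);       // wait for this table's download
  }
  // Wait a little longer for the final download to complete
  await new Promise((resolve) => setTimeout(resolve, 1000));
}

// Create an async function and immediately execute it, e.g.:
// (async () => {
//   await download_all(['table_a', 'table_b'],
//                      (t) => export_table(url, token, db, t));
//   await browser.close();
// })();
```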

It is also necessary to wait a little longer at the end for the final download to complete. Here we wait one second (1000 ms), but you can adjust this waiting time to your needs.

I hope this can help someone looking for a way to automate a sequence of actions in a browser. For other tasks that don't involve downloads, the browser can be launched in headless mode (headless: true), but for some reason downloading files doesn't work properly in headless mode.