How to compress files much faster with pigz

Published by botond on March 02, 2019 (Thu), 14:08

 

Introduction

If you compress files and directories frequently, compression speed matters too. Unfortunately, the most common compression programs run on a single thread, so they cannot take advantage of modern multi-core processors. In this guide, we look at the pigz program, which runs the compression on multiple threads and thereby uses all the processor cores.

 

 

The pigz compression program

pigz (parallel implementation of gzip) is a parallel version of gzip that can fully replace it. Its main extra is that it starts a thread on each processor core, so it can compress files and directories at a much higher speed.

Installing

The source code of the program can be downloaded from the author's website, but fortunately it is also available in the Debian repository, so you can install it the usual way:

apt-get install pigz

The command-line switches of pigz include those of gzip, so they are used the same way as with gzip.

Compression

The simplest example, compressing a single file:

pigz <filename>

This command works just like gzip: by default, it removes the source file.

Another example, with some switches:

pigz -kv11f <file>

The switches here are:

  • k: keep - Keeps the source file (as in gzip)
  • v: verbose mode. It works the same way.
  • 11: Maximum compression. With gzip, the compression level can only be set between 1 and 9; pigz additionally offers a separate level 11, which uses the much slower Zopfli algorithm.
  • f: force mode. Overwrites the output file if it already exists. Without this switch, you are asked for confirmation.
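
Because pigz writes the standard gzip format, its output can be checked with plain gzip. The following is a minimal sketch, not part of the original measurements: the file name sample.txt is made up, and level 9 is used instead of 11 so that the script can fall back to gzip when pigz is not installed.

```shell
#!/bin/sh
# Work in a throwaway directory; sample.txt is a hypothetical demo file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 100000 > sample.txt

# Fall back to gzip if pigz is missing, so the sketch still runs.
if command -v pigz >/dev/null 2>&1; then zip_cmd=pigz; else zip_cmd=gzip; fi

# -k keeps sample.txt, -f overwrites an existing sample.txt.gz,
# -9 is the highest level that gzip understands as well.
"$zip_cmd" -kf9 sample.txt

# gzip -t tests the integrity of the archive without unpacking it.
gzip -t sample.txt.gz && echo "archive OK"
ls -l sample.txt sample.txt.gz
```

Both the source file and the .gz file should be listed at the end, since -k was given.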

Compression runs on multiple threads by default. Unless specified otherwise, pigz starts as many threads as the number of processor cores it detects. If you want to run the compression on fewer threads, you can set the number yourself:

pigz -p <x> <filename>

Here x is the number of threads you want to start.
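
For example, to leave half of the cores free for other work, the thread count can be derived from the core count. This is only a sketch: it assumes the coreutils nproc command is available, and bigfile is a placeholder name that is not used elsewhere in this article.

```shell
#!/bin/sh
# nproc (coreutils) prints the number of available processing units.
cores=$(nproc)

# Use half of them, but never fewer than one thread.
threads=$(( cores / 2 ))
[ "$threads" -ge 1 ] || threads=1
echo "Using $threads of $cores cores"

# "bigfile" is a placeholder; compress it only if it exists and pigz is installed.
if command -v pigz >/dev/null 2>&1 && [ -f bigfile ]; then
    pigz -k -p "$threads" bigfile
fi
```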

And another example, the recursive compression of a subdirectory:

pigz -r <subdirectory>
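
Note that -r compresses every file in the directory tree separately; it does not produce a single archive file. A small sketch with a throwaway directory shows this. The names demo, a.txt and b.txt are made up for the illustration, and the script falls back to gzip, which has the same switch, if pigz is not installed.

```shell
#!/bin/sh
# Build a small directory tree in a temporary location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p demo/sub
echo "first" > demo/a.txt
echo "second" > demo/sub/b.txt

# Fall back to gzip if pigz is missing, so the sketch still runs.
if command -v pigz >/dev/null 2>&1; then zip_cmd=pigz; else zip_cmd=gzip; fi

# Descend into demo and compress each regular file in place.
"$zip_cmd" -r demo

# List what is left in the tree.
find demo -type f | sort
```

The listing should contain only demo/a.txt.gz and demo/sub/b.txt.gz: each source file has been replaced by its own compressed copy.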

Unpacking

Unpacking works the same way as with gzip:

pigz -d <archive file>
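
A quick way to convince yourself that nothing is lost is a round trip: compress a copy, unpack it again, and compare it with the original. The sketch below uses made-up file names (original.txt, copy.txt) and falls back to gzip if pigz is not installed.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 50000 > original.txt
cp original.txt copy.txt

# Fall back to gzip if pigz is missing, so the sketch still runs.
if command -v pigz >/dev/null 2>&1; then zip_cmd=pigz; else zip_cmd=gzip; fi

"$zip_cmd" copy.txt          # produces copy.txt.gz and removes copy.txt
"$zip_cmd" -d copy.txt.gz    # restores copy.txt and removes copy.txt.gz

# cmp prints nothing and returns 0 when the files are byte-for-byte identical.
cmp original.txt copy.txt && echo "round trip OK"
```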

 

Tests

In this section, we run a few timing measurements and compare pigz with gzip.

Creating a test file

We create a test file and then run the speed tests on it with the gzip and pigz programs.

Fill a file with 500 MB of random data:

head -c 500M </dev/urandom >tesztfajl

This creates a 524,288,000-byte file filled with random bytes. This is large enough to measure the speed of the compression programs on it.
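
The exact size follows from how head reads the M suffix: 1M means 1024 × 1024 bytes. A small sketch of the arithmetic, with an optional check against the real file if it has already been created:

```shell
#!/bin/sh
# head -c interprets the M suffix as mebibytes: 1M = 1024 * 1024 bytes.
expected_bytes=$(( 500 * 1024 * 1024 ))
echo "$expected_bytes"    # prints 524288000

# If the test file from the command above exists, compare its real size.
if [ -f tesztfajl ]; then
    actual_bytes=$(wc -c < tesztfajl)
    [ "$actual_bytes" -eq "$expected_bytes" ] && echo "size matches"
fi
```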

Here we measure only the speed of compression. A file filled with random data cannot really be compressed, and both programs use the same gzip compression algorithm, so the compression ratio is irrelevant in this test.

Timing

We first compress the file with the gzip command, keeping the original file and measuring the time, and then compress it with the pigz command in the same way. The commands are:

time gzip -kf tesztfajl
time pigz -kf tesztfajl

And the results speak for themselves:

Comparison of gzip and pigz compression times

gzip: 6.748 seconds, pigz: 4.070 seconds.

My machine has an i7-3770 CPU with 4 physical cores. With Hyper-Threading (HT) this means 8 logical cores, but the virtual cores do not seem to contribute much here, so pigz finished in about a quarter of the time. The speedup roughly matches the number of physical cores in my processor.

Let's run another test, this time with compression level 9, to make the programs work harder. The commands are:

time gzip -kf9 tesztfajl
time pigz -kf9 tesztfajl

The difference is a bit bigger here:

Comparison of gzip and pigz compression times at compression level 9

gzip: 18.226 seconds, pigz: 3.887 seconds.

So proportionally, pigz performed even better here.
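
The ratio for the level 9 run can be worked out from the two timings above. A minimal sketch; the variable names are made up, and awk is used because the shell itself only does integer arithmetic:

```shell
#!/bin/sh
# Timings in seconds, taken from the level 9 measurement above.
gzip_time=18.226
pigz_time=3.887

# Divide with awk; LC_ALL=C keeps the decimal point independent of the locale.
speedup=$(LC_ALL=C awk -v g="$gzip_time" -v p="$pigz_time" 'BEGIN { printf "%.2f", g / p }')
echo "pigz was about ${speedup}x faster"    # prints: pigz was about 4.69x faster
```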

CPU usage

It is also worth seeing how all of this loads the processor in the background. This is best observed with the htop program.

First, gzip with compression level 9:

time gzip -kf9 tesztfajl

And during compression, htop in another terminal tab:

htop - gzip command running on 1 thread

Nothing surprising here: at the top of the process list is the gzip process, running the compression at 99.2% CPU. The top section also shows that only core 3 is actually loaded.

Now let's see the same with pigz:

time pigz -kf9 tesztfajl

htop - pigz command running on multiple threads

And here you can see that every core is working. Of course, it is difficult to catch the right moment because the percentages and process states keep changing, but the screenshot shows a main process, the command itself, and 8 further sub-processes (threads).

 

 

Using pigz with tar

If many subdirectories or files need to be packed recursively into a single file, this is usually done with the tar command.

A typical example of packing multiple files or directories:

tar -czf kimenet.tgz <input files, directories>

In this example, the z switch tells tar to use gzip for compression. This is the usual way to use tar, but it also runs on only one thread. However, the -I switch of tar lets you specify a different compression program. According to the manual page, the only requirement for the program given here is that it supports the -d switch.

So we can use pigz instead of gzip for tar archives:

tar -I pigz -cf kimenet.tgz <input files, directories>

As a result, the compression step of tar runs on multiple threads, which can save a lot of time with many files and subdirectories.
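
The same effect can be achieved with a plain pipe, which also makes it easy to pass extra switches such as the compression level to pigz. The sketch below is an illustration only: the directory and file names (project, restored, project.tgz) are made up, and the script falls back to gzip if pigz is not installed.

```shell
#!/bin/sh
# Build a small directory tree in a temporary location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p project/docs
echo "hello" > project/readme.txt
echo "notes" > project/docs/notes.txt

# Fall back to gzip if pigz is missing, so the sketch still runs.
if command -v pigz >/dev/null 2>&1; then zip_cmd=pigz; else zip_cmd=gzip; fi

# Pack: tar writes the archive to stdout (-f -), the compressor reads it from stdin.
tar -cf - project | "$zip_cmd" -9 > project.tgz

# Unpack into a separate directory: decompress to stdout (-dc), tar reads from stdin.
mkdir restored
"$zip_cmd" -dc project.tgz | tar -xf - -C restored

# diff -r prints nothing and returns 0 when both trees are identical.
diff -r project restored/project && echo "archive round trip OK"
```

The resulting project.tgz is an ordinary gzip-compressed tar archive, so it can also be unpacked with tar -xzf on a machine without pigz.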

 

Conclusion

We have seen how efficient the pigz program is. If we make a habit of using it, it can significantly speed up everyday tasks such as data backups on our server.