Introduction
If you compress files and directories frequently, compression speed can also matter. Unfortunately, the most common compression programs run on a single thread only, so they cannot take advantage of today's multi-core processors. In this article we look at pigz, which runs compression on multiple threads and can therefore make use of all the processor's cores.
The pigz compression program
pigz (short for "parallel implementation of gzip") is a drop-in replacement for gzip with one major extra: it runs a compression thread on each processor core, so it can compress files and directories much faster.
Installing
The program's source code can be downloaded from its author's website, but fortunately pigz is also available in the Debian repository, so you can install it as usual:
apt-get install pigz
pigz also accepts the gzip command-line switches, and they work the same way as in gzip.
Compression
The simplest example, compressing a single file:
pigz <filename>
This works just like gzip: by default the file is replaced by a compressed <filename>.gz and the source file is deleted.
Another example, with some switches:
pigz -kv11f <file>
The switches here are:
- k (keep): keeps the source file, as in gzip.
- v (verbose): prints details of what is being done, as in gzip.
- 11: maximum compression. gzip only accepts levels 1-9, while pigz also offers a separate level 11, which uses the much slower Zopfli algorithm to produce a somewhat smaller file.
- f (force): overwrites the output file if it already exists; without this switch pigz asks before overwriting.
Compression runs on multiple threads by default: unless told otherwise, pigz starts as many compression threads as the number of processor cores it detects. If you want to use fewer threads, pass the -p switch:
pigz -p <x> <filename>
where x is the number of compression threads to start.
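Another example, recursively compressing a subdirectory. As with gzip -r, every file in the directory tree is compressed separately into its own .gz file rather than into a single archive:
pigz -r <subdirectory>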
Unpacking
Unpacking works the same way as with gzip:
pigz -d <archive file>
Measurements
In this section we run a few timings of pigz against gzip.
Creating a test file
We create a test file and then run the speed tests on it with gzip and pigz.
Fill a file with 500 MB of random data:
head -c 500M </dev/urandom >tesztfajl
head interprets the M suffix as mebibytes, so this creates a 500 MiB (524,288,000-byte) file filled with random bytes. That is large enough to measure how fast the compression programs work.
We only measure compression speed here: random data is essentially incompressible, and both programs use the same gzip (deflate) algorithm, so the compression ratio is irrelevant.
Timing
First we compress the file with gzip, keeping the original and measuring the elapsed time, then we do the same with pigz. The commands are:
time gzip -kf tesztfajl
time pigz -kf tesztfajl
And the results speak for themselves:
gzip: 6.748 seconds, pigz: 4.070 seconds.
My machine has an Intel Core i7-3770 CPU with 4 physical cores, which Hyper-Threading presents as 8 logical cores. At the default compression level, pigz needed only about 60% of gzip's time.
Let's repeat the test at compression level 9, which gives the programs more work to do. The commands are:
time gzip -kf9 tesztfajl
time pigz -kf9 tesztfajl
The difference is considerably bigger here:
gzip: 18.226 seconds, pigz: 3.887 seconds.
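This time pigz finished about 4.7 times faster than gzip, roughly in line with the 4 physical cores of the processor; the extra Hyper-Threading cores do not seem to add much.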
CPU usage
It is also worth looking at how all of this plays out on the processor. This is easiest to see with the htop program.
First, gzip at compression level 9:
time gzip -kf9 tesztfajl
While it is compressing, run htop in another terminal tab:
Nothing surprising here: the gzip process is at the top of the process list, using 99.2% of a CPU core. The CPU meters at the top also show that only core 3 is actually busy.
Now let's see the same with pigz:
time pigz -kf9 tesztfajl
Here you can see that every core is working. It is hard to catch the right moment, because the percentages and process states keep changing, but you can see a main thread (the command itself) and 8 further worker threads.
Using pigz with tar
If many files or subdirectories need to be packed recursively into a single file, this is usually done with the tar command.
A typical example of archiving multiple files or directories:
tar -czf kimenet.tgz <input files, directories>
In this example the -z switch tells tar to compress with gzip. This is the usual way to use tar, but it also compresses on a single thread only. However, tar's -I (--use-compress-program) switch lets you specify a different compression program; according to the manual page, the only requirement is that the program accepts the -d switch for decompression.
So pigz can be used instead of gzip when creating tar archives:
tar -I pigz -cf kimenet.tgz <input files, directories>
As a result, the compression step runs on multiple threads, which can save a lot of time when archiving many files and subdirectories.
Conclusion
We have seen how efficient pigz is. If we make it part of our everyday routine, it can significantly speed up tasks such as backing up data on our servers.