File compression is a way of encoding a file (or multiple files) into a single, smaller file, which takes up less storage space and uses less bandwidth when downloaded.
Some file formats are always compressed; for example, JPEG is a method of compressing image data into a .jpg file. General-purpose compression programs can be used to combine multiple .jpg files into a single file for convenience, but the overall size is often not reduced greatly, because each .jpg file is already compressed.
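The effect of compressing data that is already compressed can be sketched with Python's standard zlib module. This is only an illustration: the word list, the seed and the variable names are assumptions made for the example, and zlib stands in for whatever general-purpose compressor is actually used.

```python
import random
import zlib

# Build some redundant sample text from a small, made-up vocabulary.
rng = random.Random(0)  # fixed seed so every run produces the same text
words = ["compress", "file", "data", "image", "archive", "size", "byte", "format"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

# First pass: the compressor removes most of the redundancy.
once = zlib.compress(text, 9)

# Second pass: the input is already compressed, so little redundancy is left.
twice = zlib.compress(once, 9)

print("original:        ", len(text), "bytes")
print("compressed once: ", len(once), "bytes")
print("compressed twice:", len(twice), "bytes")
```

The first pass should shrink the text considerably, while the second pass should leave the size roughly unchanged, which is the same situation as placing .jpg files in a compressed archive.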
Lossless compression allows the exact original data to be reconstructed from the compressed data. It is used in many applications, for example in the popular ZIP file format and in the Unix tool gzip. Lossless compression is used when it is important that the original and the decompressed data be identical. Typical examples are executable programs and source code. Some image file formats (e.g. PNG and GIF) use only lossless compression.
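A minimal sketch of the lossless round trip, using Python's standard gzip module. The sample bytes are made up for the example and stand in for a piece of source code.

```python
import gzip

# Hypothetical sample data standing in for a source file.
original = b"int main(void) { return 0; }\n" * 200

# Compress, then decompress again.
compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# Lossless: the restored bytes are identical to the original bytes.
print("identical:", restored == original)
print(len(original), "bytes ->", len(compressed), "bytes")
```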
Lossless compression methods vary according to the type of data they are designed to compress (e.g. text, images or sound). While in principle any general-purpose lossless compression method can be used on any type of data, many achieve little compression on data that is not of the form they were designed for. Sound data, for instance, cannot be compressed well with conventional text compression methods.
Lossy compression is where compressing data and then decompressing it yields data that may differ from the original, but is close enough to be useful. The advantage of lossy methods is that they can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application.
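One simple way to illustrate the idea is quantization: each sample is rounded to the nearest multiple of a step size, so fewer distinct values need to be stored. The sketch below is a toy example under that assumption, not how any particular audio or image format works; the function names, the sine-wave samples and the step size are all invented for illustration.

```python
import math

def quantize(samples, step):
    """Lossy step: map each sample to the index of the nearest multiple of step."""
    return [round(s / step) for s in samples]

def dequantize(codes, step):
    """Reconstruct approximate samples from the stored indices."""
    return [c * step for c in codes]

# Made-up input: integer samples of a sine wave in the range -127..127.
samples = [round(127 * math.sin(2 * math.pi * i / 50)) for i in range(200)]

step = 16
codes = quantize(samples, step)
restored = dequantize(codes, step)

# The restored data differs from the original, but only by a bounded amount.
max_error = max(abs(a - b) for a, b in zip(samples, restored))
print("distinct values before:", len(set(samples)))
print("distinct values after: ", len(set(codes)))
print("largest error:", max_error, "(at most", step // 2, ")")
```

A larger step stores fewer distinct values and so compresses better, at the cost of a larger error; choosing the step is the trade-off between size and quality.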
Lossy compression is most commonly used to compress multimedia data (sound, images and video), especially in applications such as streaming media and internet telephony. Many lossy compression methods exploit the limits of human perception, taking into account, for example, that the human eye can see only certain frequencies of light. Flaws caused by lossy compression that are noticeable to the human eye or ear are known as compression artefacts.
Most lossy compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality.
General purpose compression
There are a number of general purpose lossless compression methods that are used for compressing and/or combining (archiving) any type of file. The most common of these are:
- The ZIP file format is a popular data compression and archival format. A .zip file contains one or more files that have been compressed or stored.
- gzip (GNU zip) is a free file compression program and file format. A gzip file can contain only a single file; to hold multiple files, they are first combined into a tar archive, which is then compressed. The HTTP/1.1 protocol allows clients to optionally request pages from the server compressed using gzip. PNG files use DEFLATE, the same compression algorithm that gzip uses.
- bzip2 is a free and open source file compression program. bzip2 compresses most files more effectively than the older gzip or ZIP, but is slower. A bzip2 file typically carries one of the file extensions .bz2, .tbz2 or .tar.bz2. As with gzip, a tar archive is required in order to hold multiple files.
- The RAR file format often achieves slightly better compression than ZIP, and can also contain multiple files.
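Other archive and compression formats include ACE, AR, EAR, JAR, TAR, WAR and ZOO. Some of these, such as TAR and AR, only combine files into one archive and do not compress them by themselves.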
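The combination of tar with gzip or bzip2 described in the list above can be sketched with Python's standard tarfile, gzip and bz2 modules. The file names and contents are hypothetical and everything is kept in memory so the example is self-contained; real archives would normally be written to disk.

```python
import bz2
import gzip
import io
import tarfile

# Hypothetical file names and contents, used only for illustration.
files = {
    "readme.txt": b"File compression makes files smaller.\n" * 300,
    "notes.txt": b"gzip holds one stream; tar bundles many files.\n" * 300,
}

# Step 1: bundle the files into a single uncompressed tar archive in memory.
tar_buffer = io.BytesIO()
with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
    for name, data in files.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_bytes = tar_buffer.getvalue()

# Step 2: compress that single tar stream (the content of a .tar.gz or .tar.bz2 file).
tar_gz = gzip.compress(tar_bytes)
tar_bz2 = bz2.compress(tar_bytes)

# Round trip: decompress, then read the individual files back out of the tar archive.
with tarfile.open(fileobj=io.BytesIO(gzip.decompress(tar_gz)), mode="r") as tar:
    extracted = {m.name: tar.extractfile(m).read() for m in tar.getmembers()}

print("tar:     ", len(tar_bytes), "bytes")
print("tar.gz:  ", len(tar_gz), "bytes")
print("tar.bz2: ", len(tar_bz2), "bytes")
print("files recovered intact:", extracted == files)
```

Which of the two compressed forms comes out smaller depends on the data, so the sizes printed here should not be read as a general ranking.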
Programs & utilities
Windows XP comes with some basic ZIP software of its own, but most earlier versions of Windows require additional third-party software to create or extract ZIP files.