DEFLATE

DEFLATE is a lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. It was originally defined by Phil Katz for version 2 of his PKZIP archiving tool, and was later specified in RFC 1951.


DEFLATE is widely thought to be free of any subsisting patents. At a time before the patent on LZW (which is used in the GIF file format) expired, this led to its use in gzip-compressed files and PNG image files, in addition to the ZIP file format for which Katz originally designed it.


Using DEFLATE in new software

Implementations of DEFLATE are freely available in many languages. C programs typically use the zlib library (under the zlib license, a BSD-like licence without an advertising clause). Programs written using the Borland dialects of Pascal can use paszlib, and a C++ library is included as part of 7-Zip/AdvanceCOMP. Java includes support as part of the standard library (in java.util.zip). The Microsoft .NET Framework 2.0 base class library supports it in the System.IO.Compression namespace.
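As one further illustration, Python's standard-library zlib module is a binding to the zlib C library mentioned above. The sketch below assumes Python 3.6 or later; the helper names deflate_raw and inflate_raw are hypothetical. A negative wbits value asks zlib for a bare DEFLATE stream (RFC 1951), while zlib.compress and gzip.compress wrap the same kind of DEFLATE data in the zlib (RFC 1950) and gzip (RFC 1952) containers.

```python
import gzip
import zlib


def deflate_raw(data: bytes, level: int = 6) -> bytes:
    """Compress to a bare DEFLATE stream with no zlib or gzip wrapper."""
    # wbits=-15: raw DEFLATE output using the full 32 KB (2**15 byte) window.
    comp = zlib.compressobj(level=level, method=zlib.DEFLATED, wbits=-15)
    return comp.compress(data) + comp.flush()


def inflate_raw(stream: bytes) -> bytes:
    """Decompress a bare DEFLATE stream."""
    return zlib.decompress(stream, wbits=-15)


text = b"DEFLATE combines LZ77 and Huffman coding. " * 20
raw = deflate_raw(text)            # bare RFC 1951 data
wrapped = zlib.compress(text)      # DEFLATE data inside a zlib (RFC 1950) wrapper
gz = gzip.compress(text)           # DEFLATE data inside a gzip (RFC 1952) wrapper

assert inflate_raw(raw) == text
assert zlib.decompress(gz, wbits=31) == text  # 16 + 15 tells zlib to expect gzip
print(len(text), len(raw), len(wrapped), len(gz))
```

The wrappers add only headers and checksums; the compressed payload in all three cases is a DEFLATE stream.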


Stream format

A DEFLATE stream consists of a series of blocks. Each block is preceded by a 3-bit header:

  • 1 bit: last-block-in-stream marker:
    • 1: this is the last block in the stream.
    • 0: there are more blocks to process after this one.
  • 2 bits: encoding method used for this block type:
    • 00: a stored/raw/literal section follows, between 0 and 65535 bytes in length.
    • 01: a static Huffman compressed block, using a pre-agreed Huffman tree.
    • 10: a compressed block complete with the Huffman table supplied.
    • 11: reserved (invalid).
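To make the header layout concrete, the hypothetical helpers below build stored (type 00) blocks by hand and pass them to zlib's decoder. RFC 1951 calls the two header fields BFINAL and BTYPE and packs them starting from the least-significant bit of a byte; after a stored block's header the stream skips to the next byte boundary, then holds a 16-bit length (LEN) and its one's complement (NLEN). read_header is only meaningful for a block that starts on a byte boundary, such as the first block of a stream.

```python
from __future__ import annotations

import zlib


def stored_block(payload: bytes, final: bool) -> bytes:
    """Build one DEFLATE stored (type 00) block by hand."""
    if len(payload) > 65535:
        raise ValueError("a stored block holds at most 65535 bytes")
    # Bit 0 is the last-block flag, bits 1-2 the block type (00 here).
    # The remaining 5 bits of the byte are padding up to the byte boundary.
    header = 0b001 if final else 0b000
    length = len(payload)
    # LEN and NLEN (one's complement of LEN), both 16-bit little-endian.
    return (bytes([header])
            + length.to_bytes(2, "little")
            + (length ^ 0xFFFF).to_bytes(2, "little")
            + payload)


def read_header(first_byte: int) -> tuple[int, int]:
    """Return (last-block flag, block type) from a byte-aligned block header."""
    return first_byte & 1, (first_byte >> 1) & 0b11


stream = stored_block(b"Hello, ", final=False) + stored_block(b"DEFLATE", final=True)
print(zlib.decompress(stream, wbits=-15))   # zlib accepts the hand-built blocks
print(read_header(stream[0]), read_header(stream[12]))
```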

Most blocks will end up being encoded using method 10, the dynamic Huffman encoding, which produces an optimised Huffman tree customised for each block of data individually. Instructions to generate the necessary Huffman tree immediately follow the block header.
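In RFC 1951 the tree itself is not transmitted: a dynamic block sends only the code length of each symbol (in a further compressed form), and the decoder rebuilds a canonical Huffman code in which codes of equal length are consecutive and shorter codes sort before longer ones. The function below sketches the three-step procedure from RFC 1951 section 3.2.2. canonical_codes is a hypothetical name, and the compressed encoding of the length list itself is not shown.

```python
from __future__ import annotations


def canonical_codes(lengths: list[int]) -> list[str | None]:
    """Assign canonical Huffman codes from per-symbol code lengths (0 = unused)."""
    max_bits = max(lengths)
    # Step 1: count how many codes have each length.
    bl_count = [0] * (max_bits + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1
    # Step 2: find the numerically smallest code for each length.
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: hand out consecutive codes to symbols, in symbol order.
    codes: list[str | None] = []
    for length in lengths:
        if length == 0:
            codes.append(None)
        else:
            codes.append(format(next_code[length], f"0{length}b"))
            next_code[length] += 1
    return codes


# Symbols A-H with the code lengths used in the RFC 1951 worked example.
print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
```

Because the codes follow from the lengths alone, an encoder only needs to send the lengths, which is cheaper than sending the codes.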


Compression is achieved through two steps

  • The matching and replacement of duplicate strings with pointers.
  • Replacing symbols with new, weighted symbols based on frequency of use.

Duplicate string elimination

Main article: LZ77

Within compressed blocks, if a duplicate series of bytes (a repeated string) is spotted, a back-reference is inserted that points to the previous location of that identical string instead. An encoded match to an earlier string consists of a length (3-258 bytes) and a distance (1-32768 bytes). Back-references can reach across any number of blocks, as long as the distance falls within the last 32 kB of uncompressed data decoded (termed the sliding window).
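The duplicate-string step can be sketched with a deliberately naive, greedy match finder that respects DEFLATE's limits (lengths 3-258, distances up to 32768) and a decoder that expands the resulting tokens. The token representation and function names are hypothetical; production encoders such as zlib use hash chains instead of this brute-force scan, which is only practical for small inputs.

```python
from typing import List, Tuple, Union

# A token is either a literal byte (int) or a back-reference (length, distance).
Token = Union[int, Tuple[int, int]]

WINDOW_SIZE = 32768      # maximum back-reference distance
MIN_MATCH, MAX_MATCH = 3, 258


def lz77_tokens(data: bytes) -> List[Token]:
    """Greedy, brute-force LZ77 parse using DEFLATE's length and distance limits."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Try every earlier start position inside the sliding window.
        for j in range(max(0, i - WINDOW_SIZE), i):
            length = 0
            # A match may run past position i (overlap); DEFLATE allows this.
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            tokens.append((best_len, best_dist))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    """Expand literals and back-references into the original bytes."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)
            continue
        length, distance = token
        if not 1 <= distance <= len(out):
            raise ValueError("back-reference points before the start of the data")
        # Copy one byte at a time so overlapping references (distance < length) work.
        for _ in range(length):
            out.append(out[-distance])
    return bytes(out)


tokens = lz77_tokens(b"abababab")
print(tokens)
print(lz77_decode(tokens))
```

For b"abababab" the parser emits two literals followed by one back-reference whose length exceeds its distance, which the byte-by-byte copy handles correctly.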


Bit reduction

Main article: Huffman coding

The second compression stage consists of replacing commonly used symbols with shorter representations and less commonly used symbols with longer representations. The method used is Huffman coding, which creates a prefix-free tree of non-overlapping bit sequences, where the length of each sequence shrinks as the probability of that symbol needing to be encoded grows (roughly −log2 p bits for a symbol of probability p). The more likely a symbol is to be encoded, the shorter its bit sequence will be.
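A minimal sketch of how code lengths follow from symbol frequencies: repeatedly merge the two least frequent subtrees, and every symbol in a merged subtree gets one bit longer. The function name is hypothetical. Real DEFLATE encoders must additionally cap code lengths at 15 bits, which this sketch does not do.

```python
from __future__ import annotations

import heapq
from collections import Counter
from collections.abc import Hashable
from itertools import count


def huffman_code_lengths(freqs: dict[Hashable, int]) -> dict[Hashable, int]:
    """Return the Huffman code length (in bits) for each symbol."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a 1-bit code.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


lengths = huffman_code_lengths(Counter("aaaaabbcd"))
print(lengths)
```

Here the frequent "a" ends up with a 1-bit code and the rare "c" and "d" with 3-bit codes.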


A tree is created which contains space for 288 symbols:

  • 0-255: the literal bytes/symbols 0-255.
  • 256: end of block; stop processing if this is the last block, otherwise start processing the next block.
  • 257-285: combined with extra bits, a match length of 3-258 bytes.
  • 286, 287: not used, reserved and illegal but still part of the tree.
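The mapping from a match length to a symbol in 257-285 plus extra bits follows a fixed table in RFC 1951. In the sketch below the base lengths and extra-bit counts are transcribed from that table; encode_length and decode_length are hypothetical helper names.

```python
from __future__ import annotations

# Base match length for each length symbol 257-285, and the extra bits that follow it.
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]


def encode_length(length: int) -> tuple[int, int, int]:
    """Map a match length 3-258 to (symbol, number of extra bits, extra-bit value)."""
    if not 3 <= length <= 258:
        raise ValueError("DEFLATE match lengths are 3-258 bytes")
    # Search from the top so that 258 gets symbol 285 (no extra bits)
    # rather than symbol 284, which only covers 227-257.
    for index in range(len(LENGTH_BASE) - 1, -1, -1):
        if LENGTH_BASE[index] <= length:
            return 257 + index, LENGTH_EXTRA[index], length - LENGTH_BASE[index]
    raise AssertionError("unreachable: LENGTH_BASE starts at 3")


def decode_length(symbol: int, extra_value: int) -> int:
    """Inverse of encode_length for symbols 257-285."""
    return LENGTH_BASE[symbol - 257] + extra_value


for n in (3, 10, 11, 12, 100, 257, 258):
    print(n, encode_length(n))
```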

A match length code will always be followed by a distance code. Based on the distance code read, further "extra" bits may be read in order to produce the final distance. The distance tree contains space for 32 symbols:

  • 0-3: distances 1-4
  • 4-5: distances 5-8, 1 extra bit
  • 6-7: distances 9-16, 2 extra bits
  • 8-9: distances 17-32, 3 extra bits
  • ...
  • 26-27: distances 8193-16384, 12 extra bits
  • 28-29: distances 16385-32768, 13 extra bits
  • 30-31: not used, reserved and illegal but still part of the tree.

Note that for the match distance symbols n = 2-29, the number of extra bits can be calculated as ⌊n/2⌋ − 1 (so symbols 2 and 3 take no extra bits).
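The distance table can be generated rather than stored: the extra-bit count follows the ⌊n/2⌋ − 1 rule, and each symbol's base distance is one more than the end of the previous symbol's range. The helper names below are hypothetical; the values they produce can be checked against the table in RFC 1951.

```python
from __future__ import annotations


def distance_extra_bits(code: int) -> int:
    """Extra bits for distance code 0-29: floor(n/2) - 1, never below zero."""
    if not 0 <= code <= 29:
        raise ValueError("distance codes 30-31 are reserved")
    return max(0, code // 2 - 1)


def distance_base(code: int) -> int:
    """Smallest distance represented by a distance code."""
    if code < 4:
        return code + 1
    extra = distance_extra_bits(code)
    # Even codes start at 2 * 2**extra + 1, odd codes at 3 * 2**extra + 1.
    return ((2 + (code & 1)) << extra) + 1


def encode_distance(distance: int) -> tuple[int, int, int]:
    """Map a distance 1-32768 to (code, number of extra bits, extra-bit value)."""
    if not 1 <= distance <= 32768:
        raise ValueError("DEFLATE distances are 1-32768")
    for code in range(29, -1, -1):
        base = distance_base(code)
        if base <= distance:
            return code, distance_extra_bits(code), distance - base
    raise AssertionError("unreachable: code 0 covers distance 1")


for d in (1, 4, 5, 6, 100, 16385, 32768):
    print(d, encode_distance(d))
```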


Encoder / Compressor

During compression, it is the encoder that chooses how much time to spend looking for matching strings. The zlib/gzip reference implementation lets the user select a point on a sliding scale that trades likely compression level against encoding speed. Options range from -0 (do not attempt compression, just store uncompressed) to -9, representing the maximum capability of the reference implementation in zlib/gzip.
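The effect of the level setting can be observed through Python's zlib binding, whose level argument corresponds to zlib's 0-9 scale. This is a sketch under the assumption of Python 3.6 or later; the sample data is arbitrary, so the exact sizes printed depend on it and on the zlib version installed.

```python
import zlib


def deflate_raw(data: bytes, level: int) -> bytes:
    # Raw DEFLATE at the given zlib level (0 = store only, 9 = most effort).
    comp = zlib.compressobj(level=level, method=zlib.DEFLATED, wbits=-15)
    return comp.compress(data) + comp.flush()


sample = b"".join(b"record %d: the quick brown fox jumps over the lazy dog\n" % i
                  for i in range(2000))

sizes = {}
for level in range(10):
    stream = deflate_raw(sample, level)
    # Whatever the level, the result must inflate back to the same bytes.
    assert zlib.decompress(stream, wbits=-15) == sample
    sizes[level] = len(stream)
    print(f"level {level}: {len(stream)} bytes")
```

Level 0 output is slightly larger than the input, because stored blocks add their own headers.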


Other DEFLATE encoders have been produced, all of which produce a compatible bit-stream that can be decompressed by any existing DEFLATE decoder. Different implementations will likely produce different encoded bit-streams for the same input. Non-zlib encoders have mostly focused on producing a smaller, more efficiently compressed stream.
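The same decoder-side compatibility can be seen even within a single library. In the sketch below, zlib's strategy settings (exposed by Python's zlib module as Z_DEFAULT_STRATEGY, Z_FILTERED, Z_HUFFMAN_ONLY, Z_RLE and Z_FIXED) stand in for "different encoders": each produces a different bit-stream for the same input, yet one inflater decodes them all. Using strategies as a proxy for separate implementations is an assumption made for illustration.

```python
import zlib

text = b"abracadabra " * 500
strategies = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman_only": zlib.Z_HUFFMAN_ONLY,   # no string matching at all
    "rle": zlib.Z_RLE,                     # matches at distance 1 only
    "fixed": zlib.Z_FIXED,                 # static Huffman blocks only
}

streams = {}
for name, strategy in strategies.items():
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15, strategy=strategy)
    streams[name] = comp.compress(text) + comp.flush()

# Every variant is decoded by the same, unmodified inflater.
for name, stream in streams.items():
    assert zlib.decompress(stream, wbits=-15) == text
    print(f"{name:>12}: {len(stream):5d} bytes, starts {stream[:6].hex()}")
```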


Encoder Implementations

  • PKZIP: the first implementation, originally done by Phil Katz as part of PKZip.
  • zlib/gzip: standard reference implementation used in a huge amount of software, owing to public availability of the source code and a license allowing inclusion into other software.
    • jzlib: a rewrite/re-implementation/port of the zlib encoder in pure Java, distributed under a BSD license (a fully featured replacement for java.util.zip).
    • PasZLIB: Translation/port of the zlib code into Pascal source code by Jacques Nomssi-Nzali.
  • KZIP/PNGOUT: an encoder by the game-programmer Ken Silverman using "an exhaustive search of all patterns" and "[an] advanced block splitter".
  • PuZip: designed for Commodore 64/C128 computers. PuZip is limited to an 8kB LZ77 window size, with only the store (type 00) and fixed Huffman (type 01) methods.
  • BigSpeed Deflate: a "Tiny in-memory compression library" available as an MS Windows DLL, limited to 32 kB blocks at a time and to three compression settings.
  • BJWFlate/DeflOpt: Ben Jos Walbeehm's utilities, "designed to attempt to squeeze every possible byte out of the files it compresses". The author stopped development of BJWFlate (but not DeflOpt) in March 2004.
  • Crypto++: contains a public domain implementation in C++ aimed mainly at reducing potential security vulnerabilities. The author, Wei Dai, states: "This code is less clever, but hopefully more understandable and maintainable [than zlib]".
  • 7-Zip/AdvanceCOMP: written by Igor Pavlov in C++, this version is freely licensed and tends to achieve higher compression than zlib at the expense of CPU usage.
  • PuTTY contains its own DEFLATE implementation.

AdvanceCOMP uses the higher-compression-ratio version of DEFLATE implemented by 7-Zip to recompress gzip, PNG, MNG and ZIP files, potentially achieving smaller file sizes than zlib can at its maximum settings. An even more effective (but also more user-input-demanding and CPU-intensive) DEFLATE encoder is employed inside Ken Silverman's KZIP and PNGOUT utilities.


Other possible focuses for a DEFLATE implementation include very fast compression, or a very small executable for use in embedded systems. Further requirements could be a different license for the software, or code written in a language such as Java or Python. In theory, an implementation could also produce an encoding that avoids certain symbols in the output stream, such as blacklisted sequences or NUL bytes in a particular context.


Hardware Encoders

  • AHA361-PCIX/AHA362-PCIX from Comtech AHA. Comtech produces a PCI-X-only card (PCI-ID: 193f:0001) capable of compressing streams using DEFLATE at a claimed rate of up to 3.0 Gbit/s (375 MB/s) for uncompressed incoming data. Accompanying the Linux kernel driver for the AHA362-PCIX are an 'ahagzip' utility and a customised 'mod_deflate_aha' module that lets Apache use the hardware compression. Despite containing a Xilinx Virtex FPGA and four custom AHA3601 ASICs, the hardware appears to be limited to handling only static Huffman blocks.
  • StorCompress 300/MX3 from Indra Networks. This is a range of PCI (PCI-ID: 17b4:0011) or PCI-X cards featuring between one and six compression engines, with claimed processing speeds of up to 3.6 Gbit/s (450 MB/s). A version of the cards is available under the separate brand WebEnhance, designed specifically for web serving rather than SAN or backup use.


Decoder / Decompressor

INFLATE is the decoding process that takes a DEFLATE bit stream for decompression and correctly produces the original full-size data or file.
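Because DEFLATE data is a bit stream, INFLATE can run incrementally as data arrives. The sketch below uses Python's zlib.decompressobj (assuming Python 3.3 or later for the eof attribute); inflate_chunks is a hypothetical helper name. Per the Python zlib documentation, wbits=-15 expects raw DEFLATE, 15 a zlib wrapper, 31 a gzip wrapper, and 47 auto-detects zlib or gzip.

```python
import zlib


def inflate_chunks(chunks, wbits: int = -15) -> bytes:
    """Incrementally decode a DEFLATE stream that arrives in pieces."""
    decoder = zlib.decompressobj(wbits=wbits)
    out = bytearray()
    for chunk in chunks:
        out += decoder.decompress(chunk)
    out += decoder.flush()
    if not decoder.eof:
        # The block with the last-block bit set was never completed.
        raise ValueError("truncated DEFLATE stream")
    return bytes(out)


original = bytes(range(256)) * 64
comp = zlib.compressobj(wbits=-15)
stream = comp.compress(original) + comp.flush()

# Feed the compressed stream in 7-byte pieces, as a network reader might.
pieces = [stream[i:i + 7] for i in range(0, len(stream), 7)]
print(inflate_chunks(pieces) == original)
```

Checking eof matters because the decoder does not raise on its own when a stream simply stops early; it just has no more output to give.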


INFLATE-only implementations

Alternative INFLATE implementations are usually aimed at highly optimised decoding speed, or at extremely predictable RAM usage for microcontroller-based embedded systems.

  • inflate.cl: by John Foderaro. Self-standing Common Lisp decoder distributed with a GNU LGPL license.
  • kunzip: by Michael Kohn and unrelated to "KZIP". Comes with C source-code under the GNU LGPL license. Used in the GIMP installer.
  • lodepng: by Lode Vandevenne. A BSD-licensed single file PNG file reader with built-in C++ inflate implementation and no external dependencies.
  • pyflate: a pure-Python stand-alone DEFLATE (gzip) and bzip2 decoder by Paul Sladen. Written for research/prototyping and made available under the BSD/GPL/LGPL/DFSG licenses.
  • PCDEZIP: by Bob Flanders and Michael Holmes, published in PC Magazine on 1994-01-11.
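A compact decoder in the spirit of the pure-Python pyflate can be sketched in under a hundred lines. This is an illustrative sketch, not code from any of the implementations listed: it handles stored (00) and static Huffman (01) blocks and deliberately rejects dynamic (10) blocks, which would additionally need the code-length decoding described in RFC 1951. The length and distance tables are transcribed from RFC 1951; all names are hypothetical.

```python
from __future__ import annotations

import zlib

LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4 + [0]
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385,
             513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0] + [n for n in range(1, 14) for _ in (0, 1)]


class BitReader:
    """Reads bits least-significant first, as DEFLATE packs them."""

    def __init__(self, data: bytes):
        self.data, self.pos = data, 0

    def bits(self, n: int) -> int:
        value = 0
        for i in range(n):  # multi-bit fields are stored LSB first
            value |= ((self.data[self.pos >> 3] >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

    def align(self) -> None:
        self.pos = (self.pos + 7) & ~7


def build_decoder(lengths: list[int]) -> dict[tuple[int, int], int]:
    """Canonical Huffman codes (RFC 1951 section 3.2.2), keyed by (length, code)."""
    bl_count = [0] * 16
    for n in lengths:
        if n:
            bl_count[n] += 1
    next_code, code = [0] * 16, 0
    for bits in range(1, 16):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    table = {}
    for symbol, n in enumerate(lengths):
        if n:
            table[(n, next_code[n])] = symbol
            next_code[n] += 1
    return table


FIXED_LITLEN = build_decoder([8] * 144 + [9] * 112 + [7] * 24 + [8] * 8)
FIXED_DIST = build_decoder([5] * 32)


def decode_symbol(reader: BitReader, table: dict[tuple[int, int], int]) -> int:
    code = 0
    for length in range(1, 16):
        code = (code << 1) | reader.bits(1)  # Huffman codes arrive MSB first
        if (length, code) in table:
            return table[(length, code)]
    raise ValueError("invalid Huffman code")


def inflate(data: bytes) -> bytes:
    reader, out, final = BitReader(data), bytearray(), 0
    while not final:
        final, btype = reader.bits(1), reader.bits(2)
        if btype == 0:  # stored block: byte-aligned LEN, NLEN, raw bytes
            reader.align()
            length = reader.bits(16)
            if reader.bits(16) != length ^ 0xFFFF:
                raise ValueError("stored block LEN/NLEN mismatch")
            out.extend(reader.bits(8) for _ in range(length))
        elif btype == 1:  # static Huffman block
            while (sym := decode_symbol(reader, FIXED_LITLEN)) != 256:
                if sym < 256:
                    out.append(sym)
                    continue
                if sym > 285:
                    raise ValueError("reserved length symbol")
                length = LENGTH_BASE[sym - 257] + reader.bits(LENGTH_EXTRA[sym - 257])
                d = decode_symbol(reader, FIXED_DIST)
                if d > 29:
                    raise ValueError("reserved distance symbol")
                distance = DIST_BASE[d] + reader.bits(DIST_EXTRA[d])
                if distance > len(out):
                    raise ValueError("distance reaches before start of output")
                for _ in range(length):  # byte-by-byte copy allows overlap
                    out.append(out[-distance])
        else:
            raise NotImplementedError("dynamic (10) and reserved (11) blocks not handled")
    return bytes(out)


text = b"Blah blah blah blah blah! " * 40
comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15, strategy=zlib.Z_FIXED)
print(inflate(comp.compress(text) + comp.flush()) == text)
```

The sketch assumes Python 3.8 or later (for the := operator) and uses zlib only to produce test input; Z_FIXED makes zlib emit static Huffman blocks, which this decoder can read.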

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.