Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are typically referred to as audio codecs. As with other specific forms of data compression, there are many "lossless" and "lossy" algorithms that achieve the compression effect.
Lossless compression
As with image compression, both lossy and lossless compression algorithms are used in audio compression. As file storage and communications bandwidth have become less expensive and more available, the popularity of lossless formats such as FLAC has increased sharply, as people choose to maintain a permanent archive of their audio files. The primary users of lossless compression are audio engineers, audiophiles and consumers who want to preserve the full quality of their audio files, in contrast to the quality loss from lossy compression techniques such as Vorbis and MP3. Many users will use both schemes for some files, or maintain both lossy and lossless versions, as their needs require.
It is difficult to maintain all the data in an audio stream and still achieve substantial compression. First, the vast majority of sound recordings are captured from the real world and are highly complex, with much noise-like detail. Because one of the key methods of compression is to find patterns and repetition, more random data such as audio does not compress well. In a similar manner, photographs compress less efficiently with lossless methods than simpler computer-generated images do. Even computer-generated sounds, however, can contain very complicated waveforms that present a challenge to many compression algorithms. This is due to the nature of audio waveforms, which are generally difficult to simplify without a (necessarily lossy) conversion to frequency information, as performed by the human ear.
The second reason is that the values of audio samples change very quickly, so generic data compression algorithms do not work well for audio: repeated strings of consecutive bytes rarely occur. However, convolution with the filter [-1 1] (that is, taking the first difference) tends to slightly whiten (decorrelate, or flatten) the spectrum, thereby allowing traditional lossless compression at the encoder to do its job; integration at the decoder restores the original signal. Codecs such as FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. At the encoder, the inverse of the estimator is used to whiten the signal by removing spectral peaks, while the estimator itself is used to reconstruct the original signal at the decoder.
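The first-difference step described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the sample values are made up, and the function names are hypothetical rather than taken from any codec. It shows that the differences of a slowly varying waveform are much smaller than the samples themselves, and that a running sum at the decoder restores the input exactly.

```python
def first_difference(samples):
    """Encoder side: convolve with [-1, 1]; the first sample passes through unchanged."""
    previous = 0
    residual = []
    for s in samples:
        residual.append(s - previous)
        previous = s
    return residual


def integrate(residual):
    """Decoder side: a running sum undoes the first difference exactly."""
    total = 0
    samples = []
    for r in residual:
        total += r
        samples.append(total)
    return samples


# Made-up, slowly varying integer samples (an assumption for illustration).
samples = [1000, 1012, 1025, 1036, 1044, 1049, 1051, 1050]
residual = first_difference(samples)
restored = integrate(residual)

print(residual)             # small values after the first entry
print(restored == samples)  # lossless round trip
```

After the first entry, the residual values need far fewer bits than the original samples, which is what gives a later entropy-coding stage something to work with.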
Lossless audio codecs have no quality issues, so their usability can be estimated by:
- Speed of compression and decompression
- Degree of compression
- Software and hardware support
For comparisons of lossless audio codecs, see the hydrogenaudio.org wiki comparison; Speek's comparison; the graph on Hans Heiden's site; and Robin Whittle's 2003 comparison of several algorithms, which includes a discussion of Rice coding.
Lossy compression
Lossy audio compression is used in an extremely wide range of applications. In addition to direct applications (MP3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television; streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5-20% of the original stream, rather than 50-60%) by simplifying the complexities of the data. Given that bandwidth and storage are always limited, the reduced audio quality is an acceptable trade-off for applications where users wish to transmit or store more information. (For example, one can fit many more songs on an iPod using lossy than using lossless compression, and a DVD might hold several audio tracks using lossy compression in the space needed for one lossless audio track.)
In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to describe the data. For example, suppose you wanted to record twenty house numbers along one side of a street, each of which goes up by 2. If the first address was 14461, or five digits, the uncompressed stream would require 20 times 5 bytes, or 100 bytes, to store (at one byte per digit). You could recode that to take advantage of the repetition and simply say: begin at 14461, increase by 2, repeat 19 times. Now the data are losslessly captured in just 8 bytes.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as other louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
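The house-number example can be written out as a small Python sketch. The function names are hypothetical, and sizes are counted at one byte per decimal digit, matching the arithmetic in the text; it is not meant as a real codec.

```python
def encode_run(numbers):
    """Encode an arithmetic sequence as (start, step, repeats).

    Returns None if the step between neighbours is not constant.
    """
    start = numbers[0]
    step = numbers[1] - numbers[0]
    for a, b in zip(numbers, numbers[1:]):
        if b - a != step:
            return None
    return (start, step, len(numbers) - 1)


def decode_run(start, step, repeats):
    """Rebuild the full list from the three-value description."""
    return [start + i * step for i in range(repeats + 1)]


def digit_bytes(values):
    """Size in bytes, assuming one byte per decimal digit."""
    return sum(len(str(v)) for v in values)


houses = [14461 + 2 * i for i in range(20)]
code = encode_run(houses)

print(digit_bytes(houses))           # 100
print(code, digit_bytes(code))       # (14461, 2, 19) 8
print(decode_run(*code) == houses)   # True
```

The decoder recovers every house number, so nothing is lost; the saving comes entirely from describing the pattern instead of listing the values.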
While removing or reducing these 'unhearable' sounds may account for a small percentage of the bits saved in lossy compression, the real savings come from a complementary phenomenon: noise shaping. Reducing the number of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics-based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done, for instance, by using very few bits to code the high frequencies of most signals. The reason is not that the signal has little high-frequency information (though this is often true as well), but that the human ear can only perceive very loud signals in this region, so that softer (noise) sounds 'hidden' there simply are not heard.
To illustrate this by continuing with the house-number example, suppose the data were more complex, so that the difference between two house numbers was 4 in one instance, between the tenth and eleventh houses. Lossless coding would require something like this: begin at 14461, increase by 2, repeat 9 times, increase by 4, increase by 2, repeat 9 times. So 10 bytes, rather than 8, are needed to store the data. But if your model of lossy compression determines that the difference is not relevant for the application, it might simplify the data to ignore the variation and increase the compression. However, some data are lost in the process, because the original data cannot be reconstructed from the lossy compression scheme; only an approximation of that data, determined to be sufficient for this application, can be recovered.
If reducing perceptual redundancy does not achieve sufficient compression for a particular application, further lossy compression may be required, with a difference in quality that can be more readily perceived by a user. Most lossy compression schemes allow compression parameters to be adjusted to achieve a target rate of data, usually expressed as a bit rate. Again, the data reduction will be guided by some model of how important the sound is as perceived by the human ear, with the goal of efficiency and optimized quality for the target data rate. (There are many different models used for this perceptual analysis, some better suited to particular types of audio than others.) Hence, depending on the bandwidth and storage requirements, the use of lossy compression may result in a perceived reduction of audio quality that ranges from none to severe. That trade-off is usually intentional.
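The statement that fewer bits mean more noise can be checked numerically. The sketch below is an illustration only: it uses plain uniform quantization of a made-up sine tone (the frequency and length are arbitrary assumptions) and involves no psychoacoustic model. For this kind of quantizer the signal-to-noise ratio is expected to rise by roughly 6 dB per added bit.

```python
import math


def quantize(x, bits):
    """Uniformly quantize a value in [-1, 1] with a step of 2 / 2**bits."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels


def snr_db(bits, n=4800):
    """Signal-to-noise ratio of a quantized test tone, in decibels."""
    # Arbitrary test tone; 0.01234 cycles per sample is an assumed value.
    signal = [math.sin(2 * math.pi * 0.01234 * i) for i in range(n)]
    noise = [quantize(s, bits) - s for s in signal]
    signal_power = sum(s * s for s in signal) / n
    noise_power = sum(e * e for e in noise) / n
    return 10 * math.log10(signal_power / noise_power)


for bits in (4, 8, 16):
    print(bits, "bits:", round(snr_db(bits), 1), "dB")
```

A perceptual coder accepts this added noise but chooses, band by band, where to spend or save bits so that the noise falls where the ear is least likely to notice it.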
Because data are removed during lossy compression and cannot be recovered by decompression, some people prefer not to use lossy compression for archival storage. Hence, as noted, even those who use lossy compression (for portable audio applications, for example) may wish to keep a losslessly compressed archive for other applications. In addition, the technology of compression continues to advance, and achieving state-of-the-art lossy compression requires one to begin again with the lossless, original audio data and compress it with the new lossy codec. The nature of lossy compression (for both audio and images) results in increasing degradation of quality if data are decompressed and then recompressed using lossy compression.
Coding methods
Transform domain methods
In order to determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
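As a rough illustration of the transform step, the sketch below evaluates the MDCT directly from its textbook definition, X[k] = sum over n of x[n] cos((pi/N)(n + 1/2 + N/2)(k + 1/2)), for one block of 2N samples. Several simplifications are assumed: there is no window and no overlap between blocks, the block size is tiny, and the test tone is chosen to line up exactly with one basis function so that the result is easy to read. Real codecs use windowed, overlapped blocks and fast algorithms.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT of one block: 2N samples in, N coefficients out."""
    n_half = len(block) // 2
    coeffs = []
    for k in range(n_half):
        acc = 0.0
        for n, x in enumerate(block):
            angle = math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5)
            acc += x * math.cos(angle)
        coeffs.append(acc)
    return coeffs


N = 16   # number of coefficients (assumed, kept small for readability)
K0 = 5   # index of the basis function the test tone matches (assumed)

# Test tone that coincides with MDCT basis function K0.
block = [math.cos(math.pi / N * (n + 0.5 + N / 2) * (K0 + 0.5))
         for n in range(2 * N)]

coeffs = mdct(block)
peak = max(range(N), key=lambda k: abs(coeffs[k]))
print(peak)  # index of the dominant coefficient
```

All of the tone's energy lands in a single coefficient, so a coder can spend its bits on that coefficient and very few on the rest; real audio is less tidy, but the same concentration of energy is what makes per-frequency bit allocation worthwhile.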
The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking (the phenomenon wherein a signal is masked by another signal separated from it in frequency) and, in some cases, temporal masking (where a signal is masked by another signal separated from it in time). Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.
Time domain methods
Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique: reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.
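The prediction step can be illustrated with a very small least-squares fit. The sketch below is not an LPC speech coder: it fits only a second-order predictor, x[n] ≈ a1·x[n-1] + a2·x[n-2], to a made-up pure tone, and the function names and the test signal are assumptions for illustration. Practical coders use higher orders and more careful estimation, and then quantize what is left.

```python
import math


def fit_predictor(x):
    """Least-squares fit of x[n] ~ a1*x[n-1] + a2*x[n-2] (2x2 normal equations)."""
    idx = range(2, len(x))
    r11 = sum(x[n - 1] * x[n - 1] for n in idx)
    r12 = sum(x[n - 1] * x[n - 2] for n in idx)
    r22 = sum(x[n - 2] * x[n - 2] for n in idx)
    b1 = sum(x[n] * x[n - 1] for n in idx)
    b2 = sum(x[n] * x[n - 2] for n in idx)
    det = r11 * r22 - r12 * r12
    a1 = (b1 * r22 - b2 * r12) / det
    a2 = (r11 * b2 - r12 * b1) / det
    return a1, a2


def prediction_residual(x, a1, a2):
    """What is left after subtracting the prediction from each sample."""
    return [x[n] - a1 * x[n - 1] - a2 * x[n - 2] for n in range(2, len(x))]


OMEGA = 0.3  # assumed tone frequency in radians per sample
tone = [math.sin(OMEGA * n) for n in range(200)]

a1, a2 = fit_predictor(tone)
residual = prediction_residual(tone, a1, a2)

signal_energy = sum(v * v for v in tone)
residual_energy = sum(e * e for e in residual)
print(a1, a2)                           # close to 2*cos(OMEGA) and -1
print(residual_energy / signal_energy)  # close to zero
```

For a pure tone the predictor captures essentially everything, so the residual is almost empty. For speech, the residual is a flatter, lower-energy signal that is cheaper to code than the original waveform.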
Applications
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (generational losses). This makes lossy compression unsuitable for storing the intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats (particularly MP3) are very popular with end users, as a megabyte can store about a minute's worth of music at adequate quality.
Usability
Usability of lossy audio codecs is determined by:
- Perceived audio quality
- Compression factor
- Speed of compression and decompression
- Inherent latency of algorithm (critical for real-time streaming applications; see below)
- Software and hardware support
Lossy formats are often used for the distribution of streaming audio, or for interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as it flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen. Latency results from the methods used to encode and decode the data. Some codecs analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Codecs often divide the data into discrete segments called "frames" for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency here refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
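Because latency is counted in samples, converting it to time only needs the sampling rate. The short sketch below shows the arithmetic. The block size of 1024 samples and the 44.1 kHz rate are assumptions, chosen only because they reproduce the figures quoted above; the text itself does not state them.

```python
def latency_ms(samples, sample_rate_hz):
    """Time needed to collect a given number of samples, in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


# Assumed values, not taken from the text: 1024 samples at 44.1 kHz.
one_way = latency_ms(1024, 44100)
two_way = 2 * one_way

print(round(one_way, 1), "ms one way")
print(round(two_way, 1), "ms two way")
```

The same function shows why short analysis blocks matter for telephony: halving the number of samples that must be buffered halves the algorithmic delay.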
Speech encoding
Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using relatively low bit rates.
This is accomplished, in general, by some combination of two approaches:
- Only encoding sounds that could be made by a single human voice.
- Throwing away more of the data in the signal -- keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.
Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the mu-law algorithm.
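As a minimal sketch of the companding idea behind the mu-law algorithm, the code below applies the continuous mu-law curve, F(x) = sgn(x) ln(1 + mu|x|) / ln(1 + mu), and its inverse to samples normalized to [-1, 1]. It uses mu = 255, the value used in 8-bit telephony. It is not a bit-exact telephony encoder: the segmented 8-bit format used in practice is omitted, and the function names are hypothetical.

```python
import math

MU = 255  # standard value for 8-bit mu-law telephony


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(x, "->", round(y, 3), "->", round(mu_law_expand(y), 6))
```

Quiet samples are spread over a much larger share of the output range than loud ones, so when the compressed value is quantized to a few bits, the quantization error is relatively smaller for quiet sounds, where it would otherwise be most audible.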