Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing to represent the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides accurate estimates of speech parameters.
Linear prediction is a mathematical operation where future values of a digital signal are estimated as a linear function of previous samples.
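As a minimal sketch of the definition above, the following Python estimates each sample as a weighted sum of the samples before it. The function name `predict`, the coefficient ordering (most recent sample first) and the choice to treat samples before the start of the signal as zero are assumptions made for illustration, not something the article specifies.

```python
def predict(samples, coeffs):
    """Estimate each sample as a linear function of the previous samples.

    coeffs[0] weights the most recent sample, coeffs[1] the one before it,
    and so on. Samples before the start of the signal are treated as zero.
    """
    predictions = []
    for n in range(len(samples)):
        estimate = 0.0
        for k, c in enumerate(coeffs, start=1):
            if n - k >= 0:
                estimate += c * samples[n - k]
        predictions.append(estimate)
    return predictions


# A ramp x[n] = n satisfies x[n] = 2*x[n-1] - x[n-2] exactly, so a
# second-order predictor with these weights leaves no error once it
# has two real samples of history.
ramp = [float(n) for n in range(8)]
predicted = predict(ramp, [2.0, -1.0])
errors = [x - y for x, y in zip(ramp, predicted)]
```

The only non-zero prediction error here is at index 1, where the predictor does not yet have two samples of history. In LPC the weights are not fixed by hand as in this example but estimated from the signal itself.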
Overview
Main article: source-filter model of speech production
LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is a close approximation to the reality of speech production. The glottis (the space between the vocal cords) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances, called formants. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives. In the source-filter model of speech production, the vocal cords are assumed to be the source of spectrally flat sound (the excitation signal), and the vocal tract acts as a filter that spectrally shapes the various sounds of speech.
LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal that remains after the filtered modeled signal is subtracted is called the residue. The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: use the buzz parameters and the residue to create a source signal, use the formants to create a filter (which represents the tube), and run the source through the filter, resulting in speech. Because speech signals vary with time, this process is done on short chunks of the speech signal, called frames; generally 30 to 50 frames per second give intelligible speech with good compression.
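The analysis and synthesis steps described above can be sketched for a single frame as follows. This is an illustration under stated assumptions, not a reference implementation: it uses the autocorrelation method with the Levinson-Durbin recursion to estimate the filter (one common choice, not named in the text above), a synthetic two-tone frame stands in for real speech, and the 8 kHz sampling rate, frame length and order 4 are hypothetical values. It does not estimate the pitch or intensity of the buzz; it only shows inverse filtering to obtain the residue and the reverse step that rebuilds the frame from it.

```python
import math
import random


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for predictor coefficients a[1..order].

    Returns (coeffs, reflection, error), where x[n] is predicted as
    sum(coeffs[k-1] * x[n-k]) and error is the final prediction error energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    reflection = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        reflection.append(k)
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], reflection, error


def inverse_filter(frame, coeffs):
    """Analysis: subtract the prediction from each sample to get the residue."""
    return [frame[n] - sum(c * frame[n - k]
                           for k, c in enumerate(coeffs, start=1) if n - k >= 0)
            for n in range(len(frame))]


def synthesis_filter(residue, coeffs):
    """Synthesis: run the residue back through the all-pole filter."""
    out = []
    for n, e in enumerate(residue):
        out.append(e + sum(c * out[n - k]
                           for k, c in enumerate(coeffs, start=1) if n - k >= 0))
    return out


# One 30 ms frame at an assumed 8 kHz sampling rate: two tones plus light noise.
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.12 * n)
         + 0.1 * rng.uniform(-1.0, 1.0) for n in range(240)]

order = 4
r = autocorrelation(frame, order)
coeffs, reflection, error = levinson_durbin(r, order)
residue = inverse_filter(frame, coeffs)
rebuilt = synthesis_filter(residue, coeffs)

frame_energy = sum(x * x for x in frame)
residue_energy = sum(e * e for e in residue)
max_rebuild_error = max(abs(x - y) for x, y in zip(frame, rebuilt))
```

In this sketch the residue carries less energy than the frame, and passing the unmodified residue through the synthesis filter recovers the frame up to floating-point rounding. Compression comes from storing the few filter coefficients plus a cheaper description of the residue, which the sketch leaves out.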
Early history of LPC
According to Robert M. Gray of Stanford University, the first ideas leading to LPC started in 1966, when S. Saito and F. Itakura of NTT described an approach to automatic phoneme discrimination that involved the first maximum likelihood approach to speech coding. In 1967, John Burg outlined the maximum entropy approach. In 1969, Itakura and Saito introduced partial correlation; in May, Glen Culler proposed real-time speech encoding; and B. S. Atal presented an LPC speech coder at the Annual Meeting of the Acoustical Society of America. In 1971, real-time LPC using 16-bit LPC hardware was demonstrated by Philco-Ford; four units were sold.
In 1972, Bob Kahn of ARPA, with Jim Forgie (Lincoln Laboratory, LL) and Dave Walden (BBN Technologies), started the first developments in packetized speech, which would eventually lead to Voice over IP technology. In 1973, according to Lincoln Laboratory's informal history, the first real-time 2400 bit/s LPC was implemented by Ed Hofstetter. In 1974, the first real-time two-way LPC packet speech communication was accomplished over the ARPANET at 3500 bit/s between Culler-Harrison and Lincoln Laboratory. In 1976, the first LPC conference took place over the ARPANET using the Network Voice Protocol, between Culler-Harrison, ISI, SRI, and LL at 3500 bit/s. Finally, in 1978, Vishwanath et al. of BBN developed the first variable-rate LPC algorithm.
LPC coefficient representations
LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmitting the filter coefficients directly (see linear prediction for the definition of the coefficients) is undesirable, since they are very sensitive to errors: a very small error can distort the whole spectrum or, worse, make the prediction filter unstable.
There are more advanced representations, such as log area ratios (LAR), line spectral pairs (LSP) decomposition, and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor and keeps spectral errors local for small coefficient deviations.
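To make the stability argument concrete, here is a small sketch of one of the representations named above, the log area ratio. It assumes the common definition LAR = log((1 + k) / (1 - k)) for a reflection coefficient k with |k| < 1 (sign conventions differ between texts). The example coefficients and the quantization step of 0.25 are hypothetical values chosen only for illustration.

```python
import math


def reflection_to_lar(k):
    """Map a reflection coefficient in (-1, 1) to its log area ratio."""
    return math.log((1.0 + k) / (1.0 - k))


def lar_to_reflection(lar):
    """Inverse mapping; algebraically equal to (e^lar - 1) / (e^lar + 1)."""
    return math.tanh(lar / 2.0)


# Hypothetical reflection coefficients for one frame.
reflection = [0.95, -0.6, 0.2, -0.05]
lars = [reflection_to_lar(k) for k in reflection]

# Crude uniform quantization of the LARs, standing in for transmission error.
step = 0.25
quantized = [round(value / step) * step for value in lars]
decoded = [lar_to_reflection(value) for value in quantized]
```

Because the inverse mapping is a hyperbolic tangent, a perturbed LAR of moderate size still decodes to a reflection coefficient with magnitude below 1, which is the condition for a stable synthesis filter. Perturbing the direct filter coefficients by a similar amount gives no such guarantee.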
Applications
LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless, where voice must be digitized, encrypted and sent over a narrow voice channel.
LPC synthesis can be used to construct vocoders in which musical instruments serve as the excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music. Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding.[1] A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy.
The waveform ROM in digital sample-based music synthesizers made by Yamaha Corporation is compressed using an LPC algorithm.
Fixed linear predictors of order 0 to 4 are used in the FLAC audio codec.
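Low-order predictors of this kind can be illustrated with fixed polynomial predictors, whose residual is the repeated first difference of the signal. The sketch below is an illustration only, not FLAC's actual encoder: the function names are hypothetical, and the rule for picking an order (smallest sum of absolute residuals) is an assumption.

```python
def fixed_residual(samples, order):
    """Residual of a fixed polynomial predictor of the given order.

    Applying the first difference `order` times is equivalent to predicting
    each sample from a polynomial of degree order-1 through the previous
    `order` samples. The first `order` samples have no residual; they act
    as warm-up values that a decoder would need verbatim.
    """
    residual = list(samples)
    for _ in range(order):
        residual = [residual[i] - residual[i - 1] for i in range(1, len(residual))]
    return residual


def best_fixed_order(samples, max_order=4):
    """Pick the order whose residual has the smallest sum of magnitudes."""
    return min(range(max_order + 1),
               key=lambda p: sum(abs(e) for e in fixed_residual(samples, p)))


# Integer samples following a quadratic curve: x[n] = n*n + 3.
signal = [n * n + 3 for n in range(10)]
second = fixed_residual(signal, 2)   # constant second difference
chosen = best_fixed_order(signal)    # lowest order with an all-zero residual
```

For this quadratic test signal the second-order residual is constant and the third-order residual is all zeros, so order 3 is chosen. Because the arithmetic is exact on integers, the original samples can be recovered without loss from the warm-up values and the residual, which is what a lossless codec needs.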
See also
Warped linear predictive coding (Warped LPC or WLPC): a variant of linear predictive coding in which the spectral representation of the system is modified, for example by replacing the unit delays used in an LPC implementation with first-order allpass filters.
Akaike information criterion (AIC): a statistical model fit measure developed by Hirotugu Akaike in 1971 and proposed in 1974.
Audio compression
FS-1015: a secure telephony speech coding standard developed by the U.S. Department of Defense (DoD) and later by NATO; the standard was finished in 1984.
FS-1016: a secure telephony speech coding standard developed by the U.S. Department of Defense (DoD).