Linear classifier

In machine learning, a classifier assigns each instance to a class, and the decision of whether or not an instance belongs to a particular class is made on the basis of some function. A linear classifier is a classifier that bases its decision on a linear function of its inputs.


That is, if the input feature vector to the classifier is a real vector $\vec{x}$, then the estimated output score (or probability) is

$$y = f(\vec{w} \cdot \vec{x}) = f\left(\sum_j w_j x_j\right),$$

where $\vec{w}$ is a real vector of weights and f is a function that converts the dot product of the two vectors into the desired output. Often f is a simple function that maps all values above a certain threshold to "yes" and all other values to "no".
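The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `dot` and `classify`, the default threshold of 0, and the example weights are all assumptions made here.

```python
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Dot product sum_j w_j * x_j of two equal-length real vectors."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x))


def classify(w: Sequence[float], x: Sequence[float], threshold: float = 0.0) -> str:
    """Apply f to the dot product: "yes" above the threshold, "no" otherwise."""
    return "yes" if dot(w, x) > threshold else "no"


# Hypothetical weights and inputs, chosen only for illustration.
w = [2.0, -1.0, 0.5]
print(classify(w, [1.0, 0.0, 2.0]))  # score is 3.0, above the threshold
print(classify(w, [0.0, 3.0, 1.0]))  # score is -2.5, below the threshold
```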


For a two-class classification problem, one can visualize the operation of a linear classifier as splitting a high-dimensional input space with a hyperplane: all points on one side of the hyperplane are classified as "yes", while the others are classified as "no".
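As a small worked illustration of this picture, assume the threshold is written $\theta$ (a symbol introduced here, not in the text above). The hyperplane and the two decision regions are then:

```latex
\[
H = \{\vec{x} : \vec{w}\cdot\vec{x} = \theta\}, \qquad
\text{output} =
\begin{cases}
\text{yes} & \text{if } \vec{w}\cdot\vec{x} > \theta,\\
\text{no}  & \text{otherwise.}
\end{cases}
\]
% Hypothetical two-dimensional example with \vec{w} = (1, 1) and \theta = 1:
%   \vec{x} = (1, 1): 1 + 1 = 2 > 1, so the output is "yes"
%   \vec{x} = (0, 0): 0 + 0 = 0 \le 1, so the output is "no"
```

In two dimensions the hyperplane is simply the line $x_1 + x_2 = 1$, and $\vec{w}$ is perpendicular to it.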


A linear classifier is often used in situations where the speed of classification is an issue, since it is often the fastest classifier, especially when $\vec{x}$ is sparse. However, decision trees can be faster. Linear classifiers also often work very well when the number of dimensions in $\vec{x}$ is large, as in document classification, where each element in $\vec{x}$ is typically the number of occurrences of a word in a document (see document-term matrix). In such cases, the classifier should be well regularized.
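To see why sparsity helps, consider the following sketch, which stores a document as a dictionary of word counts and computes the score by visiting only the words that actually occur. The whitespace tokenizer, the function names, and the weights are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Dict, Mapping


def word_counts(document: str) -> Dict[str, int]:
    """Sparse feature vector: only words that occur get an entry."""
    return dict(Counter(document.lower().split()))


def sparse_score(weights: Mapping[str, float], x: Mapping[str, int]) -> float:
    """Dot product that visits only the nonzero entries of x."""
    return sum(weights.get(word, 0.0) * count for word, count in x.items())


# Hypothetical weights for a toy "is this document about sports?" classifier.
weights = {"goal": 1.5, "match": 1.0, "election": -2.0, "vote": -1.5}

x = word_counts("Late goal wins the match after a second goal was disallowed")
score = sparse_score(weights, x)
print(score, "yes" if score > 0 else "no")
```

The cost of one classification is proportional to the number of distinct words in the document, not to the size of the vocabulary.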


Generative models vs. discriminative models

There are two main approaches for determining the parameters $\vec{w}$ of a linear classifier [1][2]. The first is to model the conditional density functions $P(\vec{x} \mid \mathrm{class})$. Examples of such algorithms include:

  • Linear discriminant analysis (LDA) --- assumes Gaussian conditional density models.
  • Naive Bayes classifier --- assumes independent binomial conditional density models.
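As a rough sketch of this first approach, assume binary features with Bernoulli (single-trial binomial) class-conditional models and add-one smoothing. Both are assumptions made here, as are the function names and the toy data. Under them, the estimated densities fold into a weight vector and a bias, so the resulting naive Bayes classifier is linear:

```python
import math
from typing import List, Sequence, Tuple


def fit_naive_bayes(
    xs: Sequence[Sequence[int]], ys: Sequence[int]
) -> Tuple[List[float], float]:
    """Estimate P(x_j = 1 | class) by smoothed counting, then fold the
    class log-odds into linear weights w and a bias b."""
    n_features = len(xs[0])
    n1 = sum(ys)            # number of examples labeled 1
    n0 = len(ys) - n1       # number of examples labeled 0
    w = [0.0] * n_features
    b = math.log((n1 + 1) / (n0 + 1))  # smoothed log prior odds
    for j in range(n_features):
        # Add-one smoothing keeps every probability strictly inside (0, 1).
        p1 = (sum(x[j] for x, y in zip(xs, ys) if y == 1) + 1) / (n1 + 2)
        p0 = (sum(x[j] for x, y in zip(xs, ys) if y == 0) + 1) / (n0 + 2)
        w[j] = math.log(p1 / p0) - math.log((1 - p1) / (1 - p0))
        b += math.log((1 - p1) / (1 - p0))
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[int]) -> int:
    """Class 1 if the log-odds w . x + b is positive, else class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


# Hypothetical toy data: feature 0 indicates class 1, feature 1 leans to class 0.
xs = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
ys = [1, 1, 1, 0, 0, 0]
w, b = fit_naive_bayes(xs, ys)
print([predict(w, b, x) for x in xs])
```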

The second approach is called discriminative training, which attempts to maximize the quality of the output on a training set. Additional terms in the training cost function can easily perform regularization of the final model. Examples of discriminative training of linear classifiers include:

  • Logistic regression --- maximum likelihood estimation of $\vec{w}$, assuming that the observed training set was generated by a binomial model that depends on the output of the classifier.
  • Perceptron --- an algorithm that attempts to fix all errors encountered in the training set.
  • Support vector machine --- an algorithm that maximizes the margin between the decision hyperplane and the examples in the training set.
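Of the three, the perceptron is the simplest to sketch. The version below is a minimal illustration under stated assumptions: labels are encoded as -1 and +1, a bias term `b` is added alongside the weights, and the epoch limit and toy data (the logical AND of two inputs) are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_perceptron(
    xs: Sequence[Sequence[float]], ys: Sequence[int], epochs: int = 100
) -> Tuple[List[float], float]:
    """Perceptron rule with labels in {-1, +1}: on each misclassified
    example, move w and b toward classifying it correctly."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score <= 0:  # on the wrong side of (or on) the hyperplane
                w = [wj + y * xj for wj, xj in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return w, b


# Hypothetical, linearly separable toy data: logical AND of two inputs.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [-1, -1, -1, 1]
w, b = train_perceptron(xs, ys)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1 for x in xs]
print(predictions)
```

On data that is not linearly separable the loop never reaches zero mistakes, which is why the sketch caps the number of epochs.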

Note: Despite its name, LDA does not belong to the class of discriminative models in this taxonomy. However, its name makes sense when we compare LDA to the other main linear dimensionality reduction algorithm, principal components analysis (PCA). LDA is a supervised learning algorithm that utilizes the labels of the data, while PCA is an unsupervised learning algorithm that ignores the labels. In short, the name is a historical artifact (see [3], p.117).


Discriminative training often yields higher accuracy than modeling the conditional density functions. However, handling missing data is often easier with conditional density models.


All of the linear classifier algorithms listed above can be converted into non-linear algorithms operating on a different input space $\varphi(\vec{x})$, using the kernel trick.
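One way to illustrate the kernel trick is a perceptron written in its dual form, where the weight vector is never built explicitly and every dot product is replaced by a kernel evaluation. The sketch below is only an illustration: the quadratic polynomial kernel, the function names, and the XOR-style toy data (which no linear classifier on the raw inputs can separate) are assumptions made here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def poly_kernel(x: Vector, z: Vector) -> float:
    """k(x, z) = (1 + x . z)^2, a dot product in a quadratic feature space."""
    return (1.0 + sum(xj * zj for xj, zj in zip(x, z))) ** 2


def decision(
    x: Vector, xs: Sequence[Vector], ys: Sequence[int],
    alpha: Sequence[int], kernel: Kernel,
) -> float:
    """Score of x as a kernel-weighted vote of the training examples."""
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alpha, ys, xs))


def train_kernel_perceptron(
    xs: Sequence[Vector], ys: Sequence[int], kernel: Kernel, epochs: int = 100
) -> List[int]:
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    alpha = [0] * len(xs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * decision(x, xs, ys, alpha, kernel) <= 0:
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


# Hypothetical XOR-style data with labels in {-1, +1}.
xs = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
ys = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(xs, ys, poly_kernel)
print([1 if decision(x, xs, ys, alpha, poly_kernel) > 0 else -1 for x in xs])
```

The classifier is still linear, but in the feature space induced by the kernel rather than in the original input space.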


Notes

  1. ^ T. Mitchell, "Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression", draft version, (2005).
  2. ^ A. Y. Ng, M. I. Jordan, "On Discriminative vs. Generative Classifiers: A comparison of logistic regression and Naive Bayes", in NIPS 14, (2002).
  3. ^ R.O. Duda, P.E. Hart, D.G. Stork, "Pattern Classification", Wiley, (2001). ISBN 0-471-05669-3

See also:

  1. Y. Yang, X. Liu, "A re-examination of text categorization", Proc. ACM SIGIR Conference, pp. 42-49, (1999).
  2. R. Herbrich, "Learning Kernel Classifiers: Theory and Algorithms," MIT Press, (2001). ISBN 0-262-08306-X

The Wikipedia article included on this page is licensed under the GFDL.