Supervised learning

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The output of the function can be a continuous value (in which case the task is called regression) or a class label of the input object (in which case it is called classification). The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way (see inductive bias). (Compare with unsupervised learning.) The parallel task in human and animal psychology is often referred to as concept learning.

Overview

Supervised learning can generate models of two types. Most commonly, it generates a global model that maps input objects to desired outputs. In some cases, however, the map is implemented as a set of local models (as in case-based reasoning or the nearest neighbor algorithm).
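
As an illustration of a local model, the following is a minimal sketch of a nearest neighbor classifier. It assumes Euclidean distance and a small, made-up set of two-dimensional training pairs; the function name and the data are hypothetical and not taken from any particular library.

```python
import math

def nearest_neighbor_predict(training_pairs, query):
    """Return the label of the stored input closest to `query` (Euclidean distance)."""
    _, nearest_label = min(
        training_pairs,
        key=lambda pair: math.dist(pair[0], query),
    )
    return nearest_label

# Hypothetical toy data: 2-D feature vectors paired with class labels.
training_pairs = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.2), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.1), "b"),
]

print(nearest_neighbor_predict(training_pairs, (0.2, 0.1)))
print(nearest_neighbor_predict(training_pairs, (0.8, 0.9)))
```

No global function is fitted here: each prediction is made by consulting the stored examples near the query, which is what makes the model local.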


To solve a given supervised learning problem (e.g. learning to recognize handwriting), one has to work through the following steps:

  1. Determine the type of training examples. Before doing anything else, the engineer should decide what kind of data is to be used as an example. For instance, this might be a single handwritten character, an entire handwritten word, or an entire line of handwriting.
  2. Gather a training set. The training set needs to be characteristic of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements.
  3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality, but should be large enough to accurately predict the output.
  4. Determine the structure of the learned function and corresponding learning algorithm. For example, the engineer may choose to use artificial neural networks or decision trees.
  5. Complete the design. The engineer then runs the learning algorithm on the gathered training set. Parameters of the learning algorithm may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. After parameter adjustment and learning, the performance of the algorithm may be measured on a test set that is separate from the training set.
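
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (two Gaussian clusters standing in for a gathered training set), the learner is a k-nearest-neighbor classifier, and the split sizes and candidate values of k are arbitrary choices.

```python
import math
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training inputs closest to `query`."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, evaluation, k):
    """Fraction of `evaluation` pairs whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in evaluation)
    return correct / len(evaluation)

# Step 2 (assumed toy data): two well-separated Gaussian clusters.
rng = random.Random(0)
data = [((rng.gauss(0, 0.5), rng.gauss(0, 0.5)), 0) for _ in range(60)]
data += [((rng.gauss(5, 0.5), rng.gauss(5, 0.5)), 1) for _ in range(60)]
rng.shuffle(data)

# Step 5: hold out a test set, then carve a validation set out of the rest.
test_set, validation_set, train_set = data[:30], data[30:60], data[60:]

# Adjust the parameter k using accuracy on the validation set only.
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=lambda k: accuracy(train_set, validation_set, k))

# Measure performance once, on the test set that played no part in tuning.
test_accuracy = accuracy(train_set, test_set, best_k)
print(best_k, test_accuracy)
```

Keeping the test set out of parameter selection is the point of the design: the validation set is reused for every candidate k, so only the untouched test set gives an unbiased estimate of performance.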

Supervised learning in which the outputs are class labels is called classification. A wide range of classifiers is available, each with its strengths and weaknesses. Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems; this is also referred to as the "no free lunch" theorem. Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. Determining a suitable classifier for a given problem is, however, still more an art than a science.


The most widely used classifiers are the Neural Network (Multi-layer Perceptron), Support Vector Machines, k-Nearest Neighbors, Gaussian Mixture Model, Gaussian, Naive Bayes, Decision Tree and RBF classifiers.


Empirical risk minimization

The goal of supervised learning of a global model is to find a function g, given a set of points of the form (x, g(x)).


It is assumed that the set of points for which the behavior of g is known is a sample of independent and identically distributed random variables drawn according to an unknown probability distribution p over a larger, possibly infinite, population. Furthermore, one assumes the existence of a task-specific loss function L of type

$$L: Y \times Y \to \mathbb{R}^+$$

where Y is the codomain of g and L maps into the nonnegative real numbers (further restrictions may be placed on L). The quantity L(z, y) is the loss incurred by predicting z as the value of g at a given point when the true value is y.
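
For concreteness, two commonly used loss functions are sketched below. They are examples chosen for illustration, not losses prescribed by the text: the zero-one loss is typical for classification and the squared loss for regression.

```python
def zero_one_loss(z, y):
    """Loss 1 for a wrong prediction z of the true value y, 0 for a correct one."""
    return 0.0 if z == y else 1.0

def squared_loss(z, y):
    """Squared difference between the prediction z and the true value y."""
    return (z - y) ** 2

print(zero_one_loss("cat", "dog"))
print(squared_loss(3.0, 1.0))
```

Both functions take a predicted value and a true value from Y and return a nonnegative real number, matching the type of L given above.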


The risk associated with a function f is then defined as the expectation of the loss function, as follows:

$$R(f) = \sum_i L(f(x_i), g(x_i)) \; p(x_i)$$

if the probability distribution p is discrete (the analogous continuous case employs a definite integral and a probability density function).
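
The discrete risk can be computed directly when p and g are known, which is rarely the case in practice. The sketch below assumes a made-up uniform distribution over four inputs, a hypothetical target g (the parity of x) and the zero-one loss.

```python
def zero_one_loss(z, y):
    """Loss 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if z == y else 1.0

def risk(f, g, p, loss):
    """R(f) = sum_i loss(f(x_i), g(x_i)) * p(x_i) for a discrete distribution p."""
    return sum(loss(f(x), g(x)) * prob for x, prob in p.items())

# Assumed toy setting: four equally likely inputs, target g(x) = parity of x.
p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
g = lambda x: x % 2
always_zero = lambda x: 0

print(risk(always_zero, g, p, zero_one_loss))  # wrong on the two odd inputs
print(risk(g, g, p, zero_one_loss))            # the target itself incurs no loss
```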


The goal is now to find a function f* among a fixed subclass of functions for which the risk R(f*) is minimal.


However, since the behavior of g is generally only known for a finite set of points (x_1, y_1), ..., (x_n, y_n), one can only approximate the true risk, for example with the empirical risk:

$$\tilde{R}_n(f) = \frac{1}{n} \sum_{i=1}^n L(f(x_i), y_i)$$

Selecting the function f* that minimizes the empirical risk is known as the principle of empirical risk minimization. Statistical learning theory investigates under what conditions empirical risk minimization is admissible and how good the approximations can be expected to be.
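
A minimal sketch of empirical risk minimization over a small, fixed class of functions follows. The sample, the class of threshold classifiers and the candidate thresholds are all assumptions made for illustration.

```python
def zero_one_loss(z, y):
    """Loss 1 for a wrong prediction, 0 for a correct one."""
    return 0.0 if z == y else 1.0

def empirical_risk(f, sample, loss):
    """Average loss of f over the observed pairs (x_i, y_i)."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)

def make_threshold_classifier(t):
    """Member of the assumed function class: predict 1 when x >= t, else 0."""
    return lambda x: 1 if x >= t else 0

# Hypothetical sample of (x_i, y_i) pairs with 1-D inputs and binary labels.
sample = [(0.5, 0), (1.2, 0), (1.9, 0), (2.4, 1), (3.1, 1), (3.8, 1)]

# The fixed subclass of functions is indexed by these candidate thresholds.
thresholds = [1.0, 2.0, 3.0]
risks = {
    t: empirical_risk(make_threshold_classifier(t), sample, zero_one_loss)
    for t in thresholds
}
best_threshold = min(thresholds, key=risks.get)
print(best_threshold, risks[best_threshold])
```

The selected function is only guaranteed to be best on the sample; how close its true risk is to the minimum is the question statistical learning theory addresses.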


Active Learning

There are situations in which unlabeled data is abundant but labeling it is expensive. In such a scenario the learning algorithm can actively query the user or teacher for labels. This type of supervised learning is called active learning. Since the learner chooses the examples, the number of examples needed to learn a concept can often be much lower than the number required in normal supervised learning. With this approach there is a risk that the algorithm might focus on unimportant or even invalid examples.
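
The saving in labels can be illustrated with a deliberately simple case: learning a one-dimensional threshold from a pool of unlabeled inputs. The sketch below is an assumption-laden toy, not a general active learning method. The `oracle` function is a hypothetical teacher, and the learner assumes that the smallest pool point is labeled 0 and the largest is labeled 1.

```python
def oracle(x):
    """Hypothetical teacher: the true concept is x >= 0.62 (unknown to the learner)."""
    return 1 if x >= 0.62 else 0

def active_learn_threshold(pool, query_label):
    """Learn a 1-D threshold, always querying the point the remaining hypotheses split on.

    Assumes the smallest pool point has label 0 and the largest has label 1.
    """
    pool = sorted(pool)
    low, high = 0, len(pool) - 1      # indices of a known-0 and a known-1 point
    queries = 0
    while high - low > 1:
        middle = (low + high) // 2    # consistent thresholds disagree most here
        queries += 1
        if query_label(pool[middle]) == 1:
            high = middle
        else:
            low = middle
    # Any threshold between pool[low] and pool[high] fits the labels; take the midpoint.
    return (pool[low] + pool[high]) / 2, queries

pool = [i / 100 for i in range(101)]  # 101 unlabeled inputs
threshold, queries = active_learn_threshold(pool, oracle)
print(threshold, queries)
```

Because each query halves the set of thresholds still consistent with the labels, the learner needs a number of labels that grows with the logarithm of the pool size, whereas labeling the whole pool would need one label per point.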


Approaches and algorithms

  • Artificial neural network
  • Backpropagation
  • Boosting
  • Bayesian inference
  • Case-based reasoning
  • Decision tree
  • Inductive logic programming
  • Kriging
  • Minimum message length
  • Naive Bayes classifier
  • Nearest neighbor algorithm
  • Probably approximately correct learning (PAC learning)
  • Support vector machines
  • Random forests

Applications

  • Bioinformatics
  • Cheminformatics
  • Quantitative structure-activity relationship (QSAR)
  • Handwriting recognition
  • Information retrieval
  • Computer vision
  • Optical character recognition
  • Spam detection
  • Pattern recognition
  • Speech recognition

General issues

  • Computational learning theory
  • Inductive bias
  • Overfitting
  • Version space

References

S. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques, Informatica Journal 31 (2007) 249-268 (http://www.informatica.si/PDF/31-3/11_Kotsiantis%20-%20Supervised%20Machine%20Learning%20-%20A%20Review%20of...pdf).


External Links

  • Matlab SUrrogate MOdeling Toolbox - SUMO Toolbox - Matlab code for Active Learning + Model Selection + Supervised Learning (Surrogate Modeling)

The Wikipedia article included on this page is licensed under the GFDL.