Statistical classification is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and based on a training set of previously labeled items.
Formally, the problem can be stated as follows: given training data $\{(x_1, y_1), \dots, (x_n, y_n)\}$, produce a classifier $h: \mathcal{X} \to \mathcal{Y}$ which maps an object $x \in \mathcal{X}$ to its classification label $y \in \mathcal{Y}$. For example, if the problem is filtering spam, then $x_i$ is some representation of an email and $y$ is either "Spam" or "Non-Spam". Statistical classification algorithms are typically used in pattern recognition systems.
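The formal statement can be made concrete with a small sketch. It is only an illustration: the word-set representation, the function names `represent` and `train`, the classifier name `h`, and the toy "marker word" rule are all assumptions introduced here, not part of any particular method described in this article.

```python
from typing import Callable, List, Set, Tuple

Email = Set[str]   # x: an email represented as its set of lowercase words
Label = str        # y: "Spam" or "Non-Spam"
Classifier = Callable[[Email], Label]


def represent(text: str) -> Email:
    """Turn raw text into the feature representation x (a set of words)."""
    return set(text.lower().split())


def train(data: List[Tuple[Email, Label]]) -> Classifier:
    """Produce a classifier h from labeled pairs (x_i, y_i).

    Toy rule (an assumption for illustration only): a word is a spam marker
    if it occurs in some spam email and in no non-spam email.
    """
    spam_words: Set[str] = set()
    ham_words: Set[str] = set()
    for x, y in data:
        if y == "Spam":
            spam_words |= x
        else:
            ham_words |= x
    markers = spam_words - ham_words

    def h(x: Email) -> Label:
        # Label an email as spam if it shares any word with the marker set.
        return "Spam" if x & markers else "Non-Spam"

    return h


training_data = [
    (represent("win a free prize now"), "Spam"),
    (represent("free money click now"), "Spam"),
    (represent("meeting agenda for monday"), "Non-Spam"),
    (represent("lunch on monday now"), "Non-Spam"),
]
h = train(training_data)
print(h(represent("claim your free prize")))
print(h(represent("agenda for lunch")))
```

The point of the sketch is the shape of the problem: `train` consumes labeled pairs and returns a function from objects to labels. Any real classifier would replace the marker-word rule with a proper learning method.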
Note: in community ecology, the term "classification" is synonymous with what is commonly known (in machine learning) as clustering. See the article on clustering for more information about purely unsupervised techniques.
Statistical classification techniques
While there are many methods for classification, they all attempt to solve one of the following mathematical problems:
- The first is to find a map of a feature space (which is typically a multi-dimensional vector space) to a set of labels. This is equivalent to partitioning the feature space into regions, then assigning a label to each region. Such algorithms (e.g., the nearest neighbour algorithm) typically do not yield confidence or class probabilities unless post-processing is applied. Another set of algorithms for this problem first applies unsupervised clustering to the feature space, then attempts to label each of the clusters or regions.
- The second problem is to consider classification as an estimation problem, where the goal is to estimate a function of the form
  $$P(\text{class} \mid \vec{x}) = f(\vec{x}; \vec{\theta})$$
  where the feature vector input is $\vec{x}$, and the function $f$ is typically parameterized by some parameters $\vec{\theta}$. In the Bayesian approach to this problem, instead of choosing a single parameter vector $\vec{\theta}$, the result is integrated over all possible thetas, with the thetas weighted by how likely they are given the training data $D$:
  $$P(\text{class} \mid \vec{x}) = \int f(\vec{x}; \vec{\theta}) \, P(\vec{\theta} \mid D) \, d\vec{\theta}$$
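The Bayesian approach described above can be sketched numerically by replacing the integral with a sum over a grid of parameter values. Everything specific here is an assumption made for illustration: a one-dimensional input, a single scalar parameter `theta`, a logistic form for `f`, a uniform prior over the grid, and the helper names `posterior_on_grid` and `predict`.

```python
import math
from typing import List, Tuple


def f(x: float, theta: float) -> float:
    """Hypothetical model f(x; theta): probability of class 1 given x."""
    return 1.0 / (1.0 + math.exp(-theta * x))


def posterior_on_grid(data: List[Tuple[float, int]],
                      thetas: List[float]) -> List[float]:
    """Approximate P(theta | D) on a grid, assuming a uniform prior."""
    log_weights = []
    for theta in thetas:
        log_lik = 0.0
        for x, y in data:
            p = f(x, theta)
            log_lik += math.log(p if y == 1 else 1.0 - p)
        log_weights.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_weights)
    weights = [math.exp(lw - m) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


def predict(x: float, thetas: List[float], posterior: List[float]) -> float:
    """Average f(x; theta) over the grid, weighted by P(theta | D)."""
    return sum(f(x, t) * p for t, p in zip(thetas, posterior))


# Training data D: (x, class) pairs with some label noise near x = 0.
D = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
thetas = [i * 0.1 for i in range(-50, 51)]   # grid over theta in [-5, 5]
posterior = posterior_on_grid(D, thetas)
print(predict(1.5, thetas, posterior))
print(predict(-1.5, thetas, posterior))
```

A grid is only practical for very low-dimensional parameters. It is used here because it keeps the correspondence with the integral visible: each grid point contributes `f(x; theta)` weighted by its posterior probability.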
Examples of classification algorithms include:
- Linear discriminant analysis (LDA) and the related Fisher's linear discriminant, which find the linear combination of features that best separates two or more classes of objects or events
- Logistic regression, a statistical regression model for Bernoulli-distributed dependent variables
- Naive Bayes classifier, a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions
- Perceptron, a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt
- Quadratic classifiers, which separate measurements of two or more classes of objects or events by a quadric surface
- Nearest neighbour algorithm, which classifies phenomena based upon observable features
- Boosting, a machine learning meta-algorithm for performing supervised learning
- Decision trees
- Neural networks
- Bayesian networks (or belief networks), probabilistic graphical models that represent a set of variables and their probabilistic independencies
- Support vector machines (SVMs), a set of related supervised learning methods used for classification and regression
- Hidden Markov models (HMMs), statistical models in which the system being modeled is assumed to be a Markov process with unknown parameters
An intriguing problem in pattern recognition yet to be solved is the relationship between the problem to be solved (the data to be classified) and the performance of various pattern recognition algorithms (classifiers). Van der Walt and Barnard (see the References section) investigated very specific artificial data sets to determine conditions under which certain classifiers perform better or worse than others.
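To make one of the algorithms named in this section concrete, the following is a minimal sketch of k-nearest-neighbour classification. It assumes Euclidean distance, a simple majority vote, and Python 3.8 or later for `math.dist`. The function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def knn_classify(train: List[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Return the majority label among the k training points nearest to query."""
    # Sort labeled points by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Two well-separated groups in a two-dimensional feature space.
train_points = [
    ((0.0, 0.0), "A"), ((1.0, 0.5), "A"), ((0.5, 1.0), "A"),
    ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B"),
]
print(knn_classify(train_points, (1.0, 1.0)))
print(knn_classify(train_points, (4.5, 5.0)))
```

The sketch returns only a label, not a class probability. Turning the vote counts into a confidence would be a post-processing step.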
Classifier performance depends greatly on the characteristics of the data to be classified. There is no single classifier that works best on all given problems (a phenomenon that may be explained by the No-free-lunch theorem). Various empirical tests have been performed to compare classifier performance and to find the characteristics of data that determine classifier performance. Determining a suitable classifier for a given problem is, however, still more an art than a science.
The most widely used classifiers are the Neural Network (Multi-layer Perceptron), Support Vector Machines, k-Nearest Neighbours, Gaussian Mixture Model, Gaussian, Naive Bayes, Decision Tree and RBF classifiers.
Evaluation
Precision and recall are popular metrics for evaluating the quality of a classification system. They are also widely used in other domains such as information retrieval.
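As a rough sketch of how these two measures are computed for one class of interest: precision is the fraction of items predicted as that class which truly belong to it, and recall is the fraction of items truly in that class which were predicted as such. The function name `precision_recall` and the convention of returning 0.0 when a denominator is zero are assumptions made here for illustration.

```python
from typing import List, Tuple


def precision_recall(y_true: List[str], y_pred: List[str],
                     positive: str) -> Tuple[float, float]:
    """Compute (precision, recall) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)  # true positives
    fp = sum(1 for t, p in pairs if p == positive and t != positive)  # false positives
    fn = sum(1 for t, p in pairs if p != positive and t == positive)  # false negatives
    # Assumed convention: report 0.0 when there is nothing to divide by.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


y_true = ["Spam", "Spam", "Spam", "Non-Spam", "Non-Spam", "Non-Spam"]
y_pred = ["Spam", "Non-Spam", "Non-Spam", "Spam", "Non-Spam", "Non-Spam"]
precision, recall = precision_recall(y_true, y_pred, positive="Spam")
print(precision, recall)
```

In this toy example one of the two "Spam" predictions is correct, and one of the three true spam items is found, so the two measures differ.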
Application domains
- Computer vision
- Medical imaging
- Optical character recognition (OCR)
- Speech recognition
- Handwriting recognition
- Biometric identification
- Natural language processing (NLP)
- Document classification
- Internet search engines
- Credit scoring
References
- C.M. van der Walt and E. Barnard, “Data characteristics that determine classifier performance”, in Proceedings of the Sixteenth Annual Symposium of the Pattern Recognition Association of South Africa, pp. 160-165, 2006.