FACTOID # 7: Israel enjoys a GDP per capita 21 times that of the Palestinian West Bank and 33 times that of the Gaza Strip. Its military spending per capita tops the world.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Binary classification

Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. Some typical binary classification tasks are In mathematics, a set can be thought of as any collection of distinct things considered as a whole. ...

  • medical testing to determine if a patient has certain disease or not (the classification property is the disease)
  • quality control in factories; ie. deciding if a new product is good enough to be sold, or if it should be discarded (the classification property is being good enough)
  • deciding whether a page or an article should be in the result set of a search or not (the classification property is the relevance of the article - typically the presence of a certain word in it)

Classification in general is one of the problems studied in computer science, in order to automatically learn classification systems; some methods suitable for learning binary classifiers include the decision trees, Bayesian networks, support vector machines, and neural networks.   Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. ... To meet Wikipedias quality standards, this article or section may require cleanup. ... A Bayesian network or Bayesian belief network or just belief network is a form of probabilistic graphical model. ... Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. ... Simplified view of an artificial neural network A neural network is an interconnected group of biological neurons. ...


Sometimes, classification tasks are trivial. Given 100 balls, some of then red and some blue, a human with normal color vision can easily separate them into red ones and blue ones. However, some tasks, like those in practical medicine, and those interesting from the computer science point-of-view, are far from trivial, and produce also faulty results.


Hypothesis testing

In traditional statistical hypothesis testing, the tester starts with a null hypothesis and an alternative hypothesis, performs an experiment, and then decides whether to reject the null hypothesis in favour of the alternative. One may be faced with the problem of making a definite decision with respect to an uncertain hypothesis which is known only through its observable consequences. ... In statistics, a null hypothesis is a hypothesis set up to be nullified or refuted in order to support an alternative hypothesis. ... In statistics, the Alternative Hypothesis is the hypothesis proposed to explain a statistically significant difference between results, that is if the Null Hypothesis has been rejected. ...


A positive or statistically significant result is one which rejects the null hypothesis. Doing this when the null hypothesis is in fact true - a false negative - is a Type I error; doing this when the null hypothesis is false is a true positive. In statistics, a result is significant if it is unlikely to have occurred by chance, given that a presumed null hypothesis is true, but is not improbable if the null hypothesis is false. ... In statistical hypothesis testing, a Type I error consists of rejecting a null hypothesis that is true, in other words finding a result to have statistical significance when this has in fact happened by chance. ...


A negative or not statistically significant result is one which does not reject the null hypothesis. Doing this when the null hypothesis is in fact false - a false negative - is a Type II error; doing this when the null hypothesis is true is a true negative. In statistical hypothesis testing, a Type II error consists of failing to reject an invalid null hypothesis (i. ...


Evaluation of binary classifiers

To measure the performance of a medical test, the concepts sensitivity and specificity are often used; these concepts are readily usable for the evaluation of any binary classifier. Say we test some people for the presence of a disease. Some of these people have the disease, and our test says they are positive. They are called true positives. Some have the disease, but the test claims they don't. They are called false negatives. Some don't have the disease, and the test says they don't - true negatives. Finally, we might have healthy people who have a positive test result false positives. Thus, the number of true positives, false negatives, true negatives, and false positives add up to 100% of the set. The sensitivity of a binary classification test or algorithm, such as a blood test to determine if a person has a certain disease, or an automated system to detect faulty products in a factory, is a parameter that expresses something about the tests performance. ... In binary testing, e. ...


Sensitivity is the proportion of people that tested positive of all the positive people tested; that is (true positives) / (true positives + false negatives). It can be seen as the probability that the test is positive given that the patient is sick. The higher the sensitivity, the less real cases of diseases go undetected (or, in the case of the factory quality control, the less faulty products go to the market).


Specificity is the proportion of people that tested negative of all the negative people tested; that is (true negatives) / (true negatives + false positives). As with sensitivity, it can be looked at as the probability that the test is negative given that the patient is not sick. The higher the specificity, the less healthy people are labeled as sick (or, in the factory case, the less money the factory loses by discarding good products instead of selling them).


In theory, sensitivity and specificity are independent in the sense that it is possible to achieve 100 % in both (for instance, the human classifying the red and blue balls most likely does). In practice, there often is a tradeoff, and you can't achieve both. [Explanation why should go here.]


In addition to sensitivity and specificity, the performance of a binary classification test can be measured with positive and negative predictive values. These are possibly more intuitively clear: the positive prediction value answers the question "how likely it is that I really have the disease, given that my test result was positive?". It is calculated as (true positives) / (true positives + false positives); that is, it is the proportion of true positives out of all positive results. (The negative prediction value is the same, but for negatives, naturally.) The positive predictive value is the proportion of patients with positive test results who are correctly diagnosed. ... Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. ...


One should note, though, one important difference between the two concepts. That is, sensitivity and specificity are independent from the population in the sense that they don't change depending on what the proportion of positives and negatives tested are. Indeed, you can determine the sensitivity of the test by testing only positive cases. However, the prediction values are dependent on the population.


As an example, say that you have a test for a disease with 99 % sensitivity and 99 % specificity. Say you test 2000 people, and 1000 of them are sick and 1000 of them are healthy. You are likely to get about 990 true positives, 990 true negatives, and 10 of false positives and negatives each. The positive and negative prediction values would be 99 %, so the people can be quite confident about the result.


Say, however, that of the 2000 people only 100 are really sick. Now you are likely to get 99 true positives, 1 false negative, 1881 true negatives and 19 false positives. Of the 19+99 people tested positive, only 99 really have the disease - that means, intuitively, that given that your test result is positive, there's only 84 % chance that you really have the disease. On the other hand, given that your test result is negative, you can really be reassured: there's only 1 chance in 1881, or 0.05% probability, that you have the disease despite of your test result.


The receiver operating characteristic is a graphical way of visualizing the performance of binary classifiers. In signal detection theory, a receiver operating characteristic (ROC), also receiver operating curve, is a graphical plot of the sensitivity vs. ...


See also


  Results from FactBites:
 
Binary Star - MSN Encarta (1296 words)
Binary Star, two stars that are bound to each other by gravity and orbit about a common center of mass.
Some close binary pairs—for example the W Ursa Majoris system—are composed of otherwise ordinary stars that are so close that their mutual gravitational pull distorts the shape of one or both stars from a spherical shape into an egg shape.
In an interacting binary system consisting of a normal star and a collapsed star—for example a white dwarf, a neutron star, or a fl hole—the gravitational field at the surface of the collapsed star is so strong that the flow of material onto its surface causes bursts of highly energetic X rays.
Binary classification - Wikipedia, the free encyclopedia (1006 words)
Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not.
Classification in general is one of the problems studied in computer science, in order to automatically learn classification systems; some methods suitable for learning binary classifiers include the decision trees, Bayesian networks, support vector machines, and neural networks.
In addition to sensitivity and specificity, the performance of a binary classification test can be measured with positive and negative predictive values.
  More results at FactBites »


 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.