In probability theory, Bayes' theorem (often called Bayes' law) relates the conditional and marginal probabilities of two random events. It is often used to compute posterior probabilities given observations. For example, a patient may be observed to have certain symptoms; Bayes' theorem can be used to compute the probability that a proposed diagnosis is correct, given that observation (see Example #2).
As a formal theorem, Bayes' theorem is valid in all interpretations of probability. However, it plays a central role in the debate around the foundations of statistics: frequentist and Bayesian interpretations disagree about the ways in which probabilities should be assigned in applications. Frequentists assign probabilities to random events according to their frequencies of occurrence or to subsets of populations as proportions of the whole, while Bayesians describe probabilities in terms of beliefs and degrees of uncertainty. The articles on Bayesian probability and frequentist probability discuss these debates at greater length.
Statement of Bayes' theorem
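Bayes' theorem relates the conditional and marginal probabilities of events A and B, where B has a non-vanishing probability:
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}.$$
Each term in Bayes' theorem has a conventional name:
- P(A) is the prior probability or marginal probability of A. It is "prior" in the sense that it does not take into account any information about B.
- P(A|B) is the conditional probability of A, given B. It is also called the posterior probability because it is derived from, or depends upon, the specified value of B.
- P(B|A) is the conditional probability of B given A.
- P(B) is the prior or marginal probability of B, and acts as a normalizing constant.
Intuitively, Bayes' theorem in this form describes the way in which one's beliefs about observing A are updated by having observed B.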
Bayes' theorem in terms of likelihood
Bayes' theorem can also be interpreted in terms of likelihood:
$$P(A\mid B) \propto L(A\mid B)\,P(A).$$
Here L(A|B) is the likelihood of A given fixed B. The rule is then an immediate consequence of the relationship P(B|A) = L(A|B). With this terminology, the theorem may be paraphrased as
$$\text{posterior} = \alpha \times \text{likelihood} \times \text{prior},$$
where α is a normalizing constant (here α = 1/P(B)). In words: the posterior probability is proportional to the product of the prior probability and the likelihood.
Derivation from conditional probabilities
To derive the theorem, we start from the definition of conditional probability. The probability of event A given event B is
$$P(A\mid B) = \frac{P(A \cap B)}{P(B)}.$$
Equivalently, the probability of event B given event A is
$$P(B\mid A) = \frac{P(A \cap B)}{P(A)}.$$
Rearranging and combining these two equations, we find
$$P(A\mid B)\,P(B) = P(A \cap B) = P(B\mid A)\,P(A).$$
This lemma is sometimes called the product rule for probabilities. Dividing both sides by P(B), provided that it is non-zero, we obtain Bayes' theorem:
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)}.$$
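The derivation can be checked numerically. The Python sketch below uses a small, made-up joint distribution of two events. The four probabilities are illustrative assumptions, not data from this article. The sketch confirms that the product rule and Bayes' theorem agree with the definition of conditional probability. Exact fractions are used so the equalities hold without rounding error.

```python
from fractions import Fraction

# Hypothetical joint distribution of two events A and B (illustrative values only),
# indexed as joint[(a, b)] = P(A = a, B = b).
joint = {
    (True, True): Fraction(3, 16),
    (True, False): Fraction(5, 16),
    (False, True): Fraction(2, 16),
    (False, False): Fraction(6, 16),
}

p_ab = joint[(True, True)]                          # P(A ∩ B)
p_a = joint[(True, True)] + joint[(True, False)]    # P(A), marginalizing over B
p_b = joint[(True, True)] + joint[(False, True)]    # P(B), marginalizing over A

p_a_given_b = p_ab / p_b    # definition of P(A|B)
p_b_given_a = p_ab / p_a    # definition of P(B|A)

# Product rule: both factorizations recover the same joint probability.
assert p_a_given_b * p_b == p_b_given_a * p_a == p_ab

# Bayes' theorem: the product rule divided by P(B).
bayes = p_b_given_a * p_a / p_b
assert bayes == p_a_given_b
print(bayes)  # 3/5
```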
Alternative forms of Bayes' theorem
Bayes' theorem is often embellished by noting that
$$P(B) = P(A \cap B) + P(A^C \cap B) = P(B\mid A)\,P(A) + P(B\mid A^C)\,P(A^C),$$
where A^C is the complementary event of A (often called "not A"). So the theorem can be restated as
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid A^C)\,P(A^C)}.$$
More generally, where {A_i} forms a partition of the event space,
$$P(A_i\mid B) = \frac{P(B\mid A_i)\,P(A_i)}{\sum_j P(B\mid A_j)\,P(A_j)}$$
for any A_i in the partition. See also the law of total probability.
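The partition form translates directly into code. In this minimal Python sketch, the function name posterior_over_partition is a hypothetical choice. The denominator P(B) is built with the law of total probability. The usage line treats bowl #1 and bowl #2 from Example #1 as a two-block partition {A, A^C}.

```python
from fractions import Fraction


def posterior_over_partition(priors, likelihoods):
    """Return [P(A_1|B), ..., P(A_k|B)] for a partition {A_i} of the event space.

    priors[i] is P(A_i) and likelihoods[i] is P(B|A_i). The denominator P(B)
    comes from the law of total probability: sum_j P(B|A_j) P(A_j).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per block of the partition")
    p_b = sum(p * lik for p, lik in zip(priors, likelihoods))
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return [p * lik / p_b for p, lik in zip(priors, likelihoods)]


# Two-block partition {A, A^C} with the bowl figures from Example #1:
# P(A) = 1/2, P(B|A) = 3/4 (plain in bowl #1), P(B|A^C) = 1/2 (plain in bowl #2).
print(posterior_over_partition([Fraction(1, 2), Fraction(1, 2)],
                               [Fraction(3, 4), Fraction(1, 2)]))
# [Fraction(3, 5), Fraction(2, 5)]
```

Because the posteriors share one denominator, they always sum to 1 over the partition.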
Bayes' theorem in terms of odds and likelihood ratio
Bayes' theorem can also be written neatly in terms of a likelihood ratio Λ and odds O as
$$O(A\mid B) = O(A) \cdot \Lambda(A\mid B),$$
where
$$O(A\mid B) = \frac{P(A\mid B)}{P(A^C\mid B)}$$
are the odds of A given B,
$$O(A) = \frac{P(A)}{P(A^C)}$$
are the odds of A by itself, and
$$\Lambda(A\mid B) = \frac{L(A\mid B)}{L(A^C\mid B)} = \frac{P(B\mid A)}{P(B\mid A^C)}$$
is the likelihood ratio. (The odds in favor of an event with probability p are p/(1 − p).)
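As a sketch of the odds form, the following Python code multiplies the prior odds by the likelihood ratio and converts back to a probability. The helper names are illustrative. The numbers are the drug-test figures from Example #2: P(D) = 0.005, P(+|D) = 0.99, P(+|N) = 0.01.

```python
from fractions import Fraction


def odds(p):
    """Odds in favour of an event of probability p, i.e. p / (1 - p)."""
    return p / (1 - p)


def prob_from_odds(o):
    """Invert odds(): an event with odds o has probability o / (1 + o)."""
    return o / (1 + o)


def posterior_odds(prior_p, p_b_given_a, p_b_given_not_a):
    """O(A|B) = O(A) * Λ(A|B), where Λ(A|B) = P(B|A) / P(B|A^C)."""
    likelihood_ratio = p_b_given_a / p_b_given_not_a
    return odds(prior_p) * likelihood_ratio


# Drug-test figures from Example #2.
o = posterior_odds(Fraction(5, 1000), Fraction(99, 100), Fraction(1, 100))
print(o, prob_from_odds(o))  # 99/199 99/298
```

Here the likelihood ratio is 99, so a positive test multiplies the prior odds of 1:199 by 99. The resulting posterior probability, 99/298 ≈ 0.332, matches the direct calculation in Example #2.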
Bayes' theorem for probability densities
There is also a version of Bayes' theorem for continuous distributions. It is somewhat harder to derive, since probability densities, strictly speaking, are not probabilities, so Bayes' theorem has to be established by a limit process; see Papoulis (citation below), Section 7.3, for an elementary derivation. Bayes' theorem for probability densities is formally similar to the theorem for probabilities:
$$f(x\mid y) = \frac{f(x,y)}{f(y)} = \frac{f(y\mid x)\,f(x)}{f(y)},$$
and there is an analogous statement of the law of total probability:
$$f(y) = \int_{-\infty}^{\infty} f(y\mid x)\,f(x)\,dx.$$
As in the discrete case, the terms have standard names. f(x, y) is the joint distribution of X and Y, f(x|y) is the posterior distribution of X given Y = y, f(y|x) = L(x|y) is (as a function of x) the likelihood function of X given Y = y, and f(x) and f(y) are the marginal distributions of X and Y respectively, with f(x) being the prior distribution of X. Here we have indulged in a conventional abuse of notation, using f for each one of these terms, although each one is really a different function; the functions are distinguished by the names of their arguments.
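For densities, the integral in the law of total probability often has to be evaluated numerically. The Python sketch below approximates f(x|y) on an evenly spaced grid and replaces the integral with a Riemann sum. Three choices are illustrative assumptions, not taken from the text: the Gaussian prior X ~ N(0, 1), the Gaussian likelihood Y | X = x ~ N(x, 1), and the observation y = 1. They were chosen because the exact answer is known: the posterior is N(1/2, 1/2) and f(y) = exp(−1/4)/√(4π), which gives the approximation something to be checked against.

```python
import math


def normal_pdf(z, mean, sd):
    """Density of N(mean, sd**2) at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def grid_posterior(prior_pdf, likelihood, xs):
    """Approximate the posterior density f(x|y) at each point of an even grid xs.

    prior_pdf(x) is f(x); likelihood(x) is f(y|x) for the observed y.
    Returns (posterior values on the grid, approximation of f(y)).
    """
    dx = xs[1] - xs[0]
    unnormalized = [likelihood(x) * prior_pdf(x) for x in xs]
    f_y = sum(unnormalized) * dx  # f(y) = ∫ f(y|x) f(x) dx as a Riemann sum
    return [u / f_y for u in unnormalized], f_y


y_obs = 1.0
xs = [-6.0 + 0.001 * i for i in range(12001)]  # grid on [-6, 6]
post, f_y = grid_posterior(lambda x: normal_pdf(x, 0.0, 1.0),
                           lambda x: normal_pdf(y_obs, x, 1.0), xs)
dx = xs[1] - xs[0]
post_mean = sum(x * p for x, p in zip(xs, post)) * dx
# Expected to be close to 0.5 and exp(-1/4) / sqrt(4*pi) ≈ 0.2197.
print(round(post_mean, 4), round(f_y, 4))
```

A grid is only practical in one or two dimensions. Conjugate priors or sampling methods are the usual alternatives when the dimension grows.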
Abstract Bayes' theorem
Given two mutually absolutely continuous probability measures P ∼ Q on the probability space (Ω, F) and a sigma-algebra G ⊂ F, the abstract Bayes theorem for an F-measurable random variable X becomes
$$E_P[X\mid \mathcal{G}] = \frac{E_Q\!\left[\frac{dP}{dQ}\,X \,\middle|\, \mathcal{G}\right]}{E_Q\!\left[\frac{dP}{dQ} \,\middle|\, \mathcal{G}\right]}.$$
This formulation is used in stochastic filtering, for example to derive the Zakai equation. It is also used in financial mathematics for change of numéraire techniques.
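On a finite probability space, every object in the abstract formula becomes a finite sum, which makes the identity easy to check. In the Python sketch below, the space Ω = {0, 1, 2, 3}, the measures P and Q, the random variable X, and the partition generating G are all hypothetical choices. Both measures give positive mass to every point, so they are mutually absolutely continuous and dP/dQ is simply the pointwise ratio P(ω)/Q(ω). Conditional expectation given a partition-generated sigma-algebra is constant on each block, so the sketch computes it block by block.

```python
from fractions import Fraction

# Hypothetical finite space Ω = {0, 1, 2, 3}; every point has positive mass under
# both measures, so P and Q are mutually absolutely continuous.
P = {0: Fraction(1, 10), 1: Fraction(2, 10), 2: Fraction(3, 10), 3: Fraction(4, 10)}
Q = {w: Fraction(1, 4) for w in range(4)}
X = {0: 5, 1: -1, 2: 2, 3: 7}            # an arbitrary random variable on Ω
blocks = [(0, 1), (2, 3)]                # partition generating the sub-sigma-algebra G

Z = {w: P[w] / Q[w] for w in P}          # Radon-Nikodym derivative dP/dQ
ZX = {w: Z[w] * X[w] for w in P}         # the product (dP/dQ) * X


def cond_exp(measure, f, block):
    """Value of E[f | G] on one block of the partition, under `measure`."""
    mass = sum(measure[w] for w in block)
    return sum(measure[w] * f[w] for w in block) / mass


direct = [cond_exp(P, X, b) for b in blocks]                       # E_P[X | G]
via_q = [cond_exp(Q, ZX, b) / cond_exp(Q, Z, b) for b in blocks]   # abstract Bayes
assert direct == via_q
print(direct)  # [Fraction(1, 1), Fraction(34, 7)]
```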
Extensions of Bayes' theorem
Theorems analogous to Bayes' theorem hold in problems with more than two variables. For example:
$$P(A\mid B,C) = \frac{P(A)\,P(B\mid A)\,P(C\mid A,B)}{P(B)\,P(C\mid B)}.$$
This can be derived in a few steps from Bayes' theorem and the definition of conditional probability:
$$P(A\mid B,C) = \frac{P(A,B,C)}{P(B,C)} = \frac{P(A,B,C)}{P(B)\,P(C\mid B)} = \frac{P(A)\,P(B\mid A)\,P(C\mid A,B)}{P(B)\,P(C\mid B)}.$$
Similarly, we have
$$P(A\mid B,C) = \frac{P(B\mid A,C)\,P(A\mid C)}{P(B\mid C)},$$
which can be regarded as a conditional Bayes' theorem, and can be derived as follows:
$$P(A\mid B,C) = \frac{P(A,B,C)}{P(B,C)} = \frac{P(B\mid A,C)\,P(A\mid C)\,P(C)}{P(B\mid C)\,P(C)} = \frac{P(B\mid A,C)\,P(A\mid C)}{P(B\mid C)}.$$
A general strategy is to work with a decomposition of the joint probability, and to marginalize (integrate) over the variables that are not of interest. Depending on the form of the decomposition, it may be possible to prove that some integrals must be 1, and thus they fall out of the decomposition; exploiting this property can reduce the computations very substantially. A Bayesian network, for example, specifies a factorization of a joint distribution of several variables in which the conditional probability of any one variable given the remaining ones takes a particularly simple form (see Markov blanket).
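Both three-variable identities can be verified exhaustively on a small example. The Python sketch below builds a hypothetical joint distribution over three binary variables. Its weights 1 to 8 are arbitrary and normalized by their total of 36. The helpers prob and cond, both illustrative, compute marginals and conditionals by summing the joint table. Every assignment is then checked against both forms.

```python
from fractions import Fraction
from itertools import product

NAMES = ("a", "b", "c")
# Hypothetical joint distribution of three binary variables: weights 1..8,
# normalized by their total (36). Every outcome has positive probability.
joint = {abc: Fraction(i + 1, 36) for i, abc in enumerate(product((0, 1), repeat=3))}


def prob(**fixed):
    """Marginal probability of an assignment, e.g. prob(a=1, c=0)."""
    return sum(p for abc, p in joint.items()
               if all(abc[NAMES.index(k)] == v for k, v in fixed.items()))


def cond(target, given):
    """P(target | given); both arguments are dicts such as {"a": 1}."""
    return prob(**target, **given) / prob(**given)


for a, b, c in product((0, 1), repeat=3):
    lhs = cond({"a": a}, {"b": b, "c": c})
    first_form = (prob(a=a) * cond({"b": b}, {"a": a}) * cond({"c": c}, {"a": a, "b": b})
                  / (prob(b=b) * cond({"c": c}, {"b": b})))
    conditional_bayes = (cond({"b": b}, {"a": a, "c": c}) * cond({"a": a}, {"c": c})
                         / cond({"b": b}, {"c": c}))
    assert lhs == first_form == conditional_bayes
```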
Examples
Example #1: Conditional probabilities
Suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, and likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1? Intuitively, this should be greater than half, since bowl #1 contains the same number of cookies as bowl #2 yet has more plain ones.
We can clarify the situation by rephrasing the question as "what is the probability that Fred picked bowl #1, given that he has a plain cookie?" The event A is that Fred picked bowl #1, and the event B is that Fred picked a plain cookie. To compute P(A|B), we first need to know:
- P(A), or the probability that Fred picked bowl #1 regardless of any other information. Since Fred is treating both bowls equally, it is 0.5.
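- P(B), or the probability of getting a plain cookie regardless of any other information. There are 80 cookies in total, 50 of them plain, and every cookie is equally likely to be picked (both bowls hold 40 cookies and are chosen with equal probability), so the probability of selecting a plain cookie is 50/80 = 0.625.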
- P(B|A), or the probability of getting a plain cookie given Fred picked bowl #1. Since there are 40 cookies in bowl #1 and 30 of them are plain, the probability is 30/40 = 0.75.
Given all this information, we can compute the probability of Fred having selected bowl #1 given that he got a plain cookie by substitution:
$$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B)} = \frac{0.75 \times 0.5}{0.625} = 0.6.$$
As we expected, it is more than half.
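The substitution can be reproduced from the raw cookie counts. The Python sketch below derives P(A), P(B), and P(B|A) exactly as described above. Exact fractions keep the result free of rounding. The dictionary layout is an illustrative choice.

```python
from fractions import Fraction

# Cookie counts from Example #1, as counts[bowl][kind].
counts = {
    "bowl1": {"chocolate chip": 10, "plain": 30},
    "bowl2": {"chocolate chip": 20, "plain": 20},
}

p_a = Fraction(1, 2)  # P(A): Fred treats the two bowls equally

# P(B) = 50/80: valid because both bowls hold 40 cookies and are equally likely,
# so every one of the 80 cookies is equally likely to be drawn.
total = sum(sum(kinds.values()) for kinds in counts.values())
p_b = Fraction(sum(kinds["plain"] for kinds in counts.values()), total)

# P(B|A) = 30/40: share of plain cookies within bowl #1.
p_b_given_a = Fraction(counts["bowl1"]["plain"], sum(counts["bowl1"].values()))

p_a_given_b = p_b_given_a * p_a / p_b
print(p_a_given_b, float(p_a_given_b))  # 3/5 0.6
```

If the bowls held different numbers of cookies, 50/80 would no longer equal P(B). In that case P(B) would have to come from the law of total probability instead.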
Tables of occurrences and relative frequencies
When calculating conditional probabilities, it is often helpful to create a simple table containing the number of occurrences of each outcome, or the relative frequency of each outcome, for each combination of the variables. The tables below illustrate this method for the cookies. In a series of observations, or trials, the relative frequency of occurrence of an event is the number of times the event occurs divided by the total number of trials.
Number of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Totals |
|---|---|---|---|
| Chocolate Chip | 10 | 20 | 30 |
| Plain | 30 | 20 | 50 |
| Total | 40 | 40 | 80 |

Relative frequency of cookies in each bowl by type of cookie

| | Bowl #1 | Bowl #2 | Totals |
|---|---|---|---|
| Chocolate Chip | 0.125 | 0.250 | 0.375 |
| Plain | 0.375 | 0.250 | 0.625 |
| Total | 0.500 | 0.500 | 1.000 |

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, i.e. dividing each number by 80.
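The table method can be sketched in a few lines of Python. The sketch builds the relative-frequency table by dividing each count by the grand total. It then reads P(bowl #1 | plain) as the bowl #1 entry of the "Plain" row divided by that row's total. Treating relative frequencies as probabilities is justified here only because every cookie is equally likely to be drawn. The variable names are illustrative.

```python
from fractions import Fraction

# Occurrence table: rows are cookie types, columns are bowls.
counts = {
    "chocolate chip": {"bowl1": 10, "bowl2": 20},
    "plain": {"bowl1": 30, "bowl2": 20},
}
total = sum(sum(row.values()) for row in counts.values())  # 80

# Relative-frequency table: each entry divided by the grand total.
rel = {kind: {bowl: Fraction(n, total) for bowl, n in row.items()}
       for kind, row in counts.items()}

# P(bowl #1 | plain) = (bowl #1 and plain) / (row total for plain).
p_plain = sum(rel["plain"].values())
p_bowl1_given_plain = rel["plain"]["bowl1"] / p_plain

for kind, row in rel.items():
    print(kind, [float(v) for v in row.values()], float(sum(row.values())))
print(float(p_bowl1_given_plain))  # 0.6
```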
Example #2: Drug testing
Bayes' theorem is useful in evaluating the result of drug tests. Suppose a certain drug test is 99% sensitive and 99% specific; that is, the test will correctly identify a drug user as testing positive 99% of the time, and will correctly identify a non-user as testing negative 99% of the time. This would seem to be a relatively accurate test, but Bayes' theorem will reveal a potential flaw. Let's assume a corporation decides to test its employees for opium use, and 0.5% of the employees use the drug. We want to know the probability that, given a positive drug test, an employee is actually a drug user. Let "D" be the event of being a drug user and "N" indicate being a non-user. Let "+" be the event of a positive drug test. We need to know the following:
- P(D), or the probability that the employee is a drug user, regardless of any other information. This is 0.005, since 0.5% of the employees are drug users. This is the prior probability of D.
- P(N), or the probability that the employee is not a drug user. This is 1 − P(D), or 0.995.
- P(+|D), or the probability that the test is positive, given that the employee is a drug user. This is 0.99, since the test is 99% sensitive.
- P(+|N), or the probability that the test is positive, given that the employee is not a drug user. This is 0.01, since the test will produce a false positive for 1% of non-users.
- P(+), or the probability of a positive test event, regardless of other information. This is 0.0149 or 1.49%, which is found by adding the probability that the test will produce a true positive result in the event of drug use (99% × 0.5% = 0.495%) to the probability that the test will produce a false positive in the event of non-drug use (1% × 99.5% = 0.995%). This is the prior probability of +.
Given this information, we can compute the posterior probability P(D|+) of an employee who tested positive actually being a drug user:
$$P(D\mid +) = \frac{P(+\mid D)\,P(D)}{P(+)} = \frac{0.99 \times 0.005}{0.0149} \approx 0.3322.$$
 Despite the high accuracy of the test, the probability that an employee who tested positive actually did use drugs is only about 33%, so it is actually more likely that the employee is not a drug user. The rarer the condition for which we are testing, the greater the percentage of positive tests that will be false positives.
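The arithmetic above can be reproduced with a small Python function. The name posterior_positive and its parameters are illustrative choices. P(+) is computed from the law of total probability, as in the list above. Sweeping the prevalence makes the closing point concrete: with sensitivity and specificity held fixed, a rarer condition lowers the share of positive results that are true positives.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(D|+): probability that someone who tests positive is a drug user."""
    true_pos = sensitivity * prevalence               # P(+|D) P(D)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(+|N) P(N)
    p_pos = true_pos + false_pos                      # P(+) by total probability
    return true_pos / p_pos


print(round(posterior_positive(0.005, 0.99, 0.99), 4))  # 0.3322

# Same test, different prevalences.
for prevalence in (0.05, 0.005, 0.0005):
    print(prevalence, round(posterior_positive(prevalence, 0.99, 0.99), 4))
```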
Example #3: Bayesian inference
Applications of Bayes' theorem often assume the philosophy underlying Bayesian probability: that uncertainty and degrees of belief can be measured as probabilities. One such example follows. For additional worked-out examples, including simpler ones, see the article on examples of Bayesian inference.
We describe the marginal probability distribution of a variable A as the prior probability distribution or simply the prior. The conditional distribution of A given the "data" B is the posterior probability distribution or just the posterior.
Suppose we wish to know about the proportion r of voters in a large population who will vote "yes" in a referendum. Let n be the number of voters in a random sample (chosen with replacement, so that we have statistical independence) and let m be the number of voters in that random sample who will vote "yes". Suppose that we observe n = 10 voters and m = 7 say they will vote yes. From Bayes' theorem we can calculate the probability density function for r using
$$f(r\mid n=10, m=7) = \frac{f(m=7\mid r, n=10)\,f(r)}{\int_0^1 f(m=7\mid r, n=10)\,f(r)\,dr}.$$
From this we see that from the prior probability density function f(r) and the likelihood function L(r) = f(m = 7|r, n = 10), we can compute the posterior probability density function f(r|n = 10, m = 7).
The prior probability density function f(r) summarizes what we know about the distribution of r in the absence of any observation. We provisionally assume in this case that the prior distribution of r is uniform over the interval [0, 1], that is, f(r) = 1. If some additional background information is found, we should modify the prior accordingly; before we have any observations, however, we treat all values of r as equally likely.
Under the assumption of random sampling, choosing voters is just like choosing balls from an urn. The likelihood function L(r) = P(m = 7|r, n = 10) for such a problem is just the probability of 7 successes in 10 trials for a binomial distribution:
$$P(m=7\mid r, n=10) = \binom{10}{7} r^7 (1-r)^3.$$
As with the prior, the likelihood is open to revision: more complex assumptions will yield more complex likelihood functions. Maintaining the current assumptions, we compute the normalizing factor,
$$\int_0^1 P(m=7\mid r, n=10)\,f(r)\,dr = \int_0^1 \binom{10}{7} r^7 (1-r)^3\,dr = \frac{1}{11},$$
and the posterior distribution for r is then
$$f(r\mid n=10, m=7) = \frac{\binom{10}{7} r^7 (1-r)^3 \cdot 1}{1/11} = 1320\, r^7 (1-r)^3$$
for r between 0 and 1, inclusive.
One may be interested in the probability that more than half the voters will vote "yes". The prior probability that more than half the voters will vote "yes" is 1/2, by the symmetry of the uniform distribution. In comparison, the posterior probability that more than half the voters will vote "yes", i.e., the conditional probability given the outcome of the opinion poll (that seven of the 10 voters questioned will vote "yes"), is
$$1320 \int_{1/2}^{1} r^7 (1-r)^3\,dr = \frac{227}{256} \approx 0.887,$$
which is about an "89% chance".
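Under the uniform prior, every integral in this example is a polynomial integral, so each can be evaluated exactly. The Python sketch below expands the binomial likelihood into powers of r and integrates it term by term with exact fractions. It recovers the normalizing factor 1/11 and the posterior probability 227/256 ≈ 0.887. The sketch assumes, as the text does, that f(r) = 1; a non-uniform prior would have to be multiplied into the polynomial first. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def likelihood_coeffs(n, m):
    """Coefficients c[k] of r**k in C(n, m) * r**m * (1 - r)**(n - m)."""
    coeffs = [0] * (n + 1)
    for j in range(n - m + 1):  # expand (1 - r)**(n - m) with the binomial theorem
        coeffs[m + j] += comb(n, m) * comb(n - m, j) * (-1) ** j
    return coeffs


def integrate_poly(coeffs, lo, hi):
    """Exact integral of sum_k coeffs[k] * r**k over [lo, hi]."""
    lo, hi = Fraction(lo), Fraction(hi)
    return sum(Fraction(c, k + 1) * (hi ** (k + 1) - lo ** (k + 1))
               for k, c in enumerate(coeffs))


# With the uniform prior f(r) = 1, the likelihood is also the unnormalized posterior.
coeffs = likelihood_coeffs(n=10, m=7)
normalizer = integrate_poly(coeffs, 0, 1)  # ∫_0^1 P(m=7 | r, n=10) dr
posterior_gt_half = integrate_poly(coeffs, Fraction(1, 2), 1) / normalizer
print(normalizer, posterior_gt_half, float(posterior_gt_half))  # 1/11 227/256 0.88671875
```

The posterior 1320 r^7 (1 − r)^3 is the Beta(8, 4) density. A library Beta CDF would therefore give the same 227/256 without expanding any polynomial.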
Example #4: The Monty Hall problem
We are presented with three doors (red, green, and blue), one of which has a prize. We choose the red door, which is not opened until the presenter performs an action. The presenter knows which door the prize is behind and must open a door, but is not permitted to open the door we have picked or the door with the prize. He opens the blue door, reveals that there is no prize behind it, and then asks whether we wish to change our initial selection of red. What is the probability that the prize is behind each of the green and red doors?
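Let us call the events that the prize is behind the red, green, and blue doors A_r, A_g, and A_b respectively. To start with, P(A_r) = P(A_g) = P(A_b) = 1/3, and to make things simpler we shall assume that we have already picked the red door. Let us call B the event "the presenter opens the blue door". Without any prior knowledge, we would assign this a probability of 50%.
- In the situation where the prize is behind the red door, the host is free to pick between the green and the blue door at random. Thus, P(B|A_r) = 1/2.
- In the situation where the prize is behind the green door, the host must pick the blue door. Thus, P(B|A_g) = 1.
- In the situation where the prize is behind the blue door, the host must pick the green door. Thus, P(B|A_b) = 0.
These values agree with the 50% figure through the law of total probability: P(B) = (1/2)(1/3) + (1)(1/3) + (0)(1/3) = 1/2. Thus,
$$P(A_r\mid B) = \frac{P(B\mid A_r)\,P(A_r)}{P(B)} = \frac{\tfrac{1}{2}\cdot\tfrac{1}{3}}{\tfrac{1}{2}} = \frac{1}{3},$$
$$P(A_g\mid B) = \frac{P(B\mid A_g)\,P(A_g)}{P(B)} = \frac{1\cdot\tfrac{1}{3}}{\tfrac{1}{2}} = \frac{2}{3},$$
$$P(A_b\mid B) = \frac{P(B\mid A_b)\,P(A_b)}{P(B)} = \frac{0\cdot\tfrac{1}{3}}{\tfrac{1}{2}} = 0.$$
Note how this depends on the value of P(B).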
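The same calculation can be written out in Python with exact fractions. The host's behaviour is encoded as the example describes: he never opens the contestant's door or the prize door, and he chooses uniformly at random when two doors are allowed. P(B) is computed from the law of total probability rather than assumed. The function name p_host_opens is an illustrative choice.

```python
from fractions import Fraction

doors = ["red", "green", "blue"]
prior = {d: Fraction(1, 3) for d in doors}  # P(A_r) = P(A_g) = P(A_b) = 1/3


def p_host_opens(opened, prize, picked="red"):
    """P(host opens `opened` | prize behind `prize`), given the contestant's pick.

    The host never opens the picked door or the prize door; if two doors
    remain allowed, he picks one of them uniformly at random.
    """
    allowed = [d for d in doors if d != picked and d != prize]
    return Fraction(1, len(allowed)) if opened in allowed else Fraction(0)


likelihood = {d: p_host_opens("blue", d) for d in doors}  # P(B | A_d)
p_b = sum(likelihood[d] * prior[d] for d in doors)        # P(B) by total probability
posterior = {d: likelihood[d] * prior[d] / p_b for d in doors}

for d in doors:
    print(d, posterior[d])  # red 1/3, green 2/3, blue 0
```

Changing the host model, for example to a host who opens doors without knowing where the prize is, changes only p_host_opens. The rest of the calculation stays the same.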
Historical remarks
An investigation by a statistics professor (Stigler 1983) suggests that Bayes' theorem was discovered by Nicholas Saunderson some time before Bayes.
Bayes' theorem is named after the Reverend Thomas Bayes (1702–1761), who studied how to compute a distribution for the parameter of a binomial distribution (to use modern terminology). His friend, Richard Price, edited and presented the work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances. Pierre-Simon Laplace replicated and extended these results in an essay of 1774, apparently unaware of Bayes' work.
One of Bayes' results (Proposition 5) gives a simple description of conditional probability, and shows that it can be expressed independently of the order in which things occur:
- If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, from hence I guess that the first event has also happened, the probability I am right [i.e., the conditional probability of the first event being true given that the second has also happened] is P/b.
Note that the expression says nothing about the order in which the events occurred; it measures correlation, not causation. His preliminary results, in particular Propositions 3, 4, and 5, imply the result now called Bayes' theorem (as described above), but it does not appear that Bayes himself emphasized or focused on that result. Bayes' main result (Proposition 9 in the essay) is the following: assuming a uniform distribution for the prior distribution of the binomial parameter p, the probability that p is between two values a and b is
$$\frac{\int_a^b \binom{n+m}{m} p^m (1-p)^n\,dp}{\int_0^1 \binom{n+m}{m} p^m (1-p)^n\,dp},$$
where m is the number of observed successes and n the number of observed failures. What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter p. So one can compute probabilities not only for an experimental outcome but also for the parameter which governs it, and the same algebra is used to make inferences of either kind. Bayes states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball.
Stephen Fienberg describes the evolution of the field from "inverse probability" at the time of Bayes and Laplace, and even of Harold Jeffreys (1939), to "Bayesian" in the 1950s. The irony is that this label was introduced by R. A. Fisher in a derogatory sense. So, historically, Bayes was not a "Bayesian". It is actually unclear whether or not he was a Bayesian in the modern sense of the term, i.e. whether he was interested in inference or merely in probability: the 1763 essay is more of a probability paper.
See also
- Bayesian inference
- Bayesian network
- Bayesian spam filtering
- Thomas Bayes
- Bogofilter
- Conjugate prior
- Empirical Bayes method
- Monty Hall problem
- Occam's razor
- Raven paradox
- Recursive Bayesian estimation
- Sequential Bayesian filtering
- Borel's paradox
- Naive Bayes classifier
References
Versions of the essay
- Thomas Bayes (1763), "An Essay towards solving a Problem in the Doctrine of Chances. By the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S.", Philosophical Transactions, Giving Some Account of the Present Undertakings, Studies and Labours of the Ingenious in Many Considerable Parts of the World 53:370–418.
- Thomas Bayes (1763/1958) "Studies in the History of Probability and Statistics: IX. Thomas Bayes' Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:296–315. (Bayes' essay in modernized notation)
- Thomas Bayes "An essay towards solving a Problem in the Doctrine of Chances". (Bayes' essay in the original notation)
Commentaries
- G. A. Barnard (1958) "Studies in the History of Probability and Statistics: IX. Thomas Bayes' Essay Towards Solving a Problem in the Doctrine of Chances", Biometrika 45:293–295. (biographical remarks)
- Daniel Covarrubias. "An Essay Towards Solving a Problem in the Doctrine of Chances". (an outline and exposition of Bayes' essay)
- Stephen M. Stigler (1982). "Thomas Bayes' Bayesian Inference," Journal of the Royal Statistical Society, Series A, 145:250–258. (Stigler argues for a revised interpretation of the essay; recommended)
- Isaac Todhunter (1865). A History of the Mathematical Theory of Probability from the time of Pascal to that of Laplace, Macmillan. Reprinted 1949, 1956 by Chelsea and 2001 by Thoemmes.
Additional material
- Pierre-Simon Laplace (1774/1986), "Memoir on the Probability of the Causes of Events", Statistical Science 1(3):364–378.
- Stephen M. Stigler (1986), "Laplace's 1774 memoir on inverse probability", Statistical Science 1(3):359–378.
- Stephen M. Stigler (1983), "Who Discovered Bayes' Theorem?" The American Statistician 37(4):290–296.
- Jeff Miller, et al., Earliest Known Uses of Some of the Words of Mathematics (B). (very informative; recommended)
- Athanasios Papoulis (1984), Probability, Random Variables, and Stochastic Processes, second edition. New York: McGraw-Hill.
- James Joyce (2003), "Bayes' Theorem", Stanford Encyclopedia of Philosophy.
- The online textbook Information Theory, Inference, and Learning Algorithms, by David J. C. MacKay, provides an up-to-date overview of the use of Bayes' theorem in information theory and machine learning.
- Stanford Encyclopedia of Philosophy: Bayes' Theorem provides a comprehensive introduction to Bayes' theorem.
- Stanford Encyclopedia of Philosophy: Inductive Logic provides a comprehensive Bayesian treatment of Inductive Logic and Confirmation Theory.
- Eric W. Weisstein, Bayes' Theorem at MathWorld.
- Bayes' theorem at PlanetMath.
- Eliezer S. Yudkowsky (2003), "An Intuitive Explanation of Bayesian Reasoning"
- A tutorial on probability and Bayes’ theorem devised for Oxford University psychology students
- Confirmation Theory An extensive presentation of Bayesian Confirmation Theory