Bayesian inference

Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process. Bayes' theorem was derived from the work of the Reverend Thomas Bayes.

Evidence and changing beliefs

Bayesian inference uses aspects of the scientific method, which involves collecting evidence that is meant to be consistent or inconsistent with a given hypothesis. As evidence accumulates, the degree of belief in a hypothesis changes. With enough evidence, it will often become very high or very low. Thus, proponents of Bayesian inference say that it can be used to discriminate between conflicting hypotheses: hypotheses with a very high degree of belief should be accepted as true and those with a very low degree of belief should be rejected as false. However, detractors say that this inference method may be biased due to initial beliefs that one needs to hold before any evidence is ever collected.

An example of Bayesian inference is:
For billions of years, the sun has risen after it has set. The sun has set tonight. With very high probability (or 'I strongly believe that' or 'it is true that') the sun will rise tomorrow. With very low probability (or 'I do not at all believe that' or 'it is false that') the sun will not rise tomorrow.

Bayesian inference uses a numerical estimate of the degree of belief in a hypothesis before evidence has been observed and calculates a numerical estimate of the degree of belief in the hypothesis after evidence has been observed. Bayesian inference usually relies on degrees of belief, or subjective probabilities, in the induction process and does not necessarily claim to provide an objective method of induction. Nonetheless, some Bayesian statisticians believe probabilities can have an objective value and therefore Bayesian inference can provide an objective method of induction. See scientific method.


Bayes' theorem adjusts probabilities given new evidence in the following way:

\[ P(H_0 \mid E) = \frac{P(E \mid H_0)\,P(H_0)}{P(E)} \]

where

  • H0 represents a hypothesis, called a null hypothesis, that was inferred before new evidence, E, became available.
  • P(H0) is called the prior probability of H0.
  • P(E | H0) is called the conditional probability of seeing the evidence E given that the hypothesis H0 is true. It is also called the likelihood function when it is expressed as a function of H0 given E.
  • P(E) is called the marginal probability of E: the probability of witnessing the new evidence E under all mutually exclusive hypotheses. It can be calculated as the sum, over all mutually exclusive hypotheses, of the product of each hypothesis's probability and the corresponding conditional probability: P(E) = Σ_i P(E | H_i) P(H_i).
  • P(H0 | E) is called the posterior probability of H0 given E.

The factor P(E | H0) / P(E) represents the impact that the evidence has on the belief in the hypothesis. If the evidence is likely to be observed when the hypothesis under consideration is true, but unlikely to be observed overall, then this factor will be large. Multiplying the prior probability of the hypothesis by this factor then results in a larger posterior probability of the hypothesis given the evidence. Under Bayesian inference, Bayes' theorem therefore measures how much new evidence should alter a belief in a hypothesis.


Bayesian statisticians argue that even when people have very different prior subjective probabilities, new evidence from repeated observations will tend to bring their posterior subjective probabilities closer together. However, others argue that when people hold widely different prior subjective probabilities their posterior subjective probabilities may never converge even with repeated collection of evidence. These critics argue that worldviews which are completely different initially can remain completely different over time despite a large accumulation of evidence.


Multiplying the prior probability P(H0) by the factor P(E | H0) / P(E) will never yield a probability that is greater than 1. P(E) is at least as great as P(E ∩ H0), which equals P(E | H0) · P(H0) (see joint probability), and replacing P(E) with P(E ∩ H0) in the factor P(E | H0) / P(E) would yield a posterior probability of exactly 1. Therefore, the posterior probability could exceed 1 only if P(E) were less than P(E ∩ H0), which is never true.


The probability of E given H0, P(E | H0), can be represented as a function of its second argument with its first argument held at a given value. Such a function is called a likelihood function; it is a function of H0 given E. A ratio of two likelihood functions is called a likelihood ratio, Λ. For example,

\[ \Lambda = \frac{L(H_0 \mid E)}{L(\mathrm{not}\,H_0 \mid E)} = \frac{P(E \mid H_0)}{P(E \mid \mathrm{not}\,H_0)} \]

The marginal probability, P(E), can also be represented as the sum of the product of all probabilities of mutually exclusive hypotheses and corresponding conditional probabilities: P(E | H0) P(H0) + P(E | not H0) P(not H0).


As a result, we can rewrite Bayes' theorem as

\[ P(H_0 \mid E) = \frac{P(E \mid H_0)\,P(H_0)}{P(E \mid H_0)\,P(H_0) + P(E \mid \mathrm{not}\,H_0)\,P(\mathrm{not}\,H_0)} = \frac{\Lambda\,P(H_0)}{\Lambda\,P(H_0) + P(\mathrm{not}\,H_0)} \]
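The two forms of Bayes' theorem for a hypothesis and its negation can be checked against each other numerically. The following is a minimal Python sketch; the function names and the input values (a prior of 0.2 and conditional probabilities 0.9 and 0.3) are illustrative assumptions, not figures from the article.

```python
def posterior_expanded(prior, p_e_given_h, p_e_given_not_h):
    """P(H0 | E) with P(E) expanded over H0 and not H0."""
    numerator = p_e_given_h * prior
    marginal = numerator + p_e_given_not_h * (1.0 - prior)  # P(E)
    return numerator / marginal


def posterior_from_ratio(prior, likelihood_ratio):
    """P(H0 | E) written in terms of the likelihood ratio Lambda."""
    return likelihood_ratio * prior / (likelihood_ratio * prior + (1.0 - prior))


# Hypothetical inputs chosen only for illustration.
prior = 0.2
p_e_given_h, p_e_given_not_h = 0.9, 0.3
lam = p_e_given_h / p_e_given_not_h

via_expansion = posterior_expanded(prior, p_e_given_h, p_e_given_not_h)
via_ratio = posterior_from_ratio(prior, lam)
print(via_expansion, via_ratio)
```

Both calls return the same value (3/7 for these inputs, up to floating-point rounding), because dividing the numerator and denominator of the expanded form by P(E | not H0) gives the likelihood-ratio form.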

With two independent pieces of evidence E1 and E2, Bayesian inference can be applied iteratively. We could use the first piece of evidence to calculate an initial posterior probability, and then use that posterior probability as a new prior probability to calculate a second posterior probability given the second piece of evidence.


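Independence of the evidence, conditional on each hypothesis, implies that

\[ P(E_1, E_2 \mid H_0) = P(E_1 \mid H_0) \times P(E_2 \mid H_0) \]
\[ P(E_1, E_2 \mid \mathrm{not}\,H_0) = P(E_1 \mid \mathrm{not}\,H_0) \times P(E_2 \mid \mathrm{not}\,H_0) \]

so that the marginal probability of the combined evidence is

\[ P(E_1, E_2) = P(E_1 \mid H_0)\,P(E_2 \mid H_0)\,P(H_0) + P(E_1 \mid \mathrm{not}\,H_0)\,P(E_2 \mid \mathrm{not}\,H_0)\,P(\mathrm{not}\,H_0) \]

In general P(E_1, E_2) is not equal to P(E_1) × P(E_2), because both pieces of evidence bear on the same hypothesis.

Bayes' theorem applied iteratively implies

\[ P(H_0 \mid E_1, E_2) = \frac{P(E_1 \mid H_0) \times P(E_2 \mid H_0)\,P(H_0)}{P(E_1, E_2)} \]

Using likelihood ratios, we find that

\[ P(H_0 \mid E_1, E_2) = \frac{\Lambda_1 \Lambda_2\,P(H_0)}{\Lambda_1 \Lambda_2\,P(H_0) + P(\mathrm{not}\,H_0)} \]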

This iteration of Bayesian inference could be extended with more independent pieces of evidence.
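The iterative procedure can be sketched in Python as follows. The sketch assumes that the pieces of evidence are conditionally independent given H0 and given not H0; the function names and the numbers are hypothetical. Each posterior is fed back in as the prior for the next piece of evidence, and the result is compared with a single update that uses the product of the likelihood ratios.

```python
def update(prior, p_e_given_h, p_e_given_not_h):
    """One application of Bayes' theorem for H0 versus not H0."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1.0 - prior))


def sequential_posterior(prior, evidence):
    """Apply update() once per piece of evidence, reusing each posterior as the next prior."""
    belief = prior
    for p_e_given_h, p_e_given_not_h in evidence:
        belief = update(belief, p_e_given_h, p_e_given_not_h)
    return belief


# Hypothetical values: (P(E_i | H0), P(E_i | not H0)) for two pieces of evidence.
prior = 0.2
evidence = [(0.9, 0.3), (0.8, 0.4)]

sequential = sequential_posterior(prior, evidence)

# Single update using the product of the likelihood ratios.
combined_ratio = 1.0
for p_e_given_h, p_e_given_not_h in evidence:
    combined_ratio *= p_e_given_h / p_e_given_not_h
one_shot = combined_ratio * prior / (combined_ratio * prior + (1.0 - prior))

print(sequential, one_shot)
```

For these inputs both routes give a posterior of 0.6 up to floating-point rounding, and the order in which the evidence is processed does not change the result.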


Bayesian inference is used to calculate probabilities for decision making under uncertainty. In addition to probabilities, a loss function should be calculated in order to reflect the consequences of making an error. Probabilities represent the chance or belief of being wrong. A loss function represents the consequences of being wrong.


Simple examples of Bayesian inference

From which bowl is the cookie?

To illustrate, suppose there are two full bowls of cookies. Bowl #1 has 10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each. Our friend Fred picks a bowl at random, and then picks a cookie at random. We may assume there is no reason to believe Fred treats one bowl differently from another, likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Fred picked it out of bowl #1?


Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let H1 correspond to bowl #1, and H2 to bowl #2. It is given that the bowls are identical from Fred's point of view, thus P(H1) = P(H2), and the two must add up to 1, so both are equal to 0.5. The datum D is the observation of a plain cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 = 0.75 and P(D | H2) = 20/40 = 0.5. Bayes' formula then yields

\[ \begin{matrix}
P(H_1 \mid D) &=& \frac{P(H_1) \cdot P(D \mid H_1)}{P(H_1) \cdot P(D \mid H_1) + P(H_2) \cdot P(D \mid H_2)} \\
&=& \frac{0.5 \times 0.75}{0.5 \times 0.75 + 0.5 \times 0.5} \\
&=& 0.6
\end{matrix} \]

Before observing the cookie, the probability that Fred chose bowl #1 is the prior probability, P(H1), which is 0.5. After observing the cookie, we revise the probability to P(H1|D), which is 0.6.


Observing the plain cookie has thus revised the prior probability P(H1) = 0.5 upward to the posterior probability P(H1 | D) = 0.6. This reflects our intuition that the cookie is more likely to have come from bowl #1, since it has a higher ratio of plain to chocolate chip cookies than bowl #2. The conclusion is given as a probability, which differs from the approach of classical statistics.
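The cookie calculation can be reproduced with exact fractions. This is a minimal sketch using Python's standard fractions module; the dictionary layout and the function name are illustrative choices.

```python
from fractions import Fraction

# Bowl contents as given in the example.
bowls = {
    "bowl 1": {"chocolate chip": 10, "plain": 30},
    "bowl 2": {"chocolate chip": 20, "plain": 20},
}

# Fred has no preference between bowls, so the prior is uniform.
prior = {name: Fraction(1, len(bowls)) for name in bowls}


def posterior_given(cookie):
    """Posterior probability of each bowl after observing one cookie of the given kind."""
    joint = {
        name: prior[name] * Fraction(contents[cookie], sum(contents.values()))
        for name, contents in bowls.items()
    }
    total = sum(joint.values())  # marginal probability of the observation
    return {name: value / total for name, value in joint.items()}


after_plain = posterior_given("plain")
print(after_plain["bowl 1"])  # 3/5
```

Calling posterior_given("chocolate chip") instead moves the belief in the opposite direction, to 1/3 for bowl #1.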


False positives in a medical test

False positives result when a test falsely or incorrectly reports a positive result. For example, a medical test for a disease may return a positive result indicating that the patient has a disease even if the patient does not have the disease. We can use Bayes' theorem to determine the probability that a positive result is in fact a false positive. We find that if a disease is rare, then the majority of positive results may be false positives, even if the test is accurate.


Suppose that a test for a disease generates the following results:

  • If a tested patient has the disease, the test returns a positive result 99% of the time, or with probability 0.99
  • If a tested patient does not have the disease, the test returns a negative result 95% of the time, or with probability 0.95.

Suppose also that only 0.1% of the population has that disease, so that a randomly selected patient has a 0.001 prior probability of having the disease.


We can use Bayes' theorem to calculate the probability that a positive test result is a false positive.


Let A represent the condition in which the patient has the disease, and B represent the evidence of a positive test result. Then the probability that the patient actually has the disease given the positive test result is

\[ \begin{matrix}
P(A \mid B) &=& \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \mathrm{not}\,A)\,P(\mathrm{not}\,A)} \\
&=& \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} \\
&\approx& 0.019.
\end{matrix} \]

and hence the probability that a positive result is a false positive is about 1 − 0.019 = 0.981.


Despite the apparent high accuracy of the test, the incidence of the disease is so low that the vast majority of patients who test positive do not have the disease. Nonetheless, the fraction of patients who test positive who have the disease (.019) is 19 times the fraction of people who have not yet taken the test who have the disease (.001). Thus the test is not useless, and re-testing may improve the reliability of the result.


In order to reduce the problem of false positives, a test should be very accurate in reporting a negative result when the patient does not have the disease. If the test reported a negative result in patients without the disease with probability 0.999, then

\[ P(A \mid B) = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.001 \times 0.999} \approx 0.5, \]

so that 1 − 0.5 = 0.5 is now the probability of a false positive.
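The two false-positive calculations above can be sketched in Python. The parameter names sensitivity (probability of a positive result given disease) and specificity (probability of a negative result given no disease) are labels introduced here for convenience; the article gives only the numbers.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(A | B): probability of disease given a positive test result."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Prevalence 0.1%, positive with probability 0.99 when the disease is present,
# and two alternative probabilities of a negative result when it is absent.
for specificity in (0.95, 0.999):
    p = prob_disease_given_positive(0.001, 0.99, specificity)
    print(f"specificity {specificity}: P(disease | positive) = {p:.3f}, "
          f"P(false positive) = {1.0 - p:.3f}")
```

Changing only the specificity argument shows how strongly the false-positive probability depends on how rarely the test errs on healthy patients.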


On the other hand, false negatives result when a test falsely or incorrectly reports a negative result. For example, a medical test for a disease may return a negative result indicating that the patient does not have a disease even though the patient actually has the disease. We can also use Bayes' theorem to calculate the probability of a false negative. In the first example above,

\[ \begin{matrix}
P(A \mid \mathrm{not}\,B) &=& \frac{P(\mathrm{not}\,B \mid A)\,P(A)}{P(\mathrm{not}\,B \mid A)\,P(A) + P(\mathrm{not}\,B \mid \mathrm{not}\,A)\,P(\mathrm{not}\,A)} \\
&=& \frac{0.01 \times 0.001}{0.01 \times 0.001 + 0.95 \times 0.999} \\
&\approx& 0.0000105.
\end{matrix} \]

The probability that a negative result is a false negative is about 0.0000105 or 0.00105%. When a disease is rare, false negatives will not be a major problem with the test.


But if 60% of the population had the disease, then the probability of a false negative would be greater. With the above test, the probability of a false negative would be

\[ \begin{matrix}
P(A \mid \mathrm{not}\,B) &=& \frac{P(\mathrm{not}\,B \mid A)\,P(A)}{P(\mathrm{not}\,B \mid A)\,P(A) + P(\mathrm{not}\,B \mid \mathrm{not}\,A)\,P(\mathrm{not}\,A)} \\
&=& \frac{0.01 \times 0.6}{0.01 \times 0.6 + 0.95 \times 0.4} \\
&\approx& 0.0155.
\end{matrix} \]

The probability that a negative result is a false negative rises to 0.0155 or 1.55%.
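The dependence of the false-negative probability on how common the disease is can be sketched the same way. The function and parameter names below are illustrative assumptions; the numbers are those of the test described above.

```python
def prob_disease_given_negative(prevalence, sensitivity, specificity):
    """P(A | not B): probability of disease despite a negative test result."""
    missed_case = (1.0 - sensitivity) * prevalence
    true_negative = specificity * (1.0 - prevalence)
    return missed_case / (missed_case + true_negative)


# Same test (0.99 and 0.95), evaluated for a rare and a common disease.
for prevalence in (0.001, 0.6):
    p = prob_disease_given_negative(prevalence, 0.99, 0.95)
    print(f"prevalence {prevalence}: P(disease | negative) = {p:.7f}")
```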


In the courtroom

Bayesian inference can be used in a court setting by an individual juror to coherently accumulate the evidence for and against the guilt of the defendant, and to see whether, in totality, it meets their personal threshold for 'beyond a reasonable doubt'.

  • Let G be the event that the defendant is guilty.
  • Let E be the event that the defendant's DNA matches DNA found at the crime scene.
  • Let P(E | G) be the probability of seeing event E assuming that the defendant is guilty. (Usually this would be taken to be unity.)
  • Let P(G | E) be the probability that the defendant is guilty assuming the DNA match event E.
  • Let P(G) be the juror's personal estimate of the probability that the defendant is guilty, based on the evidence other than the DNA match. This could be based on his responses under questioning, or previously presented evidence.

Bayesian inference tells us that if we can assign a probability P(G) to the defendant's guilt before we take the DNA evidence into account, then we can revise this probability to the conditional probability P(G | E), since

\[ P(G \mid E) = \frac{P(G)\,P(E \mid G)}{P(E)} \]

Suppose, on the basis of other evidence, a juror decides that there is a 30% chance that the defendant is guilty. Suppose also that the forensic evidence is that the probability that a person chosen at random would have DNA that matched that at the crime scene was 1 in a million, or 10^-6.


The event E can occur in two ways. Either the defendant is guilty (with prior probability 0.3) and thus his DNA is present with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky enough to be one of the 1 in a million matching people.


Thus the juror could coherently revise his opinion to take into account the DNA evidence as follows:

\[ P(G \mid E) = \frac{0.3 \times 1.0}{0.3 \times 1.0 + 0.7 \times 10^{-6}} \approx 0.99999766667. \]

The benefit of adopting a Bayesian approach is that it gives the juror a formal mechanism for combining the evidence presented. The approach can be applied successively to all the pieces of evidence presented in court, with the posterior from one stage becoming the prior for the next.


The juror would still have to have a prior for the guilt probability before the first piece of evidence is considered. It has been suggested that this could be the guilt probability of a random person of the appropriate sex taken from the town where the crime occurred. Thus, for a crime committed by an adult male in a town containing 50,000 adult males the appropriate initial prior probability might be 1/50,000.


For the purpose of explaining Bayes' theorem to jurors, it will usually be appropriate to give it in the form of betting odds rather than probabilities, as these are more widely understood. In this form Bayes' theorem states that

Posterior odds = prior odds × Bayes factor

In the example above, the juror who has a prior probability of 0.3 for the defendant being guilty would now express that in the form of odds of 3:7 in favour of the defendant being guilty. The Bayes factor is one million, and the resulting posterior odds are 3 million to 7, or about 429,000 to one, in favour of guilt.
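The odds form can be checked against the probability form with exact arithmetic. The following is a minimal Python sketch using the figures of the DNA example; the variable names are illustrative.

```python
from fractions import Fraction

prior_guilt = Fraction(3, 10)            # juror's prior probability of guilt
p_match_if_guilty = Fraction(1)          # P(E | G), taken to be unity
p_match_if_innocent = Fraction(1, 10**6) # random-match probability

prior_odds = prior_guilt / (1 - prior_guilt)              # 3 to 7
bayes_factor = p_match_if_guilty / p_match_if_innocent    # one million
posterior_odds = prior_odds * bayes_factor                # 3,000,000 to 7

# Convert the posterior odds back to a probability.
posterior_prob = posterior_odds / (1 + posterior_odds)

print(prior_odds, bayes_factor, posterior_odds)
print(float(posterior_odds), float(posterior_prob))
```

The posterior odds of 3,000,000/7 correspond to roughly 429,000 to one, and converting them back gives the same posterior probability as the direct calculation of P(G | E).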


In the United Kingdom, Bayes' theorem was explained to the jury in the odds form by a statistician expert witness in the rape case of Regina versus Denis John Adams. A conviction was secured but the case went to Appeal, as no means of accumulating evidence had been provided for those jurors who did not want to use Bayes' theorem. The Court of Appeal upheld the conviction, but also gave their opinion that "To introduce Bayes' Theorem, or any similar method, into a criminal trial plunges the Jury into inappropriate and unnecessary realms of theory and complexity, deflecting them from their proper task." No further appeal was allowed and the issue of Bayesian assessment of forensic DNA data remains controversial.


Gardner-Medwin argues that the criterion on which a verdict in a criminal trial should be based is not the probability of guilt, but rather the probability of the evidence, given that the defendant is innocent. He argues that if the posterior probability of guilt is to be computed by Bayes' theorem, the prior probability of guilt must be known. This will depend on the incidence of the crime and this is an odd piece of evidence to consider in a criminal trial. Consider the following three propositions:


A: The known facts and testimony could have arisen if the defendant is guilty,


B: The known facts and testimony could have arisen if the defendant is innocent,


C: The defendant is guilty.


Gardner-Medwin argues that the jury should believe both A and not-B in order to convict. A and not-B implies the truth of C, but the reverse is not true. It is possible that B and C are both true, but in this case he argues that a jury should acquit, even though they know that they will be letting some guilty people go free.


Other court cases in which probabilistic arguments played some role were the Howland will forgery trial, the Sally Clark case, and the Lucia de Berk case.


Search theory

In May 1968 the US nuclear submarine Scorpion (SSN-589) failed to arrive as expected at her home port of Norfolk, Virginia. The US Navy was convinced that the vessel had been lost off the Eastern seaboard but an extensive search failed to discover the wreck. The US Navy's deep water expert, John Craven USN, believed that it was elsewhere and he organised a search south west of the Azores based on a controversial approximate triangulation by hydrophones. He was allocated only a single ship, the Mizar, and he took advice from a firm of consultant mathematicians in order to maximise his resources. A Bayesian search methodology was adopted. Experienced submarine commanders were interviewed to construct hypotheses about what could have caused the loss of the Scorpion.


The sea area was divided up into grid squares and a probability assigned to each square, under each of the hypotheses, to give a number of probability grids, one for each hypothesis. These were then added together to produce an overall probability grid. The probability attached to each square was then the probability that the wreck was in that square. A second grid was constructed with probabilities that represented the probability of successfully finding the wreck if that square were to be searched and the wreck were to be actually there. This was a known function of water depth. The result of combining this grid with the previous grid is a grid which gives the probability of finding the wreck in each grid square of the sea if it were to be searched.


This sea grid was systematically searched in a manner which started with the high probability regions first and worked down to the low probability regions last. Each time a grid square was searched and found to be empty its probability was reassessed using Bayes' theorem. This then forced the probabilities of all the other grid squares to be reassessed (upwards), also by Bayes' theorem. The use of this approach was a major computational challenge for the time but it was eventually successful and the Scorpion was found in October of that year. Suppose a grid square has a probability p of containing the wreck and that the probability of successfully detecting the wreck if it is there is q. If the square is searched and no wreck is found, then, by Bayes' theorem, the revised probability of the wreck being in the square is given by

\[ p' = \frac{p(1-q)}{(1-p) + p(1-q)}. \]
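The update of a whole grid after one unsuccessful search can be sketched in Python. This is an illustration under stated assumptions, not the method used in 1968: the grid is flattened to a list, the square probabilities are assumed to sum to 1 (the wreck is somewhere in the grid), and the three-square grid with its numbers is hypothetical.

```python
def update_after_failed_search(probs, detect, searched):
    """Revise all square probabilities after searching one square and finding nothing.

    probs[i]  : prior probability that the wreck is in square i (assumed to sum to 1)
    detect[i] : probability of detecting the wreck in square i if it is there
    """
    p, q = probs[searched], detect[searched]
    # Probability of finding nothing in the searched square: (1 - p) + p(1 - q) = 1 - pq.
    nothing_found = 1.0 - p * q
    updated = [r / nothing_found for r in probs]       # other squares are revised upwards
    updated[searched] = p * (1.0 - q) / nothing_found  # searched square is revised downwards
    return updated


# Hypothetical three-square grid.
probs = [0.5, 0.3, 0.2]
detect = [0.8, 0.6, 0.9]

# Search first where the chance of actually finding the wreck is highest.
first = max(range(len(probs)), key=lambda i: probs[i] * detect[i])
revised = update_after_failed_search(probs, detect, first)
print(first, revised)
```

The revised probabilities still sum to 1: the searched square drops from 0.5 to about 0.167, while the other two rise to 0.5 and about 0.333.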

More mathematical examples

Naive Bayes classifier

See naive Bayes classifier.


Posterior distribution of the binomial parameter

In this example we consider the computation of the posterior distribution for the binomial parameter. This is the same problem considered by Bayes in Proposition 9 of his essay.


We are given m observed successes and n observed failures in a binomial experiment. The experiment may be tossing a coin, drawing a ball from an urn, or asking someone their opinion, among many other possibilities. What we know about the parameter (let's call it a) is stated as the prior distribution, p(a).


For a given value of a, the probability of m successes in m+n trials is

\[ p(m,n \mid a) = \begin{pmatrix} n+m \\ m \end{pmatrix} a^m (1-a)^n. \]

Since m and n are fixed, and a is unknown, this is a likelihood function for a. From Bayes' theorem, with the continuous form of the law of total probability in the denominator, we have

\[ p(a \mid m,n) = \frac{p(m,n \mid a)\,p(a)}{\int_0^1 p(m,n \mid a)\,p(a)\,da} = \frac{\begin{pmatrix} n+m \\ m \end{pmatrix} a^m (1-a)^n\,p(a)}{\int_0^1 \begin{pmatrix} n+m \\ m \end{pmatrix} a^m (1-a)^n\,p(a)\,da}. \]

For some special choices of the prior distribution p(a), the integral can be solved and the posterior takes a convenient form. In particular, if p(a) is a beta distribution with parameters m0 and n0, then the posterior is also a beta distribution with parameters m+m0 and n+n0.


A conjugate prior is a prior distribution, such as the beta distribution in the above example, which has the property that the posterior is the same type of distribution.
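The conjugate update can be compared with a direct numerical evaluation of the posterior formula. The sketch below uses only Python's standard math module; the counts (7 successes, 3 failures), the prior parameters (2, 2), the midpoint-rule integration and all function names are assumptions made for illustration. The binomial coefficient is omitted in the numerical version because it cancels between numerator and denominator.

```python
import math


def beta_pdf(a, alpha, beta):
    """Density of the beta distribution with parameters alpha and beta at 0 < a < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(a) + (beta - 1) * math.log(1 - a))


def conjugate_posterior(m, n, m0, n0):
    """Parameters of the beta posterior after m successes and n failures."""
    return m + m0, n + n0


def numeric_posterior(a, m, n, m0, n0, steps=20000):
    """Posterior density at a, normalised by midpoint-rule integration over (0, 1)."""
    def unnormalised(x):
        return x**m * (1 - x)**n * beta_pdf(x, m0, n0)

    h = 1.0 / steps
    norm = sum(unnormalised((k + 0.5) * h) for k in range(steps)) * h
    return unnormalised(a) / norm


m, n = 7, 3      # hypothetical observed successes and failures
m0, n0 = 2, 2    # hypothetical prior parameters

alpha_post, beta_post = conjugate_posterior(m, n, m0, n0)
closed_form = beta_pdf(0.6, alpha_post, beta_post)
numerical = numeric_posterior(0.6, m, n, m0, n0)
print(alpha_post, beta_post, closed_form, numerical)
```

The closed-form density and the numerically normalised density agree closely at the evaluation point, which is the practical benefit of a conjugate prior: no integration is needed.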


What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the parameter a. That is, not only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra is used to make inferences of either kind. Interestingly, Bayes actually states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball. By making the binomial parameter a depend on a random event, he cleverly escapes a philosophical quagmire of which he was most likely not even aware.


Computer applications

Bayesian inference has applications in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerized pattern recognition techniques since the late 1950s. There is also an ever-growing connection between Bayesian methods and simulation-based Monte Carlo techniques: complex models cannot be processed in closed form by a Bayesian analysis, while the graphical model structure inherent to statistical models may allow for efficient simulation algorithms such as Gibbs sampling and other Metropolis-Hastings schemes. Recently Bayesian inference has gained popularity amongst the phylogenetics community for these reasons; applications such as BEAST and MrBayes allow many demographic and evolutionary parameters to be estimated simultaneously.


As applied to statistical classification, Bayesian inference has been used in recent years to develop algorithms for identifying unsolicited bulk e-mail spam. Applications which make use of Bayesian inference for spam filtering include DSPAM, Bogofilter, SpamAssassin, InBoxer, and Mozilla. Spam classification is treated in more detail in the article on the naive Bayes classifier.


In some applications fuzzy logic is an alternative to Bayesian inference. Fuzzy logic and Bayesian inference, however, are mathematically and semantically not compatible: one cannot, in general, interpret the degree of truth in fuzzy logic as a probability, or vice versa.


References

  • On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has chapters on Bayesian methods, including examples; arguments in favour of Bayesian methods (in the style of Edwin Jaynes); modern Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the connections between Bayesian inference and data compression.
  • Berger, J.O. (1999) Statistical Decision Theory and Bayesian Analysis. Second Edition. Springer Verlag, New York. ISBN 0-387-96098-8 and also ISBN 3-540-96098-8.
  • Bolstad, William M. (2004) Introduction to Bayesian Statistics, John Wiley ISBN 0-471-27020-2
  • Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation in Lecture Notes in Statistics, 48, Springer-Verlag, New York, New York
  • Dawid, A.P. and Mortera, J. (1996) Coherent analysis of forensic identification evidence. Journal of the Royal Statistical Society, Series B, 58, 425-443.
  • Foreman, L.A; Smith, A.F.M. and Evett, I.W. (1997). Bayesian analysis of deoxyribonucleic acid profiling data in forensic identification applications (with discussion). Journal of the Royal Statistical Society, Series A, 160, 429-469.
  • Gardner-Medwin, A. (2005) What probability should the jury address? Significance, Volume 2, Issue 1, March 2005.
  • Gelman, A., Carlin, J.B., Stern, H., and Rubin, D.B. (2003). Bayesian Data Analysis. Second Edition. Chapman & Hall/CRC, Boca Raton, Florida. ISBN 1-58488-388-X.
  • Gelman, A. and Meng, X.L. (2004). Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: an essential journey with Donald Rubin's statistical family. John Wiley & Sons, Chichester, UK. ISBN 0-470-09043-X
  • Giffin, A. and Caticha, A. (2007) Updating Probabilities with Data and Moments
  • Jaynes, E.T. (1998) Probability Theory: The Logic of Science.
  • Lee, Peter M. (1997) Bayesian Statistics: An Introduction. Second Edition. ISBN 0-340-67785-6.
  • O'Hagan, A. and Forster, J. (2003) Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. Arnold, New York. ISBN 0-340-52922-9.
  • Pearl, J. (1988) Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.
  • Robert, C.P. (2001) The Bayesian Choice. Springer Verlag, New York.
  • Robertson, B. and Vignaux, G.A. (1995) Interpreting Evidence: Evaluating Forensic Science in the Courtroom. John Wiley and Sons. Chichester.
  • Winkler, Robert L, Introduction to Bayesian Inference and Decision, 2nd Edition (2003) Probabilistic. ISBN 0-9647938-4-9


See also

  • Bayesian probability
  • Bayesian filtering
  • Bayesian network
  • Bayes factor
  • Empirical Bayes method
  • Influence diagram
  • Occam's razor
  • Cromwell's rule
  • Minimum message length
  • Minimum description length
  • Kriging
  • Maximum entropy thermodynamics
  • The Doctrine of Chances
  • The Wisdom of Crowds
  • Raven paradox

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.