FACTOID # 153: In all the countries surveyed, women do more housework than men.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

SEARCH ALL

FACTS & STATISTICS    Advanced view

Search encyclopedia, statistics and forums:

 

 

(* = Graphable)

 

 


Encyclopedia > Birthday problem

The birthday paradox states that if there are 23 people in a room then there is a slightly more than 50:50 chance that at least two of them will have the same birthday. For 60 or more people, the probability is greater than 99%. This is not a paradox in the sense of leading to a logical contradiction; it is a paradox in the sense that it is a mathematical truth that contradicts common intuition. Most people estimate that the chance is much lower than 50:50. Calculating this probability (and related ones) is the birthday problem. The mathematics behind it has been used to devise a well-known cryptographical attack named the birthday attack. A birthday is the date on which a person was born. ... For other meanings of Paradox, see Paradox (disambiguation). ... Logic (from ancient Greek λόγος (logos), meaning reason) is the study of arguments. ... Intuition has many meanings across many cultures, including: quick and ready insight seemingly independent of previous experiences and empirical knowledge immediate apprehension or cognition knowledge or conviction gained by intuition the power or faculty of attaining to direct knowledge or cognition without evident rational thought and inference. ... The word probability derives from the Latin probare (to prove, or to test). ... A birthday attack is a type of cryptographic attack which exploits the mathematics behind the birthday paradox, making use of a space_time tradeoff. ...

Contents

Understanding the paradox

The key to understanding the birthday paradox is to realize that there are many possible pairs of people whose birthdays could match. Specifically, among 23 people, there are 23 × 22/2 = 253 pairs, each of which being a potential candidate for a match. Looked at it this way, it doesn't seem that unlikely that one of these 253 pairs yields a match.


To emphasize the point, consider a different scenario: if you enter a room with 22 people, the chance that somebody there has the same birthday as you is not 50:50, but much lower. This is because now there are only 22 possible pairs that could yield a match. The actual birthday problem is asking if any of the 23 people have a matching birthday with any of the others.


Estimating the probability

To compute the approximate probability that in a room of n people, at least two have the same birthday, we disregard distribution variations, such as leap years, twins, seasonal or weekday variations, and assume that 365 possible birthdays are equally likely. Real-life birthday distributions are not uniform since not all dates are equally likely. A leap year (or intercalary year) is a year containing an extra day or month in order to keep the calendar year in sync with an astronomical or seasonal year. ... Fraternal twin boys in the tub The term twin most notably refers to two individuals (or one of two individuals) who have shared the same uterus (womb) and usually, but not necessarily, born on the same day. ...


The trick is to first calculate the probability p that the n birthdays are different. Assuming n ≤ 365, this probability is given by

A graph showing the probability of at least two people sharing a birthday amongst a certain number of people.

because the second person cannot have the same birthday as the first (364/365), the third cannot have the same birthday as the first two (363/365), etc. Using factorial notation, this expression can also be written as Download high resolution version (871x534, 5 KB)The Birthday paradox. ... Download high resolution version (871x534, 5 KB)The Birthday paradox. ... In mathematics, the factorial of a natural number n is the product of the positive integers less than or equal to n. ...

The probability p(n) that at least two persons of the n have the same birthday is then given by

for n ≤ 365, and 1 for n > 365.


For n = 23, one obtains a probability of about 0.507. Some of these probabilities, numerically computed using the formula above, are shown here:

n p(n)
10 12%
20 41%
30 70%
50 97%
100 99.99996%
200 99.9999999999999999999999999998%
300 1 − (7 × 10−73)
350 1 − (3 × 10−131)
≥366 100%
Enlarge
Comparing p(n) = probability of a birthday match with q(n) = probability of matching your birthday

Note that neither of the two people is chosen in advance: by way of contrast, the probability q(n) that someone in a room of n other people has the same birthday as a particular person (for example, you) is given by its pd generated with home. ... its pd generated with home. ...

which for n = 22 gives only about 0.059, or slightly better than 1 chance in 17. For a greater than 50:50 chance that one person in a roomful of n people has the same birthday as you, n would need to be at least 253 . Note that this number is significantly higher than 365/2 = 182.5: the reason is that there are likely some birthday matches among the people in the room.


A mathematical (as opposed to numerical) view

In his autobiography, Paul Halmos deplored the fact that the birthday paradox is often presented in terms of mere numerical computation rather than conceptual mathematics. The argument given below is adapted from Halmos' argument, which contained a small error. The product Paul Halmos Paul Richard Halmos (born March 3, 1916) is a Hungarian-born American mathematician who has done research in the fields of probability theory, statistics, operator theory, ergodic theory, and functional analysis (in particular Hilbert spaces). ...

equals 1 − p(n), and we are therefore interested in the first n such that the product is smaller than 1/2. We have

because of the inequality of arithmetic and geometric means. Therefore: In mathematics, the inequality of arithmetic and geometric means, or more briefly the AM-GM inequality, states that the arithmetic mean of a list of non-negative real numbers is greater than or equal to the geometric mean of the same list; and further, that the two means are equal...

(Here we first used the fact that the sum over all integers from 1 to n − 1 equals n(n − 1)/2, and then the inequality 1 − x < e−x.)


The value of the last expression is less than 1/2 if and only if

where "loge" means the natural logarithm. This falls just barely short of 506, and as luck would have it, 506 is one of the values of n2 − n, attained when n = 23. The natural logarithm is the logarithm to the base e, where e is approximately equal to 2. ...


Of this argument, Halmos wrote:

"The reasoning is based on important tools that all students of mathematics should have ready access to. The birthday problem used to be a splendid illustration of the advantages of pure thought over mechanical manipulation; the inequalities can be obtained in a minute or two, whereas the multiplications would take much longer, and be much more subject to error, whether the instrument is a pencil or an old-fashioned desk computer. What calculators do not yield is understanding, or mathematical facility, or a solid basis for more advanced, generalized theories."

However, Halmos' derivation only shows that at most 23 people are needed to ensure a birthday match with even chance; since we don't know how sharp the given inequalities are, the argument leaves open the possibility that n = 22 could also work. Calculator - Wikipedia /**/ @import /skins/monobook/IE50Fixes. ...


Generalization and approximation

The birthday paradox may be generalized: there are n persons; each one of them randomly chooses a number between 1 and fixed N (where N may be, e.g., 365 or any other integer > 0).


What is the probability p(n) that at least two persons choose the same number?


An approximation to the answer is given by

Results for N = 365

image:050329-birthday2.png its pd File links The following pages link to this file: Birthday paradox ...


Reverse problem

An alternate question may be:

For fixed probability p ...
... find the greatest n(p) for which the probability p(n) is smaller than the given p, or
... find the smallest n(p) for which the probability p(n) is greater than the given p.

An approximation for this is given by:

Example

approximation computation for N := 365
p n generalized n for N := 365 n p(n↓) n p(n↑)
0.01
0.14178 √N
2.70864
2 0.00274 3
0.00820
0.05 0.32029 √N 6.11916 6 0.04046 7 0.05624
0.1
0.45904 √N
8.77002
8 0.07434 9
0.09462
0.2
0.66805 √N
12.76302
12 0.16702 13
0.19441
0.3 0.84460 √N 16.13607 16 0.28360 17 0.31501
0.5 1.17741 √N 22.49439 22 0.47570 23 0.50730
0.7 1.55176 √N 29.64625 29 0.68097 30 0.70632
0.8 1.79412 √N 34.27666 34 0.79532 35 0.81438
0.9 2.14597 √N 40.99862 40 0.89123 41 0.90315
0.95 2.44775 √N 46.76414 46 0.94825 47 0.95477
0.99
3.03485 √N
57.98081
57
0.99012
58 0.99166

Note: some values are coloured showing that the approximation is not always exact.


Empirical test

The birthday paradox can be simulated empirically using computer code.


The following command can be used with the Ruby programming language. Ruby is an object-oriented programming language. ...

 ruby -e 'puts (1..30).collect {rand(365)+1}.uniq.length' 

where "30" is the number of people in the group and "365" is the number of days in a year. If the output is the same as the number of people in the group, there were no repeating birthdays. If it is not the same, then there were repeating birthdays.


This Bash script does the same thing: This article is about the UNIX shell named Bash. ...

 for i in `seq 1 30`; do echo $(($RANDOM%365+1)); done | sort -u | wc -l 

The following code is in Perl. It returns numbers found to be repeated on series generated. It is not unusual to get repetition for the 23 sample random values generated. Programming Republic of Perl logo Perl, also Practical Extraction and Report Language (a backronym, see below), is an interpreted procedural programming language designed by Larry Wall. ...

 perl -e 'for (1..23) {$h{int(rand(365)+1)}++}; for (keys %h) {print $_, ": ", $h{$_}, " timesn" if $h{$_} > 1}' 

This Objective CAML script prints the point at which the probability of finding shared birthdays exceeds 50%. For the original problem as framed, it prints the answer 23: Objective Caml, OCaml for short, is an advanced programming language that is part of the ML family. ... This article or section should be merged with script programming language In computer applications, a script, roughly speaking, is a computer program that automates the sort of task that a user might otherwise do interactively at the keyboard. ... The word probability derives from the Latin probare (to prove, or to test). ...

 #!/usr/bin/ocamlrun /usr/bin/ocaml let size = 365.;; let rec loop p i = let p' = (size -. (float (i-1))) *. p /. size in if p' < 0.5 then Printf.printf "answer = %dn" i else loop p' (i+1);; loop 1.0 2 

This script in the Python programming language prints the number of people and the probability that at least two of them sharing the same birthday, stopping when the probability exceeds 0.5: This article or section should be merged with script programming language In computer applications, a script, roughly speaking, is a computer program that automates the sort of task that a user might otherwise do interactively at the keyboard. ... Python is an interpreted, interactive programming language created by Guido van Rossum in 1990, originally as a scripting language for Amoeba OS capable of making system calls. ...

 days = 365 num_people = 1 prob = 0.0 print "Number of people: %d; Prob. of same birthday = %f" % (num_people,prob) while prob < 0.5: num_people = num_people + 1 prob = 1 - (1-prob)*(days-(num_people-1))/days print "Number of people: %d; Prob. of same birthday = %f" % (num_people,prob) 

Applications

The birthday paradox in its more generic sense applies to hash functions: the number of N-bit hashes that can be generated before probably getting a collision is not 2N, but rather only 2N/2. This is exploited by birthday attacks on cryptographic hash functions. A hash function is a function that converts an input from a (typically) large domain into an output in a (typically) smaller range (the hash value, often a subset of the integers). ... A bit (abbreviated b) is the most basic information unit used in computing and information theory. ... A birthday attack is a type of cryptographic attack which exploits the mathematics behind the birthday paradox, making use of a space_time tradeoff. ... In cryptography, a cryptographic hash function is a hash function with certain additional security properties to make it suitable for use as a primitive in various information security applications, such as authentication and message integrity. ...


The theory behind the birthday problem was used in [Schnabel 1938] under the name of capture-recapture statistics to estimate the size of fish population in lakes.


Unequal probabilities

As mentioned above, real-world birthday data are not equally distributed. The birthday problem for such non-constant birthday probabilities was tackled in [Klamkin 1967].


References

  • Zoe Emily Schnabel: "The estimation of the total fish population of a lake", American Mathematical Monthly 45 (1938), pages 348-352
  • M. Klamkin and D. Newman: "Extensions of the birthday surprise", Journal of Combinatorial Theory 3 (1967), pages 279-282.

The American Mathematical Monthly is a mathematical journal published 10 times each year by the Mathematical Association of America since 1894. ...

External links



 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.