Box plot
Figure 1. Box plot of data from the Michelson-Morley Experiment

In descriptive statistics, a boxplot (also known as a box-and-whisker diagram or box-and-whisker plot) is a convenient way of graphically depicting the five-number summary of a data set: the smallest observation, lower quartile (Q1), median, upper quartile (Q3), and largest observation. The boxplot also indicates which observations, if any, are considered unusual, or outliers. It was introduced in 1977 by the American statistician John Tukey.


Boxplots can display differences between populations without making any assumptions about the underlying statistical distribution. The spacings between the different parts of the box help indicate the spread (variance) and skewness of the data, and help identify outliers. Boxplots can be drawn either horizontally or vertically.


Construction

For a data set, one constructs a horizontal box plot in the following manner:

  • Calculate the first quartile (x.25), the median (x.50) and third quartile (x.75)
  • Calculate the interquartile range (IQR) by subtracting the first quartile from the third quartile: IQR = x.75 − x.25.
  • Construct a box above the number line bounded on the left by the first quartile (x.25) and on the right by the third quartile (x.75). The box may be as tall as one likes, although reasonably proportioned boxplots are customary.
  • Indicate where the median lies inside of the box with the presence of a symbol or a line dividing the box at the median value.
  • The mean value of the data can also be labeled with a point.
  • Any observation that lies more than 1.5*IQR below the first quartile or more than 1.5*IQR above the third quartile is considered an outlier. Mark the smallest value that is not an outlier with a vertical tick mark or "whisker", and connect the whisker to the box with a horizontal line. Likewise, mark the largest value that is not an outlier with a "whisker", and connect that whisker to the box with another horizontal line.
  • Indicate outliers by open and closed dots. "Extreme" outliers, those which lie more than 3*IQR below the first quartile or above the third quartile, are indicated by an open dot. "Mild" outliers, those observations which lie more than 1.5*IQR from the first or third quartile but are not extreme outliers, are indicated by a closed dot.
  • Add an appropriate label to the number line and title the boxplot.
  • A boxplot may be constructed in a similar manner vertically as opposed to horizontally by merely interchanging "bottom" for "left" and "top" for "right" in the above description.
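The steps above can be sketched in code. The following is a minimal illustration, not a reference implementation: the function name `boxplot_stats`, the sample data, and the choice of the "inclusive" quartile convention are all assumptions made for this sketch, and other quartile conventions can give slightly different results.

```python
from statistics import quantiles


def boxplot_stats(data):
    """Compute the values needed to draw a Tukey-style boxplot."""
    xs = sorted(data)

    # Assumption: the "inclusive" quartile convention. Other conventions
    # can give slightly different quartiles for the same data.
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1

    # Inner fences separate ordinary values from outliers;
    # outer fences separate "mild" from "extreme" outliers.
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    # Whiskers end at the most extreme observed values inside the inner fences.
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    outliers = [x for x in xs if x < inner_lo or x > inner_hi]
    extreme = [x for x in outliers if x < outer_lo or x > outer_hi]
    mild = [x for x in outliers if outer_lo <= x <= outer_hi]

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_lo": inside[0], "whisker_hi": inside[-1],
        "mild": mild, "extreme": extreme,
    }


# Hypothetical sample, invented for illustration only.
sample = [0.5, 3.5, 5, 7, 7.5, 8, 8.5, 8.5, 9, 9, 9.5, 9.5, 10]
summary = boxplot_stats(sample)
print(summary)
```

With this invented sample the quartiles are 7, 8.5 and 9, so the IQR is 2, the whiskers end at 5 and 10, the value 3.5 is a mild outlier and the value 0.5 is an extreme outlier.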


Example

A plain-text version might look like this:

                             +-----+-+
   o           *     |-------|     | |---|
                             +-----+-+

 +---+---+---+---+---+---+---+---+---+---+   number line
 0   1   2   3   4   5   6   7   8   9   10

For this data set:

  • smallest non-outlier observation = 5 (left "whisker")
  • lower (first) quartile (Q1, x.25) = 7
  • median (second quartile) (Med, x.5) = 8.5
  • upper (third) quartile (Q3, x.75) = 9
  • largest non-outlier observation = 10
  • interquartile range, IQR = Q3 − Q1 = 2
  • the value 3.5 is a "mild" outlier, between 1.5*(IQR) and 3*(IQR) below Q1
  • the value 0.5 is an "extreme" outlier, more than 3*(IQR) below Q1
  • the data is skewed to the left (negatively skewed)

The horizontal lines (the "whiskers") extend to at most 1.5 times the box width (the interquartile range) from either or both ends of the box. They must end at an observed value, thus connecting all the values outside the box that are not more than 1.5 times the box width away from the box. Three times the box width marks the boundary between "mild" and "extreme" outliers. In this boxplot, "mild" and "extreme" outliers are differentiated by closed and open dots, respectively.
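The classification of the two outliers in this example can be checked directly from the quartiles. The short sketch below uses only the values listed above (Q1 = 7, Q3 = 9); the helper name `classify` is hypothetical and exists only for this illustration.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5*IQR and 3*IQR rules described above."""
    iqr = q3 - q1
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild"
    return "not an outlier"


Q1, Q3 = 7, 9  # quartiles from the example; IQR = 2

# The 1.5*IQR boundaries are at 7 - 3 = 4 and 9 + 3 = 12;
# the 3*IQR boundaries are at 7 - 6 = 1 and 9 + 6 = 15.
for value in (0.5, 3.5, 5, 10):
    print(value, classify(value, Q1, Q3))
```

The value 3.5 falls between 1 and 4, so it is a mild outlier; 0.5 lies below 1, so it is extreme; 5 and 10 lie inside the boundaries at 4 and 12, which is why the whiskers can end there.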


There are alternative implementations of this detail of the box plot in various software packages, such as the whiskers extending to at most the 5th and 95th (or some more extreme) percentiles. Such approaches do not conform to Tukey's definition, with its emphasis on the median in particular and counting methods in general, and they tend to produce "outliers" for all data sets larger than ten, no matter what the shape of the distribution.
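The difference between the two whisker rules can be illustrated with a small sketch. The function names, the evenly spaced sample, and the "inclusive" percentile convention are assumptions chosen for this illustration; they are not taken from any particular software package.

```python
from statistics import quantiles


def flagged_by_tukey(xs):
    """Values beyond 1.5*IQR from the quartiles (Tukey's rule)."""
    q1, _, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]


def flagged_by_percentiles(xs, lo=5, hi=95):
    """Values beyond whiskers drawn at the lo-th and hi-th percentiles."""
    cuts = quantiles(xs, n=100, method="inclusive")  # 99 percentile cut points
    p_lo, p_hi = cuts[lo - 1], cuts[hi - 1]
    return [x for x in xs if x < p_lo or x > p_hi]


# Hypothetical, perfectly evenly spaced data with no unusual values.
data = list(range(1, 101))

print(flagged_by_tukey(data))        # []
print(flagged_by_percentiles(data))  # [1, 2, 3, 4, 5, 96, 97, 98, 99, 100]
```

Even though nothing in this sample is unusual, percentile whiskers leave the five smallest and five largest values outside the whiskers, whereas the 1.5*IQR rule flags none of them.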


Visualization

Figure 2. Boxplot and Probability Density Function (pdf) of a Normal N(0, σ²) Population

The boxplot is a quick graphic approach for examining one or more sets of data. Boxplots may seem more primitive than a histogram or probability density function (pdf), but they do have some advantages. Besides saving space on paper, boxplots are quicker to generate by hand. Histograms and fitted probability density functions both depend on choices made by the analyst: binning techniques can heavily influence the shape of a histogram, and an incorrect variance estimate will heavily affect a fitted probability density function.


Because looking at a statistical distribution is more intuitive than looking at a boxplot, comparing the boxplot against the probability density function (theoretical histogram) for a normal N(0, σ²) distribution may be a useful tool for understanding the boxplot (Figure 2).
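The landmarks of such a comparison can be computed for a normal population. The sketch below assumes σ = 1 (for any other σ the positions scale proportionally) and uses Python's standard-library `statistics.NormalDist`; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist

# Assumption: standard normal population, i.e. mean 0 and sigma = 1.
population = NormalDist(mu=0.0, sigma=1.0)

q1 = population.inv_cdf(0.25)  # lower quartile, about -0.6745 sigma
q3 = population.inv_cdf(0.75)  # upper quartile, about +0.6745 sigma
iqr = q3 - q1                  # about 1.349 sigma

lower_fence = q1 - 1.5 * iqr   # about -2.698 sigma
upper_fence = q3 + 1.5 * iqr   # about +2.698 sigma

# Fraction of the population lying beyond the 1.5*IQR boundaries.
outside = population.cdf(lower_fence) + (1.0 - population.cdf(upper_fence))

print(f"quartiles: {q1:.4f}, {q3:.4f}")
print(f"IQR: {iqr:.4f}")
print(f"1.5*IQR boundaries: {lower_fence:.4f}, {upper_fence:.4f}")
print(f"fraction outside: {outside:.4%}")
```

For a normal population the box therefore covers the central 50% between roughly ±0.67σ, the 1.5*IQR boundaries sit at roughly ±2.7σ, and only about 0.7% of the population would be drawn as outliers.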


See also

  • Exploratory data analysis (EDA)

References

  • John W. Tukey. "Exploratory Data Analysis". Addison-Wesley, Reading, MA. 1977.
  • Michael Frigge, David C. Hoaglin and Boris Iglewicz. "Some Implementations of the Boxplot". The American Statistician. Vol. 43 (1), February 1989. 50–54.
  • Yoav Benjamini. "Opening the Box of a Boxplot". The American Statistician. Vol. 42 (4), November 1988. 257–262.
  • Peter J. Rousseeuw, Ida Ruts and John W. Tukey. "The Bagplot: A Bivariate Boxplot". The American Statistician. Vol. 53 (4), November 1999. 382–387.

The Wikipedia article included on this page is licensed under the GFDL.