Histogram
An example histogram of the heights of 31 Black Cherry trees.

In statistics, a histogram is a graphical display of tabulated frequencies. It shows what proportion of cases fall into each of several categories. A histogram differs from a bar chart in that it is the area of the bar that denotes the value, not the height, a crucial distinction when the categories are not of uniform width (Lancaster, 1974). The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent.


The word histogram is derived from Greek: histos 'anything set upright' (as the masts of a ship, the bar of a loom, or the vertical bars of a histogram); gramma 'drawing, record, writing'. The histogram is one of the seven basic tools of quality control, which also include the Pareto chart, check sheet, control chart, cause-and-effect diagram, flowchart, and scatter diagram. Kernel smoothing techniques generalize the histogram: they construct a smooth estimate of the probability density function from the supplied data.

Examples

As an example we consider data collected by the U.S. Census Bureau on time to travel to work (2000 census, [1], Table 5). The census found that there were 124 million people who work outside of their homes. The count for the 30-to-35-minute interval is higher than for the intervals on either side, probably because people round their reported travel time. This rounding is a common phenomenon when collecting data from people.

Histogram of travel time, US 2000 census. Area under the curve equals the total number of cases. This diagram uses Q/width from the table.
Data by absolute numbers

Interval start (min)   Width (min)   Quantity (thousands)   Quantity/width
0                      5             4180                   836
5                      5             13687                  2737
10                     5             18618                  3723
15                     5             19634                  3926
20                     5             17981                  3596
25                     5             7190                   1438
30                     5             16369                  3273
35                     5             3212                   642
40                     5             4122                   824
45                     15            9200                   613
60                     30            6461                   215
90                     60            3435                   57

This histogram shows the number of cases per unit interval, so that the area of each bar equals the number of people in the survey who fall into that category. The area under the curve represents the total number of cases (124 million). This type of histogram shows absolute numbers.
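
The Quantity/width column can be reproduced with a short Python sketch. It assumes, as the 124 million total suggests, that the quantities are in thousands; the variable names are illustrative only.

```python
# Travel-time bins from the table: (start minute, width in minutes, quantity).
# Quantities are assumed to be in thousands of people.
bins = [
    (0, 5, 4180), (5, 5, 13687), (10, 5, 18618), (15, 5, 19634),
    (20, 5, 17981), (25, 5, 7190), (30, 5, 16369), (35, 5, 3212),
    (40, 5, 4122), (45, 15, 9200), (60, 30, 6461), (90, 60, 3435),
]

# Bar height = quantity per unit of interval width (cases per minute).
heights = [quantity / width for _, width, quantity in bins]

# Bar area = height * width, which recovers the quantity in each bin.
areas = [h * width for h, (_, width, _) in zip(heights, bins)]

total = sum(quantity for _, _, quantity in bins)
print(total)        # total quantity across all bins
print(sum(areas))   # total area of the bars, equal to the total up to rounding
```

Dividing by the width is what keeps the wide 45, 60 and 90 minute bins from looking taller than they should next to the 5-minute bins.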

Histogram of travel time, US 2000 census. Area under the curve equals 1. This diagram uses Q/total/width from the table.
Data by proportion

Interval start (min)   Width (min)   Quantity (Q, thousands)   Q/total/width
0                      5             4180                      0.0067
5                      5             13687                     0.0220
10                     5             18618                     0.0300
15                     5             19634                     0.0316
20                     5             17981                     0.0289
25                     5             7190                      0.0115
30                     5             16369                     0.0263
35                     5             3212                      0.0051
40                     5             4122                      0.0066
45                     15            9200                      0.0049
60                     30            6461                      0.0017
90                     60            3435                      0.0004

This histogram differs from the first only in the vertical scale. The area of each bar is the proportion of the total (as a decimal fraction) that each category represents, so the height of each bar is that proportion divided by the bin width, and the total area of all the bars is equal to 1, the decimal equivalent of 100%. The curve displayed is a simple density estimate. This version shows proportions, and is also known as a unit area histogram.
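
A minimal Python sketch of the unit-area scaling, using the widths and quantities from the table (the variable names are illustrative):

```python
# Widths (minutes) and quantities for the twelve travel-time bins in the table.
widths = [5, 5, 5, 5, 5, 5, 5, 5, 5, 15, 30, 60]
quantities = [4180, 13687, 18618, 19634, 17981, 7190,
              16369, 3212, 4122, 9200, 6461, 3435]

total = sum(quantities)

# Height of each bar: proportion of the total per unit of width (Q/total/width).
density = [q / total / w for q, w in zip(quantities, widths)]

# Area of each bar is height * width; the areas sum to 1 up to rounding.
total_area = sum(d * w for d, w in zip(density, widths))
print(total_area)
```

Because only the vertical scale changes, the shape of the histogram is identical to the absolute-number version.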


In other words, a histogram represents a frequency distribution by means of rectangles whose widths represent class intervals and whose areas are proportional to the corresponding frequencies. The rectangles are drawn adjacent to one another because the class intervals are contiguous, which also makes the data easier to compare.


Activities and demonstrations

The SOCR (Statistics Online Computational Resource) pages contain a number of hands-on interactive activities demonstrating the concept of a histogram, as well as histogram construction and manipulation, using Java applets and charts.


Mathematical definition

In a more general mathematical sense, a histogram is a mapping m_i that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram m_i meets the following condition:


\[ n = \sum_{i=1}^{k} m_i. \]
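
The counting can be sketched in Python as follows. The binning convention (half-open intervals, with the last bin closed on the right) and the sample data are assumptions made for illustration; the definition above does not fix them.

```python
import bisect

def histogram_counts(data, edges):
    """Return the counts m_i for bins [edges[i], edges[i+1]).

    The last bin also includes its right edge (an assumed convention).
    """
    k = len(edges) - 1
    counts = [0] * k
    for x in data:
        if x < edges[0] or x > edges[-1]:
            raise ValueError(f"{x} lies outside the bin range")
        if x == edges[-1]:
            i = k - 1                               # right edge goes to the last bin
        else:
            i = bisect.bisect_right(edges, x) - 1   # index of the bin containing x
        counts[i] += 1
    return counts

data = [1, 2, 2, 3, 5, 7, 8, 8, 9, 10]   # made-up observations
edges = [0, 4, 8, 12]                    # k = 3 bins
m = histogram_counts(data, edges)
print(m)                                 # counts per bin
print(sum(m) == len(data))               # the counts add up to n
```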


Cumulative histogram

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram M_i of a histogram m_j is defined as:


\[ M_i = \sum_{j=1}^{i} m_j \]
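
In code, the cumulative histogram is a running sum of the bin counts. A short Python sketch, with made-up counts:

```python
from itertools import accumulate

m = [4, 2, 4]              # illustrative bin counts m_1..m_k
M = list(accumulate(m))    # M_i = m_1 + ... + m_i
print(M)                   # the last entry equals n, the total number of observations
```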


Number of bins and width

There is no "best" number of bins, and different bin sizes can reveal different features of the data. Some theoreticians have attempted to determine an optimal number of bins, but these methods generally make strong assumptions about the shape of the distribution. You should always experiment with bin widths before choosing one (or more) that illustrate the salient features in your data.


The number of bins k can be calculated directly, or from a suggested bin width h:

\[ k = \left\lceil \frac{\max x - \min x}{h} \right\rceil \]

The braces indicate the ceiling function.
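
A small Python sketch of this calculation; the sample values and the bin width h = 3 are made up for illustration.

```python
import math

def bins_from_width(data, h):
    """Number of bins k needed to cover the data range with bins of width h."""
    return math.ceil((max(data) - min(data)) / h)

x = [2.0, 3.5, 7.25, 9.0, 12.0]   # range is 10
k = bins_from_width(x, 3)
print(k)                           # 10 / 3 rounded up
```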

Sturges' formula[1]
\[ k = \lceil \log_2 n + 1 \rceil \]

which implicitly bases the bin sizes on the range of the data, and can perform poorly if n < 30.
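
Sturges' formula is easy to evaluate directly. A minimal Python sketch (the function name is illustrative):

```python
import math

def sturges_bins(n):
    """Number of bins suggested by Sturges' formula for n observations."""
    return math.ceil(math.log2(n) + 1)

print(sturges_bins(31))    # e.g. a sample of 31 observations
print(sturges_bins(100))
```

Note that the result depends only on n, not on how the data are spread.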

Scott's choice[2]
\[ h = \frac{3.5 s}{n^{1/3}} \]

where h is the common bin width, and s is the sample standard deviation.
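
A sketch of Scott's choice in Python, using the standard library's sample standard deviation; the data values are made up for illustration.

```python
import statistics

def scott_width(data):
    """Bin width h = 3.5 * s / n^(1/3), with s the sample standard deviation."""
    n = len(data)
    s = statistics.stdev(data)   # sample (n - 1) standard deviation
    return 3.5 * s / n ** (1 / 3)

x = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative sample, n = 8
h_scott = scott_width(x)
print(h_scott)
```

The resulting width can then be converted to a number of bins with the ceiling formula given earlier.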

Freedman-Diaconis' choice[3]
\[ h = 2 \frac{\operatorname{IQR}(x)}{n^{1/3}} \]

which is based on the interquartile range.
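
A Python sketch of the Freedman-Diaconis choice. Quartile conventions differ between software packages; this version assumes the default ("exclusive") method of `statistics.quantiles`, so other tools may give a slightly different IQR for the same data. The sample values are made up.

```python
import statistics

def freedman_diaconis_width(data):
    """Bin width h = 2 * IQR / n^(1/3)."""
    n = len(data)
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles, default method
    return 2 * (q3 - q1) / n ** (1 / 3)

x = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative sample, n = 8
h_fd = freedman_diaconis_width(x)
print(h_fd)
```

Because it uses the IQR rather than the standard deviation, this width is less affected by a few extreme observations than Scott's choice.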


Continuous data

The idea of a histogram can be generalized to continuous data. Let f \in L^1(\mathbb{R}) (see Lebesgue space), then the cumulative histogram operator H can be defined by:

H(f)(y) = …

With only finitely many intervals of monotony this can be rewritten as

\[ h(f)(y) = \sum_{\xi \in \{x : f(x) = y\}} \frac{1}{|f'(\xi)|}. \]

h(f)(y) is undefined if y is the value of a stationary point.
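
As a worked illustration (the choice of f is ours, not the source's), take f(x) = e^{-x^2}. For 0 < y < 1 the equation f(x) = y has the two solutions \xi = \pm\sqrt{-\ln y}, and f'(\xi) = -2\xi y, so the sum over preimages gives:

```latex
h(f)(y) = \sum_{\xi = \pm\sqrt{-\ln y}} \frac{1}{|f'(\xi)|}
        = \frac{2}{2\,y\sqrt{-\ln y}}
        = \frac{1}{y\sqrt{-\ln y}}, \qquad 0 < y < 1.
```

At y = 1 the only preimage is the stationary point x = 0, where f'(0) = 0, so h(f)(1) is undefined.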


See also

  • Density estimation: the construction of an estimate, based on observed data, of an unobservable underlying probability density function.
  • Freedman-Diaconis rule: used to specify the number of bins to be used in a histogram.
  • Image histogram: a histogram of the values of the pixels in a digital image.
  • Kernel density estimation (Parzen window method): named after Emanuel Parzen, a way of estimating the probability density function of a random variable.

References

  1. Sturges, H. A. (1926). "The choice of a class interval". J. American Statistical Association: 65–66.
  2. Scott, David W. (1979). "On optimal and data-based histograms". Biometrika 66 (3): 605–610. doi:10.1093/biomet/66.3.605.
  3. Freedman, David; Diaconis, P. (1981). "On the histogram as a density estimator: L2 theory". Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 57 (4): 453–476. doi:10.1007/BF01025868.
  • Lancaster, H. O. (1974). An Introduction to Medical Statistics. John Wiley and Sons. ISBN 0-471-51250-8.

The Wikipedia article included on this page is licensed under the GFDL.