Survival analysis

Survival analysis is a branch of statistics which deals with death in biological organisms and failure in mechanical systems. This topic is called reliability theory or reliability analysis in engineering, and duration analysis or duration modeling in economics. Death or failure is called an "event" in the survival analysis literature, and so models of death or failure are generically termed time-to-event models.


Survival analysis attempts to answer questions such as: what is the fraction of a population which will survive past a certain time? Of those that survive, at what rate will they die or fail? Can multiple causes of death or failure be taken into account? How do particular circumstances or characteristics increase or decrease the odds of survival?


To answer such questions, it is necessary to define "lifetime". In the case of biological survival, death is unambiguous, but for mechanical reliability, failure may not be well-defined, for there may well be mechanical systems in which failure is partial, a matter of degree, or otherwise not localized in time. Even in biological problems, some events (for example, heart attack or other organ failure) may have the same ambiguity. The theory outlined below assumes well-defined events at specific times; other cases may be better treated by models which explicitly account for ambiguous events.


The theory of survival presented here also assumes that death or failure happens just once for each subject. Recurring event or repeated event models relax that assumption. The study of recurring events is relevant in systems reliability.


This article is phrased primarily in terms of biological survival, but this is just a convenience. An equivalent formulation in terms of mechanical failure can be made by replacing every occurrence of death with failure.


General formulation

Survival function

The object of primary interest is the survival function, conventionally denoted S, which is defined as

S(t) = Pr(T > t)

where t is some time, T is the time of death, and "Pr" stands for probability. That is: the survival function is the probability that the time of death is later than some specified time. The survival function is also called the survivor function or survivorship function in problems of biological survival, and the reliability function in mechanical survival problems. In the latter case, the reliability function is denoted R(t).
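As a rough illustration of the definition, the survival function can be estimated from fully observed lifetimes as the fraction of subjects still alive after time t. The sketch below assumes there is no censoring; the data and the function name `empirical_survival` are invented for illustration and do not come from the article.

```python
def empirical_survival(lifetimes, t):
    """Estimate S(t) = Pr(T > t) as the fraction of lifetimes strictly greater than t."""
    if not lifetimes:
        raise ValueError("need at least one lifetime")
    return sum(1 for x in lifetimes if x > t) / len(lifetimes)


# Hypothetical, fully observed lifetimes (in years), made up for this example.
lifetimes = [2.0, 3.5, 3.5, 5.0, 7.25, 8.0, 9.5, 12.0]

for t in (0.0, 3.5, 8.0, 12.0):
    print(t, empirical_survival(lifetimes, t))
```

The estimate starts at 1 for t = 0 and steps down at each observed death, which matches the requirement that S be non-increasing.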


Usually one assumes S(0) = 1, although it could be less than 1 if there is the possibility of immediate death or failure. Some survival distributions (for example the Gaussian distribution) assign positive probability to negative times and so have S(t) < 1 for all finite t, including t = 0, but this point can be finessed or ignored; see the discussion under "Some survival distributions" below.


The survival function must be non-increasing: S(u) ≤ S(t) if u > t. This expresses the notion that survival to a later age is never more probable than survival to an earlier one. Given this property, the lifetime distribution function and event density (F and f below) are well-defined.


Survival probability is usually assumed to approach zero as age increases without bound, i.e., S(t) → 0 as t → ∞, although the limit could be greater than zero if eternal life is possible.


Lifetime distribution function and event density

Related quantities are defined in terms of the survival function. The lifetime distribution function, conventionally denoted F, is defined as the complement of the survival function,

F(t) = \Pr(T \le t) = 1 - S(t)

and the derivative of F (i.e., the density function of the lifetime distribution) is conventionally denoted f,

f(t) = \frac{d}{dt} F(t)

f is sometimes called the event density; it is the rate of death or failure events per unit time.
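The relation between S, F and f can be checked numerically. The following sketch assumes an exponential lifetime with an arbitrarily chosen rate (the value of `RATE` is illustrative only) and compares a finite-difference derivative of F with the closed-form density.

```python
import math

RATE = 0.5  # assumed rate of an exponential lifetime; illustrative value only


def survival(t):
    """S(t) = Pr(T > t) for the assumed exponential lifetime."""
    return math.exp(-RATE * t)


def lifetime_cdf(t):
    """F(t) = 1 - S(t)."""
    return 1.0 - survival(t)


def event_density(t):
    """f(t) = dF/dt, written in closed form for the exponential case."""
    return RATE * math.exp(-RATE * t)


def numerical_derivative(func, t, h=1e-5):
    """Central finite-difference approximation to the derivative of func at t."""
    return (func(t + h) - func(t - h)) / (2.0 * h)


t = 2.0
print(numerical_derivative(lifetime_cdf, t), event_density(t))
```

The two printed numbers should agree to many decimal places, which is all the definition f = dF/dt claims.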


Hazard function and cumulative hazard function

The hazard function, conventionally denoted λ, is defined as the event rate at time t conditional on survival until time t or later,

\lambda(t)\,dt = \Pr(t < T < t+dt \mid T > t) = \frac{f(t)\,dt}{S(t)} = -\frac{S'(t)\,dt}{S(t)}

Force of mortality is a synonym of hazard function which is used particularly in demography and actuarial science. The term hazard rate is another synonym.


The hazard function must be non-negative, λ(t) ≥ 0, and its integral over [0, ∞) must be infinite (assuming S(t) tends to zero), but is not otherwise constrained; the hazard function may be increasing or decreasing, nonmonotonic, or discontinuous. An example is the "bathtub" hazard function, which is large for small values of t, decreasing to some minimum, and thereafter increasing again; this can model the property of some mechanical systems to either fail soon after operation begins, or much later, as the system ages.
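One way to sketch a bathtub-shaped hazard is to add a decreasing hazard (early failures), a constant hazard (random failures) and an increasing hazard (wear-out). The construction below is only an illustration: the Weibull-type components and every parameter value are assumptions chosen to produce the shape, not something prescribed by the article. A sum of hazards is itself a valid hazard, since it corresponds to the earliest of several independent failure causes.

```python
def weibull_hazard(t, scale, shape):
    """Weibull hazard (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative bathtub hazard for t > 0, built from three assumed components."""
    early_failures = weibull_hazard(t, scale=1.0, shape=0.5)   # decreasing in t
    random_failures = 0.02                                      # constant in t
    wear_out = weibull_hazard(t, scale=10.0, shape=4.0)         # increasing in t
    return early_failures + random_failures + wear_out


for t in (0.1, 1.0, 4.0, 8.0, 15.0):
    print(t, round(bathtub_hazard(t), 4))
```

The printed values fall from a high initial level to a minimum at intermediate ages and then rise again.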


The hazard function can alternatively be represented in terms of the cumulative hazard function, conventionally denoted Λ:

\Lambda(t) = -\log S(t),

so

\frac{d}{dt} \Lambda(t) = -\frac{S'(t)}{S(t)} = \lambda(t)

Λ is called the cumulative hazard function because the preceding definitions together imply

\Lambda(t) = \int_0^{t} \lambda(u)\,du,

which is the "accumulation" of the hazard over time.
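The two expressions for the cumulative hazard can be compared numerically. This sketch assumes a Weibull lifetime with arbitrarily chosen parameters (`SCALE` and `SHAPE` are illustrative values) and integrates the hazard with a simple midpoint rule.

```python
import math

SCALE, SHAPE = 2.0, 1.5  # assumed Weibull parameters; illustrative values only


def survival(t):
    """S(t) for the assumed Weibull lifetime."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard(t):
    """Hazard of the assumed Weibull lifetime, equal to -S'(t) / S(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def cumulative_hazard_by_integration(t, steps=10_000):
    """Midpoint-rule approximation to the integral of hazard(u) for u from 0 to t."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width


t = 3.0
print(cumulative_hazard_by_integration(t), -math.log(survival(t)))
```

Both numbers approximate the same quantity, the accumulated hazard up to time t.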


From Λ(t) = −log S(t) we see that Λ(t) increases without bound as t tends to infinity (assuming S(t) tends to zero). This implies that λ(t) must not decrease too quickly, since its integral, the cumulative hazard, has to diverge. For example, exp(−t) is not the hazard function of any survival distribution whose survival function tends to zero, because its integral converges (to 1).
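The exp(−t) example can be made concrete. With that hazard the cumulative hazard is 1 − exp(−t), so the implied survival function levels off at exp(−1), roughly 0.37, instead of falling to zero. The short sketch below only evaluates these closed forms; the function names are chosen for illustration.

```python
import math


def cumulative_hazard(t):
    """Integral of exp(-u) for u from 0 to t, in closed form."""
    return 1.0 - math.exp(-t)


def implied_survival(t):
    """S(t) = exp(-Lambda(t)) implied by the hazard exp(-t)."""
    return math.exp(-cumulative_hazard(t))


for t in (1.0, 5.0, 50.0):
    print(t, implied_survival(t))
```

However large t becomes, a fraction of about exp(−1) of the population is never assigned a death time.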


Quantities derived from the survival distribution

The future lifetime at a given time t0 is the time remaining until death, which is T − t0 in the present notation. The expected future lifetime is the expected value of the future lifetime. The density of the future lifetime at t, that is, the event density at age t + t0 given survival until t0, is just

\frac{f(t+t_0)}{S(t_0)},

so the expected future lifetime is given by

\frac{1}{S(t_0)} \int_0^{\infty} t\,f(t+t_0)\,dt

For t0 = 0, i.e., at birth, this reduces to the expected lifetime.
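The integral for the expected future lifetime can be approximated numerically. The sketch below assumes an exponential lifetime (the `RATE` value is illustrative) and truncates the infinite upper limit at a large finite value, which is an approximation. For the exponential distribution the result should be close to 1/RATE whatever t0 is, because that distribution is memoryless.

```python
import math

RATE = 0.25  # assumed rate of an exponential lifetime; illustrative value only


def survival(t):
    return math.exp(-RATE * t)


def event_density(t):
    return RATE * math.exp(-RATE * t)


def expected_future_lifetime(t0, upper=200.0, steps=100_000):
    """Midpoint-rule approximation to (1 / S(t0)) * integral of t * f(t + t0) dt.

    The integral runs from 0 to `upper`, standing in for an infinite upper limit.
    """
    width = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * width
        total += t * event_density(t + t0)
    return total * width / survival(t0)


print(expected_future_lifetime(0.0))   # expected lifetime at birth
print(expected_future_lifetime(10.0))  # expected future lifetime at age 10
```

For distributions that are not memoryless, the same function would return values that depend on t0.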


In reliability problems, the expected lifetime is called the mean time to failure, and the expected future lifetime is called the mean residual lifetime.


The probability of individual survival until t or later is S(t), by definition. The expected number of survivors, in a population of n individuals, is n × S(t), assuming the same survival function for all. Thus the expected proportion of survivors is S(t), and, if the individuals survive independently of one another, the variance of the proportion of survivors is S(t) × (1 − S(t)) / n.


The age at which a specified proportion of survivors remain can be found by solving the equation S(t) = q for t, where q is the proportion in question. Typically one is interested in the median lifetime, for which q = 1/2, or other quantiles such as q = 0.90 or q = 0.99.
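When S has no convenient inverse, the equation S(t) = q can be solved numerically. The sketch below uses bisection, which relies only on S being non-increasing. The Weibull survival function and its parameters are assumptions made for the example; for that particular choice a closed-form answer exists and is used as a check.

```python
import math

SCALE, SHAPE = 10.0, 2.0  # assumed Weibull parameters; illustrative values only


def survival(t):
    return math.exp(-((t / SCALE) ** SHAPE))


def age_at_surviving_proportion(q, lower=0.0, upper=1000.0, tolerance=1e-10):
    """Solve S(t) = q for t by bisection, assuming S(lower) > q > S(upper)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    while upper - lower > tolerance:
        middle = (lower + upper) / 2.0
        if survival(middle) > q:
            lower = middle   # still too many survivors: the solution is later
        else:
            upper = middle   # too few survivors: the solution is earlier
    return (lower + upper) / 2.0


median_lifetime = age_at_surviving_proportion(0.5)
closed_form_median = SCALE * (-math.log(0.5)) ** (1.0 / SHAPE)
print(median_lifetime, closed_form_median)
```

The same function gives the age at which 90% or 99% of the population remains by passing q = 0.90 or q = 0.99.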


One can also make more complex inferences from the survival distribution. In mechanical reliability problems, one can bring cost (or utility, more generally) into consideration and solve problems concerning repair or replacement. See age-replacement problem and durability for further discussion of this topic.


Some survival distributions

Survival models are constructed by choosing a basic survival distribution. It is straightforward to phrase model fitting and analysis in general terms, using the concepts outlined under "General formulation", above. Thus it is relatively easy to substitute one distribution for another, in order to study the consequences of different choices.


The choice of survival distribution expresses some particular information about the relation of time and any exogenous variables to survival. It is natural to choose a statistical distribution which has non-negative support since survival times are non-negative. There are several distributions commonly used in survival analysis, which are listed in the table below. Additional types of distributions can be found in the references.

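\begin{matrix}
 & S(t) \\
\mbox{Exponential (special case of Weibull, } k = 1 \mbox{)} & e^{-t/\lambda} \\
\mbox{Weibull} & e^{-(t/\lambda)^k} \\
\mbox{Lognormal} & \frac{1}{2}\left(1-\mathrm{erf}\left[\frac{\ln(t)-\mu}{\sigma\sqrt{2}}\right]\right)
\end{matrix}

In this table λ and k are the scale and shape parameters of the Weibull distribution (λ is not the hazard function here), and μ and σ are the mean and standard deviation of the logarithm of the lifetime.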
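The survival functions in the table translate directly into code. In the sketch below the exponential case is taken to be the Weibull with k = 1, that is exp(−t/λ), which is an assumption about the intended parametrization; the function and parameter names are chosen for illustration.

```python
import math


def exponential_survival(t, scale):
    """Exponential survival function, written as the Weibull case with shape 1."""
    return math.exp(-t / scale)


def weibull_survival(t, scale, shape):
    """Weibull survival function exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def lognormal_survival(t, mu, sigma):
    """Lognormal survival function for t > 0, expressed with the error function."""
    z = (math.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))


print(exponential_survival(3.0, scale=2.0), weibull_survival(3.0, scale=2.0, shape=1.0))
print(lognormal_survival(math.exp(1.0), mu=1.0, sigma=0.5))
```

Because all three functions share the same calling pattern (a time followed by parameters), one distribution can be swapped for another in later calculations without changing the surrounding code.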

Fitting parameters to data

Survival models can be usefully viewed as ordinary regression models in which the response variable is time. However, computing the likelihood function (needed for fitting parameters or making other kinds of inferences) is complicated by missing data problems which are peculiar to time. The birth and death of a subject may both be known, in which case the lifetime is known. It may instead be known only that the date of death is after some date, so that the lifetime exceeds a known value: this is called right censoring. Conversely, it may be known only that the date of death is before some date, so that the lifetime is less than a known value: this is called left censoring. If the lifetime is known only to lie between two values, it is said to be interval censored. It may also happen that subjects with a lifetime less than some threshold are not observed at all: this is called truncation. Note that truncation is different from left censoring, since for a left censored datum we know the subject exists, but for a truncated datum we may be completely unaware of the subject.


There are standard examples of censoring and truncation. Perhaps the most common is right censoring. If we examine a group of living subjects, we know that each one is alive today, but we do not know their future date of death. Left censoring arises when a subject is found to have already died or failed at the first examination: we know the event happened before that examination, but not when. Truncation is also common. In a so-called delayed entry study, subjects are not observed at all until they have reached a certain age. For example, people may not be observed until they have reached the age to enter school. Any subjects who died before reaching school age would be unknown.
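Right-censored data are commonly recorded as a duration together with a flag saying whether the event was actually observed. The sketch below shows one possible encoding for a study that stops at a fixed time; the function name, the use of `None` for an unknown death time, and the data are all hypothetical.

```python
def censor_at_study_end(entry_times, death_times, study_end):
    """Return (observed_duration, event_observed) pairs for a study ending at study_end.

    A death time of None, or one later than study_end, yields a right-censored
    record: only a lower bound on the lifetime is known.
    """
    records = []
    for entry, death in zip(entry_times, death_times):
        if death is not None and death <= study_end:
            records.append((death - entry, True))        # lifetime fully observed
        else:
            records.append((study_end - entry, False))   # right censored
    return records


# Hypothetical subjects: entry into the study and death, on a common time axis.
entry_times = [0.0, 1.0, 2.0, 4.0]
death_times = [3.0, 9.0, None, 5.5]

records = censor_at_study_end(entry_times, death_times, study_end=6.0)
print(records)
```

The second and third subjects are still alive when the study ends, so their records carry only the time they were under observation.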


The likelihood function for a survival model, in the presence of censored data, is formulated as follows. By definition the likelihood function is the joint probability of the data given the parameters of the model. It is customary to assume that the data are independent given the parameters. Then the likelihood function is the product of the likelihood of each datum. It is convenient to partition the data into four categories: uncensored, left censored, right censored, and interval censored. These are denoted "unc.", "l.c.", "r.c.", and "i.c." in the equation below.

L(\theta) = \prod_{i \in \mathrm{unc.}} \Pr(T = T_i \mid \theta) \prod_{i \in \mathrm{l.c.}} \Pr(T < T_i \mid \theta) \prod_{i \in \mathrm{r.c.}} \Pr(T > T_i \mid \theta) \prod_{i \in \mathrm{i.c.}} \Pr(T_{i,l} < T < T_{i,r} \mid \theta)

For an uncensored datum, with T_i equal to the age at death, we have

\Pr(T = T_i \mid \theta) = f(T_i \mid \theta)

For a left censored datum, such that the age at death is known to be less than T_i, we have

\Pr(T < T_i \mid \theta) = F(T_i \mid \theta) = 1 - S(T_i \mid \theta)

For a right censored datum, such that the age at death is known to be greater than T_i, we have

\Pr(T > T_i \mid \theta) = S(T_i \mid \theta)

For an interval censored datum, such that the age at death is known to be greater than T_{i,l} and less than T_{i,r}, we have

\Pr(T_{i,l} < T < T_{i,r} \mid \theta) = S(T_{i,l} \mid \theta) - S(T_{i,r} \mid \theta)
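The four kinds of contribution can be combined into a log-likelihood and maximized. The sketch below assumes an exponential model with a single rate parameter and uses a crude grid search; the model choice, the function names and the data are all illustrative assumptions rather than a recommended fitting procedure.

```python
import math


def exponential_survival(t, rate):
    return math.exp(-rate * t)


def exponential_density(t, rate):
    return rate * math.exp(-rate * t)


def log_likelihood(rate, uncensored=(), left=(), right=(), interval=()):
    """Log of the censored-data likelihood for an assumed exponential model."""
    total = 0.0
    for t in uncensored:          # exact age at death: density f
        total += math.log(exponential_density(t, rate))
    for t in left:                # death before t: F = 1 - S
        total += math.log(1.0 - exponential_survival(t, rate))
    for t in right:               # death after t: S
        total += math.log(exponential_survival(t, rate))
    for lower, upper in interval: # death between lower and upper: S(lower) - S(upper)
        total += math.log(exponential_survival(lower, rate)
                          - exponential_survival(upper, rate))
    return total


def best_rate_on_grid(candidates, **data):
    """Return the candidate rate with the largest log-likelihood."""
    return max(candidates, key=lambda rate: log_likelihood(rate, **data))


# Hypothetical data: three observed deaths and two right-censored subjects.
uncensored = [2.0, 3.0, 5.0]
right = [4.0, 6.0]

grid = [k / 1000 for k in range(1, 1001)]
rate_hat = best_rate_on_grid(grid, uncensored=uncensored, right=right)
print(rate_hat)
```

With only uncensored and right-censored data, the exponential maximum-likelihood estimate has a known closed form, the number of observed deaths divided by the total observed time (here 3 / 20), which the grid search should reproduce. In practice a numerical optimizer would replace the grid.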

See also

Kaplan-Meier estimator
Reliability theory
Failure rate
Survival function
Mean time between failures (MTBF)

The Wikipedia article included on this page is licensed under the GFDL.