# Discovering Statistics

A den for Learning

## Censoring Schemes

Many researchers consider survival data analysis to be merely the application of two conventional statistical methods to a special type of problem: parametric if the distribution of survival times is known to be normal and nonparametric if the distribution is unknown. This assumption would be true if the survival times of all the subjects were exact and known; however, some survival times are not. Further, the survival distribution is often skewed, or far from being normal. Thus there is a need for new statistical techniques. One of the most important developments is due to a special feature of survival data in the life sciences that occurs when some subjects in the study have not experienced the event of interest at the end of the study or time of analysis. For example, some patients may still be alive or disease-free at the end of the study period. The exact survival times of these subjects are unknown. These are called censored observations or censored times and can also occur when people are lost to follow-up after a period of study. When there are no censored observations, the set of survival times is complete.

### TYPES OF CENSORING:

There are three types of censoring:

• Type-I Censoring: Animal studies usually start with a fixed number of animals, to which the treatment or treatments are given. Because of time and/or cost limitations, the researcher often cannot wait for the death of all the animals. One option is to observe for a fixed period of time, say six months, after which the surviving animals are sacrificed. Survival times recorded for the animals that died during the study period are the times from the start of the experiment to their death. These are called exact or uncensored observations. The survival times of the sacrificed animals are not known exactly but are recorded as at least the length of the study period. These are called censored observations. Some animals could be lost or die accidentally. Their survival times, from the start of the experiment to loss or death, are also censored observations. In type I censoring, if there are no accidental losses, all censored observations equal the length of the study period. For example, suppose that six rats have been exposed to carcinogens by injecting tumor cells into their foot pads. The times to develop a tumor of a given size are observed. The investigator decides to terminate the experiment after 30 weeks. Figure 1.1 is a plot of the development times of the tumors. Rats A, B, and D developed tumors after 10, 15, and 25 weeks, respectively. Rats C and E did not develop tumors by the end of the study; their tumor-free times are thus 30-plus weeks. Rat F died accidentally without tumors after 19 weeks of observation. The survival data (tumor-free times) are 10, 15, 30+, 25, 30+, and 19+ weeks. (The plus indicates a censored observation.)
• Type II Censoring: Another option in animal studies is to wait until a fixed portion of the animals have died, say 80 of 100, after which the surviving animals are sacrificed. In this case, type II censoring, if there are no accidental losses, the censored observations equal the largest uncensored observation. For example, in an experiment of six rats (Figure 1.2), the investigator may decide to terminate the study after four of the six rats have developed tumors. The survival or tumor-free times are then 10, 15, 35+, 25, 35, and 19+ weeks.
• Type III Censoring: In most clinical and epidemiologic studies the period of study is fixed and patients enter the study at different times during that period. Some may die before the end of the study; their exact survival times are known. Others may withdraw before the end of the study and are lost to follow-up. Still others may be alive at the end of the study. For "lost" patients, survival times are at least from their entrance to the last contact. For patients still alive, survival times are at least from entry to the end of the study. The latter two kinds of observations are censored observations. Since the entry times are not simultaneous, the censored times are also different. This is type III censoring. For example, suppose that six patients with acute leukemia enter a clinical study during a total study period of one year. Suppose also that all six respond to treatment and achieve remission. The remission times are plotted in Figure 1.3. Patients A, C, and E achieve remission at the beginning of the second, fourth, and ninth months, and relapse after four, six, and three months, respectively. Patient B achieves remission at the beginning of the third month but is lost to follow-up four months later; the remission duration is thus at least four months. Patients D and F achieve remission at the beginning of the fifth and tenth months, respectively, and are still in remission at the end of the study; their remission times are thus at least eight and three months. The respective remission times of the six patients are 4, 4+, 6, 8+, 3, and 3+ months.
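The "+" notation used in these examples maps directly onto the (time, indicator) pairs that likelihood methods work with. The following is a minimal Python sketch of that mapping for the Type I and Type III data quoted above; the helper name `parse_survival_times` is our own illustration and not part of any library.

```python
def parse_survival_times(labels):
    """Convert strings such as "30+" into (time, event) pairs.

    event = 1 means the time is exact; event = 0 means it is right-censored.
    """
    pairs = []
    for label in labels:
        label = label.strip()
        censored = label.endswith("+")
        time = float(label.rstrip("+"))
        pairs.append((time, 0 if censored else 1))
    return pairs


# Data quoted in the Type I (rats, weeks) and Type III (patients, months) examples.
type_1 = parse_survival_times(["10", "15", "30+", "25", "30+", "19+"])
type_3 = parse_survival_times(["4", "4+", "6", "8+", "3", "3+"])

for name, data in [("Type I", type_1), ("Type III", type_3)]:
    n_exact = sum(event for _, event in data)
    print(name, data, "exact:", n_exact, "censored:", len(data) - n_exact)
```

Storing the indicator next to the time keeps the censoring information attached to each observation, which is what the likelihood construction later in this post relies on.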

NOTE: Type I and type II censored observations are also called singly censored data, and type III, progressively censored data, by Cohen (1965). Another commonly used name for type III censoring is random censoring. All of these types of censoring are right censoring or censoring to the right. There are also left censoring and interval censoring cases. Left censoring occurs when it is known that the event of interest occurred prior to a certain time t, but the exact time of occurrence is unknown. For example, an epidemiologist wishes to know the age at diagnosis in a follow-up study of diabetic retinopathy. At the time of the examination, a 50-year-old participant was found to have already developed retinopathy, but there is no record of the exact time at which initial evidence was found. Thus the age at examination (i.e., 50) is a left-censored observation. It means that the age of diagnosis for this patient is at most 50 years.

## Estimation of Mean Survival Time and Variance of the Estimator for Type-I Censored Data [Right Censored]

Suppose the lifetimes for individuals in some population follow a distribution with pdf f(t;\theta) and distribution function F(t;\theta) and that the lifetimes t_1,t_2,\dots, t_n for a random sample of n individuals are observed. Then the likelihood function is given by:

L \left( \theta \right) = \prod_{i=1}^{n} f(t_i ; \theta)

However, with censored data this likelihood cannot be evaluated as written, because the exact values of the censored observations are never known. As a result, the ML estimate of the parameter(s) cannot be obtained from it directly. A way around this is to let T_1,T_2,\dots,T_n be random variables representing the lifetimes of the n individuals and to record each observation as a pair of variables (t_i,\delta_i) where,

\delta_i = \left\{
\begin{matrix}
1 & \text{;if the i-th observation is exact}\\
0 & \text{;if the i-th observation is censored}
\end{matrix}
\right.

and

T_i = \left\{
\begin{matrix}
t_i & \text{if} & \delta_i = 1  \\
C_i & \text{if} & \delta_i = 0
\end{matrix}
\right.

Here, C_i is taken to be the value of the i-th censored observation.

Under the right-censored Type-I scheme, lifetimes that exceed a certain fixed time, say R, are not known exactly. Hence we define the pair (T_i,\delta_i) such that

\delta_i = \left\{
\begin{matrix}
1 & \text{;if } T_i \leq R \\
0 & \text{;if } T_i >R
\end{matrix}
\right.
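The construction of the pairs under right Type-I censoring can be sketched in a few lines of Python. This is only an illustration: the function name `apply_type1_censoring` and the latent lifetimes (including the values beyond R, which would never be seen in practice) are hypothetical.

```python
def apply_type1_censoring(lifetimes, R):
    """Return (observed_time, delta) pairs under right Type-I censoring at R.

    delta = 1 if the lifetime is observed exactly (lifetime <= R),
    delta = 0 if it is censored, in which case the recorded time is R.
    """
    observed = []
    for t in lifetimes:
        if t <= R:
            observed.append((t, 1))
        else:
            observed.append((R, 0))
    return observed


# Hypothetical latent lifetimes; 42 and 37 exceed the cutoff and get censored.
latent = [10, 15, 42, 25, 37]
R = 30
data = apply_type1_censoring(latent, R)
print(data)  # [(10, 1), (15, 1), (30, 0), (25, 1), (30, 0)]
```

Every censored record carries the same time R, which is why the censored part of the likelihood collapses to a power of a single survival probability.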

Under these circumstances, the likelihood function for an individual may be written as

L_i = \left(f(t_i)\right)^{\delta_i} \cdot \left(\Pr[T_i>R]\right)^{1-\delta_i}

and, writing S(R) = \Pr[T_i > R] = 1 - F(R) for the survival function evaluated at R, the likelihood for the data may be written as:

L(\theta) = \prod_{i=1}^n (f(t_i))^{\delta_i}*(S(R))^{1-\delta_i}

Now, ordering the observations so that t_1 \leq t_2 \leq t_3 \leq \dots \leq t_n and supposing that r of them are exact (the first r, since every censored time equals R), we may simply write the likelihood as:

L(\theta) = \left[ \prod_{i=1}^r f(t_i) \right] \cdot \left( S(R) \right)^{n-r}

The log-likelihood is then given by:

l(\theta) = \sum_{i=1}^r \log f(t_i) + (n-r) \log S(R)
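To make the log-likelihood concrete, assume purely for illustration (the text does not fix a model) that lifetimes are exponential with rate \lambda, so that f(t) = \lambda e^{-\lambda t} and S(R) = e^{-\lambda R}. The log-likelihood then becomes r \log\lambda - \lambda(\sum t_i + (n-r)R), which is maximized at \hat{\lambda} = r / (\sum t_i + (n-r)R). The sketch below evaluates this for the five rats A to E of the Type-I example (rat F, the accidental loss, is left out so that all censored times equal R = 30); the function names are our own.

```python
import math


def exp_loglik_type1(lam, exact_times, n, R):
    """Log-likelihood of an exponential(rate=lam) model under right Type-I censoring."""
    r = len(exact_times)
    # sum of log f(t_i) over exact times, plus (n - r) * log S(R)
    return r * math.log(lam) - lam * sum(exact_times) - (n - r) * lam * R


def exp_mle_type1(exact_times, n, R):
    """Closed-form MLE of the rate: exact events divided by total time at risk."""
    r = len(exact_times)
    total_time = sum(exact_times) + (n - r) * R
    return r / total_time


exact_times = [10, 15, 25]  # rats A, B, D (weeks)
n, R = 5, 30                # rats C and E are censored at 30 weeks

lam_hat = exp_mle_type1(exact_times, n, R)
print("rate estimate:", lam_hat)
print("log-likelihood at the estimate:", exp_loglik_type1(lam_hat, exact_times, n, R))
```

The denominator is the total observed time, exact and censored alike, so censored animals still contribute information even though their event times are unknown.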

Then the MLE \hat{\theta} of \theta is the vector (\hat{\theta}_1,\hat{\theta}_2,\dots,\hat{\theta}_p) such that,

l(\hat{\theta}) = \max_{\theta} l(\theta)

Thus, when the log-likelihood is differentiable and the maximum lies in the interior of the parameter space, we solve the following system of likelihood equations to find the MLE:

\frac{\partial l(\theta)}{\partial \theta_i} = 0, \quad i = 1, 2, \dots, p
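When the likelihood equations have no closed-form solution they are usually solved iteratively. Below is a minimal Newton-Raphson sketch for a single parameter. It is run on an exponential model with rate \lambda (our assumption, chosen because its exact solution r / (\sum t_i + (n-r)R) is available as a check) with hypothetical data; the function name `newton_score_root` is ours.

```python
def newton_score_root(score, score_deriv, theta0, tol=1e-10, max_iter=100):
    """Solve score(theta) = 0 by Newton-Raphson, starting from theta0."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / score_deriv(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical Type-I censored sample: 3 exact times, 2 units censored at R.
exact_times = [10, 15, 25]
n, R = 5, 30
r = len(exact_times)
total_time = sum(exact_times) + (n - r) * R


def score(lam):
    # First derivative of r*log(lam) - lam*total_time
    return r / lam - total_time


def score_deriv(lam):
    # Second derivative of the same log-likelihood
    return -r / lam ** 2


lam_hat = newton_score_root(score, score_deriv, theta0=0.01)
print(lam_hat, r / total_time)  # iterative and closed-form answers should agree
```

Newton-Raphson is sensitive to the starting value; for this particular score function it converges only from starting points between 0 and 2r / total_time, so a poor start would need a damped or bracketed method instead.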

Writing the mean survival time as a function of the parameter, h(\theta) = E(T), its estimate is obtained by plugging in the MLE:

h(\hat{\theta})=[E(T)]_{\theta=\hat{\theta}}

The estimated covariance matrix of the MLE is given by the inverse of the observed information matrix:

\widehat{V(\hat{\theta})} = \left[ - \frac{\partial^2 l(\theta)}{\partial\theta\,\partial\theta'} \Big|_{\theta=\hat{\theta}} \right]^{-1}

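The estimated variance of the estimator of the mean survival time, by using the delta method, would be (for a scalar parameter \theta):

\widehat{V(h(\hat{\theta}))} = \left[ h'(\hat{\theta})\right]^2 \cdot \widehat{V(\hat{\theta})}

For a vector parameter, h'(\hat{\theta}) is replaced by the gradient \nabla h(\hat{\theta}) and the variance becomes \nabla h(\hat{\theta})' \, \widehat{V(\hat{\theta})} \, \nabla h(\hat{\theta}). No additional factor of 1/n appears, because the inverse observed information already reflects the sample size.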
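As a worked illustration of the standard one-parameter delta-method approximation, variance of h(\hat{\theta}) \approx [h'(\hat{\theta})]^2 times the variance of \hat{\theta}, assume exponential lifetimes with rate \lambda (our assumption, not something the text prescribes). Then the mean survival time is h(\lambda) = 1/\lambda, the observed information is r/\lambda^2, and the estimated variance of the mean reduces to \hat{\mu}^2 / r. The data below are hypothetical.

```python
import math

# Hypothetical Type-I censored sample: 3 exact times, 2 units censored at R.
exact_times = [10, 15, 25]
n, R = 5, 30
r = len(exact_times)
total_time = sum(exact_times) + (n - r) * R

# MLE of the rate and its estimated variance (inverse observed information).
lam_hat = r / total_time
var_lam_hat = lam_hat ** 2 / r      # since -l''(lam) = r / lam**2

# Mean survival time h(lam) = 1 / lam and its derivative h'(lam) = -1 / lam**2.
mean_hat = 1.0 / lam_hat
h_prime = -1.0 / lam_hat ** 2

# Delta method: Var(h(lam_hat)) is approximately h'(lam_hat)**2 * Var(lam_hat).
var_mean_hat = h_prime ** 2 * var_lam_hat
se_mean_hat = math.sqrt(var_mean_hat)

print("estimated mean survival time:", mean_hat)
print("estimated variance:", var_mean_hat)
print("standard error:", se_mean_hat)
```

The variance depends on r, the number of exact observations, rather than on n: censored units extend the total time at risk but add no observed events.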

## NOTE:

For left-type-I censored data with left censoring time L, the likelihood would be:

 L(\theta) = \prod_{i=1}^n (f(t_i))^{\delta_i}*(F(L))^{1-\delta_i}

For right-type II censored data with ‘r’ exact observations, the likelihood would be:

L(\theta) = \prod_{i=1}^n (f(t_{(i)}))^{\delta_i}*(S(t_{(r)}))^{1-\delta_i} \\
\text{;where $t_{(r)}$ is the max of the exact observations}

For left-type II censored data with ‘r’ exact observations, the likelihood would be:

L(\theta) = \prod_{i=1}^n (f(t_{(i)}))^{\delta_i}*(F(t_{(1)}))^{1-\delta_i} \\
\text{;where $t_{(1)}$ is the min of the exact observations}
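All of the likelihoods in this note follow one pattern: an exact observation contributes f(t), a right-censored one contributes S(c), and a left-censored one contributes F(c). The sketch below codes that pattern for an exponential model with rate \lambda, which is our illustrative assumption; the samples and the function name `exp_loglik` are hypothetical.

```python
import math


def exp_loglik(lam, observations):
    """Exponential log-likelihood for a list of (time, kind) observations.

    kind is "exact", "right" (right-censored) or "left" (left-censored).
    """
    total = 0.0
    for time, kind in observations:
        if kind == "exact":      # contributes log f(t)
            total += math.log(lam) - lam * time
        elif kind == "right":    # contributes log S(t)
            total += -lam * time
        elif kind == "left":     # contributes log F(t)
            total += math.log(1.0 - math.exp(-lam * time))
        else:
            raise ValueError(f"unknown observation kind: {kind!r}")
    return total


# Hypothetical right Type-II sample: n = 6 units, stop at the r = 4th failure.
exact = [10, 15, 25, 35]
t_r = max(exact)
type2_right = [(t, "exact") for t in exact] + [(t_r, "right")] * 2

# Closed form in the exponential case: r / (sum of exact times + (n - r) * t_(r)).
lam_hat = len(exact) / (sum(exact) + 2 * t_r)
print(lam_hat, exp_loglik(lam_hat, type2_right))

# Hypothetical left Type-II sample: two lifetimes are only known to lie
# below t_(1), the smallest exact observation.
exact_left = [15, 25, 35, 40]
t_1 = min(exact_left)
type2_left = [(t_1, "left")] * 2 + [(t, "exact") for t in exact_left]
print(exp_loglik(0.03, type2_left))
```

With left-censored terms the exponential score equation no longer has a closed-form root, so the maximization has to be done numerically.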

#### References:

1. Lee, E.T. and Wang, J.W. (2003): Statistical Methods for Survival Data Analysis, 3rd Edition, John Wiley and Sons.
2. Lawless, J.F. (2003): Statistical Models and Methods for Lifetime Data, 2nd Edition, Wiley Series in Probability and Statistics, John Wiley and Sons.