Estimation of Allele and Extinction Probability from Generations of an Ancestor

Babulal Seal

American Journal of Mathematics and Statistics

p-ISSN: 2162-948X e-ISSN: 2162-8475

2013; 3(5): 288-295

doi:10.5923/j.ajms.20130305.06


Babulal Seal

Department of Statistics, Burdwan University, Burdwan, W.B., India

Correspondence to: Babulal Seal, Department of Statistics, Burdwan University, Burdwan, W.B., India.


Abstract

In population genetics, estimating the allele proportions of a gene is important for understanding the character of the population. Several works are available in the literature, and these are based on gene-type data collected from several units of a population. They do not, however, collect data on a single unit across its generations. Considering the variation of genes across the generations of a unit, an estimation procedure for a di-allelic gene is attempted. Though the method starts from one unit, it can be extended to many units. Variations over generations are considered, and with this data format the extinction probability is also estimated, which is logical because the variation of a gene along generations is important. The estimates for the di-allelic case have been computed, and the performance of the estimators based on three generations and on four generations is compared. As expected from common sense, the findings show that the estimator improves if data from more generations are used. For this comparison, the risks using three generations and four generations were computed and plotted. Along this line the work can be generalized by considering many units and more generations.

Keywords: Branching Process, Probability Generating Function, Maximum Likelihood Estimate (m.l.e.)

Cite this paper: Babulal Seal, Estimation of Allele and Extinction Probability from Generations of an Ancestor, American Journal of Mathematics and Statistics, Vol. 3 No. 5, 2013, pp. 288-295. doi: 10.5923/j.ajms.20130305.06.

Article Outline

1. Introduction

2. Method for Estimation of Probability of Extinction

3. General Case

4. Comparison of Two Estimators Using 3 Generations and 4 Generations

5. Concluding Remarks

1. Introduction

The extinction probability[1] of a gene and its estimation[2] are important for understanding whether a character will vanish in the long run or not. While working on the estimation of the extinction probability, it was found that the same technique may also be used for estimating allele proportions. For this we may use observations from a population, or we may use data from the generation chain of a single unit.

In population genetics, estimating the allele proportions of a gene is important for understanding the character of the population. Several works are available in the literature, e.g.[3]; these are based on gene-type data of the population, collected from several units. They do not, however, use data on a single unit across its generations for this estimation. Considering the variation of genes across the generations of a unit, an estimation procedure for a di-allelic gene is attempted. In fact this is a kind of branching process[4],[5]. Estimation for some branching processes is discussed in[6],[7]. In section 2, estimation from three-generation data is given using the m.l.e. Here the estimate of θ, the proportion of allele ‘a’, is given, and 1 − θ is the proportion of allele ‘A’.

For simplicity we may pretend that a unit gives birth to at most two offspring, with θ = probability of being male. In that section, using one unit, the estimates are given, and the estimation of the extinction probability is also given. The risk function[8] is also calculated. In section 3 the same is carried out using four-generation data. This is for the simple type of branching process described here. There are many other types of branching processes, e.g.[9].

In section 4 the two risk functions obtained from the results of sections 2 and 3 are compared graphically, using an R program and the squared error loss function. It is found that the second one is better for most θ values. This is logical because more generations give a truer reflection of the allele, as was partially expected. It is believed that if we use more generations then the estimate of θ will improve. However, that needs further investigation. Up to these generations, the finding seems to be consistent.

The reason that the first one is slightly better for

may be the following. Small

implies more ‘males’, but using more generations introduces variation in the data set; i.e., the mixing of ‘male’ and ‘female’ information contradicts the fact that there are truly only ‘males’.

2. Method for Estimation of Probability of Extinction

Suppose we want to make inferences about a branching process using data over generations. Let the generations be

We consider the simple case where a particle gives birth to at most 2 offspring, as

Now at

if one starts with one unit, then all possible paths through the three generations may be observed as follows. The possible sample values, together with the transition probabilities in parentheses, are given for each generation (column-wise).

It is to be noted that the probability generating function (p.g.f.) from a single unit is

The p.g.f. obtained from two units X, Y will be

So the possible paths together with probabilities will be:

Table 1. Paths and probabilities using 3 generations
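
The enumeration behind a table of this kind can be sketched in Python (this is not the R program used in the paper). The sketch assumes a generic offspring distribution (p0, p1, p2) for 0, 1 or 2 offspring; the function names and the numerical probabilities are illustrative assumptions, and the paper's own parameterization is not reproduced. The size of the next generation from several parents is obtained by repeated convolution of the single-parent distribution, which is the coefficient form of multiplying the p.g.f.s of independent units.

```python
def next_generation_dist(z, offspring):
    """Return dist with dist[k] = P(next generation has k units | z parents).

    offspring = (p0, p1, p2) is an ASSUMED generic offspring distribution;
    parents are taken to reproduce independently.
    """
    dist = [1.0]  # with zero parents the next generation is empty
    for _ in range(z):
        new = [0.0] * (len(dist) + len(offspring) - 1)
        for i, a in enumerate(dist):
            for j, b in enumerate(offspring):
                new[i + j] += a * b  # convolution = product of p.g.f.s
        dist = new
    return dist


def enumerate_paths(n_generations, offspring):
    """Map each path (Z_0 = 1, Z_1, ..., Z_{n-1}) to its probability."""
    paths = {(1,): 1.0}
    for _ in range(n_generations - 1):
        extended = {}
        for path, prob in paths.items():
            for k, p in enumerate(next_generation_dist(path[-1], offspring)):
                extended[path + (k,)] = prob * p
        paths = extended
    return paths


# Illustrative values only (not taken from the paper).
offspring = (0.25, 0.5, 0.25)
paths3 = enumerate_paths(3, offspring)
paths4 = enumerate_paths(4, offspring)
print(len(paths3), len(paths4))  # number of paths for 3 and 4 generations
print(sum(paths3.values()))      # path probabilities add up to 1
```

Starting from one unit, three generations give 9 possible paths, which agrees with the index i = 1, 2, …, 9 used for the sample paths, and four generations give 35.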

Now let us suppose that we have

i.i.d. observations from the above distribution. Then the joint density becomes

where

and t_i = number of observed paths of type s_i, for i = 1, 2, …, 9.

Therefore

In order to obtain the maximum likelihood estimate,

where

Now

Hence the above estimate is a maximum likelihood estimate.
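
As an illustration of how a maximum likelihood estimate can be obtained from path data, the following Python sketch works under a stated assumption that is not confirmed by the text: each unit is taken to produce a Binomial(2, θ) number of offspring, so that a generation of z parents yields a Binomial(2z, θ) number of children. Under that assumption the log-likelihood is S log θ + (2P − S) log(1 − θ) plus a constant, where S is the total number of children and P the total number of parents, and it is maximized at θ = S / (2P). The function names and the sample paths are hypothetical.

```python
import math


def log_likelihood(theta, paths):
    """Log-likelihood of paths (Z_0, Z_1, ...) under the ASSUMED
    Binomial(2, theta) offspring law; requires 0 < theta < 1."""
    total = 0.0
    for path in paths:
        for parents, children in zip(path, path[1:]):
            n = 2 * parents  # at most two offspring per parent
            total += math.log(math.comb(n, children))
            total += children * math.log(theta)
            total += (n - children) * math.log(1.0 - theta)
    return total


def mle_closed_form(paths):
    """theta_hat = total children / (2 * total parents)."""
    children = sum(sum(p[1:]) for p in paths)
    parents = sum(sum(p[:-1]) for p in paths)  # >= 1 because Z_0 = 1
    return children / (2 * parents)


def mle_grid(paths, steps=1000):
    """Numerical check: maximize the log-likelihood over a grid in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: log_likelihood(t, paths))


# Hypothetical three-generation sample paths (Z_0, Z_1, Z_2).
sample = [(1, 2, 3), (1, 1, 0), (1, 0, 0), (1, 2, 4)]
print(mle_closed_form(sample))  # 12 children / (2 * 9 parents)
print(mle_grid(sample))
```

The grid search is only a sanity check on the closed form; with a different offspring law the likelihood changes and the closed form above no longer applies.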

Now based on one sample path the estimates are as follows:

Table 2. Paths, summary of statistic and estimates of parameter

From this estimate one can get an idea of whether the population will become extinct, i.e., of the fate of the population.

Probability of extinction and its estimate in this case:

Now to get extinction probability we solve the following equation:

where

Therefore,

So the extinction probability is

and its estimate will be

where

are given in (2.1).
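
For a process with at most two offspring per unit, the fixed-point equation s = G(s) can be solved explicitly. The Python sketch below uses a generic offspring distribution (p0, p1, p2), which is an assumption made for illustration and not the parameterization of the paper. With G(s) = p0 + p1 s + p2 s², the equation G(s) − s = 0 factors as (s − 1)(p2 s − p0) = 0, so the extinction probability, being the smallest non-negative root, is min(1, p0/p2). A fixed-point iteration started at 0 is included as a check. The function names are hypothetical.

```python
def extinction_probability(p0, p1, p2):
    """Smallest non-negative root of s = p0 + p1*s + p2*s**2."""
    if min(p0, p1, p2) < 0 or abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("(p0, p1, p2) must be a probability distribution")
    if p2 == 0:
        # No unit ever has two offspring: extinction is certain unless
        # every unit has exactly one offspring.
        return 1.0 if p0 > 0 else 0.0
    return min(1.0, p0 / p2)


def extinction_by_iteration(p0, p1, p2, n_iter=1000):
    """Iterate q <- G(q) from q = 0; the limit is the smallest fixed point."""
    q = 0.0
    for _ in range(n_iter):
        q = p0 + p1 * q + p2 * q * q
    return q


# Illustrative values only (not taken from the paper).
print(extinction_probability(0.2, 0.3, 0.5))   # mean offspring 1.3 > 1
print(extinction_by_iteration(0.2, 0.3, 0.5))
print(extinction_probability(0.5, 0.3, 0.2))   # mean offspring 0.7 < 1
```

An estimate of the extinction probability follows by plugging estimated offspring probabilities into the same formula.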

Remark 2.1

It is to be noted that

Now let us choose

be such that

then

So if

then

we have

is an

function of

If we collect a large amount of data, which is obviously not a difficult task, then from each subset of the data one can obtain an estimate. From several such estimates one can find the prior distribution of θ, and if it is seen that it has negligible weight beyond

then there is no need to worry about extinction.

Remark 2.2

Actually θ is controlled by the sex gene factor, and from one generation we cannot get good information about it. For this we should read the pattern of transition from one generation to another. For practical purposes we may not get data for many generations, but we should study the transitions using at least 3 or 4 generations.

Remark 2.3

The above can be used in testing a hypothesis regarding the sex gene, i.e.

or one can get an idea of whether two communities have equal θ’s through the usual Neyman-Pearson test. Before doing that we have to compute the statistical curvature, as the above family is a curved exponential family. Tests for a curved family hold good if its statistical curvature is small, i.e., if it does not deviate much from a regular 2-parameter exponential family.

3. General Case

Now, depending on the length of the chain, let us see the behavior of the estimator and compare this case with the previous one via the MSE.

If we take four generations, then the sample paths and their probabilities will be as follows.

Table 3. Paths and probabilities of such paths

Estimator of the parameter using the above paths

Table 4. Estimator of the parameter

4. Comparison of Two Estimators Using 3 Generations and 4 Generations

R code for computing and plotting the risk functions.

Figure 1. Comparison of two risk functions
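
A comparison of this kind can be sketched in Python by exact enumeration instead of plotting. The sketch is not the R program of the paper and rests on assumptions that are not confirmed by the text: each unit is taken to have a Binomial(2, θ) number of offspring, and the estimator is taken to be total children divided by twice the total number of parents. The risk is the expected squared error, computed by summing over all paths. The function names are hypothetical.

```python
from math import comb


def binom_pmf(n, k, theta):
    """P(Binomial(n, theta) = k)."""
    return comb(n, k) * theta ** k * (1.0 - theta) ** (n - k)


def risk(theta, n_generations):
    """Exact squared-error risk of theta_hat = children / (2 * parents)
    for a path of n_generations (>= 2) generations starting from one unit,
    under the ASSUMED Binomial(2, theta) offspring law."""
    # state = (current generation size, total parents, total children)
    states = {(1, 0, 0): 1.0}
    for _ in range(n_generations - 1):
        new_states = {}
        for (z, parents, children), prob in states.items():
            for k in range(2 * z + 1):
                key = (k, parents + z, children + k)
                step = prob * binom_pmf(2 * z, k, theta)
                new_states[key] = new_states.get(key, 0.0) + step
        states = new_states
    return sum(
        prob * (children / (2 * parents) - theta) ** 2
        for (_, parents, children), prob in states.items()
    )


# Tabulate the two risk functions on a coarse grid of theta values.
for i in range(1, 10):
    theta = i / 10
    print(f"{theta:.1f}  {risk(theta, 3):.5f}  {risk(theta, 4):.5f}")
```

The two printed columns can be passed to any plotting routine to obtain curves analogous to a risk comparison figure; the numbers depend entirely on the assumed offspring law and estimator.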

5. Concluding Remarks

There is ample scope for elaborating the inference part, for example by observing samples with more generations. The testing part should be handled carefully, as the family of distributions here is a curved exponential family.

References

[1]	Galton, F., and Watson, H. W. (1874): On the probability of extinction of families. J. of Anthr. Inst., 6, 138-144.
[2]	Guttorp, P. (1991): Statistical Inference for Branching Processes. Wiley, New York.
[3]	Liu, B. H. (1997): Statistical Genomics. CRC Press.
[4]	Athreya, K. B., and Ney, P. E. (1972): Branching Processes. Springer, New York.
[5]	Mode, C. J. (1971): Multitype Branching Processes. Elsevier, New York.
[6]	Keiding, N., and Lauritzen, S. (1978): Marginal Maximum Likelihood Estimates and Estimation of the Offspring’s Mean in Branching Process. Scand. J. Stat., 5, 106-110.
[7]	Nagaev, A. E. (1967): On Estimating the Expected Number of Direct Descendants of a Particle in a Branching Process. Theory Prob. Appl., 12, 314-320.
[8]	Berger, J. O. (1985): Statistical Decision Theory and Related Topics. Springer.
[9]	Seal, B. (2013): A Generalized Branching Process with Dependent Branching Times. International Journal of Statistics and Analysis, 3, 159-166.
