International Journal of Probability and Statistics
p-ISSN: 2168-4871 e-ISSN: 2168-4863
2013; 2(2): 13-20
doi:10.5923/j.ijps.20130202.01
Rajarathinam A., Vinoth B.
Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, India
Correspondence to: Rajarathinam A., Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, India.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
The present study was undertaken to examine the trends, growth rates and jump points in the area, production and productivity of the tobacco (Nicotiana tabacum) crop grown in the Anand region of Gujarat state, India, for the period 1949-50 to 2008-09, using parametric and nonparametric regression models. In the first step, a parametric modelling approach was adopted; however, it could not explain the sudden jumps in the data. The nonparametric regression approach, which requires fewer assumptions, was therefore employed. Nonparametric regression with jump points was shown to provide a good description of the data under consideration and to give statistical evidence of jumps in the area, production and productivity of the crop under study.
Keywords: Nonlinear Models, Autocorrelation, Bandwidth, Kernel, Jump Points, Nonparametric Regression, Local Polynomial Regression
Cite this paper: Rajarathinam A., Vinoth B., Computations of Jump Points in Tobacco (Nicotiana Tabacum) Crop Production, International Journal of Probability and Statistics, Vol. 2 No. 2, 2013, pp. 13-20. doi: 10.5923/j.ijps.20130202.01.
is a non-linear regression model, since the derivatives of Y_x with respect to a and b are both functions of a and/or b. Details of the family of non-linear models are given in Bard[2], Seber and Wild[26], Ratkowsky[24], Draper and Smith[5] and Montgomery et al.[17]. As in linear regression, the parameters of a non-linear model can be estimated by the method of least squares. However, because this computation is difficult, the common practice is to work with the log-transformed model. The log transformation is valid only when the error term e in the above equation is multiplicative. The method of least squares is then used to estimate the unknown parameters, and the R² value is calculated to measure the goodness of fit of the model. The log-transformed procedure suffers from some important drawbacks:
a) The original structure of the error term is disturbed by the transformation.
b) The computed R² value assesses the goodness of fit of the transformed model, not of the original non-linear model.
c) Residual analysis carried out on the residuals of the transformed model can lead to erroneous conclusions.
As a remedy for these pitfalls, non-linear regression procedures have been developed in the literature; these require computer-intensive tools to solve for the parameters (Venugopalan and Shamasundaran[30]). The following non-linear models are considered in the present investigation.
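To make drawback b) concrete, the following minimal Python sketch fits a hypothetical exponential model Y = a·exp(bX) by ordinary least squares on the log scale and then computes R² twice: once for the transformed model and once for the back-transformed fit on the original scale. The data values and the helper names `ols` and `r_squared` are illustrative assumptions, not taken from the study.

```python
import math


def ols(x, y):
    """Ordinary least squares for y = c + b*x; returns (c, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(obs, fitted):
    """Coefficient of determination of `fitted` against `obs`."""
    mean = sum(obs) / len(obs)
    sse = sum((o - f) ** 2 for o, f in zip(obs, fitted))
    sst = sum((o - mean) ** 2 for o in obs)
    return 1 - sse / sst


# Hypothetical series roughly following Y = a * exp(b * X).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.9, 4.1, 6.8, 9.5, 15.2, 20.1, 33.0, 44.5]

# Log-transformed model: log(Y) = log(a) + b * X, fitted by OLS.
log_y = [math.log(v) for v in y]
c, b = ols(x, log_y)
a = math.exp(c)

# R^2 of the transformed model versus R^2 of the back-transformed fit.
r2_log = r_squared(log_y, [c + b * xi for xi in x])
r2_orig = r_squared(y, [a * math.exp(b * xi) for xi in x])
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"R^2 on log scale = {r2_log:.4f}, R^2 on original scale = {r2_orig:.4f}")
```

The two R² values generally differ, which is why the R² of the transformed model should not be reported as the goodness of fit of the original non-linear model.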
where Y is the area/production/productivity at time X; A, B, C and D are the parameters; and e is the error term. The parameter C is the intrinsic growth rate and the parameter A represents the carrying capacity in each model. The symbol B represents different functions of the initial value Y(0), and D is the additional parameter. In addition to the above non-linear models, some other non-linear models are also employed as the data require.
Four main methods are available in the literature (Seber and Wild[26]) to obtain estimates of the unknown parameters of a non-linear regression model: (i) the Gauss-Newton method, (ii) the steepest-descent method, (iii) the Levenberg-Marquardt technique and (iv) the Doesn't Use Derivatives (DUD) method. All these methods carry out the following steps.
Step (i). Starting with a good initial guess of the unknown parameters, a sequence of estimates which, hopefully, converges to θ is computed.
Step (ii). The error sum of squares, or objective function,
$$S(\theta) = \sum_{i=1}^{n}\left[Y_i - f(X_i, \theta)\right]^2,$$
where f(X, θ) denotes the non-linear model, is minimized starting from the current value of θ, and new estimates are obtained.
Step (iii). The newly obtained estimates are used as the initial guess for the next iteration, and the objective function S(θ) is minimized again to obtain fresh estimates. This procedure is continued until the parameter estimates from successive iterations are close to each other.
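The three steps can be sketched for the Gauss-Newton method applied to a hypothetical two-parameter model Y = a·exp(bX) + e with additive error. At each iteration the model is linearised around the current estimates, the normal equations (J'J)Δθ = J'r are solved for the update, and iteration stops when successive estimates agree within a tolerance. The data, the starting values and the function names are illustrative assumptions; in practice a library routine such as SciPy's `scipy.optimize.curve_fit` (which uses Levenberg-Marquardt by default for unbounded problems) would usually be preferred.

```python
import math


def sse_exp(x, y, a, b):
    """Error sum of squares S(theta) for the model Y = a * exp(b * X)."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def gauss_newton_exp(x, y, a, b, tol=1e-9, max_iter=100):
    """Gauss-Newton iterations for Y = a * exp(b * X), starting at (a, b)."""
    for _ in range(max_iter):
        r = [yi - a * math.exp(b * xi) for xi, yi in zip(x, y)]   # residuals
        ja = [math.exp(b * xi) for xi in x]                        # dY/da
        jb = [a * xi * math.exp(b * xi) for xi in x]               # dY/db
        # Step (ii): solve the normal equations (J'J) step = J'r for 2 parameters.
        s11 = sum(u * u for u in ja)
        s12 = sum(u * v for u, v in zip(ja, jb))
        s22 = sum(v * v for v in jb)
        g1 = sum(u * e for u, e in zip(ja, r))
        g2 = sum(v * e for v, e in zip(jb, r))
        det = s11 * s22 - s12 * s12
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a, b = a + da, b + db
        # Step (iii): stop once successive estimates are close to each other.
        if abs(da) <= tol * (1 + abs(a)) and abs(db) <= tol * (1 + abs(b)):
            break
    return a, b


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.9, 4.1, 6.8, 9.5, 15.2, 20.1, 33.0, 44.5]
a0, b0 = 2.0, 0.4          # step (i): rough initial guess (hypothetical)
a_hat, b_hat = gauss_newton_exp(x, y, a0, b0)
print(a_hat, b_hat, sse_exp(x, y, a_hat, b_hat))
```

Because the sketch has no step-size control, it can diverge from a poor starting point; damping the step is what the Levenberg-Marquardt technique adds.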
To test the overall significance of the model, the F test is used:
$$F = \frac{R^2/k}{(1-R^2)/(n-k-1)},$$
which follows an F distribution with k (the number of parameters in the model) and (n − k − 1) degrees of freedom. Each individual regression coefficient is tested using the t test
$$t = \frac{b_j}{\mathrm{s.e.}(b_j)},$$
which follows a t distribution with (n − k − 1) degrees of freedom, where b_j is the estimated jth coefficient and s.e.(b_j) is the standard error of b_j. In addition, two more reliability statistics, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), are generally used to measure the adequacy of the fitted model. They are computed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat Y_i\right)^2}$$
and
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat Y_i\right|.$$
The lower the values of these statistics, the better the fitted model. Before taking any final decision about the appropriateness of the fitted model, it is essential to investigate the basic assumptions about the error term, namely randomness and normality. Normality of the residuals is examined using the Shapiro-Wilk test (D'Agostino and Stephens[1]). Further, the presence or absence of autocorrelation in the data set is tested with the Durbin-Watson test (Lewis-Beck[13]).
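A minimal sketch of these adequacy measures, assuming a model with k parameters whose fitted values are already available. The function name `fit_statistics` and the toy numbers are hypothetical; the overall F uses the R² form, and the Durbin-Watson statistic is computed as Σ(e_t − e_{t−1})² / Σe_t² on the residuals in time order. The Shapiro-Wilk test is omitted to keep the sketch dependency-free (SciPy provides it as `scipy.stats.shapiro`).

```python
import math


def fit_statistics(y, fitted, k):
    """Goodness-of-fit summaries for a model with k parameters."""
    n = len(y)
    e = [yi - fi for yi, fi in zip(y, fitted)]           # residuals in time order
    sse = sum(r * r for r in e)
    mean = sum(y) / n
    sst = sum((yi - mean) ** 2 for yi in y)
    r2 = 1 - sse / sst
    return {
        "R2": r2,
        # Overall F statistic with (k, n - k - 1) degrees of freedom.
        "F": (r2 / k) / ((1 - r2) / (n - k - 1)),
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(r) for r in e) / n,
        # Durbin-Watson: values near 2 suggest no first-order autocorrelation.
        "DW": sum((e[t] - e[t - 1]) ** 2 for t in range(1, n)) / sse,
    }


stats = fit_statistics([3, 5, 7, 10], [3.5, 4.5, 7.5, 9.5], k=1)
print(stats)   # RMSE = 0.5, MAE = 0.5, F = 51.5, DW = 3.0
```

Here the residuals alternate in sign, so the Durbin-Watson value of 3 points towards negative rather than positive autocorrelation.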
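The kernel-weighted local linear regression smoother (Fan[6]) is used to estimate the trend function nonparametrically. The value of the local linear regression smoother at time x is the solution for a_0 of the following weighted least squares problem:
$$\min_{a_0,\,a_1}\ \sum_{i=1}^{n}\left\{y_i - a_0 - a_1(t_i - x)\right\}^2 K\!\left(\frac{t_i - x}{h}\right),$$
where K is a bounded symmetric kernel density function and h is the bandwidth. Let $\hat a_0(x)$ and $\hat a_1(x)$ be the solutions to the weighted least squares problem. The estimate of the trend function m(x) is given by
$$\hat m(x) = \hat a_0(x) = \frac{\sum_{i=1}^{n} w_i(x)\, y_i}{\sum_{i=1}^{n} w_i(x)},$$
where
$$w_i(x) = K\!\left(\frac{t_i - x}{h}\right)\left\{S_{n,2}(x) - (t_i - x)\, S_{n,1}(x)\right\}$$
and
$$S_{n,j}(x) = \sum_{i=1}^{n} K\!\left(\frac{t_i - x}{h}\right)(t_i - x)^j, \qquad j = 0, 1, 2.$$
The optimum bandwidth h can be obtained by the method of cross-validation. The slope m'(x) of m(x) can be considered as the simple linear growth rate at the time point x. The estimate of m'(x) is given by
$$\hat m'(x) = \hat a_1(x) = \frac{\sum_{i=1}^{n} v_i(x)\, y_i}{S_{n,0}(x)\,S_{n,2}(x) - S_{n,1}(x)^2},$$
where
$$v_i(x) = K\!\left(\frac{t_i - x}{h}\right)\left\{(t_i - x)\, S_{n,0}(x) - S_{n,1}(x)\right\}.$$
Under the assumption that the trend function m is smooth and m(x) ≠ 0 for all x ∈ [0,1], the relative growth rate at time x can be written as
$$r(x) = \frac{m'(x)}{m(x)}.$$
Since $\hat m(x)$ and $\hat m'(x)$ are consistent estimates of m(x) and m'(x), respectively, a consistent estimate of the relative growth rate r(x) is given by
$$\hat r(x) = \frac{\hat m'(x)}{\hat m(x)}.$$
Taking the arithmetic mean of $\hat r(t_i)$ over the design points in a given time period gives the requisite compound growth rate for that period.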
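The following sketch implements the kernel-weighted local linear smoother, leave-one-out cross-validation for the bandwidth h, and the relative growth rates with their arithmetic mean. It assumes a Gaussian kernel (the text only requires a bounded symmetric kernel density), a small candidate grid of bandwidths, and a hypothetical series with time rescaled to [0, 1]; all function names are illustrative.

```python
import math


def kernel(u):
    """Gaussian kernel (an assumed choice of bounded symmetric density)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def local_linear(t, y, x, h, skip=None):
    """(m_hat(x), m_hat'(x)) from the kernel-weighted local linear fit at x.

    `skip` drops one observation, as needed for leave-one-out cross-validation.
    """
    s0 = s1 = s2 = g0 = g1 = 0.0
    for i, (ti, yi) in enumerate(zip(t, y)):
        if i == skip:
            continue
        d = ti - x
        k = kernel(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        g0 += k * yi
        g1 += k * d * yi
    det = s0 * s2 - s1 * s1
    a0 = (s2 * g0 - s1 * g1) / det      # intercept: estimate of m(x)
    a1 = (s0 * g1 - s1 * g0) / det      # slope: estimate of m'(x)
    return a0, a1


def cv_bandwidth(t, y, grid):
    """Bandwidth from `grid` minimising the leave-one-out CV score."""
    def cv(h):
        return sum((y[i] - local_linear(t, y, t[i], h, skip=i)[0]) ** 2
                   for i in range(len(t)))
    return min(grid, key=cv)


def growth_rates(t, y, h):
    """Relative growth rates m_hat'(t_i) / m_hat(t_i) and their arithmetic mean."""
    rates = []
    for x in t:
        m, dm = local_linear(t, y, x, h)
        rates.append(dm / m)
    return rates, sum(rates) / len(rates)


# Hypothetical series of n = 20 annual values, time rescaled to [0, 1].
n = 20
t = [i / (n - 1) for i in range(n)]
y = [100 * math.exp(0.8 * ti) + 3 * math.sin(7 * ti) for ti in t]
h = cv_bandwidth(t, y, [0.05, 0.1, 0.15, 0.2, 0.3])
rates, mean_rate = growth_rates(t, y, h)
# Rates are per unit of rescaled time; with annual spacing, divide by (n - 1) for a per-year rate.
print(h, round(mean_rate, 3), round(mean_rate / (n - 1), 4))
```

Because the local linear fit is exact for straight lines, applying `local_linear` to data lying on y = 2 + 3t returns m̂(0.5) = 3.5 and m̂'(0.5) = 3 for any bandwidth, which is a convenient check of an implementation.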
Suppose that the regression function m underlying the data $(t_i, y_i)$, i = 1, …, n, has a jump point at τ ∈ [h, 1 − h] with jump sizes δ_0 and δ_1 for the function m and its slope m', respectively, defined in the following sense:
$$m(\tau+) - m(\tau) = \delta_0 \quad\text{and}\quad m'(\tau+) - m'(\tau) = \delta_1.$$
For any change point τ, there is some i, 1 ≤ i ≤ n − 1, such that t_i ≤ τ ≤ t_{i+1}. However, the data cannot be used to distinguish possible changes within the interval [t_i, t_{i+1}]. Therefore, we assume that the change points occur only at design points in the interval [h, 1 − h] and that the distance between any two change points is greater than h.
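If there is a jump point at τ ∈ [h, 1 − h] with jump sizes δ_0 and δ_1 for the function m and its slope m', respectively, then the minimization problem
$$\min_{a_0,\,a_1}\ \sum_{i=1}^{n}\left\{y_i - a_0 - a_1(t_i - x)\right\}^2 K\!\left(\frac{t_i - x}{h}\right)$$
becomes
$$\min_{\beta}\ \sum_{i=1}^{n}\left\{y_i - a_0 - a_1(t_i - \tau) - \delta_0\, I(t_i > \tau) - \delta_1 (t_i - \tau)\, I(t_i > \tau)\right\}^2 K\!\left(\frac{t_i - \tau}{h}\right),$$
where I is the indicator function and β = [a_0, a_1, δ_0, δ_1]' is the coefficient vector. To estimate the change point, fit the following weighted least squares regression for every t_k ∈ [h, 1 − h]:
$$\min_{\beta}\ \sum_{i=1}^{n}\left\{y_i - a_0 - a_1(t_i - t_k) - \delta_0\, I(t_i > t_k) - \delta_1 (t_i - t_k)\, I(t_i > t_k)\right\}^2 K\!\left(\frac{t_i - t_k}{h}\right).$$
The solution to this weighted least squares problem is given by
$$\hat\beta(t_k) = \left(X_k' W_k X_k\right)^{-1} X_k' W_k Y,$$
where X_k is the n × 4 matrix whose ith row is [1, t_i − t_k, I(t_i > t_k), (t_i − t_k) I(t_i > t_k)], W_k = diag{K((t_i − t_k)/h)}, and Y' = [y_1 y_2 … y_n]. The regression sum of squares due to the jump sizes [δ_0, δ_1]' is given by
$$SS_R(t_k) = SS_E^{(0)}(t_k) - SS_E(t_k),$$
where $SS_E^{(0)}(t_k)$ is the residual sum of squares of the fit without the two jump terms. The residual sum of squares is given by
$$SS_E(t_k) = \left(Y - X_k\hat\beta(t_k)\right)' W_k \left(Y - X_k\hat\beta(t_k)\right).$$
The ratio of the mean regression sum of squares of [δ_0, δ_1]' to the mean residual sum of squares with t = t_k is given by
$$s(t_k) = \frac{SS_R(t_k)/\nu_1}{SS_E(t_k)/\nu_2},$$
where ν_1 and ν_2 are the corresponding degrees of freedom. The estimate of the jump point is given by
$$\hat\tau = \arg\max_{t_k \in [h,\,1-h]} s(t_k),$$
and the corresponding estimates $\hat\delta_0$ and $\hat\delta_1$ in $\hat\beta(\hat\tau)$ are the estimates of the jump sizes. The above procedure can easily be extended to the case of more than one jump point. Let there be q jump points for the regression function m and/or its derivative at τ_j, j = 1, …, q. Then the estimates of the change points are given by
$$\hat\tau_j = \arg\max_{t \in A_j} s(t),$$
where A_j denotes the set of candidate design points in [h, 1 − h] at the jth step, and the corresponding estimates of the coefficient vector Δ = [$\hat\delta_{0j}$, $\hat\delta_{1j}$]' are the estimates of the jump sizes. If the number of jump points is not known in advance, the above sequential procedure continues for j = 1, …, p (say), where p is fixed such that $\max_{t \in A_p} s(t)$ is greater than or equal to its critical value C_x(p) and $\max_{t \in A_{p+1}} s(t)$ is less than its critical value C_x(p + 1).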
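A single-jump version of this search can be sketched as follows. For each candidate design point t_k in [h, 1 − h], the sketch fits the kernel-weighted regression with and without the two indicator terms, forms the ratio s(t_k), and returns the maximising t_k together with the estimated jump sizes. The Gaussian kernel, the degrees of freedom (2 and n − 4) used in s(t_k), the hypothetical data and all function names are assumptions of this sketch, not the authors' implementation.

```python
import math


def kernel(u):
    """Gaussian kernel (assumed; any bounded symmetric density could be used)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, p):
            f = M[r][c] / M[c][c]
            for j in range(c, p + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * p
    for r in range(p - 1, -1, -1):
        z[r] = (M[r][p] - sum(M[r][j] * z[j] for j in range(r + 1, p))) / M[r][r]
    return z


def wls(rows, y, w):
    """Weighted least squares (X'WX)^-1 X'WY; returns (beta, weighted residual SS)."""
    p = len(rows[0])
    A = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(p)]
         for i in range(p)]
    rhs = [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w)) for i in range(p)]
    beta = solve(A, rhs)
    sse = sum(wi * (yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
              for r, yi, wi in zip(rows, y, w))
    return beta, sse


def jump_statistic(t, y, tk, h):
    """s(t_k) and the jump-size estimates (delta0, delta1) at candidate t_k."""
    w = [kernel((ti - tk) / h) for ti in t]
    rows = [[1.0, ti - tk, float(ti > tk), (ti - tk) * float(ti > tk)] for ti in t]
    beta, sse = wls(rows, y, w)                     # with jump terms
    _, sse0 = wls([r[:2] for r in rows], y, w)      # without jump terms
    # Degrees of freedom 2 and n - 4 are an assumption of this sketch.
    s = ((sse0 - sse) / 2) / (sse / (len(t) - 4))
    return s, beta[2], beta[3]


def detect_jump(t, y, h):
    """Single jump point: the candidate in [h, 1 - h] maximising s(t_k)."""
    candidates = [tk for tk in t if h <= tk <= 1 - h]
    tau = max(candidates, key=lambda tk: jump_statistic(t, y, tk, h)[0])
    return (tau,) + jump_statistic(t, y, tau, h)


# Hypothetical series: linear trend with a jump of 5 after t = 0.5.
n = 30
t = [i / (n - 1) for i in range(n)]
y = [10 + 2 * ti + (5 if ti > 0.5 else 0) + 0.1 * math.sin(37 * ti) for ti in t]
tau, s, d0, d1 = detect_jump(t, y, h=0.15)
print(round(tau, 4), round(d0, 3), round(d1, 3))
```

Extending this to several jump points would repeat the search over the remaining candidates and stop when the maximum of s falls below its critical value C_x(p); those critical values are not computed here.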
Figure 1. Trends in Area based on nonparametric regression
Figure 2. Trends in Production based on nonparametric regression
Figure 3. Trends in Productivity based on nonparametric regression