American Journal of Mathematics and Statistics
p-ISSN: 2162-948X e-ISSN: 2162-8475
2017; 7(4): 169-178
doi:10.5923/j.ajms.20170704.05

Samsad Jahan
Department of Arts and Sciences, Ahsanullah University of Science and Technology, Dhaka, Bangladesh
Correspondence to: Samsad Jahan, Department of Arts and Sciences, Ahsanullah University of Science and Technology, Dhaka, Bangladesh.
Copyright © 2017 Scientific & Academic Publishing. All Rights Reserved.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

This paper is an attempt to observe the extent to which violation of assumptions, i.e. the normality assumption on the error of the multiple linear regression model, affects the power of the analysis of variance test. The error of the model is taken to follow the g-and-k distribution because this distribution has shown considerable ability to fit data and is easy to use in simulation studies. The strength of ANOVA is evaluated by observing the power function of the F-test for different combinations of the g (skewness) and k (kurtosis) parameters. From the simulation results it is observed that the performance of ANOVA is immensely affected in the presence of excess kurtosis and for small samples (say, n < 100). The skewness parameter does not have much effect on the power of the test under non-normal situations. The effect of sample size on the existing test for multiple regression models is also observed in this paper under various non-normal situations.
Keywords: The g-and-k distribution, ANOVA-test, Multiple linear regression model, Power
Cite this paper: Samsad Jahan, ANOVA Procedures for Multiple Linear Regression Model with Non-normal Error Distribution: A Quantile Function Distribution Approach, American Journal of Mathematics and Statistics, Vol. 7 No. 4, 2017, pp. 169-178. doi: 10.5923/j.ajms.20170704.05.
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \qquad (2.1)$$

where $Y_i$ is the dependent variable, $X_1$ and $X_2$ are explanatory variables, $u_i$ is the stochastic disturbance term, and $i$ is the $i$th observation. $\beta_0$ is the intercept term; it gives the mean or average effect on $Y$ of all the variables excluded from the model, although its mechanical interpretation is the average value of $Y$ when $X_1$ and $X_2$ are set equal to zero. The coefficients $\beta_1$ and $\beta_2$ are called partial regression coefficients: $\beta_1$ measures the change in the mean value of $Y$ per unit change in $X_1$, holding the value of $X_2$ constant, and likewise $\beta_2$ measures the change in the mean value of $Y$ per unit change in $X_2$, holding the value of $X_1$ constant.
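As a quick illustration of model (2.1) and of how the partial regression coefficients are read off a fit, the short R sketch below fits such a model to simulated data; the variable names and coefficient values are illustrative assumptions, not taken from the paper.

```r
# Illustrative fit of model (2.1): Y = b0 + b1*X1 + b2*X2 + u
set.seed(1)
n  <- 30
X1 <- runif(n)
X2 <- runif(n)
u  <- rnorm(n)                      # normal errors for this illustration only
Y  <- 1 + 2 * X1 - 1.5 * X2 + u     # hypothetical true coefficients 1, 2, -1.5

fit <- lm(Y ~ X1 + X2)
coef(fit)   # estimated intercept and partial regression coefficients
```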
The g-and-k distribution is defined through its quantile function

$$Q(u \mid A, B, g, k) = A + B\left[1 + c\,\frac{1 - \exp(-g\,z_u)}{1 + \exp(-g\,z_u)}\right]\left(1 + z_u^2\right)^k z_u \qquad (3.1)$$

where $A$ and $B > 0$ are location and scale parameters, $g$ measures skewness, $k$ measures kurtosis (in the general sense of peakedness/tailedness) in the distribution, $z_u$ is the $u$th quantile of a standard normal variate, and $c$ is a constant chosen to help produce proper distributions. It can be clearly observed that for $g = k = 0$, the quantile function in (3.1) is just the quantile function of a standard normal variate. The sign of the skewness parameter indicates the direction of skewness: $g < 0$ indicates that the distribution is skewed to the left, and $g > 0$ indicates skewness to the right. Increasing/decreasing the unsigned value of $g$ increases/decreases the skewness in the indicated direction. When $g = 0$ the distribution is symmetric. The kurtosis parameter $k$ of the g-and-k distribution behaves similarly: increasing $k$ increases the level of kurtosis and vice versa. The value $k = 0$ corresponds to no extra kurtosis added to the standard normal base distribution. However, this distribution can represent less kurtosis than the normal distribution, as $k$ can take negative values. If curves with more kurtosis are required, then a base distribution with less kurtosis than the standardized normal distribution can be used. For these distributions $c$ is the value of overall asymmetry (MacGillivray). For an arbitrary distribution the overall asymmetry can theoretically be as large as one, so it would appear that for $c < 1$ data or distributions could occur with skewness that cannot be matched by these distributions. However, the larger the value chosen for $c$, the more restrictions on $k$ are required to produce a completely proper distribution. Real data seldom produce overall asymmetry values greater than 0.8 (MacGillivray and Cannon). The value of $c$ is taken as 0.83 throughout this paper. To examine the extent of the effect of different levels of non-normality on the test for multiple linear regression models, the random error is considered to belong to the g-and-k distribution.

Figure 1. Density curves of the g-and-k distribution for different combinations of g and k
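A minimal R sketch of the quantile function in (3.1), together with inverse-transform sampling from it, is given below; the function names qgk and rgk are illustrative, and the default c = 0.83 follows the value used in this paper.

```r
# Quantile function (3.1) of the g-and-k distribution
qgk <- function(u, A = 0, B = 1, g = 0, k = 0, c = 0.83) {
  z <- qnorm(u)   # z_u: the u-th quantile of a standard normal variate
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}

# Inverse-transform sampling: pass uniform(0,1) draws through the quantile function
rgk <- function(n, ...) qgk(runif(n), ...)

# Density curves such as those in Figure 1 can be approximated from large samples, e.g.
# plot(density(rgk(1e5, g = 1, k = 0.5)))
```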
For the model (2.1), the null hypothesis

$$H_0: \beta_1 = \beta_2 = 0$$

is tested against the alternative

$$H_1: \text{at least one } \beta_i \neq 0, \quad i = 1, 2.$$

Then the test statistic

$$F = \frac{\mathrm{ESS}/2}{\mathrm{RSS}/(n-3)} \qquad (4.1)$$

where ESS and RSS denote the explained and residual sums of squares, follows the $F$ distribution with 2 and $n - 3$ df. Therefore, the $F$ value of (4.1) provides a test of the null hypothesis that the true slope coefficients are simultaneously zero. The null hypothesis $H_0$ can be rejected if the $F$ value computed from (4.1) exceeds the critical $F$ value from the $F$ table at the $\alpha$ percent level of significance; otherwise $H_0$ cannot be rejected.
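In R, the statistic (4.1) is reported directly by summary() for a fitted lm object; the sketch below, run on illustrative simulated data, extracts it and compares it with the critical value at the 2.5 percent level used later in the paper.

```r
# Overall F test of H0: beta1 = beta2 = 0 in a model with two regressors
set.seed(2)
n  <- 30
X1 <- runif(n); X2 <- runif(n)
Y  <- 1 + 2 * X1 - 1.5 * X2 + rnorm(n)    # illustrative data

fit   <- lm(Y ~ X1 + X2)
fstat <- summary(fit)$fstatistic          # value, numdf (= 2), dendf (= n - 3)
Fcrit <- qf(0.975, df1 = fstat["numdf"], df2 = fstat["dendf"])   # 2.5% critical value
fstat["value"] > Fcrit                    # TRUE: reject H0
```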
Usually the random errors $u_i$ of multiple linear regression models are assumed to be normally distributed, but here in this paper the random error term $u_i$ is assumed to follow the g-and-k distribution. The extent of the effect of non-normality on the size and power of the ANOVA test is observed by varying the skewness and kurtosis parameters of the g-and-k distribution. Using the g-and-k distribution allows us to quantify how much the data depart from normality in terms of the values chosen for the $g$ (skewness) and $k$ (kurtosis) parameters. For $g = k = 0$, the quantile function of the g-and-k distribution is just the quantile function of a normal variate. To observe the power of the tests, an expression for the power curve is required. In practice, however, obtaining analytic expressions for these power functions is impractical. Instead, a simulation is conducted to estimate these power functions for various combinations of the $g$ and $k$ parameter values of the g-and-k error distribution. While simulating for the test, $A$ is taken to be the location parameter, which is the median in the case of the g-and-k distribution; for non-normal situations, however, the mean of the distribution moves away from $A$, which actually is the median of the distribution. This departure varies as the values of $g$ and $k$ vary; the values of $g$ and $k$ are taken up to 1. At first, the effect of non-normality on the size of the F test is observed. For simulating the size of the F test, the explanatory variables $X_1$ and $X_2$ are generated from the uniform distribution and the random error $u$ from the g-and-k distribution with location and scale parameters $A = 0$ and $B = 1$, respectively. Using the statistical software R, data are generated for sample sizes 20, 30 and 100, and the following hypothesis is tested:
$$H_0: \beta_1 = \beta_2 = 0$$

against the alternative

$$H_1: \text{at least one } \beta_i \neq 0, \quad i = 1, 2.$$

To determine the size of the test, data are generated under the null hypothesis and the test is repeated 5,000 times. The total number of times the hypothesis is rejected is divided by 5,000; tests are carried out at the 2.5 percent level of significance. To compute the power of the F test, the explanatory variables $X_1$ and $X_2$ are generated from the uniform distribution and the random error $u$ from the g-and-k distribution with location parameter $A = 0$ and scale parameter $B = 1$. The value of $c$ is considered as 0.83. To simulate power, the following hypothesis is tested:

$$H_0: \beta_1 = \beta_2 = 0$$

against the alternative

$$H_1: \text{at least one } \beta_i \neq 0, \quad i = 1, 2.$$

Data are generated using $g \in \{-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2\}$ and $k \in \{-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2\}$, and the test procedure is repeated 5,000 times for each pair $(g, k) \in \{(-2, -2), (-2, -1.5), \ldots, (2, 1.5), (2, 2)\}$. First, the number of rejections of the test out of the 5,000 repetitions is determined for each pair $(g, k)$ in the mentioned set, and the total number of rejections is divided by 5,000, with the level of significance taken as 2.5 percent.
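A condensed R sketch of this simulation is given below, under the settings stated above (A = 0, B = 1, c = 0.83, 5,000 replications, 2.5 percent level). The helper names qgk and reject_rate are illustrative, and the non-zero slope values used to generate data under the alternative are an assumption, since the paper does not report the exact coefficient values it used.

```r
# Quantile function of the g-and-k distribution, equation (3.1)
qgk <- function(u, A = 0, B = 1, g = 0, k = 0, c = 0.83) {
  z <- qnorm(u)
  A + B * (1 + c * (1 - exp(-g * z)) / (1 + exp(-g * z))) * (1 + z^2)^k * z
}

# Proportion of rejections of H0: beta1 = beta2 = 0 at level alpha.
# With b1 = b2 = 0 this estimates the size of the test; otherwise it estimates power.
reject_rate <- function(n, g, k, b1 = 0, b2 = 0, reps = 5000, alpha = 0.025) {
  rej <- replicate(reps, {
    X1 <- runif(n); X2 <- runif(n)
    u  <- qgk(runif(n), g = g, k = k)        # g-and-k errors with A = 0, B = 1
    Y  <- b1 * X1 + b2 * X2 + u
    fs <- summary(lm(Y ~ X1 + X2))$fstatistic
    fs["value"] > qf(1 - alpha, fs["numdf"], fs["dendf"])
  })
  mean(rej)
}

# Grid of (g, k) pairs used for the power study, here for n = 20
grid <- expand.grid(g = seq(-2, 2, by = 0.5), k = seq(-2, 2, by = 0.5))
# power <- mapply(function(g, k) reject_rate(20, g, k, b1 = 1, b2 = 1), grid$g, grid$k)
```

Setting b1 = b2 = 0 and running reject_rate over the same grid gives empirical sizes of the kind reported in Table 2.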
The explanatory variables $X_1$ and $X_2$ are generated from the uniform distribution and the random error $u$ from the g-and-k distribution with location and scale parameters $A = 0$ and $B = 1$, respectively. Data are generated for sample sizes 20, 30 and 100 using the statistical software R, and the following hypothesis is tested:

$$H_0: \beta_1 = \beta_2 = 0$$

against the alternative

$$H_1: \text{at least one } \beta_i \neq 0, \quad i = 1, 2.$$

To determine the size of the test, data are generated under the null hypothesis, the test is repeated 5,000 times, and the total number of times the hypothesis is rejected is divided by 5,000; tests are carried out at the 2.5 percent level of significance. The sizes of the ANOVA test for different combinations of $(g, k)$ are presented in Table 2.
Table 2. Size of the ANOVA test for different combinations of (g, k)
The explanatory variables $X_1$ and $X_2$ are generated from the uniform distribution and the random error $u$ is taken from the g-and-k distribution with location parameter $A = 0$ and scale parameter $B = 1$. The value of $c$ is taken to be 0.83 throughout the paper. To simulate power, the following hypothesis is considered:

$$H_0: \beta_1 = \beta_2 = 0$$

against the alternative

$$H_1: \text{at least one } \beta_i \neq 0, \quad i = 1, 2.$$

To see how the power differs as the values of $g$ and $k$ change, the power for specified values of $g$ and $k$ is plotted to obtain the power curve of the ANOVA test for sample sizes $n$ = 20, 30 and 100. To get a smooth power curve, many points for different combinations of $g$ and $k$ are used; the power is obtained for each combination, and 5,000 simulations are run for each point. Figures 2 through 7 show the power curves for different combinations of $(g, k)$ for sample sizes $n$ = 20, 30 and 100.

Figure 2. Power curve of ANOVA for fixed value of g and varying kurtosis parameter, for (a) sample size n = 20, (b) sample size n = 30, (c) sample size n = 100
Figure 3. Power curve of ANOVA for fixed value of kurtosis and varying skewness parameter, for (a) sample size n = 20, (b) sample size n = 30, (c) sample size n = 100

Figure 4. Power curves of ANOVA for (a) g = 0, k = 0, (b) g = 1, k = 0, (c) g = 0, k = 1, (d) g = 1, k = 1, for sample size 20

Figure 5. Power curves of ANOVA for (a) g = 0, k = 0, (b) g = 1, k = 0, (c) g = 0, k = 1, (d) g = 1, k = 1, for sample size 30

Figure 6. Power curves of ANOVA for (a) g = 0, k = 0, (b) g = 0, k = -0.3, (c) g = 0, k = -0.5, for sample size 20

Figure 7. Power curves of ANOVA for (a) g = 0, k = 0, (b) g = 0, k = -0.3, (c) g = 0, k = -0.5, for sample size 30
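Power curves like those in Figures 2 and 3 can be traced with a short sketch along the following lines; it reuses the illustrative reject_rate helper sketched after the simulation description above and uses far fewer replications than the paper so that it runs quickly.

```r
# Tracing a power curve for fixed g = 0 and varying k (cf. Figure 2), n = 20.
# Assumes qgk() and reject_rate() as sketched earlier; b1 = b2 = 1 is illustrative.
ks    <- seq(-0.5, 1, by = 0.25)
power <- sapply(ks, function(k) reject_rate(20, g = 0, k = k, b1 = 1, b2 = 1, reps = 500))
plot(ks, power, type = "b", xlab = "k", ylab = "Estimated power",
     main = "Power of the ANOVA F test, n = 20, g = 0 (illustrative)")
```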
The values of g and k showed the data to suffer slightly from asymmetry and light-tailedness.