Hypothesis Testing on the Location Parameter of a Skew-Normal Distribution (SND) with Application

This work deals with testing a hypothesis on the location parameter (µ) of a skew-normal distribution (SND) based on a random sample of size n. The work has four major components: (a) first, we review some useful results on the SND, including the approximate probability distribution of the sample average; (b) next, we develop several tests for a hypothesis on µ based on the sample mean when the scale (σ) and shape (λ) parameters are known; (c) the tests for known scale and shape are then extended to unknown scale and shape; (d) finally, the test methods are applied to a real-life data set.


1 Introduction

1.1 Preliminaries
A random variable W is said to have a skew-normal distribution with location parameter µ, scale parameter σ, and shape parameter λ, henceforth denoted by 'W ∼ SND(µ, σ, λ)', provided its probability density function (pdf) is given as

f(w; µ, σ, λ) = (2/σ) φ((w − µ)/σ) Φ(λ(w − µ)/σ), −∞ < w < ∞,

where φ and Φ are the standard normal pdf and cdf respectively, for any −∞ < µ < ∞, −∞ < λ < ∞, and σ > 0. The objective of this work is to develop test procedures for testing H0 : µ = µ0 vs. HA : µ ≠ µ0. The two-sided alternative has been used only for convenience, since a one-sided test can be developed easily with a suitable one-sided critical value.
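As a quick numerical check of the density above, note that it coincides with scipy's `skewnorm`, whose shape parameter plays the role of λ (a sketch; the helper name `snd_pdf` is ours):

```python
import numpy as np
from scipy.stats import norm

def snd_pdf(w, mu, sigma, lam):
    """Density of SND(mu, sigma, lam):
    f(w) = (2/sigma) * phi((w - mu)/sigma) * Phi(lam * (w - mu)/sigma),
    where phi and Phi are the standard normal pdf and cdf."""
    z = (w - mu) / sigma
    return (2.0 / sigma) * norm.pdf(z) * norm.cdf(lam * z)
```

Setting λ = 0 recovers the N(µ, σ²) density, consistent with the remarks that follow.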
The structure of the SND was introduced in the literature by several researchers independently (see Roberts (1966), O'Hagan and Leonard (1976), Aigner and Lovell (1977)), but it was given its present name by Azzalini (1985), who studied many characterization properties and applications of this distribution. For a nice review of many interesting properties of the SND see Arnold and Lin (2004), Gupta, Nguyen and Sanqui (2004), and the references therein. For applications of the SND in reliability studies see Gupta and Brown (2001), which also used the SND to model IQ data taken from Roberts (1988). Figueiredo and Gomes (2013) used the SND for modeling and monitoring in statistical process control. Ngunkeng and Ning (2014) considered an information approach to detect changes in all three parameters of an SND, and then applied the proposed method to study Brazilian and Chilean stock market data (originally analyzed by Arellano-Valle et al. (2013)). For a multivariate generalization of the SND see Azzalini and Dalla Valle (1996), and Azzalini and Capitanio (1999).
The above property (vi) is important as it is often used to generate SND observations in simulation studies. Property (ix) is useful in method of moments estimation (MME). For more properties and further generalizations, see Azzalini (2005), Henze (1986), and the references therein.
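The generation device alluded to in property (vi) is presumably the standard stochastic representation of the SND: with δ = λ/√(1 + λ²) and independent standard normals U0, U1, the variable Z = δ|U0| + √(1 − δ²) U1 follows SND(0, 1, λ). A sketch (the helper name `rsnd` is ours):

```python
import numpy as np

def rsnd(n, mu, sigma, lam, rng=None):
    """Generate n SND(mu, sigma, lam) variates via the stochastic
    representation: Z = delta*|U0| + sqrt(1 - delta^2)*U1 ~ SND(0, 1, lam)
    with delta = lam / sqrt(1 + lam^2), then W = mu + sigma * Z."""
    rng = np.random.default_rng(rng)
    delta = lam / np.sqrt(1.0 + lam**2)
    u0 = rng.standard_normal(n)
    u1 = rng.standard_normal(n)
    z = delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1
    return mu + sigma * z
```

The sample mean of such draws should approach E[W] = µ + σδ√(2/π), the moment used repeatedly below.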

Approximate Distribution of an SND Sample Average
In a recent study, Thiuthad and Pal (2018) have shown that the exact distribution of the average W̄ of a random sample from the SND, though extremely complicated, can be approximated fairly well by a suitable SND as described below. Specifically, W̄ ·∼ SND(µ̃, σ̃, λ̃), where (µ̃, σ̃, λ̃) are functions of (µ, σ, λ) and n, which can be found (with the details omitted here) as follows.
(A) First, obtain λ̃; (B) then σ̃; and (C) finally µ̃ (the explicit expressions are omitted here). Remark 1.1 Note that (µ̃, σ̃, λ̃) are functions of (µ, σ, λ, n). Further, (σ̃, λ̃) depends only on (σ, λ, n). Interestingly, it is easy to see that when λ = 0, then λ̃ = 0 and σ̃ = σ/√n as well as µ̃ = µ, which perfectly matches the result for a normal distribution. It has been shown through extensive simulation that the proposed approximation (of W̄ by SND(µ̃, σ̃, λ̃)) is very close even for small n.
The rest of the paper is organized as follows. Section 2 gives a brief summary of two types of point estimators, as they will be essential in constructing the test statistics. Section 3 deals with various tests on µ with known (σ, λ), whereas Section 4 provides details in the most general case, that is, with unknown (σ, λ). Finally, a real-life data set is used in Section 5 as a demonstration.

A Brief Review of Point Estimation of µ
Thiuthad and Pal (2018) have provided the details of point estimation of µ under two separate cases as follows.

σ and λ are known
Here, the method of moments estimator (MME) of µ, say µ̂MM, is given as

µ̂MM = W̄ − σδ√(2/π), where δ = λ/√(1 + λ²). (2.2)

In this case, µ̂MM is unbiased for µ with Var(µ̂MM) = (σ²/n)(1 − 2δ²/π), and by the CLT, µ̂MM is asymptotically normally distributed.
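Since E[W] = µ + σδ√(2/π), the estimator (2.2) is just a bias-corrected sample mean. A minimal sketch (the function name is ours):

```python
import numpy as np

def mme_mu_known(w, sigma, lam):
    """MME of mu when sigma and lam are known.  Since
    E[W] = mu + sigma*delta*sqrt(2/pi) with delta = lam/sqrt(1 + lam^2),
    the estimator mu_hat = W_bar - sigma*delta*sqrt(2/pi) is unbiased,
    with Var(mu_hat) = (sigma^2/n) * (1 - 2*delta^2/pi)."""
    delta = lam / np.sqrt(1.0 + lam**2)
    return np.mean(w) - sigma * delta * np.sqrt(2.0 / np.pi)
```

With λ = 0 the correction vanishes and the estimator reduces to the ordinary sample mean, matching the normal case.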

σ and λ are unknown
The MME of µ depends on those of λ and σ; therefore, the following steps are followed in sequential order. Let m1, m2, m3 be the first three sample moments. First obtain λ̂ by solving (provided a solution exists) the sample-skewness equation (2.3); if a solution in terms of λ² does not exist, then λ² is deemed to have taken the value ∞. Thus, the MME of λ, i.e., λ̂MM, is defined as the solution of (2.3) when it exists, and as sign(m3) × ∞ otherwise. (2.4) For practical purposes, in (2.4) replace '∞' by a sufficiently large value, say M, which makes the estimated distribution almost half-normal. After estimating λ as above, obtain σ̂MM from (2.5), and then µ̂MM from (2.6). Note that even if λ̂MM takes an infinite value with positive probability, this does not prevent the MMEs of σ and µ from taking finite values with probability 1. When all parameters are unknown, the MLE of µ may not be a viable option, as it may result in an infinite value with positive probability. Hence, a maximum penalized likelihood estimate (MPLE), which maximizes a penalized version of the likelihood, looks more reasonable. Thiuthad and Pal (2018) provided results of an extensive simulation study comparing µ̂MM and µ̂MPL. Further, as noted by these authors, the Fisher information matrix I (= nI0, where I0 is the 3 × 3 information matrix per observation) has at least three singularity points, rendering I⁻¹ useless as a benchmark dispersion matrix for the estimators of the parameter vector (µ, σ, λ).
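The sequential moment scheme can be sketched as follows. This is a sketch, not the paper's exact equations (2.3)-(2.6): it inverts the sample skewness for δ (the SND skewness is bounded by about 0.9953, which is where the '∞' case arises), then recovers σ and µ from the second and first moments. The cap M and the helper names are our own choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import skew

def mme_all(w, M=50.0):
    """Sequential method-of-moments estimates (mu, sigma, lam) for an SND
    sample: invert the sample skewness for delta, then recover sigma and
    mu.  If |sample skewness| exceeds the SND maximum (~0.9953), lam is
    capped at sign(m3)*M, mimicking the 'large value M' convention."""
    m1 = np.mean(w)
    m2 = np.var(w)            # second central sample moment
    b1 = skew(w)              # sample skewness m3 / m2^(3/2)

    def gamma1(delta):        # population skewness of SND as a function of delta
        t = delta * np.sqrt(2.0 / np.pi)
        return 0.5 * (4.0 - np.pi) * t**3 / (1.0 - t**2) ** 1.5

    gmax = gamma1(1.0 - 1e-12)        # attainable maximum, ~0.9953
    if abs(b1) >= gmax:
        lam = np.sign(b1) * M         # skewness not attainable: cap lam
        delta = lam / np.sqrt(1.0 + lam**2)
    else:
        s = np.sign(b1) if b1 != 0 else 1.0
        delta = s * brentq(lambda d: gamma1(d) - abs(b1), 0.0, 1.0 - 1e-12)
        lam = delta / np.sqrt(1.0 - delta**2)
    sigma = np.sqrt(m2 / (1.0 - 2.0 * delta**2 / np.pi))
    mu = m1 - sigma * delta * np.sqrt(2.0 / np.pi)
    return mu, sigma, lam
```

Even when λ̂ hits the cap, σ̂ and µ̂ remain finite, in line with the remark above.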
3 Tests on the Location of an SND with Known (σ, λ)

The Case of a Single Observation
To develop the idea of testing on µ with known (σ, λ), we first consider the case of a single observation W ∼ SND(µ, σ, λ). (The method developed here can easily be carried over to a test based on W̄ ·∼ SND(µ̃, σ̃, λ̃).) As mentioned earlier, we test H0 : µ = µ0 against a suitable alternative which, for convenience, has been taken as the two-sided one, i.e., HA : µ ≠ µ0. To develop the idea further, we first consider a simple alternative HA : µ = µA, where µA ≠ µ0.
The likelihood ratio test (LRT) statistic for the above simple null against the simple alternative is given in (3.1); it is more convenient to work with the natural logarithm of the LRT statistic, say Λ*, given in (3.2). Write g0(W) = ln Φ(λ(W − µ0)/σ), which has a Taylor series expansion (3.3) obtained by retaining terms up to the second order; similarly, the corresponding function under the alternative has the expansion (3.4). Using (3.3) and (3.4), we get a second-order approximation of Λ*, say Λ(2)*(W). The resulting test retains H0 if (W − µ0) lies between the cut-off points k2 and k1, where G denotes the cdf of SND(0, σ, λ): k2 = G⁻¹(α/2) and k1 = G⁻¹(1 − α/2). Further, the cut-off points can be expressed as ki = σki*, i = 1, 2, where the ki* (i = 1, 2) are found from the equations in (3.9). Table 3.1 tabulates the values of ki* (i = 1, 2) for various values of α. Note that when λ = 0, i.e., SND(0, σ, 0), the cut-off points are exactly those of N(0, σ²). Also note that for a one-sided alternative one should use only one of k1* and k2*, with (α/2) replaced by α in one of the equations in (3.9). Remark 3.1 The above Table 3.1 lists the two-sided cut-off points only for λ > 0.
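The cut-off points ki* are the α/2 and 1 − α/2 quantiles of SND(0, 1, λ), so entries like those of Table 3.1 can be computed numerically; a sketch using scipy's `skewnorm` (the function name `snd_cutoffs` is ours):

```python
from scipy.stats import skewnorm

def snd_cutoffs(lam, alpha=0.05):
    """Two-sided cut-off points (k2*, k1*) for SND(0, 1, lam), i.e. the
    alpha/2 and 1-alpha/2 quantiles of the null distribution of W - mu0.
    Scale by sigma to get k_i = sigma * k_i*."""
    g = skewnorm(lam)
    return g.ppf(alpha / 2.0), g.ppf(1.0 - alpha / 2.0)
```

For λ = 0 this returns the familiar N(0, 1) points ∓1.96 at α = 0.05, consistent with the remark above.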

The Case of a Sample of Size n (≥ 1)
We first look at the two point estimators, i.e., the MME and the MLE, given earlier as µ̂MM and µ̂ML respectively. The exact variance of µ̂MM is Var(µ̂MM) = (σ²/n)(1 − 2δ²/π), but the exact variance expression of µ̂ML is not known. Hence we can use the asymptotic variance (AV), which is quite close to the exact one (see Thiuthad and Pal (2018)).

Table 3.1: The two-sided cut-off points (k2*, k1*) satisfying (3.9) (i.e., each tail probability = α/2) for various values of λ > 0.
The rest of the methodology is the same as in the above case (Subsection 3.1) of a single observation: retain H0 if W̃ = (W̄ − µ0) lies between k̃2 and k̃1, and reject H0 otherwise, where k̃i = σ̃k̃i*, i = 1, 2, with the k̃i* satisfying (3.9) with λ replaced by λ̃. Tables 3.2-3.7 provide the simulated power (or size) of the above three tests, based on 2.5 × 10⁴ replications, as µ deviates from µ0 = 0 (w.l.o.g.) for various values of λ. All the tables have been constructed using α = 0.05; similar tables for other values of α are provided in Appendix A.1.

Table 3.2: Power (or size, when µ = µ0 = 0) of three tests for various µ with fixed λ = 0.0

Table 3.3: Power (or size, when µ = µ0 = 0) of three tests for various µ with fixed λ = 1.0

Remark 3.2 The first three tables (Tables 3.2-3.4) provide the power/size as a function of µ for fixed λ, whereas the last three tables (Tables 3.5-3.7) provide the same as a function of λ for fixed µ. Therefore, the entire Table 3.5 provides size, and Tables 3.6-3.7 provide pure power of the three proposed tests. On the other hand, the first row of Tables 3.2-3.4 provides the size, and the remaining rows of these tables show power. All three tests adhere to the level condition very closely, even for a sample size as small as 5. The patterns emerging from these tables are as follows.
(b) For λ close to 0, the test T 2 seems to have the best power performance.But note that T 2 becomes very liberal (with size exceeding α substantially) as λ gets large (≥ 3).
(c) The test T3 has good local power for small λ (≤ 2), but its performance deteriorates as λ gets larger. This is not surprising because W̄ is not a sufficient statistic for µ, and as shown in Thiuthad and Pal (2018), the amount of information about µ carried by W̄ diminishes rapidly as λ deviates from 0.
(d) The methods of Subsection 3.2 can be followed easily even for unknown (σ, λ), provided one has a large sample which can give reliable estimates of σ and λ. In such a case, follow the methods of this section by replacing (σ, λ) with (σ̂, λ̂) for large n, and then pretend that the nuisance parameters were known.
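As an illustration of how such power/size entries are produced by Monte Carlo, here is a sketch for the single-observation test of Subsection 3.1 with known (σ, λ); the function name and defaults are ours, and the paper's three tests differ in the statistics they use.

```python
import numpy as np
from scipy.stats import skewnorm

def power_single_obs(mu, mu0, sigma, lam, alpha=0.05, reps=25000, seed=0):
    """Monte-Carlo power (size when mu == mu0) of the single-observation
    test: reject H0 iff W - mu0 falls outside (sigma*k2*, sigma*k1*),
    with k_i* the alpha/2 tail quantiles of SND(0, 1, lam)."""
    rng = np.random.default_rng(seed)
    k2, k1 = skewnorm(lam).ppf([alpha / 2.0, 1.0 - alpha / 2.0])
    w = skewnorm.rvs(lam, loc=mu, scale=sigma, size=reps, random_state=rng)
    d = w - mu0
    return np.mean((d < sigma * k2) | (d > sigma * k1))
```

Under H0 (µ = µ0) the rejection rate should sit near α, mirroring the level adherence reported in the tables.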

Hypothesis Testing on µ with unknown (σ, λ)
In order to develop tests on µ for unknown σ and λ, we can think of replacing the nuisance parameters by their suitable estimates (MME and/or MPLE), and call the resultant tests (i.e., the variants of T1, T2, T3 developed in Subsection 3.2) T̃1, T̃2, T̃3 respectively. But the main difficulty we face here is that the asymptotic distributions of µ̂MM and µ̂MPL are far from clear, and the normal cut-off points, as used in Section 3, produce bad results.
This motivates a parametric bootstrap (PB) approach. Let µ̂ be an estimator based on the given data (W1, W2, ..., Wn), where µ̂ can be either µ̂MM (to create T*1) or µ̂MPL (to create T*2). Our objective is to test H0 : µ = µ0 versus HA : µ ≠ µ0. The general steps of the PB method are as follows: Step-1: For a given set of n iid observations, obtain µ̂ and compute the deviant metric D = (µ̂ − µ0), which measures the deviation of the point estimator µ̂ from the null value of µ.
Step-2: Assume that H0 is true, i.e., µ = µ0. Here, we obtain the estimates of σ and λ under H0, called the restricted estimates (REs). For this step, we first derive the restricted method of moments (RMM) and the restricted maximum penalized likelihood (RMPL) estimators of (σ, λ). Though not all of the resulting tests may show good performance, the idea behind them seems reasonable and hence worth trying.

It is tempting to modify the third test T3 in (3.14) as follows: we propose T*3 with the deviant metric D = (W̄ − µ0) = W̃, whose cut-off points are to be found through the parametric bootstrap method. The steps to implement T*3 are identical to those of T*1 and T*2 as mentioned above. However, in Step-3, the iid bootstrap observations Wi* are to be generated from SND(µ0, σ̂RE, λ̂RE), where σ̂RE and λ̂RE are the restricted estimates (REs) of σ and λ under H0. This gives rise to two subversions of T*3, called here T*3(RMM) and T*3(RMPL), depending on the type of RE used: if we use the method of moments estimates of σ and λ under the restriction of H0 then the test is called T*3(RMM), and similarly for T*3(RMPL).

We propose two other tests based on the likelihood ratio (LR) statistic as detailed below. In order to test H0 : µ = µ0 vs. HA : µ ≠ µ0, consider the likelihood ratio statistic Λ, the ratio of the maximized likelihood under H0 to the unrestricted maximized likelihood. Note that Λ is a likelihood-based deviant metric which measures the compatibility of H0 with the data. Asymptotically, i.e., as n → ∞, under H0, Λ* = −2 ln Λ follows a χ² distribution with 1 degree of freedom (Wilks' theorem). So we use Λ* as our deviant measure. This leads us to the fourth test T4: reject H0 at level α if Λ* exceeds the upper-α quantile of χ²(1); this is the asymptotic LRT. T4 is expected to work well for a large sample size.
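The decision rule of the asymptotic LRT can be stated compactly; a sketch, assuming Λ* = −2 ln Λ has already been computed from the two maximized likelihoods:

```python
from scipy.stats import chi2

def asymptotic_lrt_reject(lam_star, alpha=0.05):
    """Decision rule of the asymptotic LRT T4: reject H0 when the
    statistic Lambda* = -2 ln(Lambda) exceeds the upper-alpha quantile
    of the chi-square distribution with 1 degree of freedom (Wilks)."""
    return lam_star > chi2.ppf(1.0 - alpha, df=1)
```

At α = 0.05 the cut-off is about 3.84, so the rule is simply Λ* > 3.84.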
For small samples, we propose a parametric bootstrap version of the fourth test, the PB LRT, say T*4, where the cut-off point is found by simulation. The method consists of the following steps: Step-1: Given the original data W1, W2, ..., Wn, compute Λ*.
Step-2: Obtain the restricted estimates σ̂RE and λ̂RE of (σ, λ) under H0. Step-3: Generate n iid observations under H0, i.e., from SND(µ0, σ̂RE, λ̂RE), and compute the corresponding value Λ** of the statistic. Step-4: Repeat the above Step-3 a large number of times, say N times. This gives us N copies of Λ**, i.e., Λ*1*, Λ*2*, ..., Λ*N*. Order them as Λ*(1)* ≤ ... ≤ Λ*(N)*, and take the cut-off point Λ*U = Λ*((1−α)N)* so that α ≈ P(Reject H0 | H0 is true). If Λ* > Λ*U, then we reject H0. Alternatively, the p-value can be computed as the proportion of bootstrap copies exceeding Λ*.

ITM Web of Conferences 20, 03003 (2018) https://doi.org/10.1051/itmconf/20182003003 ICM 2018

Now, to study the power/size of a test, we have to generate the original data (W1, W2, ..., Wn) multiple times, say Q times; the power/size is then the proportion (out of Q) of times H0 is rejected. This simulation is quite time consuming, since all the PB tests involve double loops, and hence we had to restrict the number of replications to N = Q = 1000. The results of our simulation study for n = 10 are provided in Tables 4.1-4.2. Some of the proposed tests perform poorly, and hence should not be considered reliable test methods. In terms of size, T*1 appears to be very conservative. The tests T4 and T*4 appear to be a bit liberal when λ gets large (i.e., when the SND becomes very skewed). On the other hand, the test T*2 seems to be more reliable and appears to maintain the level consistently, though a bit conservatively. Therefore, we recommend using T*2 for hypothesis testing on µ. However, a more comprehensive simulation with higher values of N and Q will be needed, along with other values of n.

An Application with a Real-life Data Set
In this section our objective is to apply the tests developed in Section 4 to a realistic data set. For this purpose we use Krung Thai Bank (KTB)'s quarterly stock price data set, taken from Thiuthad and Pal (2018), where the quarterly price change (QPC) in % has been observed over 16 quarters. As mentioned earlier, T*2 seems to be the most reliable test, and for the given data its p-value is 0.7732, which suggests that we retain H0.

Step-4: Repeat the above Step-3 a large number of times, say N times. This gives us N copies of D*, i.e., D*1, D*2, ..., D*N. Order them as D*(1) ≤ ... ≤ D*(N). The cut-off points for D are then found from the (approximate) probability distribution of D under H0, i.e., the left cut-off point DL = D*((α/2)N) and the right cut-off point DU = D*((1−α/2)N), so that α ≈ P(Reject H0 | H0 is true). If D ∈ (DL, DU), then we retain H0; if D < DL or D > DU, then we reject H0. Alternatively, we can compute the p-value as pPB = 2 min{pL, pU}, where pL = {number of D*j < D}/N and pU = {number of D*j > D}/N. Apart from T*1 and T*2 (based on the MME- and MPLE-based deviant metrics, respectively), we propose a few other tests as follows.
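The PB recipe above can be sketched end to end. This sketch uses D = W̄ − µ0 (the T*3 flavor, which avoids the MPLE machinery) and a simple restricted moment scheme for (σ, λ) under H0, which is our own construction and may differ from the paper's RMM: with t = δ√(2/π), E[W − µ0] = σt and Var[W] = σ²(1 − t²), so σ² = (mean deviation)² + variance.

```python
import numpy as np

def rsnd(n, mu, sigma, lam, rng):
    # SND variates via the standard stochastic representation
    delta = lam / np.sqrt(1.0 + lam**2)
    u0, u1 = rng.standard_normal(n), rng.standard_normal(n)
    return mu + sigma * (delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1)

def restricted_mme(w, mu0, cap=50.0):
    """Restricted moment estimates of (sigma, lam) under H0: mu = mu0.
    With t = delta*sqrt(2/pi): E[W - mu0] = sigma*t and
    Var[W] = sigma^2*(1 - t^2), hence sigma^2 = a^2 + b below."""
    a, b = np.mean(w - mu0), np.var(w)
    sigma = np.sqrt(a * a + b)
    delta = np.clip((a / sigma) * np.sqrt(np.pi / 2.0), -0.999999, 0.999999)
    lam = delta / np.sqrt(1.0 - delta**2)
    return sigma, float(np.clip(lam, -cap, cap))

def pb_test(w, mu0, alpha=0.05, N=1000, seed=0):
    """Parametric-bootstrap test of H0: mu = mu0 with deviant metric
    D = W_bar - mu0; cut-offs come from N bootstrap samples drawn from
    SND(mu0, sigma_RE, lam_RE).  Returns (reject?, p-value)."""
    rng = np.random.default_rng(seed)
    n = len(w)
    d = np.mean(w) - mu0
    sigma_re, lam_re = restricted_mme(w, mu0)
    d_star = np.array([np.mean(rsnd(n, mu0, sigma_re, lam_re, rng)) - mu0
                       for _ in range(N)])
    p = 2.0 * min(np.mean(d_star < d), np.mean(d_star > d))
    return p < alpha, p
```

Data generated far from µ0 should be rejected; data generated under H0 should usually not be.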

Table 3.6: Power of three tests for various λ with fixed µ = 1.0

Table 5.1: Quarterly stock price change of KTB from the Thai stock market (Thiuthad and Pal (2018))

Note that µ indicates the location of the distribution, but not the mean of the distribution. Also, the SND was found to give a good fit to this KTB data. The objective is to test H0 : µ = 0 vs. HA : µ ≠ 0. The following are the p-values of all six tests discussed in the previous section.

Table A.4: Size of three tests for various λ with fixed µ = 0.0

Table A.10: Size of three tests for various λ with fixed µ = 0.0

Table A.11: Power of three tests for various λ with fixed µ = 1.0

Table A.12: Power of three tests for various λ with fixed µ = 2.0