This post introduces several methods for measuring the statistical similarity between two groups, A and B.


1. Goodness-of-fit

 

   Goodness-of-fit tests are widely used to check whether a sample follows a given distribution, or whether two samples have the same (or a similar) distribution. The goodness-of-fit of a statistical model describes how well it fits a set of observations. Measures of goodness-of-fit summarize the discrepancy between observed values and the values expected under the model in question. A goodness-of-fit test evaluates the following hypotheses:


   H_0: the model M_0 fits

   H_A: the model M_0 does not fit (or, some other model M_A fits)


We call H_0 the null hypothesis and H_A the alternative hypothesis.
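As a concrete illustration of "the discrepancy between observed values and the values expected under the model", Pearson's chi-squared statistic (one of the normality tests listed below) sums (O - E)^2 / E over all categories. The sketch below assumes the data are already grouped into category counts; the function name and the counts are hypothetical. Larger values mean a worse fit, and under H_0 the statistic approximately follows a chi-squared distribution with k - 1 degrees of freedom when there are k categories and no parameters are estimated from the data.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in three categories, compared against a model
# that expects 20 observations in each category.
observed = [10, 20, 30]
expected = [20, 20, 20]
print(chi_squared_statistic(observed, expected))  # 10.0
```

Here 10.0 exceeds the 5% critical value of the chi-squared distribution with 2 degrees of freedom (about 5.99), so H_0 would be rejected at that level.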


  In assessing goodness-of-fit, there are two cases:


1. when the distribution of sample B is known, and

2. when the distribution of sample B is unknown. 
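For the second case, where the distribution of B is unknown, one common option is to compare the two samples directly. The following is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between the two empirical CDFs. It computes only the statistic D (no p-value), and the function name is our own.

```python
from bisect import bisect_right


def ks_two_sample_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so checking every observed point is enough to find the maximum.
    for x in a + b:
        fa = bisect_right(a, x) / n  # fraction of a that is <= x
        fb = bisect_right(b, x) / m  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

D ranges from 0 (identical empirical CDFs) to 1 (completely separated samples).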


First, let's assume that group B has a known distribution: the Gaussian (normal) distribution.


1-1. Normality Test


 In statistics, normality tests are used to determine whether a data set is well modeled by a normal distribution, and to compute how likely it is that a random variable underlying the data set is normally distributed. The following tests are available for this purpose:


- D'Agostino's K-squared test

- Jarque-Bera test

- Anderson-Darling test

- Cramér-von Mises criterion

- Lilliefors test

- Kolmogorov-Smirnov test

- Shapiro-Wilk test

- Pearson's chi-squared test

  

In [Razali et al. 2011: "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests"], which compares the Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors, and Anderson-Darling tests, Shapiro-Wilk has the best power for a given significance level, followed closely by Anderson-Darling. We will therefore focus on these four tests.
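For the first case (B is a fully specified normal distribution), the one-sample Kolmogorov-Smirnov statistic measures the largest gap between the sample's empirical CDF and the normal CDF. Below is a minimal sketch using only Python's standard library (`statistics.NormalDist`, Python 3.8+); it returns the statistic D only, without a p-value, and the function name is our own. If mu and sigma are estimated from the same sample, the standard KS critical values no longer apply and Lilliefors's corrected critical values should be used instead; that variant is the Lilliefors test.

```python
from statistics import NormalDist


def ks_normal_statistic(sample, mu=0.0, sigma=1.0):
    """One-sample Kolmogorov-Smirnov statistic D against a fully specified
    normal distribution N(mu, sigma^2)."""
    xs = sorted(sample)
    n = len(xs)
    cdf = NormalDist(mu, sigma).cdf
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


print(round(ks_normal_statistic([-1.0, 0.0, 1.0]), 4))  # 0.1747
```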





Shapiro-Wilk test


Anderson-Darling test
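The Anderson-Darling statistic weights discrepancies in the tails more heavily than the KS statistic. As a sketch, assuming the mean and standard deviation are estimated from the sample, A^2 is computed from the sorted values Y_1 <= ... <= Y_n and the fitted normal CDF F as

A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) [ln F(Y_i) + ln(1 - F(Y_{n+1-i}))]

The function below (our own name) returns only this raw statistic; critical values, and any small-sample correction, should come from published tables or a statistics library.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(sample):
    """Anderson-Darling A^2 statistic for normality, with the mean and
    standard deviation estimated from the sample (needs at least 2 values)."""
    xs = sorted(sample)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    f = [dist.cdf(x) for x in xs]
    # f[i - 1] is F(Y_i) and f[n - i] is F(Y_{n+1-i}) in 1-based notation.
    s = sum((2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
            for i in range(1, n + 1))
    return -n - s / n


# A sample of evenly spaced normal quantiles versus a strongly skewed sample.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [1] * 8 + [2, 20]
print(anderson_darling_normal(near_normal), anderson_darling_normal(skewed))
```

The skewed sample yields a much larger A^2 than the near-normal one, which is the direction a rejection of normality would go.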




ANOVA?


Null hypothesis?


K-test?


Difference between One-way ANOVA & Two-way ANOVA


One-way ANOVA: A one-way ANOVA tests the difference in population means based on one characteristic or factor.

a----->b

"An example of when a one-way ANOVA could be used is if you want to determine if there is a difference in the mean height of stalks of three different types of seeds. Since there is more than one mean, you can use a one-way ANOVA since there is only one factor that could be making the heights different."

Two-way ANOVA: A two-way ANOVA tests the difference in population means based on two characteristics or factors, including their combination.

a---->c<----b

"Suppose that there are three different types of seeds, and the possibility that four different types of fertilizer is used, then you would want to use a two-way ANOVA. The mean height of the stalks could be different for a combination of several reasons."

Multivariate analysis of variance (MANOVA): MANOVA is simply an ANOVA with several dependent variables. That is, ANOVA tests for a difference in means between two or more groups, while MANOVA tests for a difference in two or more vectors of means.

a----->c, b------>d, a----->d, b----->c
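A minimal sketch of the one-way ANOVA F statistic, assuming each group is a plain list of measurements (for example, stalk heights for each seed type). The null hypothesis is that all group means are equal; assuming normally distributed errors with equal variances, F then follows an F distribution with k - 1 and n - k degrees of freedom, where k is the number of groups and n the total number of observations. The sketch computes only F, not the p-value, and the function name and toy data are hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group mean square divided by
    within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical stalk heights for three seed types.
heights = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(heights))  # 27.0
```

A large F means the group means differ by much more than the spread within each group would explain.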

Posted by Cat.IanKang