Test Statistics for Hypothesis Testing in Statistics with Functions in R Programming

In my previous article, we talked about the terminology in the statistical world of hypothesis testing.

Moving on, once we have our hypotheses about the population parameters in place (based on the sample), we need a practical way to support one claim (either H0 or H1) using statistical inference. Depending on the quantity being tested, i.e., the population mean, population variance, population correlation, normality, or attributes, we have a separate test statistic for each hypothesis test.

Let's go over some of the commonly used test statistics in hypothesis testing, along with the corresponding R code:

  1. Testing for the significance of the population mean (based on the sample or empirical observation)

Case (i): When population variance is known (generally not the case)

The test statistic is

z = (x̄ − μ0) / (σ / √n)

where x̄ is the sample mean, μ0 is the hypothesized population mean under H0, σ is the known population standard deviation, and n is the sample size.

R code: For a two-tailed test for the population mean

z.test(x, alternative = "two.sided", mu = mu0, sigma.x = sigma)

Here x is the vector of sample observations, mu0 is the hypothesized mean, and sigma is the known population standard deviation. Note that z.test() is not part of base R; it is provided by the BSDA package.
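Since z.test() requires an add-on package, the same statistic is easy to compute by hand in base R. The following is a minimal sketch with hypothetical data (the sample values, mu0 and sigma below are invented for illustration):

```r
# Sketch: two-tailed z-test for the mean, computed by hand in base R.
# The observations, mu0 and sigma below are hypothetical.
x <- c(4.9, 5.1, 5.4, 4.8, 5.2, 5.0, 5.3, 4.7, 5.1, 5.0)  # sample observations
mu0 <- 5.0      # hypothesized population mean
sigma <- 0.25   # known population standard deviation

z <- (mean(x) - mu0) / (sigma / sqrt(length(x)))  # test statistic
p.value <- 2 * pnorm(-abs(z))                     # two-tailed p-value

z
p.value
```

If p.value falls below the chosen significance level (commonly 0.05), H0 is rejected.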

Case (ii): When population variance is unknown (usual scenario)

This calls for a t-test, since the population variance is unknown and we use its estimate, the sample variance, given by:

s² = Σ(xᵢ − x̄)² / (n − 1)

Here, the test statistic is

t = (x̄ − μ0) / (s / √n)

which follows a t-distribution with n − 1 degrees of freedom under H0.

R code: For a two-tailed test for the population mean

t.test(x, alternative = "two.sided", mu = mu0)

Here x is the vector of sample observations and mu0 is the hypothesized mean.

Result: For the iris data set in R, the p-value of the t-test for testing whether the population mean Sepal.Length equals 6 cm is 0.02186. Since the p-value is less than the usual significance level (α = 0.05), we reject H0 and conclude that the population mean sepal length is not equal to 6 cm.
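This result can be reproduced directly, since the iris data set ships with R:

```r
# Two-tailed one-sample t-test: is the mean Sepal.Length in iris equal to 6 cm?
res <- t.test(iris$Sepal.Length, alternative = "two.sided", mu = 6)
res$statistic  # t = -2.3172
res$p.value    # 0.02186 < 0.05, so H0 is rejected
```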

2. Testing for the significance of the population variance

Here, the test statistic is

χ² = (n − 1) s² / σ0²

where σ0² is the hypothesized population variance; under H0 it follows a chi-square distribution with n − 1 degrees of freedom.

There is no built-in function in R to perform this hypothesis test, so we have to code it from scratch.
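A minimal from-scratch sketch of such a test follows; the function name var.chisq.test and the sample data are my own, hypothetical choices:

```r
# Sketch: chi-square test for H0: sigma^2 = sigma0.sq (no base-R equivalent).
# The function name and the data below are hypothetical, for illustration.
var.chisq.test <- function(x, sigma0.sq) {
  n <- length(x)
  stat <- (n - 1) * var(x) / sigma0.sq          # (n - 1) s^2 / sigma0^2
  # two-tailed p-value from the chi-square distribution with n - 1 df
  p <- 2 * min(pchisq(stat, df = n - 1),
               pchisq(stat, df = n - 1, lower.tail = FALSE))
  list(statistic = stat, df = n - 1, p.value = p)
}

x <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7)  # hypothetical sample
var.chisq.test(x, sigma0.sq = 0.04)
```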

3. Testing for the independence of Attributes

This test uses the chi-square distribution. It applies to attributes rather than numeric variables, i.e., data classified into categories (such as gender or other characteristics). It is used in cases like:

(i) When we would like to test the independence of two attributes. For example, suppose we are given students' attentiveness in class, grouped by the duration of the class (Short or Lengthy) and by the quality of interaction. To test whether the attributes "duration of the class" and "quality of interaction" are independent, we use the chi-square test of independence.

R code: For a test of independence of attributes

chisq.test(tab)

Here tab is the matrix of observed frequencies (the contingency table).
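The class-duration example above can be sketched as follows; the observed frequency counts are hypothetical:

```r
# Sketch: chi-square test of independence between class duration and
# quality of interaction. The observed frequencies below are hypothetical.
obs <- matrix(c(40, 10,    # Short classes:   good / poor interaction
                25, 25),   # Lengthy classes: good / poor interaction
              nrow = 2, byrow = TRUE,
              dimnames = list(Duration = c("Short", "Lengthy"),
                              Interaction = c("Good", "Poor")))
chisq.test(obs)
```

A small p-value here indicates that the two attributes are not independent. Note that for 2x2 tables chisq.test() applies Yates' continuity correction by default.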

(ii) When we wish to perform a test for goodness of fit. There are times when we would like to know whether the sample data follow a particular distribution (a predetermined pattern with certain specific characteristics); we then use the chi-square goodness-of-fit test.

R code: For a goodness-of-fit test

chisq.test(x)

Here x is the vector of observed frequencies; by default, equal expected frequencies are assumed for each category.

Suppose our question is:

Question: Use the chi-square test to check whether hospital admissions are the same on each day of the week.

Running the test on the admissions data, we obtain:

Result: Since the p-value for the goodness-of-fit test is 2.2e-16 < 0.05 (α, the level of significance), we reject H0. Hence, we conclude that admissions in the hospital are not the same across the days of the week.
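The computation can be sketched with hypothetical daily counts, since the original admissions data set is not shown here (with these invented counts the p-value differs from the one quoted above, but the conclusion is the same):

```r
# Sketch: chi-square goodness-of-fit test for H0: admissions are uniform
# across the days of the week. The counts below are hypothetical.
admissions <- c(Mon = 120, Tue = 95, Wed = 110, Thu = 90,
                Fri = 130, Sat = 160, Sun = 150)
# By default chisq.test() tests against equal expected proportions,
# i.e. the uniform distribution over the seven days.
chisq.test(admissions)
```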

Further Topics of Discussion

In upcoming articles, we will cover testing for normality using various tests and plots, such as P-P plots and Q-Q plots.