Hypothesis tests

This page is intended as an introduction to some commonly used basic hypothesis tests. Before using any of these, it is important that the concepts behind hypothesis testing are understood. These concepts are explained on this page.

Hypothesis tests are very commonly used in epidemiological investigations, and a wide range of tests is available. These can be classified according to the data types in question, according to whether a specific underlying distribution is assumed when performing the test (in which case, the test is known as a parametric test), and according to whether the data are matched or independent (i.e. whether comparisons are being made at the individual level or the group level). As described earlier, qualitative data are not numerical in nature, and include categorical and ordinal data (such as the breed of dog, or the body condition score of a cow). Quantitative data are numerical, and include variables such as weight, age and height.

Comparing a qualitative variable between different groups

Chi-square test

The chi-square test is one of the most commonly used hypothesis tests, and allows the comparison of any qualitative exposure with any qualitative outcome (given that certain assumptions are met). As a simple example, it may be used to investigate the effect of previous exposure to substance x on disease experience amongst a group of animals - as classified in the 2x2 contingency table below:

Disease status	Exposed to x	Unexposed to x	Total
Diseased	a₁	a₀	m₁
Non-diseased	b₁	b₀	m₀
Total	n₁	n₀	n

A chi-square test could also be used to investigate whether the body condition score of a horse was associated with its lameness score, as classified in the r x c contingency table below:

Lameness score	Body condition score 1-3	Body condition score 4-6	Body condition score 7-9	Total
1	b₁	b₂	b₃	m_b
2	c₁	c₂	c₃	m_c
3	d₁	d₂	d₃	m_d
4	e₁	e₂	e₃	m_e
5	f₁	f₂	f₃	m_f
Total	n₁	n₂	n₃	n

In either case, the chi-square test is based upon the comparison of the observed results with the results which would be expected if there were no association between the exposure and outcome of interest. For each individual cell, this 'expected' value is calculated as the row total multiplied by the column total, divided by the overall total. The expected value is subtracted from the observed value and the answer is squared. This is then divided by the expected value, and the process is repeated for all other cells. These results are then summed to give a test statistic, which can be interpreted using a table of the chi-squared distribution (or by using a computer program) in order to give a p-value. The number of cells involved in the calculation of the test statistic will have an impact upon its magnitude, and this is accounted for in the form of 'degrees of freedom' (which can be a difficult concept to understand, but relates in this context to the number of cells which are free to take any value, given that the row and column totals are known). The number of degrees of freedom can be calculated by subtracting 1 from the number of rows, subtracting 1 from the number of columns, and multiplying these together - meaning that a 2x2 table has one degree of freedom.
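The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a statistics package: the function name and the counts in the example table are hypothetical, and the p-value shortcut shown applies only to a table with one degree of freedom.

```python
from math import erfc, sqrt


def chi_square_statistic(observed):
    """Return (test statistic, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if exposure and outcome were not associated.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts: rows are diseased / non-diseased,
# columns are exposed / unexposed.
table = [[30, 10], [20, 40]]
statistic, dof = chi_square_statistic(table)

# With one degree of freedom only, the upper tail of the chi-squared
# distribution can be written using the complementary error function.
p_value = erfc(sqrt(statistic / 2)) if dof == 1 else None
print(statistic, dof, p_value)
```

For tables with more than one degree of freedom, the p-value would normally be read from a chi-squared table or obtained from statistical software.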

The main assumptions of a chi-square test are:

the data are derived from a simple random sample
observations are independent of each other (i.e. there are no repeated measures, for example)
at least 80% of all cells (i.e. all cells in a 2x2 table, or eight cells in a 2x5 table) have expected values greater than 5, with no cells having an expected value of zero.
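The expected-count assumption can be checked before running the test. The sketch below is one possible way of doing so. The function names are hypothetical, and the cut-offs are left as parameters because different texts state this rule with slightly different thresholds; the defaults follow the wording used above.

```python
def expected_counts(observed):
    """Expected cell counts under no association, for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_counts_adequate(observed, threshold=5, min_fraction=0.8):
    """Check that enough cells have an expected count above the threshold
    and that no cell has an expected count of zero."""
    cells = [value for row in expected_counts(observed) for value in row]
    above_threshold = sum(1 for value in cells if value > threshold)
    no_zero_cells = all(value > 0 for value in cells)
    return no_zero_cells and above_threshold / len(cells) >= min_fraction


# Hypothetical tables: the first has expected counts of 20 and 30,
# the second has an expected count of 2 in every cell.
print(expected_counts_adequate([[30, 10], [20, 40]]))
print(expected_counts_adequate([[3, 1], [1, 3]]))
```

When the check fails, an exact method such as Fisher's exact test is usually considered instead.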

Fisher's exact test

Fisher's exact test is most commonly used instead of the chi-square test when the sample size is small and/or when expected cell counts are less than 5. This test generally requires the variables of interest to be dichotomous (i.e. a 2x2 contingency table), although methods are available for applying the test to larger contingency tables. Rather than relying on the test statistic approximately following a known distribution (as is the case with the chi-square test), the exact probability of the observed arrangement of data, and of all 'more extreme' arrangements, is calculated from the hypergeometric distribution, treating the row and column totals of the contingency table as fixed.
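As an illustration of the exact calculation for a 2x2 table, the following sketch sums hypergeometric probabilities directly. It assumes one common convention for a two-sided test, in which a table counts as 'more extreme' if it is no more probable than the observed table; other conventions exist. The function name and example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell,
        # given the row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed_probability = table_probability(a)

    # Range of top-left values that are possible with these margins.
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)

    # Small relative tolerance so that ties are not lost to rounding.
    cutoff = observed_probability * (1 + 1e-9)
    probabilities = [table_probability(x) for x in range(lowest, highest + 1)]
    return sum(p for p in probabilities if p <= cutoff)


# Hypothetical small study: 3 of 4 exposed animals diseased,
# 1 of 4 unexposed animals diseased.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(p_value)
```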

Comparing a quantitative variable between two groups

t-test

The t-test (also known as 'Student's t-test') is the most commonly used test for comparing the mean of a normally distributed variable between two groups, and can also be used to assess whether the mean of a single normally distributed variable differs from a particular value. As for many hypothesis tests, it involves the calculation of a test statistic which is assumed to follow a particular distribution (in this case, the t distribution). When calculating the test statistic for the comparison of two groups with approximately equal variances and equal numbers of individuals, the difference in mean values between the two groups is divided by the standard error of the difference between these two means. This standard error is calculated as the pooled standard deviation multiplied by the square root of (2 divided by the number of individuals in each group).
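The equal-group-size calculation described above can be sketched as follows. This is a minimal illustration under the stated assumptions (two independent groups of the same size with approximately equal variances). The function name and the example weights are hypothetical, and the resulting statistic would still need to be compared with the t distribution to obtain a p-value.

```python
from math import sqrt
from statistics import mean, variance


def equal_n_t_statistic(group_a, group_b):
    """Return (t statistic, degrees of freedom) for two equally sized groups."""
    if len(group_a) != len(group_b):
        raise ValueError("this sketch assumes equal numbers in each group")
    n = len(group_a)

    # With equal group sizes, the pooled variance is the mean of the
    # two sample variances.
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)

    # Standard error of the difference between the two means.
    standard_error = pooled_sd * sqrt(2 / n)

    t = (mean(group_a) - mean(group_b)) / standard_error
    dof = 2 * n - 2
    return t, dof


# Hypothetical weights (kg) of animals in two groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.1, 4.5, 4.0, 4.8, 4.6]
t, dof = equal_n_t_statistic(treated, control)
print(t, dof)
```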

Mann-Whitney U test

Comparing a quantitative variable between more than two groups

ANOVA (analysis of variance)

Kruskal-Wallis test

Comparing a categorical outcome between matched observations

McNemar's chi-square test

Comparing a quantitative outcome between matched observations

Paired t-test

Comparing two quantitative outcomes between matched observations

Pearson's correlation coefficient

Spearman's rank correlation coefficient