Hypothesis tests

From WikiVet English

This page is intended as an introduction to some commonly used basic hypothesis tests. Before using any of these, it is important that the concepts behind hypothesis testing are understood. These concepts are explained on this page.

Hypothesis tests are very commonly used in epidemiological investigations, and a wide range of tests is available. These can be classified according to the data types in question, according to whether a specific underlying distribution is assumed when performing the test (in which case the test is known as a parametric test), and according to whether the data are matched or independent (i.e. whether comparisons are being made at the individual level or the group level). As described earlier, qualitative data are not numerical in nature, and include categorical and ordinal data (such as the breed of a dog, or the body condition score of a cow). Quantitative data are numerical, and include variables such as weight, age and height.

Comparing a qualitative variable between different groups

Chi-square test

The chi-square test is one of the most commonly used hypothesis tests, and allows the comparison of any qualitative exposure with any qualitative outcome (given that certain assumptions are met). As a simple example, it may be used to investigate the effect of previous exposure to substance x on disease experience amongst a group of animals - as classified in the 2x2 contingency table below:

Disease status | Exposed to x | Unexposed to x | Total
Diseased | a1 | a0 | m1
Non-diseased | b1 | b0 | m0
Total | n1 | n0 | n

A chi-square test could also be used to investigate whether the body condition score of a horse was associated with its lameness score, as classified in the r x c contingency table below:

Lameness score | Body condition score 1-3 | Body condition score 4-6 | Body condition score 7-9 | Total
1 | b1 | b2 | b3 | mb
2 | c1 | c2 | c3 | mc
3 | d1 | d2 | d3 | md
4 | e1 | e2 | e3 | me
5 | f1 | f2 | f3 | mf
Total | n1 | n2 | n3 | n

In either case, the chi-square test is based upon a comparison of the observed results with the results that would be expected if there were no association between the exposure and the outcome of interest. For each individual cell, the 'expected' value is calculated as the row total multiplied by the column total, divided by the overall total (so the expected value for cell a1 in the 2x2 table above is m1 x n1 / n). This expected value is subtracted from the observed value and the difference is squared. The result is then divided by the expected value, and the process is repeated for all other cells. These results are summed to give a test statistic, which can be interpreted using a table of the chi-squared distribution (or a computer program) in order to give a p-value.

The number of cells involved in the calculation affects the magnitude of the test statistic, and this is accounted for through the 'degrees of freedom'. This can be a difficult concept to understand, but in this context it relates to the number of cells which are free to take any value, given that the row and column totals are known. The number of degrees of freedom is calculated by subtracting 1 from the number of rows, subtracting 1 from the number of columns, and multiplying the two results together, meaning that a 2x2 table has one degree of freedom.
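
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names (chi_square_test, chi2_sf) and the counts in the example table are hypothetical, no continuity correction is applied, and all row and column totals are assumed to be greater than zero. The p-value is obtained from the upper tail of the chi-squared distribution using only the standard library.

```python
import math


def chi_square_test(observed):
    """Return (statistic, degrees of freedom, expected values) for a table
    of observed counts given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    expected = []
    for i, row in enumerate(observed):
        expected_row = []
        for j, obs in enumerate(row):
            # Expected count if exposure and outcome were not associated
            exp = row_totals[i] * col_totals[j] / n
            expected_row.append(exp)
            statistic += (obs - exp) ** 2 / exp
        expected.append(expected_row)

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution with a whole
    number of degrees of freedom (df >= 1)."""
    half = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-x/2) * sum of (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: start from df = 1 and add one term per extra two df
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


# Hypothetical 2x2 table: rows are diseased / non-diseased,
# columns are exposed / unexposed to x.
observed_2x2 = [[30, 10],
                [20, 40]]

statistic, df, expected = chi_square_test(observed_2x2)
p_value = chi2_sf(statistic, df)
print(statistic, df, p_value)
```

For these made-up counts the expected values are 20, 20, 30 and 30, so the test statistic is 50/3 (about 16.67) with one degree of freedom. The same function accepts an r x c table such as the lameness score example, provided the counts are supplied as a list of rows.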

The main assumptions of a chi-square test are:

  • the data are derived from a simple random sample
  • observations are independent of each other (i.e. there are no repeated measures etc.)
  • at least 80% of all cells (i.e. all cells in a 2x2 table, or eight cells in a 2x5 table) have expected values of 5 or more, with no cell having an expected value of less than 1.
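
The last of these assumptions can be checked from the expected values before the test is run. The sketch below is one possible way of doing so, under stated assumptions: the function names, the cut-off parameter small (cells with an expected value exactly at the cut-off are counted as adequate) and the counts in the 2x5 table are all hypothetical. It reports the share of cells whose expected value reaches the cut-off together with the smallest expected value, and leaves the decision to the reader.

```python
def expected_counts(observed):
    """Expected counts under no association: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_expected_counts(observed, small=5.0):
    """Return (share of cells with expected count >= small,
    smallest expected count in the table)."""
    cells = [e for row in expected_counts(observed) for e in row]
    share_adequate = sum(e >= small for e in cells) / len(cells)
    return share_adequate, min(cells)


# Hypothetical 2x5 table of counts (illustrative values only)
table_2x5 = [[12, 9, 7, 3, 1],
             [20, 18, 15, 6, 2]]

share_adequate, smallest = check_expected_counts(table_2x5)
print(share_adequate, smallest)
```

In this made-up table only seven of the ten cells reach the cut-off, so the share is 0.7 and the table falls short of the 80% requirement.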

Fisher's exact test

Comparing a quantitative variable between two groups

t-test

Mann-Whitney U test

Comparing a quantitative variable between more than two groups

ANOVA (analysis of variance)

Kruskal-Wallis test

Comparing a categorical outcome between matched observations

McNemar's chi-square test

Comparing a quantitative outcome between matched observations

Paired t-test

Comparing two quantitative outcomes between matched observations

Pearson's correlation coefficient

Spearman's rank correlation coefficient