These issues can be better explained using a hypothetical example. Suppose a study is conducted to investigate the seroprevalence of Peste des Petits Ruminants virus (PPRV) in sheep in one region of an African country. A ''census'' of all animals could be conducted, which would allow the exact seroprevalence to be determined (assuming a perfect diagnostic test) - however, this is not logistically or financially viable, and therefore a sample of the sheep population is taken. We will assume that there is no bias at all in the sample, and that a simple random sampling protocol is used. The sample taken gives a point seroprevalence estimate of 30%, with a 95% confidence interval ranging from 20% to 40%. As such, we can be 95% confident that the true seroprevalence of PPRV in this region of the country lies between 20% and 40% - no particular seroprevalence estimate within this range is any more or less likely than any other. Despite this, there remains a small chance that the true seroprevalence lies outside this range. To explain this further, imagine that we take another sample from the population and calculate another confidence interval, and that we repeat this process until we have 20 individual samples and their associated confidence intervals (all from the same source population). On average, we would expect 19 of these (=95%) to contain the true seroprevalence, and 1 of them not to. Note that we cannot make any statement about the ''probability'' that the true seroprevalence lies within any one of these confidence intervals, since each interval is either correct (probability of containing the true seroprevalence = 100%) or incorrect (probability of containing the true seroprevalence = 0%).
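
To illustrate the repeated-sampling interpretation described above, the following short simulation (a minimal sketch in Python, using hypothetical figures: a 'true' seroprevalence of 30% and surveys of 200 sheep each) draws 20 independent simple random samples and counts how many of the resulting 95% confidence intervals contain the true value. The simple normal-approximation interval is used here purely for illustration.
<syntaxhighlight lang="python">
import math
import random

random.seed(1)  # fixed seed so the illustration is reproducible

TRUE_PREV = 0.30   # hypothetical 'true' seroprevalence in the source population
N = 200            # hypothetical number of sheep sampled per survey
Z = 1.96           # standard normal quantile for a 95% confidence interval

def sample_ci(true_prev, n):
    """Draw one simple random sample and return the point estimate and its 95% CI."""
    positives = sum(random.random() < true_prev for _ in range(n))
    p_hat = positives / n
    half_width = Z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

covered = 0
for i in range(20):
    p_hat, lo, hi = sample_ci(TRUE_PREV, N)
    contains = lo <= TRUE_PREV <= hi
    covered += contains
    print(f"sample {i + 1:2d}: estimate = {p_hat:.2f}, "
          f"95% CI = ({lo:.2f}, {hi:.2f}), contains true value: {contains}")

print(f"{covered}/20 intervals contain the true seroprevalence (19 expected on average)")
</syntaxhighlight>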
The approach to hypothesis testing first requires making the assumption that there is ''no difference'' between the two groups (known as the '''null hypothesis'''). Statistical methods are then employed to evaluate the probability of observing the data at hand (or data more extreme) if the null hypothesis were correct (known as the '''p-value'''). Based on the resultant p-value, a decision can be made as to whether the support for the null hypothesis is sufficiently low to provide evidence against it. It is important to note, however, that the null hypothesis can never be completely disproved based on a sample - evidence can only be gained in support of or against it. Nonetheless, based on this evidence, investigators will often come to a conclusion that the null hypothesis is either 'accepted' or 'rejected'.<br>
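
As a minimal sketch of how a p-value is obtained in practice (the group sizes and counts below are hypothetical, and the chi-squared test is just one of several tests that could be applied), the seroprevalence of two groups can be compared using a 2x2 contingency table:
<syntaxhighlight lang="python">
from scipy.stats import chi2_contingency

# Hypothetical counts of seropositive and seronegative sheep in two groups
#            positive  negative
table = [[60, 140],   # group 1: 60/200 = 30% seropositive
         [40, 160]]   # group 2: 40/200 = 20% seropositive

# Null hypothesis: the seroprevalence is the same in both groups
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, p-value = {p_value:.4f}")

# A small p-value means the observed difference (or a larger one) would be
# unlikely if the null hypothesis were correct - evidence against it, not proof
</syntaxhighlight>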
In any hypothesis test, there is a risk of reaching an incorrect conclusion - which will take the form of either a type I or a type II error, as described below. Note that no single hypothesis test can be affected by both type I and type II errors, as each is based on a different assumption regarding the source population. However, as the true state of the source population will not be known, both types of error should be considered when interpreting a hypothesis test (and when calculating the required [[Sampling strategies#Sample size calculation|sample size]]).
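
As an illustration of how both error probabilities feed into a sample size calculation, the sketch below applies the classical formula for comparing two independent proportions (all inputs - the two seroprevalences, α and the desired power - are hypothetical):
<syntaxhighlight lang="python">
import math
from scipy.stats import norm

# Hypothetical design inputs
p1, p2 = 0.30, 0.40   # the two seroprevalences the study should distinguish
alpha = 0.05          # acceptable probability of a type I error
power = 0.80          # 1 - beta, where beta is the probability of a type II error

# Standard normal quantiles corresponding to the two error probabilities
z_alpha = norm.ppf(1 - alpha / 2)   # two-sided test
z_beta = norm.ppf(power)

# Classical sample size formula for comparing two independent proportions
n = ((z_alpha + z_beta) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2
print(f"required sample size: {math.ceil(n)} animals per group")
</syntaxhighlight>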
This type of error refers to the situation in which it is concluded that a difference between the two groups exists when in fact it does not. The probability of a type I error is often denoted by the symbol α. As this type of error relates to a situation in which the null hypothesis is correct, it is associated with the p-value given in a hypothesis test, and the threshold below which a result is declared 'significant' is often set at 0.05. This means that there is a 5% chance of a type I error: in the context of hypothesis testing, a p-value of 0.05 is interpreted as 'if the null hypothesis were correct, we would expect to see this difference or a greater one only 5% of the time', meaning that there is (weak) evidence against the null hypothesis being correct.
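
The meaning of a 5% type I error rate can be demonstrated by simulation. In the sketch below, two groups are drawn from the same source population, so the null hypothesis is true by construction; any 'significant' result is therefore a type I error, and these should occur in roughly 5% of tests (the population, sample sizes and number of trials are all hypothetical):
<syntaxhighlight lang="python">
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)  # fixed seed so the illustration is reproducible

TRUE_PREV = 0.30   # both groups share this seroprevalence: the null hypothesis is true
N = 200            # hypothetical number of animals per group
TRIALS = 2000      # number of repeated studies to simulate

false_positives = 0
for _ in range(TRIALS):
    pos1 = rng.binomial(N, TRUE_PREV)
    pos2 = rng.binomial(N, TRUE_PREV)
    table = [[pos1, N - pos1],
             [pos2, N - pos2]]
    # correction=False gives the uncorrected chi-squared test, closer to the nominal 5%
    _, p_value, _, _ = chi2_contingency(table, correction=False)
    if p_value < 0.05:
        false_positives += 1   # a 'significant' difference found where none exists

print(f"observed type I error rate: {false_positives / TRIALS:.3f} (expected ~0.05)")
</syntaxhighlight>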