An important consideration when sampling from a population is random error (also known as sampling error), which results from chance variation in the members of any sample taken from a larger population. Random error may affect the conclusions you draw from a study by reducing the precision of a descriptive study or the power of an analytic study. Although the magnitude of random error can be quantified to some degree, its direction cannot be predicted because of its random nature. Random error can be partly accounted for by applying inferential statistics when presenting and interpreting results.

Precision

The precision of an estimate is a measure of its 'repeatability', that is, how closely estimates from repeated samples of the same population would agree with one another. It is therefore a measure of the random error inherent in a sample, which in the case of descriptive studies is closely associated with the confidence interval.

Confidence intervals

Confidence intervals are commonly used in both descriptive and analytic studies to indicate the precision of an estimate (whether it be a point prevalence estimate, a mean weight measurement or an odds ratio). A 95% confidence interval is most commonly used. Put simply, it gives a range of values that the investigator can be confident contains the true source population value. For the same sample, a 99% confidence interval is wider than a 95% confidence interval, so an investigator would have greater confidence that the 99% interval contains the true value. However, the correct interpretation of a confidence interval can be confusing, because it relates to a hypothetical situation of repeated sampling.
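The relationship between the confidence level and the width of the interval can be illustrated with a short calculation. The following is a minimal sketch only: it assumes a simple random sample and uses the normal-approximation (Wald) interval for a proportion, which is one of several possible methods and not necessarily the one an investigator would choose. The function name `wald_interval` and the data (30 seropositive animals out of 100 sampled) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(positives, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumes a simple random sample; this is an illustrative method only.
    """
    p_hat = positives / n
    # Two-sided critical value, e.g. about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    # Clip to the valid range for a proportion
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical data: 30 seropositive animals in a sample of 100
for level in (0.95, 0.99):
    low, high = wald_interval(30, 100, level)
    print(f"{level:.0%} CI: {low:.3f} to {high:.3f}")
```

With these hypothetical numbers the 95% interval runs from roughly 21% to 39%, and the 99% interval is wider on both sides. The extra confidence is gained by accepting a less precise range.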

These issues can be better explained using a hypothetical example. Assume that a study is conducted to investigate the seroprevalence of Peste des Petits Ruminants virus (PPRV) in sheep in one region of an African country. A census of all animals would allow the exact seroprevalence to be determined (assuming a perfect diagnostic test). However, this is not logistically or financially viable, and therefore a sample of the sheep population is taken. We will assume that there is no bias at all in the sample, and that a simple random sampling protocol is used. The sample gives a point seroprevalence estimate of 30%, and the 95% confidence interval ranges from 20% to 40%. As such, we can be 95% confident that the true seroprevalence of PPRV in this region of the country is between 20% and 40%. The interval itself does not tell us which value within this range is the true one, and there remains a small chance that the true seroprevalence lies outside the range altogether.

To explain this further, imagine that we take another sample from the same source population and calculate another confidence interval, and that we repeat this process until we have 20 individual samples and their associated confidence intervals. On average, we would expect 19 of these (95%) to contain the true seroprevalence, and 1 of them not to. Note that we cannot assign a probability to any single one of these intervals containing the true seroprevalence: once calculated, each interval either contains the true value (probability = 100%) or does not (probability = 0%). The 95% figure describes how the procedure performs over repeated sampling, not any individual interval.
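The repeated-sampling idea can be checked with a small simulation. This is a sketch under stated assumptions, not a description of how a real study would be analysed: the true seroprevalence is fixed at 30%, each sample contains 100 animals (a sample size chosen purely for illustration), sampling is simple random sampling from a very large population, and the interval is the normal-approximation (Wald) interval. The function name `coverage` and all parameter values are hypothetical.

```python
import random
from math import sqrt


def coverage(true_prev=0.30, n=100, repeats=2000, z=1.96, seed=1):
    """Share of simulated 95% Wald intervals that contain the true prevalence."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        # Draw one simple random sample of n animals
        positives = sum(rng.random() < true_prev for _ in range(n))
        p_hat = positives / n
        half_width = z * sqrt(p_hat * (1 - p_hat) / n)
        # Each individual interval either contains the true value or it does not
        if p_hat - half_width <= true_prev <= p_hat + half_width:
            hits += 1
    return hits / repeats


print(f"Share of intervals containing the true value: {coverage():.3f}")
```

The printed share should be close to 0.95, although it will not be exactly 0.95 because the Wald interval is only approximate and the number of repeats is finite. Every individual interval in the loop is simply a hit or a miss; the 95% figure only emerges when the results are averaged over many samples.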