Sampling strategies

From WikiVet English

The information gained through the study of disease in populations increases as more members of the population are sampled. However, sampling every individual in a population is rarely feasible from either a logistical or an economic perspective (except in very small-scale studies). Censuses are a form of descriptive study that aims to systematically collect information about every member of the population of interest (the source population), and are carried out in many countries for livestock as well as for humans (although information regarding disease may not be collected). Statistical surveys are another type of descriptive study, which aim to select a sample (known as the study sample) from the source population, with the intention of extrapolating the information about these individuals to the source population. Similarly, in most analytic studies a sample of the population must be selected, for the same reasons.

This process of sampling from populations poses potential problems: it must select enough individuals for the study to be useful (whilst not sampling more than is required), and it must also ensure that any biases in the selection process are minimised.

Concepts

A number of concepts relating to sampling from populations are presented here, using the example of a descriptive study investigating the prevalence of bovine tuberculosis amongst beef cattle in England.

Target population

The target population is the population to which the results of the study may be extrapolated, even if not all members of this population were eligible for sampling; it is often not clearly defined. In this example, the target population might be viewed as all cattle (beef, dairy and non-commercial) in England, all beef cattle in Great Britain, or all cattle in Great Britain. The decision regarding which population the results can be extrapolated to will depend on the knowledge and experience of the person interpreting the study, and the suitability of this extrapolation is described as the external validity of the study.

Source population

The source population is the population from which the sample was taken, and therefore all members of this population should have a chance of being selected for inclusion in the study. In this example, the source population may be all registered beef herds in England. The results obtained from the sample should therefore apply to this population, and the extent to which they do is known as the internal validity of the study. If the results do not validly represent the source population, there are likely to be considerable problems in interpreting them.

Study sample

The study sample consists of the animals actually included in the final study. It is important to remember that in most epidemiological studies we are not interested in this sample per se; rather, we are interested in using it to make statements about the source population (and possibly the target population). Because not all members of the source population have been sampled, statistical techniques need to be applied to the results from the study sample in order to estimate the characteristics of the source population. Because of this extrapolation, there is always a possibility that an estimate from a sample is incorrect due to random variation in the sample. Although this random variation can only be reduced by increasing the sample size (or redefining the source population), the accuracy of the estimate can be maximised by ensuring that sources of bias are minimised.
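The effect of random sampling variation can be illustrated with a short simulation. The sketch below is purely hypothetical: it builds an imaginary source population of 10,000 animals with an assumed true prevalence of 10%, repeatedly draws simple random samples of 50 and of 500 animals, and reports how widely the prevalence estimates spread. The function names and figures are illustrative assumptions, not data from any real study.

```python
import random
import statistics


def make_population(size, prevalence, seed=1):
    """Hypothetical source population: 1 = diseased, 0 = not diseased."""
    rng = random.Random(seed)
    n_positive = round(size * prevalence)
    population = [1] * n_positive + [0] * (size - n_positive)
    rng.shuffle(population)
    return population


def spread_of_estimates(population, sample_size, repeats=1000, seed=2):
    """Draw repeated simple random samples and summarise the prevalence estimates."""
    rng = random.Random(seed)
    estimates = [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repeats)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


population = make_population(10_000, 0.10)
results = {}
for n in (50, 500):
    mean_estimate, sd_estimate = spread_of_estimates(population, n)
    results[n] = (mean_estimate, sd_estimate)
    print(f"n = {n}: mean estimate {mean_estimate:.3f}, SD of estimates {sd_estimate:.3f}")
```

Both sample sizes give estimates centred on the true prevalence, but estimates from samples of 500 are spread much less widely than those from samples of 50. Increasing the sample size reduces random variation; it does nothing, however, to correct bias in the way animals are selected.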

Approaches to sampling

Probability sampling

In probability sampling, every individual in the source population has a calculable, non-zero probability of being randomly selected for the study sample. As such, it is the only appropriate method of sampling for descriptive studies (since the ability to extrapolate the results to the source population is of integral importance in these cases). Some types of probability sampling are described below.

Simple random sampling

Simple random sampling is the optimal method of sampling from a population, from a statistical viewpoint. It requires the construction of a sampling frame, which is a list of all the individuals in the source population. A randomisation procedure is then used to select animals from this list for further study. As such, if a sampling frame is not available and cannot be created, simple random sampling cannot be used. A very basic simple random sampling procedure could involve writing the identity of each member of the source population on a piece of paper and drawing the required number out of a bag; however, computer-generated random numbers are more commonly used nowadays.
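The computerised approach described above can be sketched in a few lines of Python using the standard-library `random` module. This minimal illustration assumes the sampling frame is already available as a list of identifiers; the identifiers and the function name `simple_random_sample` are hypothetical.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Select sample_size distinct members of the sampling frame at random.

    Every member has the same probability (sample_size / len(sampling_frame))
    of being included.
    """
    if sample_size > len(sampling_frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: identifiers for 1,200 animals
sampling_frame = [f"ANIMAL-{i:04d}" for i in range(1, 1201)]
selected = simple_random_sample(sampling_frame, 30, seed=42)
print(len(selected), selected[:3])
```

Passing a fixed `seed` makes the selection reproducible, which can be useful for documenting exactly how a study sample was drawn.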

Systematic random sampling

Systematic random sampling does not require a sampling frame, but it does require each individual in the source population to be identifiable and the individuals to be arranged in a random order in some way. A starting individual is selected at random, and further individuals are then selected at a fixed sampling interval (calculated by dividing the size of the source population by the required sample size). A common application of systematic random sampling is when animals pass through a race or when dairy cattle enter the milking parlour (note, however, that recently calved animals may be excluded from sampling in the latter example, which may result in selection bias).

For example, to take a sample of 20 animals from a flock of 200 sheep that are all due to pass through a race to be dosed with anthelmintics, you would calculate the sampling interval (200/20 = 10) and then randomly select a number between 1 and 10 to identify the first sheep. If the number selected is four, you would sample the fourth animal passing through the race, followed by every 10th animal after it (i.e. the 14th, 24th, 34th...184th and 194th), giving a total sample of 20 animals. Provided that the order of passing through the race is random, every sheep has an equal probability (1 in 10) of being selected before the starting number is chosen.
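The sheep example can be reproduced with a small helper that returns the positions (order through the race) to be sampled. This is a sketch assuming the population size is an exact multiple of the sample size; the function name `systematic_sample_positions` is hypothetical.

```python
import random


def systematic_sample_positions(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions (e.g. order through a race) to be sampled.

    Assumes population_size is a multiple of sample_size, so that every
    individual has exactly the same probability (1 / interval) of selection.
    """
    interval = population_size // sample_size
    if start is None:
        # Random starting point within the first sampling interval
        start = random.Random(seed).randint(1, interval)
    if not 1 <= start <= interval:
        raise ValueError("start must lie within the first sampling interval")
    return [start + k * interval for k in range(sample_size)]


# The sheep example: 200 sheep, sample of 20, random start of 4
positions = systematic_sample_positions(200, 20, start=4)
print(positions[:3], positions[-2:])  # [4, 14, 24] [184, 194]

# In practice the start would be chosen at random
random_positions = systematic_sample_positions(200, 20, seed=7)
```

Because each starting number from 1 to 10 is equally likely, and each start defines a different set of 20 sheep, every sheep has a 1 in 10 chance of selection.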

Stratified random sampling

This form of sampling is based on simple or systematic random techniques, but before the study sample is selected, the source population is divided into a number of strata (often according to factors considered to be associated with disease). Most commonly, the proportion of animals within each stratum in the source population is used as the proportion of the total sample size to be taken from that stratum (which determines the number of animals to be selected per stratum). This approach ensures that every animal has an equal probability of selection. However, other approaches may be used which produce a 'weighted sample' (for example, animals from one particular stratum may be oversampled). Even in these cases the sampling strategy is still a probability sample, as the probability of selection for each animal within each stratum can still be calculated (even if it differs between strata). In these cases, the differing probabilities of selection must be accounted for at the analysis stage in order to 'unweight' the sample.
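Proportional allocation, and the reweighting needed when one stratum is deliberately oversampled, can be sketched as follows. The stratum names, sizes and test results are hypothetical, and the largest-remainder rounding rule is one reasonable choice (an assumption, not taken from the text) for making the stratum allocations add up to the total sample size.

```python
import math


def proportional_allocation(stratum_sizes, total_sample):
    """Allocate the total sample to strata in proportion to stratum size.

    Fractional allocations are rounded down and the remaining units are given
    to the strata with the largest fractional parts (largest-remainder rounding).
    """
    population_size = sum(stratum_sizes.values())
    exact = {s: total_sample * size / population_size for s, size in stratum_sizes.items()}
    allocation = {s: math.floor(x) for s, x in exact.items()}
    shortfall = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - allocation[s], reverse=True)
    for s in by_remainder[:shortfall]:
        allocation[s] += 1
    return allocation


def weighted_prevalence(stratum_sizes, positives, sampled):
    """Combine stratum prevalences, weighting each by its share of the source population."""
    population_size = sum(stratum_sizes.values())
    return sum(
        (stratum_sizes[s] / population_size) * (positives[s] / sampled[s])
        for s in stratum_sizes
    )


# Hypothetical source population split into two strata by herd type
sizes = {"small herds": 6000, "large herds": 4000}
print(proportional_allocation(sizes, 200))  # {'small herds': 120, 'large herds': 80}

# Weighted design: both strata sampled equally, so large herds are oversampled
positives = {"small herds": 10, "large herds": 30}
sampled = {"small herds": 100, "large herds": 100}
print(f"{sum(positives.values()) / sum(sampled.values()):.3f}")  # unweighted: 0.200
print(f"{weighted_prevalence(sizes, positives, sampled):.3f}")  # weighted: 0.180
```

In the oversampled design, the naive pooled estimate (0.200) overstates the prevalence because the higher-prevalence stratum is over-represented; weighting each stratum by its share of the source population corrects this.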

Cluster sampling

Cluster sampling is used where the individual animals of interest are 'clustered' within other groupings (such as animals within farms), and either it is easier to sample many animals from a small number of clusters than a few animals from many clusters (as would be likely if simple random sampling were used), or a sampling frame is available for the clusters (known as the primary sampling units, PSUs) but not for the individual animals. A random sample of clusters is first selected (using simple or systematic random sampling techniques), and then every individual within each selected cluster is sampled. As each cluster has an equal probability of being selected, and every animal within the selected clusters is sampled, every individual animal has the same probability of selection. Note, however, that variation in the outcome of interest is likely to be lower within clusters than between clusters, and this must be accounted for when calculating the sample size and when interpreting the results.
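The two steps described above (random selection of clusters, then inclusion of every animal within them) are sketched below with hypothetical farm data. The second function uses a widely used approximation for the design effect, DEFF = 1 + (m - 1)ρ, where m is the average cluster size and ρ is the intra-cluster correlation coefficient. This formula is not given in the text; it is included as one common way of accounting for clustering when calculating sample size.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling.

    clusters maps a cluster identifier (e.g. a farm) to the list of animals in it.
    Clusters are selected by simple random sampling and every animal in each
    selected cluster is included, so each animal's selection probability is
    n_clusters / len(clusters).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {farm: list(clusters[farm]) for farm in chosen}


def design_effect(mean_cluster_size, icc):
    """Common approximation to the design effect: 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * icc


# Hypothetical sampling frame of 40 farms (the primary sampling units)
farms = {
    f"farm-{i:02d}": [f"farm-{i:02d}/animal-{j}" for j in range(10 + i)]
    for i in range(40)
}
sample = cluster_sample(farms, n_clusters=5, seed=3)
print(sorted(sample), sum(len(animals) for animals in sample.values()))

# With clusters of about 20 animals and an intra-cluster correlation of 0.1,
# roughly 2.9 times as many animals are needed as under simple random sampling.
print(round(design_effect(20, 0.1), 2))  # 2.9
```

The stronger the resemblance between animals on the same farm (larger ρ) and the larger the clusters, the less independent information each additional animal provides.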

Multistage sampling

This sampling approach extends the concepts used in cluster sampling in order to avoid sampling every individual within each cluster (since this may be impractical, for example on large sheep farms containing thousands of animals, or there may be very little variation between animals within the clusters). To ensure that the probability of selection of each individual animal (known as a secondary sampling unit, SSU) is constant, either the clusters (PSUs) can be selected with probability proportional to size, or the PSUs can be selected as described for cluster sampling, followed by the selection of a set proportion of all animals within each cluster. Sampling with probability proportional to size involves weighting the larger clusters to increase their chance of selection, followed by the selection of a set number of animals within each selected cluster. This approach is often used as it is logistically simpler (since exactly the same number of animals is sampled regardless of the size of the farm). Methods of weighting larger clusters are described elsewhere.
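Sampling with probability proportional to size (PPS) can be sketched as below. This illustration assumes a systematic PPS selection of farms (one common way of weighting larger clusters; the text does not specify a method), followed by a fixed number of animals per selected farm. The farm names, sizes and function names are hypothetical. The last function shows why the overall selection probability is constant: a farm of size M_i is selected with probability n·M_i/M, and each of its animals is then selected with probability m/M_i, so the product n·m/M does not depend on farm size.

```python
import random


def pps_systematic(cluster_sizes, n_clusters, rng):
    """Select clusters with probability proportional to size (systematic PPS).

    Works in units scaled by n_clusters so that the sampling interval is the
    integer total population size. A cluster larger than total / n_clusters
    could be hit more than once; this sketch assumes no cluster is that large.
    """
    total = sum(cluster_sizes.values())
    start = rng.randrange(total)
    points = [start + k * total for k in range(n_clusters)]
    selected, upper, i = [], 0, 0
    for name, size in cluster_sizes.items():
        upper += size * n_clusters  # end of this cluster's scaled cumulative range
        while i < n_clusters and points[i] < upper:
            selected.append(name)
            i += 1
    return selected


def two_stage_sample(herds, n_clusters, per_cluster, seed=None):
    """PPS selection of farms, then a fixed number of animals from each selected farm."""
    rng = random.Random(seed)
    sizes = {farm: len(animals) for farm, animals in herds.items()}
    chosen = pps_systematic(sizes, n_clusters, rng)
    return {farm: rng.sample(herds[farm], per_cluster) for farm in chosen}


def overall_probability(cluster_size, total, n_clusters, per_cluster):
    """P(farm selected) x P(animal selected | farm selected)."""
    return (n_clusters * cluster_size / total) * (per_cluster / cluster_size)


# Hypothetical farm sizes (number of animals per farm); 1,600 animals in total
farm_sizes = {"A": 120, "B": 300, "C": 80, "D": 450, "E": 200, "F": 150, "G": 60, "H": 240}
total_animals = sum(farm_sizes.values())
herds = {farm: [f"{farm}-{j}" for j in range(size)] for farm, size in farm_sizes.items()}
sample = two_stage_sample(herds, n_clusters=3, per_cluster=10, seed=5)
print({farm: len(animals) for farm, animals in sample.items()})
for farm, size in farm_sizes.items():
    print(farm, round(overall_probability(size, total_animals, 3, 10), 5))  # 0.01875
```

With 3 farms and 10 animals per farm drawn from 1,600 animals, every animal has the same overall probability of selection (30/1,600 = 0.01875), even though larger farms are more likely to be chosen.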

Non-probability sampling

Non-probability sampling methods do not use random selection techniques, and so are not ideal from a statistical viewpoint. However, they are commonly used in analytic investigations, where internal validity does not need to be as high as in descriptive studies.

Judgement sample

Convenience sample

Purposive sample

Sample size calculation

As mentioned earlier, it is important in any study not only that bias is minimised, but also that the sample is large enough to give sufficient precision (in the case of descriptive studies) or power (in the case of analytic studies).
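For a descriptive study such as the bovine tuberculosis prevalence example, a widely used formula for the sample size needed to estimate a prevalence with a given absolute precision is n = z²P(1 - P)/d², where P is the expected prevalence, d is the desired precision and z = 1.96 for 95% confidence. The sketch below assumes simple random sampling (a clustered design would need the result inflated by a design effect), and the expected prevalences and population size of 1,000 are hypothetical. The finite population correction shown is one common form of that adjustment.

```python
import math


def sample_size_prevalence(expected_prevalence, precision, z=1.96):
    """Sample size to estimate a prevalence to within +/- precision (absolute),
    assuming simple random sampling from a large population:
    n = z^2 * P * (1 - P) / d^2, rounded up.
    """
    n = z ** 2 * expected_prevalence * (1 - expected_prevalence) / precision ** 2
    return math.ceil(n)


def finite_population_correction(n, population_size):
    """One common adjustment when the source population is small: n / (1 + (n - 1) / N)."""
    return math.ceil(n / (1 + (n - 1) / population_size))


print(sample_size_prevalence(0.5, 0.05))  # 385
print(sample_size_prevalence(0.1, 0.05))  # 139
print(finite_population_correction(385, 1000))  # 279
```

An expected prevalence of 0.5 gives the largest sample size for a given precision, so it is often used when little is known about the likely prevalence.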