Changes

4,360 bytes added ,  07:37, 5 May 2011
no edit summary
Line 3: Line 3:  
This process of sampling from populations poses potential problems, as it must both select a sufficient number of individuals in order to be useful for the purposes of the study (whilst not sampling more than is required), and must also ensure that any biases in the selection process are minimised.
 
This process of sampling from populations poses potential problems, as it must both select a sufficient number of individuals in order to be useful for the purposes of the study (whilst not sampling more than is required), and must also ensure that any biases in the selection process are minimised.
 
   
 
   
==Concepts==
+
==Populations and samples==
A number of concepts relating to sampling from populations are presented here, using the example of a descriptive study investigating the prevalence of bovine tuberculosis amongst beef cattle in England.  
+
When sampling from populations and when interpreting the results of studies involving sampling, it is important to consider what can be inferred from the results. A number of concepts are presented here, using the example of a descriptive study investigating the prevalence of bovine tuberculosis amongst beef cattle in England.  
 +
 
 
===Target population===  
 
===Target population===  
 
The target population is the population to which the results of the study may be extrapolated, even if not all members of this population were eligible for sampling, and is often not clearly defined. In the example given here, it may be that the target population is viewed as all cattle (beef, dairy and noncommercial) in England, or all beef cattle in Great Britain, or all cattle in Great Britain. The decision regarding which population the results can be extrapolated to will depend on the knowledge and experience of the person interpreting the study, and the suitability of this extrapolation is described as the '''external validity''' of the study.
 
The target population is the population to which the results of the study may be extrapolated, even if not all members of this population were eligible for sampling, and is often not clearly defined. In the example given here, it may be that the target population is viewed as all cattle (beef, dairy and noncommercial) in England, or all beef cattle in Great Britain, or all cattle in Great Britain. The decision regarding which population the results can be extrapolated to will depend on the knowledge and experience of the person interpreting the study, and the suitability of this extrapolation is described as the '''external validity''' of the study.
Line 12: Line 13:     
===Study sample===
 
===Study sample===
The sample population includes those animals which are included in the final study. It is important to remember that in most epidemiological studies, we are not interested in this population ''per se'' - rather, we are interested in using this sample in order to make statements regarding the source population (and possibly the target population). Because not all members of the source population have been sampled, statistical techniques need to be applied to the results from the study group in order to estimate what the characteristics of the source population are expected to be. Due to this extrapolation, there is always a possibility that any estimates from a sample are incorrect due to [[Random variation|'''random variation''']] in the sample. Although this random variation cannot be controlled without increasing the sample size (or redefining the source population), the accuracy of the estimate can be maximised by ensuring that sources of [[Bias|'''bias''']] are minimised.  
+
The sample population includes those animals which are included in the final study. It is important to remember that in most epidemiological studies, we are not interested in this population ''per se'' - rather, we are interested in using this sample in order to make statements regarding the source population (and possibly the target population). Because not all members of the source population have been sampled, statistical techniques need to be applied to the results from the study group in order to estimate what the characteristics of the source population are expected to be. Due to this extrapolation, there is always a possibility that any estimates from a sample are incorrect due to [[Random error|'''random variation''']] in the sample. Although this random variation cannot be controlled without increasing the sample size (or redefining the source population), the accuracy of the estimate can be maximised by ensuring that sources of [[Bias|'''bias''']] are minimised.  
    
==Approaches to sampling==
 
==Approaches to sampling==
Line 47: Line 48:     
==Sample size calculation==
 
==Sample size calculation==
As mentioned earlier, it is important in any study not only that bias is minimised, but that the sample has sufficient [[Random variation#Confidence intervals and study precision|precision]] (in the case of descriptive studies) or [[Random variation#Hypothesis testing and study power|power]] (in the case of analytic studies). Both of these are closely related to the [[Random variation|random variability]] in any sample taken from a population. Although this can be reduced by increasing the sample size, a number of other considerations (usually logistical and economic considerations) will also be acting in order to reduce the number of samples which can realistically be taken. Statistical techniques are therefore available in order to calculate the required sample size. However, counterintuitively, these require assumptions to be made regarding the final results of the study.
+
As mentioned earlier, it is important in any study not only that bias is minimised, but that the sample has sufficient [[Random error#Precision|precision]] and [[Random error#Hypothesis testing and study power|power]] (in the case of analytic studies) to answer the question(s) for which the study is intended. Both of these are closely related to the random variability in any sample taken from a population. Although this variability can be reduced by increasing the sample size, other considerations (usually logistical and economic) will limit the number of samples which can realistically be taken. Statistical techniques are therefore available to calculate the required sample size. Counterintuitively, these require assumptions to be made regarding the final results of the study, as well as information regarding the required [[Random error#Confidence intervals|level of confidence]], precision or power of the study. Sample size formulae are not given here, but can be found in most statistical textbooks.
 +
 
 +
===Expected variation in the data===
 +
The variability of an outcome of interest in the sample collected will have a considerable effect on the precision and power of a study. When the outcome is a continuous variable, this variability can be measured as the variance in the source population. However, in the case of binary outcomes, the concept of variability can be more difficult to comprehend. In these cases, the binomial distribution is used to estimate the variance - calculated as the proportion of animals with the outcome of interest multiplied by the proportion of animals without the outcome of interest. This can be viewed as the expected variation in the proportion estimate of a sample if a number of samples were repeatedly taken from the source population, rather than the variation in the proportion estimate in the source population itself.
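As a rough illustration of the binary case described above, the following Python sketch computes the proportion with the outcome multiplied by the proportion without it. The function names and the example prevalences are hypothetical and chosen only for illustration; the standard error shown assumes simple random sampling.

```python
import math


def binary_variance(p):
    """Variance term for a binary outcome with expected proportion p: p * (1 - p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return p * (1.0 - p)


def standard_error_of_proportion(p, n):
    """Expected variation of the proportion estimate across repeated samples
    of size n (assumes simple random sampling)."""
    return math.sqrt(binary_variance(p) / n)


# Hypothetical expected prevalences: the variance term peaks at p = 0.5.
for p in (0.05, 0.20, 0.50, 0.80):
    print(p, round(binary_variance(p), 4),
          round(standard_error_of_proportion(p, 100), 4))
```

Because the variance term is largest at a proportion of 0.5, assuming 50% prevalence gives the most conservative (largest) sample size when no prior estimate is available.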
 +
 
 +
===Required precision===
 +
In the case of descriptive studies, this relates to the width of the 95% confidence interval. For example, you may want to estimate the seroprevalence to Bluetongue virus to within ±10% of the true population seroprevalence, or you may want to estimate the mean skin thickness of a group of cattle following tuberculin testing to within ±1mm of the true population mean. The concept of precision is also used in analytic studies, in the form of the difference between groups which you wish to detect. As this is closely associated with power calculations, it is mentioned in the 'power' section below.
 +
 
 +
===Level of confidence===
 +
In descriptive studies, this indicates how confident we can be that the confidence interval around the estimate produced will contain the true population value. Usually, a confidence level of 95% is used. The concept of confidence intervals is explained further in the section on [[Random error|random variation]].
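To show how expected variation, required precision and level of confidence combine in a descriptive study, here is a minimal Python sketch using the common normal-approximation textbook formulae. These formulae are not given in this article, so they are an assumption here, and the function names and input values (an expected seroprevalence of 50%, a standard deviation of 3 mm) are hypothetical.

```python
import math
from statistics import NormalDist


def z_for_confidence(confidence=0.95):
    """Two-sided standard normal multiplier (about 1.96 for 95% confidence)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_proportion(expected_p, precision, confidence=0.95):
    """Animals needed to estimate a proportion to within +/- precision.

    Normal approximation: n = z^2 * p * (1 - p) / d^2, rounded up.
    """
    z = z_for_confidence(confidence)
    return math.ceil(z ** 2 * expected_p * (1 - expected_p) / precision ** 2)


def sample_size_mean(expected_sd, precision, confidence=0.95):
    """Animals needed to estimate a mean to within +/- precision.

    Normal approximation: n = (z * sd / d)^2, rounded up.
    """
    z = z_for_confidence(confidence)
    return math.ceil((z * expected_sd / precision) ** 2)


# Seroprevalence to within +/-10%, assuming an expected prevalence of 50%.
print(sample_size_proportion(expected_p=0.5, precision=0.10))
# Mean skin thickness to within +/-1 mm, assuming a standard deviation of 3 mm.
print(sample_size_mean(expected_sd=3.0, precision=1.0))
```

Halving the required precision (for example from ±10% to ±5%) roughly quadruples the required sample size, since precision enters the formula squared.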
 +
 
 +
===Power===
 +
This relates to the ability to detect a difference in a parameter of interest between two groups, and so relates to analytic studies. The power indicates the probability that a study will detect a 'significant' difference between groups (using a specified p-value threshold [usually 0.05] to indicate significance), assuming that a difference of a specified size does exist. For example, if there is a true difference in mean annual milk yield of 500 litres between two groups of cows, a study with a power of 80% will detect a statistically significant difference 80% of the time. That is, if the same study were repeated again and again, selecting the calculated required number of cows for each group, 80% of these studies would detect a difference between groups and 20% would not.
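The milk yield example can be sketched in Python using the standard normal-approximation formula for comparing two means with equal group sizes. This formula is not given in this article and is assumed here; the function name and the standard deviation of 1,000 litres are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_two_means(difference, sd, alpha=0.05, power=0.80):
    """Animals needed PER GROUP to detect a given difference in means.

    Normal approximation, two-sided test, equal group sizes and a common
    standard deviation: n = 2 * ((z_alpha + z_power) * sd / difference)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# True difference of 500 litres; the SD of 1,000 litres is an assumption.
print(sample_size_two_means(difference=500, sd=1000, power=0.80))
print(sample_size_two_means(difference=500, sd=1000, power=0.90))
```

Raising the power from 80% to 90%, or trying to detect a smaller difference, increases the number of animals required in each group.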
 +
 
 +
===Clustering===
 +
When cluster or multistage sampling techniques are used, the clustering of the outcome of interest will have an effect on the required sample size, since animals within the same cluster would be expected to be more similar to each other than to those from other clusters. Formulas are therefore available to calculate the 'design effect' (or DEFF), which indicates the factor by which the calculated sample size needs to be increased in order to account for this.
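One commonly used approximation for the design effect, assumed here since this article does not give a formula, is DEFF = 1 + (m - 1) × ρ, where m is the average number of animals sampled per cluster and ρ is the intra-cluster correlation coefficient. The following Python sketch applies it; the function names and the input values are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Approximate design effect: 1 + (m - 1) * rho.

    mean_cluster_size (m) is the average number of animals sampled per
    cluster and icc (rho) is the intra-cluster correlation coefficient.
    """
    return 1 + (mean_cluster_size - 1) * icc


def inflate_for_clustering(simple_random_n, mean_cluster_size, icc):
    """Multiply a simple-random-sample size by the design effect, rounded up."""
    return math.ceil(simple_random_n * design_effect(mean_cluster_size, icc))


# Hypothetical inputs: 385 animals under simple random sampling,
# 20 animals sampled per herd, intra-cluster correlation of 0.1.
print(design_effect(20, 0.1))
print(inflate_for_clustering(385, 20, 0.1))
```

When animals within a cluster are no more alike than animals from different clusters (ρ = 0), the design effect is 1 and no inflation is needed.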
 +
 
 +
===Sampling fraction===
 +
This relates to the proportion of the total target population which is sampled. In most epidemiological studies, samples are collected '''without replacement''' (i.e. an individual animal cannot be selected twice), although many of the calculations used are based on the concept of sampling with replacement. This does not cause a problem when, as in most cases, the sampling fraction is low (less than about 5%). However, if the sampling fraction is high, a correction known as the '''finite population correction''' should be made to account for this in the calculation of the required sample size (and in the final estimates).
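As an illustration, the following Python sketch applies one common textbook form of the finite population correction to a sample size that was calculated assuming sampling with replacement. The formula, n / (1 + (n - 1) / N), is an assumption since none is given in this article, and the function names and population sizes are hypothetical.

```python
import math


def sampling_fraction(sample_size, population_size):
    """Proportion of the population that is sampled."""
    return sample_size / population_size


def finite_population_correction(sample_size, population_size):
    """Adjusted sample size: n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(sample_size / (1 + (sample_size - 1) / population_size))


uncorrected_n = 385  # hypothetical size from a with-replacement formula
for population_size in (1_000, 100_000):
    fraction = sampling_fraction(uncorrected_n, population_size)
    corrected = finite_population_correction(uncorrected_n, population_size)
    print(population_size, round(fraction, 4), corrected)
```

With a sampling fraction well below 5% the correction makes almost no difference, whereas with a high sampling fraction it reduces the required sample size noticeably.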
 +
 
 +
===Multivariable studies===
 +
When the effect of confounding or interaction is to be accounted for in the study, the sample size needs to be increased accordingly.
 +
 
   −
[[Category:Veterinary Epidemiology - Introduction|E]]
+
[[Category:Veterinary Epidemiology - General Concepts|G]]