Confounding

The issue of confounding is of central importance in any analytic epidemiological study (as well as in those descriptive studies aiming to compare different populations), especially in the case of observational studies. Confounding results from non-random differences between the groups of animals being compared in relation to a second, 'confounding' exposure which is independently associated with both the exposure of interest (although not a consequence of it) and the outcome of interest (although not an effect of it). As a result, the effect of the exposure of interest is 'mixed up' with the effect of the confounding exposure, giving an incorrect estimate of the true association. As such, confounding is viewed by many authors as a form of bias - however, unlike forms of selection and information bias, it is a natural feature of the data (in the case of an observational study), and techniques are available to account for it during analysis.

An example of confounding

As the concept of confounding can be difficult to understand, an example is given here. Consider a cross-sectional study investigating the association between allowing dogs to roam freely in a sheep farming area and infection with the tapeworm Echinococcus granulosus. Initial results show that animals which were allowed to roam freely were more likely to be infected with the tapeworm; however, further investigation shows that these dogs were also less likely to have been treated with anthelmintics recently. In this example, anthelmintic use is confounding the association between free roaming and infection: dogs which have not been dosed recently with anthelmintics are both more likely to have been allowed to roam freely and more likely to be infected with the tapeworm. Anthelmintic dosing is not a consequence of being allowed to roam freely (i.e. it does not lie on the causal pathway between roaming and infection), nor is it a consequence of having become infected with the tapeworm (since infection is asymptomatic in dogs). If the effect of anthelmintic dosing was accounted for, the observed association between infection and free roaming would be expected to be reduced (although it could still remain to some degree). A common example of confounding in human epidemiology is the association between alcohol consumption and lung cancer, which is confounded by cigarette smoking (i.e. people who drink more are also more likely to smoke cigarettes, and smoking is known to be associated with the development of lung cancer).

Identifying confounding

The first step in identifying confounding should be based on logical consideration of the suspected association:

Is there a plausible confounding effect? That is, is the suspected confounding variable independently associated with both the exposure of interest and the outcome of interest?
Is the suspected confounding variable a consequence of the exposure of interest? If so, it cannot be considered a confounder. In the example given above, access to dead sheep could not be considered a confounding variable for the association between roaming and infection, as it is the main presumed mechanism by which roaming leads to infection. That is, dogs which roam more have greater access to dead sheep, and dogs which are not allowed to roam would be expected to have minimal access to dead sheep.
Is the suspected confounding variable a consequence of the outcome of interest? If so, it cannot be considered a confounder.

If confounding is logically plausible, then the effect of the confounding variable on the measure of association (commonly, the odds ratio) between the exposure and outcome of interest should be investigated. A number of approaches are available for this (commonly using multivariable techniques), but the basic principle can be illustrated by stratifying the data by the confounding variable. As confounding variables have a differential distribution amongst exposed compared to unexposed groups, and amongst diseased compared to non-diseased groups, stratification according to them will remove this effect. Assume we have three binary variables - x, y and z. If variable y is completely confounding the relationship between variables x and z, then by calculating the odds ratio for the association between x and z at each level of y, the effect of y will be removed, and the stratum-specific odds ratios will be in the region of 1.0. It should be emphasised that no statistical tests for confounding are available, and so its presence requires careful logical consideration as well as investigations such as that described here.
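The stratification principle can be sketched with a small worked example. The counts below are invented purely for illustration (they are not from any real study), and the variable names follow the dog example: x is free roaming, y is absence of recent anthelmintic dosing, and z is infection. The helper odds_ratio is a hypothetical name introduced here.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = exposed animals with / without the outcome
    c, d = unexposed animals with / without the outcome
    """
    return (a * d) / (b * c)


# Hypothetical counts, invented for illustration only.
# Each tuple is (roaming infected, roaming uninfected,
#                confined infected, confined uninfected).
strata = {
    "not recently dosed": (40, 40, 10, 10),
    "recently dosed": (2, 18, 8, 72),
}

# Collapsing over the strata gives the crude (unstratified) 2x2 table.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

# Odds ratio for roaming and infection at each level of the confounder.
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print("Crude odds ratio:", round(crude_or, 2))
for name, value in stratum_ors.items():
    print("Odds ratio among dogs", name + ":", round(value, 2))
```

With these invented counts the crude odds ratio is roughly 3.3, whereas the odds ratio within each stratum is 1.0. This is the pattern expected when y completely confounds the association between x and z.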

Dealing with confounding

Although the presence of confounding is a characteristic of the data, and is not an error as such, it can result in errors of interpretation of an analytic study if it is not accounted for. Methods available for accounting for confounding can be applied during the design of the study and selection of participants, or during analysis.

Techniques applied during study design

Restriction

By restriction of the source population to include only those individuals with the same level of exposure to the confounding variable, confounding due to this variable can be reduced. As this restriction is applied to all individuals included in the study, there should be minimal bias, and internal validity should be good. This is performed to some degree in any study, whenever the source population is defined - for example, a study investigating risk factors for mastitis in cows may be restricted to dairy cattle only, since the cattle production system is likely to be a strong confounder of any associations.

Matching

This process involves ensuring that the level of exposure to the confounding variable is the same in the groups being compared, and may be performed on an individual level (matching each animal with at least one other of the same exposure to the confounder - known as individual matching, or pair matching) or on a group level (attempting to make the distribution of exposure to the confounding variable similar in the two groups - known as frequency matching). The effects of matching, and therefore the reasons for using a matching strategy, may differ between study designs, and matching may or may not be primarily aimed at reducing confounding. Care also needs to be taken when using matching, as there is a risk of overmatching (especially in case-control studies), where the true effect of the variable of interest cannot be measured because it is closely associated with the matching variable. It is also important to note that in individually matched studies, or when matching is used in case-control studies, the matching must be accounted for in the analysis.

Techniques applied during study analysis

Stratification

As described earlier, stratification is one method of identifying possible confounding. It can also be used in descriptive studies which may be affected by confounders - for example, if an attempt was made to compare mortality rates between two groups with different age structures. Of course, presenting two measures of association (one for each level of the confounding variable) can make clear interpretation of associations more difficult. As such, statistical methods (such as the Wald test for homogeneity) are available to assess whether the stratum-specific odds ratios are approximately equal. If so, a pooled odds ratio can be calculated, which is a single measure of association that accounts for the confounding variable. If the odds ratios are not equal, this may suggest the presence of interaction (also known as effect modification).
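As a minimal sketch of pooling, the example below uses the Mantel-Haenszel estimator, which is one commonly used way of combining stratum-specific odds ratios (the text above does not specify a particular estimator, so this choice is an assumption). The counts are invented for illustration, and the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table (a, b = exposed cases / non-cases;
    c, d = unexposed cases / non-cases)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across a list of 2x2 tables.

    Each stratum contributes a*d/n to the numerator and b*c/n to the
    denominator, where n is the number of animals in that stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Hypothetical counts for two levels of a confounder (illustration only).
tables = [
    (30, 20, 10, 20),   # confounder present
    (6, 24, 10, 120),   # confounder absent
]

stratum_ors = [odds_ratio(*t) for t in tables]
crude_or = odds_ratio(*(sum(cells) for cells in zip(*tables)))
pooled_or = mantel_haenszel_or(tables)

print("Stratum-specific odds ratios:", stratum_ors)
print("Crude odds ratio:", round(crude_or, 2))
print("Mantel-Haenszel pooled odds ratio:", round(pooled_or, 2))
```

Here both stratum-specific odds ratios equal 3.0, so pooling is reasonable and the pooled estimate is also 3.0, compared with a crude odds ratio of about 5.7. Pooling in this way would not be appropriate if the stratum-specific estimates differed markedly.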

Standardisation

Methods of standardisation can be used to account for confounders in both analytic studies and descriptive studies. Broadly speaking, these methods involve the use of a standard population in order to remove the effect of differences in the distribution of confounding variables between populations. Two methods of standardisation are recognised:

Direct standardisation involves the weighting of the observed measures of disease frequency in the study population within each stratum of the confounding variable according to the distribution of these strata within the standard population. This technique therefore requires knowledge of both the stratum-specific estimates in the study population and the distribution of strata of the confounder in the standard population. As it relies on stratum-specific estimates from the study population, it should only be used in the case of large sample sizes.
Indirect standardisation involves the comparison of the observed disease frequency with that which would be expected if the stratum-specific measures of frequency were the same as those in the standard population, and can be used with smaller sample sizes. As such, it requires knowledge of the stratum-specific estimates of disease frequency in the standard population and the distribution of strata of the confounding variable in the study population. This technique is used to calculate a standardised mortality (or morbidity) ratio, which is a measure of how many times more common the disease is in the study population than in the standard population.
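The two methods can be compared in a short sketch. All figures below are invented for illustration, with age group as the confounder and death as the outcome. The dictionary layout and the helper name rate are assumptions made for this example.

```python
# Hypothetical data (illustration only): animals and deaths per age stratum.
study = {
    "young": {"n": 200, "deaths": 10},
    "old": {"n": 800, "deaths": 160},
}
standard = {
    "young": {"n": 6000, "deaths": 120},
    "old": {"n": 4000, "deaths": 400},
}


def rate(group):
    """Stratum-specific mortality (deaths per animal)."""
    return group["deaths"] / group["n"]


# Direct standardisation: weight the study population's stratum-specific
# rates by the stratum distribution of the standard population.
standard_total = sum(g["n"] for g in standard.values())
direct_rate = sum(
    rate(study[s]) * standard[s]["n"] / standard_total for s in study
)

# Indirect standardisation: apply the standard population's stratum-specific
# rates to the study population's strata to obtain the expected deaths,
# then compare observed with expected.
observed = sum(g["deaths"] for g in study.values())
expected = sum(rate(standard[s]) * study[s]["n"] for s in study)
smr = observed / expected

print("Directly standardised mortality:", round(direct_rate, 3))
print("Expected deaths:", round(expected, 1))
print("Standardised mortality ratio:", round(smr, 2))
```

With these invented figures, the crude mortality in the study population is 0.17, but the directly standardised mortality is about 0.11, because the study population contains a much higher proportion of old animals than the standard population. The standardised mortality ratio is about 2.0 (170 observed deaths against 84 expected).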

Multivariable techniques

Multivariable techniques such as linear regression and logistic regression allow a number of variables to be accounted for simultaneously, and allow easy interpretation of the effect of each of these whilst the others are held at a set level. These techniques are the most commonly used method of accounting for confounding, and also allow a number of different exposures to be investigated simultaneously.
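As a rough sketch of how a multivariable model adjusts for a confounder, the example below fits logistic regression models to invented grouped data, with and without the confounder y. In practice a statistical package would be used. The small Newton-Raphson fitting routine here is written out only to keep the example self-contained, and the names solve and fit_logistic are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_logistic(rows, n_iter=25):
    """Fit a logistic regression to grouped data by Newton-Raphson.

    rows: list of (covariates, cases, total) tuples.
    Returns the coefficients, intercept first.
    """
    p = len(rows[0][0]) + 1
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]
        for covs, cases, total in rows:
            xvec = (1.0,) + tuple(covs)
            eta = sum(b * v for b, v in zip(beta, xvec))
            mu = 1.0 / (1.0 + math.exp(-eta))
            weight = total * mu * (1.0 - mu)
            for i in range(p):
                grad[i] += xvec[i] * (cases - total * mu)
                for j in range(p):
                    info[i][j] += weight * xvec[i] * xvec[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical grouped data (illustration only): ((x, y), cases, total).
adjusted_rows = [((1, 1), 30, 50), ((0, 1), 10, 30),
                 ((1, 0), 6, 30), ((0, 0), 10, 130)]
# The same animals with the confounder y ignored: ((x,), cases, total).
crude_rows = [((1,), 36, 80), ((0,), 20, 160)]

crude_or = math.exp(fit_logistic(crude_rows)[1])
adjusted_or = math.exp(fit_logistic(adjusted_rows)[1])

print("Odds ratio for x, y ignored:", round(crude_or, 2))
print("Odds ratio for x, adjusted for y:", round(adjusted_or, 2))
```

With these invented data, the odds ratio for x falls from about 5.7 when y is ignored to about 3.0 when y is included in the model, which matches the odds ratio for x within each level of y.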