Confounding

Confounding is of central importance in any analytic epidemiological study, especially in observational studies. It results from non-random differences between the groups of animals being compared in relation to a second, 'confounding' exposure which is independently associated with both the exposure of interest (although not a consequence of it) and the outcome of interest (although not a consequence of it). As a result, the effect of the exposure of interest is 'mixed up' with the effect of the confounding exposure, giving an incorrect estimate of the true association. Many authors therefore view confounding as a form of bias; however, unlike selection and information bias, it is a natural feature of the data (in the case of an observational study), and techniques are available to account for it during analysis.

An example of confounding

As the concept of confounding can be difficult to understand, an example is given here. Consider a cross-sectional study investigating the association between allowing dogs to roam freely in a sheep farming area and infection with the tapeworm Echinococcus granulosus. Initial results show that dogs which were allowed to roam freely were more likely to be infected with the tapeworm; however, further investigation shows that these dogs were also less likely to have been treated recently with anthelmintics. In this example, anthelmintic use is confounding the association between free roaming and infection: dogs which have not been dosed recently with anthelmintics are both more likely to have been allowed to roam freely and more likely to be infected with the tapeworm. Anthelmintic dosing is not a consequence of being allowed to roam freely (i.e. it does not lie on the causal pathway between roaming and infection), nor is it a consequence of having become infected with the tapeworm (since infection is asymptomatic in dogs). If the effect of anthelmintic dosing was accounted for, the association between infection and free roaming would be expected to be reduced (although it could still remain to some degree). A common example of confounding in a human epidemiological study is the association between alcohol consumption and lung cancer, which is confounded by cigarette smoking (i.e. people who drink more are also more likely to smoke cigarettes, and smoking is known to be associated with the development of lung cancer).
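To make the dog example concrete, the following Python sketch uses invented counts for 200 dogs (they are not data from any real study) to show the pattern described: a raised crude odds ratio for roaming and infection, alongside associations between recent dosing and roaming, and between recent dosing and infection. The cell counts and the helper names odds_ratio and total are assumptions made for illustration only.

```python
# Hypothetical counts for 200 dogs, invented purely for illustration.
# Keys: (roams_freely, recently_dosed) -> (infected, not_infected)
counts = {
    (True, False): (40, 40),
    (False, False): (10, 10),
    (True, True): (2, 18),
    (False, True): (8, 72),
}


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a, b = first group with / without the characteristic
    c, d = second group with / without the characteristic
    """
    return (a * d) / (b * c)


def total(roams=None, dosed=None):
    """Sum (infected, not_infected) over the cells matching the filters."""
    infected = not_infected = 0
    for (r, d), (i, n) in counts.items():
        if (roams is None or r == roams) and (dosed is None or d == dosed):
            infected += i
            not_infected += n
    return infected, not_infected


# Crude association between roaming and infection, ignoring dosing.
crude_or = odds_ratio(*total(roams=True), *total(roams=False))

# Confounder and exposure: odds of roaming in undosed versus dosed dogs.
dosing_roaming_or = odds_ratio(
    sum(total(roams=True, dosed=False)),
    sum(total(roams=False, dosed=False)),
    sum(total(roams=True, dosed=True)),
    sum(total(roams=False, dosed=True)),
)

# Confounder and outcome: odds of infection in undosed versus dosed dogs,
# restricted to non-roaming dogs so that roaming cannot explain the link.
dosing_infection_or = odds_ratio(
    *total(roams=False, dosed=False), *total(roams=False, dosed=True)
)

print(f"Crude OR, roaming vs infection:        {crude_or:.2f}")
print(f"OR, not dosed vs roaming:              {dosing_roaming_or:.2f}")
print(f"OR, not dosed vs infection (no roam):  {dosing_infection_or:.2f}")
```

With these invented counts, undosed dogs are more likely both to roam and to be infected, which is the pattern that allows dosing to confound the crude association. Whether dosing really is a confounder still depends on the logical criteria (it must not be a consequence of roaming or of infection), which no calculation can establish.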

Identifying confounding

The first step in identifying confounding should be based on logical consideration of the suspected association:

  • is there a plausible confounding effect? That is, is the suspected confounding variable independently associated with both the exposure of interest and the outcome of interest?
  • is the suspected confounding variable a consequence of the exposure of interest? If so, it cannot be considered a confounder. In the example given above, access to dead sheep could not be considered a confounding variable for the association between roaming and infection, as it is the main presumed mechanism by which roaming leads to infection. That is, dogs which roam more have greater access to dead sheep, and dogs which are not allowed to roam would be expected to have minimal access to dead sheep.
  • is the suspected confounding variable a consequence of the outcome of interest? If so, it cannot be considered a confounder.

If confounding is plausible on these grounds, the effect of the suspected confounding variable on the measure of association (commonly, the odds ratio) between the exposure and outcome of interest should be investigated. A number of approaches are available for this (commonly using multivariable techniques), but the basic principle can be illustrated by stratifying the data by the confounding variable. Confounding variables are distributed differently amongst exposed compared to unexposed groups, and amongst diseased compared to non-diseased groups. Within a single stratum, however, all animals share the same level of the confounding variable, so it can no longer distort the comparison. Assume we have three binary variables: x, y and z. If variable y is completely confounding the relationship between variables x and z, then calculating the odds ratio for the association between x and z separately at each level of y removes the effect of y, and the stratum-specific odds ratios will be in the region of 1.0 even though the crude (unstratified) odds ratio is not. If y is only partly responsible, the stratum-specific odds ratios will lie closer to 1.0 than the crude estimate without reaching it. It should be emphasised that no statistical tests for confounding are available, and so its presence requires careful logical consideration as well as investigations such as that described here.
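The stratification principle can be sketched in Python as follows. The counts are hypothetical and were chosen so that y completely confounds the association between x and z. The function names are illustrative assumptions. The pooled estimate uses the Mantel-Haenszel odds ratio, a commonly used stratified estimator that is not named in the text above and is offered here only as one possible way of summarising across strata.

```python
# Hypothetical 2x2 tables of x (exposure) against z (outcome), one for each
# level of y (the suspected confounder). Each table is (a, b, c, d):
#   a = x present, z present    b = x present, z absent
#   c = x absent,  z present    d = x absent,  z absent
strata = {
    "y = 0": (40, 40, 10, 10),
    "y = 1": (2, 18, 8, 72),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table (undefined if b or c is zero)."""
    return (a * d) / (b * c)


def collapse(tables):
    """Add the tables cell by cell, i.e. ignore the stratifying variable."""
    return tuple(sum(cells) for cells in zip(*tables))


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


crude_or = odds_ratio(*collapse(strata.values()))
stratum_ors = {level: odds_ratio(*table) for level, table in strata.items()}
adjusted_or = mantel_haenszel_or(strata.values())

print(f"Crude OR (y ignored): {crude_or:.2f}")
for level, value in stratum_ors.items():
    print(f"OR within stratum {level}: {value:.2f}")
print(f"Mantel-Haenszel OR: {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio is well above 1.0, while the odds ratio within each level of y, and the pooled estimate, equal 1.0. In real data the comparison is rarely this clean, and deciding how large a difference between crude and adjusted estimates matters remains a matter of judgement rather than a statistical test.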

Dealing with confounding

Although the presence of confounding is a characteristic of the data, and is not an error as such, it can result in errors of interpretation of an analytic study if it is not accounted for. Methods available for accounting for confounding can be applied during the design of the study and selection of participants, or during analysis.

Techniques applied during study design

Restriction

Matching

Techniques applied during study analysis

Stratification

Standardisation

Multivariable techniques