Measures of strength of association

Analytic studies are conducted to identify whether the disease experience in a population differs between groups of animals within this population (defined by exposure to 'risk factors' of interest), in the hope that some indication of a causal association can be obtained. Methods are therefore required to quantify any 'evidence' in support of a possible association. Epidemiologists commonly do this using measures of strength of association and null hypothesis tests. It is important to use both whenever interpreting the results of an analytic study, as they measure different things. Measures of strength of association indicate the magnitude of the association, whereas hypothesis test results indicate the probability of seeing data at least as extreme as those obtained if there were no association between the exposure and outcome in the source population.

Correlation coefficients

Correlation coefficients are used when comparing two quantitative variables, and are based upon the covariance between these variables amongst the individuals in the study population. Strictly speaking, the covariance is a measure of how two variables differ in individuals in relation to their mean values in the whole population - or put more simply, it is a measure of how the variables change in relation to each other. As the magnitude of the covariance will depend upon the magnitudes of the variables in question, this value is 'standardised' (divided by the product of the standard deviations of the two variables) in order to give a correlation coefficient, which lies between -1 (indicating a perfect negative correlation) and +1 (indicating a perfect positive correlation). A coefficient of 0 indicates no linear correlation. Because the coefficient does not depend on the units of measurement, it is a useful measure of the strength of association between quantitative variables.
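
The standardisation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the Pearson (product-moment) coefficient is the one intended; the function name and the example data (body weights and feed intakes for five animals) are hypothetical and are not taken from any study.

```python
import math

def pearson_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance: average product of each individual's deviations from the two means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    # Standardising removes the dependence on the units of x and y
    return cov / (sd_x * sd_y)

# Hypothetical data: body weight (kg) and daily feed intake (kg) for five animals
weights = [410, 450, 480, 520, 560]
intake = [8.1, 8.9, 9.2, 10.4, 10.9]
r = pearson_correlation(weights, intake)
print(round(r, 2))
```

Note that the sketch raises a ZeroDivisionError if either variable is constant, since the coefficient is undefined when a standard deviation is zero.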

Ratio measures

Although correlation coefficients are commonly used in statistical studies, epidemiological investigations often deal with binary exposures and outcomes (such as presence or absence of a proposed risk factor for disease, and presence or absence of disease itself). Therefore, ratio measures such as the prevalence ratio, the risk ratio, the rate ratio and the odds ratio are commonly used as measures of strength of association in epidemiological studies.

Understanding how these measures are calculated is best approached using a contingency table (also known as a cross tabulation), as shown below. In this table, the columns divide all individuals into exposed and unexposed, whilst the rows divide individuals into those who are diseased and those who are not diseased. Therefore, cell 'm₁' represents all diseased individuals, cell 'n₁' represents all exposed individuals, and cell 'a₁' represents exposed individuals who are also diseased.

Disease status	Exposed	Unexposed	Total
Diseased	a₁	a₀	m₁
Non-diseased	b₁	b₀	m₀
Total	n₁	n₀	n

The measures of disease frequency which can be extracted from this table will depend on the study design used (which will be analytic in nature, as data regarding exposure have been collected).

In the case of a cross sectional study, the prevalence can be estimated amongst exposed individuals as (a₁/n₁), and amongst unexposed individuals as (a₀/n₀).

In the case of a cohort study or an experimental study, the disease status of individuals will relate only to new cases of disease (i.e. those arising in individuals which were not diseased at the start of the study). In these cases, the incidence risk can be estimated amongst exposed individuals as (a₁/n₁), and amongst unexposed individuals as (a₀/n₀). Alternatively, the incidence rate can be estimated, if the total animal-time for each exposure group is known, as (a₁/[total number of animal-time units in exposed group]) amongst exposed animals and (a₀/[total number of animal-time units in unexposed group]) amongst unexposed animals.
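
As a rough illustration of these frequency measures, the following Python sketch computes the proportion diseased in each exposure group (interpreted as a prevalence in a cross sectional study, or as an incidence risk in a cohort or experimental study) and the incidence rate per animal-time unit. All counts and animal-time totals are invented for the example, and the function names are hypothetical.

```python
def proportion_diseased(cases, group_total):
    """Prevalence or incidence risk: diseased animals divided by all animals in the group."""
    if group_total <= 0:
        raise ValueError("group_total must be positive")
    return cases / group_total

def incidence_rate(new_cases, animal_time):
    """Incidence rate: new cases divided by the total animal-time at risk in the group."""
    if animal_time <= 0:
        raise ValueError("animal_time must be positive")
    return new_cases / animal_time

# Hypothetical contingency table (same cell names as in the text)
a1, b1 = 30, 70   # exposed: diseased, non-diseased
a0, b0 = 10, 90   # unexposed: diseased, non-diseased
n1, n0 = a1 + b1, a0 + b0

risk_exposed = proportion_diseased(a1, n1)      # a1/n1
risk_unexposed = proportion_diseased(a0, n0)    # a0/n0

# Hypothetical animal-time at risk (animal-months) in each exposure group
rate_exposed = incidence_rate(a1, 850.0)        # a1 / animal-time in exposed group
rate_unexposed = incidence_rate(a0, 950.0)      # a0 / animal-time in unexposed group

print(risk_exposed, risk_unexposed)
print(round(rate_exposed, 4), round(rate_unexposed, 4))
```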

In the case of a case control study, no measures of disease frequency can be calculated, as selection of individuals was based upon their disease status. However, an analytic study can still be conducted by looking at the odds of exposure in the different disease groups. This may seem incorrect (as we are more interested in the relative probabilities of disease amongst exposure groups than the odds of exposure amongst disease groups), but will be explained further below. The odds of exposure amongst diseased individuals are calculated as (a₁/a₀), and amongst nondiseased individuals as (b₁/b₀).

For study designs apart from case-control studies, once estimates of the prevalences, risks or rates of disease amongst different exposure groups have been calculated, their ratio can be calculated by dividing the estimate for one group by the estimate for the other. In most cases, the frequency of disease amongst exposed animals is divided by the frequency of disease amongst unexposed animals (although the opposite approach can be taken if desired). Therefore, the prevalence or risk ratio can be calculated using the following equation:
(a₁/n₁) / (a₀/n₀)
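
A minimal Python sketch of this calculation is given here, using invented counts; the function name is hypothetical. The same function gives a prevalence ratio when the counts come from a cross sectional study and a risk ratio when they come from a cohort or experimental study.

```python
def frequency_ratio(a1, n1, a0, n0):
    """Prevalence or risk ratio: (a1/n1) / (a0/n0), exposed relative to unexposed."""
    if n1 <= 0 or n0 <= 0:
        raise ValueError("group totals must be positive")
    if a0 == 0:
        raise ValueError("ratio is undefined when there are no diseased unexposed animals")
    return (a1 / n1) / (a0 / n0)

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed animals are diseased
ratio = frequency_ratio(a1=30, n1=100, a0=10, n0=100)
print(round(ratio, 2))
```

With these invented counts, disease is about three times as frequent amongst exposed animals as amongst unexposed animals.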

As mentioned above, the output from a case control study will be the odds of exposure amongst diseased and nondiseased animals. It can be shown that, even though the sampling fraction differs between cases and controls (and provided that selection within each of these groups does not depend upon exposure status), the exposure odds ratio comparing diseased to nondiseased animals is identical to the odds ratio for disease, comparing exposed to nonexposed animals. This is why the odds, rather than any other measure, is used in these types of studies. Although, strictly speaking, the exposure odds ratio is calculated as (a₁/a₀) / (b₁/b₀), it is often reformulated, for ease of calculation, into the following equation (known as the cross product ratio):
(a₁×b₀) / (b₁×a₀)
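
The equivalence of the exposure odds ratio, the disease odds ratio and the cross product ratio can be checked numerically. The Python sketch below uses invented cell counts and hypothetical function names; it simply evaluates the three expressions on the same table.

```python
import math

def exposure_odds_ratio(a1, a0, b1, b0):
    """Odds of exposure in diseased animals divided by odds of exposure in non-diseased animals."""
    return (a1 / a0) / (b1 / b0)

def disease_odds_ratio(a1, a0, b1, b0):
    """Odds of disease in exposed animals divided by odds of disease in unexposed animals."""
    return (a1 / b1) / (a0 / b0)

def cross_product_ratio(a1, a0, b1, b0):
    """Cross product form: (a1 * b0) / (b1 * a0)."""
    return (a1 * b0) / (b1 * a0)

# Hypothetical case control data: a1, a0 are exposed and unexposed cases;
# b1, b0 are exposed and unexposed controls
a1, a0, b1, b0 = 30, 10, 70, 90

or_exposure = exposure_odds_ratio(a1, a0, b1, b0)
or_disease = disease_odds_ratio(a1, a0, b1, b0)
or_cross = cross_product_ratio(a1, a0, b1, b0)

# All three agree up to floating-point rounding
print(round(or_exposure, 3), round(or_disease, 3), round(or_cross, 3))
print(math.isclose(or_exposure, or_disease) and math.isclose(or_exposure, or_cross))
```

All three functions fail with a ZeroDivisionError if a cell in a denominator is zero; how to handle such tables is outside the scope of this sketch.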

These ratio measures of strength of association range from 0 to +∞, with an estimate of 1 indicating no association. It should be noted that although the odds ratio for disease is a useful measure of strength of association, its value will differ from the equivalent prevalence or risk ratio, tending towards values further from 1 (larger in the case of prevalence/risk ratios greater than 1, or smaller in the case of prevalence/risk ratios less than 1) when the disease under investigation is common in the population. This may not be a problem when using case control studies, as these are often used when the disease in question is rare. However, odds ratios are also commonly used in more advanced statistical methods (particularly logistic regression), in which case care must be taken when interpreting them.
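
The divergence between the odds ratio and the risk ratio as disease becomes more common can be illustrated with a short Python sketch. The baseline risks and the risk ratio of 2 used here are arbitrary values chosen for illustration, and the function name is hypothetical.

```python
def odds_ratio_from_risks(risk_unexposed, risk_ratio):
    """Odds ratio implied by a given risk in unexposed animals and a given risk ratio."""
    risk_exposed = risk_unexposed * risk_ratio
    if not (0 < risk_unexposed < 1 and 0 < risk_exposed < 1):
        raise ValueError("both risks must lie strictly between 0 and 1")
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

# Hold the risk ratio at 2 and let the disease become more common in the unexposed group
for baseline_risk in (0.01, 0.10, 0.30):
    implied_or = odds_ratio_from_risks(baseline_risk, 2.0)
    print(baseline_risk, round(implied_or, 2))
```

With a baseline risk of 0.01 the implied odds ratio is about 2.02, very close to the risk ratio, whereas with a baseline risk of 0.30 it is about 3.5. The same pattern holds in the other direction: a risk ratio of 0.5 with a baseline risk of 0.6 implies an odds ratio of about 0.29.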