Data description

From WikiVet English

A central concept in any epidemiological investigation is that of appropriate data description. A number of methods are available for describing data, and the most appropriate one will depend upon both the type of data available and the aims of the investigation. If these issues are not considered, useful information may be lost, or more seriously, a misleading estimate may be made.

Measures of central tendency

In many cases, some estimate of an 'average' of the parameter of interest within the population is desired; this is known as a measure of central tendency. The three main measures of central tendency used in epidemiological studies are the mean, the median and the mode, each described below.

Mean

The mean of a set of numbers is what most people consider the 'average', and is calculated by adding all the numbers together and dividing by the number of observations. Several types of mean are available, all based upon the same calculation but with different transformations applied before and after. The arithmetic mean is that described above. The geometric mean is calculated using the natural logs of the numbers, and so the antilog of the resultant estimate must be taken. The harmonic mean uses the reciprocals of the numbers, and so the reciprocal of the final estimate must be taken. Although the proportion of individuals experiencing a binary event (classified as 1 or 0) is calculated in the same way as the arithmetic mean, it is not itself considered a measure of central tendency.
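As a minimal sketch of the three means described above, the following Python functions apply the same 'sum and divide' calculation with a different transformation before and after. The function names and the example values are hypothetical and chosen only for illustration; the geometric and harmonic means as written here assume all values are positive.

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by the number of observations."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Arithmetic mean of the natural logs, followed by the antilog.

    Assumes every value is strictly positive (the log is undefined otherwise).
    """
    logs = [math.log(v) for v in values]
    return math.exp(arithmetic_mean(logs))


def harmonic_mean(values):
    """Arithmetic mean of the reciprocals, followed by the reciprocal.

    Assumes no value is zero.
    """
    reciprocals = [1 / v for v in values]
    return 1 / arithmetic_mean(reciprocals)


# Hypothetical example values (not taken from any real dataset).
example = [2, 4, 8]
print(arithmetic_mean(example))
print(geometric_mean(example))
print(harmonic_mean(example))
```

For these example values the arithmetic mean is the largest of the three and the harmonic mean the smallest, with the geometric mean (approximately 4) in between.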

Median

The median is the exact midpoint in a series of data which have been placed in an ascending order, and is also known as the 50th percentile. Therefore, approximately 50% of observations lie below the median and 50% lie above. In the situation where the number of observations is even, the mean of the middle two values is taken to indicate the median.
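The calculation described above can be sketched in Python as follows. The function name `median` and the example values are illustrative assumptions; the sketch sorts the data, then returns the middle value for an odd number of observations or the mean of the middle two for an even number.

```python
def median(values):
    """Midpoint of the data once placed in ascending order."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the single middle value.
        return ordered[mid]
    # Even number of observations: mean of the middle two values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical example values.
print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```

Python's standard library offers `statistics.median`, which handles the even case in the same way.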

Mode

The mode is the most common value, and as such is the only measure of central tendency which may have more than one value. It is also the only measure of central tendency which can be used for non-numerical (categorical) data.
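A short Python sketch illustrates both points made above: the mode may take more than one value, and it can be found for non-numerical data. The function name `modes` and the example categories are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    if not values:
        raise ValueError("modes requires at least one observation")
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical categorical data: a single mode.
print(modes(["Holstein", "Jersey", "Holstein", "Angus"]))

# Hypothetical numerical data: two values tie, so there are two modes.
print(modes([1, 1, 2, 2, 3]))
```

Returning a list rather than a single value keeps the tied case explicit instead of silently picking one of the modes.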

Choice of descriptive measure

As mentioned above, the descriptive measures available will depend upon the aim of the study and the data type in question. The options available for non-numerical (categorical) data are quite limited, but for numerical data, a measure of central tendency and a measure of 'spread' are often presented.

Qualitative data

Qualitative data may or may not have an intrinsic order, and can always be described using proportions (i.e. the proportion of animals in each 'category'). The mode can also be a useful measure of central tendency, and the median may be appropriate in some cases of numerical ordinal data, such as body condition score (although careful consideration should be given to the usefulness of this measure before using it). There are no meaningful measures of spread for qualitative data, as the difference between adjacent categories is not standard, although the range of ordinal values may be useful.
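As an illustration of describing qualitative data, the sketch below computes the proportion of animals in each category and the range of a set of ordinal scores. The function names, category labels and scores are hypothetical and are not drawn from any real study.

```python
from collections import Counter


def category_proportions(observations):
    """Proportion of observations falling in each category."""
    if not observations:
        raise ValueError("at least one observation is required")
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}


def ordinal_range(scores):
    """Lowest and highest ordinal values observed."""
    return min(scores), max(scores)


# Hypothetical body condition categories for four animals.
print(category_proportions(["thin", "ideal", "ideal", "fat"]))

# Hypothetical numerical body condition scores.
print(ordinal_range([2, 3, 3, 4, 5]))
```

The proportions always sum to 1, which offers a quick check that every animal has been assigned to exactly one category.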

Quantitative data

These data can be described according to a measure of central tendency, spread and the shape of their distribution.