Data types

Epidemiological investigation requires a good understanding of different data types, as this will strongly influence data analysis and interpretation. Data can broadly be classified as qualitative and quantitative, and within each of these groups, data can be further categorised as shown below. Although different grouping systems are available, it is important to consider the type of data being dealt with prior to any analysis. If desired, data can often be changed into different types through manipulation (for example, the quantitative variable weight can be converted to qualitative variables such as low/medium/high or low/not low).
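As a minimal sketch of this kind of conversion, the Python snippet below recodes a quantitative weight into the ordinal categories low/medium/high and into the binary variable low/not low. The function names, the cut-off values and the example weights are hypothetical and chosen purely for illustration; appropriate cut-offs would depend on the species and the question being asked.

```python
def categorise_weight(weight_kg, low_cutoff=400.0, high_cutoff=600.0):
    """Convert a quantitative weight into an ordinal category.

    The cut-off values are hypothetical, for illustration only.
    """
    if weight_kg < low_cutoff:
        return "low"
    if weight_kg < high_cutoff:
        return "medium"
    return "high"


def is_low(weight_kg, low_cutoff=400.0):
    """Collapse weight further into a binary low / not low variable."""
    return "low" if weight_kg < low_cutoff else "not low"


# Made-up example weights in kilograms.
weights_kg = [350.0, 420.5, 610.0, 399.9, 600.0]

categories = [categorise_weight(w) for w in weights_kg]
binary = [is_low(w) for w in weights_kg]

print(categories)
print(binary)
```

Note that the conversion only works in one direction: once weight has been stored as low/medium/high, the original measurements cannot be recovered from the categories.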

Qualitative data

Qualitative data are 'categorical' (or binary) data, and as such are often not expressed numerically, meaning that they are best summarised using percentages or proportions. These types of data can be further classified as nominal and ordinal:

Nominal

Nominal data differ from all other data types described here by lacking any order between the different categories, and can be described further as either binary ('yes/no') or categorical (containing more than two categories) in nature. Examples of binary data are disease status (positive/negative), sex (male/female) and presence/absence of a factor of interest; whereas examples of categorical data may be breed, coat colour, location and feed type.
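Because nominal categories have no order, the usual summary is simply the proportion of observations falling in each category. The sketch below shows one way this might be done in Python using the standard library; the helper name category_proportions and the example disease statuses are assumptions made for illustration.

```python
from collections import Counter


def category_proportions(values):
    """Return the proportion of observations in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


# Made-up disease status for five animals.
disease_status = ["positive", "negative", "negative", "positive", "negative"]

proportions = category_proportions(disease_status)
for category, proportion in proportions.items():
    print(f"{category}: {proportion:.0%}")
```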

Ordinal

Ordinal data are inherently categorical in nature, but have an intrinsic order to them. Examples of ordinal data are lameness score, level of agreement with a statement (Likert items), categorised weight and categorised lactation number. As can be seen in the last two examples here, ordinal data can be created through manipulation of quantitative data. It should be noted that even if numbers are used to describe these categories, these numbers do not necessarily follow the same scale (for example, the difference between a lameness score of 5 and 3 is not necessarily the same as the difference between scores of 4 and 2). Although ordinal data are commonly described in terms of percentages or proportions, the median may also be used as a measure of central tendency.
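The following sketch illustrates taking the median of ordinal scores in Python. The lameness scores are made up. The choice of median_low from the standard statistics module is an assumption of this example rather than a requirement: with an even number of observations, the ordinary median averages the two middle values, which presumes equal spacing between scores, whereas median_low always returns a score that was actually observed.

```python
import statistics

# Made-up lameness scores (1 = sound, 5 = severely lame) for eight animals.
lameness_scores = [1, 2, 2, 3, 5, 4, 2, 3]

# median_low returns the lower of the two middle values when the number of
# observations is even, so the result is always an observed category.
median_score = statistics.median_low(lameness_scores)

# The ordinary median averages the two middle values instead, which
# treats the gaps between scores as equal.
averaged_median = statistics.median(lameness_scores)

print(median_score)
print(averaged_median)
```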

Quantitative data

Quantitative data are numerical in nature, with a set, meaningful interval between different measurements. Depending on the shape of the distribution, they may be described using the mean and standard deviation (for normally distributed data), or the median and the range/interquartile range (for non-normally distributed data). Quantitative data can be further classified as discrete or continuous:
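As a rough illustration of these two sets of summary measures, the sketch below computes both for a small set of made-up measurements containing one extreme value. It uses Python's standard statistics module; the function name summarise is hypothetical, and the quartiles are calculated with the module's default method, which may differ slightly from the method used by other software.

```python
import statistics


def summarise(values):
    """Return the mean/SD and the median/range/IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "range": (min(values), max(values)),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


# Made-up measurements; the value 9.5 skews the distribution.
measurements = [2.1, 2.4, 2.2, 2.8, 9.5, 2.5, 2.3]

summary = summarise(measurements)
for name, value in summary.items():
    print(name, value)
```

In this example the single extreme value pulls the mean well above the median, which is one reason why the median and interquartile range are preferred for non-normally distributed data.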

Discrete

Discrete data only include integer values, with decimal places having little or no meaning. 'Count' data, derived by counting the number of events or animals of interest, are a type of discrete data. Examples of discrete data are the number of infected animals within a group, the number of episodes of pathogen shedding following initial infection, the number of piglets born per year, and the number of lactations which the animal has been through.

Continuous

Continuous data can take any value within a range, and any measurement is only an approximation of the true value to some degree of precision (that is, measuring more precisely will change the value obtained). As such, the number of different values which the data can take is infinite. Examples of types of continuous data are weight, height, volume of milk produced during a lactation, and the infectious period of a pathogen. Age may be classified as either discrete (as it is commonly measured in whole years) or continuous (as the concept of a fraction of a year is plausible) - of these, the latter is probably more appropriate. Of course, age could alternatively be categorised and treated as ordinal data. Continuous data can be further categorised according to the levels of measurement used:

Interval

Interval data do not possess what is known as a relationship of scale, due to the presence of an arbitrarily defined zero point. This means that although (as for all quantitative data) an absolute difference of a set magnitude is the same regardless of where on the scale of measurement it lies, the same does not apply to relative differences. Interval data can also take negative values. To illustrate this, consider the Celsius temperature scale: the absolute difference of 10°C between 10°C and 20°C is the same as that between 90°C and 100°C. However, although 20°C is numerically twice 10°C and 100°C is numerically twice 50°C, neither represents a true doubling of temperature, and the relative difference in temperature between 10°C and 20°C is very different from that between 50°C and 100°C. As such, it would be incorrect to classify either of these relative differences as a difference of '100%'.

Ratio

Ratio data do possess a relationship of scale, because they have a true (rather than arbitrarily defined) zero point, and so they cannot accommodate negative values. Many common types of measurements are on the ratio scale - including mass, length and time. Temperature is also measured on the ratio scale when measured in Kelvin rather than Celsius or Fahrenheit (note that in the case of the difference between Kelvin and Celsius, the sole difference in measurement scale relates to the position of the zero point, since a difference of one degree is the same size on both scales).
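The contrast between the interval and ratio scales can be checked with a little arithmetic. The sketch below, written in Python with hypothetical function and variable names, compares the ratio of two temperatures calculated on the Celsius scale with the same ratio calculated after conversion to Kelvin (adding 273.15). The temperature pairs are those used in the Celsius example above.

```python
def celsius_to_kelvin(temp_c):
    """Convert Celsius to Kelvin by shifting the zero point."""
    return temp_c + 273.15


# Temperature pairs (lower, higher) in degrees Celsius.
pairs_c = [(10.0, 20.0), (50.0, 100.0)]

celsius_ratios = []
kelvin_ratios = []
for low_c, high_c in pairs_c:
    # Ratio on the interval scale: depends on the arbitrary zero point.
    celsius_ratios.append(high_c / low_c)
    # Ratio on the ratio scale: meaningful, as zero is a true zero.
    kelvin_ratios.append(celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c))

print(celsius_ratios)  # both 2.0, an apparent 'doubling'
print(kelvin_ratios)   # approximately 1.035 and 1.155
```

On the Celsius scale both pairs appear to show a 100% increase, whereas on the Kelvin scale the increases are roughly 3.5% and 15.5% respectively. Absolute differences, by contrast, are unchanged by the conversion.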