Difference between revisions of "Data description"
Revision as of 08:36, 10 May 2011
A central concept in any epidemiological investigation is that of appropriate data description. A number of methods are available for describing data, and the most appropriate one will depend upon both the type of data available and the aims of the investigation. If these issues are not considered, useful information may be lost, or more seriously, a misleading estimate may be made.
Measures of central tendency
In many cases, some estimate of an 'average' of the parameter of interest within the population is desired - also known as a measure of central tendency. There are three main measures of central tendency used in epidemiological studies - known as the mean, the median and the mode. These will be described below.
Mean
The mean of a set of numbers is what most people consider the 'average', and is calculated by adding all the numbers together and dividing by the number of individuals. There are a number of different types of mean, all based upon the same calculation but with different transformations applied before and after. The arithmetic mean is that described above. The geometric mean is calculated using the natural logs of the numbers, and so the antilog must be taken of the resultant estimate. The harmonic mean uses the reciprocals of the numbers, and so the reciprocal of the final estimate should be taken. Although the proportion of individuals experiencing a binary event (classified as 1 or 0) is calculated in the same way as the arithmetic mean, it is not itself considered a measure of central tendency.
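As a rough illustration of the three means and of a proportion, the following Python sketch applies the same 'add up and divide' calculation with the transformations described above. The function names and the example values are hypothetical and are not taken from any particular study.

```python
import math

def arithmetic_mean(values):
    # Add all the numbers together and divide by the number of individuals.
    return sum(values) / len(values)

def geometric_mean(values):
    # Mean of the natural logs, then the antilog of the result.
    # Assumes every value is strictly positive.
    return math.exp(arithmetic_mean([math.log(v) for v in values]))

def harmonic_mean(values):
    # Mean of the reciprocals, then the reciprocal of the result.
    # Assumes no value is zero.
    return 1.0 / arithmetic_mean([1.0 / v for v in values])

# Hypothetical measurements from five animals.
weights = [2.0, 4.0, 8.0, 16.0, 32.0]
print(arithmetic_mean(weights))  # 12.4
print(geometric_mean(weights))   # approximately 8
print(harmonic_mean(weights))    # approximately 5.16

# Hypothetical binary outcome (1 = event, 0 = no event) for five animals.
events = [1, 0, 0, 1, 1]
proportion = arithmetic_mean(events)  # same calculation, giving 0.6
print(proportion)
```

For these skewed example values the three means differ noticeably, which is one reason the choice of mean matters.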
Median
The median is the exact midpoint in a series of data which have been placed in an ascending order, and is also known as the 50th percentile. Therefore, approximately 50% of observations lie below the median and 50% lie above. In the situation where the number of observations is even, the mean of the middle two values is taken to indicate the median.
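The median calculation described above can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; Python's standard library function statistics.median gives the same result for these inputs.

```python
def median(values):
    # Place the observations in ascending order.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: the middle value is the median.
        return ordered[mid]
    # Even number of observations: take the mean of the middle two values.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```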
Mode
The mode is the most common value, and as such is the only measure of central tendency which may have more than one value. It is also the only measure of central tendency which can be used for non-numerical (categorical) data.
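A short Python sketch of the mode is given below. It returns every value that shares the highest count, so it can report more than one mode, and it works for categorical as well as numerical data. The function name and the example data are hypothetical.

```python
from collections import Counter

def modes(values):
    # Count how often each distinct value occurs.
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Keep every value that reaches the highest count (there may be several).
    return [value for value, count in counts.items() if count == highest]

# Hypothetical categorical data: species recorded on five visits.
print(modes(["cattle", "sheep", "cattle", "pig", "sheep"]))  # ['cattle', 'sheep']
print(modes([1, 2, 2, 3]))                                   # [2]
```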
Choice of descriptive measure
As mentioned above, the descriptive measures available will depend upon the aim of the study and the data type in question. The options available for non-numerical (categorical) data are quite limited, but for numerical data, a measure of central tendency and a measure of 'spread' are often presented.
Qualitative data
Qualitative data may or may not have an intrinsic order, and can always be described using proportions (i.e. the proportion of animals in each 'category'). The mode can also be a useful measure of central tendency, and the median may be appropriate in some cases of numerical ordinal data, such as body condition score (although careful consideration should be given to the usefulness of this measure before using it). There are no meaningful measures of spread for qualitative data, as the difference between adjacent categories is not standard, although the range of ordinal values may be useful.
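The descriptions available for qualitative data might be computed along the following lines. This is only a sketch: the breed names and body condition scores are made-up example values, and the helper function name is hypothetical.

```python
from collections import Counter
import statistics

def category_proportions(values):
    # Proportion of individuals falling in each category.
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}

# Hypothetical nominal data: breed of four animals.
breeds = ["Holstein", "Jersey", "Holstein", "Angus"]
print(category_proportions(breeds))
# {'Holstein': 0.5, 'Jersey': 0.25, 'Angus': 0.25}

# Hypothetical numerical ordinal data: body condition scores of five animals.
scores = [2, 3, 3, 4, 5]
score_median = statistics.median(scores)  # 3
score_range = (min(scores), max(scores))  # (2, 5)
print(score_median, score_range)
```

The median and range are reported here only because the scores are ordered; they would have no meaning for the breed data.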
Quantitative data
These data can be described according to a measure of central tendency, spread and the shape of their distribution.