These data can be described according to a '''measure of central tendency''', their '''spread''' and the '''shape''' of their distribution. The shape of the distribution is important when deciding on the most appropriate method of description, and can be characterised by its '''skew''' (the degree of asymmetry of the distribution) and '''kurtosis''' (the 'pointedness' of the distribution, which also reflects how heavy its tails are). A '''normal distribution''' (shown below) has a skew of zero and an excess kurtosis of zero, and is very commonly used in statistics. If data follow a normal distribution, then they can be completely described using only the '''mean''' and the '''standard deviation'''.<br>
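As a rough illustration of skew and kurtosis, the following minimal sketch computes the simple moment-based versions of both measures using only the Python standard library. The function name and the two small datasets are hypothetical and chosen purely for illustration. Statistical packages may apply bias corrections, so their results can differ slightly.

```python
import statistics


def sample_skew_and_excess_kurtosis(values):
    """Return (skew, excess kurtosis) using simple moment-based formulas.

    These are the uncorrected (population-style) estimators; they are an
    illustrative choice, not the only definition in use.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    # Subtracting 3 makes a normal distribution score zero
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


# Hypothetical example data
symmetric = [1, 2, 3, 4, 5, 6, 7]
right_skewed = [1, 1, 2, 2, 3, 4, 20]

print(sample_skew_and_excess_kurtosis(symmetric))
print(sample_skew_and_excess_kurtosis(right_skewed))
```

A skew close to zero suggests a roughly symmetric distribution, whereas a clearly positive or negative value suggests a tail on the right or left respectively.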
 
[[File:Skewed.png|thumb|left|upright=2.0|An example of data with a right skew (above) and data with a left skew (below).]]
 
However, data may be skewed to the right (where there is a 'tail' on the right, also known as a positive skew) or to the left (where there is a 'tail' on the left, also known as a negative skew). In these cases, the observations in the tail pull the mean towards them, making it less useful as a measure of central tendency. This, together with the lack of symmetry in the distribution, also reduces the usefulness of the standard deviation as a measure of spread. For skewed data, it is therefore more appropriate to describe the data using the '''median''' and the '''interquartile range''', as these measures are more ''robust'' against extreme values.<br>
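The robustness of the median and interquartile range can be seen in a small sketch. The data below are invented for illustration, and the helper name <code>summarise</code> is hypothetical. Note that several conventions exist for calculating quartiles; this sketch uses the default method of Python's <code>statistics.quantiles</code>.

```python
import statistics


def summarise(values):
    """Return the mean, SD, median and interquartile range of a dataset."""
    # quantiles(..., n=4) returns the three quartile cut points
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


# Hypothetical data: a roughly symmetric sample, then the same sample
# with one extreme observation added to create a right-hand tail.
base_values = [8, 9, 10, 10, 11, 11, 12]
with_extreme_value = base_values + [40]

base = summarise(base_values)
skewed = summarise(with_extreme_value)

for name in ("mean", "sd", "median", "iqr"):
    print(f"{name}: {base[name]:.2f} -> {skewed[name]:.2f}")
```

In this example, adding a single extreme observation changes the mean and standard deviation considerably, whereas the median and interquartile range change only slightly.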
 
In some cases (such as a bimodal distribution), the median may also not be an appropriate measure of central tendency, and the mode(s) may be more appropriate. This demonstrates that the usefulness of each available measure should be considered carefully whenever data are described, and 'common sense' should be used to select the most appropriate one. For example, although there is nothing statistically 'wrong' with using the mean to describe a highly skewed dataset, it is less informative about a typical observation than the median, and risks misrepresenting the data.<br>
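The bimodal case can be sketched in the same way. The data below are invented for illustration, and <code>statistics.multimode</code> is used here because it returns every value that occurs most often. For continuous measurements, modes would more usually be judged from a histogram.

```python
import statistics

# Hypothetical bimodal data: observations cluster around 2 and around 9
bimodal_values = [2, 2, 2, 3, 5, 8, 9, 9, 9]

modes = statistics.multimode(bimodal_values)
median = statistics.median(bimodal_values)
mean = statistics.fmean(bimodal_values)

print("modes:", modes)
print("median:", median)
print("mean:", round(mean, 2))
```

Here both the mean and the median fall between the two clusters, in a region where few observations lie, so reporting the two modes describes the data more faithfully.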
 
[[Category:Veterinary Epidemiology - Statistical Methods|A]]
 