Changes

Data description

Revision as of 08:58, 10 May 2011

2,303 bytes added , 08:58, 10 May 2011

no edit summary

Line 5: Line 5:

===Mean===

−

The mean of a set of numbers is what most people consider the 'average', and is calculated by adding all the numbers together and dividing by the number of individuals. There are a number of different types of means available, although they are all based upon the same calculation, but with different ''transformations'' applied before and after (the '''arithmetic mean''' is that described above; the '''geometric mean''' is calculated using the natural logs of the numbers, and so must the antilog must be taken of the resultant estimate; and the '''harmonic mean''' uses the reciprocals of the numbers, and so the reciprocal of the final estimate should be taken). Although the '''proportion''' of individuals experiencing a binary event (classified as 1 or 0) is calculated in the same way as the arithmetic mean, it is not itself considered a measure of central tendency.<br>

+

The mean of a set of numbers is what most people consider the 'average', and is calculated by adding all the numbers together and dividing by the number of observations. There are a number of different types of mean, all based upon the same calculation but with different ''transformations'' applied before and after: the '''arithmetic mean''' is that described above; the '''geometric mean''' is calculated using the natural logs of the numbers, and so the antilog of the resultant estimate must be taken; and the '''harmonic mean''' uses the reciprocals of the numbers, and so the reciprocal of the final estimate must be taken. The mean can be considerably affected by extreme values (known as 'outliers'), and so should generally be avoided if these are present in the dataset. Although the '''proportion''' of individuals experiencing a binary event (classified as 1 or 0) is calculated in the same way as the arithmetic mean, it is not itself considered a measure of central tendency.<br>
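The three means described above can be sketched in Python as follows. This is an illustrative sketch only: the function names and the example data are assumptions made for this example, and the geometric and harmonic means as written assume that every value is positive.

```python
import math


def arithmetic_mean(values):
    # Sum of the values divided by the number of observations.
    return sum(values) / len(values)


def geometric_mean(values):
    # Arithmetic mean of the natural logs, followed by the antilog.
    # Assumes all values are positive (the log is undefined otherwise).
    return math.exp(arithmetic_mean([math.log(v) for v in values]))


def harmonic_mean(values):
    # Arithmetic mean of the reciprocals, followed by the reciprocal.
    # Assumes no value is zero.
    return 1 / arithmetic_mean([1 / v for v in values])


# Hypothetical example data.
data = [1, 2, 4]
print(arithmetic_mean(data))
print(geometric_mean(data))
print(harmonic_mean(data))

# A single extreme value pulls the arithmetic mean well away from
# the bulk of the observations.
with_outlier = [1, 2, 3, 4, 100]
print(arithmetic_mean(with_outlier))
```

For the example data the geometric mean is approximately 2 and the harmonic mean is approximately 1.71, both smaller than the arithmetic mean of approximately 2.33.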

===Median===

−

The median is the exact midpoint in a series of data which have been placed in an ascending order, and is also known as the '''50th percentile'''. Therefore, approximately 50% of observations lie below the median and 50% lie above. In the situation where the number of observations is even, the '''mean''' of the middle two values is taken to indicate the median.

+

The median is the midpoint of a series of data which have been placed in ascending order, and is also known as the '''50th percentile'''. Therefore, approximately 50% of observations lie below the median and approximately 50% lie above. Where the number of observations n is odd, the median is the observation lying in position (n+1)/2 of the dataset ordered from smallest to largest. Where n is even, the ''mean'' of the middle two values (those in positions n/2 and (n/2)+1) is taken as the median.
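A minimal Python sketch of the median calculation is given here for illustration. The function name is an assumption made for this example. Positions in the text are counted from 1, whereas Python indexes from 0, so position (n+1)/2 for odd n corresponds to index n // 2, and for even n the middle two observations (positions n/2 and (n/2)+1) correspond to indexes n // 2 - 1 and n // 2.

```python
def median(values):
    # Place the observations in ascending order first.
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle observation.
        return ordered[mid]
    # Even n: the mean of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical example data.
print(median([5, 1, 3]))
print(median([4, 1, 3, 2]))
```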

===Mode===

−

The mode is the most common value, and as such is the only measure of central tendency which may have more than one value. It is also the only measure of central tendency which can be used for non-numerical (categorical) data.

+

The mode is the most common value in the dataset, and as such is the only measure of central tendency which may have more than one value. It is also the only measure of central tendency which can be used for non-numerical (categorical) data.
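The following Python sketch illustrates both properties mentioned above: it returns every value tied for the highest count, and it works on categorical as well as numerical data. The function name and the example data are assumptions made for this example.

```python
from collections import Counter


def modes(values):
    # Count how often each distinct value occurs.
    counts = Counter(values)
    if not counts:
        raise ValueError("mode of an empty dataset is undefined")
    highest = max(counts.values())
    # Return every value that shares the highest count,
    # since a dataset may have more than one mode.
    return [value for value, count in counts.items() if count == highest]


# Hypothetical example data.
print(modes([1, 2, 2, 3]))
print(modes(["cat", "dog", "cat", "cow", "dog"]))
```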

+

==Measures of spread==

+

A variety of measures of the ''spread'' of the data are available, and include the '''standard deviation''', the '''variance''', the '''interquartile range''' and the '''range'''.

+

===Variance and standard deviation===

+

The variance of a set of data is calculated by adding together the squared differences of each value from the mean and dividing this by the number of observations (when the data are a sample from a larger population, the divisor n-1 is commonly used in place of n). The ''square'' of each difference is used because if the difference itself were used, the values higher than the mean and the values lower than the mean would cancel each other out, meaning that the resulting number would be zero. However, as the squares are used, the variance is expressed in terms of the square of the units of measurement (for example, the variance of the weights of a sample of animals may be 25 kg<sup>2</sup>). As this is not easy to relate back to the original units of measurement, the ''square root'' of the variance is often used, which is known as the '''standard deviation'''.
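The calculation can be sketched in Python as follows. This is an illustrative sketch: the function names, the <code>sample</code> flag and the example weights are assumptions made for this example. By default the sum of squared differences is divided by the number of observations, as in the description above; setting <code>sample=True</code> uses the divisor n-1 that is commonly applied to sample data.

```python
import math


def variance(values, sample=False):
    n = len(values)
    mean = sum(values) / n
    # Squaring stops positive and negative differences cancelling out.
    squared_differences = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return squared_differences / divisor


def standard_deviation(values, sample=False):
    # The square root returns the measure to the original units.
    return math.sqrt(variance(values, sample))


# Hypothetical weights in kg.
weights_kg = [2, 4, 4, 4, 5, 5, 7, 9]
mean_weight = sum(weights_kg) / len(weights_kg)

# The unsquared differences from the mean sum to zero.
print(sum(w - mean_weight for w in weights_kg))

print(variance(weights_kg))            # in kg squared
print(standard_deviation(weights_kg))  # in kg
```

For these example weights the mean is 5 kg, the variance is 4 kg<sup>2</sup> and the standard deviation is 2 kg.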

+

===Interquartile range===

+

The interquartile range is based upon percentile points in the data. One of these has already been described - the 50th percentile (also known as the median). In the same way as the 50th percentile separates the lower 50% of observations from the upper 50% of observations, the 25th percentile separates the lower 25% of observations from the upper 75%, and the 75th percentile separates the lower 75% of observations from the upper 25%. The 25th percentile is also known as the '''lower quartile''', and the 75th percentile as the '''upper quartile''', and by subtracting the lower quartile from the upper quartile, the ''interquartile range'' can be calculated.
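As an illustration, the quartiles and the interquartile range can be obtained with the <code>quantiles</code> function from Python's standard <code>statistics</code> module (available from Python 3.8). Several conventions exist for interpolating percentiles between observations, so other software may give slightly different quartiles for the same data; the choice of <code>method="inclusive"</code> here, the function name and the example data are assumptions made for this example.

```python
import statistics


def interquartile_range(values):
    # quantiles(..., n=4) returns the three cut points that divide the
    # data into quarters: lower quartile, median, upper quartile.
    # It requires at least two observations.
    lower_quartile, _median, upper_quartile = statistics.quantiles(
        values, n=4, method="inclusive"
    )
    return upper_quartile - lower_quartile


# Hypothetical example data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(statistics.quantiles(data, n=4, method="inclusive"))
print(interquartile_range(data))
```

With this convention the example data have a lower quartile of 3, a median of 5 and an upper quartile of 7, giving an interquartile range of 4.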

+

===Range===

+

The range is a very basic measure of spread, and is the difference between the lowest value in the dataset and the highest value. It can be strongly affected by outliers, and so care should be taken in its interpretation.
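A short Python sketch shows how a single outlier can change the range. The function name and the example data are assumptions made for this example.

```python
def value_range(values):
    # Difference between the highest and the lowest value.
    return max(values) - min(values)


# Hypothetical example data, without and with one extreme value.
print(value_range([3, 5, 7, 9]))
print(value_range([3, 5, 7, 9, 90]))
```

Adding the single value 90 increases the range from 6 to 87, although the other observations are unchanged.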

==Choice of descriptive measure==

Gauche
