Central Tendency vs Dispersion
In descriptive and inferential statistics, several indices are used to describe a data set in terms of its central tendency, dispersion, and skewness, the three most important properties that determine the relative shape of the distribution of a data set.
What is central tendency?
Central tendency describes where the center of the distribution of values lies. Mean, median, and mode are the most commonly used indices for describing the central tendency of a data set. If a data set is symmetric, then its median and mean coincide.
Given a data set, the mean is calculated by taking the sum of all the data values and then dividing it by the number of data values. For example, the weights of 10 people (in kilograms) are measured to be 70, 62, 65, 72, 80, 70, 63, 72, 77 and 79. The mean weight of the ten people (in kilograms) can then be calculated as follows. The sum of the weights is 70 + 62 + 65 + 72 + 80 + 70 + 63 + 72 + 77 + 79 = 710. Mean = (sum) / (number of data values) = 710 / 10 = 71 (in kilograms). Outliers (data points that deviate markedly from the rest of the data) pull the mean toward them. Thus, in the presence of outliers, the mean alone may not give an accurate picture of the center of the data set.
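The mean calculation above can be reproduced with a short Python sketch. The variable names are illustrative, and the 150 kg value is a hypothetical outlier added only to show how a single extreme value shifts the mean; it is not part of the original data.

```python
# Weights of the ten people, in kilograms (from the example above).
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

# Mean = (sum of the values) / (number of values).
total = sum(weights)
mean = total / len(weights)
print(total, mean)  # 710 71.0

# Hypothetical outlier (an assumption for illustration only):
# one extra person weighing 150 kg.
with_outlier = weights + [150]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(round(mean_with_outlier, 2))  # 78.18
```

A single extreme value moves the mean from 71 to about 78, above eight of the ten original weights.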
The median is the value found at the exact middle of the ordered data set. One way to compute the median is to arrange the data points in ascending order and then locate the data point in the middle. When the number of data points is even, there is no single middle point, and the median is taken as the average of the two middle values. For example, once ordered, the previous data set is 62, 63, 65, 70, 70, 72, 72, 77, 79, 80. The two middle values are 70 and 72, so the median is (70 + 72)/2 = 71. This shows that the median need not be a value in the data set. The median is far less affected by outliers than the mean. Hence, the median serves as a better measure of central tendency in the presence of outliers.
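As a rough sketch, the sort-and-pick-the-middle procedure can be written out by hand and compared against `statistics.median` from the Python standard library. The variable names here are illustrative.

```python
import statistics

# Weights of the ten people, in kilograms (from the example above).
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

# Step 1: order the data points in ascending order.
ordered = sorted(weights)
n = len(ordered)
mid = n // 2

# Step 2: take the middle value, or the average of the two
# middle values when the number of data points is even.
if n % 2 == 0:
    median = (ordered[mid - 1] + ordered[mid]) / 2
else:
    median = ordered[mid]

print(ordered)  # [62, 63, 65, 70, 70, 72, 72, 77, 79, 80]
print(median)   # 71.0

# The standard library applies the same rule.
print(statistics.median(weights))  # 71.0
```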
The mode is the most frequently occurring value in the data set. In the previous example, the values 70 and 72 each occur twice, more often than any other value, so both are modes. This shows that some distributions have more than one modal value. If there is only one mode, the data set is said to be unimodal; in this case, with two modes, the data set is bimodal.
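One possible way to find every mode is to count the occurrences of each value and keep the values that reach the highest count. The sketch below does this with `collections.Counter` and compares the result with `statistics.multimode` (available in Python 3.8 and later). The variable names are illustrative.

```python
from collections import Counter
import statistics

# Weights of the ten people, in kilograms (from the example above).
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

# Count how often each value occurs.
counts = Counter(weights)
highest = max(counts.values())

# Every value that reaches the highest count is a mode.
modes = sorted(value for value, count in counts.items() if count == highest)
print(highest, modes)  # 2 [70, 72]

# The standard library returns all modes as a list.
print(sorted(statistics.multimode(weights)))  # [70, 72]
```

Two modes in the result correspond to the bimodal case described above.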
What is dispersion?
Dispersion is the amount of spread of data about the center of the distribution. Range and standard deviation are the most commonly used measures of dispersion.
The range is simply the highest value minus the lowest value. In the previous example, the highest value is 80 and the lowest value is 62, so the range is 80 - 62 = 18. However, the range depends only on the two extreme values, so on its own it does not give a sufficient picture of the dispersion.
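A minimal sketch of the range calculation follows. The second list, `clustered`, is a hypothetical data set invented purely for illustration: it has the same extremes as the weights but with all other values sitting at 71, which shows why the range alone says little about how the data is spread.

```python
# Weights of the ten people, in kilograms (from the example above).
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]

# Range = highest value - lowest value.
weight_range = max(weights) - min(weights)
print(weight_range)  # 18

# Hypothetical data set (an assumption for illustration only):
# same lowest and highest value, everything else at 71.
clustered = [62, 71, 71, 71, 71, 71, 71, 71, 71, 80]
clustered_range = max(clustered) - min(clustered)
print(clustered_range)  # 18
```

Both data sets have a range of 18, although the second is far more tightly packed around its center.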
To calculate the standard deviation, first the deviations of the data values from the mean are calculated. The root mean square of the deviations is called the standard deviation. In the previous example, the respective deviations from the mean are (70 - 71) = -1, (62 - 71) = -9, (65 - 71) = -6, (72 - 71) = 1, (80 - 71) = 9, (70 - 71) = -1, (63 - 71) = -8, (72 - 71) = 1, (77 - 71) = 6 and (79 - 71) = 8. The sum of the squared deviations is (-1)² + (-9)² + (-6)² + 1² + 9² + (-1)² + (-8)² + 1² + 6² + 8² = 366. The standard deviation is √(366/10) = 6.05 (in kilograms). Because the sum is divided by the number of data values, this is the population standard deviation. Unless the data set is greatly skewed, it can be concluded that the majority of the data lies in the interval 71 ± 6.05, and this is indeed the case in this example: six of the ten values lie between 64.95 and 77.05.
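The steps above can be sketched in Python as follows. The calculation divides by the number of data values, which matches `statistics.pstdev` (the population standard deviation); `statistics.stdev` divides by one less than that and would give a slightly larger result. The variable names are illustrative.

```python
import math
import statistics

# Weights of the ten people, in kilograms (from the example above).
weights = [70, 62, 65, 72, 80, 70, 63, 72, 77, 79]
mean = sum(weights) / len(weights)  # 71.0

# Step 1: deviation of each value from the mean.
deviations = [w - mean for w in weights]

# Step 2: sum of the squared deviations.
sum_sq = sum(d ** 2 for d in deviations)
print(sum_sq)  # 366.0

# Step 3: root mean square of the deviations.
std = math.sqrt(sum_sq / len(weights))
print(round(std, 2))  # 6.05

# Same result from the standard library (population form).
print(round(statistics.pstdev(weights), 2))  # 6.05

# Count the values inside the interval mean ± std.
inside = sum(1 for w in weights if mean - std <= w <= mean + std)
print(inside)  # 6
```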
What is the difference between central tendency and dispersion?
• Central tendency refers to and locates the center of the distribution of values.
• Dispersion is the amount of spread of data about the center of a data set.