Measures of Dispersion/Spread
While a measure of central tendency summarizes a set of numbers with a single value (in most cases), measures of dispersion show how widely the data are spread around that value.
Standard Deviation (σ)
Standard deviation, put simply, shows how much the data deviate from the mean: how far the members of a group typically differ from the mean value for the group.
Advantages
- Same unit as the data, which makes it much more interpretable.
How To
σ = √(∑ (xi − μ)² ÷ N) (population) | s = √(∑ (xi − x̄)² ÷ (n − 1)) (sample) | Unit = same as data
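As a minimal sketch of the two versions, Python's standard `statistics` module provides `pstdev` (population) and `stdev` (sample). The data set below is hypothetical and chosen only so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical data set (mean = 5), used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population SD: squared deviations are divided by n.
sigma = statistics.pstdev(data)

# Sample SD: squared deviations are divided by n - 1.
s = statistics.stdev(data)

print(sigma)        # 2.0
print(round(s, 4))  # 2.1381
```

Both results are in the same unit as the data, and the sample version is slightly larger because of the smaller divisor.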
Variance (σ²)
Variance is another measure that indicates how spread out the data are. Standard deviation and variance measure the same thing: variance is just the standard deviation squared.
Advantages
- Variances add up for independent variables, e.g. variance(x + y) = variance(x) + variance(y) when x and y are independent
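The additivity property holds exactly only when the two variables are uncorrelated. The sketch below, on hypothetical paired data, checks the general identity var(x + y) = var(x) + var(y) + 2·cov(x, y); the helper `sample_cov` is written by hand for illustration and is not part of any library.

```python
import statistics

# Hypothetical paired observations; x and y are deliberately correlated.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

def sample_cov(a, b):
    """Sample covariance written out by hand (divides by n - 1)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)

total = [xi + yi for xi, yi in zip(x, y)]

var_sum = statistics.variance(total)
var_x = statistics.variance(x)
var_y = statistics.variance(y)
cov_xy = sample_cov(x, y)

# The two printed values agree; the plain sum var_x + var_y falls short
# because the covariance term is not zero for this data.
print(var_sum, var_x + var_y + 2 * cov_xy)
print(var_x + var_y)
```

When the covariance is zero, the last term vanishes and the variances simply add.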
How To
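- Total error = sum of deviations from the mean = ∑ (xi − x̄), which is always 0 because positive and negative deviations cancel, so the deviations are squared instead
- Sum of squared errors (SSE) = ∑ (xi − x̄)²
- Variance = SSE ÷ (n − 1) when estimating the population variance from a sample
- Variance = SSE ÷ n when describing only the data at hand (i.e. treating them as the whole population)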
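The steps above can be sketched in plain Python. The data set is hypothetical and the variable names are chosen for illustration only.

```python
# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # 5.0

# Step 1: deviations from the mean; their plain sum cancels out to zero.
deviations = [x - mean for x in data]
total_error = sum(deviations)

# Step 2: squaring removes the sign, so the deviations no longer cancel.
sse = sum(d ** 2 for d in deviations)

# Step 3: divide by n - 1 (estimating the population) or by n (data at hand).
variance_n_minus_1 = sse / (n - 1)
variance_n = sse / n

print(total_error)  # 0.0
print(sse)          # 32.0
print(variance_n)   # 4.0
print(round(variance_n_minus_1, 4))  # 4.5714
```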
Standard Error (of the mean) (SEM)
The standard error of the mean is the standard deviation of all sample means (of a given sample size). It is used when a sample mean estimates the population mean. [see Central Limit Theorem]
SEM = σ ÷ √n
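A minimal sketch of the calculation follows. It assumes that σ is unknown, as is usual in practice, and plugs in the sample standard deviation s as its estimate; the sample itself is hypothetical.

```python
import math
import statistics

# Hypothetical sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

# Assumption: sigma is unknown, so the sample SD stands in for it.
s = statistics.stdev(sample)

# Standard error of the mean: SD divided by the square root of n.
sem = s / math.sqrt(len(sample))

print(round(sem, 4))  # 0.7559
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.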
Confidence Interval
CI = mean ± [(z-score for the confidence level) × standard error]
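As a sketch of the formula, the z-score for a given confidence level can be taken from `statistics.NormalDist().inv_cdf` (Python 3.8+). The sample and the 95% level are assumptions for illustration, and the sample SD is used in place of σ; for a sample this small a t-based critical value would normally be preferred over z.

```python
import math
import statistics

# Hypothetical sample and confidence level, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
confidence = 0.95

# Two-sided z-score: leaves (1 - confidence) / 2 in each tail (about 1.96 for 95%).
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))

# CI = mean +/- z * standard error
lower = mean - z * sem
upper = mean + z * sem

print(round(lower, 3), round(upper, 3))
```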
Z Scores
A z-score is a unit of measure equal to the number of standard deviations a value lies from the mean.
Z = (Data Point − Mean) ÷ Standard Deviation
E.g. if z = 1.79, the data point is 1.79 σ above the mean.
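The z-score formula translates directly into a small function. The name `z_score` and the data set are hypothetical, and the population standard deviation is used here as an assumption.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)     # 5
sigma = statistics.pstdev(data)  # 2.0

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

print(z_score(9, mean, sigma))  # 2.0
print(z_score(4, mean, sigma))  # -0.5
```

A negative z-score simply means the value lies below the mean.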
Name | Description |
---|---|
Range | Max - Min value |
Percentile | Divides the data into 100 equal parts using 99 points |
Decile | Divides the data into 10 equal parts using 9 points |
Quartile | Divides the data into 4 equal parts using 3 points (25%, 50%, 75%). The median is the 2nd quartile (50%). Interquartile range (IQR): the span from the 25% point to the 75% point, i.e. Q3 − Q1 |
A boxplot uses a limit of 1.5 × IQR (interquartile range) beyond the quartiles for its whiskers; points outside that limit are identified as outliers.
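The quartiles and the 1.5 × IQR rule can be sketched with `statistics.quantiles` (Python 3.8+). The data set, including the extreme value 30, is hypothetical. Quartile conventions differ between tools, so other software may report slightly different cut points than the default "exclusive" method used here.

```python
import statistics

# Hypothetical data set; 30 is added as an obvious extreme value.
data = [2, 4, 4, 4, 5, 5, 7, 9, 30]

# Three cut points split the data into four parts (default method="exclusive").
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Whisker limits: 1.5 * IQR below Q1 and above Q3.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(q1, q2, q3)                # 4.0 5.0 8.0
print(lower_limit, upper_limit)  # -2.0 14.0
print(outliers)                  # [30]
```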