Measures of Dispersion/Spread


While central tendency represents a range of numbers with a single value (except in some cases), the measures of dispersion shows us how evenly the data are spread.

Standard Deviation (ฯƒ)

Standard Deviation, put simply shows how much the data deviates from the mean. It represents how far the data are dispersed from the mean. How much the members of a group differ from the mean value for the group.

Advantages

  • Same unit as of the data. This makes it much more interpretable.

How To

ฯƒ (population) | s (sample) | Unit = same as data

Variance (ฯƒ2)

Variance is another measure Indicates how spread out the data are. Both Standard Deviation and Variance measure the same thing and Variance is just SD squared.

Advantages

  • Variances add up eg variance(x + y) = variance(x) + variance(y)

How To

  • Total Error = Sum of deviances from the mean = โˆ‘ (xi - xฬ„)
  • Sum of squared error (SSE) = โˆ‘ (xi - xฬ„)2
  • Variance = SSE โž— (n-1) for estimating population
  • Variance = SSE โž— (n) for sample

Standard Error (of the mean) (SEM)

(using the sample means to estimate the population mean) is the standard deviation of all sample means (of a given size). [see Central Limit Theorem]

ฯƒ รท โˆšn

Confidence Interval

CI = mean ยฑ [(z-scores for confidence level)*standard error]

Z Scores

z is a unit of measure that is equivalent to the number of standard deviations a value is away from the mean value.

Z = (Data Point - Mean) / Standard Deviation Eg : if z = 1.79 , then it is 1.79 ฯƒ away from the mean*


Name Spread
Range Max - Min value
Percentile Divides the data into 100 equal parts using 99 points
Decile Divides the data into 10 equal parts using 9 points
Quartile Divides the data into 4 equal parts using 3 points (25%, 50%, 75%) ย 
Median is the 2nd quartile (50%) . Interquartile range : 25% - 75%

Boxplot uses a limit of 1.5 IQR (Inter-Quartile Range) at whiskers to identify outliers