Statistics for Business Intelligence – Shape

In this post I will discuss the measures of shape used in statistical analysis, specifically skewness and kurtosis. I will also discuss the box and whisker plot.

Skewness – A normal distribution is a bell curve that is perfectly symmetric. Perfect symmetry implies that the values are distributed equally around the center. A distribution is said to be skewed if it is not symmetrical. It may be skewed to the right (positively skewed, with the longer tail on the right) or to the left (negatively skewed, with the longer tail on the left).

Image Source – Wikipedia
A normal distribution has its mean, median and mode at the center of the distribution and has no skewness.
Skewness can be quantified by a measure known as the Pearsonian coefficient of skewness.

If the coefficient is positive, the plot is positively skewed; if it is negative, the plot is negatively skewed. The larger the absolute value, the greater the skewness.
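To make the coefficient concrete, here is a minimal sketch in Python. It assumes the Pearsonian coefficient meant here is the common median-based form, 3 * (mean - median) / standard deviation. The function name and the sample lists are made up for illustration.

```python
import statistics


def pearson_skewness(values):
    """Median-based Pearsonian coefficient: 3 * (mean - median) / std dev.

    Assumption: the population standard deviation is used; swap in
    statistics.stdev if the data is a sample.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.pstdev(values)
    return 3 * (mean - median) / std_dev


# Illustrative data only (not from a real data set).
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10, 20]   # a few large values pull the mean up
left_tailed = [-v for v in right_tailed]       # mirror image of the list above
symmetric = [1, 2, 3, 4, 5]                    # mean equals median

for name, data in [("right_tailed", right_tailed),
                   ("left_tailed", left_tailed),
                   ("symmetric", symmetric)]:
    print(name, round(pearson_skewness(data), 3))
```

The sign is what matters for reading the shape: the mean sits above the median when the long tail is on the right, and below it when the long tail is on the left.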

Kurtosis :
Kurtosis describes how peaked the plot is. If the plot is tall and thin, it is referred to as leptokurtic. If it is flat and spread out, it is called platykurtic. Plots in between, such as the normal distribution, are called mesokurtic.
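Kurtosis can also be put into a number. The sketch below assumes the usual moment-based definition (fourth central moment divided by the squared variance) and subtracts 3 so that a normal distribution scores roughly zero. The function name and the sample data are hypothetical.

```python
import statistics


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population moments).

    Roughly: positive suggests leptokurtic, negative suggests platykurtic,
    and values near zero suggest mesokurtic.
    """
    mean = statistics.mean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


# Illustrative data only.
flat = list(range(1, 11))                        # evenly spread values
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]       # most values at the center

print("flat:", round(excess_kurtosis(flat), 3))
print("peaked:", round(excess_kurtosis(peaked), 3))
```

Small samples give noisy kurtosis values, so treat the sign as a rough guide rather than a firm classification.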

Image Source : http://www.uwsp.edu/

Box and Whisker plots : These plots are widely used to understand how data is distributed. To understand them we first need to understand quartiles. Quartiles break the data into four equal parts, i.e. there are three quartiles in the complete data range. Note that the data needs to be sorted in ascending order to calculate the quartiles. The first quartile (Q1) is the value below which the lowest 25% of the data lies, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data lies. Here’s an example of the box and whisker plot

The characteristics of the plot are
1) The median, or Q2, is drawn as a line inside the box.
2) The left end of the box is Q1 and the right end is Q3, i.e. the middle 50% of values are inside the box.
3) The line segment outside the box is called a whisker. A whisker extends at most 1.5 IQR (interquartile range = Q3 - Q1) beyond the box, ending at the most extreme data value within that distance. The limits Q1 - 1.5 IQR and Q3 + 1.5 IQR are called the inner fences. If data is present outside the inner fences, outer fences can be drawn at Q1 - 3 IQR and Q3 + 3 IQR.
4) Values outside the inner fences but inside the outer fences are called mild outliers, whereas values outside the outer fences are called extreme outliers.
5) If the median is toward the right side of the box, then the middle 50% of the data is skewed to the left.
6) If the longer whisker is to the right of the box, then the outer data are skewed to the right.
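
The quartiles, fences and outlier rules above can be sketched in a few lines of Python. This is only an illustration: the function name and the data are made up, and it relies on the default quartile method of statistics.quantiles, which is one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def box_plot_summary(values):
    """Compute quartiles, IQR, fences and outliers for a box and whisker plot."""
    # statistics.quantiles sorts the data and returns the three cut points.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    # Mild outliers: outside the inner fences but not beyond the outer fences.
    mild = [v for v in values
            if outer_low <= v < inner_low or inner_high < v <= outer_high]
    # Extreme outliers: beyond the outer fences.
    extreme = [v for v in values if v < outer_low or v > outer_high]

    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "inner_fences": (inner_low, inner_high),
        "outer_fences": (outer_low, outer_high),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }


# Illustrative data only.
data = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 60]
summary = box_plot_summary(data)
for key, value in summary.items():
    print(key, value)
```

With this sample data and this quartile convention, 30 lands between the inner and outer fences (a mild outlier) and 60 lies beyond the outer fence (an extreme outlier).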
