# Measures of Dispersion

## Histograms

In addition to the basic summary statistics described above, we can also describe the data graphically. For continuous data, the two most common graphs are the histogram and the boxplot. The green histogram below displays the diameter at breast height (DBH) measurements of 250 randomly selected loblolly pines in (imaginary) tract A in the Duke Forest. The histogram shows no obvious skew; however, the green boxplot below suggests that the distribution of DBH values is positively skewed, with four outliers.

*Figure: A Right (Positively) Skewed Distribution*

*Figure: Tracts A and B*
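To make the idea of a histogram concrete, here is a minimal Python sketch that sorts values into equal-width bins and prints a text histogram. The data are simulated (250 draws from a lognormal distribution, chosen only because it is right-skewed) and are not the Duke Forest measurements; the function name `histogram_counts`, the number of bins, and the distribution parameters are illustrative assumptions.

```python
import random

def histogram_counts(values, n_bins):
    """Split the range of values into n_bins equal-width bins and count each bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: a single bin holds everything.
        return [lo, hi], [len(values)]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Bin index for v; the maximum value is placed in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + j * width for j in range(n_bins + 1)]
    return edges, counts

# Hypothetical DBH-like values: simulated, right-skewed, NOT real tract data.
rng = random.Random(42)
dbh = [rng.lognormvariate(3.0, 0.35) for _ in range(250)]

edges, counts = histogram_counts(dbh, n_bins=10)
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} | {'#' * c}")
```

The bin width is the main design choice: too few bins can hide skew, and too many make the shape noisy, which is one reason a histogram and a boxplot of the same data can give different impressions.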

Boxplots are another way to display the distribution of data points. In the boxplots below, one can easily see the median (the solid line in the middle of the box) and the 25th and 75th percentiles (the bottom and top edges of the box, respectively). The IQR (interquartile range) is the distance between these edges. The whiskers extend out from either edge of the box. In this case (the convention can vary by software and by the statistician's preference), each whisker ends at the most extreme data point that still lies within the fences, which are defined as the 75th percentile + 1.5 × IQR on the high end and the 25th percentile − 1.5 × IQR on the low end of the distribution. Any point outside the fences is identified as an outlier. Four outlier DBH measurements are identified in Tract A. Because the data are positively skewed, we would expect the mean to be greater than the median in each of these tracts. The descriptive (summary) statistics are shown below the boxplots.

As we can see in the figure above, four of the ten distributions contain outliers according to the 1.5 × IQR rule, even though the points were randomly drawn from a normal distribution. This suggests that outliers can occur purely through random chance.
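The fence rule and the ten-sample experiment can both be written out directly. The following minimal Python sketch, assuming Python 3.8+ for `statistics.quantiles`, computes the fences, the whisker ends, and the flagged outliers, then draws ten simulated normal samples and counts how many contain points beyond the fences. The helper names, the sample size of 50, and the random seed are hypothetical choices. Quartiles are computed with the `"inclusive"` method, which is only one of several conventions, so other software may flag slightly different points.

```python
import random
import statistics

def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(n=4) returns the three quartile cut points [Q1, Q2, Q3].
    # "inclusive" is one common convention; other software may differ slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, k=1.5):
    """Return the points lying strictly outside the fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

def whisker_ends(values, k=1.5):
    """Whiskers end at the most extreme data points still inside the fences."""
    lower, upper = tukey_fences(values, k)
    inside = [v for v in values if lower <= v <= upper]
    return min(inside), max(inside)

# Small hand-checkable example with one extreme value (hypothetical data).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(data))   # (-3.5, 14.5)
print(whisker_ends(data))   # (1, 9)
print(find_outliers(data))  # [100]
print(statistics.mean(data), statistics.median(data))  # 14.5 5.5

# Ten samples from a standard normal distribution: points can be flagged
# as outliers even though nothing unusual generated them.
rng = random.Random(2024)
samples = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(10)]
n_flagged = sum(1 for s in samples if find_outliers(s))
print(f"{n_flagged} of 10 normal samples contain 1.5*IQR outliers")
```

In the small example, the single extreme value pulls the mean (14.5) well above the median (5.5), the same pattern expected for positively skewed data. The count from the normal simulation changes with the seed and sample size, which is the point: flagged outliers can appear by chance alone.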