2uData.com

Quantiles

Quantiles (also known as fractiles) are used to determine the relative position of a value with respect to the other values in a numerical set.

Consider an ascending ordered series (AOS). Quantiles are the values that divide this series into "equal groups", that is, groups containing an equal, or approximately equal, number of elements. The median, which splits the series into two such groups, is therefore a type of quantile.

Besides the median, the most popular quantiles are percentiles and quartiles.

Percentiles

 

Percentiles are the values dividing an AOS into 100 equal groups. The `p^(th)` percentile of an AOS, denoted `P_p`, is a value such that (Fig. 1):

  • at least `p%` of the elements of the series are less than or equal to `P_p`,
  • at least `(100-p)%` of the elements of the series are greater than or equal to `P_p`,
  • `P_p` may or may not be an element of the series.
[Figure: `p%` of elements | `P_p` | `(100-p)%` of elements]

Fig. 1 `p^(th)` percentile of an ascending ordered series

Note that the median is the same as the `50^(th)` percentile.

In practice, to determine the `p^(th)` percentile of an AOS of `n` elements, we:

  • calculate `k=(np)/100`,
  • if `k` is an integer, `P_p` is the average of the `k^(th)` and `(k+1)^(th)` values of the series,
  • if `k` is not an integer, `P_p` is the `m^(th)` value of the series, where `m` is the smallest integer greater than `k` (that is, `k` rounded up).
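The three steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the series is already sorted in ascending order and that `p` is an integer between 1 and 99, and the function name `percentile` is chosen here for illustration only. Other conventions exist; NumPy's `numpy.percentile`, for example, interpolates linearly between values by default, so it can return a different result for the same data.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile of an ascending ordered series (AOS).

    Assumes sorted_values is already in ascending order and p is an
    integer with 0 < p < 100. Positions in the text are 1-based, so the
    k-th value is sorted_values[k - 1].
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("the series must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must satisfy 0 < p < 100")

    # k = n*p/100 is an integer exactly when n*p is a multiple of 100;
    # testing this with integers avoids floating-point rounding issues.
    if (n * p) % 100 == 0:
        k = n * p // 100
        # Average of the k-th and (k+1)-th values.
        return (sorted_values[k - 1] + sorted_values[k]) / 2

    # Otherwise take the m-th value, where m is k rounded up
    # (integer ceiling of n*p/100).
    m = (n * p + 99) // 100
    return sorted_values[m - 1]


series = list(range(1, 248))               # an AOS of 247 numbers: 1, 2, ..., 247
print(percentile(series, 80))              # 198 (k = 197.6, so the 198th number)
print(percentile(list(range(1, 11)), 50))  # 5.5 (k = 5: average of 5th and 6th)
```

Using integer arithmetic for the test "is `k` an integer?" keeps the branch choice exact even when `n*p/100` would not be represented exactly as a floating-point number.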

Example: What is the `80^(th)` percentile of an AOS consisting of 247 numbers?

  • `k=(247xx80)/100=197.6`
  • Because `k` is not an integer, the `80^(th)` percentile of this AOS is the `198^(th)` number (`k` rounded up).

Quartiles

 

Quartiles, denoted `Q_1`, `Q_2` and `Q_3`, are the three values dividing an AOS into 4 equal groups (Fig. 2).

[Figure: `25%` of elements | `Q_1` | `25%` of elements | `Q_2` | `25%` of elements | `Q_3` | `25%` of elements]

Fig. 2 Quartiles

Hence `Q_1` is the `25^(th)` percentile, `Q_2` is the `50^(th)` percentile (the median), and `Q_3` is the `75^(th)` percentile.

So, for the AOS of 247 numbers in the previous example:

  • `Q_1` is the `62^(th)` number (`k=(247xx25)/100=61.75`),
  • `Q_2`, the median, is the `124^(th)` number (`k=123.5`),
  • `Q_3` is the `186^(th)` number (`k=185.25`).
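The quartile positions listed above can be checked with a short helper. It is a sketch assuming the percentile rule from the Percentiles section and an integer `p`; `percentile_positions` is a hypothetical name used only for this illustration. It returns the 1-based position(s) rather than values, so it works from the series length alone.

```python
def percentile_positions(n, p):
    """Hypothetical helper: 1-based position(s) in an AOS of n elements
    that give the p-th percentile under the rule above (integer p)."""
    if (n * p) % 100 == 0:
        k = n * p // 100
        return (k, k + 1)          # k is an integer: average these two values
    return ((n * p + 99) // 100,)  # otherwise: k rounded up


# Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles.
for name, p in (("Q1", 25), ("Q2", 50), ("Q3", 75)):
    print(name, percentile_positions(247, p))
# Q1 (62,)
# Q2 (124,)
# Q3 (186,)
```

When `n` is a multiple of 4, `k` is an integer for all three quartiles, and each quartile is the average of two neighbouring values.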

Quartiles express both the central tendency (through the median `Q_2`) and the degree of dispersion (through the spread between `Q_1` and `Q_3`) of a series, so they are widely used in data description and data analysis.


Box plot

 

A box plot is a visualization of quartiles and is widely used to illustrate the distribution of a variable. It can be drawn horizontally or vertically. Fig. 3 shows a horizontal box plot of 150 numbers drawn randomly from the range [0, 1000].

[Figure: horizontal box plot marking min, `Q_1`, `M`, `Q_3`, max, a whisker, outliers, `IQR` and `R`]

Fig. 3 Components of a box plot

A box plot consists of:

  • a rectangle (the box) that starts at `Q_1` and ends at `Q_3`; the distance between these two values is the interquartile range, `IQR=Q_3-Q_1`,
  • inside this rectangle, a line at the median `M`,
  • two lines, usually called whiskers, extending from the two ends of the rectangle,
  • values beyond the whiskers (if any), which are considered outliers.

There are several conventions for the outer ends of the whiskers; the most popular are:

  • the lowest value `x_min` and the highest value `x_max` (in this case, there are no outliers),
  • the data values farthest from the corresponding quartile that still lie within a distance `R` of it (as in Fig. 3).

`R` is the maximum length of the whiskers. It is the product of `IQR` and a whisker coefficient, commonly 1.5, which gives `R=1.5 xx IQR`.
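The box-plot components above can be computed as in the following sketch. It assumes the percentile rule from the Percentiles section for the quartiles and the second whisker convention (farthest values within `R` of `Q_1` and `Q_3`); `box_plot_stats`, `coef` and the dictionary keys are hypothetical names chosen for illustration. Plotting libraries may compute the quartiles with a different percentile rule; matplotlib's `boxplot`, for instance, has a `whis` parameter that defaults to 1.5, but its box edges can differ slightly from the values below.

```python
def percentile(sorted_values, p):
    """p-th percentile of an AOS (integer p, 0 < p < 100), using the rule
    from the Percentiles section: average of the k-th and (k+1)-th values
    if k = n*p/100 is an integer, otherwise the value at k rounded up."""
    n = len(sorted_values)
    if (n * p) % 100 == 0:
        k = n * p // 100
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[(n * p + 99) // 100 - 1]


def box_plot_stats(values, coef=1.5):
    """Hypothetical helper: quartiles, IQR, whisker ends and outliers for a
    box plot whose whiskers end at the farthest values within R of Q1/Q3."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    q1, m, q3 = (percentile(xs, p) for p in (25, 50, 75))
    iqr = q3 - q1                 # width of the box
    r = coef * iqr                # maximum whisker length R
    low, high = q1 - r, q3 + r    # values outside [low, high] are outliers
    inside = [x for x in xs if low <= x <= high]
    return {
        "Q1": q1, "M": m, "Q3": q3, "IQR": iqr, "R": r,
        "whisker_low": inside[0],    # smallest non-outlier value
        "whisker_high": inside[-1],  # largest non-outlier value
        "outliers": [x for x in xs if x < low or x > high],
    }


stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats["whisker_low"], stats["whisker_high"], stats["outliers"])  # 1 9 [100]
```

Here `Q_1=3`, `Q_3=8`, so `IQR=5` and `R=7.5`; 100 lies more than `R` above `Q_3` and is reported as an outlier. Under the first convention (whiskers at `x_min` and `x_max`), the whisker ends would simply be `xs[0]` and `xs[-1]`, with no outliers.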





This web page was last updated on 01 December 2018.