Quantiles in Stata and R

Stata and R compute percentiles differently by default. Let us load the auto dataset and compute the 75th percentile of price using Stata’s centile command:

. sysuse auto, clear
(1978 Automobile Data)

. centile price, centile(75)

                                                       -- Binom. Interp. --
    Variable │       Obs  Percentile    Centile        [95% Conf. Interval]
─────────────┼─────────────────────────────────────────────────────────────
       price │        74         75        6378        5798.432      9691.6

. save auto, replace
file auto.dta saved

We find that the 75th percentile is 6378.

Now let us do the same with R. We’ll use the haven package to read the Stata file we just saved:

> library(haven)
> auto <- read_dta("auto.dta")
> q <- quantile(auto$price, 0.75); q
    75% 
6332.25 

According to R, the 75th percentile is 6332.25.

It turns out that R's quantile function implements nine types of quantiles, and the default is type 7. To get the same result as centile, specify type 6, which gives 6378.
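
The two types differ only in where they place the quantile among the sorted values. As a rough sketch, assuming a made-up vector x (not the auto prices) and a hypothetical helper interp, type 6 interpolates at position (n+1)p and type 7 at position 1+(n-1)p:

```r
# Toy data for illustration only; these are not the auto prices.
x <- sort(c(3, 1, 4, 1, 5, 9, 2, 6))
n <- length(x)
p <- 0.75

# Hypothetical helper: linear interpolation between the order
# statistics on either side of the fractional position h.
interp <- function(x, h) {
  h <- min(max(h, 1), length(x))  # keep the position inside 1..n
  lo <- floor(h)
  hi <- ceiling(h)
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

type6 <- interp(x, (n + 1) * p)      # position used by type 6
type7 <- interp(x, 1 + (n - 1) * p)  # position used by type 7 (R's default)

# Both hand computations should agree with quantile().
stopifnot(isTRUE(all.equal(type6, unname(quantile(x, p, type = 6)))))
stopifnot(isTRUE(all.equal(type7, unname(quantile(x, p, type = 7)))))
```

With the auto data the corresponding call is quantile(auto$price, 0.75, type = 6).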

The Stata commands summarize (with the detail option), xtile, pctile and _pctile use yet another method, equivalent to R’s type 2. These give the third quartile as 6342. The last three commands (xtile, pctile and _pctile) have an altdef option that gives the same answer as centile.
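
Type 2 does not interpolate. When np is a whole number it averages the two neighbouring order statistics, and otherwise it takes the next order statistic up. For price, np = 74 × 0.75 = 55.5 is not a whole number, so the rule picks the 56th sorted value. A minimal sketch of the rule, assuming a made-up vector x and a hypothetical helper type2:

```r
# Toy data for illustration only; these are not the auto prices.
x <- sort(c(3, 1, 4, 1, 5, 9, 2, 6))

# Hypothetical helper implementing the "average at discontinuities" rule.
type2 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- n * p
  if (abs(h - round(h)) < 1e-9) {
    # np is a whole number: average the two neighbouring order statistics
    j <- round(h)
    lo <- max(j, 1)
    hi <- min(j + 1, n)
    (x[lo] + x[hi]) / 2
  } else {
    # otherwise take the next order statistic up
    x[ceiling(h)]
  }
}

by_hand  <- c(type2(x, 0.75), type2(x, 0.30))
built_in <- unname(quantile(x, c(0.75, 0.30), type = 2))

# The hand computation should agree with quantile(type = 2).
stopifnot(isTRUE(all.equal(by_hand, built_in)))
```

With the auto data the corresponding call is quantile(auto$price, 0.75, type = 2).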

For a discussion of these methods, see Hyndman, R. J. and Fan, Y. (1996). Sample quantiles in statistical packages. The American Statistician, 50:361-365.