Stata and R compute percentiles differently. Let us load the `auto` dataset and compute the 75th percentile of `price` using Stata’s `centile`:

    . sysuse auto, clear
    (1978 Automobile Data)

    . centile price, centile(75)

                                                           -- Binom. Interp. --
        Variable │       Obs  Percentile    Centile        [95% Conf. Interval]
    ─────────────┼─────────────────────────────────────────────────────────────
           price │        74         75        6378        5798.432      9691.6

    . save auto, replace
    file auto.dta saved

We find that the 75th percentile is 6378.

Now let us do the same with R. We’ll use the `haven` library to read the Stata file we just saved:

    > library(haven)
    > auto <- read_dta("auto.dta")
    > q <- quantile(auto$price, 0.75); q
        75%
    6332.25

According to R, the 75th percentile is 6332.25.

It turns out that R's `quantile` function implements nine types of quantiles, and the default is type 7. To get the same result as `centile`, specify type 6, which gives 6378.
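To see where the two answers come from, the interpolation can be worked by hand. The sketch below assumes the definitions as implemented in R: with n observations and probability p, type 7 interpolates at position (n - 1)p + 1 of the sorted data, and type 6 at position (n + 1)p. The helper `interp_at` is hypothetical, and the three order statistics (6303, 6342, 6486) are inferred from the quartile values reported in this note, not read from the dataset.

```r
# Sorted prices at positions 55, 56 and 57 (of n = 74). These values are
# inferred from the quartiles reported in this note, not read from auto.dta.
x <- c(`55` = 6303, `56` = 6342, `57` = 6486)

# Linear interpolation between adjacent order statistics at position h.
# interp_at is an illustrative helper, not part of base R.
interp_at <- function(h) {
  lo <- floor(h)
  frac <- h - lo
  x_lo <- x[[as.character(lo)]]
  if (frac == 0) return(x_lo)
  x_lo + frac * (x[[as.character(lo + 1)]] - x_lo)
}

n <- 74
p <- 0.75

h7 <- (n - 1) * p + 1   # position 55.75: R's default (type 7)
h6 <- (n + 1) * p       # position 56.25: type 6, matching Stata's centile

q7 <- interp_at(h7)     # 6303 + 0.75 * (6342 - 6303)
q6 <- interp_at(h6)     # 6342 + 0.25 * (6486 - 6342)
print(c(type7 = q7, type6 = q6))
```

The two methods use the same pair-wise linear interpolation; they differ only in the position at which they interpolate, which here falls on either side of the 56th sorted value.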

The Stata commands `summarize, detail`, `xtile`, `pctile` and `_pctile` use yet another method, equivalent to R’s type 2. These give the third quartile as 6342. The last three commands have an `altdef` option that gives the same answer as `centile`.
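The three definitions can be compared side by side in R without the dataset. The vector below is a stand-in, not the real `price` data: it has 74 values, of which only positions 55 to 57 (6303, 6342 and 6486, inferred from the results quoted above) matter for the 75th percentile, and the padding values are arbitrary. Under that assumption, `quantile` should reproduce all three answers.

```r
# Stand-in for the sorted price vector: 74 values, of which only
# positions 55-57 affect the 75th percentile. The padding is arbitrary.
x <- c(rep(3000, 54), 6303, 6342, 6486, rep(15000, 17))
stopifnot(length(x) == 74)

# Type 2 (summarize, xtile, pctile, _pctile), type 6 (centile, altdef)
# and type 7 (R's default), all evaluated at p = 0.75.
q <- sapply(c(type2 = 2, type6 = 6, type7 = 7),
            function(t) quantile(x, 0.75, type = t, names = FALSE))
print(q)
```

With the real data, the corresponding call is `quantile(auto$price, 0.75, type = 2)`, and likewise for the other types.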

For a discussion of these methods see Hyndman, R. J. and Fan, Y. (1996) Sample quantiles in statistical packages, *American Statistician* 50:361-365.