Germán Rodríguez
Introducing R Princeton University

6 Conclusion

These notes have hardly scratched the surface of R, which has many more statistical functions. These include functions to calculate the density, cdf, and inverse cdf of distributions such as chi-squared, t, F, lognormal, logistic and others. The survival library includes methods for the estimation of survival curves, tests of differences between survival curves, and Cox proportional hazards models. The library lme4 includes code for fitting generalized linear mixed effect models, including multilevel models. Many new statistical procedures are first made available to the research community in the form of R functions.
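As a brief illustration of the distribution functions mentioned above, here is a minimal sketch using the chi-squared and t distributions. Base R names these functions with a one-letter prefix: d for the density, p for the cdf and q for the inverse cdf. The particular values of x and the degrees of freedom are arbitrary choices made for the example.

```r
# Density, cdf and inverse cdf of a chi-squared distribution with 1 d.f.
x <- 3.84                          # arbitrary evaluation point
dens <- dchisq(x, df = 1)          # density at x
prob <- pchisq(x, df = 1)          # P(X <= x), the cdf
crit <- qchisq(0.95, df = 1)       # 95th percentile, about 3.84

# The cdf and the inverse cdf undo each other
round_trip <- pchisq(crit, df = 1) # recovers 0.95 up to rounding

# The same naming scheme applies to other distributions, e.g. Student's t
t_crit <- qt(0.975, df = 20)       # two-sided 5% critical value, 20 d.f.

print(c(dens = dens, prob = prob, crit = crit, t_crit = t_crit))
```

The functions for the F, lognormal and logistic distributions follow the same pattern, with the roots f, lnorm and logis.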

In addition, R is a full-fledged programming language, with a rich complement of mathematical functions, matrix operations and control structures. If you would like to have a function to compute logits, for example, you can write one just like this:

logit <- function(p) {
	log(p/(1-p))
}

This function takes as argument a vector of proportions and returns the logits. (The last quantity calculated in a function is returned by default.) Of course this is a very primitive version, because there is no argument checking. A somewhat better version is this:

logit <- function(p) {
	if (!is.numeric(p) || any(p<0) || any(p>1)) 
		stop("argument must be probabilities between 0 and 1")
	log(p/(1-p))
}

The function any(), called with a logical vector, returns TRUE if any element of the vector is TRUE. Of course a value may be in the range (0,1) but so close to either extreme that calculation of the logit could fail; bullet-proofing the function would require more sophisticated code, but the version above is serviceable. For production code use the built-in function qlogis(), which returns quantiles of the standard logistic distribution. The inverse function, going from logits to probabilities, is plogis(); type ?qlogis for details.
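The following sketch shows the built-in functions at work. Here qlogis() computes logits and plogis(), the cdf of the standard logistic distribution, maps them back to probabilities. The vector p and the value -800 are arbitrary choices for illustration; the last lines use the log.p argument of qlogis() as one possible way to handle probabilities too small to represent directly.

```r
# Logits with the built-in quantile function of the standard logistic
p <- c(0.1, 0.5, 0.9)              # arbitrary example probabilities
z <- qlogis(p)                     # same as log(p / (1 - p))

# plogis() is the logistic cdf, so it maps logits back to probabilities
back <- plogis(z)
print(all.equal(back, p))

# At the boundaries the logit is infinite rather than an error
edge <- qlogis(c(0, 1))            # -Inf and Inf

# A probability such as exp(-800) underflows to zero in double precision,
# so the naive formula returns -Inf; passing the log of the probability
# with log.p = TRUE keeps the result finite
naive  <- log(exp(-800) / (1 - exp(-800)))
stable <- qlogis(-800, log.p = TRUE)
print(c(naive = naive, stable = stable))
```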

R is an interpreted language but it is reasonably fast, particularly if you take advantage of the fact that operations are vectorized and try to avoid looping. Where efficiency is crucial you can always write a function in a compiled language such as C or Fortran and then call it from R. Some of my work on multilevel generalized linear models uses this approach. To learn more about programming in R, read Venables and Ripley (2000), Chambers (2008), and the manual on Writing R Extensions that comes with the R distribution.
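To make the point about vectorization concrete, here is a small sketch that computes logits with an explicit loop and with a single vectorized expression. The function names logit_loop and logit_vec, the seed and the vector length are hypothetical choices for this example. Timings depend on the machine and the R version, so none are quoted.

```r
set.seed(1)                        # arbitrary seed, for reproducibility
p <- runif(1e5)                    # 100,000 simulated proportions

# Explicit loop: one element at a time
logit_loop <- function(p) {
  out <- numeric(length(p))        # preallocate the result
  for (i in seq_along(p)) {
    out[i] <- log(p[i] / (1 - p[i]))
  }
  out
}

# Vectorized: arithmetic and log() operate on the whole vector at once
logit_vec <- function(p) log(p / (1 - p))

# Both versions give the same answer
same <- isTRUE(all.equal(logit_loop(p), logit_vec(p)))
print(same)

# Compare elapsed times on your own machine
print(system.time(logit_loop(p)))
print(system.time(logit_vec(p)))
```

The vectorized version is also shorter and easier to read, which is often reason enough to prefer it.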

References

Becker, Richard A. and John M. Chambers (1984). S: An Interactive Environment for Data Analysis and Graphics. Wadsworth, CA.

Becker, Richard A., John M. Chambers and Allan R. Wilks (1988). The New S Language. Chapman & Hall, London.

Braun, W. John and Duncan J. Murdoch (2007). A First Course in Statistical Programming with R. Cambridge University Press, Cambridge.

Chambers, John M. (1998). Programming with Data. Springer, New York.

Chambers, John M. (2008). Software for Data Analysis: Programming with R. Springer, New York.

Chambers, John M. (2016). Extending R. Chapman & Hall/CRC, Boca Raton, FL.

Chambers, John M. and Trevor J. Hastie, Editors (1992). Statistical Models in S. Chapman & Hall, London.

Dalgaard, Peter (2008). Introductory Statistics with R. Second Edition. Springer, New York.

Everitt, Brian and Torsten Hothorn (2006). A Handbook of Statistical Analyses Using R. Chapman & Hall/CRC, Boca Raton, FL.

Fox, John (2002). An R and S-Plus Companion to Applied Regression. Sage Publications, Thousand Oaks, CA.

Murrell, Paul (2005). R Graphics. Chapman & Hall/CRC, Boca Raton, FL.

Pinheiro, Jose C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-Plus. Springer, New York.

Therneau, Terry M. and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Statistics for Biology and Health. Springer, New York.

Venables, William N. and Brian D. Ripley (2000). S Programming. Springer, New York.

Venables, William N. and Brian D. Ripley (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. (Earlier editions published in 1994, 1997 and 1999.)