6 Conclusion
These notes have hardly scratched the surface of R, which has many more
statistical functions. These include functions to calculate the
density, cdf, and inverse cdf of distributions such as chi-squared, t, F,
lognormal, logistic and others. The survival library includes
methods for the estimation of survival curves, tests of differences
between survival curves, and Cox proportional hazards models. The
library nlme includes code for fitting linear mixed-effects
models (including multilevel models) to normally distributed data.
Many new statistical procedures are first made available to the research
community in the form of S-Plus and R functions.
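As a brief illustration of the distribution functions mentioned above, the sketch below relies on R's naming convention, in which the prefixes d, p and q give the density, cdf and inverse cdf of a distribution. The degrees of freedom, evaluation points and variable names are arbitrary choices made only for illustration.

```r
# Density, cdf and inverse cdf (quantile function) of a chi-squared
# distribution with 3 degrees of freedom (an arbitrary choice).
x <- 2.5
dens <- dchisq(x, df = 3)      # density at x
prob <- pchisq(x, df = 3)      # P(X <= x)
back <- qchisq(prob, df = 3)   # inverse cdf recovers x

# The same prefixes work for other distributions:
t.crit <- qt(0.975, df = 10)                            # upper 2.5% point of t, 10 d.f.
f.tail <- pf(4, df1 = 2, df2 = 20, lower.tail = FALSE)  # P(F > 4)
ln.med <- qlnorm(0.5, meanlog = 0, sdlog = 1)           # median of a lognormal
lg.cdf <- plogis(0)                                     # logistic cdf at 0

print(c(density = dens, cdf = prob, quantile = back))
```

Applying the inverse cdf to the cdf returns the original point, which is a quick way to check that a pair of p and q functions is being called with matching parameters.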
In addition, R is a full-fledged programming language, with a rich complement of mathematical functions, matrix operations and control structures. If you would like to have a function to compute logits, for example, you can write one just like this:
logit <- function(p) {
    log(p/(1-p))
}
This function takes as argument a vector of proportions and returns the logits. (The last quantity calculated in a function is returned by default.) Of course this is a very primitive version, because there is no argument checking. A somewhat better version is this:
logit <- function(p) {
    if (!is.numeric(p) || any(p<0) || any(p>1))
        stop("argument must be probabilities between 0 and 1")
    log(p/(1-p))
}
The function any called with a logical vector returns
TRUE if any element of the vector is TRUE. Of course a value may be
in the range (0,1) but so close to either extreme that calculation of
the logit could fail; bullet-proofing the function would require more
sophisticated code, but the version above is serviceable.
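One possible safeguard, offered as a sketch and not as something these notes prescribe, is to pull the probabilities slightly away from 0 and 1 before taking the logit, so that the result is always finite. The function name safe.logit and the tolerance eps are hypothetical choices made for illustration. Note that base R already provides qlogis, which computes the plain logit.

```r
# Sketch of a logit that stays finite at the extremes.
# The name safe.logit and the tolerance eps are illustrative assumptions.
safe.logit <- function(p, eps = .Machine$double.eps) {
    if (!is.numeric(p) || any(p < 0 | p > 1, na.rm = TRUE))
        stop("argument must be probabilities between 0 and 1")
    p <- pmin(pmax(p, eps), 1 - eps)   # pull values away from exactly 0 and 1
    log(p / (1 - p))
}

p <- c(0, 0.25, 0.5, 0.75, 1)
log(p / (1 - p))   # unprotected version gives -Inf and Inf at the ends
safe.logit(p)      # finite everywhere
```

Whether clamping is appropriate depends on the application: it trades exact infinities for large finite values, which may or may not be what a later calculation expects.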
R is an interpreted language, but it is reasonably fast, particularly if you take advantage of the fact that operations are vectorized and try to avoid looping. Where efficiency is crucial you can always write a function in a compiled language such as C or Fortran and then call it from R. Some of my work on multilevel generalized linear models uses this approach. To learn more about programming in R, read Venables and Ripley (2000), Chambers (2008), and the manual on Writing R Extensions that comes with the R distribution.
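To make the point about vectorization concrete, the sketch below computes the same logits with an explicit loop and with a single vectorized expression. The function names loop.logit and vec.logit, the vector length and the seed are all hypothetical choices; the timings reported by system.time will vary from machine to machine.

```r
# Compare an explicit loop with the equivalent vectorized expression.
# The vector length and the seed are arbitrary choices.
set.seed(1)
p <- runif(100000, min = 0.01, max = 0.99)

# Loop version: computes one logit at a time
loop.logit <- function(p) {
    out <- numeric(length(p))
    for (i in seq_along(p)) {
        out[i] <- log(p[i] / (1 - p[i]))
    }
    out
}

# Vectorized version: a single expression applied to the whole vector
vec.logit <- function(p) log(p / (1 - p))

system.time(a <- loop.logit(p))
system.time(b <- vec.logit(p))
all.equal(a, b)   # both approaches give the same values
```

The vectorized version is also shorter and easier to read, which is often as good a reason to prefer it as speed.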
References
Becker, Richard A. and John M. Chambers (1984). S: An Interactive Environment for Data Analysis and Graphics. Wadsworth, CA.
Becker, Richard A., John M. Chambers and Allan R. Wilks (1988). The New S Language. Chapman & Hall, London.
Braun, W. John and Duncan J. Murdoch (2007). A First Course in Statistical Programming with R. Cambridge University Press, Cambridge.
Chambers, John M. (1998). Programming with Data. Springer, New York.
Chambers, John M. (2008). Software for Data Analysis: Programming with R. Springer, New York.
Chambers, John M. and Trevor J. Hastie, Editors (1992). Statistical Models in S. Chapman & Hall, London.
Dalgaard, Peter (2008). Introductory Statistics with R. Second Edition. Springer, New York.
Everitt, Brian and Torsten Hothorn (2006). A Handbook of Statistical Analyses Using R. Chapman & Hall/CRC, Boca Raton, FL.
Fox, John (2002). An R and S-Plus Companion to Applied Regression. Sage Publications, Thousand Oaks, CA.
Murrell, Paul (2005). R Graphics. Chapman & Hall/CRC, Boca Raton, FL.
Pinheiro, Jose C. and Douglas M. Bates (2000). Mixed-Effects Models in S and S-Plus. Springer, New York.
Therneau, Terry M. and Patricia M. Grambsch (2000). Modeling Survival Data: Extending the Cox Model. Statistics for Biology and Health. Springer, New York.
Venables, William N. and Brian D. Ripley (2000). S Programming. Springer, New York.
Venables, William N. and Brian D. Ripley (2002). Modern Applied Statistics with S. Fourth Edition. Springer, New York. (Earlier editions published in 1994, 1997 and 1999.)

