## 6 Conclusion

These notes have hardly scratched the surface of R, which has many more
statistical functions. These include functions to calculate the
density, cdf, and inverse cdf of distributions such as chi-squared, t, F,
lognormal, logistic and others. The `survival` library includes
methods for the estimation of survival curves, tests of differences
between survival curves, and Cox proportional hazards models. The
library `lme4` includes code for fitting generalized linear
mixed effect models, including multilevel models.
Many new statistical procedures are first made available to the research
community in the form of R functions.
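
As a brief illustration of the distribution functions mentioned above, the sketch below uses base R's naming pattern, in which a `d`, `p` or `q` prefix gives the density, cdf or inverse cdf. The particular values (3.84, the degrees of freedom, and so on) are arbitrary choices made for this example.

```r
# Density, cdf and inverse cdf follow a d/p/q naming pattern in base R.
x <- 3.84                  # illustrative value, near the 5% chi-squared cutoff

d <- dchisq(x, df = 1)     # density of a chi-squared with 1 d.f. at x
p <- pchisq(x, df = 1)     # cdf: P(X <= x), close to 0.95
q <- qchisq(p, df = 1)     # inverse cdf: recovers x from p

# The same pattern applies to the other distributions.
t_crit <- qt(0.975, df = 10)                            # upper 2.5% point of t(10)
f_tail <- pf(4, df1 = 2, df2 = 20, lower.tail = FALSE)  # upper-tail F probability
ln_med <- qlnorm(0.5, meanlog = 0, sdlog = 1)           # lognormal median, exp(0)
```

Typing `?Distributions` at the R prompt lists the distributions that ship with the `stats` package.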

In addition, R is a full-fledged programming language, with a rich complement of mathematical functions, matrix operations and control structures. If you would like to have a function to compute logits, for example, you can write one just like this:

    logit <- function(p) { log(p/(1-p)) }

This function takes as argument a vector of proportions and returns the logits. (The last quantity calculated in a function is returned by default.) Of course this is a very primitive version, because there is no argument checking. A somewhat better version is this:

    logit <- function(p) {
        if (!is.numeric(p) || any(p < 0) || any(p > 1))
            stop("argument must be probabilities between 0 and 1")
        log(p / (1 - p))
    }
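
To see the argument checking at work, here is a small trial run. The function definition is repeated so the block runs on its own; the input vectors and the use of `tryCatch()` to capture the error message are choices made for this illustration only.

```r
# The checked version from the text, repeated so this block is self-contained.
logit <- function(p) {
  if (!is.numeric(p) || any(p < 0) || any(p > 1))
    stop("argument must be probabilities between 0 and 1")
  log(p / (1 - p))
}

# Vectorized call: one logit per proportion.
z <- logit(c(0.1, 0.5, 0.9))

# An out-of-range value triggers stop(); tryCatch() captures the message
# instead of halting the script.
msg <- tryCatch(logit(c(0.2, 1.5)), error = function(e) conditionMessage(e))
```

Here `z` holds three logits that are symmetric about zero, and `msg` holds the text passed to `stop()`.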

The function `any` called with a logical vector returns
`TRUE` if any element of the vector is true. Of course a value may be
in the range (0,1) but so close to either extreme that calculation of
the logit could fail; bullet-proofing the function would require more
sophisticated code, but the version above is serviceable. For
production code use the built-in function `qlogis()`,
which returns quantiles of the standard logistic distribution. The
inverse function, going from logits to probabilities, is
`plogis()`; type `?qlogis` for details.
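
The following sketch shows the built-in functions `qlogis()` and `plogis()` on a few values, including a case near the extremes where the naive calculation loses information. The probabilities and the logit of 40 are arbitrary values picked for the example.

```r
p <- c(0.1, 0.5, 0.9)

z <- qlogis(p)        # logits, matching log(p / (1 - p)) for these values
back <- plogis(z)     # inverse transformation, from logits to probabilities

ends <- qlogis(c(0, 1))   # the boundaries map to -Inf and Inf

# For a logit of 40 the probability rounds to 1 in double precision,
# so the complement computed by subtraction is lost ...
near_one <- plogis(40)
naive_tail <- 1 - near_one
# ... unless the upper tail is requested directly.
exact_tail <- plogis(40, lower.tail = FALSE)

# Working on the log scale lets the round trip survive.
round_trip <- qlogis(plogis(40, log.p = TRUE), log.p = TRUE)
```

The `lower.tail` and `log.p` arguments are part of the standard interface of these functions and are one way to handle values very close to 0 or 1.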

R is an interpreted language but it is reasonably fast, particularly if
you take advantage of the fact that operations are vectorized and
try to avoid looping. Where efficiency is crucial you can always
write a function in a compiled language such as C or Fortran and
then call it from R. Some of my work on multilevel generalized
linear models uses this approach. To learn more about programming R
read Venables and Ripley (2000), Chambers (2008), and the
manual on *Writing R Extensions* that comes with the R
distribution.
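
The point about vectorization can be illustrated with a rough comparison of an explicit loop and a vectorized calculation of the same logits. The function names `loop_logit` and `vec_logit`, the simulated data and the seed are all made up for this sketch, and timings will differ across machines and R versions.

```r
set.seed(1)                      # arbitrary seed, only to make the run repeatable
p <- runif(1e5, 0.01, 0.99)      # simulated proportions (illustrative data)

# Explicit loop: one element at a time.
loop_logit <- function(p) {
  out <- numeric(length(p))
  for (i in seq_along(p)) out[i] <- log(p[i] / (1 - p[i]))
  out
}

# Vectorized: the arithmetic and log() operate on the whole vector at once.
vec_logit <- function(p) log(p / (1 - p))

# Both approaches give the same answer.
same <- isTRUE(all.equal(loop_logit(p), vec_logit(p)))

# Compare elapsed times; results vary by machine and R version.
t_loop <- system.time(loop_logit(p))
t_vec  <- system.time(vec_logit(p))
```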

## References

Becker, Richard A. and John M. Chambers (1984).
*S: An Interactive Environment for Data Analysis and Graphics*.
Wadsworth, CA.

Becker, Richard A.; John M. Chambers and Allan R. Wilks (1988).
*The New S Language*.
Chapman & Hall, London.

Braun, W. John and Duncan J. Murdoch (2007).
*A First Course in Statistical Programming with R*.
Cambridge University Press, Cambridge.

Chambers, John M. (1998).
*Programming with Data*.
Springer, New York.

Chambers, John M. (2008).
*Software for Data Analysis: Programming with R*.
Springer, New York.

Chambers, John M. and Trevor J. Hastie, Editors (1992).
*Statistical Models in S*.
Chapman & Hall, London.

Dalgaard, Peter (2008).
*Introductory Statistics with R*. Second Edition.
Springer, New York.

Everitt, Brian and Torsten Hothorn (2006).
*A Handbook of Statistical Analyses Using R*.
Chapman & Hall/CRC, Boca Raton, FL.

Fox, John (2002).
*An R and S-Plus Companion to Applied Regression*.
Sage Publications, Thousand Oaks, CA.

Murrell, Paul (2005).
*R Graphics*.
Chapman & Hall/CRC, Boca Raton, FL.

Pinheiro, Jose C. and Douglas M. Bates (2000).
*Mixed-Effects Models in S and S-Plus*.
Springer, New York.

Therneau, Terry M. and Patricia M. Grambsch (2000).
*Modeling Survival Data: Extending the Cox Model*.
Statistics for Biology and Health.
Springer, New York.

Venables, William N. and Brian D. Ripley (2000).
*S Programming*.
Springer, New York.

Venables, William N. and Brian D. Ripley (2002).
*Modern Applied Statistics with S*. Fourth Edition.
Springer, New York. (Earlier editions published in
1994, 1997 and 1999.)