The second volume, by Michael Crawley, is an R code sequel to his monumental *Statistical Computing* (reviewed in *Clin Chem* 2003;49:1959–62), which comprehensively explained statistical procedures using the S-Plus language (which is closely related to R). Crawley comments, “This book is an introduction to the essentials of statistical analysis for students who have little or no background in mathematics or statistics.” Later he states, “The hardest part of any statistical work is getting started—and one of the hardest things about getting started is choosing the right kind of statistical analysis. The truth is that there is no substitute for experience; the way to know what to do, is to have done it properly lots of times before.” The key, Crawley suggests, is to correctly categorize the response and explanatory variables. Is the response variable continuous, a count, a proportion, a time-at-death, or a category? Are the explanatory variables continuous, categorical, or a mixture of both? Once these aspects are established, the appropriate statistical method will suggest itself: ANOVA, ANCOVA, normal regression, logistic regression, log-linear models, binary logistic analysis, or survival analysis. Much sound statistical advice follows: fit the model to the data, not the other way around (the best model produces minimal residual deviance); replicate to increase reliability; randomize to reduce bias; use controls (“no controls, no conclusions”).
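
The categorization step can be pictured as a simple lookup. The sketch below is in Python rather than R and is not taken from the book; the function name `suggest_method` and the exact pairing of variable types with methods are assumptions, inferred from the order in which the variable types and methods are listed above.

```python
# Hypothetical lookup table: the pairing of variable types with methods is an
# assumption made for illustration, not a quotation from the book.
RESPONSE_METHODS = {
    "proportion": "logistic regression",
    "count": "log-linear model",
    "binary": "binary logistic analysis",  # a two-level category, e.g. dead/alive
    "time-at-death": "survival analysis",
}

# For a continuous response, the choice depends on the explanatory variables.
EXPLANATORY_METHODS = {
    frozenset({"continuous"}): "regression",
    frozenset({"categorical"}): "ANOVA",
    frozenset({"continuous", "categorical"}): "ANCOVA",
}


def suggest_method(response, explanatory):
    """Suggest a family of methods from the variable types.

    response    -- "continuous", "proportion", "count", "binary",
                   or "time-at-death"
    explanatory -- iterable of "continuous" and/or "categorical"
    """
    if response in RESPONSE_METHODS:
        return RESPONSE_METHODS[response]
    if response != "continuous":
        raise ValueError(f"unknown response type: {response!r}")
    kinds = frozenset(explanatory)
    if kinds not in EXPLANATORY_METHODS:
        raise ValueError(f"unknown explanatory types: {sorted(kinds)!r}")
    return EXPLANATORY_METHODS[kinds]
```

For example, `suggest_method("continuous", ["continuous", "categorical"])` points to ANCOVA, whereas a count response points to a log-linear model whatever the explanatory variables are. A table like this only narrows the choice; as Crawley stresses, it does not replace experience.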

Having laid out the statistical roadmap, Crawley devotes the next few chapters to the essential statistical tools: dataframes, central tendency, variance (which he considers the most important topic in the book because it is central to statistical analysis), and the statistics of single and double samples. One of the most interesting chapters in the book is that on Statistical Modeling.

The purpose of such modeling is “to determine a minimal adequate model from the large set of potential models that might be used to describe the given set of data.” “Minimal” means that the model should be as simple as possible. The approach is thus to fit all the available factors and then remove the least significant terms one step at a time, assessing each removal by the resulting change in deviance. R contains an update function that makes the process of model simplification “wonderfully simple”. Crawley sounds a warning about relying on residuals alone when examining a model. An influential data point forces a regression line to be close to it, creating a small residual, so a small residual does not show that a point is harmless. Thus, the influence of each data point should be measured. R does this with the Cook’s distance plot. Points with large influence can be removed and the model fit re-examined. All in all, this chapter is a marvelous guide to the remaining sections of the book.
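
The warning about residuals is easy to check numerically. The following is a minimal sketch in Python, not Crawley's R code (in R, the corresponding values come from `cooks.distance`). It fits a straight line by least squares and computes Cook's distance for each point from the residual and the leverage, using the standard formula for a model with two parameters. The function name and the six data points are made up for illustration.

```python
def line_fit_diagnostics(x, y):
    """Least-squares line fit; returns (residuals, cooks_distances)."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance on n - p degrees of freedom
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point (diagonal of the hat matrix)
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
    cooks = [
        e * e / (p * s2) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]
    return residuals, cooks


# Made-up data: five points on a rising trend, plus one far-out point
# (x = 20) that lies well below where the trend would put it.
x = [1, 2, 3, 4, 5, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 8.0]

residuals, cooks = line_fit_diagnostics(x, y)
for xi, e, d in zip(x, residuals, cooks):
    print(f"x={xi:>2}  residual={e:+.2f}  Cook's D={d:.2f}")
```

With these numbers, the point at x = 20 drags the line toward itself: its residual is smaller than those of several well-behaved points, yet its Cook's distance is by far the largest. A residual plot alone would not flag it, which is the case for inspecting influence and then refitting without the suspect point.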

Many chapters cover concepts relevant to clinical chemistry, such as Proportion Data (studies on percentage mortality or proportions responding to clinical treatment), Death and Failure Data, and Binary Response Variable (dead or alive, healthy or diseased). The book contains an Appendix outlining the elements of the R language. Unfortunately, the individual chapters do not include references; these are collected at the end of the book together with an index. There are no exercises in the book; instead, Crawley provides these on his web site (334 pages arranged in 12 fully worked stand-alone practical sessions) in addition to all of the book’s data files and scripts (programs) for the commands and figures.

Both volumes cover almost identical ground. The main difference between them is a subtle one of statistical perspective. Crawley is a distinguished biologist; therefore, his viewpoint is that of an experimentalist performing his own statistical analyses, whereas Maindonald and Braun are professional statisticians who collaborate with experimentalists. So which book to get? If you like the style of the Maindonald/Braun book, wait for the forthcoming second edition. If, however, you already possess Crawley’s S-Plus book, get his R book. Both will provide you with enhanced statistical insights (if you are prepared for some hard work) and access to a free and powerful computing language. Good luck!

© 2006 The American Association for Clinical Chemistry