The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Trevor HASTIE, Robert TIBSHIRANI, and Jerome FRIEDMAN. New York: Springer-Verlag, 2001. ISBN 0-387-95284-5. viii + 533 pp. $74.95 (H).
In the words of the authors, the goal of this book was to “bring together many of the important new ideas in learning, and explain them in a statistical framework.” The authors have been quite successful in achieving this objective, and their work is a welcome addition to the statistics and learning literatures. Statistics has always been interdisciplinary, borrowing ideas from diverse fields and repaying the debt with contributions, both theoretical and practical, to the other intellectual disciplines. For statistical learning, this cross-fertilization is especially noticeable. This book is a valuable resource, both for the statistician needing an introduction to machine learning and related fields and for the computer scientist wishing to learn more about statistics. Statisticians will especially appreciate that it is written in their own language.

The level of the book is roughly that of a second-year doctoral student in statistics, and it will be useful as a textbook for such students. In a stimulating article, Breiman (2001) argued that statistics has been focused too much on a “data modeling culture,” where the model is paramount. Breiman argued instead for an “algorithmic modeling culture,” with emphasis on black-box types of prediction. Breiman’s article is controversial, and in his discussion, Efron objects that “prediction is certainly an interesting subject, but Leo’s paper overstates both its role and our profession’s lack of interest in it.” Although I mostly agree with Efron, I worry that the courses offered by most statistics departments include little, if any, treatment of statistical learning and prediction. (Stanford, where Efron and the authors of this book teach, is an exception.) Graduate students in statistics certainly need to know more than they do now about prediction, machine learning, statistical learning, and data mining (not disjoint subjects). I hope that graduate courses covering the topics of this book will become more common in statistics curricula.
Most of the book is focused on supervised learning, where one has inputs and outputs from some system and wishes to predict unknown outputs corresponding to known inputs. The methods discussed for supervised learning include linear and logistic regression; basis expansions, such as splines and wavelets; kernel techniques, such as local regression, local likelihood, and radial basis functions; neural networks; additive models; decision trees based on recursive partitioning, such as CART; and support vector machines.
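As a purely illustrative sketch of this setup, not an example from the book, supervised prediction amounts to fitting a model on known input–output pairs and applying it to new inputs; the data below are synthetic and the choice of a CART-style tree is arbitrary.

```python
# Illustrative sketch only: fit a CART-style regression tree on known
# input/output pairs, then predict outputs for new, unseen inputs.
# The data are synthetic and the model choice is arbitrary.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 2))                    # known inputs
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=200)   # known outputs

model = DecisionTreeRegressor(max_depth=4).fit(X_train, y_train)

X_new = rng.uniform(0, 10, size=(5, 2))   # inputs whose outputs are unknown
print(model.predict(X_new))               # predicted outputs
```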
There is a final chapter on unsupervised learning, including association rules, cluster analysis, self-organizing maps, principal components and curves, and independent component analysis. Many statisticians will be unfamiliar with at least some of these algorithms. Association rules are popular for mining commercial data in what is called “market basket analysis.” The aim is to discover types of products often purchased together. Such knowledge can be used to develop marketing strategies, such as store or catalog layouts. Self-organizing maps (SOMs) involve essentially constrained k-means clustering, where prototypes are mapped to a two-dimensional curved coordinate system. Independent components analysis is similar to principal components analysis and factor analysis, but it uses higher-order moments to achieve independence, not merely zero correlation between components.
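As a purely illustrative sketch, with invented transactions rather than anything drawn from the book, the market basket idea reduces to counting how often items co-occur: the support of an item pair and the confidence of the rule A -> B can be computed directly from a handful of baskets.

```python
# Illustrative sketch only: support of item pairs (how often two products are
# purchased together) and confidence of the rule A -> B.
# The transactions below are invented for the example.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips", "bread"},
]

n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

for (a, b), count in pair_counts.most_common(3):
    support = count / n                  # P(A and B)
    confidence = count / item_counts[a]  # P(B | A)
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```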
A strength of the book is the attempt to organize a plethora of methods into a coherent whole. The relationships among the methods are emphasized. I know of no other book that covers so much ground. Of course, with such broad coverage, it is not possible to cover any single topic in great depth, so this book will encourage further reading. Fortunately, each chapter includes bibliographic notes surveying the recent literature. These notes and the extensive references provide a good introduction to the learning literature, including much outside of statistics. The book might be more suitable as a textbook if less material were covered in greater depth; however, such a change would compromise the book’s usefulness as a reference, and so I am happier with the book as it was written.

Overall, I think that this is a great book that is well worth its price.

David RUPPERT
Cornell University

REFERENCE

Breiman, L. (2001), “Statistical Modeling: The Two Cultures” (with discussion), Statistical Science, 16, 199–231.

Statistical Process Adjustment Methods for Quality Control.
Enrique DEL CASTILLO. New York: Wiley, 2002. ISBN 0-471-43574-0. xviii + 357 pp. $99.95 (H).

This book addresses the core issues of integration between statistical process control (SPC) and engineering process control (EPC). Traditionally, SPC techniques have been developed to monitor variables and, through a (usually off-line) cycle of diagnosing and correcting special causes, to reduce process variability. In contrast, EPC techniques have been developed to directly reduce process variability by adjusting or controlling input variables based on each (usually real-time) observation of the output variables. The area of SPC–EPC integration certainly owes its development to George E. P. Box and his collaborators. Box and Luceño (1997) concluded that:

    To augment the monitoring aspects of statistical process control with appropriate techniques for process adjustment has long been an evident need. Some 35 years ago, in response to a paper that attempted such enhancement a discussant [Prof. J. H. Westcott in the discussion of “Some Statistical Aspects of Adaptive Optimization and Control” by Box and Jenkins, 1962] remarked, “I welcome this flirtation between control engineering and statistics. I doubt, however, whether they can yet be said to be going steady.”

Box and Luceño went on to suggest that their book brings about the desired marriage, but I do not think that is completely true. With all due respect to their contribution, I think that perhaps we can celebrate an engagement, but there is a lot of room for improving the bridge between control and monitoring. I think that the best approach is to focus on the industrial statistics audience to foster an appreciation of control engineering (as opposed to focusing on control engineers to develop their statistical appreciation). With that in mind, I believe that this book goes a long way toward achieving this end. He states that the objective of his book is to “present process adjustment techniques based on EPC methods and to discuss them from the point of view of controlling the quality of a product.” This product quality focus is a good point of connection. The book goes on to truly synthesize several sources across time series, statistics, and control theory, with a clear focus on quality control outcomes.
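To make the SPC–EPC distinction described above concrete, a small simulation can contrast monitoring an output against control limits with adjusting an input after each observation. The sketch below is purely illustrative, with an invented drift process and an arbitrary adjustment gain; it is not a method taken from the book.

```python
# Schematic illustration only (not a method from the book): contrast SPC-style
# monitoring of a process output with EPC-style adjustment of a process input.
# The drifting process model and the adjustment gain below are invented.
import numpy as np

rng = np.random.default_rng(1)
T, target, sigma = 100, 0.0, 1.0
drift = 0.05 * np.arange(T)            # slow mean shift (a "special cause")
noise = rng.normal(0.0, sigma, size=T)

# SPC: monitor the output and flag observations beyond 3-sigma control limits.
monitored = target + drift + noise
out_of_control = np.flatnonzero(np.abs(monitored - target) > 3 * sigma)

# EPC: adjust the input each period by a fraction of the observed deviation
# (a simple integral-type feedback rule), directly reducing output variability.
adjusted = np.empty(T)
u = 0.0                                # cumulative input adjustment
for t in range(T):
    adjusted[t] = target + drift[t] + noise[t] + u
    u -= 0.2 * (adjusted[t] - target)  # hypothetical gain of 0.2

print("SPC first alarm at t =", out_of_control[0] if out_of_control.size else None)
print("output variance without adjustment:", round(monitored.var(), 2))
print("output variance with adjustment:   ", round(adjusted.var(), 2))
```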
The book’s organization makes a natural progression from process monitoring basics (Chap. 1), to stochastic-dynamic process modeling (Chaps. 2–4), to process control techniques (Chaps. 5–9). In the first chapter, Figure 1.23 provides a very nice flowchart guide to the use of the EPC and SPC techniques discussed in the book. SAS procedures that support aspects of the modeling and analysis are discussed in sufficient detail within the text. There are several examples of SAS code and output. The graphical user interface of MATLAB’s system ID toolbox is presented and discussed briefly. Minitab’s STAT functions are also frequently used for data analysis and plotting.

Although the author suggests that the book could be used in an undergraduate course, I think its level demands a certain amount of statistical and mathematical sophistication that would be beyond all but the very top undergraduate students. However, first-year or second-year graduate students would be very well prepared in the area by using this text. As a text for course instruction, this book certainly excels on the basis of exercises and real datasets. There are about 15 problems at the end of each chapter (a solutions manual was prepared by Rong Pan) and 18 data files and spreadsheets that serve to illuminate topics in each chapter. (In comparison, the Box and Luceño book has only one or two problems in most chapters and only three datasets.) The author’s website, www.ie.psu.edu/faculty/castillo/castillo.htm, contains the electronic files, solutions manual, and errata in the first printing.

One of the past criticisms of the quality area is the perspective that quality is free, or that quality objectives should be pursued for purely intrinsic reasons. Six Sigma, of course, has sought to work against this misconception with a focus on bottom-line profitability of quality improvement. This book, with its strong focus on controlling the quality of products and processes, underscores the high relevance of quality control to industrial practice. To further this idea, I would have liked to see some strategic-level consideration of how statistical process adjustment may factor into a company’s financial strength by creating opportunities that it may not have otherwise had.

Most of the text focuses on univariate system analysis, and it will help the reader appreciate the fundamentals. The final chapter gives a brief introduction to multivariate system analysis and suggests other avenues of future research in the SPC–EPC area. For those working in the manufacturing area, from either an academic or an industry point of view, this book is a valuable resource.

Harriet Black NEMBHARD
University of Wisconsin, Madison