

International Journal of General Systems


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/ggen20

ENTROPY MINIMAX MULTIVARIATE STATISTICAL MODELING—II: APPLICATIONS
RONALD CHRISTENSEN
Entropy Limited, Lincoln, Massachusetts, U.S.A.
Published online: 06 Apr 2007.

To cite this article: RONALD CHRISTENSEN (1986) ENTROPY MINIMAX MULTIVARIATE STATISTICAL MODELING—II: APPLICATIONS,
International Journal of General Systems, 12:3, 227-305, DOI: 10.1080/03081078608934938

To link to this article: http://dx.doi.org/10.1080/03081078608934938

Int. J. General Systems, 1986, Vol. 12, 227-305 © 1986 Gordon and Breach, Science Publishers, Inc.
0308-1079/86/1203-0227 $30.00/0 Printed in Great Britain

ENTROPY MINIMAX MULTIVARIATE


STATISTICAL MODELING-II: APPLICATIONS
RONALD CHRISTENSEN
Entropy Limited, Lincoln, Massachusetts, U.S.A.
(Received September 26, 1985; in final form November 7, 1985)

Applications of entropy minimax are summarized in three major areas: meteorology, engineering/
materials science, and medicine/biology. The applications cover both discrete patterns in multi-
dimensional spaces of mixed quantitative and qualitative variables, and continuous patterns employing
concepts of potential functions and fuzzy entropies. Major achievements of entropy minimax modeling
include the first long range weather forecasting models with statistical reliability significantly above
chance verified on independent data, the first models of fission gas release and nuclear fuel failure under
commercial operating conditions with significant and independently verified statistical reliability, and
the first prognosis models in coronary artery disease and in non-Hodgkin's lymphoma with significant
predictability verified on independent data. In addition, applications of entropy minimization and
maximization separately are reviewed, including feature selection, unsupervised classification, proba-
bility estimation, statistical distribution determination, statistical mechanics and thermodynamics,
pattern recognition, spectral analysis and image reconstruction. Comparisons between entropy minimax
and other methodologies are provided, including sample average predictors, nearest neighbors
predictors, linear regression, logistic regression, Cox proportional hazards regression, recursive par-
titioning, linear discriminant analysis, mechanistic modeling, and expert (heuristic) programming.

INDEX TERMS: Cancer survival analysis, crossvalidation, data analysis, diagnosis, differential
diagnosis, entropy, entropy minimax, experimental design, failure analysis, feature
selection, fission gas release, forecasting, hazard analysis, heart disease survival
analysis, information theory, long range weather forecasting, materials defor-
mation, maximum entropy, medical diagnosis, medical prognosis, minimum entropy,
molecular evolution, multivariate statistics, nuclear fuel failure, pattern discovery,
predictive modeling, probability estimation, statistical modeling, statistical predic-
tion, survival analysis, system failure, systems modeling, tank lifetime analysis,
underground storage system leakage, weather forecasting, weight normalization.

I. INTRODUCTION

This is the second of two papers on entropy minimax. The first paper (I)56
reviewed the theory of entropy minimax as an information theoretic approach to
predictive modeling. This paper (II) reviews applications in three major areas:
• meteorology
• engineering/materials science
• medicine/biology
In each application area, the problem is one of predicting the future behavior of
a complex system based on limited, incomplete, uncertain and occasionally
contradictory information. Each is a real-life application using actual data, which,
in many cases, come from large scale projects extending over several years to
collect and manage the data. Some of the applications focus on survival of cancer
and heart disease patients. Others involve future rainfall supplying water for


domestic, agricultural and commercial uses. One involves rupture of nuclear fuel
pins in commercial power reactors. Another involves leaking of underground
storage tanks with possible risks to health and safety.
Because of the importance of these problems, there has been considerable prior
research in developing predictive models for them. A variety of methodologies,
including various forms of regression analysis, historical analog selection, mechan-
istic modeling and expert programming, have been used with varying degrees of
success and failure. In each case, the entropy minimax information-theoretic
approach has provided a margin of improvement in predictive performance when
assessed on independent data. In survival models for cancer and heart disease, for
example, the improvement is several percentage points gain in predictive accuracy.
The significance of the problem makes every point of improvement valuable. In
other cases, e.g., prediction of stress-corrosion cracking of fuel elements, the
improvement is even more dramatic. There is a wide gap between entropy
minimax and the next best predictive model. In some cases, e.g., precipitation
forecasting at several months lead time, the entropy minimax models represent the
first time statistically significant predictive accuracy has been achieved on inde-
pendent data by any method.
The central focus of this paper is on applications in which both the minimiz-
ation and the maximization aspects of entropy minimax are utilized. This brings
together results not previously reviewed as a whole. Following the main lines of
this research to date, these results are discussed in three sections: meteorology,
engineering/materials science, and medicine/biology. A final section briefly sum-
marizes applications of entropy minimization and entropy maximization sep-
arately, on which there is already an extensive literature. (See Section V below for
references.)
Table I presents the results in applications for which categorical test/training
comparisons are available (percentages correct on binary classifications). On the
independent test data, errors of the entropy minimax models were consistently
lower than those of comparison methods. In 14 of the 18 cases, the accuracies of
the entropy minimax predictions were statistically significant at the 0.05 level or
better. In one case, the confidence level was 0.07. In another, it was 0.13. In two
cases, the results were indistinguishable from chance. (See Subsection II.B.7 below
for the definitions of assessment measures and confidence levels.)
The level of difficulty of these applications is indicated by the numbers of
independent variables and the magnitudes of the weight normalization compared
to the training sample sizes. The weight normalization, w, is an information
theoretic measure of the amount of noise in the data. See Subsection V.B.1 below
for estimation procedures and see paper (I)56 for theoretical derivations. Generally
speaking, w is the effective sample size of the background. The training sample
must have a size of this magnitude in order to have a weight equal to that of the
background. This is the minimum needed to begin to achieve individual event
resolution. If the training sample has significantly fewer than w events, it is
unlikely that the accuracy of the model will significantly exceed that of a fixed
sample average predictor. The highest value in Table I, w = 120, occurred in a case
with training sample size of N = 82, for which the test data accuracy was 60%. The
greatest accuracies (> 80% correct) were achieved in cases with relatively low
weight normalizations.
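The role of w in probability estimation can be illustrated with a minimal sketch, assuming a simple blending rule in which the background acts as a pseudo-sample of effective size w; the function name and exact form below are editorial assumptions, not a transcription of the estimator derived in paper (I).

```python
def weight_normalized_probability(n_outcome, n_cell, background_p, w):
    """Blend the within-cell training frequency with the background rate.
    When n_cell is much smaller than w the estimate stays near the background
    rate; when n_cell is much larger than w the training data dominate.
    Illustrative assumption: background treated as a pseudo-sample of size w."""
    return (n_outcome + w * background_p) / (n_cell + w)

# 7 "wet" years out of 10 matching a pattern, 50% background rate, w = 100:
print(weight_normalized_probability(7, 10, 0.5, 100))   # ~0.52, still near the background
```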
For methodologies which are not protected against overfitting, such as step-wise
regression, a typical rule of thumb is that the training sample size must be at least

TABLE I
Percentage correct of entropy minimax and of best comparison methodology for which test results are
available on independent validation data. Also given are the number of independent variables, the
weight normalization (a measure of the amount of noise in the data), and the training and test sample
sizes. The two-tailed confidence levels of the entropy minimax results are given in parentheses (the
probability of achieving the observed classification accuracy by chance alone).

                                 Data inform. charact.    Sample sizes     Percentage correct on test data
Application                      No. IVs   Wght. norm.    Trgn.    Test    Compar. method   Entropy minimax

Meteorology
  N.CA ann. precip.                 42        100            96      30        40%           70% (0.07)
  W.OR winter precip.               24         70            92      37        45%           74% (0.01)
  E.WA spring precip.               20         70            93      16        57%           69% (0.13)
  E.WA winter precip.               42         70            72      37        35%           69% (0.03)
  C.AZ winter precip.               53         70            86      45        43%           71% (0.04)
  C.AZ summer temp.                 22        120            82      44        53%           60% (0.29)
Engineering/materials science
  Cracking activity                 33         40            64      63        50%           78% (<0.01)
  Crack width                       33         40            56      56        54%           71% (0.01)
  Diametral expansion               33         40            66      65        50%           71% (<0.01)
  Gas release                       42         25           111a     28a       77%           88% (<0.01)
  Cladding rupture                  76         40         1,707   1,695        68%           85% (<0.01)
  Rod bow                           32         70            77      78        45%           55% (0.38)
Medicine/biology
  Survival in CAD                   62         30           515     524        61%           71% (<0.01)
  Survival in NHL                   26         50           218     110        65%           69% (<0.01)
  Cervical spine diag.              31          6            76      38        NA            94% (<0.01)
  Drug screening                    34       3-10           114     101        63%           64% (0.03)
  Iris classification                4          3b           75      75        97%           99% (<0.01)
  Adenyl kinase class.               5          2b          648     194        87%           90% (<0.01)

a Total sample of 139 in a randomized round-robin splitting 80%:20% between training and test.
b No trial crossvalidation.

five times the number of IVs. Thirteen of the 18 cases in Table I would fail to
qualify under this restriction. Entropy minimax, on the other hand, which
inherently compensates for the likelihood of chance correlations, achieved statisti-
cally significant accuracy (at the 0.10 level or better) in 10 of these 13 cases. The
comparison methods achieved this level in only 2 of the 13 cases.

II. METEOROLOGY

A. Background
Long range weather forecasting poses a challenge which tests any methodology for
modeling the behavior of complex systems with incomplete information. Consider-
ing the fact that a significant portion of the globe may contribute effects to any
specific region's future weather over time frames of months, seasons, or years, the
potentially relevant ocean-atmospheric variables number in the tens of thousands
or more. This is further compounded by the fact that each such variable, say the
atmospheric pressure at a particular station or the sea-surface temperature
averaged over a specific zone, is not one number but rather a time series of
numbers extending back through the historical record.
Although the set of available meteorologically relevant independent variables
(IVs) is very large, it is nonetheless incomplete. Furthermore, which variables are
relevant at any reasonably specific level of effect on the dependent variable (DV) of
interest, say precipitation or temperature, is not entirely clear. Even for variables
that are known to be relevant and available, data are often incomplete, suffering
gaps, uncertainties and inconsistencies. No records are available on some variables
for many years, particularly in the 19th and early 20th centuries, and during war
years, but also at a scattering of other times. In some cases, a record is kept but its
meaning changes discontinuously from one year to the next, for example, when a
building is constructed near a temperature gauge or when the position of a rainfall
gauge is changed. Often the recorded information does not conveniently inform us
of such changes and we can only suspect them from apparent shifts in the time
series itself.
In order to have a proper frame of reference for assessing weather forecasting
models, it is necessary to have some understanding of what levels of predictability
may reasonably be expected of models and what success is achieved by classical
methodologies. Thus, we begin with a brief review of predictability and a
discussion of common meteorological predictors.

1. Predictability
Sensitivity to variations The possible predictability of weather factors, such as
precipitation and temperature, has been studied from two points of view. The first
is an analysis of the sensitivity of future weather factors to slight variations in
inputs to "general circulation" modelsl46.218 designed to simulate mechanistically
the lluid mechanics and thermodynamics of the ocean-atmosphere system. Beyond
3-5 days into the future, the outputs from these models have significant un-
certainties; beyond about 10 days they become highly unreliable; and beyond 15-
20 days it is virtually impossible for such models to track actual weather
patterns. 75.234 Although all we know for certain is that the error growth with
increased lead time is a characteristic of the specific models studied, this is
generally interpreted as indicating the actual sensitivity of future weather patterns
to slight changes in past conditions, and thus as imposing more-or-less "funda-
mental" limitations on preditability. This does not mean, however, that it is
impossible to predict anything about the weather with lead times longer than, say,
20 days. There exist predictable generalized characteristics totalled (or averaged)
over wilder geographic regions or longer time periods that are of great practical
importance.187.234 For example, as of Jan. 15, 1999, total precipitation in
Sacramento, California on January 15, 2000 may exceed such limitations, but
averaged total precipitation of a broader region for the month of January 2000
may not.

Actual-to-natural variability comparison A second approach, taken by
Leith164,165 and others179,180 to the predictability problem, has been to study the
relationship of the actually observed variability of long period (e.g., monthly
or seasonal) averages to the variability that would be expected given the daily
variability and the lag-one-day autocorrelation.

Actual variability for period T

σ_A² = observed interannual variability of means for the selected weather factor
(temp., precip., etc.) for a selected "long" period T (e.g., a particular month
or season)

      = (1/n) Σ_{i=1}^{n} (X_i − X̄)²,

where

X_i = mean for the selected period T in year i,

X̄ = (1/n) Σ_{i=1}^{n} X_i.

"Natural" variability for period T

σ_N² = variability due, via lagged autocorrelations, to short period climatic
noise,163-165 e.g., day-to-day variability, assumed to be not predictable at
long times

      = ∫_{−∞}^{∞} S(ω) H_T²(ω) dω,

where

ω = frequency,
S(ω) = input power spectral density function (estimated from short period, e.g.,
daily, data),
H_T²(ω) = power transfer function from the short periods (e.g., days) to the long
period T (e.g., a month or a season).

F-ratio

The predictability is defined as the ratio of the actual to the "natural" variability:

      F = σ_A²/σ_N².

F-ratios of 1.5 and 2.0 indicate, for example, that it is potentially possible to
explain 33% and 50%, respectively, of the variance of the long period means. For
convenience, the percentage E of potentially explainable variance can be used in
place of the F-ratio. It is simply E = (1 − 1/F) × 100%.
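These definitions can be turned into a small computation, assuming the "natural" variance σ_N² has already been obtained from a separate spectral estimate based on daily data; the function below is an editorial sketch with a hypothetical name, not part of the cited studies.

```python
import numpy as np

def predictability(period_means, natural_variance):
    """F-ratio and potentially explainable variance E (in percent).
    period_means: one 'long period' mean per year (gives sigma_A^2);
    natural_variance: sigma_N^2, assumed supplied externally."""
    actual_variance = np.var(period_means)      # sigma_A^2 = (1/n) sum (X_i - Xbar)^2
    F = actual_variance / natural_variance
    E = (1.0 - 1.0 / F) * 100.0
    return F, E

# Check against the values quoted above: F = 1.5 and 2.0 give E = 33% and 50%.
for F in (1.5, 2.0):
    print(F, round((1.0 - 1.0 / F) * 100.0, 1))
```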
The numerical values of E not only differ with the variable (e.g., precipitation or
temperature), but also with the specific time period (e.g., month or season). The
general picture8,147,180,226 is that the percentages of potentially explainable
variance in monthly averages, for example, in the Western U.S. for precipitation
and temperature are about 30-40% and 40-60%, respectively. In the Midwest, they
are about 0-10% and 20-30%; in the Appalachian mountains, 30-60% and 30-40%;
and in the East coast and Gulf regions, 0-10% and 20-30%.
Although the F-ratio is a useful device to obtain a rough idea of the relative
predictability of weather factors in different regions, it may not be an accurate
measure of the true predictability.

• Being defined in terms of squared deviations from the mean, F-ratios focus
heavily on being able to predict outlying extreme years. Thus, they may
significantly under-represent the predictability of discrete categories such as
"high," "medium" and "low." For example, one could make a statistically
significant percentage of correct category predictions and yet satisfy a low F-
ratio by being wrong on a small number of extreme cases.
• The F-ratio weights equal magnitudes of error equally, regardless of the
magnitude of the reference. For example, a 10 cm error in precipitation for a
20 cm observed value is given the same weight as for a 100 cm observed value,
even though the error is half the observed value in the one case and only 10%
in the other. Thus, the F-ratio may not pick up useful predictability of one
range of values because of fluctuations in other ranges.
• The F-ratio ignores the sign of the error. However, a drought, for example,
may be much more serious than abnormally high precipitation. In such a
case, dry years, which have negative deviations from the norm, may have
useful predictability despite poor overall predictability because of large
random variations in wet years, or vice-versa.

2. Common meteorological predictors


There are a number of common predictors against which weather forecasting
models may be compared. These include climatology, persistence, linear regression
and historical analog. None of these has proved satisfactory for long range
forecasting.
Climatology One of the most important predictors is climatology. Although
climatology could be defined as a long-period moving average, it is more typically
formulated to make the same prediction every year with an occasional updating.
Assume, for example, that the dependent variable being predicted is the total
spring rainfall in a particular geographical region. A climatological predictor may
use values averaged over, say, the past 50 years. The number of climatological
"base-line" years is set by balancing the objectives of reducing variance of the
mean against emphasis on the most recent data.
The climatological predictor, if formulated as a fixed number equal to the mean,
obviously explains none of the interannual variability. However, if the mean for
the periods being predicted differs from the mean for the periods over which the
climatological average is defined, then this predictor may yield nonzero predicted
variance (it may be either positive or negative, depending upon the relation of the
old mean to the new distribution).
Persistence A persistence predictor makes a new prediction each year by carrying
the conditions of the immediately previous period forward into the period being
predicted. (When carrying conditions of one season into the next, weather factors
are defined as deviation from or ratios to seasonal norms.) Obviously, many
predictors can be constructed intermediate between persistence and climatology.
Over very long periods, say hundreds of years, the two have the same expected
values, climatology being the more conservative and persistence being more risky
in attempting to particularize each prediction to its specific period. Neither
predictor, however, uses any "concomitant" information, such as meteorological
variables from other regions, that may be related to the dependent variable being
predicted.
The performance of the persistence predictor for seasonal mean temperatures in
the U.S. has been studied by van Loon and Jenne and by Namias. Percentages
of explained variance are generally low. The highest values (5-20%) occur in
various coastal areas, depending upon the season, and the figures are very low
(0-5%) over the bulk of the U.S. For precipitation forecasting, the persistence
predictor performs even more poorly than for temperature forecasting.
Linear regression Using a linear predictor methodology73,178 and coeffi-
cients with a yearly periodicity, models have been built to predict monthly and
seasonal mean atmospheric temperatures in the U.S. using IVs consisting of
contemporaneous or prior sea surface temperatures (SST) or sea level pressures
(SLP).8 [If IVs are used from the same time interval as the DV, then the process is
referred to as "specification" rather than "prediction."] Models were trained on the
period 1902-1970 at four different lag times: 0, 1, 2 and 3 seasons. At 0-season lag
time (specification rather than prediction), the best SST variables were found to
explain 10-30% of the training data variance in mean seasonal temperatures for
west coastal regions, and zero in the midwest. At 1-season lag, the figures drop to
5-20% in coastal regions and the zero-skill area broadens. Specification variance
explained was somewhat higher when SST was replaced by SLP, but the figures
similarly degraded with seasonal lag. The test period 1971-1979 was used for
independent validation and the results were considered to be not inconsistent with
the training data percentages, although 9 years is too short to attribute statistical
significance at the low skill levels observed.

Historical analog Another common weather prediction technique is use of
historical analogs. A number of "predictor" features are defined for the data
period, say ending December 31, 1984. These are compared in some way to the
same periods for previous years and the closest match is selected. The forecast, say
for January 1985, is then taken to be the outcome observed in the January
following the closest match period.
There are, of course, many ways in which this nearest-neighbor method of
weather forecasting may be implemented. Most have been for short or medium
range forecasting, e.g., 12-60 hour forecasts. One longer range implementation
by Barnett and Preisendorfer involved the computation of empirical orthogonal
functions (EOFs) (this is the terminology used in meteorology to describe what
elsewhere are referred to as principal components) from the time history of a
47-variable descriptor of the global climate state. A modification of this method
has been tested by Livezey, Maisel and Barnston in forecasting (with zero lead
time) 30- and 90-day outlooks for average temperature and total precipitation.
Skill levels for the long range precipitation analogs are uniformly indistinguish-
able from random forecasts. For temperature analogs, skill levels typically
fluctuate from -5% to +5% for spring and summer, from 0% to +7% for fall,
and from +5% to +15% for winter. (See Subsection II.B.7 below for the definition
of skill. A skill of +15%, for example, corresponds to a 57.5% accuracy of an
"above or below median" predictor.) The most important variables come from
North Atlantic and North Pacific sea surface temperatures. The low skill levels are
consistent with the large variances of EOF estimators using small samples.231

B. Long Range Forecasting Models for Precipitation and Temperature


1. Annual precipitation in Northern California at 2 month and 6 month lead times
Entropy minimax was first applied to developing weather forecast models in a
study prompted by the 1976-1977 drought in California. The study region was
defined as extending roughly from Sacramento to Tahoe in Northern California.
Precipitation over this region supplies much of the water for the Central California
agricultural area. The dependent variable was defined as an average of the total
annual precipitation at weather stations selected for their long term meteorological
records. Figure 1 shows a plot of this Seven Station Precipitation Index, SSPI,
from 1852 through 1977. (The historical data begin about 1850 with the California
gold rush, and by 1852 there are sufficient numbers of regular records to start the
time-series.)
[Figure 1: time-series plot of the station precipitation index versus water year, 1852-1977, with the long-term average indicated.]
FIGURE 1 Seven station precipitation index (SSPI) for Northern California for the period 1852-1977.

Since each year corresponds to an SSPI event, the total dataset contains 126
events. This dataset was randomly divided into two subsets of 63 events each,
subject to a constraint equilibrating the overall averages of the SSPI for the two
portions. One subset was used for model building, the other was reserved for
model verification. Because of the constraint, one degree of freedom must be
accounted for in significance testing.
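One simple way to realize such a constrained split, offered here only as an illustration (the study's exact equilibration procedure is not described), is to draw many random splits and keep the one whose two halves have the closest means:

```python
import numpy as np

def balanced_random_split(values, n_trials=10000, seed=0):
    """Random half/half split of the SSPI events whose two halves have
    nearly equal means.  Illustrative construction only."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    n = len(values)
    best, best_gap = None, np.inf
    for _ in range(n_trials):
        perm = rng.permutation(n)
        a, b = perm[: n // 2], perm[n // 2:]
        gap = abs(values[a].mean() - values[b].mean())
        if gap < best_gap:
            best, best_gap = (a, b), gap
    return best   # (model-building indices, verification indices)
```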

Independent variables were formed by applying filters to meteorological time
series. Over 100,000 time series were available in the historical data (atmospheric
variables, sea surface temperatures, stream flow records, ground moisture indices,
tree ring measurements, sunspot counts, etc.) and over 100 different filters were
considered. After preliminary screening on the basis of data completeness and
quality, and considering possible relevance to California precipitation, the list was
pruned down to 87 time-series and 15 filters, for a total of 1305 candidate
independent variables.
The 87 time-series included the yearly values for sea surface temperature
variables for 52 different season-zones;† atmospheric variables (average temper-
ature, average barometric pressure and total precipitation) for 31 different loc-
ations extending from Japan to Alaska, California, Chile and Australia; a selected
stream flow reading in California (the Cosumnes River at Michigan Bar was
selected as relatively free of man-made influences such as dams); a tree ring index
in northern California (selected for long-term natural setting); and the sunspot
count.
The 15 filters were all linear combinations of values for years (i, i−1, i−2, ...)
preceding the year (i+1) being predicted. Five of the filters were differential,
having a long term expected value of zero (e.g., moving difference = V_i − V_{i−1},
moving biannual difference = V_i + V_{i−1} − V_{i−2} − V_{i−3}, etc.). The remaining 10 were
integral, having nonzero expectation value (e.g., 1-year cycle = V_i, 2-year cycle = V_{i−1},
moving average = (V_i + V_{i−1})/2, linear extrapolator = 2V_i − V_{i−1}, etc.). Note that for
an annually defined dependent variable, the 1-year cycle filter is a persistence
predictor.
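Written out as code, these filters are one-liners on a yearly series v, with i the last year available before the predicted year i+1 (a sketch; the names are descriptive and not taken from the original software):

```python
def moving_difference(v, i):            # differential: V_i - V_{i-1}
    return v[i] - v[i - 1]

def moving_biannual_difference(v, i):   # differential: V_i + V_{i-1} - V_{i-2} - V_{i-3}
    return v[i] + v[i - 1] - v[i - 2] - v[i - 3]

def one_year_cycle(v, i):               # integral: V_i (persistence for an annual DV)
    return v[i]

def two_year_cycle(v, i):               # integral: V_{i-1}
    return v[i - 1]

def moving_average(v, i):               # integral: (V_i + V_{i-1}) / 2
    return (v[i] + v[i - 1]) / 2.0

def linear_extrapolator(v, i):          # integral: 2 V_i - V_{i-1}
    return 2.0 * v[i] - v[i - 1]
```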
The 1305 variables were subjected to a preliminary screening by computing
correlational statistics for the 63 events in the model building sample:
• correlation coefficient r for the DV (future SSPI) and each IV (filtered time-
series). [This is called a "lag correlation," the lag specifying the interval by
which the DV is in the future relative to the IV.]
• ratio of |r| to an estimate of the correlation coefficient's standard deviation σ
(estimated by Monte Carlo shuffling of the DV values; a sketch of this screening
step is given below).
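A minimal sketch of this screening step for a single candidate variable follows; it assumes the lag has already been applied in constructing the filtered IV series, and the shuffle count is an arbitrary choice:

```python
import numpy as np

def lag_correlation_screen(iv, dv, n_shuffles=1000, seed=0):
    """Lag correlation r between a filtered IV series and the future DV,
    plus |r|/sigma, where sigma is estimated by Monte Carlo shuffling of
    the DV values (shuffling destroys any real IV-DV association)."""
    rng = np.random.default_rng(seed)
    iv = np.asarray(iv, dtype=float)
    dv = np.asarray(dv, dtype=float)
    r = np.corrcoef(iv, dv)[0, 1]
    null_r = np.array([np.corrcoef(iv, rng.permutation(dv))[0, 1]
                       for _ in range(n_shuffles)])
    return r, abs(r) / null_r.std()   # candidates with |r|/sigma > 2.0 qualified
```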
The lag correlations for the 1305 variables were found to be rather low. In
absolute value, the bulk were in the range 0.0-0.1, fewer in the range 0.1-0.2, fewer
still in the ranges 0.2-0.3 and 0.3-0.4, and very few larger than 0.4. There were,
however, significantly more in the higher ranges than would be expected from a
random distribution of correlation coefficients. Thus, if a reasonably large number
of the relatively highly correlating variables are selected for model building, there
is a good chance of enhancing the signal-to-noise ratio. Because of the modest
values of even the highest correlations, however, it is important not to try to focus
on a few variables, since that runs too high a risk of picking up purely chance
correlations. Thus, it is necessary to violate the restriction of many modeling
methodologies of using a very limited number of variables,3,93,133 say fewer than
one-third, one-fifth, or one-tenth the number of data points. (Use of many
variables confers no special advantage when assessment is on independent data
not used in model building.) Candidate variables with |r|/σ above 2.0 were

†An example is the average January-March sea surface temperature deviation from normal for
80°-120°W longitude, 0°-10°N latitude.
considered to have a correlation sufficiently above chance expectation to qualify
for inclusion. Variables for geographical locations in or near the seven station
region of California tended to outperform variables for more distant locations,
although some locations in Mexico and Australia and other areas were significant.
Integral filters tended to outperform differential filters. (Integrally filtered time
series are affected less by interannual fluctuations but more by generic changes
such as gauge repositioning, instrumentation modifications, etc.). The best single
filter was the linear extrapolator; the second best was the moving difference; third
and fourth were cycle filters.
Based upon the correlational analysis and further review of data quality and
possible relevance to the SSPI, a total of 42 features were selected. (Actually one
IV, the previous year's sunspot count, was included among those selected despite
its low correlation in order to make available to the pattern discovery process at
least one representative of each kind of data.)

For purposes of entropy minimax pattern discovery, the DV was reformulated
as discrete bits of information from the continuous SSPI. This has the advantage
of helping to insulate the analysis against random fluctuations and outliers by
focusing entirely upon the major category differences. The first bit of information
used was whether the SSPI was above or below its model building median.
Accordingly, all years were classified as either "wet" or "dry". (The median is used
rather than the mean because of skewness in the distribution which is bounded
below but not above.)
Using the selected 42 features, trial crossvalidation runs were performed on
random subsets of the 63 event model building data. Based on these runs, the a
priori weight normalization was set at w = 100. This relatively high value for w,
which exceeds the total model building sample size, is indicative of the amount of
noise in the meteorological system relative to the variables and data being used.
The value of w being in the order of 100 was the first really hopeful sign. If it had
turned out to be very low, say 10 or less, then entropy minimax may not have
performed much better than some straight rule approach such as historical
analogs using frequency estimation. If it had turned out to be extremely large, say
1,000 or greater, then the situation would have been hopeless for a model building
sample of only 63 years.
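The trial-crossvalidation selection of w can be sketched generically as a grid search over candidate values, minimizing crossvalidation error on the model-building years only; build_fn and error_fn below are hypothetical stand-ins for the pattern search and its scoring, not the software actually used.

```python
import numpy as np

def select_weight_normalization(folds, build_fn, error_fn,
                                candidate_ws=(10, 25, 40, 70, 100, 150)):
    """Pick the a priori weight normalization w by trial crossvalidation.
    folds:    (train, test) splits drawn from the model-building years only
    build_fn: build_fn(train, w) -> model      (hypothetical hook)
    error_fn: error_fn(model, test) -> error   (hypothetical hook)"""
    mean_err = [np.mean([error_fn(build_fn(tr, w), te) for tr, te in folds])
                for w in candidate_ws]
    return candidate_ws[int(np.argmin(mean_err))]
```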
As it turned out, three patterns were found, one for wet years and two for dry
years. The lead time between the last datum referenced by an IV in a pattern and
the beginning of the DV period is 2 months. Thus, all the data are available to
make predictions about outcomes 2 or more months in the future. In about one-
third of the cases, the conditions for "wet" pattern match can be established 2
years before the event.
The three patterns found for next year's SSPI wet/dry status are:

No. 1: P_wet = 0.59 ± 0.11
• normal or high Pacific Ocean surface temperatures 2 summers ago in the
western portion of the ±10° equatorial belt, and
• normal or high sea surface temperatures 3 springs ago in the northeastern
portion of this belt, and
• moderate or low precipitation at Nevada City 2 years ago.

No. 2: P_wet = 0.22 ± 0.15
• not pattern no. 1, and
• steady or declining (this year minus last year) winter sea surface temperatures
in the northeastern equatorial belt, and
• normal or high sea surface temperatures in the western equatorial belt last
year, and
• high precipitation at Colfax this year, and
• low tree ring growth (moving biannual difference) in the Truckee area.

No. 3: P_wet = 0.39 ± 0.10
• not patterns no. 1 or no. 2, and
• low precipitation at Placerville this year, and
• declining precipitation at Mazatlan (this year relative to last year).

The details of this model are not unique. They depend on the precise selection
of features and model building years. If a large number of models were built with
varying inputs, we would obtain a cluster of low-entropy models of which the one
given above is an example. However, sharing similar informational properties,
their predictive capabilities would be similar. When tested on the 63 years reserved
for model verification, the above pattern set makes correct predictions in 31,
incorrect predictions in 18, and 14 years do not match any of the three patterns.
See Table II. The accuracy of (31/49) × 100% = 63% has a statistical significance
of 1 − α = 0.94 against the null hypothesis of arising by chance. This is compared,
for example, to 50% accuracy for the climatological average (32 of the 63 training
years were wet), and 55% for the persistence predictor. When only extreme years
are considered (more than about one standard deviation either wetter or dryer
than normal), the difference is even more pronounced. The persistence predictor
accuracy becomes 56% and the entropy minimax patterns rise to 78% correct.
By comparison, the accuracy of the persistence predictor with zero lead time is
55% on all years and 56% on extreme years. (Note: Skill ratio is computed as

TABLE II
Predictive performance for Northern California annual precipitation (63 randomly
selected verification years).

                                                All years   Extreme years
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)a
  Percentage correct                               55%          56%
  Statistical significance                         55%          50%
  Variance explainedb                               2%           4%
  Skill                                            10%          11%
Entropy minimax patterns (2 month lead time)
  Percentage correct                               63%          78%
  Statistical significance                         94%         >99%
  Variance explained (3 categories)b               20%          36%
  Skill                                            27%          56%

a Whether SSPI for previous year was above or below median.
b Pre-1865 years were dropped for variance calculations because of questionably large values which would have
dominated the squared differences.

K = (M − C)/(N − C), where N = total number of predictions, M = number of correct
predictions, and C = the number of correct predictions expected from the
climatological average. See Subsection II.B.7 below for computational procedures.)
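In code, the skill ratio is a one-line computation; the baseline argument below generalizes the 50% above/below-median climatology used here and is an editorial convenience:

```python
def skill_ratio(n_total, n_correct, p_baseline=0.5):
    """K = (M - C) / (N - C), where C is the number of correct predictions
    expected from the climatological baseline."""
    C = p_baseline * n_total
    return (n_correct - C) / (n_total - C)

# Table II, all years: 31 correct of the 49 years matching a pattern -> ~27% skill.
print(round(skill_ratio(49, 31), 2))
```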
Despite its success, there are several difficulties with the original California
patterns. First, they are incomplete. The three patterns found cover only about
three-fourths of the verification years. In about one-fourth of the cases no pattern
is matched and one must fall back on something else such as persistence. Second,
one of the features used (in a "dry" pattern) is a tree ring index. The rationale for
its inclusion in the database was that tree ring growth is a biological integrator of
local precipitation and temperature, and thus constitutes a long term record of
meteorological as well as soil and biological factors. However, tree ring data must
be "calibrated" to eliminate such effects as age, and calibrated indices are not as
readily and regularly available as are standard meteorological factors. Third, the
random division of the total dataset into model building and verification
subsets resulted in an intermingling of the building and verification years. Since
the filters reference years prior to each DV year, this means that verification years
will contribute to the IV data even if they are not used in the DV data of the
model building subset. Actually, there is no informational "contamination" of the
model building data, since all the IV data for each individual DV strictly precede
that particular DV. (Furthermore, under the zero-correlation null hypothesis,
against which the patterns are being validated, the lagged intermingling cannot
give the patterns any inherent advantage.) However, the statistical independence of
the results from the intermingling requires some thought and it would be good to
have a design in which the building and verification datasets are more obviously
separated to establish the predictive skill. Fourth, only "wet" vs. "dry" is predicted.
Although this may be important for drought contingency planning, a more refined
prediction would be useful for general agricultural planning, water resources
management, electric power demand estimation and other purposes.
The first three of these difficulties were overcome with a revised Northern
California model.83 Attempts to overcome the fourth difficulty were unsuccessful
and it was concluded that more data would be needed to resolve SSPI predictions
into finer categories with adequate statistical significance. (Subsequently, a different
approach to the problem of increasing forecast specificity was taken with some
success for a Central Arizona model, described in Subsection II.B.4 below.)
In building the revised California model, the 30 most recent years (1948-1977)
were reserved for model verification and the earlier 96 years (1852-1947) were used
for model building. Because no constraints are employed in this sequential dataset
splitting, no degrees of freedom need to be accounted for in assessing statistical
significance. (The disadvantages of omitting recent years from model building are
that the recent data tend to be more accurate than older (e.g., 19th century) data,
and that such a split may mask any large scale climatic shifts in recent years and
thus unnecessarily withhold potentially useful information from the model building
process.)
The number of patterns found was the same as before, three. However, this time
the tree ring index was not used and, furthermore, all 30 verification years were
classified. (Some of the nonmatches on the first pattern set were due to missing
data entries for the earlier years.) The minimum lead time for any pattern in the
revised set is 6 months, although in about one-third of the cases a pattern match
may be established at a 12 month lead time.

Statistically, the results of the revised model are similar to those of the original
model. See Table III. For all years, 67% of the entropy minimax predictions are
correct, a result with a 93% two-tailed significance against chance. For extreme
years, the percentage correct is 70%, which also has a 93% significance. Consider-
ing the F-ratio statistic for precipitation variability in the Western United States,
these three patterns predict about half of the overall SSPI variance that it is
possible for any model to predict. (The performance of the climatological predictor
is computed assuming equi-frequency wet: dry predictions-the model building
data were split evenly 48:48, while the verification data were 18:12 for all years
and 11:9 for extreme years.) When the predictions are assessed on an individual
pattern basis, the dry patterns turn out to have higher accuracies than wet
patterns, suggesting that drought may be a more predictable phenomenon in
California than heavy precipitation.

TABLE III
Predictive performance for Northern California annual precipitation (revised model
verified on the 30 most recent years).

                                                All years   Extreme years
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)
  Percentage correct                               47%          40%
  Statistical significance                         neg          neg
  Variance explained                               neg          neg
  Skill                                            -7%         -20%
Entropy minimax patterns (6 month lead time)
  Percentage correct                               67%          70%
  Statistical significance                         93%          93%
  Variance explained (2 categories)                 9%          16%
  Skill                                            33%          40%

Two attempts were made to build models giving more detailed predictions. In
the first, the SSPI was categorized into three equipopulated groups: low, medium
and high. In the second attempt, two unequal categories were used: lower one-
third and upper two-thirds. In both of these cases, the patterns failed to
crossvalidate on the model verification dataset. It was concluded that the 63/2=31
years per category of the equipopulated binary split was just enough to yield
patterns detectable in the noise, and that reducing this to 63/3=21 years per
category gave insufficient data. Both of these attempts were based on the original
63:63 building/verification split. On the 96:30 revised split, a 3-way categorization
would give 32 years per building category but only 10 years per verification
category. Thus, it may be possible to build a valid low/medium/high model, but,
as yet, there are insufficient data to demonstrate its validity.
Although progress toward a demonstrable 3-category resolution may be
achieved by further analysis (including further work on the list of independent
variables), there will still remain the basic limitations of data quality and sample
size. Cleaning out bad grid points in historical sea surface temperature data and
recalibrating meteorological time series are important and can help raise the
effective sample size. But how do we lengthen the historical record? One way is
simply to wait. Thirty years from now we will have 30 more data points. Another
way is to try to reconstruct the earlier history. The fact that an analysis technique
such as entropy minimax does not require precise values to obtain useful
information enhances the value of such reconstruction efforts. We do not know the
SSPI for 1848, for example. If someone is able to determine from some source
such as entries in Spanish mission logs, agricultural records, or tree rings, for
instance, whether 1848 was a wet, moderate, or dry year in Northern California
with reasonable confidence, that is all that is needed to add another point to the
database.

2. Winter precipitation in Western Oregon at 7 month lead time
In analogy with the Northern California SSPI, a Western Oregon Precipitation
Index (WOPI) was defined as a multi-station average. Stations in the Willamette
Valley of Oregon were selected for lengths of historical record and geographical
dispersion throughout the region. Because Western Oregon precipitation exhibits a
strong monomodal pattern, roughly 75% coming in the 5 winter months
November-March, the dependent variable was defined for this period rather than
the entire year. The minimum lead time between the latest point in an IV and the
earliest point in the DV was set at 7 months.
The total dataset included 129 years (1854-1982). The most recent 37 years
(1946-1982) were reserved for model verification and the earlier 92 years (1854-
1945) were used for model building. (The number 37 was determined by
considering the sample size needed to obtain a significance level of at least 90% for
an accuracy expected to be comparable to that obtained in Northern California.)
A total of 994 time series were identified as candidates with potential relevance
to Western Oregon precipitation. These included 560 atmospheric temperature,
pressure and precipitation variables, 430 sea surface temperature variables, and 4
volcano eruptive tallies (volcanic activity deposits dust in the upper atmosphere
with possible subsequent effects on weather). Sunspot counts, which were found to
contain no useful information in Northern California, were not included.
The list of filters was expanded to include seasonally defined filters and to
include moving averages of annually defined integral filters. For purposes of IV
filtering, the seasons were defined on a calendar quarter basis: "Winter" = January-
March, etc. The idea behind the use of moving averages of integral filters was to
give these filters some "current time frame only" protection against generic
changes, while retaining their ability to smooth over year-to-year fluctuations. A
period of 20 years for the averaging process was selected by balancing the
increased insulation from generic changes of shorter periods against the increased
insulation from random fluctuations of longer periods.
Feature selection was again based on the lag correlation r. However, to improve
analysis efficiency at negligible sacrifice in statistical reliability, the Monte Carlo
estimation of σ for the correlation coefficient was replaced by the computation of
t = r√((n − 2)/(1 − r²)), which has a Student's-t distribution for normal, uncorrelated
populations. The distributions involved generally depart from normalcy, but t-
statistic significance levels are relatively robust under such departures. For the
t-distribution, values of |t| = 1.6 and 2.5, for example, correspond to significance
levels of 88% and 98% respectively.
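A small sketch of this computation (the example value of r is illustrative, not taken from the study):

```python
import math

def lag_correlation_t(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); under the null hypothesis of zero
    correlation in a normal population this follows a Student's-t
    distribution with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# With n = 92 model-building years, r = 0.17 gives |t| of about 1.6,
# near the 88% significance threshold quoted above.
print(round(lag_correlation_t(0.17, 92), 2))
```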

Each filter was applied to the entire set of 994 time-series, and the resulting
distribution of values of |t| was compared with that expected by chance. Eight
filters were found to yield a significantly greater than random proportion of high
|t| values (significant at the 88% level). These included the 1-, 2- and 3-year cycle
filters, the linear extrapolator, the moving biannual difference, the most recent
Winter-Fall difference, and the 20 year moving averages of the 1- and 2-year cycle
filters.
A total of 24 features were selected. Trial crossvalidation within the 92 model
building years produced a value of w = 70 for the weight normalization.
Rather than run an unconstrained pattern search as was done in developing the
SSPI model, two constrained pattern searches were run for WOPI. The constraint
on one search was that the first pattern in the sequence must match predomi-
nantly dry years; and the other search was constrained to start with a wet pattern.
The dry-first and wet-first sequences are labeled "D" and "W", respectively.
Predictions are made by amalgamating the probabilities and uncertainties
associated with the patterns matched, one from each sequence. In forming the
amalgamation, patterns are weighted according to the number of events matching
them in the model building data (a sketch of this weighting is given below).
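Read as a weighted average (one plausible interpretation of the rule just stated; the handling of the uncertainties is not shown), the amalgamation looks like this:

```python
def amalgamate_wet_probability(p_d, n_d, p_w, n_w):
    """Combine the wet-probability estimates of the matched D-sequence and
    W-sequence patterns, each weighted by the number of model-building
    events matching it."""
    return (n_d * p_d + n_w * p_w) / (n_d + n_w)

# Illustrative values: a matched dry pattern (p_wet = 0.25, 30 matching years)
# and a matched wet pattern (p_wet = 0.65, 15 matching years):
print(round(amalgamate_wet_probability(0.25, 30, 0.65, 15), 3))   # -> 0.383
```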
A total of 7 patterns were found, 3 in the D-sequence and 4 in the W-sequence.
Variables used in the patterns are precipitation at Havana (Cuba), temperature at
Darwin (Australia), sea surface temperature in the vicinity of Antofagasta (Chile),
precipitation at San Jose (Costa Rica), and precipitation in the Willamette Valley
itself during preceding time periods.
Results on the 37 years reserved for model verification are shown in Table IV.
The entropy minimax patterns at 7 month lead time are 74% correct on non-close-
call predictions. (A probability 0.475 < P < 0.525 is defined as a "close-call.") By
comparison, the persistence predictor at zero lead time has 45% accuracy. (The
explained variance of the patterns is negative despite the high categorical accuracy
because of errors on a few extreme years that dominate the squared deviations.)

TABLE IV
Predictive performance for Western Oregon winter (November-March) precipitation
(verification on 37 most recent years).

                                                All years   Excluding close calls
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)a
  Percentage correct                               41%          45%
  Statistical significance                         neg          neg
  Variance explained                               -2%          -3%
  Skill                                            -19%         -10%
Entropy minimax patterns (7 month lead time)
  Percentage correct                               79%          74%
  Statistical significance                         >99%         99%
  Variance explained (4 categories)                 4%          -6%
  Skill                                            58%          48%

a Whether western Oregon precipitation in preceding 5-month interval (June-October) was above or below its
median.

3. Spring and winter precipitation in Eastern Washington at 6 and 7 month lead times
Nine stations were averaged to form the Eastern Washington Precipitation Index
(EWPI), 7 spread over the eastern half of Washington and 2 in Northwestern
Idaho."! Precipitation in Eastern Washington is bimodal, with about 60% in the
winter months (November-March) and another 25% in the spring (April-June). So
two EWPI indices were defined, EWPI-W and EWPI-S for winter and spring,
respectively.
The same 37 year time period (1946-1982) was reserved for model verification as
for Western Oregon. However, because of the larger number of gaps in historical
records for Eastern Washington, the model building dataset contained only 73
years (1873-1945) for spring and 72 years (1874-1945) for winter precipitation. (An
attempt to reconstruct the 1854-1873 period by use of tree ring indices failed
because of low correlations with the seasonal precipitation indices for the region.)
Trial crossvalidation using subsets of the model building data yielded a
normalization of w = 70, the error minimum occurring in a relatively irregular and
shallow bottom over the range 60-90.
Forty-two features were selected for winter precipitation pattern searching. They
included 3 filterings of the prior EWPI-W, and a variety of precipitation,
barometric pressure, volcanic activity, and atmospheric and sea surface temper-
ature variables (including latitudinal gradients of SST). Twenty features were
selected for spring precipitation pattern searching. These included 5 filterings of
the prior EWPI-S time series, 3 other precipitation variables, 3 volcanic activity
variables, 7 sea surface temperature variables, and 2 barometric pressure variables.
Four patterns were found for each sequence (D & W) for both spring and winter
precipitation, for a total of 16 patterns. Their details are given in Christensen and
Eilbert.
The performance of the patterns is given in Table V in comparison to that of
climatology and persistence. The 69% correct percentage of the winter patterns is
statistically significant at the 97% level. The predictive performance is the same
when close-calls (CC) are omitted. The 54% figure for the spring patterns on the
37 verification years is, however, indistinguishable from chance. By comparison,
the persistence predictor for winter precipitation at lead times of 0, 3 and 6
months has negligible or negative skill. (The negative skill of the persistence
predictor at 6 month lead time is sufficient to give some indication of a "reverse
persistence.") The persistence predictor for spring precipitation has statistically
significant skill at zero lead time, but negligible skill at 3 and 6 month lead times.
A year-by-year analysis of the performance of the spring patterns reveals that
they do fairly well (67% correct) during the first half of the 37 years and poorly
during the second half (43% correct). This suggests that there may have been some
large-scale changes in the early 1960s, necessitating a retraining of the pattern
series, or that the probability distributions of physical processes affecting spring
precipitation in Eastern Washington are generally less stationary than those for
winter precipitation, requiring retraining of spring patterns after roughly 15 years.
(With nonstationarity, one may either enlarge the feature set in an attempt to pick
up variables accounting for the changes, or retrain at a frequency dependent on
the length of intervals of approximate stationarity.)
Table VI shows the results of using retrained patterns for the second half of the
validation period. The data cutoff was taken as 1960 for the 1964-1982 validation
period in order to preserve the sequential building/validation separation, since

TABLE V
Predictive performance for Eastern Washington spring (April-June) and winter (November-March)
precipitation (verification on 37 most recent years). Results are given both for all years and for years
excluding CC (close calls).

                                        Spring (25% of total)    Winter (60% of total)
                                        All years   Excl. CC     All years   Excl. CC
Climatological average
  Percentage correct                       50%        50%           50%        50%
Persistence (0 lead time)a
  Percentage correct                       68%        67%           41%        41%
  Statistical significance                 97%        92%           neg        neg
  Variance explained                       16%         1%           neg        neg
  Skill                                    35%        33%          -19%       -18%
Persistence (3 month lead time)b
  Percentage correct                       54%        48%           51%        50%
  Statistical significance                 38%        neg           13%         0%
  Variance explained                        2%        -2%           neg        neg
  Skill                                     8%        -4%            3%         0%
Persistence (6 month lead time)c
  Percentage correct                       49%        44%           38%        35%
  Statistical significance                 neg        neg           neg        neg
  Variance explained                       -3%        neg           neg        neg
  Skill                                    -3%       -11%          -24%       -29%
Entropy minimax patterns (6 month lead time)
  Percentage correct                       56%        54%           69%        69%
  Statistical significance                 50%        31%           97%        97%
  Variance explained (categ.)              -8%       -10%           13%        13%
  Skill                                    11%         8%           37%        37%

a Whether Eastern Washington precipitation in previous equal-length interval, (January-March) or (June-October), was above or below
median.
b Predictor periods: previous October-December for spring DV, previous March-July for winter DV.
c Predictor periods: previous July-September for spring DV, previous December-April for winter DV.

TABLE VI
Spring precipitation in Eastern Washington, using patterns based on
1874-1945 for predicting 1946-1963, and patterns based on 1874-1960
for predicting 1964-1982 (excluding close-calls).

Entropy minimax patterns              1946-1963   1964-1982
  Percentage correct                     67%         69%
  Statistical significance               75%         87%

some of the filters use data as early as 3 years preceding the predicted year. (It was
unnecessary to do this for the 1945 cutoff, since World War II put a natural gap
of several years in most Pacific Ocean data records during this period.)† The new
patterns perform better on the 1964-1982 period than the original patterns, 69%
correct compared to 43%, although the small sample size (16 years after
eliminating close-calls) gives a relatively modest statistical significance (87%) to
these patterns.

4. Winter precipitation in Central Arizona at 2 month lead time


The total dataset available on Central Arizona precipitation consisted of 131 years
(1850-1980). The most recent 4 years (1977-1980) plus 41 randomly selected years,
for a total of 45, were reserved for model verification. The remaining 86 years were
used for model building. The building/verification split was constrained to
equilibrate overall averages, so one degree of freedom must be accounted for in
assessing statistical significance. The region defined by the Central Arizona Winter
Precipitation Index (CAWPI) was determined by selecting stations with high
precipitation correlations, thereby establishing a reasonably homogeneous set. The
minimum lead time between the latest point in an IV and the earliest point in the
DV was set at 2 months.80,81
A total of 15 filters were considered, 8 combinations of annual variables and 7
combinations of seasonal variables. A total of 1,000 time series were preprocessed
on the model building data during feature selection, giving a total of 15,000
filtered-series variables. 561 of the time series represent atmospheric meteorological
data, 430 are time series for sea surface temperature, 2 for stream flow discharges,
2 for tree ring indices, 4 for volcano eruptive indices, and one for the sunspot
count.
Feature selection used the lag correlation t-statistic. Time series with less than
52 years of model building data were dropped as too short. At the 88%
significance level, 4 of the 15 filters yielded an above random number of high-
correlating IVs. This result was confirmed by reanalysis using rank-order corre-
lations and by another analysis using Monte Carlo rescramblings. The four
qualifying filters were: 1-year cycle, moving difference, linear extrapolation, and the
most recent summer-winter difference.
A total of 53 features were selected. Twenty-nine involve sea surface temper-
atures (14 are latitudinal and longitudinal gradients), 18 are atmospheric variables,
2 are stream flow discharges, 2 are volcanic eruptive indices, and 2 are sunspot
variables (2 different filters applied to the sunspot time series).
Based on trial crossvalidation runs within the model building data, the weight
normalization was set at w=70. Three D-sequence and three W-sequence patterns
were found. Twenty-two of the 53 features are used in one or more of the 6
patterns.

†As a general suggestion for research during the next few years using Pacific Ocean data, one may
adapt the building/verification split to the gaps in the data as follows: model building data = start-1924
and 1948-1962; verification data = 1925-1945 and 1963-present. Omitting 1946-1947 eliminates any
possible carry-over from verification IV data (using V₀, V₋₁ and V₋₂ type filters only) into DV data,
while minimizing data loss (much SST data during WWII being unavailable). The advantage of this
split is that it includes the relatively recent 1948-1962 period in the model building dataset while still
complying both with the philosophy of restricting validation to later years and with the need for
adequate sample sizes.

Results on the 45 model verification years are shown in Table VII. The entropy
minimax patterns have statistically significant predictive skill at 2 month lead time.
By comparison, the persistence predictor has no skill at zero lead time.
In order to improve the utility of the model, a new approach was taken to the
problem of increasing the predictive resolution. Rather than subdivide the DV into
more intervals, which would have reduced interval sample sizes, the original
bivariate categorization was used for pattern formation and quantitative predic-
tions were made in a post-processor on the basis of the magnitudes of the DV in
model building years matching each pattern. The averages in the model building
years matching the pattern-defined analogs provide the quantitative predictions
and the distribution over the analog years provides an estimate of the uncertainty.
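A minimal sketch of this pattern-analog post-processing (an illustration only, not the actual CAWPI post-processor; the matching rule and data below are hypothetical):

```python
import numpy as np

def analog_prediction(pattern_match, building_years, building_values):
    """Quantitative prediction from pattern-defined analog years (sketch).

    pattern_match(year) is assumed to return True when a model-building year
    matches the same pattern as the year being predicted.  The mean DV value
    over the matching years is the point prediction, and their spread (here
    the 10th-90th percentile range) estimates the uncertainty.
    """
    analogs = np.array([v for y, v in zip(building_years, building_values)
                        if pattern_match(y)])
    point = analogs.mean()
    lo, hi = np.percentile(analogs, [10, 90])
    return point, (lo, hi)

# Hypothetical winter precipitation values (inches) for ten model-building years
years = list(range(1900, 1910))
precip = [4.1, 6.3, 2.8, 5.0, 7.2, 3.5, 4.8, 6.9, 3.1, 5.6]
print(analog_prediction(lambda y: y % 2 == 0, years, precip))
```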

TABLE VII
Predictive performance for Central Arizona winter (December-March) precipitation (45
verification years).

                                            All years    Excluding pre-1890

Climatological average
  Percentage correct                        50%          50%
Persistence (0 lead time)a
  Percentage correct                        47%          43%
  Statistical significance                  neg          neg
  Variance explained                        2%           -1%
  Skill                                     -7%          -13%
Entropy minimax patterns (2 month lead time)
  Percentage correct                        65%          71%
  Statistical significance                  90%          96%
  Variance explained (4 categories)         21%          14%
  Skill                                     29%          42%

a Whether August-November precipitation in Central Arizona was above or below its median.

A quantitative post-processor algorithm based on these ideas was constructed
for the CAWPI model. Percentile predictions, validated on reserved test data,
compared well with observed values. Models were built and validated using two
different training/test data splits. One used an 86:45 random split. At a 2-month
lead time for a December-March forecast interval, the bivariate accuracy of the
entropy minimax patterns is 65%. The quantitative post-processor gives numerical
predictions with a correlation coefficient of 0.47 and explained variance of 20.4% on
the test data.
The second training/test split was 93:37 with the test years being the 37 most
recent (1946-1982). At 1-month lead time for a February-March forecast interval,
the patterns have 69% bivariate accuracy. The quantitative predictions have
correlation coefficient of 0.34 and explained variance of 7.6% on the test data. The
lower values for this model reflect the narrowed forecast interval and the atypically
low precipitation during the test period.

5. Summer temperature in Central Arizona at 8 month lead time


A set of 11 stations was used to define the Central Arizona Summer Temperature
Index (CASTI).80 Five are stations in the greater-Phoenix area and 6 are auxiliary

stations to supplement pre-1900 records. The index was defined as deviations from
a moving average to filter out such effects as urbanization induced warming trends
in cities. The 126 years of available data (the period 1852-1982 with data missing
for 5 of the years, 1861-1864, 1866) were divided into 82 model building years and
44 verification years (the 7 most recent, 1976-1982, plus 37 randomly selected).
Each year was classified for pattern analysis purposes as "hot" or "cool,"
depending upon whether its CASTI was above or below median.
Temperatures affect energy demands, growing seasons and other planning
factors. Based on these considerations, the lead time between the last IV datum
and the first DV datum was set at 8 months. The F-ratio for mean July
temperatures in Phoenix is between 1.0 and 1.5,160 representing 0-33% potentially
explainable variance. Seasonal persistence models (zero lag time) explain about 10-
15% of the variance in mean summer temperatures in Central Arizona.6,177,169 At
a two season lag, the performance of inter-seasonal correlation models becomes

essentially random.
Based on trial crossvalidation using subsets of the model building data, the
weight normalization was set at w = 120. This magnitude, exceeding by 50% the
number of model building events, is indicative of a high degree of instability in
CASTI relative to the IVs and the data used. A total of 22 features were selected
for entropy minimax analysis. These included 2 atmospheric temperature variables,
9 Pacific sea surface temperature variables (3 being latitudinal gradients), 4
barometric pressure variables, 1 precipitation variable, 2 stream flow variables, 2
volcanic activity variables, and 2 sunspot variables.
Two pattern sequences were generated by the entropy minimax analysis: 4
patterns in the H-sequence (hot-first) and 4 patterns in the C-sequence (cool-first).
Results on the verification years are given in Table VIII. The 55-60% accuracy at

TABLE VIII
Predictive performance for Central Arizona summer (June-September) temperature (44 verification
years).

                                              All years    Excluding pre-1890

Climatological average
  Percentage correct                          50%          50%
Persistence (0 lead time)a
  Percentage correct                          58%          62%
  Statistical significance                    71%          83%
  Variance explained                          7%           9%
  Skill                                       16%          24%
Persistence (4 month lead time)b
  Percentage correct                          49%          53%
  Statistical significance                    neg          27%
  Variance explained                          0%           0%
  Skill                                       -2%          6%
Entropy minimax patterns (8 month lead time)
  Percentage correct                          55%          60%
  Statistical significance                    40%          71%
  Variance explained (4 categories)           -9%          -9%
  Skill                                       9%           20%

a Whether average February-May temperature in Central Arizona was above or below its median.
b Whether previous October-January temperature was above or below its median.

8 month lead time has insufficient confidence of not being a chance correlation for
reliable use. The persistence predictor has 58-62% accuracy at zero lead time.
When the lead time is extended to 4 months, this drops to 49-53%.

6. Discussion
For seasonal and annual precipitation forecasting, with lead times up to 6 months
or more, entropy minimax pattern matching models have demonstrated statisti-
cally significant skill in two category prediction. There are indications that this can
be refined to more quantitative forecasting. One approach is the pattern-average
post-processing used for CAWPI. Another, not yet tried in meteorological
modeling, is the potential function approach employing fuzzy entropies for
continuous DVs, successfully used in fission gas release modeling and in
lymphoma survival modeling, described in Subsections III.A.2 and IV.A.2 below.
Further work is also required to determine the relationship of predictive reliability
to DV period, lead time and other factors.

Even though the pattern matching models are relatively easy to use once built,
the effort required in model building is considerable. The F-version of the current
pattern search program, for example, is limited to processing independent variables
in batches of no more than 100 at a time. Thus, for example, interactions between
variables no. 5 and no. 150 will only appear if both survive interactive screening of
no. 1-100 and no. 101-200 separately. Even if the limit is increased 10-fold, as is
readily done on many systems, the feature selection preprocessing required is still
considerable. Furthermore, the currently available time series of 100 to 130 years
are only barely long enough to extract statistically verifiable patterns, considering
that error minimizing weight normalizations are in the order of 70-120.
These factors have tended to restrict consideration of the approach to develop-
ment of models for specific areas that are important enough to justify the effort.
Examples being considered include:
• Annual precipitation forecasting in drought-sensitive regions (e.g., the Sahel in
Africa)t at lead times useful to remedial planning (e.g., timely conservation
and establishment of alternatives). Annual forecasts are also useful in areas
requiring water for hydroelectric generation.
• Seasonal precipitation forecasting in areas with agricultural requirements at
useful lead times (e.g., midwestern US, southwestern USSR, etc.).
• Precipitation forecasting for specific time frames in areas with construction
and maintenance planning needs (e.g., drilling operations, dam construction,
highway maintenance, trucking, large-scale building projects, etc.)
• Average temperatures and numbers of days with high summer temperatures
and low winter temperatures, at lead times useful in preparing for response to
varying demands for electric power, oil, and gas, etc.
• Combinations of temperature, precipitation and humidity during specific
seasons for energy requirement estimation.

†The critical period for the Sahel in southern West Africa is the summer monsoonal season (June-
September). Appropriate lead times would be 5 to 8 months preceding June. Available time series data
are 80-100 years long.

These are a few examples of important uses of long range forecasts; an even
longer list of general, commercial and governmental uses could be drawn up.
Research efforts that can help to make long range weather modeling more
efficient include:
• Further item-by-item completion, checking and cleaning of existing data
bases to reduce their noise levels.
• Extending existing data bases back further in time, even if only categorically
rather than numerically, by a combination of historical and scientific research.
• Standardizing and regularizing the publication and updating of new features
such as regionally averaged calibrated tree ring indices, other plant growth
indices, stream flow indices, mud varve (soil deposition) indices, etc.
• Developing master lists of time series and filters for large geographic areas (as
can now be done for the Western U.S.) to facilitate feature selection for
building models for individual regions within these areas.


• Development and testing of seasonal temperature forecasting models at 0-, 3-,
6-month and longer lead times for various regions.
• Development and testing of seasonal precipitation forecast models in eastern
regions of the U.S. with various F-ratios. Also, testing one-sided (e.g., dry)
predictability in regions with low F-ratios.
• Coupling model building with on-going predictor monitoring, assessment and
improvement programs.

7. Predictive performance assessment computations


There are a wide variety of measures which can be used to assess predictive
performance.81 The results reported in this paper use three which are fairly
common and can be easily computed for most types of predictions: percentage
correct, variance explained and skill. In addition, the statistical significance of the
percentage correct is given.

Percentage correct For purposes of computing the percentage correct, all predic-
tions were converted to a categorical form.

P=M/N,
where
N = total number of predictions,

M = number correct.

For probabilistic predictors for two categories, wet and dry, for example, the
predictions were categorized as "wet" if P_wet ≥ 0.5, otherwise "dry." For a predictor
of the specific numerical value of a variable, on the other hand, the predictions
were categorized according to percentiles of the values of the predictors.
The fraction correct for the climatological average, P₀, was used to estimate the
variance, V₀, under the null hypothesis of no predictability (for a binary wet/dry
predictor, for example, V₀ is the binomial variance of the fraction correct under
this null hypothesis). The deviation of the observed fraction correct from the
climatological average, measured in units of V₀ and denoted ΔP, is then used for
significance testing.

Using the Gaussian approximation, the two-tailed confidence level α is the sum
of the tail areas under the unit normal curve outside ±ΔP, and the statistical
significance is 1 - α. (A more exact test, using Student's t-distribution, would give a
negligible correction for the sample sizes involved.)
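As a concrete sketch of the percentage-correct and significance computation, assuming the binomial form P₀(1 - P₀)/N for the null variance V₀ (the paper's displayed expression is not reproduced above):

```python
import math

def percent_correct_significance(m, n, p0=0.5):
    """Fraction correct and its two-tailed Gaussian statistical significance.

    Assumes the null variance of the fraction correct is the binomial value
    V0 = p0*(1 - p0)/n, a standard choice for a binary (e.g. wet/dry) predictor.
    """
    p = m / n                                    # observed fraction correct
    v0 = p0 * (1.0 - p0) / n                     # assumed null variance
    z = (p - p0) / math.sqrt(v0)                 # deviation in standard units
    alpha = math.erfc(abs(z) / math.sqrt(2.0))   # two-tailed confidence level
    return p, 1.0 - alpha                        # fraction correct, significance

# Example: 25 correct predictions out of 37 years against a 50% climatology
print(percent_correct_significance(25, 37))
```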

Skill   The Heidke skill ratio25,125 provides a comparison of the observed percentage
correct in the test data to the climatological expectation:

K = (M - C)/(N - C),

where N is the number of test years predicted, M is the number correct, and C is
the expected number correct on the basis of the climatological average for the
training data. (If the number of training years, n, was odd and they were split
between two categories as (n+1)/2:(n-1)/2, then C was simply taken as N/2.)
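The Heidke skill ratio is a one-line computation; a sketch with illustrative numbers:

```python
def heidke_skill(m, n, c=None):
    """Heidke skill ratio K = (M - C)/(N - C).

    M = number correct, N = number of test years predicted, and C = expected
    number correct from the training-data climatological average (taken here
    as N/2, the even-split case described in the text).
    """
    if c is None:
        c = n / 2.0
    return (m - c) / (n - c)

print(heidke_skill(25, 37))   # e.g. 25 correct out of 37 test years
```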

Variance explained For both numerical and probabilistic predictors, the variance
explained was computed using squared deviations from a fitted estimator, dis-
counting for the number of degrees of freedom used in fitting.
For numerical predictors (e.g., persistence), let {X_i} be the predictors and let
{Y_i} be the observed values. A linear estimator was used,

Z_i = α + βX_i,

where

β = (⟨XY⟩ - ⟨X⟩⟨Y⟩)/V_X,     α = ⟨Y⟩ - β⟨X⟩,

angle brackets denoting sample means and V_X the variance of the {X_i}, except that
α = ⟨Y⟩, β = 0 if ⟨XY⟩ ≤ ⟨X⟩⟨Y⟩. (Nonnegative β is implied by the meaning of
"persistence," although one could logically also define a reverse-persistence predictor.)
The fraction of explained variance is then

E_p = 1 - S/V_Y,

where V_Y is the variance of the observed values and

S = (1/(N - ν)) Σ_{i=1..N} (Y_i - Z_i)²,

N = number of events,

ν = 1 if β = 0 (i.e., only α fitted),
  = 2 if β > 0 (i.e., both α and β fitted).

For probabilistic predictors, the events were binned into 10 probability intervals
(i-1)/10 < p ≤ i/10, i = 1,...,10, where p is the probability. The estimator Z_i was
then taken as the average Y for the events in the ith bin. The expressions for S
and E_p remain unchanged, where, in this case, the number of degrees of freedom,
ν, used in fitting is the number of nonempty bins. (Where there was binned
categorization for explained variance computation, this is noted explicitly in the
tables.)
If the estimators in either the numerical or the probabilistic case are formed on
the training rather than the test data, then one simply uses ν = 1. The test data fits
were used here to separate out baseline effects, so that E_p measures the ability of
the predictors to correctly categorize the data in a least-squared-deviation sense.
The explained variance magnitudes are generally low compared to skill level
magnitudes, since a few large errors can dominate variance, whereas skill depends
on error size only to the extent that the size affects categorization.

[The 1/(N - ν) factor in the definition of S is easily explained as follows: Suppose
one splits a sample into subsamples and fits model parameters to replicate the
means of each subsample. The expected values of the parameters are clearly
independent of the sizes of the subsamples, since the expected value of a mean is
independent of sample size. Thus, we may, without loss of generality, compute ν
for the split 1:1:...:N - ν, in which case we have simply fit ν - 1 data points
exactly and averaged over the remaining subsample of size N - ν.]
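A sketch of the explained-variance computation for a numerical (persistence-type) predictor, following the estimator and degrees-of-freedom discount above; the baseline V_Y is assumed to be the plain sample variance of the observations:

```python
import numpy as np

def explained_variance(x, y):
    """Fraction of variance explained by a linear persistence-type estimator.

    Fits Z_i = alpha + beta*X_i on the test data itself (beta forced to be
    nonnegative), discounts the residual mean square by the number of fitted
    parameters, and compares it with the variance of the observations.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    cov = np.mean(x * y) - x.mean() * y.mean()
    if cov > 0.0:
        beta = cov / np.var(x)                 # V_X = variance of the predictors
        alpha = y.mean() - beta * x.mean()
        nu = 2                                 # both alpha and beta fitted
    else:
        beta, alpha, nu = 0.0, y.mean(), 1     # only alpha fitted
    s = np.sum((y - (alpha + beta * x)) ** 2) / (n - nu)
    return 1.0 - s / np.var(y)                 # assumed baseline: sample variance of Y

# Example: previous-season values x as a persistence predictor of observed y
rng = np.random.default_rng(0)
x = rng.normal(size=40)
y = 0.3 * x + rng.normal(size=40)
print(explained_variance(x, y))
```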

III. ENGINEERING/MATERIALS SCIENCE

A. Nuclear Engineering
A nuclear fuel rod is a sealed metallic canister (called the "cladding"), usually
slightly less than 4 meters long and about 1 centimeter in diameter (wall thickness
generally 1-3 mm) containing nuclear fuel. The UO2 nuclear fuel consists of
cylindrically shaped pellets, typically 1-2 cm thick, with a diameter permitting them
to fit inside the cladding with a small circumferential spacing or "gap." A 4 meter
column of 1 cm pellets, for example, totals 400 pellets.
In a commercial light water reactor (LWR), the fuel rods are packaged into
assemblies, the rods being held in place by a web of spacers. Although the
numbers vary with the size of the reactor, the magnitudes involved are illustrated
by a reactor with 560 assemblies, each containing 8 x 8 = 64 rods. (Other configur-
ations involve fewer but larger assemblies.) This gives a total of over 30,000 fuel
rods containing roughly 14 million fuel pellets.
The fuel assemblies are loaded in geometric patterns into a large water filled
container, the primary containment. The primary coolant flows around the
individual fuel rods, carrying heat from them to an exchanger where this energy is
transferred to the secondary coolant and then to electric generators.
As in all industrial plants, there are a large number of systems to keep running
properly in order to maintain proper operation of the plant. Five potential
problem areas to which entropy minimax has been applied are described in this
section. These are: fracture of the fuel pellets, release of fission gas from within the
fuel pellets to the fuel-cladding gap, rupture of the cladding allowing fission gas
and particles to escape into the primary coolant, axial bowing of the fuel rods
creating local obstructions to coolant flow, and swelling of fuel rods in the
hypothetical catastrophic incident called a LOCA (Loss-Of-Coolant-Accident).

1. Fracture of UO2 pellets subjected to large thermal gradients


To simulate the thermal conditions in a reactor environment, stacks of fuel pellets
may be heated electrically by passing a high amperage current through
them.194,257 The total power delivered to the stack under this direct electrical
heating (DEH) is I²R, where I is the current and R is the stack resistance. The
heat is extracted by passing helium in turbulent flow axially along the stack.
UO2 resistivity is a function of temperature, stoichiometry and other factors. The
feedback relationship between temperature, resistivity and power is modeled in
determining the temperature distribution.59
A DEH experiment was conducted by Argonne National Laboratory to
generate data on UO2 pellet fracturing for entropy minimax analysis. The
objective was to study pellet fracturing, fragment relocation, and fracture healing
as a function of thermal history and other factors. Successful experimental runs
produced centerline temperatures of 1,200-1,800°C with surface temperatures in
the order of 300°C. (The melting point of UO2 is about 2,800°C.) The high inside-
to-outside temperature gradient causes material in the interior of the pellets to


expand more than material nearer the surface, resulting in cracking of the pellets.
Thus, in normal operation fuel pellets are expected to and do fracture. This
relieves the stresses which build up in the pellet interiors due to the high interior
temperatures. However, the fracturing also has the potential to cause problems
under commercial operating conditions in which the pellets undergoing nuclear
heating are encased in fuel rod cladding. Fractures create channels for fission gas
to escape from the pellet interior. Also, if a chip of material becomes dislodged
from the pellet, high local stresses can be exerted on the cladding. The combina-
tion of corrosion and stresses increases the likelihood of cracking and rupture of
the cladding wall.
Since the pellet fracture phenomenon is too erratic to predict mechanistically
even under the most carefully controlled experimental conditions, and especially
under the less well characterized conditions of commercial operations, it was
decided to build a statistical model.
Three dependent variables were defined to represent factors measured in real-
time during the tests: material cracking activity (measured acoustically), diametral
expansion, and axial elongation. Dependent variables defined to represent post-test
conditions included: total accumulated cracking activity, maximum crack width on
pellet surface, total crack length in sectioned specimen, and amount of permanent
diametral expansion. Forty-nine independent variables were defined for purposes
of searches for patterns predictive of during-test DVs; 33 for post-test DV pattern
searches. For during-test DVs, the IVs used were various features and combina-
tions of features of the time series of power and temperature for the pellets in the
stack. For post-test DVs, the IVs included pellet and stack fabrication factors:
material grain size, pellet density, interface preparation, stack length, chamber
pressure and moisture, and various operating history descriptors.
Each DV was divided by magnitude into four equally populated classes. Weight
normalizations used for the different DVs varied from 25 to 70. Entropy minimax
patterns were determined for the three DVs for each of the 27 during-test cycles
individually, and for the four DVs for the post-test results as a whole. Seven of
these 85 pattern sequences are described in Christensen (Vol. 2).45 The sizes of the
samples from which they were derived are given as follows:

Dependent variable Sample size

During-test
Cracking activity (acoustic)-test A7 159
Diametral expansion-test A13 528
Diametral expansion-test A15 97
Post-test
Cracking activity (acoustic) 127
Crack width 112
Crack length 125
Diametral expansion 131

Table IX gives, as an example, the distributions for training and test events
matching the first two patterns for crack width. The similarity of test and training
distributions, which is quite evident here, was even stronger for the other DVs.
Analysis of the during-test patterns revealed the importance of history effects. In
each pattern sequence, at least one history variable, as distinguished from
instantaneous state variables, appeared in a defining condition in the first or
second pattern. Short-term rather than long-term history variables were most
important.

TABLE IX
Classification distributions of events matching the first two patterns for maximum
crack width.

Narrow   Mod.-narrow   Mod.-wide   Wide

Crack width pattern no. 1


Training 10 0 1
Test 7 1 2
Crack width pattern no. 2
Training 1 0 9 9
Test 2 4 8 5

Centerline temperature variables were important in patterns pertaining to cold


state fuel expansion, as expected. Chamber pressure was important in patterns for
cracking activity. Examination of the timing of low entropy features for cracking
activity relative to the acoustic emission revealed time delays consistent with
thermal time constants for the material.
Analysis of the post-test patterns turned up some unexpected relationships.
Particularly noticeable in this regard was the strong association of cracking
activity to trace level amounts of moisture in the helium coolant. This relationship
was explained, in retrospect, as most likely due to the high sensitivity of UO2
electrical resistivity to slight deviations in stoichiometry, and the consequent effects
of resistivity changes on power and hence, on temperatures. Also significant in
patterns for total extent of cracking were amount of power cycling, and power
peaks at the end of up-ramps.

2. Fission gas release in nuclear fuel rods under commercial operating conditions
Another phenomenon which, like pellet fracturing, is too erratic to predict reliably
using mechanistic models is fission gas release (FGR). Fission gas release by such
mechanisms as diffusion through uranium dioxide, migration along grain bound-
aries, and escape along pellet fractures has been studied extensively, both
theoretically and experimentally. Under controlled laboratory conditions, the
amount of fission gas release for some fuel types is known to within a reasonable
uncertainty as a function of such variables as temperature, porosity and grain size
distribution. What makes fission gas release difficult to predict reliably under
operating conditions is its extremely high sensitivity to such variables as temper-
ature and grain size and the fact that these variables are themselves coupled to the
amount of fission gas released. Under important classes of operating conditions,
this coupling acts as a positive feedback. Increased fission gas release into the fuel-
cladding gap lowers the thermal conductivity of the gap. This insulating effect
produces higher fuel temperatures and more fission gas release. Once started, the

process continues, driving the temperature upward, until the fuel expands enough
to close the gap and release stored heat via direct fuel-cladding contact.
At low centerline temperatures (say below 1,000°C), generally less than 0.5% of
the fission gas produced in the UO2 material will be released into the gap. At
higher temperatures, more fission gas is produced and a greater fraction is
released. When the release percentage reaches somewhere in the vicinity of 3-6%,
depending upon a complex of circumstances which are themselves known only
approximately, the positive feedback mechanism sets in if the gap is still open or if
closed and only in soft contact. There is a "burst" of fission gas release. The
percentage may rise to 10%, 20% or even to 30% or more before equilibrium is
reached via hard contact. At that point, centerline temperatures will generally be
in the order of 1,500-1,700°C or higher.
Mathematically, the process behaves as a cusp catastrophe.198,233 At low
temperatures, the system has only one stable state, namely low FGR. As the
temperature is raised, the system acquires a second stable state at high FGR with
no intermediate stable states. As the temperature is raised still further, the system
has only the higher stable state. The system behaves nondeterministically with an
increasing probability of being in the higher FGR state as the temperature is
raised. (Ways of avoiding or minimizing FGR burst include operating at low
powers to keep temperatures down, limiting burnup to keep fission gas production
down, prepressuring the rod to retard FGR insulating effects, and facilitating heat
transfer with early gap closure prior to significant gas production.)
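Purely as an illustration of this bistability (not the paper's model), the canonical cusp-catastrophe potential V(x) = x⁴/4 + a·x²/2 + b·x shows the same one-state / two-state / one-state progression as a control parameter is swept; here x and (a, b) are schematic stand-ins for the FGR state and temperature-like controls:

```python
import numpy as np

def stable_states(a, b):
    """Stable equilibria of the canonical cusp potential V(x) = x**4/4 + a*x**2/2 + b*x.

    Equilibria solve V'(x) = x**3 + a*x + b = 0; a real root is stable when
    V''(x) = 3*x**2 + a > 0.  Schematic only: x stands in for the FGR state.
    """
    roots = np.roots([1.0, 0.0, a, b])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return sorted(x for x in real if 3.0 * x * x + a > 0.0)

# Sweeping the control parameter b at fixed a < 0: a single low state,
# then coexisting low and high states, then only the high state remains.
for b in (2.0, 0.0, -2.0):
    print(b, stable_states(-2.0, b))
```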
An entropy minimax model of fission gas release was developed using data on
139 rods from 12 reactor-cycles.50 Because of the small sample size, a set of
80%:20% splits was used for model building vs. verification. A master list of 199
features was screened for possible FGR information content. Thirty-three persons
from 23 different organizations participated in drawing up the list. Although all of
the features were included on the basis of possible relevance to FGR, most had
never before been tested as potential factors in an FGR model. The first 17 features
in this list are characteristics of the fuel rod in its as-fabricated state prior to
insertion into the reactor. The remaining 182 features are variables modeled
mechanistically in a computer code designed to simulate the actual in-reactor
performance of the fuel rod. Feature selection screening was based on three
numerical criteria (entropy exchange, correlation coefficient, and chi-squared) in
addition to engineering judgment.

Only 6 of the 199 features were found to have correlation coefficients exceeding
0.5 on the model building data, the highest value being 0.58. Many features with
low correlations individually were found to have significant information content in
terms of pair-wise entropies when taken in combination with other features.
A total of 42 features were selected to be independent variables for model
building. Two are themselves estimates of the dependent variable. One is fission
gas release computed by a semi-empirical algorithm, FCODE-BETA, which was
benchmarked against laboratory data. The other, GCODE, is a 4-variable
regression fit to FGR in the model building data, the 4 variables used being
chosen from among the most highly correlating of the other 41 on the building
data. (Since these 41 were selected by preprocessing involving entropy exchange,
GCODE represents some information from the entropy minimax model building
process.)
A total of 13 of the 42 features were used to define 11 entropy minimax
patterns. GCODE was used as one of the defining features in 3 of the patterns. In
addition, the GCODE prediction itself is amalgamated with the pattern predic-
tions to form the final SPEAR-BETA prediction, the relative contributions from
GCODE and the patterns being given weights inversely proportional to their
variance on the model building data. This automatically makes GCODE the fall-
back predictor in the event of failure to match any pattern.
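A minimal sketch of the inverse-variance amalgamation described above; the function and numbers are illustrative, with the variances understood to be residual variances of GCODE and of the pattern predictor on the model building data:

```python
def amalgamate(pred_gcode, var_gcode, pred_pattern, var_pattern):
    """Combine two predictions with weights inversely proportional to their
    residual variances on the model building data (illustrative sketch).
    If no pattern is matched, only the GCODE-style prediction would be used."""
    w_g, w_p = 1.0 / var_gcode, 1.0 / var_pattern
    return (w_g * pred_gcode + w_p * pred_pattern) / (w_g + w_p)

# Hypothetical log(FGR) estimates and residual variances
print(amalgamate(pred_gcode=-3.2, var_gcode=0.8, pred_pattern=-2.1, var_pattern=0.4))
```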
The dependent variable was expressed as log(FGR) and treated as a continuous
DV, using a Gaussian potential function and minimizing the associated fuzzy
entropy, to find the patterns. The values of log(FGR) ranged from about -8 to
about -0.8. The analysis was conducted at a potential function resolution of
Δlog(FGR) = 2, corresponding to about 4 magnitude classifications.
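The exact potential-function and fuzzy-entropy definitions are given in the cited references rather than reproduced here; the sketch below is one plausible, assumption-laden reading in which a Gaussian kernel of width comparable to the stated resolution spreads each observation's log(FGR) value over neighbouring magnitude classes, and the Shannon entropy of the pooled fuzzy class distribution is evaluated:

```python
import numpy as np

def fuzzy_class_entropy(log_fgr_values, class_centers, width=2.0):
    """Entropy of a fuzzy class distribution built with a Gaussian potential
    function (an illustrative reading, not the paper's exact definition).

    Each observation contributes weights exp(-(x - c)**2 / (2*width**2)) to
    every class center c; the pooled, normalized weights form a distribution
    whose Shannon entropy is returned.
    """
    x = np.asarray(log_fgr_values, float)[:, None]
    c = np.asarray(class_centers, float)[None, :]
    w = np.exp(-((x - c) ** 2) / (2.0 * width ** 2))
    p = w.sum(axis=0)
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

# Four hypothetical magnitude classes spanning log(FGR) from about -8 to -0.8
centers = [-7.0, -5.0, -3.0, -1.0]
print(fuzzy_class_entropy([-6.5, -4.2, -1.3, -0.9], centers))
```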
Based on trial crossvalidation using subsets of the model building data, the
error minimum was found to occur at a weight normalization of w=25. The low
value indicates the comprehensiveness of the feature list and the relative consis-
tency of the data. A total of 11 FGR patterns were found in two sequences of 6
and 5, respectively.
An example of a high-FGR (15.6%) pattern is the following:
• GCODE> 8.52%, or
• Amplitude-weighted sum of the absolute values of the changes in the phase of
axial profile of burnup exceeds 0.00509 radians.
The full set of FGR patterns is given in Ballinger et al. 5 Prior research on fission gas
release focused on laboratory experimentation to measure its dependence on such
factors as temperature, grain size, and gas production history (from prior power and
temperature history). The idea was to isolate and benchmark the important
mechanisms in the laboratory and then to use these mechanisms to compute FGR
estimates under commercial operating conditions. The limited success of attempts to
validate the results under operational circumstances has been a continual challenge to
researchers.
The entropy minimax modeling did not find temperatures and grain sizes (which
themselves have to be estimated by mechanistic modeling) to be reliable predictors
of FGR under commercial operating conditions. Rather, the most informative
factors were found to be a variety of complex combinations, some of which can be
explained in retrospect and some of which appear to carry composite information.
Examples of important indicators which were found and which had not previously

been considered include fuel crack width, fuel plasticity radius, axial profile of
burnup, and the product of tensile-work and estimates of FGR computed from
mechanistic models.
Figure 2 gives plots of log10(FGR) observed vs. predicted for each of the four
models on the entire dataset, model building and verification. For purposes of
making categorical-type assessments, one can convert the predictions into a simple
"above median" and "below median" dichotomy, where the median for the model
building data was FGR_med = 2.4%. Table X gives the results on the verification
data for the predictions converted into this binary form.

[Figure 2 panels: (a) mechanistic model (COMETHE III-J); (b) mechanistic model (FCODE-BETA);
(c) regression model (GCODE); (d) entropy minimax (SPEAR-BETA). Each panel plots observed
vs. predicted values.]
FIGURE 2 Observed vs. predicted values of log10(FGR) for four fission gas release models
(N = 124 for (a) and N = 139 for (b-d)).

All four models have statistically significant skill at binary "above/below median"
prediction. The entropy minimax model is the highest with 88% accuracy. In terms
of explained variance, the differences are quite pronounced. The entropy minimax
model explains 68% of the variance in log(FGR), while the COMETHE III-J
model only explains 4% (due to many large differences between COMETHE
predicted FGR and observed FGR). The high explained variance of entropy
minimax illustrates its use to predict a continuous variable modeled with fuzzy set
theory in the potential function approach.

TABLE X
Predictive performance for fission gas release from rods in
12 different reactors, assessed on ability to predict whether
or not above median (2.4% FGR).

Sample average
  Percentage correct             50%
Mechanistic model (COMETHE III-J)
  Percentage correct             77%
  Statistical significance       >99%
  Variance explained             4%
  Skill                          54%
Mechanistic model (FCODE-BETA)
  Percentage correct             72%
  Statistical significance       >99%
  Variance explained             12%
  Skill                          44%
Regression model (GCODE)
  Percentage correct             81%
  Statistical significance       >99%
  Variance explained             49%
  Skill                          62%
Entropy minimax model (SPEAR-BETA)
  Percentage correct             88%
  Statistical significance       >99%
  Variance explained             68%
  Skill                          76%

3. Failure (rupture) of nuclear fuel rods under commercial operating conditions


A fuel rod is said to "fail" when it ruptures, allowing fission products to pass into
the primary coolant. The rupture may be a hairline crack, a long tear, or a hole
burst open. Although assembly failure rates were as high as 10% for older fuel
types, especially unprepressurized rods, they are much lower for modern fuel. After
a typical cycle of operations, fewer than 0.5% of modern assemblies have a failed
rod. (A cycle is generally in the order of 1.5 years.) However, occasionally the
failure rate is higher.
A number of mechanistic models have been developed to simulate the thermal,
mechanical and chemical history of fuel rods under varying operating conditions.
These include BEHAVE, COMETHE, FCODE, GAPCON, LIFE and others. To
each of these may be appended any of a number of semi-empirical models of
cladding failure. Such combinations are generally referred to as mechanistic failure
models, being built partly out of "basic" mechanistic laws and partly out of
parameterized curve fits to laboratory and other data using variables selected on
the basis of expert judgment.
Mechanistic models, which attempt to track the details of stress-corrosion
cracking, generally perform poorly in predicting failure, despite accurate perfor-
mance in predicting thermomechanical variables such as temperatures and strains,
and despite accurate tuning to experimental data on cladding crack initiation and
propagation. Frequently the introduction of new parameters is apparently justified
by showing that they enable one to represent additional physical processes and to
better fit specific data, only to find that the better fitting was illusory when

verification is attempted on independent data. The difficulty is the increased risk of


error accompanying the greater specificity of models that tune many parameters to
fit data to a high level of detail.
Because of the unreliability of mechanistic models of fuel rod failure, a sequence
of research projects was undertaken to model fuel failure probabilis-
tically.190,191,192 The relationship between entropy minimax and mechanistic
modeling is shown in Figure 3.

[Figure 3 schematic: precharacterization input (dimensions, densities, enrichments, porosities,
grain sizes, etc.) and operational history (power vs. time, pressure vs. time, etc.) feed a
mechanistic model of operations, which computes temperatures, pressures, stresses, strains, etc.
vs. time; its output feeds a mechanistic model of failure, which computes a failure probability.
These supply the independent variables for entropy minimax, while the failure observations
corresponding to the operational history data supply the dependent variable.]

FIGURE 3 Relationship of entropy minimax analysis to mechanistic modeling. The mechanistic


models, along with the raw precharacterization and operating history data, supply independent
variables to entropy minimax. (Building of the mechanistic models typically uses special laboratory
datasets. The operational failure observations used as the DV for entropy minimax analysis must, of
course, not be used in building the mechanistic models.)

Two entropy minimax models were built for failure of zircaloy fuel rods. The
first, SPEAR-ALPHA, used a mechanistic model of fuel performance, FCODE-
ALPHA, and a mechanistic failure model, CCODE-ALPHA, to compute the
independent variables, and a data base consisting of 1,187 assemblies from 4
different reactors. The second, SPEAR-BETA, used improved mechanistic
codes FCODE-BETA and CCODE-BETA to compute the independent variables,
and an expanded data base consisting of 3,402 assemblies from 11 different
reactors. In addition, an entropy minimax model was built for failure of
stainless steel fuel rods using a modified mechanistic code, FCODE-BETA/SS, to
compute the independent variables, and a stainless steel cladding data base.246
Failure models provide input to fuel assessment studies, to reactor loading and
operations management, and to cost implications analyses.


Model building for the entropy minimax patterns for SPEAR-BETA used 1,707
events (each being an assembly-cycle), and model verification used an independent
set of 1,695 events. The splitting was random, subject to an overall failure rate
equilibration constraint. A total of 11 different reactors were represented.
Feature selection for zircaloy clad fuel failure modeling used the same master list
of 199 variables as for the FGR modeling. The correlation coefficients were
generally much lower for failure than for fission gas release. The highest individual
value for Irl was only 0.23. In part, this was due to the greater noise in the system
relative to fuel failure, which is one step further removed from the input conditions
of fuel precharacterization and operating power history than FGR (FGR may
contribute to fuel failure). However, the low values for Irl were also partly due to
the fact that the theoretically maximum possible correlation between a continuous
variable and a discrete variable such as failure status is less than unity.60 For
example, if one distribution is Bernoulli, f₁(k) = p^k (1 - p)^(1-k), k = 0 or 1, and the
other is uniform, f₂(x) = 1, 0 ≤ x ≤ 1, then r_max = sqrt(3p(1 - p)). If, on the other hand,
the second distribution is the unit normal, f₂(x) = exp(-x²/2)/sqrt(2π), then r_max =
exp(-x₀²/2)/sqrt(2πp(1 - p)), where x₀ = erfc⁻¹(1 - p). Since correlation is unaffected
by translation or scaling, these results hold for the correlation of the Bernoulli
distribution to rectangular and normal distributions generally.
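A quick numerical check of the first bound: pairing a Bernoulli(p) variable comonotonically with a uniform variable (the arrangement that attains the maximum correlation) reproduces sqrt(3p(1 - p)); the value of p below is illustrative:

```python
import numpy as np

# Comonotone pairing of a Bernoulli(p) indicator with a uniform variable:
# k = 1 exactly on the upper p-fraction of the uniform values.
p = 0.086                                  # illustrative probability
x = np.sort(np.random.default_rng(1).uniform(size=1_000_000))
k = (x > 1.0 - p).astype(float)
print(np.corrcoef(k, x)[0, 1])             # empirical maximum correlation
print(np.sqrt(3.0 * p * (1.0 - p)))        # sqrt(3p(1-p)) from the text
```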
A total of 76 features were selected for building the entropy minimax model of
zircaloy cladding failure (SPEAR-BETA). These included II precharacterization
variables, the CCODE-ALPHA and CCODE-BETA failure probabilities, and 63
additional during-cycle and end-of-cycle variables computed by FCODE-BETA
for the specific rod. The full list of features is given in Christensen.
For the earlier version, SPEAR-ALPHA, using 58 features, weight normaliz-
ations of 32 and 128 were used for the S- and D-sequences, respectively, the
difference being sufficient to justify using up an extra degree of freedom. The low
failure probability coupled with incompleteness of failure information in the 58
ALPHA-version features may have contributed to the high D-sequence weight.
This was not the case for SPEAR-BETA. In building this model, the weight
normalization was set at w = 40 for both the S- and D-sequence pattern searches.
The 76 BETA-version features appeared to capture more failure information than
the ALPHA-version features. A total of 36 patterns were found in the 1707 events
in the SPEAR-BETA model-building sample, 18 in the S-sequence and 18 in the
D-sequence. The patterns are quite complex, each involving as many as 12 or more
features. The first two patterns in each sequence are given in Ballinger et al. The
full set is given in Christensen (Vol. 2).48 The coefficient of correlation of the
SPEAR-BETA predictions to observed failure status (I = fail, O=nonfail) in the
verification data is 0.42. Failure probabilities associated with individual patterns
range from a low of 0.004 (+0.003, -0.002) to a high of 0.62 (±0.13). (The
amalgamated predictor has a somewhat greater possible range because of the
mechanistic algorithm (CCODE-BETA), although CCODE-BETA was given high
weight only when it produced a very low failure probability.)
Prior to the entropy minimax analyses, the prominent factors affecting fuel
failure were regarded to be high power and ramp rates, release of corrosive fission
gases such as radioactive iodine, and stress due to hard pellet-cladding contact,
especially at pellet-pellet interfaces and near relocated pellet fragments. The
entropy minimax analyses confirmed these factors in varying degrees, and also
revealed the importance of previously unconsidered factors. Especially important

among these were combinations such as power-corrosion (the product of measures


of power and corrosion), strain-corrosion, the integrated stress-strain rate (work
density), and irregularities in the axial profile of power, stress and strain. For
example, the single most informative failure indicator is rapid increase in strain-
corrosion, while the most informative nonfailure indicator is low work density.
(Not only is the conventional "stress-corrosion cracking" theory replaced by
strain-corrosion, but the key variable becomes the rate of increase of strain-
corrosion.) The second-most informative failure indicator is a strain rate, and the
third is a product of power and ramp rate.
Reactors are typically operated in balanced or nearly balanced fuel loading
patterns and power distributions. Assemblies with similar fuel types and burnup
histories are generally placed in symmetrical positions in the core. Depending
upon the fuel management protocol, the symmetry followed may be quadrant or
octant, with various degrees of asymmetry. In the case of quadrant symmetry, for
example, for any given assembly there will be another 3 assemblies with the same
values for the independent variables and hence with the same failure probability
prediction. Thus, instead of making a prediction of probability P for each of these
assemblies individually, one can make a prediction of Q = 1 - (1 - P)⁴ = P(4 - 6P + 4P² - P³)
that there will be at least one failure in the 4-assembly symmetry group.
For categorical comparison purposes, this can be expressed as a binary prediction
of "fail" if Q > 1/2 and "nonfail" if Q < 1/2.
Table XI shows the predictive performance of the various models on the
verification data when evaluated at various symmetry levels. Percentage correct,
statistical significance and skill are given for the mechanistic model, expert
programming and entropy minimax patterns. (The DV is, itself, categorical in this
case. So explained variance would contribute no information in addition to
percentage correct.)
The sample average predictor is simply P_fail = 1 - (1 - P₀)^s, where P₀ = 0.086 is
the overall assembly failure rate on the model building data and s = 1, 4 or 8 is the
assumed symmetry specifying the resolution of the prediction. Converted to a

TABLE XI
Predictive performance for fuel failure in assemblies in 11 different reactors (1695 verification events).

                                      8-Assembly       4-Assembly       1-Assembly
                                      Symmetry Gp      Symmetry Gp      Symmetry Gp

Sample average
  Percentage correct                  51%              70%              91%
Mechanistic model (CCODE-BETA)
  Percentage correct                  40%              61%              91%
  Statistical significance            neg              neg              0%
  Skill                               -22%             -28%             0%
Expert programming (PaSHa)
  Percentage correct                  68%              63%              82%
  Statistical significance            91%              neg              neg
  Skill                               36%              -23%             -100%
Entropy minimax patterns (SPEAR-BETA)
  Percentage correct                  85%              78%              91%
  Statistical significance            >99%             >99%             0%
  Skill                               70%              27%              0%

"fail" vs, "nonfail" prediction, depending upon whether P ra il > 0.5 or P ra il <0.5, the
sample average (for Po = 0.086) always predicts "nonfail" for s = I or 4, and always
predicts "fail" for s=8. (The case s= I is simply the individual assembly predictor.)
The entropy minimax predictor has 85% accuracy at the 8-assembly resolution
(i.e., it predicts whether or not there is a failure among the 8 assemblies with 85%
accuracy). As the resolution of the predictor is sharpened, the accuracy at first
drops off (it is 78% at s = 4) and then rises. The amalgamated failure probability
assigned to a single assembly rarely exceeds 0.5, so that at the single assembly
resolution, s = 1, the categorical conversion almost always produces a "nonfail"
prediction. By comparison, the mechanistic model has negative skill at s = 4 and 8,
and has zero skill at s = 1 (for which its categorical behavior is the same as the
sample average, always predicting nonfailure). The expert programming model has
positive skill at s = 8, and negative skill at s = 1 and 4.
If the fuel assembly conditions are relatively uniform, the core-wide power

distributions relatively smooth, and threshold conditions on power, ramp rates


and other factors not exceeded, then failure predictions are fairly uniform from
assembly-to-assembly. This was the case, for example, with predictions for Maine
Yankee Cycle-4 made prior to shutdown and fuel inspection. 57 Using the entropy
minimax patterns, SPEAR-BETA predicted that 6±3 of the 217 assemblies would
be failed, with individual assembly failure probabilities differing by less than a
factor of 2. When the reactor was shut down 5 months later and the fuel was
inspected (by a process of sipping adjacent coolant and measuring the radio-
activity level) it was found that there were 8 assemblies with the localized failure
typical of stress-corrosion cracking and one assembly with extensive failure,
possibly due to preoperational defects. Under circumstances where there are
several fuel types in the reactor (different enrichments, assembly configurations,
etc.), or where there are significant spatial asymmetries in the power distribution,
or numerous and steep power ramps, failure probability differences of a factor of 10
or greater are obtained, providing a higher degree of geometric resolution to the
failure probability distribution in the core.

4. Longitudinal distortion (bowing) of rods in nuclear fuel assemblies during


commercial reactor operations
It is important that the fuel rods in a reactor remain straight, so that the spacing
between rods is uniform along their length. Rod bowing partially closes the
coolant channels, giving rise to local hot spots in the core's temperature
distribution.
Data from 3 different reactors were used to find entropy minimax patterns in
rod bow.173 The data base included 155 closure measurements from the Surry-1
and Surry-2 reactors (PWRs with 4 types of Westinghouse cladding tubes), and
4,928 closure measurements from the Oconee-2 reactor (a PWR with 4 types of
Babcock & Wilcox cladding tubes and 3 types of Exxon cladding in 4 different
assembly designs).
The data for the Surry reactors covered 32 independent variables. These
included spatial location, cladding tube type, pre-irradiation dimensions and
forces, burnup and neutron flux. The Oconee data, although containing a much
larger sample of inter-rod spacings, covered only 13 variables, very little pre-
irradiation characterization data being available other than the general types of
rods and pellets.

The Surry data were randomly subdivided 77:78 into training and trial portions.
On crossvalidation analyses, the error in using the training portion to predict the
trial portion was found to be a minimum in the vicinity of w=70. The patterns
found indicated that important variables include the pitch of cladding eccentricity,
axial location in the span (which may be related to mid-assembly compressive
forces), tube diameter, rod elongation, and pre-irradiation bow. In general,
however, the dataset was not found to contain adequate information regarding
closure to formulate a very definitive model. As an example, the channel closure
distribution for events in the training and trial sets matching and not matching the
first pattern are given in Table XII. When assessed as a categorical predictor of
whether or not the closure will be at least 30%, the training sample average
predicts "no", which is correct on only 45% of the test cases. By comparison,
predictions based on matching or not matching this pattern are correct in 55% of
the test cases, a slight improvement (statistical significance of 62%).

TABLE XII
Training and trial sample distributions for events matching and events not
matching first channel closure pattern.

                          End-of-life closure
              0-29%    30-39%    40-49%    50-100%    Total

Match
  Training      34       12         0          0        46
  Trial         16       11         3          2        32
Non-match
  Training      14        7         8          2        31
  Trial         19       15         7          5        46
                                             Total     155

5. Radial distortion (swelling) of nuclear fuel rods under simulated LOCA


(loss-of-coolant-accident) conditions
A LOCA is defined as an accident in which all of the primary coolant
instantaneously vanishes. Even though the presumed instantaneous disappearance
is impossible, rapid departure of coolant is possible and the LOCA scenario is
studied with respect to reactor safety as exemplifying a worst-case limit.
Small scale near-LOCA's are simulated in the laboratory for individual rods and
bundles of rods in specially designed transient overpower experiments. During the
rapid temperature excursion following heat path removal, fuel rods swell and may
burst open.
In 1974, data on all then available LOCA simulation experiments, covering
1,334 fuel rods, were assembled by Battelle Northwest Laboratories for entropy
minimax analysis. Five dependent variables were defined, each related to some
aspect of fuel rod swelling. The need for five DVs arose from differences in the
factors measured in different experiments. Sample sizes for individual DVs ranged
from 40 to 567. A total of 88 independent variables were specified by AEC (now
NRC) specialists as potentially relevant to the extent of swelling.


The data were analyzed for entropy minimax patterns associated with the
selected measures of swelling. The purpose of the analysis was to assess the
adequacy of the IVs specified and the data collected with respect to understanding
factors influencing swelling. Eighty IVs covered fuel rod and assembly charac-
teristics, and an additional 8 IVs specified the site at which the experiment
was conducted. Included among the IVs were data on fuel bundle geometry, rod
size and position, fuel preparation and cladding, heating method, atmosphere,
pressurization, temperature, heating and pressurization rates, clad wall thickness
and burnup. Patterns were found for statistical distributions of rod deformation as
a function of fuel- and experiment-dependent parameters. Weight normalizations
used for the different DVs were in the range 60-120. The first two patterns for
each of the DVs are given in Christensen. Important variables included array
geometry, locations of rod in array and rod coating. The significant role in the
patterns of the experiment-dependent factors, such as test site, heating method,

and temperature measurement method, indicated a need for further testing


utilizing a more uniform experimental design and data collection protocol.

B. Underground Storage System Leakage


Many different types of materials are transported, stored and disposed of in
underground tanks and piping. Examples include heating oil at private homes and
businesses, and gasoline in underground tanks at service stations, farms, industrial
plants, municipal facilities and military installations. (Less than half of the
underground gasoline tanks are at service stations.) Other examples include
solvents, cleaning compounds, manufacturing materials, by-products and a wide
variety of chemicals in underground tanks and piping at commercial establish-
ments and industrial sites, and radioactive waste at nuclear disposal sites.
The potential dangers to health and safety posed by risks of leakage from
underground containers have prompted considerable research on underground
leakage. The problem is technically challenging because of the great variety of
factors which can affect the risks of leaking, and the fact that the containment
systems (tanks, piping, unions, valves, etc.) are buried and not conveniently
monitored.85,254
Underground tanks come in a variety of sizes and types. Typical sizes range
from 50-500 gallon waste oil tanks, 300-1,500 gal. heating oil tanks, 2,000-20,000
gal. gasoline tanks, 10,000-50,000 gal. aviation fuel tanks, to 1 million gal. and
larger tanks at industrial and military sites. Tank types include:
• Bare steel
• Exterior coated steel (e.g., asphalt, coal tar epoxy, urethane)
• Interior lined steel (e.g., epoxy based resin)
• Cathodically protected steel (impressed current or sacrificial anode)
• Fiberglass
• Fiberglass coated steel
• Double walled (e.g., steel-steel, fiberglass-fiberglass)
• Reinforced and lined concrete
• Combinations of the above
In containment systems employing steel tanks, over half of the leaks occur in
the piping. The percentage approaches 90% or more in systems employing
fiberglass and highly protected tanks. Just as there are many types of tanks, there

are also many types of piping. Examples include carbon steel, stainless steel,
galvanized steel, plastic, fiberglass-reinforced plastic, and a variety of others.
Equal in importance to the tanks and piping are the excavation and backfill.
They provide the external electro-chemical environment (important to corrosion of
steel and decomposition of certain resins) and the structural support and stability.
(A rising water table could float an empty tank; and large backfill voids could
remove support and result in rupture.) The excavation itself may be lined (e.g.,
synthetic membranes such as urethane coated fabrics, sealants such as bentonite,
low permeability soils such as clay, and concrete vaults).
Causes of leaking include corrosion, chemical deterioration, stress induced
cracking, biological degradation, wear, and improper installation. Modes of failure,
Table XIII, include external corrosion, internal corrosion, loose fittings, rup-
ture/breakage, flex connector failure and others (such as damage prior to or during
installation).
TABLE XIII
Modes of failure of underground storage systems.

                                       Estimated percentage
Mode of leaking               Steel tanks    Fiberglass tanks    Piping

Ext. corrosion/decomp.            75%               5%             55%
Int. corrosion/decomp.            15%               5%             10%
Loose fittings                     1%              20%             10%
Rupture/breakage                   2%              20%             15%
Other (incl. instal. damage)       7%              50%             10%

                                  100%             100%            100%

Factors affecting containment system failure vary with the type of tanks and
piping, and, to some extent, with the type of material stored and the volume and
frequency of usage. Currently, the most common are steel tank systems, for which
the dominant failure mode is external corrosion. Important factors include soil
resistivity, pH, moisture, tank wall thickness, and soil and tank nonuniformities.
Also important are tank age, circuits to dissimilar materials and presence of
nearby electrical potentials (e.g., DC machinery, high voltage lines, etc.). With
regard to fiberglass tanks, as another example, the relative importance of factors is
different, with more emphasis on such items as backfill and ground water level
changes that can affect structural support.
For purposes of analyzing leakage of underground tanks, a total of 73
independent variables were defined. Included were characteristics of the tank (age,
size, type, protection, etc.), tank installation (depth, backfill, tank field layout, etc.),
tank usage (material, thruput, etc.), and environmental factors (soil characteristics,
water table, nearby high voltages, etc.). In one study, 506 tanks were randomly
divided 253:253 into training and test samples. The error in using the training
based model to predict the test results was found to be a minimum at w = 20. In a
second study, 1,340 tanks were randomly divided 670:670 into training and test
samples. In this case, the error was a minimum at w = 60.
In both studies, entropy minimax patterns were found in the training data and
checked against the results in the test data. In the first study, 5 patterns were
found in the training data. Applied to the 253-tank test data, the 2 x 5 contingency
table was found to have χ² = 19.99. For 5 degrees of freedom ((2-1)(5-1) = 4 for the
row and column summation constraints plus 1 for the weight normalization setting),
this result has a statistical significance of 99.8% against chance. Five patterns were
also found in the training data in the second study. The chi-squared for the 670-
tank test data in this case was χ² = 19.19, almost the same as in the first study.
Again the statistical significance is 99.8% against the null hypothesis.
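The quoted significance levels can be checked with any standard chi-squared routine. The
following minimal sketch (using SciPy, which is assumed to be available and is not part of the
original analysis) reproduces the figure for the first study.

    # Illustrative check of the quoted significance level.
    from scipy.stats import chi2

    chi_sq, dof = 19.99, 5           # (2-1)(5-1) = 4 constraints plus 1 for the weight setting
    p_value = chi2.sf(chi_sq, dof)   # upper-tail probability
    print(f"p = {p_value:.4f}, significance = {100 * (1 - p_value):.2f}%")
    # p = 0.0013, significance = 99.87% (reported above, after rounding, as 99.8%)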
Table XIV lists examples of some of the high and low risk indicators found in
the entropy minimax analyses. In some cases, the converse of a low risk indicator
is a high risk indicator, and vice-versa, e.g., soil moisture, soil pH and tank wall
thickness. In other cases this is not necessarily so, e.g., soil resistivity, tank age,
backfill, and corrosion indices. The failure of "converse" reasoning with respect to
these factors is indicative both of the nonlinearity of the DV/IV relationships and
of the intrinsically multidimensional character of the IV factors. The DV depends
upon interactions between IVs and not just on a union (additive or otherwise) of
separate IVs. For example, under typical conditions, low age is a low risk
indicator, but high age is not necessarily a high risk indicator. Also note that the
entropy minimax thresholds for soil pH do not occur at the chemically neutral
value of 7.0. A chemically neutral pH is not risk neutral. The soil must be
somewhat alkaline (pH> 8.3) to enter the low risk regime.

TABLE XIV
Examples of high and low risk indicators for underground tank leakage
(populations of typical underground steel tanks: 200-20,000 gal, and new to
30 yrs old).

Examples of indicators of high risk

High soil moisture at tank midpoint (> 20%)
High soil moisture at tank bottom (> 13%)
High groundwater level (at or above tank bottom)
Low soil pH (< 7.5)
High soil aggressiveness index (> 10 high; > 14 very high)
  (a compound of soil resistivity, moisture and chemical factors)
Backfill with clay and/or rubble
Thin tank wall (< 3.4 mm)

Examples of indicators of low risk

High soil resistivity† (> 13,000 ohm-cm)
Low soil aggressiveness index (< 5)
Low soil conductivity† (< 50 micro-mho/cm)
Low soil moisture (< 5%)
High soil pH (> 8.3)
Low age (low: < 13 yrs; very low: < 6 yrs)
Moderate or thick tank wall (> 4.5 mm)
Low localized corrosion failure index
  (a compound of age, tank and soil factors)
Low tank usage internal corrosion failure index
  (a compound of age, tank and usage factors)

†Note: Soil resistivity and conductivity, as defined here, are not direct inverses of each other since they
are measured at different concentrations (resistivity at saturation, conductivity at 50:50 distilled H2O
mixture).
C. Other Engineering Applications


1. Entropy minimax hazard axes
Hazard analyses study probabilities of failure as a function of time t or of a more
general "hazard axis" x. The entropy minimax criterion has been suggested as one
way of constructing a hazard axis in terms of a number of factors which may
affect failure probabilities. The hazard formalism assumes failure to be an
irreversible process, so that the cumulative probability of being in a failed state
is monotonically nondecreasing in time. There are many ways of complying with
this assumption. A simple example is to construct the hazard axis as a linear
combination, with positive coefficients, of factors which are individually
monotonic.
The hazard axis formalism not only allows one to model irreversible processes
with incomplete information, it also enables one to estimate the time of failure for
the individual samples in a dataset for which failure status is known only at a
single point in time. This reconstruction of individual failure times is accomplished
by an iterative procedure of self-consistent ensemble statistics, based on the
distribution of the sample points along the hazard axis.
Applied to data on failure of nuclear fuel elements, the hazard axis formalism
was found to be useful for developing backup models in case of failure to match a
multivariate pattern due to missing data. However, because the hazard axis is
defined as a fixed combination of factors, it shares with regression techniques the
limitation of being unable to represent widely varying processes in different regions
of feature space, and thus was found to have poorer predictive performance on
real-world data than entropy minimax pattern models. Statistical significance of
results was assessed by comparing to distributions generated by randomly
permuting the DV values among the events, thus preserving the actual structure
of the multidimensional IVs.60
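A minimal sketch of such a permutation test is given below; the names dv, ivs and score are
illustrative and not taken from the original software. The DV values are shuffled among the events,
leaving the joint IV structure intact, and the observed statistic is compared against the resulting
null distribution.

    import numpy as np

    def permutation_significance(dv, ivs, score, n_perm=1000, seed=0):
        """Fraction of DV-permuted datasets scoring at least as well as the real one."""
        rng = np.random.default_rng(seed)
        observed = score(dv, ivs)
        null = np.array([score(rng.permutation(dv), ivs) for _ in range(n_perm)])
        p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
        return observed, p_value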

2. Step sizing in numerical solution of differential equations


Entropy minimax has been suggested as an approach to the problem of step sizing
in large-scale simulation codes. Run times, hence computer costs, for codes which
solve sets of differential equations numerically, are a function of step size (e.g., Δt
in time-dependent modeling). Greater efficiency is attained by using larger steps,
but there is consequent degradation of accuracy. The degradation can be so severe
as to introduce anomalous instabilities. In large-scale complex codes, the accuracy
vs. step size relationship is not fixed. It can vary by orders of magnitude with
variations in numerous parameters and variables. Furthermore, numerical esti-
mation of the relationship generally faces the very same step sizing problem. Thus,
even for a deterministic set of differential equations, the finitude of computer
speeds makes step sizing for complex codes a problem with incomplete
information.
The database for building an entropy minimax step sizing algorithm is a library
of prior runs of the code. The dependent variable is an assessment, at various step
sizes, of the output accuracy (estimated by "benchmark" computations using
sufficiently small steps). The independent variables consist of parameters and
variables available to the code at the start of computation for the step in question.
The improved efficiency derives from the fact that testing pattern matches requires
much less computation than a sequence of cycles through the entire code at small
step sizes.
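The following schematic sketch indicates the intended use; the classifier accuracy_ok, assumed to
have been trained on the library of prior runs, is hypothetical and stands in for the pattern-match
test described above.

    def choose_step(state, candidate_steps, accuracy_ok):
        """Pick the largest candidate step the trained model predicts will be accurate."""
        for dt in sorted(candidate_steps, reverse=True):
            if accuracy_ok(state, dt):      # cheap pattern match vs. re-running the code
                return dt
        return min(candidate_steps)         # conservative fallback if no step qualifies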
IV. MEDICINE/BIOLOGY

A. Medicine
Two general classes of problems in medicine to which entropy minimax has been
applied are diagnosis and prognosis. The prototypical problem in diagnosis is that
of differentiating between the broad category regarded as "normal" and various
more or less specifically defined disease states. The independent variables consist of
signs, symptoms and test results, coupled with information on family background,
sociodemographic factors, diet, work/living activities, and personal and medical
history.
In those cases in which the disease is defined in terms of specific factors and the
status of each factor is known, the problem is identification rather than prediction.
A diagnostic prediction problem arises when there is incomplete information. In
such cases, one or more of the defining factors are unknown or uncertain. For
example, the disease definition may depend upon examinations requiring surgery.
A diagnostic prediction problem also arises when one wishes to estimate the
likelihood of disease at some future time.
Prognosis problems arise with respect to the future course of a disease. Will it
progress to a more severe stage? Will it remit to a more benign state? Will the
patient experience complete or partial recovery? Will there be a relapse after
partial recovery, or a recurrence after complete recovery? What is the probability
of survival and of recovery as a function of time into the future?
Prognosis tends to be more difficult than diagnosis. Prognosis has an explicit
time dimension. It involves transition from one stage to another of the disease,
and, not infrequently, the interaction of multiple diseases. Furthermore, the
analysis of the data involves an essential complication, namely the censoring
problem. Left-censoring arises when a patient is observed whose condition has
changed but it is not known when the change occurred. Right-censoring occurs
when one loses track of a patient. In both cases, there is data incompleteness
which can bias models of time-dependent processes (e.g., survival, recovery, etc.) if
one does not properly account for the censoring.

1. Prognosis in coronary artery disease


Coronary artery disease (CAD), generally due to subintimal deposition of
atheromas (accumulations, initiated by injury, hypertension, hypoxia or other
factors, of fibrous tissue, cholesterol and cellular debris) in the large and medium-
sized arteries serving the heart, may abruptly obstruct essential blood flow. Its
major complications are angina pectoris, myocardial infarction and sudden cardiac
death. It accounts for over 35% of deaths of males aged 35-50 in the U.S.
Follow-up studies, ranging from 5 to 25 years, of CAD patients by several
groups have resulted in the identification of a number of variables statistically
related to survival rates.116 These include age, sex, history of myocardial infarction
and heart failure, ECG abnormalities, exercise stress test results, and cardiac
catheterization variables.
In one of these studies, data on 89 variables describing characteristics of over
two thousand CAD patients were collected and analyzed by McNeer et al. at
Duke University Medical Center using logical tree categorization and multiple
logistic regression. These data were further analyzed by Harris et al.120 using
Cox proportional hazards regression. All of these analyses involved model
building based on the entire dataset, with no validation on independent data.
Using almost the same set of variables as Harris et al., the CAD data (provided by
F. E. Harrell) were analyzed by Reichert and Christensen with entropy
minimax. The entropy minimax model was built on a randomly selected half and
validated on the other half. Subsequently, the model building/verification protocol
was applied by Harrell et al. to these data using stepwise Cox regression, and
incomplete principal components Cox regression.
For the entropy minimax modeling, a total sample of 2,436 patients was
randomly divided into model building and verification portions, stratified on four
of the independent variables (NYHA class of congestive heart failure (CHF),
presence or absence of left main coronary artery occlusion, left ventricular
contraction normality or abnormality, and number of occluded coronary vessels).
All patients in the sample were either alive at the end of the follow-up period or
had died of cardiovascular causes. One thousand two hundred and thirteen
patients were assigned to the model building subsample. Data on the remaining
1,223 patients were withheld for verification after model building.
For analysis purposes, 2 dependent and 62 independent variables were defined.
Because of the length and thoroughness of the follow-up, it was possible to handle
the censoring problem to a reasonable approximation by simply specifying short
enough periods for the dependent variables. The first dependent variable was
defined as survival status two years after catheterization. Of the 1,213 building and
1,223 verification patients, 1,003 and 996, respectively, were tracked at least two
years. For each group, the two-year survival rate was 85%.
The second dependent variable was defined as survival status after a period of
time following catheterization equal to 20% of the patient's expected remaining
lifetime (ERL) based on age alone. For patients 30, 40, 50, 60 and 70 years old, for
example, this is roughly 9, 7, 6, 4 and 2 years, respectively. Except for ages over 71
years, the 20%-ERL criterion is a longer survival period than the fixed 2-year
criterion. Although the 2-year period is more commonly used, the ERL criterion
has the advantage of controlling for the single most dominant general factor
affecting survival: age. The 20% figure was selected by balancing length of survival
against sample size reduction due to follow-up limitations. The numbers tracked at
least 20% of their ERL were 515 building and 524 verification cases. The survival
rates were 59% and 62%, respectively.
Preliminary analyses were conducted on random subsets of the model building
datasets (randomly selected half of 1,003 patients for 2-year survival and of 515 for
20%-ERL survival). Based on these analyses, the weight normalization was set at
w = 60 for 2-year survival, and at w = 30 for 20%-ERL survival. Interestingly, the
data revealed greater instability with respect to the shorter time period. This is
analogous to the greater amount of noise in daily fluctuations than in monthly
means in meteorology. In both cases, there are unknown factors outside the
specified IVs which can affect the DV. In the case of short period DVs, single
unknown factors can affect results significantly. Over longer periods, it is more
likely for several unknown factors to come into play and introduce a certain
amount of mutual compensation, smoothing out results to a degree.
With the normalizations set at these values, the full model building samples
were processed for entropy minimax patterns. S-sequence patterns were defined as
those which minimize the conditional entropy, S(survival status | features). D-
sequence patterns were defined as those which maximize the entropy exchange,
ΔS(survival status, features). Ten feature-conditioned patterns plus one residual
pattern were found in each sequence, for a total of 44 patterns. An example of a
good prognosis pattern (96% chance of 2-year survival) for CAD is the following:
• Two or fewer vessels with significant occlusion, and
• No history of myocardial infarction, and
• No ST-T wave changes, and
• No diagnostic Q wave, and
• No significant abnormality on the left main cineangiogram.
An example of a poor prognosis pattern (27% chance of 20%-ERL survival) is:
• Inferior and anterior asynergy on LV contraction, and
• Ejection fraction less than 46%.
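For illustration only, such a feature-conditioned pattern amounts to a conjunction of feature
conditions; the sketch below encodes the good prognosis pattern above as a predicate over a patient
record, with hypothetical field names that are not the variable names used in the dataset.

    def matches_good_prognosis(patient):
        """Conjunction of the five feature conditions in the good prognosis pattern."""
        return (patient["n_occluded_vessels"] <= 2
                and not patient["history_of_mi"]
                and not patient["st_t_wave_changes"]
                and not patient["diagnostic_q_wave"]
                and not patient["left_main_abnormality"])

    # A matching patient is assigned the pattern's training-based probability
    # (96% two-year survival for the pattern above).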
The full set of 44 pattern definitions and associated survival probabilities and
uncertainties is given in Reichert and Christensen (pp. 472-473).207


Although the majority of the 62 features are used in some way in one or more
of the 44 patterns, there are some features which are particularly important
overall. Identified by low entropy cuts as key indicators of good survival prognosis
are normal left ventricular (LV) contraction, absence of ST-T wave changes,
normal RCA, LCA and LAD summaries, and having fewer than three vessels with
significant occlusion. Key indicators of poor survival prognosis are diffusely
abnormal LV contraction, high (> 22 mm Hg) LV end diastolic pressure, severe
congestive heart failure (CHF Class III or IV), cardiomegaly, asynergy (anterior,
apical or inferior) on LV contraction, significant abnormality on the left main
cineangiogram, ventricular gallop and intraventricular conduction disturbance. All
of these are clinically reasonable, although unanalyzed clinical experience would
not necessarily enable one to separate these indicators from numerous other
indicators of apparently equivalent reasonableness which do not have a statisti-
cally significant relation to survival in the data.
Table XV shows the predictive performance of the S- and D-sequence amalgamated
entropy minimax patterns as predictors for survival in the verification data, which
were withheld from the model building process.

TABLE XV
Predictive performance for survival of 524 patients with
coronary artery disease, assessed on categorical predictions of
whether or not the patient survives at least 20% of ERL
(expected remaining lifetime based on age).

Sample average
Percentage correct 61%
Regression model
Percentage correct 61%
Statistical significance neg
Variance explained 4%
Skill 0%
Entropy minimax patterns
Percentage correct 71%
Statistical significance >99%
Variance explained (8 categories) 19%
Skill 26%
For purposes of the assessment shown in the table, the survival probabilities were
converted to categorical "live vs. die" predictions, depending upon whether
P(survive) ≥ 50% or P(survive) < 50%. The DV for Table XV is 20%-ERL. (Categorical
assessment is not used for the 2-year DV since almost all of the individual pattern
probabilities exceed 0.5, the sample average being 85%, and thus the categorized
pattern predictor would be equivalent to the categorized sample average, i.e., always
predicting "live.")
The entropy minimax 20%-ERL survival predictions are correct for 71% of the
verification sample of 524 patients, compared to 61% for the sample average
predictor. This is statistically significant at the 0.01 level and represents a skill of
26%. By comparison, the regression model (using the ratio of survival time to 20%-
ERL as the DV and predicting "survive" if the ratio is at least 1.0) has the same
performance as the sample average, its low definitiveness arising from shrinkage
due to several large squared deviations. If the categorization threshold for the
regression model is shifted from 1.0 to 0.8, then it is correct in 69% of the cases,
representing a skill of 21%. (The variance explained is still 4%.) This is the
percentage correct for the best entropy minimax threshold predictor ("survive" if
the ejection fraction is at least 0.46).
When the predictive performance of the entropy minimax predictions was
assessed on an individual pattern basis, the patterns in the first half of each
pattern sequence were found to perform better than those in the second half.
Examination of the details of the definitions revealed that this difference is
associated with the presence, in the second half of each sequence, of patterns with
union-type logic involving some combinations of feature conditions that may have been picking up
chance correlations in the training data. Further research is in process involving
the idea of restricting union-type pattern searches to definitions for which each
individual feature condition is separately significant, independent of the infor-
mational significance of the pattern as a whole.

2. Prognosis in non-Hodgkin's lymphoma


Non-Hodgkin's lymphoma (NHL) is a heterogeneous group of cancers of the
reticuloendothelial and lymphatic systems. NHL is distinguished from Hodgkin's
disease and mycosis fungoides by histologic study of neoplastic cell characteristics
in excised lymph node, capsule and adjacent fat tissue. It is the most frequent of
the lymphomas, with an estimated 24,000 new cases in 1983 and 12,000 deaths.
The incidence rises with age, is equally likely among males and females, and
accounts for about 2.8% of all new cancers. The cause is unknown, although there
is evidence for a viral etiology.
In the US, non-Hodgkin's lymphomas have, until very recently, usually been
staged using the Rappaport classification system or a modification. Based primarily
on lymph node architecture, two overall categories are defined, "favorable histologies"
and "unfavorable histologies," with subcategories defined within each of these
depending upon neoplastic cell type. Untreated unfavorable histology NHL is
generally aggressive and rapidly fatal, except that, somewhat paradoxically, a sig-
nificant fraction of these patients enjoy long disease-free periods and even cures.
The natural course of favorable histology is typically long and indolent, with
multiple recurrences; however, it is ultimately fatal to nearly all afflicted patients.
Treatment is chemotherapy for all except localized disease, where radiotherapy
is employed. Overall, the 2-year survival rate for favorable histology disease is
about 75%. Two-year survival for unfavorable histology disease has historically

approximated 50%, but recent improvements in therapy have raised this rate to
about that for favorable histology disease.
Data on 328 NHL patients, collected from 20 different institutions in a clinical
trial conducted by the Southeastern Cancer Study Group, were analyzed for
entropy minimax patterns.209,210 The total sample was randomly divided 218:110
into model building and verification samples, stratified on sex (M, F), age
(<60, ≥60), hemoglobin (<12 g, ≥12 g) and histology (favorable, unfavorable).
The dependent variable was defined as survival time after completion of treatment
(either of two multi-agent chemotherapeutic regimes, COP or BCOP). The
independent variables consisted of 26 features characterizing demographic factors
(age, sex, etc.), health background (obesity, diabetes, etc.), signs and symptoms
(fever, night sweats, weight loss, Karnofsky's index of physical performance, etc.),
tests (hemoglobin level, white blood count, etc.), stage of disease (Rappaport stage,
nodal involvement, marrow involvement, etc.) and treatment (chemotherapy, radio-
therapy, etc.).
Based on a preliminary analysis of random subsamples of the 218 patient model
building sample, the weight normalization was set at w = 50. Pattern searches were
conducted within the entire 218 patient model building sample and also in several
subsamples separately. These subsamples included favorable histology, unfavorable
histology, responders (partial + complete) and complete responders. In all five of
these searches, a good prognosis pattern was found with the following definition:

Good prognosis pattern


• Type "A" symptoms (no night sweats, no fever > 38.4°C, and no weight loss
  ≥ 10% in last 6 months), and
• Karnofsky status ≥ 80%, and
• Normal serum glutamic oxaloacetic transaminase (SGOT ≤ 35 U/L).
In 4 of the 5 searches (i.e., all except the favorable histology search), a poor
prognosis pattern was found with the definition:

Poor prognosis pattern


• Unfavorable histology, and either
• Karnofsky status < 70%, or
• Night sweats.
The remaining patients have intermediate prognosis:

Intermediate prognosis pattern


• Remainder.
Figure 4 shows the Kaplan-Meier (K-M) plots of survival curves for favorable
and unfavorable histology patients in the model building samples. (Use of K-M
plots is a way to account for censorship of the data.148 They are derived from
maximum likelihood estimation of the conditional survival probabilities for
adjacent observation intervals, which is equivalent to the zero interval limit
obtained by assuming uniform distribution of time of death over each interval.148)
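For reference, a minimal sketch of the product-limit (Kaplan-Meier) estimator is given below; it is
a generic implementation with illustrative names, not the code used in the study. Here times holds
follow-up times in months and events flags whether death (1) or right-censoring (0) was observed.

    import numpy as np

    def kaplan_meier(times, events):
        """Return a list of (t, S(t)) points; events: 1 = death observed, 0 = censored."""
        times = np.asarray(times, dtype=float)
        events = np.asarray(events, dtype=int)
        surv, curve = 1.0, []
        for t in np.unique(times[events == 1]):             # S(t) drops only at death times
            at_risk = int((times >= t).sum())                # still under observation at t
            deaths = int(((times == t) & (events == 1)).sum())
            surv *= 1.0 - deaths / at_risk                   # conditional survival factor
            curve.append((t, surv))
        return curve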
Figures 5 and 6 show K-M survival curves for the subsamples matching each of
the patterns. (The number matching each pattern is shown in parentheses.) Note
that the survival differences between good and intermediate, and between
intermediate and poor prognosis are roughly equal to those between favorable and
unfavorable histologies, while the differences between good and poor prognosis are
considerably greater.

[Figure 4 (plot omitted): probability of survival vs. months, NHL-349 model building sample.]
FIGURE 4 Favorable and unfavorable histology survival curves for NHL-349 model building
sample. Median survival of 87 favorable histology patients was 62 months, and of 131 unfavorable
histology patients was 18 months.
The predicted survival curves are also given in Figures 5 and 6. The continuous
dependent variable, survival time, t, was analyzed in a Gaussian potential function
representation. The analysis was conducted at a resolution of Δt = 6 months. Each
predicted curve corresponds to a particular pattern. Each is a smooth fit to points
determined by the means of intervals along the DV and the associated probabilities.
Note the shrinkage effect of the weight normalization.
Each of the 110 patients in the model verification sample was assigned to one of
the 5 predicted survival curves based on the pattern matched. These curves were
then compared to K-M plots of the actual survival curves for each of the 5
subgroups of verification patients, and the predicted and observed curves were
found to be quite close. These results are shown in Figures 7 and 8.
In addition to the verification on the reserved 110 patients from the original
clinical trial, NHL-349, a new sample subsequently became available from a
second clinical trial, NHL-317.98 NHL-317 contained 270 unfavorable histology
patients, of whom 131 were treated with BCOP (the remaining 139 were treated
with CHOP and are thus, assuming treatment to be relevant to prognosis, not
covered by the patterns trained on NHL-349 COP and BCOP patients). Figure 9
shows the predicted vs. observed survival curves for these 131 patients from NHL-
317. Again the comparison is quite close. The entropy minimax patterns were
correct in 78% of the cases, representing a skill of 26% compared to a 62% figure
for the sample average predictor.
The curves in Figures 7-9 show the predictive performance for the groups of
patients matching each pattern. The results may also be assessed on an individual
patient basis by converting the probabilities to categorical 2-year survival predictions,
depending upon whether P ≥ 0.5 or P < 0.5 (at t = 24 months).
[Figure 5 (plot omitted): probability of survival vs. months, observed (Kaplan-Meier) and predicted
curves for the good prognosis (41 observed) and intermediate prognosis (40 observed) groups.]
FIGURE 5 Survival curves for 81 favorable histology patients in model building sample matching
good and intermediate prognosis patterns. (Data needed to determine pattern match were missing on 6
favorable histology patients in this sample.)

[Figure 6 (plot omitted): probability of survival vs. months, observed (Kaplan-Meier) and predicted
curves for the three prognosis groups.]
FIGURE 6 Survival curves for 130 unfavorable histology patients in model building sample matching
each of the three entropy minimax patterns. (Data missing on 1 unfavorable histology patient.)
[Figure 7 (plot omitted): probability of survival vs. months, predicted and observed curves for the
good prognosis (19 observed) and intermediate prognosis (20 observed) groups.]
FIGURE 7 Comparison of predicted and observed survival curves for N = 39 favorable histology
verification patients in NHL-349. (Data missing on 4 favorable histology patients.)

[Figure 8 (plot omitted): probability of survival vs. months, predicted and observed curves for the
three prognosis groups (observed group sizes 14, 22 and 26).]
FIGURE 8 Comparison of predicted and observed survival curves for N = 62 unfavorable histology
verification patients in NHL-349. (Data missing on 5 unfavorable histology patients.)
[Figure 9 (plot omitted): probability of survival vs. months, predicted and observed curves for the
good, intermediate and poor prognosis groups (observed group sizes 46, 41 and 44).]
FIGURE 9 Comparison of predicted and observed survival curves for N = 131 NHL-317 BCOP
patients matching good, intermediate and poor prognosis patterns.

TABLE XVI
Predictive performance for 2-year survival of patients with non-Hodgkin's lymphoma.

                                                      NHL-349
                                           (All 110        (57 matching
                                           patients)       no. 1 or no. 3)

Sample average
  Percentage correct                          54%               53%
Histology (Rappaport: favorable/unfavorable)
  Percentage correct                          65%               70%
  Statistical significance                    99%               99%
  Variance explained (categorical)             2%                8%
  Skill                                       25%               37%
Entropy minimax patterns
  Percentage correct                          69%               77%
  Statistical significance                   >99%              >99%
  Variance explained (categorical)            11%               22%
  Skill                                       31%               52%
Table XVI shows how the entropy minimax patterns compare on this basis to the model
building sample average and a Rappaport histology predictor. The accuracies of
both the Rappaport and the entropy minimax predictors are significant at the 0.01
level. The entropy minimax patterns add about 5 percentage points in predictive
accuracy, and 10 points in skill, to the Rappaport histology for individual patients.
For the 57 verification patients matching an extreme pattern (no. 1 or no. 3), the
performance of both the histology and the entropy minimax predictors improves
significantly. The histology predictor was correct in 70% of the cases and the entropy
minimax predictor was correct in 77%.
More recently, another study was completed comparing the predictive
performance of various modeling methodologies on non-Hodgkin's lymphoma data.
The dependent variable used was whether or not there was complete response
(CR) to treatment, defined as absence of disease after six cycles of chemotherapy.
The methodologies compared were: stepwise variable selection using the SAS
LOGIST procedure, a pre-specified "sickness score," multiple logistic regression243
applied to incomplete principal components, recursive partitioning,
and entropy minimax. The training dataset consisted of 334 patients in NHL-349.
Tested on independent data, 116 patients in NHL-317, the methods with highest
predictive discrimination were entropy minimax and incomplete principal
components logistic regression. Recursive partitioning had the least predictive
discrimination; the sickness score and the SAS LOGIST procedure had intermediate
results.

3. Diagnosis, diseases of the cervical spine


Cervical spondylosis, multiple sclerosis and amyotrophic lateral sclerosis are three
of a large number of diseases of the cervical spine. Cervical spondylosis (CS) is a
component of degenerative arthritis, producing narrowing and lipping of the
intervertebral spaces and spurs that often compress nerve roots. Multiple sclerosis
(MS) is a slowly progressive central nervous system disease characterized by
disseminated patches of demyelination in the brain and spinal cord. Amyotrophic
lateral sclerosis (ALS) is a motor neuron disease characterized by progressive
degeneration of corticospinal tracts. Firm differential diagnosis of CS, MS or ALS
can rarely be made in early stages, and they may remain as unresolved possibilities
for some time together with syringomyelia, herniated disk, spinal cord tumor,
pernicious anemia and other disorders.184,245
Initial analyses of cervical spine disease data by entropy minimax used a sample
of 85 patients diagnosed as having one of the three: CS, MS or ALS.74.206.211
Twenty-five features were extracted from the medical record of each patient by
personnel at four Pittsburgh area hospitals. Predictability of the patterns found
was poor for ALS and only fair for CS and MS. On review of these results, it was
noted that nearly all of the patients in the training set with ALS came from the
files of a single physician whose criteria for diagnosis differed markedly from those
employed by others. A second sample was collected. The list of clinical feature
definitions was revised, the criteria for patient selection were tightened and
procedures were implemented to better ensure uniformity in data collection and
wider sampling. The new list of features contained 31 variables. The new sample,
containing 114 patients, was assembled and analyzed for entropy minimax
patterns.
A randomized 70%:30% split was used to subdivide the data into 76 model
building patients and 38 verification patients. The model building outcome
distribution was 34 ALS, 22 MS and 20 CS. The verification outcome distribution
was 16 ALS, 12 MS and 10 CS. Weight normalizations of w = 3, 6, 20 and 70 were
tested and the value w=6 was selected. Three patterns were found in the model
building sample, corresponding to the highest probability being assigned to each
of the three possible outcomes, CS, MS and ALS. Table XVII gives the definitions
of these patterns and their results on the 38 verification patients (assessed on the
basis of a categorical prediction corresponding to the highest probability). Sixty-
eight of the 76 model building patients matched one of the three patterns and were
classified exactly according to disease (29 ALS, 20 MS, 19 CS). Thus, with w = 6, the
first pattern predicts ALS with probability P = (29 + 6(34/76))/(29 + 6) = 0.91. The
stability of the results was tested by repeating the entire analysis using a second
randomized split. The pattern definitions and results on the second verification set
are given in Table XVIII.
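The same weight-normalized estimate applies to any pattern; the short sketch below merely restates
the calculation quoted above in code (the function is illustrative, not taken from the original
software).

    def pattern_probability(n_outcome, n_matched, base_rate, w):
        """Weight-normalized outcome probability for a matched pattern."""
        return (n_outcome + w * base_rate) / (n_matched + w)

    # The ALS example above: 29 of the 29 matched building patients had ALS,
    # the building-sample base rate was 34/76, and w = 6.
    p_als = pattern_probability(29, 29, 34 / 76, 6)   # approximately 0.91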
TABLE XVII
Predictive results on verification sample of patterns found in model building sample,
using first building/verification split.

                                     Actual disease
                                                          Percentage
Pattern                           CS     MS     ALS       correct

1. ALS                                    0      12          92%
   Diplopia absent, and
   Fasciculations present
2. MS                              0     11       0         100%
   Not ALS pattern, and either
   Nystagmus present, or
   Spinal films normal
3. CS                              8      0                   89%
   Not ALS or MS patterns, and
   No hyper-reflexia, and
   Normal gait
4. No pattern match                               3

Total                             10     12      16

The two pattern sets are very similar, and the feature conditions used to define
the patterns are clinically reasonable. Both sets validate with a high predictive
accuracy. The high accuracy is consistent with the low value of the optimal weight
normalization, w < 10. The first set would appear to be somewhat the better
because it classifies a greater fraction of the patients (89%) and suggests the
possibility of tentatively classifying non-matches as ALS (although a larger sample
size would be necessary to attribute statistical significance to this).
Table XIX gives the overall predictive performance statistics for the two pattern
sets based on a categorical prediction of the highest probability disease. Thus, the
sample average always predicts ALS, which comprises 42% and 45% of the
samples.
TABLE XVIII
Predictive results on verification sample of patterns found in model building sample, using
second building/verification split.

                                     Actual disease
                                                          Percentage
Pattern                           CS     MS     ALS       correct

1. ALS                             0      0      13         100%
   Diplopia absent, and
   Fasciculations present, and
   Tone decreased
2. MS                              0     11       0         100%
   Not ALS pattern, and either
   Nystagmus present, or
   Spinal films normal
3. CS                              6      0                   86%
   Not ALS or MS patterns, and
   No cranial nerve abnormalities, and
   No hyper-reflexia, and
   No respiratory difficulty or discomfort
4. No pattern match                3              3

Total                              9     12      17

TABLE XIX
Predictive performance for differential diagnosis of three diseases of the cervical spine.

First sample Second sample

Sample average
Percentage correct 42% 45%
Entropy minimax patterns
Percentage correct 94% 97%
Statistical significance >99% >99%
Skill 90% 94%

4. Differential diagnosis of heart diseases using ECG waveforms


Electrocardiogram (ECG) interpretation is one of a number of procedures used in
cardiovascular clinical diagnosis, along with patient history, physical signs, chest
X-ray, and specialized laboratory tests. Generally, a diagnosis would not be made
on the basis of ECG alone. The accuracy of diagnosis purely by human ECG
interpretation is low and, in fact, a physician interpreting at a later time an
ECG he or she has previously classified may very well disagree with his or her
own previous diagnosis.
A considerable amount of research has been directed at the task of bringing
objectivity to ECG interpretation in the form of computerized recognition
systems. Closely associated with the recognition problem is the problem of data
compression, as the amount of data accumulated in on-line monitoring is
enormous. One approach is to store the coefficients of Karhunen-Loeve (K-L)
basis functions.66,160 There are numerous problems associated with establishing
the basis functions, filtering the data, and meeting medical and legal criteria for
reconstructability. However, the idea of characterizing the waveforms in terms of
their K-L coefficients prompted a study of the possibility of using these coeffi-
cients as the independent variables in an entropy minimax analysis.
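The sketch below indicates, in generic terms, how such K-L (principal component) features can be
extracted from a set of sampled waveforms. It is not the feature extraction used in the study; the
choice of three coefficients per lead is arbitrary, and the names are illustrative.

    import numpy as np

    def kl_features(waves, n_coeff=3):
        """Rows of `waves` are waveforms sampled on a common time grid."""
        waves = np.asarray(waves, dtype=float)
        mean = waves.mean(axis=0)
        centered = waves - mean
        _, _, basis = np.linalg.svd(centered, full_matrices=False)  # K-L basis vectors
        coeffs = centered @ basis[:n_coeff].T                       # expansion coefficients
        recon = coeffs @ basis[:n_coeff] + mean
        rms_error = np.sqrt(((waves - recon) ** 2).mean(axis=1))    # reconstruction error
        rms_energy = np.sqrt((waves ** 2).mean(axis=1))
        return np.column_stack([coeffs, rms_error, rms_energy])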
ECG data, sampled at 250 Hz, were analyzed for 192 patients, each of whom
had been classified as normal or as one of several types of abnormality. The data
were supplied by H. V. Pipberger, who had conducted early work in automated
electrocardiogram diagnosis.196,197 In an entropy minimax analysis of these
data,129,130 the dependent variable was taken to be a four-class variable: AMI
(anterior myocardial infarct), LMI (lateral myocardial infarct), LVH (left ven-
tricular hypertrophy) and RVH (right ventricular hypertrophy). Applying K-L
expansion techniques to each waveform in the Frank 3-lead system (U, V, W), 19
features were extracted (expansion coefficients, reconstruction errors, rms energy
and time scale factor). A random half of the patients were used as the training
sample, the other half as the test sample.
Four distinct patterns were found, one for each disease class. [The first pattern
(with LMI the dominant class) was: U rms error between 0.13 and 0.35, V rms
energy between 0.90 and 2.69, and 2nd K-L coefficient of V between -0.12 and
0.73. The full set is given in Hirschman.] Table XX shows, for each pattern and
each classification, the probability based on the training sample and the frequency
observed in the test sample. The weight normalization was permitted to vary with
the pattern. Its values were w = 56.0, 30.4, 29.2 and 58.4, for the four patterns
respectively.

TABLE XX
Predictive performance of ECG patterns when applied to the test sample
(probabilities based on training sample and frequency ratios observed in test
sample).

                     Pipberger's classification
                                                      Number test
Pattern        AMI      LMI      LVH      RVH         events

No. 1  Prob.   0.47     0.28     0.22     0.03
       Freq.   0.32     0.32     0.21     0.16        38
No. 2  Prob.   0.18     0.48     0.15     0.19
       Freq.   0.25     0.50     0.25     0.00         8
No. 3  Prob.   0.21     0.07     0.52     0.20
       Freq.   0.14     0.00     0.71     0.14         7
No. 4  Prob.   0.27     0.07     0.21     0.45
       Freq.   0.08     0.21     0.29     0.42        38

Total pattern matches                                 91

Table XXI gives the 4 x 4 contingency table for the patterns. Values expected by
chance are shown in parentheses (e.g., the first entry is 38 x 18/91 = 7.5). Assuming
the general conditions under which Pearson's T is approximately chi-squared
distributed, these results give χ² = 128. With ν = (4-1)(4-1) + 4 = 13 degrees of
freedom (4 were used in the weight setting), any χ² > 35 has a statistical
TABLE XXI
Contingency table for the 91 test cases, comparing predicted classification (based on
highest probability) to observed classification (Pipberger). Figures in parentheses are
the expected values assuming statistical independence of rows and columns. Excesses
of observed counts on the main diagonal over their expected values indicate greater
than random predictive success.

                              Observed
Predicted              AMI      LMI      LVH      RVH      Total

AMI (Pat. no. 1)        12       12        8        6        38
                       (7.5)    (10.0)   (10.6)    (9.6)
LMI (Pat. no. 2)         2        4        2        0         8
                       (1.6)     (2.1)    (2.3)    (2.0)
LVH (Pat. no. 3)         1        0        5        1         7
                       (1.4)     (1.8)    (2.0)    (1.8)
RVH (Pat. no. 4)         3        8       11       16        38
                       (7.5)    (10.0)   (10.6)    (9.6)

Total                   18       24       26       23        91

significance exceeding 99.9%. [Note, however, that a larger sample size is needed
to validate the 4-way result because of the small numbers in the LMI and LVH
categories. The (observed RVH) vs. (predicted RVH) entry alone contributes 41 to
χ², so the AMI vs. RVH differential diagnosis is clearly significant. The principal
source of error in these patterns is LMI/LVH confounding in AMI and RVH
predictions.]
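The parenthesized entries in Table XXI follow directly from the row and column totals; the brief
sketch below reproduces them (an illustration only, not the original computation).

    import numpy as np

    observed = np.array([[12, 12,  8,  6],
                         [ 2,  4,  2,  0],
                         [ 1,  0,  5,  1],
                         [ 3,  8, 11, 16]])
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # expected[0, 0] = 38 * 18 / 91 = 7.5, the first parenthesized value in Table XXI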

B. Biomedicine and Biology


1. Screening drugs for potential antineoplastic activity
One approach used by the National Cancer Institute in seeking new chemo-
therapeutic agents has been a drug screening hierarchy. The process acts as
a massive funnel. At the input end is a large number of chemical compounds. At
the output are the few compounds that pass all the way through to clinical trials.
One of the screening mechanisms used is biological testing. Resource requirements
limit the number of compounds that can be assessed each year. The reduction
from biological testing to clinical trials is about 4,000:1. A pre-biological screen
employing an analytic technique such as statistical pattern recognition to help in
the selection of inputs could enrich the yield of the screening process.
Entropy minimax was applied to two groups of drugs. One contained 36 pre-
clinical drugs. The second contained a larger sample of 215 nucleosides. Chemical
structure, physico-chemical properties, and other features of the drugs were
analyzed for patterns in their ability to retard tumor growth in mice. The factors
under investigation included functional groupings, ring structures, carbocyclic ring
details, substituent occupancies on sugars (furanose and pyranose) and bases
(pyrimidine and purine), molecular weight, melting point, solubility (in different
media), optical rotation, and UV spectra. Because of its small sample size, the pre-
clinical group was analyzed in its entirety without crossvalidation. Crossvalidation
analyses were conducted on the group of nucleosides.
A single nucleoside "event" consists of an experiment with 12 mice, 6 treated
and 6 controls. The dependent variable is the treated-to-control ratio T/C of
a verage lifetimes. Thirty-four independent variables were used (24 structural and
10 biological features). The 215 nucleoside events were randomly divided into a
114 event training sample and a 101 event test sample. A total of seven patterns
were found. The weight normalization was permitted to vary with the pattern, but
remained quite low for all patterns (it was w = 3.4 for the first two patterns and 4,
6 and 10 for other patterns). Table XXII gives, for each pattern, the predicted and
observed distributions in three T/C percentage categories: low (0-74), medium (75-
124) and high (125-300).

TABLE XXII
Performance of nucleoside patterns: Probabilities of T/C categories, based
on training sample, compared to frequency ratios in test sample. (6 of the
101 test sample nucleosides did not match any of the 7 patterns.)

                      T/C (x 100%)
                                                Number test
Pattern        0-74     75-124    125-300       events

No. 1  Prob.   0.11     0.86      0.03
       Freq.   0.27     0.73      0.00          11
No. 2  Prob.   0.02     0.90      0.08
       Freq.   0.00     1.00      0.00           1
No. 3  Prob.   0.07     0.77      0.15
       Freq.   0.18     0.82      0.00          22
No. 4  Prob.   0.01     0.27      0.72
       Freq.   0.00     0.45      0.55          11
No. 5  Prob.   0.07     0.50      0.43
       Freq.   0.09     0.61      0.30          23
No. 6  Prob.   0.40     0.54      0.06
       Freq.   0.50     0.50      0.00          10
No. 7  Prob.   0.16     0.48      0.36
       Freq.   0.29     0.53      0.18          17

Total pattern matches                           95

When the results are assessed on the basis of their ability to predict low/medium
(0-124) or high (125-300) T/C categories (see Table XXIII), the result is χ² = 9.0
using Yates' continuity correction. With ν = (2-1)(2-1) + 2 = 3 degrees of freedom
(2 were used in weight setting), this chi-squared has statistical significance of 97%
against the null hypothesis of chance. Thus, there is statistical evidence that the
patterns identify subsets of compounds with more definitive probabilities of anti-
tumor activity than the sample as a whole. The enrichment over expectation is a
factor of 2.2 in the high category, but only 1.1 in the low/medium category. Since
the time of the study (1975), there have been additions to the data and
enhancements in the pattern search algorithm. Also, there has been further
research on feature specification. A new analysis would be expected to provide
further enrichment.
TABLE XXIII
Contingency table for the 95 test cases, comparing observed to predicted
T/C category. Expected values, assuming row-column independence, are
shown in parentheses. Excesses of observed counts on the main diagonal
over their expected values indicate greater than random predictive success.

                                    Observed
Predicted                     Low-Med.     High      Total

Low-Med. (Pat. no. 1-3, 5-7)     74         10         84
                                (65.7)     (13.2)
High (Pat. no. 4)                 5          6         11
                                (13.3)      (2.7)

Total                            79         16         95
2. Classifying Fisher's Iris data


Fisher's Iris data consist of 150 examples of the plant genus Iris. The dependent
variable is the species (no. 1 = Iris Setosa, no. 2 = Iris Versicolor, no. 3 = Iris
Virginica). The dataset contains 50 examples of each species. The independent
variables are four measurements: sepal length, sepal width, petal length, and petal
width.
A random 75:75 split into training and test samples was performed subject to
the constraint of assigning 25 examples of each species to each sample. This uses
one degree of freedom with respect to each of the three values of the dependent
variable.
Four different methods of multivariate analysis were applied to the training data
and subsequently checked on the test data. Principal components, 3-nearest
neighbors, and linear discriminant analyses were conducted by Wold.253 The
entropy minimax analysis was conducted by Leontiades. Beyond those used in
sample splitting, no additional degrees of freedom were used by any of the
methodologies with respect to the test data. In particular, the weight normaliza-
tion in the entropy minimax modeling was simply set at the minimal value of
w=3, representing the number of outcome possibilities.
All four methods performed quite well on the test data. See Table XXIV.
Entropy minimax had a slight edge over the next best method, principal
components. There are only 4 independent variables and the examples can be seen
to separate into distinct clusters with very little overlapping when plotted into
various 3-space projections of the corresponding 4-space. The primary reason for
conducting the analysis was to demonstrate the use of the minimum entropy
principle to find optimal orientations of rotated axes in multidimensional spaces.
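For readers who wish to repeat the protocol, the sketch below reproduces the stratified 75:75 split
and one of the comparison methods (3-nearest neighbors) with scikit-learn. Because the split is
random it will not reproduce Table XXIV exactly, and it does not implement entropy minimax itself.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=75, stratify=y, random_state=0)   # 25 of each species per half
    acc = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr).score(X_te, y_te)
    print(f"3-NN test accuracy: {acc:.0%}")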
3. Other biomedical applications of entropy minimax
Assessing symptoms of pelvic surgery patients In addition to those described
above, there have been a number of other uses of entropy minimax related to
medicine and biology. One of the early applications in these areas was the
problem of assessing the diagnostic information content of 54 different symptoms
of pelvic surgery patients. Using a sample of 1,430 patients in 12 different disease
classes, the threshold for each variable which maximized ΔS(O, C) was found,
TABLE XXIV
Performance on verification sample in
Iris classification problem (N = 75).

Random
  Percentage correct          33%
Principal components
  Percentage correct          97%
  Statistical significance   >99%
  Skill                       96%
3-Nearest neighbors
  Percentage correct          93%
  Statistical significance   >99%
  Skill                       90%
Linear discriminants
  Percentage correct          92%
  Statistical significance   >99%
  Skill                       88%
Entropy minimax
  Percentage correct          99%
  Statistical significance   >99%
  Skill                       98%

where outcome O is the disease class, and C is the condition of being above or
below the threshold for the specific symptom. The average information content of
each ΔS-maximizing cut was 0.0528 nats. Each cut was subsequently assessed for
clinical acceptability based on general medical background. Half of the cuts found
on the basis of this one sample (and not considering ΔS's for pairs or higher order
IV combinations) required no adjustment for medical acceptability. When the
other half of the cuts were adjusted to agree with medically accepted thresholds,
the average information content decreased only 0.0022 nats (or about 4% of the
original value).
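A minimal sketch of the threshold search is given below (not the original software). For a single
cut, the entropy exchange ΔS(O, C) coincides with the mutual information, in nats, between the
disease class O and the above/below condition C; scikit-learn is used here only for convenience.

    import numpy as np
    from sklearn.metrics import mutual_info_score   # mutual information in nats

    def best_cut(x, disease_class):
        """Threshold on symptom variable x that maximizes ΔS(O, C)."""
        x = np.asarray(x, dtype=float)
        candidates = np.unique(x)[:-1]               # cut points between observed values
        delta_s, threshold = max(
            (mutual_info_score(disease_class, x > t), t) for t in candidates)
        return threshold, delta_s                    # the study averaged about 0.05 nats per cut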

Analyzing out-patient clinic use/overuse In another study, data on patients treated
at an out-patient clinic were analyzed for patterns in overuse. An "overuser" was
defined as a patient appearing more than 12 times in one year.183 This being a
purely "frequency of use" definition, no excessiveness of use is necessarily implied.
However, at least one study, using data on 189 patients and
14 independent variables, supported the stereotype characterization of frequent
users as women living alone, widowed or divorced, of low educational level and
income, and whose care is subsidized.
Entropy minimax analysis of the same data contradicted this stereotyping.
Specifically, no patterns were found among "overusers." The only patterns found
permit one to characterize various subclasses of "normal" users. The unclassified
remainder is approximately 50:50 normal users and overusers. This is a multi-
variate example of a phenomenon frequently seen in entropy minimax analyses of
systems involving nonlinear relationships: outcome 0 being associated with
characteristic C does not necessarily imply that outcome not-O will be associated
with characteristic not-C.
Developing screens for radiology Another area of entropy minimax application has
been in the development of screening algorithms to aid in decision-making regarding
ordering of radiological procedures. In the U.S., for example, radiological examina-
tions contribute annually many billions of dollars to health care costs. The decision of
whether or not to order an X-ray is made on the basis of immediate clinical information
coupled with the medical record, family history and other factors (including medical-
legal precaution). When subgroups of patients who would otherwise have had X-ray
examination are identified for which the risks do not justify expenditure of these
resources, then there is an opportunity for more effective health care resources
allocation and improved patient management. The socioeconomic utility of such
screening algorithms has been a primary motivation of the development of entropy
minimax patterns to recognize low risk patients with respect to skull fracture
radiology,200,221,222 radionuclide brain scanning,104,105 nuclide uptake and thyroid
scan interpretation, and lung scan analysis. In each case, one or more patterns
were found which enabled the identification of subgroups of patients for which the risks
were insufficient to warrant radiological examination. Additionally, the importance of
the weight normalization is illustrated by one case in which it was set without cross
validation at the arbitrarily low value of w = 1 and the pattern errors exceeded those of
logistic regression (although the results are not strictly comparable since the regression
analysis included additional information representing intuitive knowledge).

Analyzing protein structure/function patterns Entropy minimax has also been
applied to the problem of seeking patterns in DNA coding of proteins associated
with their 3-dimensional "secondary" structure. The protein adenyl kinase (the
smallest of the known phosphoryl transferases) consists of a sequence of 194
residues (each being the site of one of 20 amino acids). In a normal biological
environment (specified by the medium, temperature, etc.), proteins wrap themselves
into 3-dimensional configurations which facilitate their ability to perform their
biological functions. Presumably, stable or meta-stable configurations could be
determined by a comprehensive quantum-mechanical analysis of the energy states
of the molecule as a function of the relative orientation and spacings of the
residues. However, the magnitude of the energy minimization problem has made
human heuristics and computerized pattern recognition approaches more feasible.
Important components of a protein's 3-D configuration are α-helices, β-pleated
sheets and various types of bends. Schemes have been devised by a number of
researchers to predict, from the code sequence, which residues will (under normal
environmental conditions) be in α-helix formations, which on β-pleated sheets, etc.
A number of these prediction methods, using techniques ranging from molecular
physics to Markovian statistics to pragmatic heuristics, were applied to adenyl
kinase (the code for which was unknown at the time the schemes were developed)
and the results were assessed on a comparative basis.223 Subsequently, an entropy
minimax model was developed and applied to this molecule.
The training data for the entropy minimax modeling consisted of 648 residues
taken from several protein sequences, excluding adenyl kinase, selected to repre-
sent a diverse population with regard to function and origin. For each residue, 5
independent variables were used, specifying all amino acids no more than 2
residues distant. Thus, the pattern search was limited to locally interactive
information.
Table XXV gives the results of the patterns found when applied to adenyl
kinase. In adenyl kinase, 105 residues are in α-helices. The entropy minimax
classification is 4th ranking in terms of high percentage of the actual α-helix
residues identified correctly and 1st ranking in terms of low percentage of errors in
α-helix classification predictions. The two predictors which perform best on these
data are those of Finkelstein and Ptitsyn and entropy minimax. The entropy
minimax patterns leave more α-helix residues unclassified, while the Finkelstein
and Ptitsyn algorithm makes more erroneous α-classifications.

TABLE XXV
Results of applying α-helix classification methods to adenyl kinase.

                            Percentage of actual α's    Percentage of erroneous
Method                      correctly classified        α-classifications

Entropy minimax                      63%                       9.6%
Barry & Friedman                     53                       15.2
Chou & Fasman                        67                       16.7
Finkelstein & Ptitsyn                75                       13.2
Levitt & Robson                      40                       16.0
Lim                                  78                       26.1
Nagano                               58                       15.3
Burgess & Scheraga-1                 39                       45.3
Burgess & Scheraga-2                 44                       29.2
Burgess & Scheraga-3                 20                       12.5

Molecular evolution  Of possible theoretical interest has been the suggestion that entropy minimax may play a role in understanding certain phenomena in molecular evolution. The information content of the code sequences for protein types common to different species has been studied by Gatlin,99,100 Smith,229 Reichert, and Reichert and Wong.213 One such protein is cytochrome c, which is present in the cells of all eukaryotic organisms. Studies quantifying the information content of the sequence of amino acids comprising this protein have produced a hierarchy extending over 39 species from Baker's yeast to Homo sapiens.214
Detailed analyses of the occurrence frequencies of residues and residue combinations along molecular chains reveal two seemingly inconsistent effects. First, at the single residue level, the individual nucleotides are more nearly equiprobable for species higher in the evolutionary hierarchy than for those lower in the hierarchy. Both Smith229 and Gatlin have pointed out that this means increased potential message variety, hence a potentially higher entropy, for the more highly developed species. The deviations from equiprobability for the higher species are consistent with entropy maximization subject to the constraints of the genetic code.101,154,229 Second, at higher association levels (pairs, triplets, etc.), the nucleotide sequences for higher lifeforms are more organized in the sense of having lower entropy per residue.100,102,213 It thus appears that evolution is playing the entropy minimax game. The global entropy (amino acid combinations) is being minimized while the local entropy is being maximized. As a molecule evolves, it tends to equilibrate the proportions of the different amino acids subject to the constraints of the genetic code, i.e., to maximize the unconditional residue-specific (local) entropy subject to genetic constraints. This gives it a molecular pool with greater adaptability to functional needs and environmental constraints. Simultaneously, it tends to actually use smaller fractions of the potential message variety of multi-residue segments, the sequences of which govern its biological functions. In minimizing this function-conditional global entropy, it becomes more highly tuned to biological requirements.

V. APPLICATIONS FOR ENTROPY MINIMIZATION AND MAXIMIZATION SEPARATELY

A. Entropy Minimization
The entropy minimization aspect of entropy minimax is the variation of the partition (for discrete analyses) or potential functions (for continuous analyses) of the space of independent variables (feature space). Specifically, a set C is sought which minimizes the conditional entropy

S(O|C) = −Σ_i P(C_i) Σ_k p(O_k|C_i) log p(O_k|C_i),

where C is the set of partition cell boundaries in the discrete case and the set of potential function parameters in the continuous case (in the continuous case integration replaces summation). An alternative formulation of the entropy minimization principle is maximization of the entropy exchange

ΔS(O, C) = Σ_i P(C_i) Σ_k p(O_k|C_i) log [p(O_k|C_i)/P(O_k)].

Separate applications of entropy minimization are often not true examples of the minimum entropy principle used in entropy minimax for any of three reasons:
i) They may fail to use maximum entropy expectation estimators of the probabilities p(O_k|C_i), P(C_i) and P(O_k). Often what are used are simply frequency ratio estimators, i.e., the "straight rule," which is equivalent to setting the weight normalization under constant background information constraints equal to zero, w = 0.
ii) They may seek to maximize some combination other than

Σ_i P(C_i) Σ_k p(O_k|C_i) log [p(O_k|C_i)/P(O_k)]

or the equivalent.
iii) They may seek to maximize ΔS(O, C) by varying {O_k} rather than by varying {C_i}. In entropy minimax, the {O_k} partition is defined by the specification of the pattern discovery problem to be solved. It is the partition on which the utilities (or disutilities) are defined in decision theory. The {C_i} partition, on the other hand, defines how events are grouped in terms of the IVs, and this is what must be determined in pattern discovery by analysis of the data in light of background information.
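For reference (this identity is not written out in this form in the text, but it follows directly from the definitions above, using Σ_i P(C_i) p(O_k|C_i) = P(O_k)), the entropy exchange is the difference between the unconditional and the conditional outcome entropies:

ΔS(O, C) = S(O) − S(O|C),   where   S(O) = −Σ_k P(O_k) log P(O_k).

Since S(O) does not depend on the {C_i} partition, minimizing the conditional entropy S(O|C) over {C_i} and maximizing the exchange ΔS(O, C) select the same partition, which is why the two formulations above are equivalent.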
1. Feature selection
The most widespread application of entropy minimization is in feature selection. Functional forms related to the mutual entropy exchange ΔS(O, C) have been used as a measure of the "goodness" of a feature or characteristic by Lewis172 (his "G") and Sebestyen and Edie224 (their "I"). Related forms have also been used to measure "information correlation" of one variable to another by Jeffreys141,142,143 (his "support"), Shannon225 (his "mutual information"), Good110 (his "weight of evidence"), Kullback and Leibler (their "directed divergence"), and others.17,39,111,157,171,174,195,228,242,248,255
Consider, for example, the two-sided entropy S(X) and the one-sided entropies S_L(X) and S_R(X) discussed in the previous paper (I).56 The quantities S, S_L and S_R for the minimum entropy cut X can be used, along with the correlation coefficient r, Pearson's T, and other variables, as measures of association between the DV and individual IVs, pairs of IVs, etc. Using such measures, one may formulate criteria for efficiently pruning lists of as many as hundreds of thousands of candidate variables to manageable dimensions.
An interesting fact observed in meteorological feature selection, for example, is that the one-sided entropies S_L and S_R often tend to be more reliable measures than full-range measures such as S and r. (The reliability of a measure may be estimated by determining the extent to which it enriches the high-association end of the distribution of features in comparison to the distribution on randomly scrambled data.) Some variables, for example, are reasonably consistent indicators at one or the other of their extremes, high or low, but poor indicators in the midrange and at the other extreme.
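As a rough sketch of this kind of screening (the function names, the binary DV and the simple plug-in frequency estimates below are our own assumptions; the paper itself calls for maximum entropy expectation estimates of the probabilities), one might score a candidate cut on a single IV as follows:

import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (0 log 0 taken as 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def cut_entropies(x, y, threshold):
    """Two-sided and one-sided entropies of the DV y for a cut on the IV x.

    S  : sample-weighted conditional entropy of y given the side of the cut
    SL : entropy of y among events with x <= threshold
    SR : entropy of y among events with x >  threshold
    """
    x, y = np.asarray(x), np.asarray(y)
    left, right = y[x <= threshold], y[x > threshold]
    def side_entropy(side):
        if len(side) == 0:
            return 0.0
        _, counts = np.unique(side, return_counts=True)
        return entropy(counts / counts.sum())
    SL, SR = side_entropy(left), side_entropy(right)
    wL = len(left) / len(y)
    S = wL * SL + (1.0 - wL) * SR
    return S, SL, SR

# Example: score one candidate feature/threshold pair.
x = np.array([0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 2.0, 2.2])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])
print(cut_entropies(x, y, threshold=1.0))

Candidate variables (and thresholds) can then be ranked by such scores and the ranking compared against the same scores computed on randomly scrambled data, as described above.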

2. Curve fitting
As discussed in Paper (I),56 minimization of the entropy of the residuals has been suggested as a general criterion for curve fitting. This procedure was applied to geological data by McConnell181 and found to be effective in modeling curves with a number of "displacements" due to geological faults. For a discussion of this and other goodness-of-fit criteria, see Christensen.53 Entropy minimization has been combined with crossvalidation as a general procedure for curve fitting which includes both error assessment and an information theoretic tradeoff for predictive reliability by controlling the complexity of the curve.54
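A minimal sketch of the residual-entropy idea (assumptions of ours: polynomial candidate curves, a histogram plug-in estimate of the residual distribution, synthetic data; the published procedure additionally combines this criterion with crossvalidation to control curve complexity):

import numpy as np

def residual_entropy(residuals, bins=10):
    """Histogram-based entropy of the residuals (plug-in estimate)."""
    counts, _ = np.histogram(residuals, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def fit_by_min_residual_entropy(x, y, max_degree=5):
    """Pick the polynomial degree whose residuals have minimum entropy.
    Note: on its own this favors more complex curves, which is why the
    text pairs it with crossvalidation."""
    best = None
    for deg in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, deg)
        resid = y - np.polyval(coeffs, x)
        h = residual_entropy(resid)
        if best is None or h < best[1]:
            best = (deg, h, coeffs)
    return best

x = np.linspace(0, 10, 200)
y = 2.0 * x + np.where(x > 5, 3.0, 0.0) + np.random.default_rng(0).normal(0, 0.2, x.size)
print(fit_by_min_residual_entropy(x, y)[0:2])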

3. Unsupervised classification
Supervised classification sets up cell boundaries in IV-space to sort events in terms of DV values (the DV being the "supervisor" of the IV partition). Unsupervised classification tries to do this without a DV. Events must thus be sorted into groups by adjusting cell boundaries on some other basis, such as a measure of local density in IV-space. An example is to define the boundaries as a network of lines or surfaces in low density regions of the space, separating "different" regions of high density.
Obviously some rules must be assumed, else the unsupervised classification problem is so open-ended as to admit virtually any partition as a solution. A sizable literature is available on methodologies suggested under a variety of choices for the restrictions. For general reviews, see Cormack,70 Duran and Odell,77 Blashfield and Aldenderfer,18 and Everitt.89 Examples of approaches utilizing concepts from information theory are given by Watanabe, Wallace and Boulton,244 and Ruspini.
Entropy minimization is an information theoretic criterion for unsupervised classification. Formally, this criterion may be expressed in terms of a process of seeking a partition {C_i} which minimizes the entropy

S(C) = −Σ_i P(C_i) log P(C_i),

subject to assumed constraints and to a criterion for estimating the cell-occupancy probabilities P(C_i). For the P(C_i), one might use frequency ratios (with the proviso that 0 log 0 = 0) or one might use maximum entropy expectation probabilities. The heart of the matter, then, is the constraints to use in forming the partition. Obvious possibilities include rectilinear grids and general linear discriminant partitions. However, there is a more interesting alternative. This is to use the trial crossvalidation approach of entropy minimax.
We randomly subdivide the total data into model building and verification samples, and further subdivide the building sample into training and trial subsamples. We select a set of partitions, however large, small, simple or complex we wish. We start with a trial value of the weight normalization, say w = 5. We compute S(C) on the training data for each partition in the set and identify which partition gives the lowest S(C). We then compute S(C) on the trial data for that partition, and assign that partition and that value of S(C) to the normalization w = 5. Then we change w, say to w = 10, and repeat the process.
After a sequence of these calculations, we will have a value of S(C) assigned to each trial value of w. We select the value of w (and the associated partition) for which S(C) is a minimum. This gives us a partition with minimum entropy on the trial data in a manner that is protected from the overfitting trap. (This is the trap we would fall into if we tried to lower the entropy by defining an arbitrarily contorted or contrived partition.)
As a final step, we repeat the entire process for several different random splittings of the data and select a final S(C) which is closest to the average of these minima. The variance of the minimum S(C) for these different splittings gives us a measure of the precision of our choice.
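A compact sketch of this selection loop (assumptions of ours: one-dimensional data, candidate partitions given as lists of cell edges, and the cell probabilities estimated with the weight-normalized form (n_i + w·F_i)/(n + w) using a uniform background F_i = 1/K; none of these specifics are prescribed by the text):

import numpy as np

def cell_probs(data, edges, w):
    """Weight-normalized expectation estimates of cell probabilities,
    P(C_i) = (n_i + w*F_i) / (n + w), with uniform background F_i = 1/K."""
    counts, _ = np.histogram(data, bins=edges)
    K, n = len(counts), counts.sum()
    return (counts + w / K) / (n + w)

def partition_entropy(data, edges, w):
    p = cell_probs(data, edges, w)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def select_partition(train, trial, candidate_edges, w_grid=(2, 5, 10, 20, 50)):
    """For each trial w: pick the candidate partition with lowest training
    entropy, then score that partition on the trial data; finally keep the
    w (and partition) with the lowest trial entropy."""
    results = []
    for w in w_grid:
        best_edges = min(candidate_edges, key=lambda e: partition_entropy(train, e, w))
        results.append((partition_entropy(trial, best_edges, w), w, best_edges))
    return min(results, key=lambda r: r[0])  # (trial entropy, w, edges)

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300)])
rng.shuffle(data)
train, trial = data[:400], data[400:]
candidates = [np.linspace(data.min(), data.max(), k + 1) for k in (2, 3, 5, 8, 13)]
print(select_partition(train, trial, candidates))

Note that, as the next paragraph emphasizes, the candidate set itself must still be constrained, since coarser partitions trivially achieve lower entropy.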
This procedure does not, however, free us from the need to specify some constraints on the set of candidate partitions. This is easily seen by noting that the one-cell partition of the entire IV-space has zero entropy (or, at least, very low entropy, depending on the details of the P(C_i) estimator used). Artificially low S(C) is similarly obtained for partitions with one cell enclosing all the data and any number of other cells in empty regions of IV-space. Thus, "unsupervised" classification must be supervised by some choice of constraints, even though it is not supervised by the values of a specific DV for the data points.

B. Entropy Maximization
The entropy maximization aspect of entropy minimax is estimation of expected values and associated uncertainties of future frequency ratios. These expected value estimates are the probabilities p(O_k|C_i), P(C_i) and P(O_k). P(O_k), for example, is given by

P(O_k) = ∫₀¹ p f_k(p|B, D) dp,

where f_k(p|B, D) is the maximum entropy probability density function (pdf) for the frequency ratio p of occurrence of O_k, given background B and data D. The pdf f_k(p|B) is found, by the Lagrange multiplier method, as a function which maximizes the entropy

S = −∫₀¹ f_k(p|B) log f_k(p|B) dp,

subject to all known constraints, e.g., ∫ f_k(p|B) dp = 1, ∫ (log p) f_k(p|B) dp = constant (in some cases), ∫ p f_k(p|B) dp = constant (in some cases), etc.; the background B specifies these constants. Once f_k(p|B) is determined, f_k(p|D, B) is obtained from Bayes' theorem: f_k(p|D, B) = f_k(p|B) P_k(D|p, B)/P_k(D|B).
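As an illustrative special case (an assumption for concreteness, not a statement of the general procedure): if the background constraints are of the E[ln p], E[ln(1 − p)] type, so that f_k(p|B) has the beta form proportional to p^(a−1)(1 − p)^(b−1) (cf. Table XXVI below), and the data D consist of n_k occurrences of O_k in n events, then the Bayes update above gives

f_k(p|D, B) ∝ p^(n_k + a − 1) (1 − p)^(n − n_k + b − 1),

and hence

P(O_k) = ∫₀¹ p f_k(p|D, B) dp = (n_k + a)/(n + a + b),

which has the weight-normalized form used in the next subsection, with w = a + b and F_k = a/(a + b).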
Entropy maximization may be applied to very diffusely defined systems or to
highly structured systems. An example of the former is its use to make probability
estimates based on general literature data. At the other extreme are physical
systems with precisely defined constraints governing the maximum entropy proba-
bility distributions.

1. Estimating probabilities using data reported in published literature
Entropy maximization is a procedure for determining the numerical values of probabilities. A natural application is the problem of using data in published reports and papers to make probabilistic estimates of future frequency ratios.108
Suppose, for example, it is reported that 9% of non-Hodgkin's lymphoma patients, with diffuse aggressive and unfavorable histologies involving bone marrow, were observed to respond to chemotherapy in a specific study. What response probability should a physician use in making medical decisions regarding a patient with these characteristics? What additional factors can be used to obtain an estimate with improved predictive accuracy?
Situations such as this typically do not involve well-defined physical constraints on specifically structured systems. Rather, they generally involve diffuse amounts of background information about complex systems of unspecified structure. The appropriate constraint in such cases is that there is a certain quantity of background information. As shown in Paper (I),56 the maximum entropy expectation value of the probability under this constraint is given by

P_k = (n_k + wF_k)/(n + w),

where
P_k = probability of outcome class k,
n_k = number of instances of outcome k in the sample D of events matching specific conditions C,
n = sample size (Σ_k n_k),
F_k = frequency ratio of outcome k in background population B (events matching more general conditions), and
w = weight normalization.
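A direct transcription of this estimator (the function name is ours; the numerical values in the call are those of the lymphoma example worked out later in this subsection):

def max_ent_probability(n_k, n, F_k, w):
    """Maximum entropy expectation estimate P_k = (n_k + w*F_k) / (n + w)."""
    return (n_k + w * F_k) / (n + w)

# 2 responders out of n = 22 patients with bone marrow involvement,
# background response ratio F_k = 0.54, weight normalization w = 70.
print(round(100 * max_ent_probability(2, 22, 0.54, 70)))   # ~43 (percent)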
In the example described above, the conditions were C = (non-Hodgkin's lymphoma, diffuse aggressive, unfavorable histology, bone marrow involvement). For patients matching these conditions, the response ratio was n_k/n = 0.09. To compute P_k, we also need:
n = total number of patients matching conditions C (the denominator of the 0.09 ratio),
F_k = response ratio for patients matching more general "background" conditions B, and
w = weight normalization for background B.
Determining the value of n should not be difficult (although there are some publications that give fractions only and omit sample and subsample sizes). In the study upon which the above example is based,91 the subsample size was given: n = 22.
F_k  The problem of determining the background frequency ratio F_k involves both a practical and a theoretical question. The practical question is: for what definitions of B are data available? The theoretical question is: what are appropriate definitions of B, assuming that any desired data can be obtained?
The practical question is generally the easier to answer and may render the theoretical question moot. In the above example, one could take B to be "non-Hodgkin's lymphomas of diffuse aggressive and unfavorable histologies." C, then, specifies the subset with bone marrow involvement. However, data are also available on other enlargements of C. An example is to simply specify "non-Hodgkin's lymphoma." However, if data on enlarged conditions must be obtained from a source different from that for the specific subset of interest, then another consideration enters. This is the problem of determining whether or not there are unstated conditions X which distinguish response rates for the specific subsample from those for the more general population. This would be the case, for example, if the sample was taken at a hospital which served a catchment area with a population differing from the general population of non-Hodgkin's lymphoma patients in factors significant to response (e.g., age, stage, etc.). As a practical rule, it is often useful to let B be the broadest category for which data are reported with the same selection criteria as C.
Theoretically, what one is seeking is a definition of B which enlarges C just
enough for B to have stable probabilities with reasonably low uncertainties. This
may be formulated as follows:
Background specification principle: The background B for any specific set of
conditions C is the minimum enlargement of C for which (i) the sample size is
large enough to determine the outcome probabilities with reasonably low
uncertainties, and (ii) the probabilities in the applications of interest are
reasonably stable over time.
The sample size requirement ensures that the background population N is large enough to specify F_k with acceptably low associated uncertainty (generally proportional to 1/√N). In some cases, it may be necessary to make maximum entropy expectation estimates of F_k itself, using an even larger "background to the background" with its own weight normalization, and to work one's way up a hierarchy of enlargements until one reaches a sample size large enough to dominate over the weight normalization for the next higher level. How far one must go in the hierarchy, before the analysis may be truncated with a frequency estimator approximation, depends upon how finely one wishes to resolve the probability under the specific conditions C. (This is the informational analog of the need to use large amounts of energy to resolve small objects in microscopic physics.)
The stability requirement ensures that the definition of B captures all variable
factors that can affect the outcome distribution. The minimum enlargement
requirement guards against introducing, into the background, subpopulations that
are irrelevant to C.

w  The weight normalization may be determined by trial crossvalidation in the usual way. It is preferable to use two different studies conducted under similar conditions. One study is then taken as the training sample and the other as the trial sample. Frequency counts for subgroups common to both studies are tabulated, and the value of w is set to minimize the error in using the training data to predict the trial outcome ratios.
If only one study is available, then, in principle, one can accomplish the same end by randomized splitting. However, this requires event-specific information on characteristics and outcomes. This is often not available in published reports, especially for large samples.
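A sketch of this w-setting step (the function name, the squared-error criterion and the toy subgroup counts are our own assumptions; the text says only that w is set to minimize the prediction error over common subgroups):

import numpy as np

def set_weight_normalization(train_counts, trial_counts, F_k, w_grid=range(1, 301)):
    """Choose w minimizing the error in predicting trial response ratios
    from training data, over subgroups common to both studies.

    train_counts, trial_counts: lists of (n_k, n) pairs, one per common subgroup.
    F_k: background response ratio from the training study."""
    def error(w):
        err = 0.0
        for (nk_tr, n_tr), (nk_te, n_te) in zip(train_counts, trial_counts):
            predicted = (nk_tr + w * F_k) / (n_tr + w)
            observed = nk_te / n_te
            err += (predicted - observed) ** 2
        return err
    return min(w_grid, key=error)

# Toy subgroup counts (hypothetical, not taken from the studies cited below):
train = [(2, 22), (30, 40), (15, 45)]
trial = [(6, 20), (25, 38), (12, 50)]
print(set_weight_normalization(train, trial, F_k=0.54))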
Trial determinations of the weight normalization were conducted by Reichert and Christensen for four examples in the medical literature:
i) Response to chemotherapy in diffuse aggressive and unfavorable histologies of non-Hodgkin's lymphoma
The training sample was taken as N = 151 patients treated at the National Cancer Institute.91 The trial sample was taken as 135 patients studied in SECSG protocol NHL-349.78 Feature conditions included liver involvement, type "B" symptoms, bone marrow and hemoglobin level. The error was found to be a minimum at w = 70 events. Values of this order of magnitude are typical when the subgroupings do not capture all the variables characterizing dissimilarities between training and trial groups. The example given above, in which the subgroup with bone marrow involvement had a 9% response rate, came from these data. With w = 70, the prediction is P = 100 × (2 + 0.54 × 70)/(22 + 70)% = 43%. The observed response rate in the trial data was 30%. (The overall response rate for the training sample was 54% while that for the trial sample was 42%. So the subgroup rate was higher in the trial sample even though the overall rate was lower.)
ii) Response to chemotherapy in non-Hodgkin's lymphoma (all histologies)
The training sample was taken as N = 97 patients at M. D. Anderson Hospital and Tumor Institute.34 The trial sample was taken as 224 patients studied in SECSG protocol NHL-349.78 Feature conditions included sex, prior therapy and absolute lymphocyte count. The error minimum was found at w = 15 events. The relatively low value of w is indicative of the similarity of the training and trial groups and the relative stability of the outcome distributions.
iii) Response to chemotherapy in acute myelocytic leukemia
The training sample was taken as N = 300 patients at M. D. Anderson. Feature conditions included sex, platelet count, hemoglobin level, infection status, and temperature. Two trial samples were used. One was a second group of 107 patients, also at M. D. Anderson. Here the error minimum was found at w = 10. The other was a group of 94 patients at the Cleveland Clinic Foundation. For this trial sample, the minimum was found at w = 200. The difference in normalizations for the two trial samples illustrates the general tendency for larger normalizations to be required when making probabilistic estimates for groups with greater dissimilarity from the training group.
iv) Five-year survival in coronary artery disease
The training sample was taken as N = 590 patients at the Cleveland Clinic.202 The trial sample was taken as 203 patients at the Cardiopulmonary Laboratory, Dept. of Medicine, Queen's University, Kingston, Ontario.32 Nineteen subgroups of patients were identified using various characteristics of cardiac condition and function. The error minimum was found at w = 20.
Based on these results and on long experience with w-setting analyses in other areas, some rules-of-thumb can be suggested for cases in which one wishes to make rough estimates quickly without conducting a training/trial analysis:
• w ≈ 10-20: The features characterize the probabilities very well, the probabilities are quite stable, and the training sample is representative of the cases for which predictions are desired.
• w ≈ 30-50: The situation falls short on one or more of the above criteria, but not significantly.
• w ≈ 60-90: Unspecified factors may significantly affect probabilities.
• w > 100: The cases being predicted differ in significant ways from those in the training sample.
(Only under ideal circumstances is the error minimized at w_min = K, where K = 2, 3, ... is the number of outcome possibilities. These are circumstances in which all of the relevant factors are specified and the sampling is from a distribution identical to that of the cases being predicted. At the other extreme, the upper limit resolvable by a training sample of size N, with minimum subgroup size m, is on the order of w_max = N − m.)

2. Statistics
A primary application of entropy maximization in statistics has been in providing a priori probability distributions, f(x), for Bayesian analyses. A number of distributions have been derived by assuming different functional forms for the constraints. The maximum entropy distribution, f(x), may be unconditional or conditional (i.e., relative to an assumed prior distribution, g(x)). The unconditional form maximizes the entropy of the distribution f(x):

S(f) = −∫ f(x) ln f(x) dx,

while the conditional form minimizes the Kullback-Leibler mean information for discrimination between f(x) and g(x):

I(f, g) = ∫ f(x) ln [f(x)/g(x)] dx,

where g(x) is the assumed prior distribution. See also Jeffreys,142,143 Good,110 Kullback, Renyi,215 and others.42,43,136,240,255 (For discrete variates, the integration is replaced by summation.) The unconditional form may be regarded as a special case of the conditional form with a uniform assumed prior. (For infinite ranges, an appropriate limiting function can be used.) Table XXVI gives a list of distributions specified by some of the simpler forms of constraints. For derivations, see Kullback and others.42,76,107,136,138,144,175,203,238,256

TABLE XXVI
Distributions with maximum entropy subject to various fixed expected value constraints. For pdf functional forms, see, e.g., Christensen.53

Range of variate          Constraint(s)                     Prior pdf      ME pdf

DISCRETE: (0 ≤ k ≤ n)     none                              uniform        disc. rectangular
DISCRETE: (0 ≤ k ≤ n)     E[k]                              uniform        binomial
DISCRETE: (k ≥ 0)         E[k]                              binomial       binomial
DISCRETE: (k ≥ 0)         E[k]                              Poisson        Poisson
DISCRETE: (k > 0)         E[k]                              uniform        geometric
DISCRETE: (k > 0)         E[k]                              reciprocals    log-series
DISCRETE: (k > 0)         E[k]                              zeta           gen. geometric
DISCRETE: (k > 0)         E[ln k]                           uniform        zeta
CONT: (0 ≤ x ≤ 1)         none                              uniform        uniform
CONT: (0 ≤ x ≤ 1)         E[x]                              uniform        rectangular
CONT: (0 ≤ x ≤ 1)         E[x], E[x²]                       uniform        trunc. normal or rectangular
CONT: (0 ≤ x ≤ 1)         E[ln x]                           uniform        power function
CONT: (0 ≤ x ≤ 1)         E[ln x], E[ln(1−x)]               uniform        beta
CONT: (0 ≤ x < ∞)         E[x]                              uniform        exponential
CONT: (0 ≤ x < ∞)         E[x], E[ln x]                     uniform        gamma
CONT: (0 ≤ x < ∞)         E[xᵃ]                             uniform        half Subbotin
CONT: (0 ≤ x < ∞)         E[xᵃ] = 1, E[ln x] = −γ/a         uniform        Weibull
CONT: (0 ≤ x < ∞)         E[ln x], E[(ln x)²]               uniform        log-normal
CONT: (0 ≤ x < ∞)         E[x], E[x²]                       uniform        if μ₂ ≤ 2μ₁²: trunc. normal,
                                                                           expon., or trunc. U;
                                                                           if μ₂ > 2μ₁²: none
CONT: (0 ≤ x < ∞)         E[xᵃ], E[ln x]                    uniform        gen. gamma
CONT: (−∞ < x < ∞)        none                              uniform        none
CONT: (−∞ < x < ∞)        E[x²]                             uniform        normal (μ = 0)
CONT: (−∞ < x < ∞)        E[x], E[x²]                       uniform        normal
CONT: (−∞ < x < ∞)        E[|x|]                            uniform        Laplace (μ = 0)
CONT: (−∞ < x < ∞)        E[x], E[|x − μ|]                  uniform        Laplace
CONT: (−∞ < x < ∞)        E[ln(1 + x²)]                     uniform        gen. Cauchy
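The common mechanism behind these entries (a standard derivation, sketched here only for reference) is that maximizing S(f), or minimizing I(f, g), subject to normalization and to constraints of the form E[h_j(x)] = c_j yields, by the Lagrange multiplier method, an exponential-family solution

f(x) = g(x) exp[−λ₀ − Σ_j λ_j h_j(x)],

with the multipliers λ_j chosen so that the constraints are satisfied. For example, on (−∞, ∞) with a uniform prior and fixed E[x] and E[x²], the form exp[−λ₀ − λ₁x − λ₂x²] is a normal density, in agreement with the table.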

3. Statistical mechanics and thermodynamics
The relationship between thermostatics/thermodynamics and concepts of probability, statistics and information has a long and evolving history. The period 1822-1854 saw the struggle to understand and clarify the notions of heat, energy and entropy, and to develop the three laws of thermodynamics. The effort to explain the second law in terms of statistical mechanics began with the work of Maxwell, Boltzmann and Gibbs and has continued through work on the ergodic hypothesis and on irreversible thermodynamics to this very day.
Almost from the moment that entropy was tied to probabilities, physicists entertained the notion that it was related to the incompleteness of information about a system. Maxwell's comment about throwing a tumblerful of water into the ocean and trying to take it back out, his entropy-reducing demon, and Gibbs' comment on entropy as mixed-up-ness are examples. Szilard232 made it clear that Maxwell's demon, in order to operate the trap door through which molecules pass, must receive information. Brillouin,26,27 elaborating upon Szilard's analysis in light of the work on information theory by Shannon225 and Wiener, developed the notion of negentropy. Ter Haar114 summarized this viewpoint with the statement that "entropy measures our ignorance or lack of knowledge or lack of (detailed) information." See also Bergmann and Thomson16 and Richardson. It has been suggested that, rather than reflecting merely a subjective lack of information, the second law arises from irreducible quantal dispersions of mixed states.121
Overlapping this work, in which the law of increasing entropy was seen as a (very highly probable) consequence of statistical mechanics, was a line of development which takes maximum entropy as a starting point rather than an end result. This line began with another interpretation of Szilard's paper and proceeded through Lewis, Elsasser,84 Jaynes,135 and Ingarden.134 Further developments may be found in Tribus,235,236 Jaynes,137,139 Katz,150 and Hobson.131 The primary role of information theory in statistical mechanics has been not so much in producing new predictions as in providing an overall framework for tying together existing theories. The shift from a maximum likelihood perspective to a maximum information expectation perspective has more significant implications with respect to the foundations of statistical mechanics and the ease with which results are derived.

4. Spectral analysis
Applications of entropy maximization to spectral analysis (reconstruction of an estimate of a spectral density function based on a finite sample) began with the work of Burg.28,30 The relationship between maximum entropy and maximum likelihood spectra has been studied.29 It has been shown that the maximum entropy method yields a spectrum which maximizes the entropy of a stationary random process consistent with a set of given autocovariance functions. The equivalence of autoregressive (AR) and maximum entropy (ME) spectra has been demonstrated.20,241 This makes the well-developed AR algorithms available to the ME method, despite the fact that the two methods are quite different conceptually.193 Several references are now available reviewing spectral estimation41,122,123,124,217 and the role of entropy maximization.31,37,140

5. Image reconstruction
The maximum entropy image reconstruction formalism is a multidimensional generalization of Burg's maximum entropy spectral reconstruction. For reviews of this development, see Kikuchi and Soffer153 and Frieden. Early versions,95,96,112,166 e.g., MART, were slow and required considerable memory. Subsequent versions, e.g., MENT,185 converge faster and have lower memory requirements (since an array representing the source is not required). In addition, MART produced anomalous streaks which are not present in MENT reconstructions. (Both algorithms are only approximations to a mathematically precise constrained entropy maximization. The MART approximation not only had larger errors, it also had error regularities which showed up as streaks. The errors of MENT are smaller and more randomly dispersed relative to visual interpretation.)
MENT has been tested on a variety of real-world data such as X-ray data on an air bubble in a plastic tube surrounded by an iron pipe. The smooth bubble's density profile is clearly defined, with fluctuations of about ±15% in the reconstruction. This and other tests12,186 have shown that the iterative MENT algorithm performs better than two widely used techniques, Fourier space inversion and convolutional back projection. The MENT reconstruction is more regular and truer to the actual shapes of the objects. Additionally, in MENT reconstruction, zero source values are correctly reproduced, without "ringing" artifacts.

6. Pattern recognition
Pattern discovery is concerned with finding unknown relationships between a dependent variable and a set of independent variables based on a statistical sample of events. Pattern recognition, on the other hand, presumes that the relationships are known and seeks an efficient means of using the independent variables to determine the value of the dependent variable. This may be accomplished by use of a decision tree in which an independent variable with maximum entropy is examined at each node.159,172
An example given by Landa assumes four binary-valued independent variables A, B, C, D and a five-valued dependent variable X. The DV/IV relationship is specified; further, incompatibility relations are specified (certain combinations of IV values, such as ac, are assigned zero probability); and the a priori probabilities are specified. The problem is to design an efficient procedure for determining the value of X. Figure 10 shows a decision tree with maximum entropy nodes for this example.113,159
FIGURE 10  Maximum entropy solution to an illustrative pattern recognition problem (see text). First, independent variable B is examined. If its value is b, then D is examined and the dependent variable's value is x₁ or x₂, depending upon whether D is d or d̄, respectively. If, on the other hand, B is b̄, then C is examined. If it is c, the value is x₃. If c̄, then A is examined and the value is x₄ or x₅ for a and ā, respectively.
Such a maximum entropy tree building procedure has been applied to renal disease recognition.4 Twenty-seven symptoms usually investigated by physicians constitute the independent variables. The values of the dependent variable are 13 different renal diseases. The mean value of the number of symptoms that must be examined to determine the disease is 7.86, i.e., 29% of the total of 27 symptoms.
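A generic sketch of maximum entropy question ordering (this is our own toy illustration, not Landa's example or the renal disease system; the table of cases, variable names and tie-breaking are all assumptions): at each node, the still-unexamined variable whose value distribution over the remaining possible cases has maximum entropy is interrogated next.

import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def max_entropy_question(cases, asked):
    """Pick the unexamined variable whose answers are most evenly split
    (maximum entropy) over the cases still consistent with what is known."""
    variables = [v for v in cases[0] if v != "X" and v not in asked]
    return max(variables, key=lambda v: entropy([c[v] for c in cases]))

def identify(cases):
    """Interrogate variables in maximum-entropy order until X is determined."""
    asked = {}
    while len({c["X"] for c in cases}) > 1:
        v = max_entropy_question(cases, asked)
        answer = cases[0][v]          # stand-in for an actual observation
        asked[v] = answer
        cases = [c for c in cases if c[v] == answer]
    return cases[0]["X"], list(asked)

# Toy table of possible (A, B, C, D) -> X cases (hypothetical, not Landa's).
table = [
    {"A": 0, "B": 0, "C": 0, "D": 0, "X": "x1"},
    {"A": 0, "B": 0, "C": 1, "D": 1, "X": "x2"},
    {"A": 1, "B": 0, "C": 1, "D": 0, "X": "x3"},
    {"A": 1, "B": 1, "C": 0, "D": 1, "X": "x4"},
    {"A": 0, "B": 1, "C": 1, "D": 1, "X": "x5"},
]
print(identify(table))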

7. Other applications of maximum entropy
Because of the extensiveness of the literature on entropy maximization separately, and the availability of reviews,149,169 other applications will only be mentioned briefly here. Included are applications to modeling population density distributions,33 transportation networks,87,88,249,251 search effort allocation,7 reliability and quality control engineering, queueing distributions, game strategy,113 stocks and commodities price distributions,72 brand-switching in marketing,13,127,128 asset depreciation rates,168 general systems reconstructability analysis,115,155 and learning algorithms for Boltzmann machines.1

ACKNOWLEDGMENTS

I wish to thank Drs. T. A. Reichert (Becton Dickinson Immunocytometry Systems), R. F. Eilbert


(Entropy Limited) and R. G. Ballinger (MIT) for helpful comments and suggestions regarding the
biomedical, meteorological and engineering sections, respectively, of this paper. The research abstracted in this paper was supported by the U.S. Dept. of Interior (OWRT), U.S. Dept. of Commerce (NOAA),
U.S. Dept. of Energy, Nuclear Regulatory Commission, National Cancer Institute, National Science
Foundation, Electric Power Research Institute, Science Applications, Inc., Arthur D. Little, Inc.,
Northeast Utilities Service Company, Arizona Salt River Project, and Western Oil and Gas
Association. Contributing to and cooperating with the research, as specified by the references, were
Entropy Limited, Massachusetts Institute of Technology (Nuclear Engineering Dept. and Sloan School
of Management), Stanford University (Dept. of Materials Science), Duke University Medical Center,
Southeastern Cancer Study Group, University of Michigan (Dept. of Nuclear Engineering), Michigan
State University (Dept. of Radiology), Carnegie-Mellon University (BioMedical Engineering), Johns
Hopkins University School of Medicine, University of Pittsburgh School of Medicine, Veterans
Administration Hospitals, University of Minnesota (Dept. of Civil and Mineral Engr.), Scripps Institute
of Oceanography, Cornell University (Civil and Environmental Engr.), University of Arizona (Laboratory of Tree-Ring Research), California Dept. of Water Resources, U.S. Geological Survey, National
Climate Data Center, Pacific Environmental Group, Battelle Northwest Laboratory, Argonne National
Laboratory, Yankee Atomic Electric Co., Duke Power Co., General Public Utilities, Carolina Power
and Light Co., and American Petroleum Institute.

REFERENCES

I. D. H. Ackley, G. E. Hinton and T. J. Sejnowski, "A learning algorithm for Boltzmann machines."
Cognitive Science, 9, 1985, pp. 147-169.
2. M. L. Adams, "Investigation of techniques for SIMMER-II neutronics time-step control." Rept.
LA-UR-84-3995, Los Alamos National Laboratory, Los Alamos, NM, August 3, 1984.
3. O. C. Allais, "The problem of too many measurements in pattern recognition and prediction."
IEEE Intl. Convention Record, Part 7 (Discrimination and Measurement), March 21-25, 1966, pp.
124-130.
4. M. Bad and E. Bad, "L'algorithme le plus rationnel de reconnaissance appliqué dans le diagnostic des maladies rénales." La Santé Publique, 15-e, No. 1, 1972, pp. 109-115.
5. R. G. Ballinger, R. A. Christensen, R. F. Eilbert, S. T. Oldberg and E. T. Rumble, "Fission gas
release and fuel reliability at extended burnup: predictions by the SPEAR-BETA code." Tech.
Rept, EPRI RP971, Enlropy Limited, Lincoln, MA, presented at American Nuclear Society,
Topical Meeting: LWR Extended Burnup-Fuel Performance and Utilization, Williamsburg, VA,
April 4-8, 1982.
6. R. G. Ballinger, R. A. Christensen, R. F. Eilbert, S. T. Oldberg, E. T. Rumble and G. S. Was,
"Clad failure modeling." Zirconium ill the Nuclear industry: Fiflh Conference, cd, by D. G.
Franklin, Am. Society for Testing and Materials, ASTM STP 754,1982, pp. 129-145.
7. W. H. Barker, "Information theory and the optimal detection search," Operations Research. 25,
1977, pp. 304-314.
8. T. P. Barnett, "Statistical prediction of North American air temperatures from Pacific predictors." Monthly Weather Review, 109, 1981, pp. 1021-1041.
9. T. P. Barnett and K. Hasselmann, "Techniques of linear prediction, with application to oceanic
and atmospheric fields in the tropical Pacific." Rev. Geophys. Space Phys., 17, 1979, pp. 949-968.
10. T. P. Barnett and R. W. Preisendorfer, "Multifield analog prediction of short-term climate
fluctuations using a climate state vector." Jour. Atmos. Sci., 35, 1978, pp. 1771-1787.
I I. M. S. Bartlett, "Further aspects of the theory of multiple regression." Proc. Cambridge Phil. Soc.,
34, 1938, pp. 33-40.
12. C. F. Barton, "Computerized axial tomography for neutron radiography of nuclear fuel." Trans. Amer. Nucl. Soc., 27, 1977, pp. 212-213.
13. F. M. Bass, "The theory of stochastic preference and brand switching." Jour. Market Res., 11,
1974, pp. 1-20.
14. C. B. Bell, "Mutual information and maximal correlation as measures of dependence." Ann. Math.
Statist., 33, 1962, pp. 587-595.
15. R. S. Bell and J. W. Loop, "The utility and futility of radiographic skull examination for trauma." New England Jour. of Medicine, 284, 1971, pp. 236-239.
16. P. G. Bergmann and A. C. Thomson, "Generalized statistical mechanics and Onsager relations."
Phys. Rev., 91, 1953, pp. 180-184.
17. N. M. Blachman, "The amount of information that Y gives about X." IEEE Trans. on Inform.
Theory, IT-14, 1968, pp. 27-31.
18. R. K. Blashfield and M. S. Aldenderfer, "The literature on cluster analysis." Multiv. Behav. Res.,
13,1978, pp. 271-295.
19. S. A. Borg and S. Rosenthal, Handbook of Cancer Diagnosis and Staging, A Clinical Atlas. J.
Wiley, New York, 1984.
20. A. van den Bos, "Alternative interpretation of maximum entropy spectral analysis." IEEE Trans.
Infor. Theory, IT-17, 1971, pp. 493-494.
21. C. T. Bosch and B. A. Valde, "Consultants' report to the OPA/PACE special task force on
underground storage tanks." Petroleum Assoc. for Conservation of the Canadian Environment,
Ottawa, Feb. 1978.
22. B. E. Boyle, "Symptom partitioning by information maximization." NIH Grant 5 P01 GM 14940-05, Mass. Inst. of Tech., Cambridge, MA, 1972. Entropy Minimax Sourcebook, 4, Entropy Pub.,
Lincoln, MA, 1981, pp. 201-210.
23. J. Brandman, R. M. Bukowski, R. Greenstreet, J. S. Hewlett and G. C. Hoffman, "Prognostic
factors affecting remission, remission duration and survival in adult acute non lymphocytic
leukemia." Callcer, 44, 1979, pp. 1062-1065.
24. L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees.
Wadsworth IntI. Group, Belmont, CA, 1984.
25. G. W. Brier and R. A. Allen, "Verification of weather forecasts." Compendium of Meteorology, ed.
by T. F. Malone, Amer. Meteorological Society, Boston, MA, 1951, pp. 841-848.

26. L. Brillouin, "Maxwell's demon cannot operate: information and entropy. I." Jour. of Applied
Physics, 22, 1951, pp. 334-343.
27. L. Brillouin, "The negentropy principle of information." Jour. of Applied Physics, 24, 1953, pp.
1152-1163.
28. J. P. Burg, "Maximum entropy spectral analysis." Presented at the 37th Meeting of the Society of
Exploration Geophysicists, Oklahoma City, OK, Oct. 31, 1967.
29. J. P. Burg, "The relationship between maximum entropy spectra and maximum likelihood
spectra." Geophysics, 37, 1972, pp. 375-376.
30. J. P. Burg, "Maximum entropy spectral analysis." Ph.D. Dissertation, Dept. Geophys., Stanford
University, Palo Alto, CA, 1975, 136 pp.
31. J. P. Burg, D. G. Luenberger and D. L. Wenger, "Estimation of structured covariance matrices."
Proc. of the IEEE, 70,1982, pp. 963-974.
32. G. W. Burggraf and J. O. Parker, "Prognosis in coronary artery disease." Circulation, 51, 1975, pp.
146-156.
33. R. Bussiers and F. Snickers, "Derivation of the negative exponential model by an entropy
maximizing method." Environment and Planning, 2, 1970, pp. 295--301.
34. F. Cabanillas, J. S. Burke, T. L. Smith, T. E. Moon, J. J. Butler and V. Rodriguez, "Factors
predicting for response in adults with advanced non-Hodgkin's lymphoma." Arch. Intern. Med.,
138, 1978, pp. 413-418.


35. C. A. Caceres, "The case against electrocardiographic automation." Computer, 6, July 1973, pp.
15-21.
36. C. A. Caceres and L. S. Dreifus, eds., Clinical Electrocardiography and Computers. Academic Press,
New York, 1970.
37. J. A. Cadzow, "Spectral estimation: an overdetermined rational model equation approach." Proc. of the IEEE, 70, 1982, pp. 907-939.
38. L. L. Campbell, "Equivalence of Gauss's principle and minimum discrimination information
estimation of probabilities." Ann. Math. Statist., 41, 1970, pp. 1011-1015.
39. M. Castans and M. Medina, "La correlación logarítmica." An. R. Soc. Esp. de Fís. y Quím., 52 (A),
1956, pp. 117-136.
40. Y. Censor, A. V. Lakshminarayanan and A. Lent, "Relaxational methods for large-scale entropy
optimization problems, with application in image reconstruction." Information Linkage Between
Applied Mathematics and Industry, ed. by P. C. C. Wang, Academic Press, New York, 1979, pp.
539-546.
41. D. G. Childers (ed.), Modern Spectrum Analysis. IEEE Press, New York, 1978.
42. R. A. Christensen, "Induction and the evolution of language." Physics Dept., Univ. of Calif.,
Berkeley, 1963.
43. R. A. Christensen, "A general approach to pattern discovery." Tech. Rept. No. 20, Computer
Center, Mathematics Dept., Univ. of Calif., Berkeley, 1967.
44. R. A. Christensen, "Entropy minimax analysis of simulated LOCA burst data." Report to Atomic
Energy Commission, AT(49-24)-0083, ELTD-74/1, NTIS PB85-210151, Entropy Limited, Belmont,
MA, Dec. 31, 1974, 32pp.
45. R. A. Christensen, Thermal Mechanical Behavior of UO2 Nuclear Fuel, 4 Vols., Entropy Pub.,
Lincoln, MA, 1978.
46. R. A. Christensen, "Nuclear fuel rod failure hazard axes." Fuel Rod Mechanical Performance
Modeling, Task 3: Fuel Rod Modeling and Decision Analysis, FRMPM33-2 and FRMPM34-1,
Dec. 1979. Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 699-708.
47. R. A. Christensen, SPEAR Fuel Reliability Code System: General Description, NTIS EPRINP-
1378. Electric Power Research Inst., Palo Alto, CA, 1980.
48. R. A. Christensen, SPEAR-BETA Fuel Performance Code System, Vol. I: General Description,
EPRI NP-2291, NTIS DE82-903280, 1982, and Vol. 2: Programmer's and User's Manuals, EPRI
NP-2291-CCM, NTIS DE83-901905. Electric Power Research Inst., Palo Alto, CA, 1982.
49. R. A. Christensen, SPEAR-BETA Fuel Performance Code System, COSTF: Cost Implications
Analysis Postprocessor, NTIS EPRINP-2914. Electric Power Research Inst., Palo Alto, CA, 1983.
50. R. A. Christensen, SPEAR-BETA Fuel Performance Code System: Fission Gas Release Module,
NTIS EPRINP-2905. Electric Power Research Inst., Palo Alto, CA, 1983.
51. R. A. Christensen, Multivariate Statistical Modeling, Entropy Pub., Lincoln, MA, 1983,724 pp.
52. R. A. Christensen, "Leak detection and site analysis." Underground Fuel Storage Workshop.
Pioneer Valley Planning Comm., Holyoke College, W. Springfield, MA, June 7, 1984.
53. R. A. Christensen, Data Distributions, Entropy Pub., Lincoln, MA, 1984, 299pp.
54. R. A. Christensen, "Polynomial curve fitting." Appendices D-F of Order and Time. Entropy Pub.,
Lincoln, MA, 1984, pp. 87-112.

55. R. A. Christensen, "Predicting tank leakage." Conf on Managing Leaking SubsurJace Storage Tank
Risks, Groundwater Technology, Factory Mutual Conference Center, Norwood, MA, May 8-9,
1985.
56. R. A. Christensen, "Entropy minimax multivariate statistical modeling-I: Theory." lntl. Jour.
General Systems, II, 1985, pp. 231-276.
57. R. A. Christensen and R. G. Ballinger, "In-service predictions." Joint EPRI/DOE Fuel Perfor-
mance Contractors' Overview Meeting, Atlanta, GA, April 8, 1980. Entropy Minimax Sourcebook,
4, Entropy Pub., Lincoln, MA, 1981, pp. 79-81.
58. R. A. Christensen and E. Duchane, "Element-specific failure time estimation from ensemble
statistics." Fuel Rod Mechanical Performance Modeling, Task 3: Fuel Rod Modeling and
Decision Analysis, FRMPM32-2, July 27, 1979, Entropy Minimax Sourcebook, 4, Entropy Pub.,
Lincoln, MA, 1981, pp. 687-696.
59. R. A. Christensen and R. F. Eilbert, "Temperature profiles in UO2 fuel under direct electrical heating conditions." Jour. of Nuclear Materials, 96, 1981, pp. 285-296.
60. R. A. Christensen and R. F. Eilbert, "Estimating chance correlation likelihood for hazard axis
analysis." Fuel Rod Mechanical Performance Modeling, Task 3: Fuel Rod Modeling and Decision
Analysis, FRMPM33-2 and FRMPM34-1, Nov. 1979, Entropy Minimax Sourcebook, 4, Entropy
Pub., Lincoln, MA, 1981, pp. 711-731.


61. R. A. Christensen and R. F. Eilbert, "Seasonal precipitation forecasting with a 6-7 month lead
time in the Pacific Northwest using an information theoretic model." Monthly Weather Review,
113, 1985, pp. 502-518.
62. R. A. Christensen, R. F. Eilbert, O. H. Lindgren and L. L. Rans, "An exploratory application of
entropy minimax to weather prediction: estimating the likelihood of multi-year droughts in
California." Report to U.S. Dept. oj 1nterior, OWRT 14-34-001-8409, NTIS PB81-182255, 1980,
39 pp.
63. R. A. Christensen, R. F. Eilbert, O. H. Lindgren and L. L. Rans, "Successful hydrologic forecasting
for California using an information theoretic model." Jour. Appl. Meteor., 20, 1981, pp. 706-713.
64. R. A. Christensen, R. F. Eilbert and S. T. Oldberg, "Entropy minimax hazard axes for failure
analysis." Fuel Rod Mechanical Performance Modeling, Task 3: Fuel Rod Modeling and Decision
Analysis, FRMPM32-2, July 20, 1979, Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln,
MA, 1981, pp. 673-684.
65. R. A. Christensen, R. F. Eilbert and T. A. Reichert, "Self-consistent estimates for individual life
expectancy from censored data" (1985 in preparation).
66. R. A. Christensen and A. D. Hirschman, "Automatic phase alignment for the Karhunen-Loeve
expansion." 1EEE Trans. on Biomedical Engr., BME-26, 1979, pp. 94-99. .
67. R. A. Christensen and O. H. Lindgren, "Hydrologic forecasting for California," ELTD-81(I,
presented at the Seventh Conference on Probability and Statistics in the Atmospheric Sciences,
Amer. Meteorological Society, Monterey, Calif., Nov. 2-6, 1981.
68. R. A. Christensen and O. H. Lindgren, "Santa Clara county draft hazardous materials model code,
I: County-wide issues." Presented at Santa Clara County Intergovernmental Council Meeting, San
Jose, CA, Feb. 3, 1983.
69. R. A. Christensen and T. A. Reichert, "A preliminary entropy minimax search for patterns in
structural, physico-chemical and biological features of selected drugs that may be related to
activity in retarding lymphoid leukemia, lymphocytic leukemia and melanocarcinoma in mice."
Report to National Cancer Institute, N01-CM-23711/A10373, ELTD-75/1, Entropy Limited,
Pittsburgh, PA, June 30,1975, 302pp.
70. R. M. Cormack, "A review of classification." Jour. Royal Statist. Soc., Ser. A, 34, 1971, pp. 321-353.
71. D. R. Cox, "Regression models and life tables." Jour. Royal Statist. Soc., Ser. B, 34, 1972, pp. 187-
202.
72. J. M. Cozzolino and M. J. Zahner, "The maximum-entropy distribution of the future price of a
stock." Operations Research, 21,1973, pp. 1200-1211.
73. R. E. Davis, "Techniques for statistical analysis and prediction of geophysical fluid systems."
Geophys. Astrophys. Fluid Dyn., 8, 1977, pp. 245-277.
74. D. R. Deakter, A. J. Krieger and T. A. Reichert, "Pattern recognition in diseases of the cervical
spine." Tech. Rept., Biotechnology Program, Carnegie-Mellon Univ., Pittsburgh, PA, May 1974,
Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 225-237.
75. L. A. Diky and T. D. Koronatova, "On the stability of equations determining the motion of
vortices under perturbations in initial and boundary conditions" (in Russian). Meteorol. i Gidrol,
No.5, 1964, pp. 39-44.

76. D. C. Dowson and A. Wragg, "Maximum-entropy distributions having prescribed first and second
moments." IEEE Trans. Infor. Theory, IT-19, 1973, pp. 689-693.
77. B. S. Duran and P. L. Odell, Cluster Analysis: A Survey, Springer-Verlag, Berlin, 1974.
78. J. R. Durant, R. A. Gams, A. A. Bartolucci and R. F. Dorfman, "BCNU with and without
cyclophosphamide, vincristine and prednisone (COP) and cycle-active therapy in non-Hodgkin's
lymphoma." Cancer Treat. Rep., 61, 1977, pp. 1085-1096. .
79. J. A. Edward and M. M. Fitelson, "Notes on maximum-entropy processing." IEEE Trans. Infor.
Theory, IT-19, 1973, pp. 232-234.
80. R. F. Eilbert, "Long range weather forecasting study for Central Arizona." Report to Salt River
Project, Phoenix, EL RN-202, 1983.
81. R. F. Eilbert, "Mid-seasonal updating of winter precipitation forecasts for Central Arizona,"
Report to Salt River Project, Phoenix, EL RN-211, 1984.
82. R. F. Eilbert, "Quantitative description of entropy minimax forecasts." Report to Salt River
Project, Phoenix, EL RN-220, 1984.
83. R. F. Eilbert and R.A. Christensen, "Performance of the entropy minimax hydrological forecasts
for California, Water Years 1948-1977." Jour. Climate Appl. Meteor., 22, 1983, pp. 1654-1657.
84. W. M. Elsasser, "On quantum measurements and the role of the uncertainty relations in quantum
mechanics." Phys. Rev., 52, 1937, pp. 987-999.
85. R. S. Emmet and J. C. Livingston, "Underground petroleum storage tanks: local regulation of a
groundwater hazard." Conservation Law Foundation of New England, Boston, MA, 1984.
86. S. England, T. A. Reichert and R. A. Christensen, "Entropy minimax classification of the
secondary structure of adenyl kinase," Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln,
MA, 1981, pp. 449-454.
87. S. Erlander, Optimal Interaction and the Gravity Models. Springer-Verlag, New York, 1978.
88. S. P. Evans, "A relationship between the gravity model for trip distribution and the transportation
problem in linear programming." Transpn. Res., 7, 1973, pp. 39-61.
89. B. Everitt, Cluster Analysis. J. Wiley, New York, 2nd ed., 1980.
90. R. A. Fisher, "The use of multiple measurements in taxonomic problems." Ann. Eugenics, 7,
1936, pp. 179-188.
91. R. I. Fisher, S. M. Hubbard, V. T. DeVita, C. W. Berard, R. Wesley, J. Cossman and R. C. Young,
"Factors predicting long-term survival in diffuse mixed, histiocytic, or undifferentiated lymphoma."
Blood, 58, 1981, pp. 45-51.
92. E. Fix and J. L. Hodges, Jr., "Discriminatory analysis, nonparametric discrimination: consistency properties." USAF School of Aviation Medicine, Randolph Field, TX, Project 21-49-004, Rept. 4,
Contract AF41(128)-31, NTIS ATI-110-633, Feb. 1951.
93. D. H. Foley, "Considerations of sample and feature size." IEEE Trans. Infor. Theory, IT-18, 1972,
pp. 618-626.
94. D. Franklin, H. Ocken and S. T. Oldberg, "SPEAR code development." LWR Core Materials
Performance Program: Progress in 1979-1980, NTIS EPRI NP-1770SR, Electric Power Research
Inst., Palo Alto, CA, 1981, pp. 4.4-4.8.
95. B. R. Frieden, "Restoring with maximum likelihood and maximum entropy." Jour. Optical Soc.
Amer., 62, 1972, pp. 511-518.
96. B. R. Frieden, "Estimation-a new role for maximum entropy." 1976 SPSE Conference Proceed-
ings, ed. by R. Shaw, Society of Photographic Scientists and Engineers, Wash., DC, 1977, pp. 261-
265.
97. B. R. Frieden, "Statistical models for the image restoration problem." Comput. Graph. and Image
Proc., 12, 1980, pp. 40-59.
98. R. A. Gams, M. Raney, A. A. Bartolucci and M. Dandy, "Phase III study of BCOP vs. CHOP in
unfavorable categories and malignant lymphoma." Jour. Clinical Oncology (1985 in press).
99. L. L. Gatlin, "The information content of DNA." Jour. Theoret. Biol., 10, 1966, pp. 281-300.
100. L. L. Gatlin, "The information content of DNA II." Jour. Theoret. Biol., 18, 1968, pp. 181-194.
101. L. L. Gatlin, "The entropy maximum of protein." Math. Biosci., 13, 1972, pp. 213-227.
102. L. L. Gatlin, Information Theory and the Living System. Columbia Univ. Press, New York, 1972,
pp.79-96.
103. D. A. Gift, J. W. Gard and W. R. Schonbein, "Thyroid scanning-pursuing the relationships of
signs and symptoms to nuclide uptake and scan interpretation." Report to U.S. Dept. of Energy,
EX-76-5-02-2777.A003, Dept. of Radiology, Michigan State Univ., E. Lansing, MI, Entropy
Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 417-429.
104. D. A. Gift and W. R. Schonbein, "Diagnostic yield analysis of indications for radionuclide brain
scanning." Report to Ll.S. Dept. of Energy, E(II-l)2777, Dept. of Radiology, Michigan State
Univ., E. Lansing MI, Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp.
405-414.

105. D. A. Gift, W. R. Schonbein and E. J. Potchen, "An introduction to entropy minimax pattern
detection and its use in the determination of diagnostic test efficiency." National Cancer Institute,
CA 18871·02 DHEW, Dept. of Radiology, Michigan State Univ., E. Lansing, MI, Sept. 1978,
Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 385-401.
106. D. A. Gift, W. R. Schonbein, E. L. Saenger and E. J. Potchen, "Application of an information-
theoretic method for efficacy assessment." Jour. of Nuclear Medicine, 26,1985, pp. 807-811.
107. D. V. Gokhale, "Maximum entropy characterization of some distributions." Statistical Distributions in Scientific Work, Vol. 3, ed. by G. P. Patil, S. Kotz and J. K. Ord. D. Reidel,
Dordrecht-Holland, 1975, pp. 299-304.
108. A. Goldin and H. B. Wood, Jr., "Preclinical investigation of alkylating agents in cancer
chemotherapy." Annals of the New York Academy of Sciences, 163, 1969, pp. 954-1005.
109. A. Goldin, H. B. Wood, Jr. and R. R. Engle, "Relation of structure of purine and pyrimidine
nucleosides to antitumor activity." Cancer Chemotherapy Reports, 1 (2), 1968, pp. 1-272.
110. I. J. Good, Probability and the Weighing of Evidence, Chas. Griffin & Co., London, 1950, p. 63.
111. I. J. Good, "Maximum entropy for hypothesis formulation, especially for multidimensional
contingency tables." Ann. Math. Statist., 34, 1963, pp. 911-934.
112. R. Gordon, R. Bender and G. T. Herman, "Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography." Jour. Theor. Biol., 29, 1970, pp. 471-
481.
113. S. Guiasu, Information Theory with Applications, McGraw-Hill, New York, 1977, pp. 365-378.
114. D. ter Haar, Elements of Statistical Mechanics. Holt, Rinehart and Winston, New York, 1954, p.
160.
115. A. Hai and G. J. Klir, "An empirical investigation of reconstructability analysis: probabilistic
systems." Int. Jour. Man-Machine Studies, 22, 1985, pp. 163-192.
116. K. E. Hammermeister, T. A. DeRouen and H. T. Dodge, "Variables predictive of survival in
patients with coronary disease. Selection of univariate and multivariate analyses from the clinical,
electrocardiographic, exercise, arteriographic, and quantitative angiographic evaluations." Circulation, 59, 1979, pp. 421-430.
117. F. E. Harrell, "The LOGIST procedure." SUGI Supplemental Library User's Guide, 1983 Edition, ed. by S. P. Joyner, SAS Institute, Cary, NC, 1983, pp. 181-202.
118. F. E. Harrell, K. L. Lee, R. M. Califf, D. B. Pryor and R. A. Rosati, "Regression modelling
strategies for improved prognostic prediction." Stat. in Med., 3,1984, pp. 143-152.
119. F. E. Harrell, K. L. Lee, D. B. Matchar and T. A. Reichert, "Regression models for prognostic
prediction: advantages, problems, and suggested solutions." Cancer Treatment Reports, 69, 1985,
pp. 1071-1077.
120. P. J. Harris, F. E. Harrell, K. L. Lee, V. S. Behar and R. A. Rosati, "Survival in medically treated
coronary artery disease." Circulation, 60, 1979, pp. 1259-1269.
121. G. N. Hatsopoulos and E. P. Gyftopoulos, "A unified quantum theory of mechanics and
thermodynamics." Foundations of Physics, 6, 1976, pp. 15-31, 127-141,439-455,561-570.
122. S. Haykin (ed.), Nonlinear Methods of Spectral Analysis. Springer-Verlag, New York, 2nd ed.,
1983.
123. S. Haykin and J. Cadzow (eds.), Proc. of the First ASSP Workshop in Spectral Estimation, IEEE
Acoustic, Speech, Signal Processing Society, McMaster Univ., Hamilton, Ontario, Canada. August
1981.
124. S. Haykin and S. Kesler, "Prediction-error filtering and maximum-entropy spectral estimation,"
Nonlinear Methods of Spectral Analysis, ed. by S. Haykin, Springer-Verlag, New York, 1979, pp.
9-72.
125. P. Heidke, "Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst." Geogr. Ann. Stockh., 8, 1926, pp. 301-349.
126. J. N. Helfer, "An identification of overusers of out-patient facilities." Masters Thesis, Biotech-
nology Program, Carnegie-Mellon Univ., Pittsburgh, PA, May 1973.
127. J. D. Herniter, "An entropy model of brand purchase behavior." Jour. Market Res., 10, 1973, pp.
361-375.
128. J. D. Herniter, "A comparison of the entropy model and the Hendry model." Jour. Market Res.,
11, 1974, pp. 21-29.
129. A. D. Hirschman, "An application of entropy minimax pattern discovery in a multiple class
electrocardiographic problem." BioMedical Engineering Program, Electrical Engineering Dept.,
Carnegie-Mellon Univ., Pittsburgh, PA, Dec. 18, 1975, Entropy Minimax Sourcebook, 4, Entropy
Pub., Lincoln, MA, 1981, pp. 295-315.
130. A. D. Hirschman, "Methods for efficient compression, reconstruction, and evaluation of digitized
electrocardiograms." Ph.D. Dissertation, BioMedical Engineering Program, Electrical Engineering
Dept., Carnegie-Mellon Univ., Pittsburgh, PA, 1977.
131. A. Hobson, Concepts in Statistical Mechanics. Gordon and Breach, New York, 1971, 172pp.
132. H. Hotelling, "Analysis of a complex of statistical variables into principal components." Jour.
Educ. Psychol., 24, 1933, pp. 417-441.
133. G. F. Hughes, "On the mean accuracy of statistical pattern recognizers." IEEE Trans. Infor.
Theory, IT-14, 1968, pp. 55-63.
134. R. S. Ingarden, "Information theory and variational principles in statistical theories." Bull. Acad.
Polon. Sci., Ser. Sci. Math. Astronom. Phys., 11, 1963, pp. 541-547.
135. E. T. Jaynes, "Information theory and statistical mechanics." Phys. Rev., 106, 1957, pp. 620-630;
108, 1957, pp. 171-190.
136. E. T. Jaynes, "New engineering applications of information theory." Proc. of the First Symposium
on Engineering Applications of Random Function Theory and Probability, ed. by J. L. Bogdanoff
and F. Kozin, J. Wiley, New York, 1963, pp. 163-203.
137. E. T. Jaynes, "Foundations of probability theory and statistical mechanics." Studies in the
Foundations, Methodology and Philosophy of Science, Vol. I: Delaware Seminar in the Foundations
of Physics, ed. by M. Bunge, Springer-Verlag, New York, 1967, pp. 77-101.
138. E. T. Jaynes, "Prior probabilities." IEEE Trans. Systems Science and Cybernetics, SSC-4, No. 3,
1968, pp. 227-241.
139. E. T. Jaynes, "Where do we stand on maximum entropy?" The Maximum Entropy Formalism, ed.
by R. D. Levine and M. Tribus, MIT Press, Cambridge, MA, 1979, pp. 15-118.
140. E. T. Jaynes, "On the rationale of maximum-entropy methods." Proc. of the IEEE, 70, 1982, pp.
939-952.
141. H. Jeffreys, "Further significance tests." Proc. Camb. Phil. Soc., 32, 1936, pp. 416-445.
142. H. Jeffreys, "An invariant form for the prior probability in estimation problems." Proc. Roy. Soc.
London, Ser. A, 186, 1946, pp. 453-461.
143. H. Jeffreys, Theory of Probability. Oxford at the Clarendon Press, London, 2nd ed., 1948, p. 158.
144. A. M. Kagan, Y. V. Linnik and C. R. Rao, Characterization Problems in Mathematical Statistics. J.
Wiley, New York, 1973, pp. 408-410.
145. J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, J. Wiley, New
York, 1980.
146. E. Kalnay-Rivas, A. Bayliss and J. Storch, "The 4th order GISS model of the global atmosphere." Beiträge zur Physik der Atmosphäre, 50, 1977, pp. 299-311.
147. E. Kalnay-Rivas and R. Livezey, "Weather predictability beyond a week: an introductory review,"
Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics, ed. by M. Ghil,
R. Benzi and G. Parisi, North-Holland, Amsterdam, 1985, pp. 311-346.
148. E. L. Kaplan and P. Meier, "Nonparametric estimation from incomplete observations." Jour.
Amer. Statist. Assoc., 53, 1958, pp. 457-481.
149. J. N. Kapur, "Twenty-five years of maximum-entropy principle," Jour. of Mathematical and
Physical Sciences, 17, 1983, pp. 103-156.
150. A. Katz, Principles of Statistical Mechanics: The Information Theory Approach. W. H. Freeman, San Francisco, 1967.
151. C. R. Kennedy, R. A. Christensen and R. F. Eilbert, UO2 Pellet Fragment Relocation: Kinetics and Mechanics, NTIS EPRI NP-1106, Electric Power Research Inst., Palo Alto, CA, 1979.
152. C. R. Kennedy, D. S. Kupperman and B. J. Wrona, "Acoustic emission from thermal-gradient cracks in UO2." Mater. Eval., 34, 1976, pp. 91-96.
153. R. Kikuchi and B. H. Soffer, "Maximum entropy image restoration. I. The entropy expression,"
Jour. Optical Soc. Amer., 67, 1977, pp. 1656-1665.
154. J. L. King, "The role of mutation in evolution." Proc. of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, 5, 1971.
155. G. J. Klir and E. C. Way, "Reconstructability analysis: aims, results, open problems." Systems
Research, 2, 1985, pp. 141-163.
156. S. Kullback, "An application of information theory to multivariate analysis," Ann. Math. Statist.,
23, 1952, pp. 88-102.
157. S. Kullback, Information Theory and Statistics. J. Wiley, New York, 1959, pp. 111, 120, 143. [Republished, with corrections and additions, by Dover, New York, 1968].
158. S. Kullback and R. A. Leibler, "On information and sufficiency." Annals of Math. Statist., 22,
1951, pp. 79-86.
159. L. N. Landa, "Logical-informational algorithm for learning theory." Psychological Journal (in
Russian), 2, 1962, pp. 19-40.
160. A. A. Langer, "Expansion coefficients on an orthonormal basis as features for the QRS complex."
Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, 1974.
161. S. Lee, L. Rayes, E. Rumble, D. Wheeler and A. Woodis, "Comparison of COMETHE III-J and FCODE-BETA fission gas release predictions with measurements." SAO-279-82-PA, EPRI RP971-2 Report, Science Applications, Inc., Palo Alto, CA, January 1982.
162. E. L. Lehmann, Testing Statistical Hypotheses. J. Wiley, New York, 1959, p. 173.
163. C. E. Leith, "The standard error of time-average estimates of climatic means." J. Appl. Meteor.,
12, 1973, pp. 1066-1069.
164. C. E. Leith, "The design of a statistical-dynamical climate model and statistical constraints on the
predictability of climate." Appendix 2.2 of The Physical Basis of Climate and Climate Modelling,
World Meteor. Org., No. 16, 1975, 265pp.
165. C. E. Leith, "Predictability of climate." Nature, 276, 1978, pp. 352-355.
166. A. Lent, "A convergent algorithm for maximum entropy image restoration, with a medical X-ray
application." Image Analysis and Evaluation, 1976 SPSE Conference Proceedings, July 19-23, 1976,
Toronto, ed. by R. Shaw, Society of Photographic Scientists and Engineers, Wash., DC, 1977, pp.
249-257.
167. K. Leontiades, "Computationally practical entropy minimax rotations, applications to the iris data and comparison to other methods." Tech. Rept., BioMedical Engr. Program, Carnegie-Mellon Univ., Pittsburgh, PA, 1976, Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 319-327.
168. B. Lev and H. Theil, "A maximum entropy approach to the choice of asset depreciation." Jour. Accounting Res., 16, 1978, pp. 286-293.
169. R. D. Levine and M. Tribus (eds.), The Maximum Entropy Formalism. The MIT Press, Cambridge,
MA, 1979.
170. G. N. Lewis, "The symmetry of time in physics." Science, 71, 1930, pp. 569-577.
171. P. M. Lewis, "Approximating probability distributions to reduce storage requirements." Information and Control, 2, 1959, pp. 214-225.
172. P. M. Lewis, "The characteristic selection problem in recognition systems." IRE Trans. on Infor. Theory, IT-8, 1962, pp. 171-178.
173. E. Lim, Coolant Channel Closure Modeling Using Pattern Recognition: A Preliminary Report,
EPRI FRMPM21-2, SAI-175-79-PA, Science Applications, Inc., Palo Alto, CA, June 1979.
174. E. H. Linfoot, "An informational measure of correlation." Information and Control, 1, 1957, pp.
85-89.
175. J. H. C. Lisman and M. C. A. van Zuylen, "Note on the generation of most probable frequency distributions." Statistica Neerlandica, 26, 1972, pp. 19-23.
176. R. E. Livezey, T. N. Maisel and A. G. Barnston, "Experiments in seasonal prediction by analogs
using modified versions of the Barnett-Preisendorfer system." Proc. Eighth Climate Diagnostics
Workshop, Downsview, Ontario, Oct. 17-21, 1983, CAC/NOAA-S/T 84-115, NTIS PB84-192418, pp. 350-356.
177. H. van Loon and R. L. Jenne, "Estimates of seasonal mean temperature, using persistence between
seasons." Monthly Weather ReView, 103, 1975,pp. 1121-1128.
178. E. N. Lorenz, "Empirical orthogonal functions and statistical weather prediction." Rep. 1, Statistical Forecasting Proj., MIT, Cambridge, MA, 1956, 49 pp.
179. R. A. Madden, "Estimates of the natural variability of time averaged sea-level pressure." Monthly
Weather Review, 104, 1976, pp. 942-952.
180. R. A. Madden and D. J. Shea, "Estimates of the natural variability of time averaged temperatures over the United States." Monthly Weather Review, 106, 1978, pp. 1695-1703.
181. R. K. McConnell, Jr., "Minimum description analysis of faulted data sets." Canadian Explor. Geophys. Soc.-Am. Geophys. Union, Mining Geophys. Symp., Toronto, May 22-23, 1980.
182. J. F. McNeer, C. F. Starmer, A. G. Bartel, V. S. Behar, Y. Kong, R. H. Peter and R. A. Rosati,
"The nature of treatment selection in coronary artery disease." Circulation, 49, 1974, pp. 606-614.
183. O. von Mering and L. W. Earley, "The diagnosis of problem patients." Human Organization, 25,
1966, pp. 20-23.
184. H. H. Merritt, A Textbook of Neurology, Lea and Febiger, Philadelphia, PA, 6th ed., 1979.
185. G. N. Minerbo, "MENT: A maximum entropy algorithm for reconstructing a source from
projection data." Comput. Graph. and Image Proc., 10, 1979, pp, 48-68.
186. G. N. Minerbo and J. G. Sanderson, "Reconstruction of a source from a few (2 or 3) projections."
Rept. LA-6747-MS, Los Alamos National Laboratory, Los Alamos, NM, 1977.
187. A. S. Monin, Weather Forecasting as a Problem in Physics (1969), tr. by J. Smagorinsky, MIT
Press, Cambridge, MA, 1972, pp. 147-148.
188. D. F. Morrison, Multivariate Statistical Methods, McGraw-Hill, New York, 2nd ed., 1976, p. 103.
189. J. Namias, "Multiple causes of the North American abnormal winter 1976-77." Monthly Weather
Review, 106, 1978, pp. 279-295.
190. S. T. Oldberg, "Probabilistic code development." Planning Support Document for the EPRI Light
Water Reactor Fuel Performance Program, ed. by J. T. A. Roberts, F. E. Gelhaus, H. Ocken, N.
Hoppe, S. T. Oldberg, G. R. Thomas and D. Franklin, NTIS EPRI NP-737SR, Electric Power
Research Inst., Palo Alto, CA, 1978, pp. 2.52-2.57.
191. S. T. Oldberg, "New code development activities." LWR Fuel Performance Program: Progress in
1978, ed. by J. T. A. Roberts, F. E. Gelhaus, H. Ocken, N. Hoppe, S. T. Oldberg, G. R. Thomas
and D. Franklin, EPRI NP-1024SR, Electric Power Research Inst., Palo Alto, CA, 1979, pp. 2.32-
2.39.
192. S. T. Oldberg and R. A. Christensen, "Dealing with uncertainty in fuel rod modeling." Nuclear
Technology, 37, 1978, pp. 40-47.
193. E. Parzen, "Autoregressive spectral estimation, log spectral smoothing and entropy." Proc. of the
First ASSP Workshop in Spectral Estimation, ed. by S. Haykin and J. Cadzow, IEEE Acoustic,
Speech, Signal Processing Soc., McMaster Univ., Hamilton, Ontario, Canada, August 1981, pp.
131-137.
194. E. Patrassi, "Die Wärmeleitfähigkeit von Urandioxid bei sehr hohen Temperaturgradienten" (Thermal conductivity of UO2 at very high temperature gradients). Jour. Nucl. Mater., 22, 1967, pp. 311-319.
195. W. H. Pearson, "Estimation of a correlation measure from an uncertainty measure." Psycho-
metrika, 31, No. 3, 1966, pp. 421-433.
196. H. V. Pipberger, R. J. Arms and F. W. Stallman, "Automatic screening of normal and abnormal
electrocardiograms by means of a digital electronic computer," Proc. of the Soc. for Experimental
Biology and Medicine, 106, 1961, pp. 130-132.
197. H. V. Pipberger, "Computer analysis of electrocardiograms." Clinical Electrocardiography and
Computers, ed. by C. A. Caceres and L. S. Dreifus, Academic Press, New York, 1970, pp. 109-119.
198. T. Poston and I. Stewart, Catastrophe Theory and Its Applications. Pitman, London, 1978.
199. E. J. Potchen, "Study on the use of diagnostic radiology." Current Concepts in Radiology, 2,
Mosby Co., St. Louis, MO, 1975, pp. 18-30.
200. E. J. Potchen and W. R. Schonbein, "A strategy to study the use of radiology as an information
system in patient management." Joint Masters Dissertation. Sloan School of Management, Mass.
Inst. of Tech., Cambridge, MA, June 1973, 164pp.
201. R. W. Preisendorfer, "Model skill and model significance in linear regression hindcasts." SIO Ref.
Series No. 79-12, Scripps Institution of Oceanography, Univ. of Calif., La Jolla, CA, July 1979.
202. W. L. Proudfit, A. V. G. Bruschke and F. M. Sones, Jr., "Natural history of obstructive coronary
artery disease: ten-year study of 601 nonsurgical cases." Progress in Cardiovascular Diseases, 21,
1978, pp. 53-78.
203. C. R. Rao, Linear Statistical Inference and Its Applications. J. Wiley, New York, 2nd ed., 1973.
204. T. A. Reichert, "The amount of information stored in proteins and other short biological code
sequences." Proc. of the Sixth Berkeley Symposium on Mathematical Statistics and Probability,S,
1971, pp. 297-309.
205. T. A. Reichert, "Patterns of overuse of health care facilities-A comparison of methods." Proc. of
the IEEE 1973 Intl. Conf. on Cybernetics and Society, IEEE Systems, Man and Cybernetics
Society, 73 CHO 799-7 SMC, Boston, MA, November 5-7, 1973, pp. 328-329.
206. T. A. Reichert, "The security hyperannulus-a decision assist device for medical diagnosis." Proc.
of the Twenty-Seventh Annual Conf. on Engineering in Medicine and Biology, Alliance for
Engineering in Medicine and Biology, Philadelphia, PA, October 6-10, 1974, p. 331.
207. T. A. Reichert and R. A. Christensen, "Validated predictions of survival in coronary artery
disease." Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 457-490.
208. T. A. Reichert and R. A. Christensen, "Anticipating and compensating for deviations from training
experience." Fifth Annual Meeting, Society for Medical Decision Making, Toronto, Canada,
October 2-5, 1983.
209. T. A. Reichert, R. A. Christensen and A. A. Bartolucci, "Patterns of prognosis I: Survival in
advanced non-Hodgkin's lymphomas." (1985 in preparation).
210. T. A. Reichert, R. A. Christensen, A. A. Bartolucci and C. Walker, "Patterns of survival in
advanced non-Hodgkin's lymphoma." Proc. of the Second Intl. Conf. on Malignant Lymphoma,
Swiss League Against Cancer, Lugano, Switzerland, 1984. Malignant Lymphomas and Hodgkin's
Disease: Experimental and Therapeutic Advances, ed. by F. Cavalli, G. Bonadonna and M.
Rozencweig, Martinus Nijhoff, (1985 in press), 653 pp.
211. T. A. Reichert and A. J. Krieger, "Quantitative certainty in differential diagnosis." Proc. of the Second Intl. Joint Conf. on Pattern Recognition, IEEE 74 CHO 885-4C, Copenhagen, Denmark,
August 13-15, 1974, pp. 434-437.
212. T. A. Reichert and Y. Stephanedes, "Addendum on differential diagnosis of three diseases of the
cervical spine." Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 255-258.
213. T. A. Reichert and A. K. C. Wong, "A finite state information source view of molecular genetic
phenomena." Proe. of the Pittshurgh Symposium 011 Modelling and Simulatioll, 1971, pp. 45-51.
214. T. A. Reichert, J. M. C. Yu and R. A. Christensen, "Molecular evolution as a process of message
refinement." Jour. of Molecular Euouuton. 8, 1976, pp. 41-54. .
215. A. Rényi, Wahrscheinlichkeitsrechnung, mit einem Anhang über Informationstheorie, Deutscher
Verlag der Wissenschaften, Berlin, 1962.
216. J. M. Richardson, "The hydrodynamical equations of a one component system derived from
nonequilibrium statistical mechanics." Jour. Math. Anal. Appl., 1, 1960, pp. 12-60.
217. E. A. Robinson, "A historical perspective of spectrum estimation." Proc. of the IEEE, 70, 1982, pp.
885-907.
218. T. E. Rosmond, "NOGAPS: Navy Operational Global Atmospheric Prediction System." Fifth
Conference on Numerical Weather Prediction, Amer. Meteor. Soc., Monterey, CA, November 2-6,
1981, pp. 74-79.
219. E. H. Ruspini, "A new approach to clustering." Inform. and Control, 15, 1969, pp. 22-32.
220. E. L. Saenger, C. R. Buncher, B. L. Specker and R. A. McDevitt, "Determination of clinical efficacy: nuclear medicine as applied to lung scanning." Jour. of Nuclear Medicine, 26, 1985, pp. 793-806.
221. W. R. Schonbein, "Identification of patterns in diagnostic attributes in skull trauma cases using an
entropy minimax approach." Presented at IEEE Intl. Conf. on Cybernetics and Society, NTIS COO-2427-2, Sloan School, MIT, Cambridge, MA, 1973.
222. W. R. Schonbein, "Analysis of decisions and information in patient management." Current Concepts in Radiology, 2, Mosby Co., St. Louis, MO, 1975, pp. 31-58.
223. G. E. Schultz, C. D. Barry, J. Friedman, P. Y. Chou, G. D. Fasman, A. V. Finkelstein, V. I. Lim,
O. B. Ptitsyn, E. A. Kabat, T. T. Wu, M. Levitt, B. Robson and K. Nagano, "Comparison of
predicted and experimentally determined secondary structure of adenyl kinase." Nature, 250, 1974,
pp. 140-142.
224. G. Sebestyen and J. Edie, "Pattern recognition research." AFCRL-64-821, NTIS AD-608-692,
Litton Systems, Inc., Waltham, MA, June 14, 1964.
225. C. E. Shannon, "A mathematical theory of communication." The Bell Sys. Tech. Jour., 27, 1948,
pp. 379-423, 623-656.
226. D. J. Shea, "Sensitivity studies on the estimates of climate noise and potential long range
predictability of January temperature and precipitation over the U.S. and Canada." Proc. of the
Eighth Climate Diagnostics Workshop, Downsview, Ontario, October 17-21, 1983, CAC/NOAA-S/T 84-115, NTIS PB84-192418, pp. 313-321.
227. J. E. Shore, "Derivation of equilibrium and time-dependent solutions to M/M/∞//N and M/M/∞ queueing systems using entropy maximization." Nat. Computer Conf., AFIPS, Anaheim, CA, June
5-8, 1978, pp. 483-487.
228. J. E. Shore and R. M. Gray, "Minimum cross-entropy pattern classification and cluster analysis."
IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-4, 1982, pp. 11-17.
229. T. F. Smith, "The genetic code, information density, and evolution." Math. BioSci., 4, 1969, pp.
179-187.
230. T. L. Smith, E. A. Gehan, M. J. Keating and E. J. Freireich, "Prediction of remission in adult
leukemia." Cancer, 50, 1982, pp. 466-472.
231. H. von Storch and G. Hannoschock, "Statistical aspects of estimated principal vectors (EOFs)
based on small sample sizes." Jour. Climate Appl. Meteor., 24, 1985, pp. 716-724.
232. L. Szilard, "On the decrease of entropy in a thermodynamic system by the intervention of
intelligent beings." Zeitschrift für Physik, 53, 1929, pp. 840-856; tr. by A. Rapoport and M. Knoller in Behavioral Science, 9, 1964, pp. 301-310; reprinted in Quantum Theory and Measurement, ed. by J. A. Wheeler and W. H. Zurek, Princeton Univ. Press, Princeton, 1983, pp. 539-548.
233. R. Thom, Structural Stability and Morphogenesis (1972), tr. by D. H. Fowler, Benjamin-Addison
Wesley, New York, 1975.
234. P. D. Thompson, "A heuristic theory of large-scale turbulence and long-period velocity variations
in barotropic flow." Tellus, 9, 1957, pp. 69-91.
235. M. Tribus, "Information theory as the basis for thermostatics and thermodynamics." Jour. Appl.
Mech., 28, 1961, pp. 1-8.
236. M. Tribus, Thermostatics and Thermodynamics. An Introduction to Energy, Information and States
of Matter, with Engineering Applications, D. Van Nostrand, Princeton, NJ, 1961.
237. M. Tribus, "The use of the maximum entropy estimate in the estimation of reliability." Recent Developments in Information and Decision Processes, ed. by R. E. Machol and P. Gray, Macmillan,
New York, 1962, pp. 102-140.
238. M. Tribus, Rational Descriptions, Decisions, and Designs, Pergamon Press, New York, 1969.
239. C. A. Truesdell, The Tragicomical History of Thermodynamics 1822-1854, Springer-Verlag, New
York, 1980.
240. N. S. Tzannes and J. P. Noonan, "The mutual information principle and applications." Inform. and Control, 22, 1973, pp. 1-12.
241. T. J. Ulrych and T. N. Bishop, "Maximum entropy spectral analysis and autoregressive decomposition." Rev. Geophysics and Space Physics, 13, 1975, pp. 183-200.
242. I. Vincze, "An interpretation of the I-divergence of information theory." Trans. of the Second Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, Prague,
June 1-6, 1959, Academic Press, New York, 1960, pp. 681-684.
243. S. H. Walker and D. B. Duncan, "Estimation of the probability of an event as a function of
several independent variables." Biometrika, 54, 1967, pp. 167-179.
244. C. S. Wallace and D. M. Boulton, "An information measure for classification," Computer Journal,
11, 1968, pp. 185-194.
245. J. N. Walton, Brain's Diseases of the Nervous System. Oxford Univ. Press, London, 7th ed., 1969.
246. G. S. Was, R. A. Christensen, C. Park and R. W. Smith, "Statistical patterns of fuel failure in stainless steel clad light water reactor fuel rods." Nuclear Technology, 71, 1985, pp. 445-457.
247. S. Watanabe, "Une explication mathématique du classement d'objets." Information and Prediction in Science, ed. by S. Dockx and P. Bernays, Academic Press, New York, 1965, pp. 39-76.
248. S. Watanabe, Knowing and Guessing, J. Wiley, New York, 1969.
249. M. J. Webber, Information Theory and Urban Spatial Structure, Croom Helm, London, 1979.
250. N. Wiener, Cybernetics, The MIT Press, Cambridge, MA, 2nd ed., 1961.
251. A. G. Wilson, Entropy in Urban and Regional Modelling, Pion, London, 1970.
252. L. J. Wilson and H. R. Stanski, "Assessment of operational REEP/MDA probability of
precipitation forecasts." Eighth Conference on Probability and Statistics in Atmospheric Sciences,
Amer. Meteorological Society, Hot Springs, AR, November 16-18, 1983, pp. 193-199.
253. S. Wold, "Pattern recognition by means of disjoint principal components models." Tech. Rept. No. 2, Research Group for Chemometrics, Umeå Univ., Sweden, March 1975.
254. P. H. Woods, W. H. Tusa, P. J. Sausville, J. W. Ritz and W. E. Blain, "Technology for the storage
of hazardous liquids, a state-of-the-art review." Dept. of Environmental Conservation, Albany,
NY, January 1983.
255. P. M. Woodward and I. L. Davies, "Information theory and inverse probability in telecommunication." Proc. IEE, 99, Part 3, 1952, pp. 37-44.
256. A. Wragg and D. C. Dowson, "Fitting continuous probability density functions over [0, ∞) using
information theory ideas." IEEE Trans. Infor. Theory, IT-16, 1970, pp. 226-230.
257. B. J. Wrona, J. T. A. Roberts, E. Johanson and W. D. Tuohig, "First report on apparatus to
simulate in-reactor transient heating conditions in oxide fuel columns." Nucl. Technol., 20, 1973,
pp. 114--123.

Ronald Christensen has headed Entropy Limited, conducting statistical modeling research in science,
engineering and medicine, since 1973. Previously, he held research positions at IBM, the RAND Corporation
and the Lawrence Berkeley Laboratory. He has taught and conducted research at Carnegie-Mellon
University, the University of Maine and the University of California, Berkeley, and is the author of General
Description of Entropy Minimax, 1981, Multivariate Statistical Modeling, 1983, Order and Time, 1984, Data Distributions, 1984, and other books and papers in physics, statistics, and predictive modeling.
Dr. Christensen received a Ph.D. in Theoretical Physics from the University of California, Berkeley, a J.D.
from Harvard Law School, an M.S. in Mechanical Engineering from the California Institute of Technology,
and a B.S. in Electrical Engineering from Iowa State University. He is a member of the American
Mathematical Society, the American Statistical Association and the American Physical Society.
