To cite this article: RONALD CHRISTENSEN (1986) ENTROPY MINIMAX MULTIVARIATE STATISTICAL MODELING—II: APPLICATIONS,
International Journal of General Systems, 12:3, 227-305, DOI: 10.1080/03081078608934938
Int. J. General Systems, 1986, Vol. 12, 227-305 © 1986 Gordon and Breach, Science Publishers, Inc.
0308-1079/86/1203-0227 $30.00/0 Printed in Great Britain
Applications of entropy minimax are summarized in three major areas: meteorology, engineering/
materials science, and medicine/biology. The applications cover both discrete patterns in multi-
dimensional spaces of mixed quantitative and qualitative variables, and continuous patterns employing
concepts of potential functions and fuzzy entropies. Major achievements of entropy minimax modeling
include the first long range weather forecasting models with statistical reliability significantly above
chance verified on independent data, the first models of fission gas release and nuclear fuel failure under
commercial operating conditions with significant and independently verified statistical reliability, and
the first prognosis models in coronary artery disease and in non-Hodgkin's lymphoma with significant
predictability verified on independent data. In addition, applications of entropy minimization and
maximization separately are reviewed, including feature selection, unsupervised classification, proba-
bility estimation, statistical distribution determination, statistical mechanics and thermodynamics,
pattern recognition, spectral analysis and image reconstruction. Comparisons between entropy minimax
and other methodologies are provided, including sample average predictors, nearest neighbors
predictors, linear regression, logistic regression, Cox proportional hazards regression, recursive par-
titioning, linear discriminant analysis, mechanistic modeling, and expert (heuristic) programming.
INDEX TERMS: Cancer survival analysis, crossvalidation, data analysis, diagnosis, differential
diagnosis, entropy, entropy minimax, experimental design, failure analysis, feature
selection, fission gas release, forecasting, hazard analysis, heart disease survival
analysis, information theory, long range weather forecasting, materials defor-
mation, maximum entropy, medical diagnosis, medical prognosis, minimum entropy,
molecular evolution, multivariate statistics, nuclear fuel failure, pattern discovery,
predictive modeling, probability estimation, statistical modeling, statistical predic-
tion, survival analysis, system failure, systems modeling, tank lifetime analysis,
underground storage system leakage, weather forecasting, weight normalization.
I. INTRODUCTION
This is the second of two papers on entropy minimax. The first paper (I)56
reviewed the theory of entropy minimax as an information theoretic approach to
predictive modeling. This paper (II) reviews applications in three major areas:
• meteorology
• engineering/materials science
• medicine/biology
In each application area, the problem is one of predicting the future behavior of
a complex system based on limited, incomplete, uncertain and occasionally
contradictory information. Each is a real-life application using actual data, which,
in many cases, come from large scale projects extending over several years to
collect and manage the data. Some of the applications focus on survival of cancer
and heart disease patients. Others involve future rainfall supplying water for
domestic, agricultural and commercial uses. One involves rupture of nuclear fuel
pins in commercial power reactors. Another involves leaking of underground
storage tanks with possible risks to health and safety.
Because of the importance of these problems, there has been considerable prior
research in developing predictive models for them. A variety of methodologies,
including various forms of regression analysis, historical analog selection, mechan-
istic modeling and expert programming, have been used with varying degrees of
success and failure. In each case, the entropy minimax information-theoretic
approach has provided a margin of improvement in predictive performance when
assessed on independent data. In survival models for cancer and heart disease, for
example, the improvement is several percentage points gain in predictive accuracy.
The significance of the problem makes every point of improvement valuable. In
other cases, e.g., prediction of stress-corrosion cracking of fuel elements, the
improvement is even more dramatic. There is a wide gap between entropy
minimax and the next best predictive model. In some cases, e.g., precipitation
forecasting at several months lead time, the entropy minimax models represent the
first time statistically significant predictive accuracy has been achieved on inde-
pendent data by any method.
The central focus of this paper is on applications in which both the minimiz-
ation and the maximization aspects of entropy minimax are utilized. This brings
together results not previously reviewed as a whole. Following the main lines of
this research to date, these results are discussed in three sections: meteorology,
engineering/materials science, and medicine/biology. A final section briefly sum-
marizes applications of entropy minimization and entropy maximization sep-
arately, on which there is already an extensive literature. (See Section V below for
references.)
Table I presents the results in applications for which categorical test/training
comparisons are available (percentages correct on binary classifications). On the
independent test data, errors of the entropy minimax models were consistently
lower than those of comparison methods. In 14 of the 18 cases, the accuracies of
the entropy minimax predictions were statistically significant at the 0.05 level or
better. In one case, the confidence level was 0.07. In another, it was 0.13. In two
cases, the results were indistinguishable from chance. (See Subsection II.B.7 below
for the definitions of assessment measures and confidence levels.)
The level of difficulty of these applications is indicated by the numbers of
independent variables and the magnitudes of the weight normalization compared
to the training sample sizes. The weight normalization, w, is an information
theoretic measure of the amount of noise in the data. See Subsection V.B.1 below
for estimation procedures and see paper (I)56 for theoretical derivations. Generally
speaking, w is the effective sample size of the background. The training sample
must have a size of this magnitude in order to have a weight equal to that of the
background. This is the minimum needed to begin to achieve individual event
resolution. If the training sample has significantly fewer than w events, it is
unlikely that the accuracy of the model will significantly exceed that of a fixed
sample average predictor. The highest value in Table I, w = 120, occurred in a case
with training sample size of N = 82, for which the test data accuracy was 60%. The
greatest accuracies (> 80% correct) were achieved in cases with relatively low
weight normalizations.
For methodologies which are not protected against overfitting, such as step-wise
regression, a typical rule of thumb is that the training sample size must be at least
TABLE I
Percentage correct of entropy minimax and of the best comparison methodology for which test results
are available on independent validation data. Also given are the number of independent variables (IVs),
the weight normalization w (a measure of the amount of noise in the data), and the training and test
sample sizes. The two-tailed confidence levels of the entropy minimax results are given in parentheses
(the probability of achieving the observed classification accuracy by chance alone).

                        IVs    w   Train  Test  Comparison  Entropy minimax
Meteorology
N.CA ann. precip.        42  100     96    30      40%        70% (0.07)
W.OR winter precip.      24   70     92    37      45%        74% (0.01)
E.WA spring precip.      20   70     93    16      57%        69% (0.13)
E.WA winter precip.      42   70     72    37      35%        69% (0.03)
C.AZ winter precip.

aTotal sample of 139 in a randomized round-robin splitting 80%:20% between training and test.
bNo trial crossvalidation.
five times the number of IVs. Thirteen of the 18 cases in Table I would fail to
qualify under this restriction. Entropy minimax, on the other hand, which
inherently compensates for the likelihood of chance correlations, achieved statisti-
cally significant accuracy (at the 0.10 level or better) in 10 of these 13 cases. The
comparison methods achieved this level in only 2 of the 13 cases.
II. METEOROLOGY
A. Background
Long range weather forecasting poses a challenge which tests any methodology for
modeling the behavior of complex systems with incomplete information. Consider-
ing the fact that a significant portion of the globe may contribute effects to any
specific region's future weather over time frames of months, seasons, or years, the
potentially relevant ocean-atmospheric variables number in the tens of thousands
or more. This is further compounded by the fact that each such variable, say the
atmospheric pressure at a particular station or the sea-surface temperature
averaged over a specific zone, is not one number but rather a time series of
numbers extending back through the historical record.
Although the set of available meteorologically relevant independent variables
(IVs) is very large, it is nonetheless incomplete. Furthermore, which variables are
relevant at any reasonably specific level of effect on the dependent variable (DV) of
interest, say precipitation or temperature, is not entirely clear. Even for variables
that are known to be relevant and available, data are often incomplete, suffering
gaps, uncertainties and inconsistencies. No records are available on some variables
for many years, particularly in the 19th and early 20th centuries, and during war
years, but also at a scattering of other times. In some cases, a record is kept but its
meaning changes discontinuously from one year to the next, for example, when a
building is constructed near a temperature gauge or when the position of a rainfall
gauge is changed. Often the recorded information does not conveniently inform us
of such changes and we can only suspect them from apparent shifts in the time
series itself.
In order to have a proper frame of reference for assessing weather forecasting
models, it is necessary to have some understanding of what levels of predictability
may reasonably be expected of models and what success is achieved by classical
methodologies. Thus, we begin with a brief review of predictability and a
discussion of common meteorological predictors.
1. Predictability
Sensitivity to variations The possible predictability of weather factors, such as
precipitation and temperature, has been studied from two points of view. The first
is an analysis of the sensitivity of future weather factors to slight variations in
inputs to "general circulation" modelsl46.218 designed to simulate mechanistically
the lluid mechanics and thermodynamics of the ocean-atmosphere system. Beyond
3-5 days into the future, the outputs from these models have significant un-
certainties; beyond about 10 days they become highly unreliable; and beyond 15-
20 days it is virtually impossible for such models to track actual weather
patterns.75,234 Although all we know for certain is that the error growth with
increased lead time is a characteristic of the specific models studied, this is
generally interpreted as indicating the actual sensitivity of future weather patterns
to slight changes in past conditions, and thus as imposing more-or-less "funda-
mental" limitations on preditability. This does not mean, however, that it is
impossible to predict anything about the weather with lead times longer than, say,
20 days. There exist predictable generalized characteristics totalled (or averaged)
over wilder geographic regions or longer time periods that are of great practical
importance.187.234 For example, as of Jan. 15, 1999, total precipitation in
Sacramento, California on January 15, 2000 may exceed such limitations, but
averaged total precipitation of a broader region for the month of January 2000
may not.
The second point of view compares the actual variability of the long period (e.g.,
monthly or seasonal) means with the "natural" variability that would be expected
from short-period fluctuations alone. The actual variability σ_A² is the sample
variance of the observed long period means about their average

    X̄ = (1/n) Σᵢ₌₁ⁿ Xᵢ,

while the natural variability is estimated from the spectrum of the short-period data as

    σ_T² = ∫₋∞^∞ S(ω) H_T(ω) dω,

where

    ω = frequency,
    S(ω) = input power spectral density function (estimated from short period, e.g., daily, data),
    H_T(ω) = power transfer function from the short periods (e.g., days) to the long period T (e.g., a month or a season).

F-ratio The predictability is defined as the ratio of the actual to the "natural" variability:

    F = σ_A²/σ_T².

F-ratios of 1.5 and 2.0 indicate, for example, that it is potentially possible to
explain 33% and 50%, respectively, of the variance of the long period means. For
convenience, the percentage E of potentially explainable variance can be used in
place of the F-ratio. It is simply E = (1 − 1/F) × 100%.
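As a small worked illustration (not part of the original study), the conversion from F-ratio to potentially explainable variance follows directly from the definition E = (1 − 1/F) × 100% given above:

```python
def explainable_variance_pct(f_ratio: float) -> float:
    """Potentially explainable variance, E = (1 - 1/F) * 100%."""
    if f_ratio <= 0:
        raise ValueError("F-ratio must be positive")
    return (1.0 - 1.0 / f_ratio) * 100.0

# F-ratios of 1.5 and 2.0 correspond to about 33% and 50% explainable variance.
print(round(explainable_variance_pct(1.5), 1))  # 33.3
print(round(explainable_variance_pct(2.0), 1))  # 50.0
```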
The numerical values of E not only differ with the variable (e.g., precipitation or
temperature), but also with the specific time period (e.g., month or season). The
general picture8,147,180,226 is that the percentages of potentially explainable
variance in monthly averages, for example, in the Western U.S. for precipitation
and temperature are about 30-40% and 40-60%, respectively. In the Midwest, they
are about 0-10% and 20-30%; in the Appalachian mountains, 30-60% and 30-40%;
and in the East coast and Gulf regions, 0-10% and 20-30%.
Although the F-ratio is a useful device to obtain a rough idea of the relative
predictability of weather factors in different regions, it may not be an accurate
measure of the true predictability.
• Being defined in terms of squared deviations from the mean, F-ratios focus
heavily on being able to predict outlying extreme years. Thus, they may
significantly under-represent the predictability of discrete categories such as
"high," "medium" and "low." For example, one could make a statistically
significant percentage of correct category predictions and yet satisfy a low F-
ratio by being wrong on a small number of extreme cases.
• The F-ratio weights equal magnitudes of error equally, regardless of the
magnitude of the reference. For example, a 10 cm error in precipitation for a
20 cm observed value is given the same weight as for a 100 cm observed value,
even though the error is half the observed value in the one case and only 10%
in the other. Thus, the F-ratio may not pick up useful predictability of one
range of values because of fluctuations in other ranges.
• The F-ratio ignores the sign of the error. However, a drought, for example,
variables from other regions, that may be related to the dependent variable being
predicted.
The performance of the persistence predictor for seasonal mean temperatures in
the U.S. has been studied by van Loon and Jenne and by Namias. Percentages
of explained variance are generally low. The highest values (5-20%) occur in
various coastal areas, depending upon the season, and the figures are very low
(0-5%) over the bulk of the U.S. For precipitation forecasting, the persistence
predictor performs even more poorly than for temperature forecasting.
Linear regression Using a linear predictor methodology and coeffi-
cients with a yearly periodicity, models have been built to predict monthly and
seasonal mean atmospheric temperatures in the U.S. using IVs consisting of
contemporaneous or prior sea surface temperatures (SST) or sea level pressures
(SLP).8 [If IVs are used from the same time interval as the DV, then the process is
referred to as "specification" rather than "prediction."] Models were trained on the
period 1902-1970 at four different lag times: 0, 1, 2 and 3 seasons. At 0-season lag
time (specification rather than prediction), the best SST variables were found to
explain 10-30% of the training data variance in mean seasonal temperatures for
west coastal regions, and zero in the midwest. At 1-season lag, the figures drop to
5-20% in coastal regions and the zero-skill area broadens. Specification variance
explained was somewhat higher when SST was replaced by SLP, but the figures
similarly degraded with seasonal lag. The test period 1971-1979 was used for
independent validation and the results were considered to be not inconsistent with
the training data percentages, although 9 years is too short to attribute statistical
significance at the low skill levels observed.
FIGURE 1 Seven station precipitation index (SSPI) for Northern California for the period 1852-
1977, plotted by water year, with the long-term average indicated.
Since each year corresponds to an SSPI event, the total dataset contains 126
events. This dataset was randomly divided into two subsets of 63 events each,
subject to a constraint equilibrating the overall averages of the SSPI for the two
portions. One subset was used for model building, the other was reserved for
model verification. Because of the constraint, one degree of freedom must be
accounted for in significance testing.
count.
The 15 filters were all linear combinations of values for years (i, i−1, i−2, ...)
preceding the year (i+1) being predicted. Five of the filters were differential,
having a long term expected value of zero (e.g., moving difference = Vᵢ − Vᵢ₋₁,
moving biannual difference = Vᵢ + Vᵢ₋₁ − Vᵢ₋₂ − Vᵢ₋₃, etc.). The remaining 10 were
integral, having nonzero expectation value (e.g., 1-year cycle = Vᵢ, 2-year cycle = Vᵢ₋₁,
moving average = (Vᵢ + Vᵢ₋₁)/2, linear extrapolator = 2Vᵢ − Vᵢ₋₁, etc.). Note that for
an annually defined dependent variable, the 1-year cycle filter is a persistence
predictor.
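A minimal sketch (not from the paper) of how a few of the named filters could be implemented is given below; the function names are illustrative only, and the set shown is a subset of the 15 filters actually used.

```python
import numpy as np

def one_year_cycle(v, i):
    """1-year cycle filter: V_i (a persistence predictor for an annual DV)."""
    return v[i]

def moving_difference(v, i):
    """Differential filter with zero long-term expectation: V_i - V_{i-1}."""
    return v[i] - v[i - 1]

def moving_biannual_difference(v, i):
    """V_i + V_{i-1} - V_{i-2} - V_{i-3}."""
    return v[i] + v[i - 1] - v[i - 2] - v[i - 3]

def moving_average(v, i):
    """Integral filter with nonzero expectation: (V_i + V_{i-1}) / 2."""
    return 0.5 * (v[i] + v[i - 1])

def linear_extrapolator(v, i):
    """2*V_i - V_{i-1}, a one-step linear extrapolation."""
    return 2.0 * v[i] - v[i - 1]

# Example: apply each filter to the last usable year of a short synthetic series.
v = np.array([3.1, 2.7, 4.0, 3.5, 3.8])
i = len(v) - 1
for f in (one_year_cycle, moving_difference, moving_biannual_difference,
          moving_average, linear_extrapolator):
    print(f.__name__, f(v, i))
```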
The 1305 variables were subjected to a preliminary screening by computing
correlational statistics for the 63 events in the model building sample:
• correlation coefficient r for the DV (future SSPI) and each IV (filtered time-
series). [This is called a "lag correlation," the lag specifying the interval by
which the DV is in the future relative to the IV.]
• ratio of |r| to an estimate of the correlation coefficient's standard deviation σ
(estimated by Monte Carlo shuffling of the DV values).
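A schematic sketch of this screening step is given below. It assumes an ordinary Pearson lag correlation and a shuffle-based Monte Carlo estimate of the correlation's standard deviation; the exact estimators used in the study are not specified here, so the details are illustrative only.

```python
import numpy as np

def screen_feature(iv, dv, n_shuffles=1000, threshold=2.0, rng=None):
    """Return (r, sigma, keep): the lag correlation between a filtered IV series
    and the future DV, a Monte Carlo estimate of the correlation's standard
    deviation under shuffling of the DV values, and whether |r|/sigma exceeds
    the screening threshold."""
    rng = np.random.default_rng(rng)
    r = np.corrcoef(iv, dv)[0, 1]
    shuffled = np.empty(n_shuffles)
    for k in range(n_shuffles):
        shuffled[k] = np.corrcoef(iv, rng.permutation(dv))[0, 1]
    sigma = shuffled.std(ddof=1)
    return r, sigma, abs(r) / sigma > threshold

# Example with synthetic data: 63 model-building events, weakly related DV.
rng = np.random.default_rng(0)
iv = rng.normal(size=63)
dv = 0.3 * iv + rng.normal(size=63)
print(screen_feature(iv, dv, rng=1))
```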
The lag correlations for the 1305 variables were found to be rather low. In
absolute value, the bulk were in the range 0.0-0.1, fewer in the range 0.1-0.2, fewer
still in the ranges 0.2-0.3 and 0.3-0.4, and very few larger than 0.4. There were,
however, significantly more in the higher ranges than would be expected from a
random distribution of correlation coefficients. Thus, if a reasonably large number
of the relatively highly correlating variables are selected for model building, there
is a good chance of enhancing the signal-to-noise ratio. Because of the modest
values of even the highest correlations, however, it is important not to try to focus
on a few variables, since that runs too high a risk of picking up purely chance
correlations. Thus, it is necessary to violate the restriction of many modeling
methodologies of using a very limited number of variables,3,93,133 say fewer than
one-third, one-fifth, or one-tenth the number of data points. (Use of many
variables confers no special advantage when assessment is on independent data
not used in model building.) Candidate variables with |r|/σ above 2.0 were
†An example is the average January-March sea surface temperature deviation from normal for
80-120°W longitude, 0-10°N latitude.
• steady or declining (this year minus last year) winter sea surface temperatures
in the northeastern equatorial belt, and
• normal or high sea surface temperatures in the western equatorial belt last
year, and
• high precipitation at Colfax this year, and
• low tree ring growth (moving biannual difference) in the Truckee area.
The details of this model are not unique. They depend on the precise selection
of features and model building years. If a large number of models were built with
varying inputs, we would obtain a cluster of low-entropy models of which the one
given above is an example. However, sharing similar informational properties,
their predictive capabilities would be similar. When tested on the 63 years reserved
for model verification, the above pattern set makes correct predictions in 31
years, incorrect predictions in 18, and 14 years do not match any of the three patterns.
See Table II. The accuracy of (31/49) × 100% = 63% has a statistical significance
of 1 − α = 0.94 against the null hypothesis of arising by chance. This is compared,
for example, to 50% accuracy for the climatological average (32 of the 63 training
years were wet), and 55% for the persistence predictor. When only extreme years
are considered (more than about one standard deviation either wetter or dryer
than normal), the difference is even more pronounced. The persistence predictor
accuracy becomes 56% and the entropy minimax patterns rise to 78% correct.
By comparison, the accuracy of the persistence predictor with zero lead time is
55% on all years and 56% on extreme years. (Note: Skill ratio is computed as
TABLE II
Predictive performance for Northern California annual precipitation (63 randomly
selected verification years).

                                                All years   Extreme years
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)
  Percentage correct                               55%          56%
  Statistical significance                         55%          50%
  Variance explained                                2%           4%
  Skill                                            10%          11%
Entropy minimax patterns (2 month lead time)
  Percentage correct                               63%          78%
  Statistical significance                         94%         >99%
  Variance explained (3 categories)                20%          36%
  Skill                                            27%          56%
Statistically, the results of the revised model are similar to those of the original
model. See Table III. For all years, 67% of the entropy minimax predictions are
correct, a result with a 93% two-tailed significance against chance. For extreme
years, the percentage correct is 70%, which also has a 93% significance. Consider-
ing the F-ratio statistic for precipitation variability in the Western United States,
these three patterns predict about half of the overall SSPI variance that it is
possible for any model to predict. (The performance of the climatological predictor
is computed assuming equi-frequency wet:dry predictions; the model building
data were split evenly 48:48, while the verification data were 18:12 for all years
and 11:9 for extreme years.) When the predictions are assessed on an individual
pattern basis, the dry patterns turn out to have higher accuracies than wet
patterns, suggesting that drought may be a more predictable phenomenon in
California than heavy precipitation.
TABLE III
Predictive performance for Northern California annual precipitation (revised model
verified on the 30 most recent years).

                                                All years   Extreme years
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)
  Percentage correct                               47%          40%
  Statistical significance                         neg          neg
  Variance explained                               neg          neg
  Skill                                            -7%         -20%
Entropy minimax patterns (6 month lead time)
  Percentage correct                               67%          70%
  Statistical significance                         93%          93%
  Variance explained (2 categories)                 9%          16%
  Skill                                            33%          40%
Two attempts were made to build models giving more detailed predictions. In
the first, the SSPI was categorized into three equipopulated groups: low, medium
and high. In the second attempt, two unequal categories were used: lower one-
third and upper two-thirds. In both of these cases, the patterns failed to
crossvalidate on the model verification dataset. It was concluded that the 63/2=31
years per category of the equipopulated binary split was just enough to yield
patterns detectable in the noise, and that reducing this to 63/3=21 years per
category gave insufficient data. Both of these attempts were based on the original
63:63 building/verification split. On the 96:30 revised split, a 3-way categorization
would give 32 years per building category but only 10 years per verification
category. Thus, it may be possible to build a valid low/medium/high model, but,
as yet, there are insufficient data to demonstrate its validity.
Although progress toward a demonstrable 3-category resolution may be
achieved by further analysis (including further work on the list of independent
variables), there will still remain the basic limitations of data quality and sample
size. Cleaning out bad grid points in historical sea surface temperature data and
recalibrating meteorological time series are important and can help raise the
effective sample size. But how do we lengthen the historical record? One way is
simply to wait. Thirty years from now we will have 30 more data points. Another
way is to try to reconstruct the earlier history. The fact that an analysis technique
such as entropy minimax does not require precise values to obtain useful
information enhances the value of such reconstruction efforts. We do not know the
SSPI for 1848, for example. If someone is able to determine from some source
such as entries in Spanish mission logs, agricultural records, or tree rings, for
instance, whether 1848 was a wet, moderate, or dry year in Northern California
with reasonable confidence, that is all that is needed to add another point to the
database.
2. Winter precipitation in Western Oregon at 7 month lead time
In analogy with the Northern California SSPI, a Western Oregon Precipitation
Each filter was applied to the entire set of 994 time-series, and the resulting
distribution of values of |t| was compared with that expected by chance. Eight
filters were found to yield a significantly greater than random proportion of high
|t| values (significant at the 88% level). These included the 1-, 2- and 3-year cycle
filters, the linear extrapolator, the moving biannual difference, the most recent
Winter-Fall difference, and the 20 year moving averages of the 1- and 2-year cycle
filters.
A total of 24 features were selected. Trial crossvalidation within the 92 model
building years produced a value of w = 70 for the weight normalization.
Rather than run an unconstrained pattern search as was done in developing the
SSPI model, two constrained pattern searches were run for WOPI. The constraint
on one search was that the first pattern in the sequence must match predomi-
nantly dry years; and the other search was constrained to start with a wet pattern.
The dry-first and wet-first sequences are labeled "D" and "W", respectively.
Predictions are made by amalgamating the probabilities and uncertainties
associated with the patterns matched, one from each sequence. In forming the
amalgamation, patterns are weighted according to the number of events matching
them in the model building data.
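A minimal sketch of this amalgamation, under the assumption that the combined probability is simply a count-weighted average of the matched D-sequence and W-sequence pattern probabilities (the exact combination rule is not reproduced in this text, so the sketch is illustrative only):

```python
def amalgamate(p_d, n_d, p_w, n_w):
    """Combine the wet-probabilities of the matched D-sequence and W-sequence
    patterns, weighting each by the number of model-building events it matched."""
    return (n_d * p_d + n_w * p_w) / (n_d + n_w)

# Matched dry-first pattern: 0.30 wet with 25 supporting events;
# matched wet-first pattern: 0.65 wet with 15 supporting events (hypothetical values).
print(amalgamate(0.30, 25, 0.65, 15))  # ~0.43, i.e. a "dry" prediction
```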
A total of 7 patterns were found, 3 in the D-sequence and 4 in the W-sequence.
Variables used in the patterns are precipitation at Havana (Cuba), temperature at
Darwin (Australia), sea surface temperature in the vicinity of Antofagasta (Chile),
precipitation at San Jose (Costa Rica), and precipitation in the Willamette Valley
itself during preceding time periods.
Results on the 37 years reserved for model verification are shown in Table IV.
The entropy minimax patterns at 7 month lead time are 74% correct on non-close-
call predictions. (A probability 0.475 < P < 0.525 is defined as a "close-call.") By
comparison, the persistence predictor at zero lead time has 45% accuracy. (The
explained variance of the patterns is negative despite the high categorical accuracy
because of errors on a few extreme years that dominate the squared deviations.)
TABLE IV
Predictive performance for Western Oregon winter (November-March) precipitation
(verification on 37 most recent years).

                                                All years   Excluding close calls
Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)a
  Percentage correct                               41%          45%
  Statistical significance                         neg          neg
  Variance explained                               -2%          -3%
  Skill                                           -19%         -10%
Entropy minimax patterns (7 month lead time)
  Percentage correct                               79%          74%
  Statistical significance                        >99%          99%
  Variance explained (4 categories)                 4%          -6%
  Skill                                            58%          48%

aWhether western Oregon precipitation in the preceding 5-month interval (June-October) was above or below its
median.
TABLE V
Predictive performance for Eastern Washington spring (April-June) and winter (November-March)
precipitation (verification on 37 most recent years). Results are given both for all years and for years
excluding CC (close calls).

                                         Spring              Winter
                                     All    Excl. CC     All    Excl. CC
Climatological average
  Percentage correct                 50%      50%        50%      50%
Persistence (0 lead time)a
  Percentage correct                 68%      67%        41%      41%
  Statistical significance           97%      92%        neg      neg
  Variance explained                 16%       1%        neg      neg
  Skill                              35%      33%       -19%     -18%
Persistence (3 month lead time)b
  Percentage correct                 54%      48%        51%      50%
  Statistical significance           38%      neg        13%       0%
  Variance explained                  2%      -2%        neg      neg
  Skill                               8%      -4%         3%       0%
Persistence (6 month lead time)c
  Percentage correct                 49%      44%        38%      35%
  Statistical significance           neg      neg        neg      neg
  Variance explained                 -3%      neg        neg      neg
  Skill                              -3%     -11%       -24%     -29%
Entropy minimax patterns (6 month lead time)
  Percentage correct                 56%      54%        69%      69%
  Statistical significance           50%      31%        97%      97%
  Variance explained (categ.)        -8%     -10%        13%      13%
  Skill                              11%       8%        37%      37%

aWhether Eastern Washington precipitation in the previous equal-length interval, (January-March) or (June-October), was above or below
its median.
bPredictor periods: previous October-December for spring DV, previous March-July for winter DV.
cPredictor periods: previous July-September for spring DV, previous December-April for winter DV.
TABLE VI
Spring precipitation in Eastern Washington, using patterns based on
1874-1945 for predicting 1946-1963, and patterns based on 1874-1960
for predicting 1964-1982 (excluding close-calls).
some of the filters use data as early as 3 years preceding the predicted year. (It was
unnecessary to do this for the 1945 cutoff, since World War II put a natural gap
of several years in most Pacific Ocean data records during this period.)† The new
patterns perform better on the 1964-1982 period than the original patterns, 69%
correct compared to 43%, although the small sample size (16 years after
eliminating close-calls) gives a relatively modest statistical significance (87%) to
these patterns.
assessing statistical significance. The region defined by the Central Arizona Winter
Precipitation Index (CAWPI) was determined by selecting stations with high
precipitation correlations, thereby establishing a reasonably homogeneous set. The
minimum lead time between the latest point in an IV and the earliest point in the
DV was set at 2 months.80,81
A total of 15 filters were considered, 8 combinations of annual variables and 7
combinations of seasonal variables. A total of 1,000 time series were preprocessed
on the model building data during feature selection, giving a total of 15,000
filtered-series variables. 561 of the time series represent atmospheric meteorological
data, 430 are time series for sea surface temperature, 2 for stream flow discharges,
2 for tree ring indices, 4 for volcano eruptive indices, and one for the sunspot
count.
Feature selection used the lag correlation t-statistic. Time series with less than
52 years of model building data were dropped as too short. At the 88%
significance level, 4 of the 15 filters yielded an above random number of high-
correlating IVs. This result was confirmed by reanalysis using rank-order corre-
lations and by another analysis using Monte Carlo rescramblings. The four
qualifying filters were: 1-year cycle, moving difference, linear extrapolation, and the
most recent summer-winter difference.
A total of 53 features were selected. Twenty-nine involve sea surface temper-
atures (14 are latitudinal and longitudinal gradients), 18 are atmospheric variables,
2 are stream flow discharges, 2 are volcanic eruptive indices, and 2 are sunspot
variables (2 different filters applied to the sunspot time series).
Based on trial crossvalidation runs within the model building data, the weight
normalization was set at w=70. Three D-sequence and three W-sequence patterns
were found. Twenty-two of the 53 features are used in one or more of the 6
patterns.
†As a general suggestion for research during the next few years using Pacific Ocean data, one may
adapt the building/verification split to the gaps in the data as follows: model building data = start-1924
and 1948-1962; verification data = 1925-1945 and 1963-present. Omitting 1946-1947 eliminates any
possible carry-over from verification IV data (using V₀, V₋₁ and V₋₂ type filters only) into DV data,
while minimizing data loss (much SST data during WWII being unavailable). The advantage of this
split is that it includes the relatively recent 1948-1962 period in the model building dataset while still
complying both with the philosophy of restricting validation to later years and with the need for
adequate sample sizes.
Results on the 45 model verification years are shown in Table VII. The entropy
minimax patterns have statistically significant predictive skill at 2 month lead time.
By comparison, the persistence predictor has no skill at zero lead time.
In order to improve the utility of the model, a new approach was taken to the
problem of increasing the predictive resolution. Rather than subdivide the DV into
more intervals, which would have reduced interval sample sizes, the original
bivariate categorization was used for pattern formation and quantitative predic-
tions were made in a post-processor on the basis of the magnitudes of the DV in
model building years matching each pattern. The averages in the model building
years matching the pattern-defined analogs provide the quantitative predictions
and the distribution over the analog years provides an estimate of the uncertainty.
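A sketch of this post-processing step, assuming the quantitative prediction is taken as the mean DV over the model building years matching the pattern and the uncertainty as their spread (the values shown are hypothetical):

```python
import numpy as np

def analog_prediction(dv_values_of_matching_years):
    """Quantitative prediction and uncertainty from pattern-defined analog years:
    mean and sample standard deviation of the DV over the matched building years."""
    y = np.asarray(dv_values_of_matching_years, dtype=float)
    return y.mean(), y.std(ddof=1)

# Winter precipitation (arbitrary units) in the building years matching a pattern.
print(analog_prediction([12.0, 9.5, 14.2, 11.1, 10.4]))
```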
TABLE VII
Predictive performance for Central Arizona winter (December-March) precipitation (45
verification years).

Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)a
  Percentage correct                               47%          43%
  Statistical significance                         neg          neg
  Variance explained                                2%          -1%
  Skill                                            -7%         -13%
Entropy minimax patterns (2 month lead time)
  Percentage correct                               65%          71%
  Statistical significance                         90%          96%
  Variance explained (4 categories)                21%          14%
  Skill                                            29%          42%

aWhether August-November precipitation in central Arizona was above or below its median.
stations to supplement pre-1900 records. The index was defined as deviations from
a moving average to filter out such effects as urbanization induced warming trends
in cities. The 126 years of available data (the period 1852-1982 with data missing
for 5 of the years, 1861-1864, 1866) were divided into 82 model building years and
44 verification years (the 7 most recent, 1976-1982, plus 37 randomly selected).
Each year was classified for pattern analysis purposes as "hot" or "cool,"
depending upon whether its CASTI was above or below its median.
Temperatures affect energy demands, growing seasons and other planning
factors. Based on these considerations, the lead time between the last IV datum
and the first DV datum was set at 8 months. The F-ratio for mean July
temperatures in Phoenix is between 1.0 and 1.5,160 representing 0-33% potentially
explainable variance. Seasonal persistence models (zero lag time) explain about 10-
15% of the variance in mean summer temperatures in Central Arizona.6,177,169 At
a two season lag, the performance of inter-seasonal correlation models becomes
essentially random.
Based on trial crossvalidation using subsets of the model building data, the
weight normalization was set at w = 120. This magnitude, exceeding by roughly 50% the
number of model building events, is indicative of a high degree of instability in
CASTI relative to the IVs and the data used. A total of 22 features were selected
for entropy minimax analysis. These included 2 atmospheric temperature variables,
9 Pacific sea surface temperature variables (3 being latitudinal gradients), 4
barometric pressure variables, 1 precipitation variable, 2 stream flow variables, 2
volcanic activity variables, and 2 sunspot variables.
Two pattern sequences were generated by the entropy minimax analysis: 4
patterns in the H-sequence (hot-first) and 4 patterns in the C-sequence (cool-first).
Results on the verification years are given in Table VIII. The 55-60% accuracy at
TABLE VIII
Predictive performance for Central Arizona summer (June-September) temperature (44 verification
years).

Climatological average
  Percentage correct                               50%          50%
Persistence (0 lead time)a
  Percentage correct                               58%          62%
  Statistical significance                         71%          83%
  Variance explained                                7%           9%
  Skill                                            16%          24%
Persistence (4 month lead time)b
  Percentage correct                               49%          53%
  Statistical significance                         neg          27%
  Variance explained                                0%           0%
  Skill                                            -2%           6%
Entropy minimax patterns (8 month lead time)
  Percentage correct                               55%          60%
  Statistical significance                         40%          71%
  Variance explained (4 categories)                -9%          -9%
  Skill                                             9%          20%

aWhether average February-May temperature in central Arizona was above or below its median.
bWhether previous October-January temperature was above or below its median.
8 month lead time has insufficient confidence of not being a chance correlation for
reliable use. The persistence predictor has 58-62% accuracy at zero lead time.
When the lead time is extended to 4 months, this drops to 49-53%.
6. Discussion
For seasonal and annual precipitation forecasting, with lead times up to 6 months
or more, entropy minimax pattern matching models have demonstrated statisti-
cally significant skill in two category prediction. There are indications that this can
be refined to more quantitative forecasting. One approach is the pattern-average
post-processing used for CAWPI. Another, not yet tried in meteorological
modeling, is the potential function approach employing fuzzy entropies for
continuous DVs, successfully used in fission gas release modeling and in
lymphoma survival modeling, described in Subsections III.A.2 and IV.A.2 below.
Further work is also required to determine the relationship of predictive reliability
to DV period, lead time and other factors.
Even though the pattern matching models are relatively easy to use once built,
the effort required in model building is considerable. The F-version of the current
pattern search program, for example, is limited to processing independent variables
in batches of no more than 100 at a time. Thus, for example, interactions between
variables no. 5 and no. 150 will only appear if both survive interactive screening of
no. 1-100 and no. 101-200 separately. Even if the limit is increased 10-fold, as is
readily done on many systems, the feature selection preprocessing required is still
considerable. Furthermore, the currently available time series of 100 to 130 years
are only on the border-line of being barely long enough to extract statistically
verifiable patterns, considering that error minimizing weight normalizations are in
the order of 70-120.
These factors have tended to restrict consideration of the approach to develop-
ment of models for specific areas that are important enough to justify the effort.
Examples being considered include:
• Annual precipitation forecasting in drought-sensitive regions (e.g., the Sahel in
Africa)† at lead times useful to remedial planning (e.g., timely conservation
and establishment of alternatives). Annual forecasts are also useful in areas
requiring water for hydroelectric generation.
• Seasonal precipitation forecasting in areas with agricultural requirements at
useful lead times (e.g., midwestern US, southwestern USSR, etc.).
• Precipitation forecasting for specific time frames in areas with construction
and maintenance planning needs (e.g., drilling operations, dam construction,
highway maintenance, trucking, large-scale building projects, etc.)
• Average temperatures and numbers of days with high summer temperatures
and low winter temperatures, at lead times useful in preparing for response to
varying demands for electric power, oil, and gas, etc.
• Combinations of temperature, precipitation and humidity during specific
seasons for energy requirement estimation.
†The critical period for the Sahel in southern West Africa is the summer monsoonal season (June-
September). Appropriate lead times would be 5 to 8 months preceding June. Available time series data
are 80-100 years long.
These are a few examples of important uses of long range forecasts; there is an even
longer list of general, commercial and governmental uses.
Research efforts that can help to make long range weather modeling more
efficient include:
• Further item-by-item completing, checking and cleaning up existing data
bases to reduce their noise levels.
• Extending existing data bases back further in time, even if only categorically
rather than numerically, by a combination of historical and scientific research.
• Standardizing and regularizing the publication and updating of new features
such as regionally averaged calibrated tree ring indices, other plant growth
indices, stream flow indices, mud varve (soil deposition) indices, etc.
• Developing master lists of time series and filters for large geographic areas (as
can now be done for the Western U.S.) to facilitate feature selection for
Percentage correct For purposes of computing the percentage correct, all predic-
tions were converted to a categorical form.
P=M/N,
where
N = total number of predictions,
M = number correct.
For probabilistic predictors for two categories, wet and dry, for example, the
predictions were categorized as "wet" if P(wet) ≥ 0.5, otherwise "dry." For a predictor
of the specific numerical value of a variable, on the other hand, the predictions
were categorized according to percentiles of the values of the predictors.
The fraction correct for the climatological average, P₀, was used to estimate the
variance, V₀, under the null hypothesis of no predictability. For a binary wet/dry
predictor, for example,

    V₀ = P₀(1 − P₀)/N.

The deviation ΔP of the observed fraction correct from the climatological average is
then measured in units of √V₀ for purposes of significance testing.
Using the Gaussian approximation, the two-tailed confidence level α is the sum
of the tail areas under the unit normal curve outside ±ΔP/√V₀, and the statistical
significance is 1 − α. (A more exact test, using Student's t-distribution, would give a
negligible correction for the sample sizes involved.)
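A sketch of this assessment, assuming the null-hypothesis variance takes the binomial form V₀ = P₀(1 − P₀)/N used above; applied to the Northern California example (31 of 49 correct), it reproduces the 94% significance quoted earlier.

```python
import math

def categorical_significance(m_correct, n_total, p0=0.5):
    """Fraction correct, two-tailed confidence level alpha under the null of
    no predictability, and statistical significance 1 - alpha (Gaussian approx.)."""
    p = m_correct / n_total
    v0 = p0 * (1.0 - p0) / n_total         # null-hypothesis variance of the fraction
    z = abs(p - p0) / math.sqrt(v0)        # deviation in standard-deviation units
    alpha = math.erfc(z / math.sqrt(2.0))  # two-tailed area outside +/- z
    return p, alpha, 1.0 - alpha

# 31 correct of 49 non-close-call predictions: significance ~0.94.
print(categorical_significance(31, 49))
```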
Skill The Heidke skill ratio25,125 provides a comparison of the observed percen-
tage correct in the test data to the climatological expectation:

    Skill = (M − C)/(N − C),

where N is the number of test years predicted, M is the number correct, and C is
the expected number correct on the basis of the climatological average for the
training data. (If the number of training years, n, was odd and they were split
between two categories as (n+1)/2:(n−1)/2, then C was simply taken as N/2.)
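A sketch of the skill computation using the (M − C)/(N − C) form reconstructed above; with 31 correct of 49 and C = N/2 it reproduces the 27% skill reported in Table II.

```python
def heidke_skill(m_correct, n_total, c_expected):
    """Heidke skill ratio: improvement over the climatological expectation,
    normalized by the maximum possible improvement."""
    return (m_correct - c_expected) / (n_total - c_expected)

# 31 correct out of 49, with C = N/2 expected correct by climatology.
print(heidke_skill(31, 49, 49 / 2))  # ~0.265, i.e. about 27% skill
```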
Variance explained For both numerical and probabilistic predictors, the variance
explained was computed using squared deviations from a fitted estimator, dis-
counting for the number of degrees of freedom used in fitting.
For numerical predictors (e.g., persistence), let {Xᵢ} be the predictors and let
{Yᵢ} be the observed values. A linear estimator was used,

    Zᵢ = α + βXᵢ,

where

    β = (⟨XY⟩ − ⟨X⟩⟨Y⟩)/V_X,
    α = ⟨Y⟩ − β⟨X⟩,

with ⟨·⟩ denoting the sample mean and V_X the sample variance of the {Xᵢ}, and where

    S = (1/(N − ν)) Σᵢ₌₁ᴺ (Yᵢ − Zᵢ)²,

    N = number of events,
    ν = number of degrees of freedom used in fitting.
For probabilistic predictors, the events were binned into 10 probability intervals
(i−1)/10 < p ≤ i/10, i = 1, ..., 10, where p is the probability. The estimator Zᵢ was
then taken as the average DV for the events in the ith bin. The expressions for S
and E_p remain unchanged, where, in this case, the number of degrees of freedom,
ν, used in fitting is the number of nonempty bins. (Where there was binned
categorization for explained variance computation, this is noted explicitly in the
tables.)
If the estimators in either the numerical or the probabilistic case are formed on
the training rather than the test data, then one simply uses ν = 1. The test data fits
were used here to separate out baseline effects, so that E_p measures the ability of
the predictors to correctly categorize the data in a least-squared-deviation sense.
The explained variance magnitudes are generally low compared to skill level
magnitudes, since a few large errors can dominate variance, whereas skill depends
on error size only to the extent that the size affects categorization.
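A sketch of the explained-variance computation for a numerical predictor, assuming the residual mean square S (discounted for ν = 2 fitted degrees of freedom) is compared against the variance of the observed values; that normalization is an assumption here, so the function is illustrative only.

```python
import numpy as np

def variance_explained_numerical(x_pred, y_obs):
    """Fit Z = alpha + beta*X on the test data, compute the residual mean square S
    discounted for nu = 2 fitted degrees of freedom, and report the fraction of the
    observed variance explained (assumed normalization)."""
    x = np.asarray(x_pred, float)
    y = np.asarray(y_obs, float)
    n, nu = len(y), 2
    beta = ((x * y).mean() - x.mean() * y.mean()) / x.var()
    alpha = y.mean() - beta * x.mean()
    z = alpha + beta * x
    s = ((y - z) ** 2).sum() / (n - nu)
    return 1.0 - s / y.var(ddof=1)

# Synthetic test data: a persistence-like predictor with moderate skill.
rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = 0.6 * x + rng.normal(scale=0.8, size=30)
print(variance_explained_numerical(x, y))
```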
[The 1/(N − ν) factor in the definition of S is easily explained as follows: Suppose
one splits a sample into subsamples and fits model parameters to replicate the
means of each subsample. The expected values of the parameters are clearly
independent of the sizes of the subsamples, since the expected value of a mean is
independent of sample size. Thus, we may, without loss of generality, compute ν
for the split 1:1:...:N−ν, in which case we have simply fit ν−1 data points
exactly and averaged over the remaining subsample of size N−ν.]
III. ENGINEERING/MATERIALS SCIENCE

A. Nuclear Engineering
A nuclear fuel rod is a sealed metallic canister (called the "cladding"), usually
slightly less than 4 meters long and about 1 centimeter in diameter (wall thickness
generally 1-3 mm) containing nuclear fuel. The UO₂ nuclear fuel consists of
cylindrically shaped pellets, typically 1-2 cm thick with a diameter permitting them
to fit inside the cladding with a small circumferential spacing or "gap." A 4 meter
column of 1 cm pellets, for example, totals to 400 pellets.
In a commercial light water reactor (LWR), the fuel rods are packaged into
assemblies, the rods being held in place by a web of spacers. Although the
numbers vary with the size of the reactor, the magnitudes involved are illustrated
by a reactor with 560 assemblies, each containing 8 x 8 = 64 rods. (Other configur-
ations involve fewer but larger assemblies.) This gives a total of over 30,000 fuel
rods containing roughly 14 million fuel pellets.
The fuel assemblies are loaded in geometric patterns into a large water filled
container, the primary containment. The primary coolant flows around the
individual fuel rods, carrying heat from them to an exchanger where this energy is
transferred to the secondary coolant and then to electric generators.
As in all industrial plants, there are a large number of systems to keep running
properly in order to maintain proper operation of the plant. Five potential
problem areas to which entropy minimax has been applied are described in this
section. These are: fracture of the fuel pellets, release of fission gas from within the
fuel pellets to the fuel-cladding gap, rupture of the cladding allowing fission gas
and particles to escape into the primary coolant, axial bowing of the fuel rods
creating local obstructions to coolant flow, and swelling of fuel rods in the
hypothetical catastrophic incident called a LOCA (Loss-Of-Coolant-Accident).
During-test
Cracking activity (acoustic)-test A7 159
Diametral expansion-test A13 528
Diametral expansion-test A15 97
Post-test
Cracking activity (acoustic) 127
Crack width 112
Crack length 125
Diametral expansion 131
Table IX gives, as an example, the distributions for training and test events
matching the first two patterns for crack width. The similarity of test and training
distributions, which is quite evident here, was even stronger for the other DVs.
Analysis of the during-test patterns revealed the importance of history effects. In
each pattern sequence, at least one history variable, as distinguished from
instantaneous state variables, appeared in a defining condition in the first or
second pattern. Short-term rather than long-term history variables were most
important.
TABLE IX
Classification distributions of events matching the first two patterns for maximum
crack width.
2. Fission gas release in nuclear fuel rods under commercial operating conditions
Another phenomenon which, like pellet fracturing, is too erratic to predict reliably
using mechanistic models is fission gas release (FGR). Fission gas release by such
mechanisms as diffusion through uranium dioxide, migration along grain bound-
aries, and escape along pellet fractures has been studied extensively, both
theoretically and experimentally. Under controlled laboratory conditions, the
amount of fission gas release for some fuel types is known to within a reasonable
uncertainty as a function of such variables as temperature, porosity and grain size
distribution. What makes fission gas release difficult to predict reliably under
operating conditions is its extremely high sensitivity to such variables as temper-
ature and grain size and the fact that these variables are themselves coupled to the
amount of fission gas released. Under important classes of operating conditions,
this coupling acts as a positive feedback. Increased fission gas release into the fuel-
cladding gap lowers the thermal conductivity of the gap. This insulating effect
produces higher fuel temperatures and more fission gas release. Once started, the
process continues, driving the temperature upward, until the fuel expands enough
to close the gap and release stored heat via direct fuel-cladding contact.
At low centerline temperatures (say below 1,000°C), generally less than 0.5% of
the fission gas produced in the UO₂ material will be released into the gap. At
higher temperatures, more fission gas is produced and a greater fraction is
released. When the release percentage reaches somewhere in the vicinity of 3-6%,
depending upon a complex of circumstances which are themselves known only
approximately, the positive feedback mechanism sets in if the gap is still open or if
closed and only in soft contact. There is a "burst" of fission gas release. The
percentage may rise to 10%, 20% or even to 30% or more before equilibrium is
reached via hard contact. At that point, centerline temperatures will generally be
in the order of 1,500-1,700°C or higher.
Mathematically, the process behaves as a cusp catastrophe.198,233 At low
temperatures, the system has only one stable state, namely low FGR. As the
temperature is raised, the system acquires a second stable state at high FGR with
no intermediate stable states. As the temperature is raised still further, the system
has only the higher stable state. The system behaves nondeterministically with an
increasing probability of being in the higher FGR state as the temperature is
raised. (Ways of avoiding or minimizing FGR burst include operating at low
powers to keep temperatures down, limiting burnup to keep fission gas production
down, prepressuring the rod to retard FGR insulating effects, and facilitating heat
transfer with early gap closure prior to significant gas production.)
An entropy minimax model of fission gas release was developed using data on
139 rods from 12 reactor-cycles. Because of the small sample size, a set of
80%:20% splits was used for model building vs. verification. A master list of 199
features was screened for possible FGR information content. Thirty-three persons
from 23 different organizations participated in drawing up the list. Although all of
the features were included on the basis of possible relevance to FGR, most had
never before been tested as potential factors in an FGR model. The first 17 features
in this list are characteristics of the fuel rod in its as-fabricated state prior to
insertion into the reactor. The remaining 182 features are variables modeled
mechanistically in a computer code designed to simulate the actual in-reactor
performance of the fuel rod. Feature selection screening was based on three
numerical criteria (entropy exchange, correlation coefficient, and chi-squared) in
addition to engineering judgment.
Only 6 of the 199 features were found to have correlation coefficients exceeding
0.5 on the model building data, the highest value being 0.58. Many features with
low correlations individually were found to have significant information content in
terms of pair-wise entropies when taken in combination with other features.
A total of 42 features were selected to be independent variables for model
building. Two are themselves estimates of the dependent variable. One is fission
gas release computed by a semi-empirical algorithm, FCODE-BETA, which was
benchmarked against laboratory data. The other, GCODE, is a 4-variable
regression fit to FGR in the model building data, the 4 variables used being
chosen from among the most highly correlating of the other 41 on the building
data. (Since these 41 were selected by preprocessing involving entropy exchange,
GCODE represents some information from the entropy minimax model building
process.)
A total of 13 of the 42 features were used to define 11 entropy minimax
patterns. GCODE was used as one of the defining features in 3 of the patterns. In
addition, the GCODE prediction itself is amalgamated with the pattern predic-
tions to form the final SPEAR-BETA prediction, the relative contributions from
GCODE and the patterns being given weights inversely proportional to their
variance on the model building data. This automatically makes GCODE the fall-
back predictor in the event of failure to match any pattern.
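A minimal sketch of this fall-back and weighting scheme, assuming a straightforward inverse-variance weighted average of the GCODE estimate and the amalgamated pattern estimate (variances taken from the model building data); the numbers shown are hypothetical.

```python
def combine_inverse_variance(pred_gcode, var_gcode, pred_pattern, var_pattern):
    """Weight each contribution inversely to its model-building variance;
    if no pattern is matched, the GCODE prediction is returned unchanged."""
    if pred_pattern is None:
        return pred_gcode
    w_g, w_p = 1.0 / var_gcode, 1.0 / var_pattern
    return (w_g * pred_gcode + w_p * pred_pattern) / (w_g + w_p)

# log10(FGR) estimates: GCODE says -1.4 (var 0.30), matched patterns say -0.9 (var 0.10).
print(combine_inverse_variance(-1.4, 0.30, -0.9, 0.10))  # closer to the pattern estimate
```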
The dependent variable was expressed as log(FGR) and treated as a continuous
DV, using a Gaussian potential function and minimizing the associated fuzzy
entropy, to find the patterns. The values of log(FGR) ranged from about −8 to
about −0.8. The analysis was conducted at a potential function resolution of
Δlog(FGR) = 2, corresponding to about 4 magnitude classifications.
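The precise fuzzy-entropy formulation is not reproduced here; as a rough illustration only, the sketch below assumes each event spreads a Gaussian potential of width Δlog(FGR) = 2 over a small set of magnitude classes, with the entropy computed from the resulting fractional (fuzzy) counts. The class centers and event values shown are hypothetical.

```python
import numpy as np

def gaussian_fuzzy_entropy(dv_values, class_centers, width=2.0):
    """Fuzzy entropy of a continuous DV inside a candidate pattern.
    Each event spreads a Gaussian potential of the given width over the class
    centers; fuzzy counts are normalized to probabilities and H = -sum p ln p."""
    y = np.asarray(dv_values, float)[:, None]
    c = np.asarray(class_centers, float)[None, :]
    membership = np.exp(-0.5 * ((y - c) / width) ** 2)
    membership /= membership.sum(axis=1, keepdims=True)  # each event carries unit weight
    counts = membership.sum(axis=0)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# log(FGR) values of events matching a candidate pattern, and ~4 magnitude classes.
events = [-7.5, -6.9, -2.1, -1.4, -0.9]
centers = [-8.0, -6.0, -4.0, -2.0]
print(gaussian_fuzzy_entropy(events, centers))
```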
Based on trial crossvalidation using subsets of the model building data, the
error minimum was found to occur at a weight normalization of w=25. The low
value indicates the comprehensiveness of the feature list and the relative consis-
tency of the data. A total of 11 FGR patterns were found in two sequences of 6
and 5, respectively.
An example of a high-FGR (15.6%) pattern is the following:
• GCODE> 8.52%, or
• Amplitude-weighted sum of the absolute values of the changes in the phase of
axial profile of burnup exceeds 0.00509 radians.
The full set of FGR patterns is given in Ballinger et al. 5 Prior research on fission gas
release focused on laboratory experimentation to measure its dependence on such
factors as temperature, grain size, and gas production history (from prior power and
temperature history). The idea was to isolate and benchmark the important
mechanisms in the laboratory and then to use these mechanisms to compute FGR
estimates under commercial operating conditions. The limited success of attempts to
validate the results under operational circumstances has been a continual challenge to
researchers.
The entropy minimax modeling did not find temperatures and grain sizes (which
themselves have to be estimated by mechanistic modeling) to be reliable predictors
of FGR under commercial operating conditions. Rather, the most informative
factors were found to be a variety of complex combinations, some of which can be
explained in retrospect and some of which appear to carry composite information.
Examples of important indicators which were found and which had not previously
been considered include fuel crack width, fuel plasticity radius, axial profile of
burnup, and the product of tensile-work and estimates of FGR computed from
mechanistic models.
Figure 2 gives plots of log10(FGR) observed vs. predicted for each of the four
models on the entire dataset, model building and verification. For purposes of
making categorical-type assessments, one can convert the predictions into a simple
"above median" and "below median" dichotomy, where the median for the model
building data was FGR_med = 2.4%. Table X gives the results on the verification
data for the predictions converted into this binary form.
FIGURE 2 Observed vs. predicted values of log10(FGR) for four fission gas release models
(N = 124 for (a) and N = 139 for (b)-(d)).
All four models have statistically significant skill at binary "above/below median"
prediction. The entropy minimax model is the highest with 88% accuracy. In terms
of explained variance, the differences are quite pronounced. The entropy minimax
model explains 68% of the variance in log(FGR), while the COMETHE III-J
model61 only explains 4% (due to many large differences between COMETHE
predicted FGR and observed FGR). The high explained variance of entropy
minimax illustrates its use to predict a continuous variable modeled with fuzzy set
theory in the potential function approach.
TABLE X
Predictive performance for fission gas release from rods in 12 different reactors,
assessed on ability to predict whether or not above median (2.4% FGR).

Sample average
  Percentage correct           50%
Mechanistic model (COMETHE III-J)
  Percentage correct           77%
  Statistical significance    >99%
  Variance explained            4%
  Skill                        54%
Mechanistic model (FCODE-BETA)
  Percentage correct           72%
  Statistical significance    >99%
  Variance explained           12%
  Skill                        44%
Regression model (GCODE)
  Percentage correct           81%
  Statistical significance    >99%
  Variance explained           49%
  Skill                        62%
Entropy minimax model (SPEAR-BETA)
  Percentage correct           88%
  Statistical significance    >99%
  Variance explained           68%
  Skill                        76%
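
The skill figures in Table X are consistent with defining skill as the improvement over the baseline predictor expressed as a fraction of the maximum possible improvement; assuming that definition (it is not stated explicitly here), the tabulated values can be reproduced as follows:

def skill(accuracy, baseline):
    """Improvement over a baseline predictor as a fraction of the maximum
    possible improvement (assumed definition, consistent with Table X)."""
    return (accuracy - baseline) / (1.0 - baseline)

baseline = 0.50                       # sample average predictor
for name, acc in [("COMETHE III-J", 0.77), ("FCODE-BETA", 0.72),
                  ("GCODE", 0.81), ("SPEAR-BETA", 0.88)]:
    print(f"{name}: skill = {skill(acc, baseline):.0%}")
# prints 54%, 44%, 62%, 76%, matching Table X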
[Figure: schematic relating the mechanistic model of failure to the independent variables and the dependent variable for entropy minimax.]
Two entropy minimax models were built for failure of zircaloy fuel rods. The
first, SPEAR-ALPHA, used a mechanistic model of fuel performance, FCODE-
ALPHA, and a mechanistic failure model, CCODE-ALPHA, to compute the
independent variables, and a data base consisting of 1,187 assemblies from 4
different reactors. The second, SPEAR-BETA, used improved mechanistic
codes FCODE-BETA and CCODE-BETA to compute the independent variables,
and an expanded data base consisting of 3,402 assemblies from 11 different
reactors. In addition, an entropy minimax model was built for failure of
stainless steel fuel rods using a modified mechanistic code, FCODE-BETA/SS, to
compute the independent variables, and a stainless steel cladding data base.246
Failure models provide input to fuel assessment studies, to reactor loading and
operations management, and to cost implications analyses.
Model building for the entropy minimax patterns for SPEAR-BETA used 1,707
events (each being an assembly-cycle), and model verification used an independent
set of 1,695 events. The splitting was random, subject to an overall failure rate
equilibration constraint. A total of 11 different reactors were represented.
Feature selection for zircaloy clad fuel failure modeling used the same master list
of 199 variables as for the FGR modeling. The correlation coefficients were
generally much lower for failure than for fission gas release. The highest individual
value for |r| was only 0.23. In part, this was due to the greater noise in the system
relative to fuel failure, which is one step further removed from the input conditions
of fuel precharacterization and operating power history than FGR (FGR may
contribute to fuel failure). However, the low values for |r| were also partly due to
the fact that the theoretically maximum possible correlation between a continuous
variable and a discrete variable such as failure status is less than unity.60 For
example, if one distribution is Bernoulli, f1(k) = p^k (1-p)^(1-k), k = 0 or 1, and the
values for the independent variables and hence with the same failure probability
prediction. Thus, instead of making a prediction of probability P for each of these
assemblies individually, one can make a prediction of Q = 1 - (1-P)^4 =
P(4 - 6P + 4P^2 - P^3) that there will be at least one failure in the 4-assembly symmetry group.
For categorical comparison purposes, this can be expressed as a binary prediction
of "fail" if Q > 1/2 and "nonfail" if Q < 1/2.
Table XI shows the predictive performance of the various models on the
verification data when evaluated at various symmetry levels. Percentage correct,
statistical significance and skill are given for the mechanistic model, expert
programming and entropy minimax patterns. (The DV is, itself, categorical in this
case. So explained variance would contribute no information in addition to
percentage correct.)
The sample average predictor is simply P_fail = 1 - (1 - P0)^s, where P0 = 0.086 is
the overall assembly failure rate on the model building data and s = 1, 4 or 8 is the
assumed symmetry specifying the resolution of the prediction. Converted to a
TABLE XI
Predictive performance for fuel failure in assemblies in 11 different reactors (1,695 verification events),
at symmetry resolutions s = 8, s = 4 and s = 1.

                                        s = 8    s = 4    s = 1
Sample average
  Percentage correct                     51%      70%      91%
Mechanistic model (CCODE-BETA)
  Percentage correct                     40%      61%      91%
  Statistical significance               neg      neg       0%
  Skill                                 -22%     -28%       0%
Expert programming (PaSHa)
  Percentage correct                     68%      63%      82%
  Statistical significance               91%      neg      neg
  Skill                                  36%     -23%     -100%
Entropy minimax patterns (SPEAR-BETA)
  Percentage correct                     85%      78%      91%
  Statistical significance              >99%     >99%       0%
  Skill                                  70%      27%       0%
"fail" vs, "nonfail" prediction, depending upon whether P ra il > 0.5 or P ra il <0.5, the
sample average (for Po = 0.086) always predicts "nonfail" for s = I or 4, and always
predicts "fail" for s=8. (The case s= I is simply the individual assembly predictor.)
The entropy minimax predictor has 85% accuracy at the 8-assembly resolution
(i.e., it predicts whether or not there is a failure among the 8 assemblies with 85%
accuracy). As the resolution of the predictor is sharpened, the accuracy at first
drops off (it is 78% at s = 4) and then rises. The amalgamated failure probability
assigned to a single assembly rarely exceeds 0.5, so that at the single assembly
resolution, s= I, the categorical conversion almost always produces a "nonfail"
prediction. By comparison, the mechanistic model has negative skill at s = 4 and 8,
and has zero skill at s = I (for which its categorical behavior is the same as the
sample average, always predicting nonfailure). The expert programming model has
positive skill at s = 8, and negative skill at s = I and 4.
If the fuel assembly conditions are relatively uniform, the core-wide power
The Surry data were randomly subdivided 77:78 into training and trial portions.
On crossvalidation analyses, the error in using the training portion to predict the
trial portion was found to be a minimum in the vicinity of w=70. The patterns
found indicated that important variables include the pitch of cladding eccentricity,
axial location in the span (which may be related to mid-assembly compressive
forces), tube diameter, rod elongation, and pre-irradiation bow. In general,
however, the dataset was not found to contain adequate information regarding
closure to formulate a very definitive model. As an example, the channel closure
distribution for events in the training and trial sets matching and not matching the
first pattern are given in Table XII. When assessed as a categorical predictor of
whether or not the closure will be at least 30%, the training sample average
predicts "no", which is correct on only 45% of the test cases. By comparison,
predictions based on matching or not matching this pattern are correct in 55% of
the test cases, a slight improvement (statistical significance of 62%).
TABLE XII
Training and trial sample distributions for events matching and events not
matching first channel closure pattern.

                     End-of-life closure (four categories)    Total
Match
  Training            34      12       0       0               46
  Trial               16      11       3       2               32
Non-match
  Training            14       7       8       2               31
  Trial               19      15       7       5               46
Total                                                         155
The data were analyzed for entropy minimax patterns associated with the
selected measures of swelling. The purpose of the analysis was to assess the
adequacy of the IVs specified and the data collected with respect to understanding
factors influencing swelling. Eighty IVs covered fuel rod and assembly charac-
teristics, and an additional 8 IVs specified the site at which the experiment
was conducted. Included among the IVs were data on fuel bundle geometry, rod
size and position, fuel preparation and cladding, heating method, atmosphere,
pressurization, temperature, heating and pressurization rates, clad wall thickness
and burn up. Patterns were found for statistical distributions of rod deformation as
a function of fuel- and experiment-dependent parameters. Weight normalizations
used for the different DVs were in the range 60-120. The first two patterns for
each of the DVs are given in Christensen. Important variables included array
geometry, locations of rod in array and rod coating. The significant role in the
patterns of the experiment-dependent factors, such as test site, heating method,
are also many types of piping. Examples include carbon steel, stainless steel,
galvanized steel, plastic, fiberglass-reinforced plastic, and a variety of others.
Equal in importance to the tanks and piping are the excavation and backfill.
They provide the external electro-chemical environment (important to corrosion of
steel and decomposition of certain resins) and the structural support and stability.
(A rising water table could float an empty tank; and large backfill voids could
remove support and result in rupture.) The excavation itself may be lined (e.g.,
synthetic membranes such as urethane coated fabrics, sealants such as bentonite,
low permeability soils such as clay, and concrete vaults).
Causes of leaking include corrosion, chemical deterioration, stress induced
cracking, biological degradation, wear, and improper installation. Modes of failure,
Table XIII, include external corrosion, internal corrosion, loose fittings, rup-
ture/breakage, flex connector failure and others (such as damage prior to or during
installation).
TABLE XIII
Modes of failure of underground storage systems.
Estimated percentage
Factors affecting containment system failure vary with the type of tanks and
piping, and, to some extent, with the type of material stored and the volume and
frequency of usage. Currently, the most common are steel tank systems, for which
the dominant failure mode is external corrosion. Important factors include soil
resistivity, pH, moisture, tank wall thickness, and soil and tank nonuniformities.
Also important are tank age, circuits to dissimilar materials and presence of
nearby electrical potentials (e.g., DC machinery, high voltage lines, etc.). With
regard to fiberglass tanks, as another example, the relative importance of factors is
different, with more emphasis on such items as backfill and ground water level
changes that can affect structural support.
For purposes of analyzing leakage of underground tanks, a total of 73
independent variables were defined. Included were characteristics of the tank (age,
size, type, protection, etc.), tank installation (depth, backfill, tank field layout, etc.),
tank usage (material, throughput, etc.), and environmental factors (soil characteristics,
water table, nearby high voltages, etc.). In one study,52 506 tanks were randomly
divided 253:253 into training and test samples. The error in using the training
based model to predict the test results was found to be a minimum at w = 20. In a
second study, 1,340 tanks were randomly divided 670:670 into training and test
samples. In this case, the error was a minimum at w = 60.
In both studies, entropy minimax patterns were found in the training data and
checked against the results in the test data. In the first study, 5 patterns were
found in the training data. Applied to the 253 tank test data, the 2 x 5 contingency
table was found to have χ² = 19.99. For 5 degrees of freedom ((2-1)(5-1) = 4 for the
row and column summation constraints plus 1 for the weight normalization setting),
this result has a statistical significance of 99.8% against chance. Five patterns were
also found in the training data in the second study. The chi-squared for the 670-
tank test data in this case was χ² = 19.19, almost the same as in the first study.
Again the statistical significance is 99.8% against the null hypothesis.
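
Assuming the stated chi-squared approximation, the quoted significance levels follow directly from the chi-squared distribution function (scipy is assumed here purely for that function):

from scipy.stats import chi2

def significance_against_chance(chi_sq, dof):
    """Statistical significance (1 - p-value) of a chi-squared statistic."""
    return chi2.cdf(chi_sq, dof)

# 2 x 5 contingency tables: (2-1)(5-1) = 4 dof plus 1 for the weight setting
for x2 in (19.99, 19.19):
    print(f"chi2 = {x2}: significance = {significance_against_chance(x2, 5):.1%}")
# both print roughly 99.8%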
Table XIV lists examples of some of the high and low risk indicators found in
the entropy minimax analyses. In some cases, the converse of a low risk indicator
is a high risk indicator, and vice-versa, e.g., soil moisture, soil pH and tank wall
thickness. In other cases this is not necessarily so, e.g., soil resistivity, tank age,
backfill, and corrosion indices. The failure of "converse" reasoning with respect to
these factors is indicative both of the nonlinearity of the DV/IV relationships and
TABLE XIV
Examples of high and low risk indicators for underground tank leakage
(populations of typical underground steel tanks: 200-20,000 gal, and new to
30 yrs old).
†Note: Soil resistivity and conductivity, as defined here, are not direct inverses of each other since they are measured at different
concentrations (resistivity @ saturation, conductivity @ 50:50 distilled H2O mixture).
IV. MEDICINE/BIOLOGY
A. Medicine
Two general classes of problems in medicine to which entropy minimax has been
applied are diagnosis and prognosis. The prototypical problem in diagnosis is that
of differentiating between the broad category regarded as "normal" and various
more or less specifically defined disease states. The independent variables consist of
signs, symptoms and test results, coupled with information on family background,
sociodemographic factors, diet, work/living activities, and personal and medical
history.
In those cases in which the disease is defined in terms of specific factors and the
status of each factor is known, the problem is identification rather than prediction.
A diagnostic prediction problem arises when there is incomplete information. In
such cases, one or more of the defining factors are unknown or uncertain. For
example, the disease definition may depend upon examinations requiring surgery.
A diagnostic prediction problem also arises when one wishes to estimate the
likelihood of disease at some future time.
Prognosis problems arise with respect to the future course of a disease. Will it
progress to a more severe stage? Will it remit to a more benign state? Will the
patient experience complete or partial recovery? Will there be a relapse after
partial recovery, or a recurrence after complete recovery? What is the probability
of survival and of recovery as a function of time into the future?
Prognosis tends to be more difficult than diagnosis. Prognosis has an explicit
time dimension. It involves transition from one stage to another of the disease,
and, not infrequently, the interaction of multiple diseases. Furthermore, the
analysis of the data involves an essential complication, namely the censoring
problem. Left-censoring arises when a patient is observed whose condition has
changed but it is not known when the change occurred. Right-censoring occurs
when one loses track of a patient. In both cases, there is data incompleteness
which can bias models of time-dependent processes (e.g., survival, recovery, etc.) if
one does not properly account for the censoring.
patients were assigned to the model building subsample. Data on the remaining
1,223 patients were withheld for verification after model building.
For analysis purposes, 2 dependent and 62 independent variables were defined.
Because of the length and thoroughness of the follow-up, it was possible to handle
the censoring problem to a reasonable approximation by simply specifying short
enough periods for the dependent variables. The first dependent variable was
defined as survival status two years after catheterization. Of the 1,213 building and
1,223 verification patients, 1,003 and 996, respectively, were tracked at least two
years. For each group, the two-year survival rate was 85%.
The second dependent variable was defined as survival status after a period of
time following catheterization equal to 20% of the patient's expected remaining
lifetime (ERL) based on age alone. For patients 30, 40, 50, 60 and 70 years old, for
example, this is roughly 9, 7, 6, 4 and 2 years, respectively. Except for ages over 71
years, the 20%-ERL criterion is a longer survival period than the fixed 2-year
criterion. Although the 2-year period is more commonly used, the ERL criterion
has the advantage of controlling for the single most dominant general factor
affecting survival: age. The 20% figure was selected by balancing length of survival
against sample size reduction due to follow-up limitations. The numbers tracked at
least 20% of their ERL were 515 building and 524 verification cases. The survival
rates were 59% and 62%, respectively.
Preliminary analyses were conducted on random subsets of the model building
datasets (randomly selected half of 1,003 patients for 2-year survival and of 515 for
20%-ERL survival). Based on these analyses, the weight normalization was set at
w = 60 for 2-year survival, and at w = 30 for 20%-ERL survival. Interestingly, the
data revealed greater instability with respect to the shorter time period. This is
analogous to the greater amount of noise in daily fluctuations than in monthly
means in meteorology. In both cases, there are unknown factors outside the
specified IVs which can affect the DV. In the case of short period DVs, single
unknown factors can affect results significantly. Over longer periods, it is more
likely for several unknown factors to come into play and introduce a certain
amount of mutual compensation, smoothing out results to a degree.
With the normalizations set at these values, the full model building samples
were processed for entropy minimax patterns. S-sequence patterns were defined as
those which minimize the conditional entropy, S(survival status | features); D-
sequence patterns were defined as those which maximize the entropy exchange,
TABLE XV
Predictive performance for survival of 524 patients with
coronary artery disease, assessed on categorical predictions of
whether or not survive at least 20% of ERL (expected
remaining lifetime based on age).
Sample average
Percentage correct 61%
Regression model
Percentage correct 61%
Statistical significance neg
Variance explained 4%
Skill 0%
Entropy minimax patterns
Percentage correct 71%
Statistical significance >99%
Variance explained (8 categories) 19%
Skill 26%
which were withheld from the model building process. For purposes of the
assessment shown in the table, the survival probabilities were converted to
categorical "live vs. die" predictions, depending upon whether P_survive ≥ 50% or
P_survive < 50%. The DV for Table XV is 20%-ERL. (Categorical assessment is not used
for the 2-year DV since almost all of the individual pattern probabilities exceed 0.5, the
sample average being 85%, and thus the categorized pattern predictor would be
equivalent to the categorized sample average, i.e., always predicting "live.")
The entropy minimax 20%-ERL survival predictions are correct for 71% of the
verification sample of 524 patients, compared to 61% for the sample average
predictor. This is statistically significant at the 0.01 level and represents a skill of
26%. By comparison, the regression model (using the ratio of survival time to 20%-
ERL as the DV and predicting "survive" if the ratio is at least 1.0) has the same
performance as the sample average, its low definitiveness arising from shrinkage
due to several large squared deviations. If the categorization threshold for the
regression model is shifted from 1.0 to 0.8, then it is correct in 69% of the cases,
representing a skill of 21%. (The variance explained is still 4%.) This is the
percentage correct for the best entropy minimax threshold predictor ("survive" if
the ejection fraction is at least 0.46).
When the predictive performance of the entropy minimax predictions was
assessed on an individual pattern basis, the patterns in the first half of each
pattern sequence were found to perform better than those in the second half.
Examination of the details of the definitions revealed that this difference is
associated with the presence of the second half of patterns with union-type logic
involving some combinations of feature conditions that may have been picking up
chance correlations in the training data. Further research is in process involving
the idea of restricting union-type pattern searches to definitions for which each
individual feature condition is separately significant, independent of the infor-
mational significance of the pattern as a whole.
approximated 50%, but recent improvements in therapy have raised this rate to
about that for favorable histology disease.
Data on 328 NHL patients, collected from 20 different institutions in a clinical
trial conducted by the Southeastern Cancer Study Group, were analyzed for
entropy minimax patterns.209,210 The total sample was randomly divided 218:110
into model building and verification samples, stratified on sex (M, F), age
(<60, ≥60), hemoglobin (<12 g, ≥12 g) and histology (favorable, unfavorable).
The dependent variable was defined as survival time after completion of treatment
(either of two multi-agent chemotherapeutic regimes, COP or BCOP). The
independent variables consisted of 26 features characterizing demographic factors
(age, sex, etc.), health background (obesity, diabetes, etc.), signs and symptoms
(fever, night sweats, weight loss, Karnofsky's index of physical performance, etc.),
tests (hemoglobin level, white blood count, etc.), stage of disease (Rappaport stage,
nodal involvement, marrow involvement, etc.) and treatment (chemotherapy, radio-
therapy, etc.).
Based on a preliminary analysis of random subsamples of the 218 patient model
building sample, the weight normalization was set at w = 50. Pattern searches were
conducted within the entire 218 patient model building sample and also in several
subsamples separately. These subsamples included favorable histology, unfavorable
histology, responders (partial + complete) and complete responders. In all five of
these searches, a good prognosis pattern was found with the following definition:
FIGURE 4 Favorable and unfavorable histology survival curves for NHL-349 model building
sample. Median survival of 87 favorable histology patients was 62 months, and of 131 unfavorable
histology patients was 18 months.
unfavorable histologies, while the differences between good and poor prognosis are
considerably greater.
The predicted survival curves are also given in Figures 5 and 6. The continuous
dependent variable, survival time, t, was analyzed in a Gaussian potential function
representation. The analysis was conducted at a resolution of Δt = 6 months. Each
predicted curve corresponds to a particular pattern. Each is a smooth fit to points
determined by the means of intervals along the DV and the associated probabilities.
Note the shrinkage effect of the weight normalization.
Each of the 110 patients in the model verification sample was assigned to one of
the 5 predicted survival curves based on the pattern matched. These curves were
then compared to K-M plots of the actual survival curves for each of the 5
subgroups of verification patients, and the predicted and observed curves were
found to be quite close. These results are shown in Figures 7 and 8.
In addition to the verification on the reserved 110 patients from the original
clinical trial, NHL-349, a new sample subsequently became available from a
second clinical trial, NHL-317.98 NHL-317 contained 270 unfavorable histology
patients, of whom 131 were treated with BCOP (the remaining 139 were treated
with CHOP and are thus, assuming treatment to be relevant to prognosis, not
covered by the patterns trained on NHL-349 COP and BCOP patients). Figure 9
shows the predicted vs. observed survival curves for these 131 patients from NHL-
317. Again the comparison is quite close. The entropy minimax patterns were
correct in 78% of the cases, representing a skill of 26% compared to a 62% figure
for the sample average predictor.
The curves in Figures 7-9 show the predictive performance for the groups of
patients matching each pattern. The results may also be assessed on an individual
patient basis by converting the probabilities to categorical 2-year survival predic-
tions depending upon whether P ≥ 0.5 or P < 0.5 (at t = 24 months). Table XVI
[Figure 5 panels: favorable histology, NHL-349 model building sample; good prognosis (41 observed) and intermediate prognosis (40 observed) groups, observed vs. predictor curves of probability of survival against months.]
FIGURE 5 Survival curves for 81 favorable histology patients in model building sample matching
good and intermediate prognosis patterns. (Data needed to determine pattern match were missing on 6
favorable histology patients in this sample.)
[Figure 6 panels: unfavorable histology, NHL-349 model building sample; including observed groups of 29 and 52 patients, observed vs. predictor curves of probability of survival against months.]
FIGURE 6 Survival curves for 130 unfavorable histology patients in model building sample matching
each of the three entropy minimax patterns. (Data missing on 1 unfavorable histology patient.)
[Figure 7 panels: favorable histology, NHL-349 verification sample; good and intermediate prognosis groups (19 and 20 observed), predicted vs. observed probability of survival against months.]
FIGURE 7 Comparison of predicted and observed survival curves for N = 39 favorable histology
verification patients in NHL-349. (Data missing on 4 favorable histology patients.)
[Figure 8 panels: unfavorable histology, NHL-349 verification sample; three prognosis groups (14, 22 and 26 observed), predicted vs. observed probability of survival against months.]
FIGURE 8 Comparison of predicted and observed survival curves for N = 62 unfavorable histology
verification patients in NHL-349. (Data missing on 5 unfavorable histology patients.)
[Figure 9 panels: unfavorable histology, NHL-317 (BCOP arm); groups of 46, 41 and 44 observed patients matching the three prognosis patterns, predicted vs. observed probability of survival against months.]
FIGURE 9 Comparison of predicted and observed survival curves for N = 131 NHL-317 BCOP
patients matching good, intermediate and poor prognosis patterns.
TABLE XVI
Predictive performance for 2-year survival of patients with non-Hodgkin's lymphoma
(NHL-349 verification sample; left column: all patients, right column: patients
matching an extreme pattern, no. 1 or no. 3).

Sample average
  Percentage correct                      54%     53%
Histology (Rappaport: favorable/unfavorable)
  Percentage correct                      65%     70%
  Statistical significance                99%     99%
  Variance explained (categorical)         2%      8%
  Skill                                   25%     37%
Entropy minimax patterns
  Percentage correct                      69%     77%
  Statistical significance               >99%    >99%
  Variance explained (categorical)        11%     22%
  Skill                                   31%     52%
shows how the entropy minimax patterns compare on this basis to the model
building sample average and a Rappaport histology predictor. The accuracies of
both the Rappaport and the entropy minimax predictors are significant at the 0.01
level. The entropy minimax patterns add about 5 percentage points in predictive
accuracy, and 10 points in skill, to the Rappaport histology for individual patients.
For the 57 verification patients matching an extreme pattern (no. 1 or no. 3), the
performance of both the histology and the entropy minimax predictors improves
significantly. The histology predictor was correct in 70% of the cases and the entropy
minimax predictor was correct in 77%.
More recently, another study was completed comparing the predictive
performance of various modeling methodologies on non-Hodgkin's lymphoma data.
The dependent variable used was whether or not there was complete response
(CR) to treatment, defined as absence of disease after six cycles of chemotherapy.
The methodologies compared were: stepwise variable selection using the SAS
LOGIST procedure, a pre-specified "sickness score," multiple logistic regression,
TABLE XVII
Predictive results on verification sample of patterns found in model building sample,
using first building/verification split.

                                  Actual disease
Pattern                          CS    MS    ALS    Percentage correct
1. ALS                            0           12          92%
   Diplopia absent, and
   Fasciculations present
2. MS                             0    11      0         100%
   Not ALS pattern, and either
   Nystagmus present, or
   Spinal films normal
3. CS                             8     0                  89%
   Not ALS or MS patterns, and
   No hyper-reflexia, and
   Normal gait
4. No pattern match                            3
Total                            10    12     16
The two pattern sets are very similar, and the feature conditions used to define
the patterns are clinically reasonable. Both sets validate with a high predictive
accuracy. The high accuracy is consistent with low value of the optimal weight
normalization, w < 10. The first set would appear to be somewhat the better
because it classifies a greater fraction of the patients (89%) and suggests the
possibility of tentatively classifying non-matches as ALS (although a larger sample
size would be necessary to attribute statistical significance to this).
Table XIX gives the overall predictive performance statistics for the two pattern
sets based on a categorical prediction of the highest probability disease. Thus, the
sample average always predicts ALS, which comprises 42% and 45% of the
samples.
TABLE XVIII
Predictive results on verification sample of patterns found in model building sample, using
second building/verification split.

                                  Actual disease
Pattern                          CS    MS    ALS    Percentage correct
1. ALS                            0     0     13         100%
   Diplopia absent, and
   Fasciculations present, and
   Tone decreased
2. MS                             0    11      0         100%
   Not ALS pattern, and either
   Nystagmus present, or
   Spinal films normal
3. CS                             6     0                  86%
Total                             9    12     17
TABLE XIX
Predictive performance for differential diagnosis of three diseases of the cervical spine
(first and second building/verification splits).

                              First split    Second split
Sample average
  Percentage correct              42%            45%
Entropy minimax patterns
  Percentage correct              94%            97%
  Statistical significance       >99%           >99%
  Skill                           90%            94%
and time scale factor). A random half of the patients were used as the training
sample, the other half as the test sample.
Four distinct patterns were found, one for each disease class. [The first pattern
(with LMI the dominant class) was: U rms error between 0.13 and 0.35, V rms
energy between 0.90 and 2.69, and 2nd K-L coefficient of V between -0.12 and
0.73. The full set is given in Hirschman.] Table XX shows, for each pattern and
each classification, the probability based on the training sample and the frequency
observed in the test sample. The weight normalization was permitted to vary with
the pattern. Its values were w = 56.0, 30.4, 29.2 and 58.4, for the four patterns
respectively.
TABLE XX
Predictive performance of ECG patterns when applied to the test sample
(probabilities based on training sample and frequency ratios observed in test
sample).

Pattern        AMI    LMI    LVH    RVH    Number of test events
Table XXI gives the 4 x 4 contingency table for the patterns. Values expected by
chance are shown in parentheses (e.g., the first entry is 38 x 18/91 = 7.5). Assuming
the general conditions under which Pearson's T is approximately chi-squared
distributed, these results give χ² = 128. With v = (4-1)(4-1) + 4 = 13 degrees of
freedom (4 were used in the weight setting), any χ² > 35 has a statistical
TABLE XXI
Contingency table for the 91 test cases, comparing predicted classification (based on
highest probability) to observed classification (Pipberger). Figures in parentheses are
the expected values assuming statistical independence of rows and columns. Excesses
of observed counts on the main diagonal over their expected values indicate greater
than random predictive success.

                              Observed
Predicted            AMI         LMI         LVH         RVH        Total
AMI (Pat. no. 1)     12 (7.5)    12 (10.0)    8 (10.6)    6 (9.6)     38
LMI (Pat. no. 2)      2 (1.6)     4 (2.1)     2 (2.3)     0 (2.0)      8
LVH                   1           0           5           1            7
significance exceeding 99.9%. [Note, however, that a larger sample size is needed
to validate the 4-way result because of the small numbers in the LMI and LVH
categories. The (observed RVH) vs. (predicted RVH) entry alone contributes 41 to
χ², so the AMI vs. RVH differential diagnosis is clearly significant. The principal
source of error in these patterns is LMI/LVH confounding in AMI and RVH
predictions.]
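
The per-cell contributions quoted here can be checked with a short calculation; in the snippet below numpy is assumed, the AMI column total of 18 comes from the caption's worked example, and the remaining column totals are back-solved approximately from the printed expected values (the RVH row itself is not reproduced above):

import numpy as np

def chi_squared_contributions(observed, row_totals, col_totals, grand_total):
    """Expected counts under row-column independence and each cell's
    contribution (O - E)^2 / E to Pearson's chi-squared."""
    expected = np.outer(row_totals, col_totals) / grand_total
    return expected, (observed - expected) ** 2 / expected

# AMI row of Table XXI: observed 12, 12, 8, 6; column totals partly inferred
obs = np.array([[12, 12, 8, 6]])
exp, contrib = chi_squared_contributions(obs, row_totals=np.array([38]),
                                          col_totals=np.array([18, 24, 25, 23]),
                                          grand_total=91)
print(np.round(exp, 1))       # close to the expected values printed in Table XXI
print(np.round(contrib, 1))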
TABLE XXII
Performance of nucleoside patterns: Probabilities of T/C categories, based
on training sample, compared to frequency ratios in test sample. (6 of the
101 test sample nucleosides did not match any of the 7 patterns.)

                     T/C (x 100%)
Pattern        0-74    75-124    125-300    Number of test events
When the results are assessed on the basis of their ability to predict low/medium
(0-124) or high (125-300) T/C categories (see Table XXIII), the result is χ² = 9.0
using Yates' continuity correction. With v = (2-1)(2-1) + 2 = 3 degrees of freedom
(2 were used in weight setting), this chi-squared has statistical significance of 97%
against the null hypothesis of chance. Thus, there is statistical evidence that the
patterns identify subsets of compounds with more definitive probabilities of anti-
tumor activity than the sample as a whole. The enrichment over expectation is a
factor of 2.2 in the high category, but only 1.1 in the low/medium category. Since
the time of the study (1975), there have been additions to the data and
enhancements in the pattern search algorithm. Also, there has been further
research on feature specification. A new analysis would be expected to provide
further enrichment.
TABLE XXIII
Contingency table for the 95 test cases, comparing observed to predicted
T/C category. Expected values, assuming row-column independence, are
shown in parentheses. Excesses of observed counts on the main diagonal
over their expected values indicate greater than random predictive success.

                                   Observed
Predicted                     Low-Med.       High        Total
Low-Med. (Pat. no. 1-3, 5-7)  74 (65.7)      10 (13.2)     84
High (Pat. no. 4)              5 (13.3)       6 (2.7)      11
Total                         79             16            95
TABLE XXIV
Performance on verification sample in Iris classification problem (N = 75).

Random
  Percentage correct          33%
Principal components
  Percentage correct          97%
  Statistical significance   >99%
  Skill                       96%
5-Nearest neighbors
  Percentage correct          93%
  Statistical significance   >99%
  Skill                       90%
Linear discriminants
where outcome O is the disease class, and C is the condition of being above or
below the threshold for the specific symptom. The average information content of
each ΔS-maximizing cut was 0.0528 nats. Each cut was subsequently assessed for
clinical acceptability based on general medical background. Half of the cuts found
on the basis of this one sample (and not considering ΔS's for pairs or higher order
IV combinations) required no adjustment for medical acceptability. When the
other half of the cuts were adjusted to agree with medically accepted thresholds,
the average information content decreased only 0.0022 nats (or about 4% of the
original value).
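
A schematic version of this kind of single-symptom threshold search, written directly from the definition of entropy exchange between the outcome O and the above/below-threshold condition C (the data and names below are invented for illustration):

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))            # nats

def entropy_exchange(y, condition):
    """Delta-S = S(O) - S(O | C) for a binary outcome y and binary condition."""
    y = np.asarray(y); condition = np.asarray(condition)
    s_prior = entropy(np.bincount(y) / len(y))
    s_cond = 0.0
    for c in (False, True):
        mask = condition == c
        if mask.any():
            s_cond += mask.mean() * entropy(np.bincount(y[mask]) / mask.sum())
    return s_prior - s_cond

def best_cut(x, y):
    """Threshold on symptom x that maximizes the entropy exchange with y."""
    cuts = np.unique(x)
    scores = [entropy_exchange(y, x > c) for c in cuts]
    i = int(np.argmax(scores))
    return cuts[i], scores[i]

# Invented data: a lab value x and a 0/1 disease class y
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(5, 1, 50), rng.normal(7, 1, 50)])
y = np.concatenate([np.zeros(50, int), np.ones(50, int)])
print(best_cut(x, y))     # cut near 6, information content in nats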
Developing screens for radiology  Another area of entropy minimax application has
been in the development of screening algorithms to aid in decision-making regarding
ordering of radiological procedures. In the U.S., for example, radiological examina-
tions contribute annually many billions of dollars to health care costs. The decision of
whether or not to order an X-ray is made on the basis of immediate clinical information
coupled with the medical record, family history and other factors (including medical-
legal precaution). When subgroups of patients who would otherwise have had X-ray
examination are identified for which the risks do not justify expenditure of these
resources, then there is an opportunity for more effective health care resources
allocation and improved patient management.15,199 The socioeconomic utility of such
screening algorithms has been a primary motivation of the development of entropy
minimax patterns to recognize low risk patients with respect to skull fracture
radiology,200,221,222 radionuclide brain scanning,104,105 nuclide uptake and thyroid
scan interpretation, and lung scan analysis. In each case, one or more patterns
were found which enabled the identification of subgroups of patients for which the risks
kinase. In adenyl kinase, 105 residues are in α-helices. The entropy minimax
classification is 4th ranking in terms of high percentage of the actual α-helix
residues identified correctly and 1st ranking in terms of low percentage of errors in
α-helix classification predictions. The two predictors which perform best on these
data are those of Finkelstein and Ptitsyn and entropy minimax. The entropy
minimax patterns leave more α-helix residues unclassified, while the Finkelstein
and Ptitsyn algorithm makes more erroneous α-classifications.
TABLE XXV
Results of applying α-helix classification methods to adenyl kinase.
Molecular evolution  Of possible theoretical interest has been the suggestion that
entropy minimax may play a role in understanding certain phenomena in
molecular evolution. The information content of the code sequences for protein
types common to different species have been studied by Gatlin,99,100 Smith,229
Reichert, and Reichert and Wong.213 One such protein is cytochrome c, which
is present in the cells of all eukaryotic organisms. Studies quantifying the
information content of the sequence of amino acids comprising this protein have
produced a hierarchy extending over 39 species from Baker's yeast to Homo
sapiens.214
Detailed analyses of the occurrence frequencies of residues and residue com-
binations along molecular chains reveal two seemingly inconsistent effects.
First, at the single residue level, the individual nucleotides are more nearly equi-
probable for species higher in the evolutionary hierarchy than for those lower in
the hierarchy. Both Smith229 and Gatlin have pointed out that this means
increased potential message variety, hence a potentially higher entropy, for the
more highly developed species. The deviations from equiprobability for the higher
species are consistent with entropy maximization subject to the constraints of
the genetic code.101,154,229 Second, at higher association levels (pairs, triplets,
etc.), the nucleotide sequences for higher lifeforms are more organized in the sense
of having lower entropy per residue.100,102,213 It thus appears that evolution is
playing the entropy minimax game. The global entropy (amino acid combina-
tions) is being minimized while the local entropy is being maximized. As a
molecule evolves, it tends to equilibrate the proportions of the different amino
acids subject to the constraints of the genetic code, i.e., to maximize the uncon-
A. Entropy Minimization
The entropy minimization aspect of entropy minimax is a variation of the
partition (for discrete analyses) or potential functions (for continuous analyses) of
where C is the set of partition cell boundaries in the discrete case and the set of
potential function parameters in the continuous case (in the continuous case
integration replaces summation). An alternative formulation of the entropy mini-
mization principle is maximization of the entropy exchange
or the equivalent.
iii) They may seek to maximize ΔS(O, C) by varying {O_k} rather than by varying
{C_i}. In entropy minimax, the {O_k} partition is defined by the specification of the
pattern discovery problem to be solved. It is the partition on which the utilities (or
disutilities) are defined in decision theory. The {C_i} partition, on the other hand,
defines how events are grouped in terms of the IVs, and this is what must be
determined in pattern discovery by analysis of the data in light of background
information.
1. Feature selection
The most widespread application of entropy minimization is in feature selection.
Functional forms related to the mutual entropy exchange ΔS(O, C) have been used
as a measure of the "goodness" of a feature or characteristic by Lewis172 (his
"G_i"), and Sebestyen and Edie224 (their "I"). Related forms have also been used to
measure "information correlation" of one variable to another by Jeffreys141,142,143
(his "support"), Shannon225 (his "mutual information"), Good110 (his "weight of
evidence"), Kullback and Leibler (their "directed divergence"), and
others.14,17,39,111,157,171,174,195,228,242,248,255
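
As an illustration of such a measure (a generic contingency-table mutual information in nats, not any one author's exact functional form; the counts and feature names below are invented):

import numpy as np

def mutual_information(table):
    """Mutual entropy exchange Delta-S(O, C) in nats between the row variable
    and the column variable of a contingency table of counts."""
    p = np.asarray(table, dtype=float)
    p /= p.sum()
    po = p.sum(axis=1, keepdims=True)        # marginal of O (rows)
    pc = p.sum(axis=0, keepdims=True)        # marginal of C (columns)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / (po @ pc)[nz]))

# Rank two hypothetical features by their information about the DV
feature_tables = {
    "feature_a": [[30, 10], [12, 28]],       # counts of (DV class, feature bin)
    "feature_b": [[22, 18], [20, 20]],
}
for name, tab in sorted(feature_tables.items(),
                        key=lambda kv: -mutual_information(kv[1])):
    print(f"{name}: {mutual_information(tab):.3f} nats")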
Consider, for example, the two-sided entropy S(X) and the one-sided entropies
S_L(X) and S_R(X), discussed in the previous paper (I).56 The quantities S, S_L and
S_R for the minimum entropy cut X can be used along with the correlation
coefficient r, Pearson's T, and other variables as measures of association between
the DV and individual IVs, pairs of IVs, etc. Using such measures, one may
2. Curve fitting
As discussed in Paper (I),56 minimization of the entropy of the residuals has been
suggested as a general criterion for curve fitting. This procedure was applied to
geological data by McConnell181 and found to be effective in modeling curves
with a number of "displacements" due to geological faults. For a discussion of this
and other goodness-of-fit criteria, see Christensen.53 Entropy minimization has
been combined with crossvalidation as a general procedure for curve-fitting which
includes both error assessment and an information theoretic tradeoff for predictive
reliability by controlling the complexity of the curve.54
3. Unsupervised classification
Supervised classification sets up cell boundaries in IV-space to sort events in terms
of DV values (the DV being the "supervisor" of the IV partition). Unsupervised
classification tries to do this without a DV. Events must thus be sorted into groups
by adjusting cell boundaries on some other basis such as a measure of local
density in IV space. An example is to define the boundaries as a network of lines
or surfaces in low density regions of the space, separating "different" regions of
high density.
Obviously some rules must be assumed, else the unsupervised classification
problem is so open-ended as to admit virtually any partition as a solution. A
sizable literature is available on methodologies suggested under a variety of
choices for the restrictions. For general reviews, see Cormack, Duran and
Odell, Blashfield and Aldenderfer,18 and Everitt. Examples of approaches
utilizing concepts from information theory are given by Watanabe, Wallace and
Boulton,244 and Ruspini.
samples, and further subdivide the building sample into training and test sub-
samples. We select a set of partitions however large, small, simple or complex we
wish. We start with a trial value of the weight normalization, say w=5. We
compute S(C) on the training data for each partition in the set and identify which
partition gives the lowest S(C). We then compute S(C) on the trial data for that
partition, and assign that partition and that value of S(C) to the normalization
w= 5. Then we change w, say to w= 10, and repeat the process.
After a sequence of these calculations, we will have a value of S(C) assigned to
each trial value of w. We select the value of w (and the associated partition) for
which S(C) is a minimum. This gives us a partition with minimum entropy on the
trial data in a manner that is protected from the overfitting trap. (This is the trap
we would fall into if we tried to lower the entropy by defining an arbitrarily
contorted or contrived partition.)
As a final step, we repeat the entire process for several different random
splittings of the data and select a final S(C) which is closest to the average of these
minima. The variance of the minimum S(C) for these different splittings gives us a
measure of the precision of our choice.
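
A schematic rendering of this loop (fit_partition and partition_entropy are stand-in names for whatever partition search and weight-normalized entropy estimate are actually used; the toy stand-ins in the usage lines below are placeholders only):

import numpy as np

def select_weight_normalization(data, fit_partition, partition_entropy,
                                w_grid=(5, 10, 20, 40, 80), n_splits=5, seed=0):
    """For each trial weight normalization w: find the minimum-entropy partition
    on the training half (fit_partition) and score it by its entropy S(C) on the
    trial half (partition_entropy); keep the w with the lowest trial entropy.
    Repeat over random splits and return the per-split minimum closest to the
    average of those minima."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    minima = []                                     # one (w, trial S(C)) per split
    for _ in range(n_splits):
        idx = rng.permutation(len(data))
        train, trial = data[idx[: len(data) // 2]], data[idx[len(data) // 2:]]
        scores = [(w, partition_entropy(fit_partition(train, w), trial, w))
                  for w in w_grid]
        minima.append(min(scores, key=lambda ws: ws[1]))
    mean_s = np.mean([s for _, s in minima])
    return min(minima, key=lambda ws: abs(ws[1] - mean_s))

# Placeholder stand-ins only, so the loop can be exercised end to end; a real
# analysis would supply the actual partition search and entropy estimator.
toy = np.random.default_rng(1).normal(size=200)
fit = lambda train, w: float(np.median(train))              # "partition" = one cut
score = lambda cut, trial, w: float(abs(np.mean(trial > cut) - 0.5))   # not a real entropy
print(select_weight_normalization(toy, fit, score))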
This procedure does not, however, free us from the need to specify some
constraints on the set of candidate partitions. This is easily seen by noting that the
one-cell partition of the entire IV space has zero entropy (or, at least very low
entropy, depending on the details of the P(C_i) estimator used). Artificially low S(C)
is similarly obtained for partitions with one cell enclosing all the data and any
number of other cells in empty regions of IV-space. Thus, "unsupervised"
classification must be supervised by some choice of constraints, even though it is
not supervised by the values of a specific DV for the data points.
B. Entropy Maximization
The entropy maximization aspect of entropy minimax is estimation of expected
values and associated uncertainties of future frequency ratios. These expected value
estimates are the probabilities P(O_k | C_i), P(C_i) and P(O_k). P(O_k), for example, is
given by

P(O_k) = ∫_0^1 p f_k(p | B, D) dp,
where f_k(p | B, D) is the maximum entropy probability density function (pdf) for the
frequency ratio p of occurrence of O_k, given background B and data D. The pdf
f_k(p | B) is found, by the Lagrange multiplier method, as a function which
maximizes the entropy

S = -∫_0^1 f_k(p | B) log f_k(p | B) dp,
estimates based on general literature data. At the other extreme are physical
systems with precisely defined constraints governing the maximum entropy proba-
bility distributions.
where
dominate over the weight normalization for the next higher level. How far one
must go in the hierarchy, before the analysis may be truncated with a frequency
estimator approximation, depends upon how finely one wishes to resolve the
probability under the specific conditions C. (This is the informational analog of
the need to use large amounts of energy to resolve small objects in microscopic
physics.)
The stability requirement ensures that the definition of B captures all variable
factors that can affect the outcome distribution. The minimum enlargement
requirement guards against introducing, into the background, subpopulations that
are irrelevant to C.
trial sample. Frequency counts for subgroups common to both studies are
tabulated, and the value of w is set to minimize the error in using the training data
to predict the trial outcome ratios.
If only one study is available, then, in principle, one can accomplish the same
end by randomized splitting. However, this requires event-specific information of
characteristics and outcomes. This is often not available in the published reports,
especially for large samples.
Trial determinations of the weight normalization were conducted by Reichert
and Christensen for four examples in the medical literature:
Feature conditions included sex, platelet count, hemoglobin level, infection status,
and temperature. Two trial samples were used. One was a second group of 107
patients also at M. D. Anderson. Here the error minimum was found at w = 10.
The other was a group of 94 patients at the Cleveland Clinic Foundation. For
this trial sample, the minimum was found at w = 200. The difference in normaliz-
ations for the two trial samples illustrates the general tendency for larger
normalizations to be required when making probabilistic estimates for groups with
greater dissimilarity from the training group.
iv) Five-year survival in coronary artery disease
The training sample was taken as N = 590 patients at Cleveland Clinic.202 The
trial sample was taken as 203 patients at the Cardiopulmonary Laboratory, Dept.
of Medicine, Queen's University, Kingston, Ontario. Nineteen subgroups of
patients were identified using various characteristics of cardiac condition and
function. The error minimum was found at w = 20.
2. Statistics
A primary application of entropy maximization in statistics has been in providing
a priori probability distributions, f(x), for Bayesian analyses. A number of
distributions have been derived by assuming different functional forms for the
constraints. The maximum entropy distribution, f(x), may be unconditional or
conditional (i.e., relative to an assumed prior distribution, g(x)). The unconditional
form maximizes the entropy of the distribution f(x), S = -∫ f(x) log f(x) dx, while the
conditional form maximizes the entropy relative to the prior, S = -∫ f(x) log [f(x)/g(x)] dx,
where g(x) is the assumed prior distribution. See also Jeffreys,142,143 Good,110
Kullback, Renyi,215 and others.42,43,136,240,255 (For discrete variates, the
integration is replaced by summation.) The unconditional form may be regarded
as a special case of the conditional form with a uniform assumed prior. (For
infinite ranges, an appropriate limiting function can be used.) Table XXVI gives a
list of distributions specified by some of the simpler forms of constraints. For
derivations, see Kullback and others.42,76,107,136,138,144,175,203,238,256
TABLE XXVI
Distribution with maximum entropy subject to various fixed expected value constraints. For pdf
functional forms, see, e.g., Christensen.
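
As a reminder of how entries of this kind are derived (a standard Lagrange-multiplier calculation sketched here, not reproduced from the table), fixing only the mean of a nonnegative variate yields the exponential distribution:

maximize  S = -∫_0^∞ f(x) log f(x) dx
subject to  ∫_0^∞ f(x) dx = 1  and  ∫_0^∞ x f(x) dx = μ.

Setting the variation of S - λ_0 ∫ f dx - λ_1 ∫ x f dx to zero gives
-log f(x) - 1 - λ_0 - λ_1 x = 0, i.e., f(x) = e^{-(1+λ_0)} e^{-λ_1 x};
the two constraints then fix λ_1 = 1/μ and the normalization, so f(x) = (1/μ) e^{-x/μ}.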
Almost from the moment that entropy was tied to probabilities, physicists
entertained the notion that it was related to the incompleteness of information
about a system. Maxwell's comment about throwing a tumblerful of water into
the ocean and trying to take it back out, his entropy-reducing demon, and Gibbs'
comment on entropy as mixed-up-ness are examples. Szilard232 made it clear that
Maxwell's demon, in order to operate the trap door through which molecules pass,
must receive information. Brillouin,26,27 elaborating upon Szilard's analysis in
light of the work on information theory by Shannon and Wiener, developed
the notion of negentropy. Ter Haar114 summarized this viewpoint with the
statement that "entropy measures our ignorance or lack of knowledge or lack of
(detailed) information." See also Bergmann and Thomson16 and Richardson. It
has been suggested that, rather than reflecting merely a subjective lack of
information, the second law arises from irreducible quantal dispersions of mixed
states.121
Overlapping this work in which the law of increasing entropy was seen as a
4. Spectral analysis
Applications of entropy maximization to spectral analysis (reconstruction of an
estimate of a spectral density function based on a finite sample) began with the
work of Burg.28,30 The relationship between maximum entropy and maximum
likelihood spectra has been studied. It has been shown that the maximum
entropy method yields a spectrum which maximizes the entropy of a stationary
random process consistent with a set of given autocovariance functions. The
equivalence of autoregressive (AR) and maximum entropy (ME) spectra has
been demonstrated.20,241 This makes the well-developed AR algorithms available
to the ME method, despite the fact that the two methods are quite different
conceptually.193 Several references are now available reviewing spectral esti-
mation122,123,124,217 and the role of entropy maximization.31,37,140
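
A small sketch of the AR route to a maximum entropy spectrum (using the Yule-Walker equations rather than Burg's recursion, and assuming numpy/scipy; function and variable names are illustrative):

import numpy as np
from scipy.linalg import solve_toeplitz

def ar_max_entropy_spectrum(x, order, n_freq=256):
    """Autoregressive spectral estimate via the Yule-Walker equations; this is
    the maximum entropy spectrum consistent with the first order+1 sample
    autocovariances."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased sample autocovariances r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Solve the symmetric Toeplitz (Yule-Walker) system for AR coefficients a
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])          # prediction error variance
    freqs = np.linspace(0.0, 0.5, n_freq)     # cycles per sample
    k = np.arange(1, order + 1)
    # S(f) = sigma2 / |1 - sum_k a_k exp(-i 2 pi f k)|^2
    denom = np.abs(1.0 - np.exp(-2j * np.pi * np.outer(freqs, k)) @ a) ** 2
    return freqs, sigma2 / denom

# Example: AR spectrum of a noisy sinusoid
rng = np.random.default_rng(0)
t = np.arange(512)
x = np.sin(2 * np.pi * 0.1 * t) + 0.5 * rng.standard_normal(512)
freqs, psd = ar_max_entropy_spectrum(x, order=8)
print(f"peak near f = {freqs[np.argmax(psd)]:.3f} cycles/sample")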
5. Image reconstruction
The maximum entropy image reconstruction formalism is a multidimensional
generalization of Burg's maximum entropy spectral reconstruction. For reviews of
this development, see Kikuchi and Soffer153 and Frieden. Early versions,95,96,112,166
e.g., MART, were slow and required considerable memory. Subsequent
versions, e.g., MENT,185 converge faster and have lower memory requirements
(since an array representing the source is not required). In addition, MART
produced anomalous streaks which are not present in MENT reconstructions.
(Both algorithms are only approximations to a mathematically precise constrained
entropy maximization. The MART approximation not only had larger errors, it
also had error regularities which showed up as streaks. The errors of MENT are
smaller and more randomly dispersed relative to visual interpretation.)
MENT has been tested on a variety of real-world data such as X-ray data on an
air bubble in a plastic tube surrounded by an iron pipe. The smooth bubble's
density profile is clearly defined with fluctuations of about ± 15% in the
reconstruction. This and other tests12,186 have shown that the iterative MENT
algorithm performs better than two widely used techniques, Fourier space inver-
sion and convolutional back projection. The MENT reconstruction is more regular
and truer to the actual shapes of the objects. Additionally, in MENT reconstruc-
tion, zero source values are correctly reproduced, without "ringing" artifacts.
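
The multiplicative-ART idea itself can be conveyed with a toy loop (a generic sketch, not the MENT algorithm or any published code; the matrix, data and names here are invented for illustration):

import numpy as np

def mart(A, b, n_iter=50, relaxation=1.0):
    """Multiplicative ART: starting from a uniform (maximum entropy) image,
    rescale the image so that each projection constraint A x = b is satisfied.
    A: (m, n) nonnegative projection matrix; b: (m,) positive measurements."""
    m, n = A.shape
    x = np.ones(n)
    for _ in range(n_iter):
        for i in range(m):                  # sweep the constraints one at a time
            proj = A[i] @ x
            if proj <= 0:
                continue
            # multiplicative correction, weighted by each pixel's contribution
            x *= (b[i] / proj) ** (relaxation * A[i])
    return x

# Tiny 2x2 "image" measured only through its row and column sums: the result
# is the maximum entropy image consistent with those sums, not the true image.
A = np.array([[1, 1, 0, 0],     # row sums
              [0, 0, 1, 1],
              [1, 0, 1, 0],     # column sums
              [0, 1, 0, 1]], dtype=float)
true_x = np.array([4.0, 1.0, 2.0, 3.0])
b = A @ true_x
print(np.round(mart(A, b), 2))  # -> [3. 2. 3. 2.]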
6. Pattern recognition
Pattern discovery is concerned with finding unknown relationships between a.
FIGURE 10 Maximum entropy solution to an illustrative pattern recognition problem (see text).
First, independent variable B is examined. If its value is b, then D is examined and the dependent
variable's value is x1 or x2 depending upon whether D is d or d̄, respectively. If, on the other hand, B is
b̄, then C is examined. If it is c, the value is x3. If c̄, then A is examined and the value is x4 or x5 for a
and ā, respectively.
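
Read as a decision procedure, the figure corresponds to a few nested tests; the following sketch simply transcribes that reading (the values x1-x5 and the variable names are the figure's, the function name is invented):

def classify(a: bool, b: bool, c: bool, d: bool) -> str:
    """Decision procedure read off Figure 10: examine B first, then D or C,
    and finally A, returning one of the five dependent-variable values."""
    if b:
        return "x1" if d else "x2"
    if c:
        return "x3"
    return "x4" if a else "x5"

# e.g. B false, C false, A true  ->  x4
print(classify(a=True, b=False, c=False, d=True))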
ACKNOWLEDGMENTS
REFERENCES
I. D. H. Ackley, G. E. Hinton and T. J. Sejnowski, "A learning algorithm for Boltzmann machines."
Cognitive Sciellce, 9, 1985, pp. 147-169.
2. M. L. Adams, "Investigation of techniques for SIMMER-II neutronics time-step controL" Rept.
LA-UR-84-3995, Los Alamos National Laboratory, Los Alamos, NM, August 3, 1984.
3. O. C. Allais, "The problem of too many measurements in pattern recognition and prediction."
IEEE Intl. Convention Record, Part 7 (Discrimination and Measurement), March 21-25, 1966, pp.
124-130.
4. M. Bad and E. Bad, "L'algorithme le plus rationnel de reconnaissance appliqué dans le diagnostic
des maladies rénales." La Santé Publique, 15-e, No. 1, 1972, pp. 109-115.
5. R. G. Ballinger, R. A. Christensen, R. F. Eilbert, S. T. Oldberg and E. T. Rumble, "Fission gas
release and fuel reliability at extended burnup: predictions by the SPEAR-BETA code." Tech.
Rept. EPRI RP971, Entropy Limited, Lincoln, MA, presented at American Nuclear Society,
Topical Meeting: LWR Extended Burnup-Fuel Performance and Utilization, Williamsburg, VA,
April 4-8, 1982.
6. R. G. Ballinger, R. A. Christensen, R. F. Eilbert, S. T. Oldberg, E. T. Rumble and G. S. Was,
"Clad failure modeling." Zirconium ill the Nuclear industry: Fiflh Conference, cd, by D. G.
Franklin, Am. Society for Testing and Materials, ASTM STP 754,1982, pp. 129-145.
7. W. H. Barker, "Information theory and the optimal detection search." Operations Research, 25,
1977, pp. 304-314.
8. T. P. Barnett, "Statistical prediction of North American air temperatures from Pacific predictors."
Monthly Weather Review, 109, 1981, pp. 1021-1041.
9. T. P. Barnett and K. Hasselmann, "Techniques of linear prediction, with application to oceanic
and atmospheric fields in the tropical Pacific." Rev. Geophys. Space Phys., 17, 1979, pp. 949-968.
10. T. P. Barnett and R. W. Preisendorfer, "Multifield analog prediction of short-term climate
fluctuations using a climate state vector." Jour. Atmos. Sci., 35, 1978, pp. 1771-1787.
11. M. S. Bartlett, "Further aspects of the theory of multiple regression." Proc. Cambridge Phil. Soc.,
34, 1938, pp. 33-40.
12. C. F. Barton, "Computerized axial tomography for neutron radiography of nuclear fuel." Trans.
Amer. Nucl. Soc., 27, 1977, pp. 212-213.
13. F. M. Bass, "The theory of stochastic preference and brand switching." Jour. Market Res., 11,
1974, pp. 1-20.
14. C. B. Bell, "Mutual information and maximal correlation as measures of dependence." Ann. Math.
Statist., 33, 1962, pp. 587-595.
15. R. S. Bell and J. W. Loop, "The utility and futility of radiographic skull examination for trauma."
New England Jour. of Medicine, 284, 1971, pp. 236-239.
16. P. G. Bergmann and A. C. Thomson, "Generalized statistical mechanics and Onsager relations."
Phys. Rev., 91, 1953, pp. 180-184.
17. N. M. Blachman, "The amount of information that y gives about X." IEEE Trans. on Inform.
Theory, IT-14, 1968, pp. 27-31.
18. R. K. Blashfield and M. S. Aldenderfer, "The literature on cluster analysis." Multiv. Behav. Res.,
13, 1978, pp. 271-295.
19. S. A. Borg and S. Rosenthal, Handbook of Cancer Diagnosis and Staging, A Clinical Atlas. J.
Wiley, New York, 1984.
20. A. van den Bos, "Alternative interpretation of maximum entropy spectral analysis." IEEE Trans.
Infor. Theory, IT-17, 1971, pp. 493-494.
21. C. T. Bosch and B. A. Valde, "Consultants' report to the OPA/PACE special task force on
underground storage tanks." Petroleum Assoc. for Conservation of the Canadian Environment,
Ottawa, Feb. 1978.
22. B. E. Boyle, "Symptom partitioning by information maximization." NIH Grant 5 P01 GM 14940-
05, Mass. Inst. of Tech., Cambridge, MA, 1972. Entropy Minimax Sourcebook, 4, Entropy Pub.,
Lincoln, MA, 1981, pp. 201-210.
23. J. Brandman, R. M. Bukowski, R. Greenstreet, J. S. Hewlett and G. C. Hoffman, "Prognostic
factors affecting remission, remission duration and survival in adult acute nonlymphocytic
leukemia." Cancer, 44, 1979, pp. 1062-1065.
24. L. Breiman, J. H. Friedman, R. A. Olshen and C. J. Stone, Classification and Regression Trees.
Wadsworth Intl. Group, Belmont, CA, 1984.
25. G. W. Brier and R. A. Allen, "Verification of weather forecasts." Compendium of Meteorology, ed.
by T. F. Malone, Amer. Meteorological Society, Boston, MA, 1951, pp. 841-848.
26. L. Brillouin, "Maxwell's demon cannot operate: information and entropy. I." Jour. of Applied
Physics, 22, 1951, pp. 334-343.
27. L. Brillouin, "The negentropy principle of information." Jour. of Applied Physics, 24, 1953, pp.
1152-1163.
28. J. P. Burg, "Maximum entropy spectral analysis." Presented at the 37th Meeting of the Society of
Exploration Geophysicists, Oklahoma City, OK, Oct. 31, 1967.
29. J. P. Burg, "The relationship between maximum entropy spectra and maximum likelihood
spectra." Geophysics, 37, 1972, pp. 375-376.
30. J. P. Burg, "Maximum entropy spectral analysis." Ph.D. Dissertation, Dept. Geophys., Stanford
University, Palo Alto, CA, 1975, 136 pp.
31. J. P. Burg, D. G. Luenberger and D. L. Wenger, "Estimation of structured covariance matrices."
Proc. of the IEEE, 70, 1982, pp. 963-974.
32. G. W. Burggraf and J. O. Parker, "Prognosis in coronary artery disease." Circulation, 51, 1975, pp.
146-156.
33. R. Bussière and F. Snickars, "Derivation of the negative exponential model by an entropy
maximizing method." Environment and Planning, 2, 1970, pp. 295-301.
34. F. Cabanillas, J. S. Burke, T. L. Smith, T. E. Moon, J. J. Butler and V. Rodriguez, "Factors
predicting for response in adults with advanced non-Hodgkin's lymphoma." Arch. Intern. Med.,
55. R. A. Christensen, "Predicting tank leakage." Conf. on Managing Leaking Subsurface Storage Tank
Risks, Groundwater Technology, Factory Mutual Conference Center, Norwood, MA, May 8-9,
1985.
56. R. A. Christensen, "Entropy minimax multivariate statistical modeling-I: Theory." Intl. Jour.
General Systems, 11, 1985, pp. 231-276.
57. R. A. Christensen and R. G. Ballinger, "In-service predictions." Joint EPRI/DOE Fuel Perfor-
mance Contractors' Overview Meeting, Atlanta, GA, April 8, 1980. Entropy Minimax Sourcebook,
4, Entropy Pub., Lincoln, MA, 1981, pp. 79-81.
58. R. A. Christensen and E. Duchane, "Element-specific failure time estimation from ensemble
statistics." Fuel Rod Mechanical Performance Modeling, Task 3: Fuel Rod Modeling and
Decision Analysis, FRMPM32-2, July 27, 1979, Entropy Minimax Sourcebook, 4, Entropy Pub.,
Lincoln, MA, 1981, pp. 687-696.
59. R. A. Christensen and R. F. Eilbert, "Temperature profiles in UO2 fuel under direct electrical
heating conditions." Jour. of Nuclear Materials, 96, 1981, pp. 285-296.
60. R. A. Christensen and R. F. Eilbert, "Estimating chance correlation likelihood for hazard axis
analysis." Fuel Rod Mechanical Performance Modeling, Task 3: Fuel Rod Modeling and Decision
Analysis, FRMPM33-2 and FRMPM34-1, Nov. 1979, Entropy Minimax Sourcebook, 4, Entropy
76. D. C. Dowson and A. Wragg, "Maximum-entropy distributions having prescribed first and second
moments." IEEE Trans. Infor. Theory, IT-19, 1973, pp. 689-693.
77. B. S. Duran and P. L. Odell, Cluster Analysis: A Survey, Springer-Verlag, Berlin, 1974.
78. J. R. Durant, R. A. Gams, A. A. Bartolucci and R. F. Dorfman, "BCNU with and without
cyclophosphamide, vincristine and prednisone (COP) and cycle-active therapy in non-Hodgkin's
lymphoma." Cancer Treat. Rep., 61, 1977, pp. 1085-1096.
79. J. A. Edward and M. M. Fitelson, "Notes on maximum-entropy processing." IEEE Trans. Infor.
Theory, IT-19, 1973, pp. 232-234.
80. R. F. Eilbert, "Long range weather forecasting study for Central Arizona." Report to Salt River
Project, Phoenix, EL RN-202, 1983.
81. R. F. Eilbert, "Mid-seasonal updating of winter precipitation forecasts for Central Arizona,"
Report to Salt River Project, Phoenix, EL RN-211, 1984.
82. R. F. Eilbert, "Quantitative description of entropy minimax forecasts." Report to Salt River
Project, Phoenix, EL RN-220, 1984.
83. R. F. Eilbert and R. A. Christensen, "Performance of the entropy minimax hydrological forecasts
for California, Water Years 1948-1977." Jour. Climate Appl. Meteor., 22, 1983, pp. 1654-1657.
84. W. M. Elsasser, "On quantum measurements and the role of the uncertainty relations in quantum
mechanics." Phys. Rev., 52, 1937, pp. 987-999.
85. R. S. Emmet and J. C. Livingston, "Underground petroleum storage tanks: local regulation of a
groundwater hazard." Conservation Law Foundation of New England, Boston, MA, 1984.
86. S. England, T. A. Reichert and R. A. Christensen, "Entropy minimax classification of the
secondary structure of adenyl kinase," Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln,
MA, 1981, pp. 449-454.
87. S. Erlander, Optimal Interaction and the Gravity Models. Springer-Verlag, New York, 1978.
88. S. P. Evans, "A relationship between the gravity model for trip distribution and the transportation
problem in linear programming." Transpn. Res., 7, 1973, pp. 39-61.
89. B. Everitt, Cluster Analysis. J. Wiley, New York, 2nd ed., 1980.
90. R. A. Fisher, "The use of multiple measurements in taxonomic problems." Ann. Eugenics, 7,
1936, pp. 179-188.
91. R. I. Fisher, S. M. Hubbard, V. T. DeVita, C. W. Berard, R. Wesley, J. Cossman and R. C. Young,
"Factors predicting long-term survival in diffuse mixed, histiocytic, or undifferentiated lymphoma."
Blood, 58, 1981, pp. 45-51.
92. E. Fix and J. L. Hodges, Jr., "Discriminatory analysis, nonparametric discrimination: consistency
properties." USAF School of Aviation Medicine, Randolph Field, TX, Project 21-49-004, Rept. 4,
Contract AF41(128)-31, NTIS ATI-110-633, Feb. 1951.
93. D. H. Foley, "Considerations of sample and feature size." IEEE Trans. Infor. Theory, IT-18, 1972,
pp. 618-626.
94. D. Franklin, H. Ocken and S. T. Oldberg, "SPEAR code development." LWR Core Materials
Performance Program: Progress in 1979-1980, NTIS EPRI NP-1770-SR, Electric Power Research
Inst., Palo Alto, CA, 1981, pp. 4.4-4.8.
95. B. R. Frieden, "Restoring with maximum likelihood and maximum entropy." Jour. Optical Soc.
Amer., 62, 1972, pp. 511-518.
96. B. R. Frieden, "Estimation-a new role for maximum entropy." 1976 SPSE Conference Proceed-
ings, ed. by R. Shaw, Society of Photographic Scientists and Engineers, Wash., DC, 1977, pp. 261-
265.
97. B. R. Frieden, "Statistical models for the image restoration problem." Comput. Graph. and Image
Proc., 12, 1980, pp. 40-59.
98. R. A. Gams, M. Raney, A. A. Bartolucci and M. Dandy, "Phase III study of BCOP vs. CHOP in
unfavorable categories and malignant lymphoma." Jour. Clinical Oncology (1985 in press).
99. L. L. Gatlin, "The information content of DNA." Jour. Theoret. Biol., 10, 1966, pp. 281-300.
100. L. L. Gatlin, "The information content of DNA II." Jour. Theoret. Biol., 18, 1968, pp. 181-194.
101. L. L. Gatlin, "The entropy maximum of protein." Math. BioSci., 13, 1972, pp. 213-227.
102. L. L. Gatlin, Information Theory and the Living System. Columbia Univ. Press, New York, 1972,
pp. 79-96.
103. D. A. Gift, J. W. Gard and W. R. Schonbein, "Thyroid scanning-pursuing the relationships of
signs and symptoms to nuclide uptake and scan interpretation." Report to U.S. Dept. of Energy,
EX-76-5-02-2777.A003, Dept. of Radiology, Michigan State Univ., E. Lansing, MI, Entropy
Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 417-429.
104. D. A. Gift and W. R. Schonbein, "Diagnostic yield analysis of indications for radionuclide brain
scanning." Report to Ll.S. Dept. of Energy, E(II-l)2777, Dept. of Radiology, Michigan State
Univ., E. Lansing MI, Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp.
405-414.
105. D. A. Gift, W. R. Schonbein and E. J. Potchen, "An introduction to entropy minimax pattern
detection and its use in the determination of diagnostic test efficiency." National Cancer Institute,
CA 18871-02 DHEW, Dept. of Radiology, Michigan State Univ., E. Lansing, MI, Sept. 1978,
Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 385-401.
106. D. A. Gift, W. R. Schonbein, E. L. Saenger and E. J. Potchen, "Application of an information-
theoretic method for efficacy assessment." Jour. of Nuclear Medicine, 26, 1985, pp. 807-811.
107. D. V. Gokhale, "Maximum entropy characterization of some distributions." Statistical Distri-
butions in Scientific Work, Vol. 3, ed. by G. P. Patil, S. Kotz and J. K. Ord, D. Reidel,
Dordrecht, Holland, 1975, pp. 299-304.
108. A. Goldin and H. B. Wood, Jr., "Preclinical investigation of alkylating agents in cancer
chemotherapy." Annals of the New York Academy of Sciences, 163, 1969, pp. 954-1005.
109. A. Goldin, H. B. Wood, Jr. and R. R. Engle, "Relation of structure of purine and pyrimidine
nucleosides to antitumor activity." Cancer Chemotherapy Reports, 1 (2), 1968, pp. 1-272.
110. I. J. Good, Probability and the Weighing of Evidence, Chas. Griffin & Co., London, 1950, p. 63.
111. I. J. Good, "Maximum entropy for hypothesis formulation, especially for multidimensional
contingency tables." Ann. Math. Statist., 34, 1963, pp. 911-934.
112. R. Gordon, R. Bender and G. T. Herman, "Algebraic reconstruction techniques (ART) for three-
dimensional electron microscopy and X-ray photography." Jour. Theor. Biol., 29, 1970, pp. 471-
481.
113. S. Guiasu, Information Theory with Applications, McGraw-Hill, New York, 1977, pp. 365-378.
114. D. ter Haar, Elements of Statistical Mechanics. Holt, Rinehart and Winston, New York, 1954, p.
160.
115. A. Hai and G. J. Klir, "An empirical investigation of reconstructability analysis: probabilistic
systems." Int. Jour. Man-Machine Studies, 22, 1985, pp. 163-192.
116. K. E. Hammermeister, T. A. DeRouen and H. T. Dodge, "Variables predictive of survival in
patients with coronary disease. Selection of univariate and multivariate analyses from the clinical,
electrocardiographic, exercise, arteriographic, and quantitative angiographic evaluations." Circul-
ation, 59, 1979, pp. 421-430.
117. F. E. Harrell, "The LOGIST procedure." SUGI Supplemental Library User's Guide, 1983 Edition,
ed. by S. P. Joyner, SAS Institute, Cary, NC, 1983, pp. 181-202.
118. F. E. Harrell, K. L. Lee, R. M. Califf, D. B. Pryor and R. A. Rosati, "Regression modelling
strategies for improved prognostic prediction." Stat. in Med., 3, 1984, pp. 143-152.
119. F. E. Harrell, K. L. Lee, D. B. Matchar and T. A. Reichert, "Regression models for prognostic
prediction: advantages, problems, and suggested solutions." Cancer Treatment Reports, 69, 1985,
pp. 1071-1077.
120. P. J. Harris, F. E. Harrell, K. L. Lee, V. S. Behar and R. A. Rosati, "Survival in medically treated
coronary artery disease." Circulation, 60, 1979, pp. 1259-1269.
121. G. N. Hatsopoulos and E. P. Gyftopoulos, "A unified quantum theory of mechanics and
thermodynamics." Foundations of Physics, 6, 1976, pp. 15-31, 127-141,439-455,561-570.
122. S. Haykin (ed.), Nonlinear Methods of Spectral Analysis. Springer-Verlag, New York, 2nd ed.,
1983.
123. S. Haykin and J. Cadzow (eds.), Proc. of the First ASSP Workshop in Spectral Estimation, IEEE
Acoustic, Speech, Signal Processing Society, McMaster Univ., Hamilton, Ontario, Canada, August
1981.
124. S. Haykin and S. Kesler, "Prediction-error filtering and maximum-entropy spectral estimation,"
Nonlinear Methods of Spectral Analysis, ed. by S. Haykin, Springer-Verlag, New York, 1979, pp.
9-72.
125. P. Heidke, "Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungs-
dienst." Geogr. Ann. Stockh., 8, 1926, pp. 301-349.
126. J. N. Helfer, "An identification of overusers of out-patient facilities." Masters Thesis, Biotech-
nology Program, Carnegie-Mellon Univ., Pittsburgh, PA, May 1973.
127. J. D. Herniter, "An entropy model of brand purchase behavior." Jour. Market Res., 10, 1973, pp.
361-375.
128. J. D. Herniter, "A comparison of the entropy model and the Hendry model." Jour. Market Res.,
11, 1974, pp. 21-29.
129. A. D. Hirschman, "An application of entropy minimax pattern discovery in a multiple class
electrocardiographic problem." BioMedical Engineering Program, Electrical Engineering Dept.,
Carnegie-Mellon Univ., Pittsburgh, PA, Dec. 18, 1975, Entropy Minimax Sourcebook, 4, Entropy
Pub., Lincoln, MA, 1981, pp. 295-315.
130. A. D. Hirschman, "Methods for efficient compression, reconstruction, and evaluation of digitized
electrocardiograms." Ph.D. Dissertation, BioMedical Engineering Program, Electrical Engineering
Dept., Carnegie-Mellon Univ., Pittsburgh, PA, 1977.
131. A. Hobson, Concepts in Statistical Mechanics. Gordon and Breach, New York, 1971, 172 pp.
132. H. Hotelling, "Analysis of a complex of statistical variables into principal components." Jour.
Educ. Psychol., 24, 1933, pp. 417-441.
133. G. F. Hughes, "On the mean accuracy of statistical pattern recognizers." IEEE Trans. Infor.
Theory, IT-14, 1968, pp. 55-63.
134. R. S. Ingarden, "Information theory and variational principles in statistical theories." Bull. Acad.
Polon. Sci., Ser. Sci. Math. Astronom. Phys., 11, 1963, pp. 541-547.
135. E. T. Jaynes, "Information theory and statistical mechanics." Phys. Rev., 106, 1957, pp. 620-630;
108, 1957, pp. 171-190.
136. E. T. Jaynes, "New engineering applications of information theory." Proc. of the First Symposium
on Engineering Applications of Random Function Theory and Probability, ed. by J. L. Bogdanoff
and F. Kozin, J. Wiley, New York, 1963, pp. 163-203.
137. E. T. Jaynes, "Foundations of probability theory and statistical mechanics." Studies in the
Foundations, Methodology and Philosophy of Science, Vol. I: Delaware Seminar in the Foundations
of Physics, ed. by M. Bunge, Springer-Verlag, New York, 1967, pp. 77-101.
138. E. T. Jaynes, "Prior probabilities." IEEE Trans. Systems Science and Cybernetics, SSC-4, No. 3,
1968, pp. 227-241.
139. E. T. Jaynes, "Where do we stand on maximum entropy?" The Maximum Entropy Formalism, ed.
by R. D. Levine and M. Tribus, MIT Press, Cambridge, MA, 1979, pp. 15-118.
140. E. T. Jaynes, "On the rationale of maximum-entropy methods." Proc. of the IEEE, 70, 1982, pp.
939-952.
141. H. Jeffreys, "Further significance tests." Proc. Camb. Phil. Soc., 32, 1936, pp. 416-445.
142. H. Jeffreys, "An invariant form for the prior probability in estimation problems." Proc. Roy. Soc.
London, Ser. A, 186, 1946, pp. 453-461.
143. H. Jeffreys, Theory of Probability. Oxford at the Clarendon Press, London, 2nd ed., 1948, p. 158.
144. A. M. Kagan, Y. V. Linnik and C. R. Rao, Characterization Problems in Mathematical Statistics. J.
Wiley, New York, 1973, pp. 408-410.
145. J. D. Kalbfleisch and R. L. Prentice, The Statistical Analysis of Failure Time Data, J. Wiley, New
York, 1980.
146. E. Kalnay-Rivas, A. Bayliss and J. Storch, "The 4th order GISS model of the global atmosphere."
Beiträge zur Physik der Atmosphäre, 50, 1977, pp. 299-311.
147. E. Kalnay-Rivas and R. Livezey, "Weather predictability beyond a week: an introductory review,"
Turbulence and Predictability in Geophysical Fluid Dynamics and Climate Dynamics, ed. by M. Ghil,
R. Benzi and G. Parisi, North-Holland, Amsterdam, 1985, pp. 311-346.
148. E. L. Kaplan and P. Meier, "Nonparametric estimation from incomplete observations." Jour.
Amer. Statist. Assoc., 53, 1958, pp. 457-481.
149. J. N. Kapur, "Twenty-five years of maximum-entropy principle," Jour. of Mathematical and
Physical Sciences, 17, 1983, pp. 103-156.
150. A. Katz, Principles of Statistical Mechanics: The Information Theory Approach. W. H. Freeman,
San Francisco, 1967.
151. C. R. Kennedy, R. A. Christensen and R. F. Eilbert, UO2 Pellet Fragment Relocation: Kinetics
and Mechanics, NTIS EPRI NP-1106, Electric Power Research Inst., Palo Alto, CA, 1979.
152. C. R. Kennedy, D. S. Kupperman and B. J. Wrona, "Acoustic emission from thermal-gradient
cracks in UO2." Mater. Eval., 34, 1976, pp. 91-96.
153. R. Kikuchi and B. H. Soffer, "Maximum entropy image restoration. I. The entropy expression,"
Jour. Optical Soc. Amer., 67, 1977, pp. 1656-1665.
154. J. L. King, "The role of mutation in evolution." Proc. of the Sixth Berkeley Symposium on
Mathematical Statistics and Probability, 5, 1971.
155. G. J. Klir and E. C. Way, "Reconstructability analysis: aims, results, open problems." Systems
Research, 2, 1985, pp. 141-163.
156. S. Kullback, "An application of information theory to multivariate analysis," Ann. Math. Statist.,
23, 1952, pp. 88-102.
157. S. Kullback, Information Theory and Statistics. J. Wiley, New York, 1959, pp. 111, 120, 143.
[Republished, with corrections and additions, by Dover, New York, 1968].
158. S. Kullback and R. A. Leibler, "On information and sufficiency." Annals of Math. Statist., 22,
1951, pp. 79-86.
159. L. N. Landa, "Logical-informational algorithm for learning theory." Psychological Journal (in
Russian), 2, 1962, pp. 19-40.
160. A. A. Langer, "Expansion coefficients on an orthonormal basis as features for the QRS complex."
Ph.D. dissertation, Carnegie-Mellon Univ., Pittsburgh, PA, 1974.
161. S. Lee, L. Rayes, E. Rumble, D. Wheeler and A. Woodis, "Comparison of COMETHE II1-J and
FCODE-BETA fission gas release predictions with measurements." SAO-279-82-PA, EPRI
RP971-2 Report, Science Applications, Inc., Palo Alto, CA, January 1982.
162. E. L. Lehmann, Testing Statistical Hypotheses. J. Wiley, New York, 1959, p. 173.
163. C. E. Leith, "The standard error of time-average estimates of climatic means." J. Appl. Meteor.,
12, 1973, pp. 1066-1069.
164. C. E. Leith, "The design of a statistical-dynamical climate model and statistical constraints on the
predictability of climate." Appendix 2.2 of The Physical Basis of Climate and Climate Modelling,
World Meteor. Org., No. 16, 1975, 265 pp.
165. C. E. Leith, "Predictability of climate." Nature, 276, 1978, pp. 352-355.
166. A. Lent, "A convergent algorithm for maximum entropy image restoration, with a medical X-ray
application." Image Analysis and Evaluation, 1976 SPSE Conference Proceedings, July 19-23, 1976,
Toronto, ed. by R. Shaw, Society of Photographic Scientists and Engineers, Wash., DC, 1977, pp.
249-257.
167. K. Leontiades, "Computationally practical entropy minimax rotations, applications to the iris data
and comparison to other methods." Tech. Rept., BioMedical Engr. Program, Carnegie-Mellon
Univ., Pittsburgh, PA, 1976, Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981,
pp. 319-327.
168. B. Lev and H. Theil, "A maximum entropy approach to the choice of asset depreciation." Jour.
Accounting Res., 16, 1978, pp. 286-293.
169. R. D. Levine and M. Tribus (eds.), The Maximum Entropy Formalism. The MIT Press, Cambridge,
MA, 1979.
170. G. N. Lewis, "The symmetry of time in physics." Science, 71, 1930, pp. 569-577.
171. P. M. Lewis, "Approximating probability distributions to reduce storage requirements." Infor-
mation and Control, 2, 1959, pp. 214-225.
172. P, M. Lewis, "The characteristic selection problem in recognition systems." IRE Trans. on Infor.
Theory, IT-8, 1962, pp. 171-178.
173. E. Lim, Coolant Channel Closure Modeling Using Pattern Recognition: A Preliminary Report,
EPRI FRMPM21-2, SAI-175-79-PA, Science Applications, Inc., Palo Alto, CA, June 1979.
174. E. H. Linfoot, "An informational measure of correlation." Information and Control, 1, 1957, pp.
85-89.
175, J. H. C. Lisman and M. C. A. van Zuylen, "Note on the generation of most probable frequency
distributions." Statistica Neerlandica, 26, 1972,pp. 19-23.
176. R. E. Livezey, T. N. Maisel and A. G. Barnston, "Experiments in seasonal prediction by analogs
using modified versions of the Barnett-Preisendorfer system." Proc. Eighth Climate Diagnostics
Workshop, Downsview, Ontario, Oct. 17-21, 1983, CAC/NOAA-S/T 84-115, NTIS PB84-192418,
pp. 350-356.
177. H. van Loon and R. L. Jenne, "Estimates of seasonal mean temperature, using persistence between
seasons." Monthly Weather ReView, 103, 1975,pp. 1121-1128.
178. E. N. Lorenz, "Empirical orthogonal functions and statistical weather prediction." Rep. 1,
Statistical Forecasting Proj., MIT, Cambridge, MA, 1956, 49 pp.
179. R. A. Madden, "Estimates of the natural variability of time averaged sea-level pressure." Monthly
Weather Review, 104, 1976, pp. 942-952.
180. R. A. Madden and D. J. Shea, "Estimates of the natural variability of time averaged temperatures
over the United States." Monthly Weather Review, 106, 1978, pp. 1695-1703.
181. R. K. McConnell, Jr., "Minimum description analysis of faulted data sets." Canadian Explor.
Geophys. Soc.-Am. Geophys. Union, Mining Geophys. Symp., Toronto, May 22-23, 1980.
182. J. F. McNeer, C. F. Starmer, A. G. Bartel, V. S. Behar, Y. Kong, R. H. Peter and R. A. Rosati,
"The nature of treatment selection in coronary artery disease." Circulation, 49, 1974, pp. 606-614.
183. O. von Mering and L. W. Earley, "The diagnosis of problem patients." Human Organization, 25,
1966, pp. 20-23.
184. H. H. Merritt, A Textbook of Neurology, Lea and Febiger, Philadelphia, PA, 6th ed., 1979.
185. G. N. Minerbo, "MENT: A maximum entropy algorithm for reconstructing a source from
projection data." Comput. Graph. and Image Proc., 10, 1979, pp. 48-68.
186. G. N. Minerbo and J. G. Sanderson, "Reconstruction of a source from a few (2 or 3) projections."
Rept. LA-6747-MS, Los Alamos National Laboratory, Los Alamos, NM, 1977.
187. A. S. Monin, Weather Forecasting as a Problem in Physics (1969), tr. by J. Smagorinsky, MIT
Press, Cambridge, MA, 1972, pp. 147-148.
188. D. F. Morrison, Multivariate Statistical Methods, McGraw-Hill, New York, 2nd ed., 1976, p. 103.
189. J. Namias, "Multiple causes of the North American abnormal winter 1976-77." Monthly Weather
Review, 106, 1978, pp. 279-295.
190. S. T. Oldberg, "Probabilistic code development." Planning Support Document for the EPRI Light
Water Reactor Fuel Performance Program, ed. by J. T. A. Roberts, F. E. Gelhaus, H. Ocken, N.
Hoppe, S. T. Oldberg, G. R. Thomas and D. Franklin, NTIS EPRI NP-737-SR, Electric Power
Research Inst., Palo Alto, CA, 1978, pp. 2.52-2.57.
191. S. T. Oldberg, "New code development activities." LWR Fuel Performance Program: Progress in
1978, ed. by J. T. A. Roberts, F. E. Gelhaus, H. Ocken, N. Hoppe, S. T. Oldberg, G. R. Thomas
and D. Franklin, EPRI NP-1024SR, Electric Power Research Inst., Palo Alto, CA, 1979, pp. 2.32-
2.39.
192. S. T. Oldberg and R. A. Christensen, "Dealing with uncertainty in fuel rod modeling." Nuclear
Technology, 37, 1978, pp. 40-47.
193. E. Parzen, "Autoregressive spectral estimation, log spectral smoothing and entropy." Proc. of the
First ASSP Workshop in Spectral Estimation, ed. by S. Haykin and J. Cadzow, IEEE Acoustic,
Speech, Signal Processing Soc., McMaster Univ., Hamilton, Ontario, Canada, August 1981, pp.
131-137.
194. E. Patrassi, "Die Wärmeleitfähigkeit von Urandioxid bei sehr hohen Temperaturgradienten"
(Thermal conductivity of UO2 at very high temperature gradients). Jour. Nucl. Mater., 22, 1967,
pp. 311-319.
195. W. H. Pearson, "Estimation of a correlation measure from an uncertainty measure." Psycho-
metrika, 31, No. 3, 1966, pp. 421-433.
196. H. V. Pipberger, R. J. Arms and F. W. Stallman, "Automatic screening of normal and abnormal
electrocardiograms by means of a digital electronic computer," Proc. of the Soc. for Experimental
Biology and Medicine, 106, 1961, pp. 130-132.
197. H. V. Pipberger, "Computer analysis of electrocardiograms." Clinical Electrocardiography and
Computers, ed. by C. A. Caceres and L. S. Dreifus, Academic Press, New York, 1970, pp. 109-119.
198. T. Poston and I. Stewart, Catastrophe Theory and Its Applications. Pitman, London, 1978.
199. E. J. Potchen, "Study on the use of diagnostic radiology." Current Concepts in Radiology, 2,
Mosby Co., St. Louis, MO, 1975, pp. 18-30.
200. E. J. Potchen and W. R. Schonbein, "A strategy to study the use of radiology as an information
system in patient management." Joint Masters Dissertation, Sloan School of Management, Mass.
Inst. of Tech., Cambridge, MA, June 1973, 164 pp.
201. R. W. Preisendorfer, "Model skill and model significance in linear regression hindcasts." SIO Ref.
Series No. 79-12, Scripps Institution of Oceanography, Univ. of Calif., La Jolla, CA, July 1979.
202. W. L. Proudfit, A. V. G. Bruschke and F. M. Sones, Jr., "Natural history of obstructive coronary
artery disease: ten-year study of 601 nonsurgical cases." Progress in Cardiovascular Diseases, 21,
1978, pp. 53-78.
203. C. R. Rao, Linear Statistical Inference and Its Applications. J. Wiley, New York, 2nd ed., 1973.
204. T. A. Reichert, "The amount of information stored in proteins and other short biological code
sequences." Proc. of the Sixth Berkeley Symposium on Mathematical Statistics and Probability,S,
1971, pp. 297-309.
205. T. A. Reichert, "Patterns of overuse of health care facilities-A comparison of methods." Proc. of
the IEEE 1973 Inti. Conf. on Cybernetics and Society, IEEE Systems, Man and Cybernetics
Society, 73 CHO 799-7 SMC, Boston, MA, November 5-7, 1973, pp. 328-329.
206. T. A. Reichert, "The security hyperannulus-a decision assist device for medical diagnosis." Proc.
of the Twenty-Seventh Annual Conf. on Engineering in Medicine and Biology, Alliance for
Engineering in Medicine and Biology, Philadelphia, PA, October 6-10, 1974, p. 331.
207. T. A. Reichert and R. A. Christensen, "Validated predictions of survival in coronary artery
disease." Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 457-490.
208. T. A. Reichert and R. A. Christensen, "Anticipating and compensating for deviations from training
experience." Fifth Annual Meeting, Society for Medical Decision Making, Toronto, Canada,
October 2-5, 1983.
209. T. A. Reichert, R. A. Christensen and A. A. Bartolucci, "Patterns of prognosis I: Survival in
advanced non-Hodgkin's lymphomas." (1985 in preparation).
210. T. A. Reichert, R. A. Christensen, A. A. Bartolucci and C. Walker, "Patterns of survival in
advanced non-Hodgkin's lymphoma." Proc. of the Second Inti. Conf on Malignant Lymphoma,
Swiss League Against Cancer, Lugano, Switzerland, 1984. Malignant Lymphomas and Hodgkin's
Disease: Experimental and Therapeutic Advances, ed. by F. Cavalli, G. Bonadonna and M.
Rozencweig, Martinus Nijhoff (1985, in press), 653 pp.
211. T. A. Reichert and A. J. Krieger, "Quantitative certainty in differential diagnosis." Proc. of the
Second Intl. Joint Conf. on Pattern Recognition, IEEE 74 CHO 885-4C, Copenhagen, Denmark,
August 13-15, 1974, pp. 434-437.
212. T. A. Reichert and Y. Stephanedes, "Addendum on differential diagnosis of three diseases of the
cervical spine." Entropy Minimax Sourcebook, 4, Entropy Pub., Lincoln, MA, 1981, pp. 255-258.
213. T. A. Reichert and A. K. C. Wong, "A finite state information source view of molecular genetic
phenomena." Proe. of the Pittshurgh Symposium 011 Modelling and Simulatioll, 1971, pp. 45-51.
214. T. A. Reichert, J. M. C. Yu and R. A. Christensen, "Molecular evolution as a process of message
refinement." Jour. of Molecular Euouuton. 8, 1976, pp. 41-54. .
215. A. Rényi, Wahrscheinlichkeitsrechnung, mit einem Anhang über Informationstheorie, Deutscher
Verlag der Wissenschaften, Berlin, 1962.
216. J. M. Richardson, "The hydrodynamical equations of a one component system derived from
nonequilibrium statistical mechanics." Jour. Math. Anal. Appl., 1, 1960, pp. 12-60.
217. E. A. Robinson, "A historical perspective of spectrum estimation." Proc. of the IEEE, 70, 1982, pp.
885-907.
218. T. E. Rosmond, "NOGAPS: Navy Operational Global Atmospheric Prediction System." Fifth
Conference on Numerical Weather Prediction, Amer. Meteor. Soc., Monterey, CA, November 2-6,
1981, pp. 74-79.
219. E. H. Ruspini, "A new approach to clustering." Inform. and Control, 15, 1969, pp. 22-32.
220. E. L. Saenger, C. R. Buncher, B. L. Specker and R. A. McDevitt, "Determination of clinical
efficacy: nuclear medicine as applied to lung scanning." Jour. of Nuclear Medicine, 26, 1985, pp.
793-806.
221. W. R. Schonbein, "Identification of patterns in diagnostic attributes in skull trauma cases using an
entropy minimax approach." Presented at IEEE Intl. Conf. on Cybernetics and Society, NTIS COO-
2427-2, Sloan School, MIT, Cambridge, MA, 1973.
222. W. R. Schonbein, "Analysis of decisions and information in patient management." Current
Concepts in Radiology, 2, Mosby Co., St. Louis, MO, 1975, pp. 31-58.
223. G. E. Schulz, C. D. Barry, J. Friedman, P. Y. Chou, G. D. Fasman, A. V. Finkelstein, V. I. Lim,
O. B. Ptitsyn, E. A. Kabat, T. T. Wu, M. Levitt, B. Robson and K. Nagano, "Comparison of
predicted and experimentally determined secondary structure of adenyl kinase." Nature, 250, 1974,
pp. 140-142.
224. G. Sebestyen and J. Edie, "Pattern recognition research." AFCRL-64-821, NTIS AD-608-692,
Litton Systems, Inc., Waltham, MA, June 14, 1964.
225. C. E. Shannon, "A mathematical theory of communication." The Bell Sys. Tech. Jour., 27, 1948,
pp. 379-423, 623-656.
226. D. J. Shea, "Sensitivity studies on the estimates of climate noise and potential long range
predictability of January temperature and precipitation over the U.S. and Canada." Proc. of the
Eighth Climate Diagnostics Workshop, Downsview, Ontario, October 17-21, 1983, CAC/NOAA-
S/T 84-115, NTIS PB84-192418, pp. 313-321.
227. J. E. Shore, "Derivation of equilibrium and time-dependent solutions to M/M/∞//N and M/M/∞
queueing systems using entropy maximization." Nat. Computer Conf., AFIPS, Anaheim, CA, June
5-8, 1978, pp. 483-487.
228. J. E. Shore and R. M. Gray, "Minimum cross-entropy pattern classification and cluster analysis."
IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-4, 1982, pp. 11-17.
229. T. F. Smith, "The genetic code, information density, and evolution." Math. BioSci., 4, 1969, pp.
179-187.
230. T. L. Smith, E. A. Gehan, M. J. Keating and E. J. Freireich, "Prediction of remission in adult
leukemia."·Cancer,SO, 1982, pp. 466-472.
231. H. von Storch and G. Hannoschock, "Statistical aspects of estimated principal vectors (EOFs)
based on small sample sizes." Jour. Climate Appl. Meteor., 24, 1985, pp. 716-724.
232. L. Szilard, "On the decrease of entropy in a thermodynamic system by the intervention of
intelligent beings." Zeitschrift für Physik, 53, 1929, pp. 840-856; tr. by A. Rapoport and M.
Knoller in Behavioral Science, 9, 1964, pp. 301-310; reprinted in Quantum Theory and Measure-
ment, ed. by J. A. Wheeler and W. H. Zurek, Princeton Univ. Press, Princeton, 1983, pp. 539-548.
233. R. Thom, Structural Stability and Morphogenesis (1972), tr. by D. H. Fowler, Benjamin-Addison
Wesley, New York, 1975.
234. P. D. Thompson, "A heuristic theory of large-scale turbulence and long-period velocity variations
in barotropic flow." Tellus, 9, 1957, pp. 69-91.
235. M. Tribus, "Information theory as the basis for thermostatics and thermodynamics." Jour. Appl.
Mech., 28, 1961, pp. 1-8.
236. M. Tribus, Thermostatics and Thermodynamics. An Introduction to Energy, Information and States
of Matter, with Engineering Applications, D. Van Nostrand, Princeton, NJ, 1961.
237. M. Tribus, "The use of the maximum entropy estimate in the estimation of reliability." Recent
Developments in Information and Decision Processes, ed. by R. E. Machol and P. Gray, Macmillan,
New York, 1962, pp. 102-140.
238. M. Tribus, Rational Descriptions, Decisions, and Designs, Pergamon Press, New York, 1969.
239. C. A. Truesdell, The Tragicomical History of Thermodynamics 1822-1854, Springer-Verlag, New
York, 1980.
240. N. S. Tzannes and J. P. Noonan, "The mutual information principle and applications." Inform.
and Control, 22, 1973, pp. 1-12.
241. T. J. Ulrych and T. N. Bishop, "Maximum entropy spectral analysis and autoregressive
decomposition." Rev. Geophysics and Space Physics, 13, 1975, pp. 183-200.
242. I. Vincze, "An interpretation of the I-divergence of information theory." Trans. of the Second
Prague Conf. on Information Theory, Statistical Decision Functions, Random Processes, Prague,
June 1-6, 1959, Academic Press, New York, 1960, pp. 681-684.
243. S. H. Walker and D. B. Duncan, "Estimation of the probability of an event as a function of
several independent variables." Biometrika, 54, 1967, pp. 167-179.
244. C. S. Wallace and D. M. Boulton, "An information measure for classification," Computer Journal,
11, 1968, pp. 185-194.
245. J. N. Walton, Brain's Diseases of the Nervous System. Oxford Univ. Press, London, 7th ed., 1969.
246. G. S. Was, R. A. Christensen, C. Park and R. W. Smith, "Statistical patterns of fuel failure in
stainless steel clad light water reactor fuel rods." Nuclear Technology, 71, 1985, pp. 445-457.
247. S. Watanabe, "Une explication mathématique du classement d'objets." Information and Prediction
in Science, ed. by S. Dockx and P. Bernays, Academic Press, New York, 1965, pp. 39-76.
248. S. Watanabe, Knowing and Guessing, J. Wiley, New York, 1969.
249. M. J. Webber, Information Theory and Urban Spatial Structure, Croom Helm, London, 1979.
250. N. Wiener, Cybernetics, The MIT Press, Cambridge, MA, 2nd ed., 1961.
251. A. G. Wilson, Entropy in Urban and Regional Modelling, Pion, London, 1970.
252. L. J. Wilson and H. R. Stanski, "Assessment of operational REEP/MDA probability of
precipitation forecasts." Eighth Conference on Probability and Statistics in Atmospheric Sciences,
Amer. Meteorological Society, Hot Springs, AR, November 16-18, 1983, pp. 193-199.
253. S. Wold, "Pattern recognition by means of disjoint principal components models." Tech. Rept.
No. 2, Research Group for Chemometrics, Umeå Univ., Sweden, March 1975.
254. P. H. Woods, W. H. Tusa, P. J. Sausville, J. W. Ritz and W. E. Blain, "Technology for the storage
of hazardous liquids, a state-of-the-art review." Dept. of Environmental Conservation, Albany,
NY, January 1983.
255. P. M. Woodward and I. L. Davies, "Information theory and inverse probability in telecommunic-
ation." Proc. IEEE, 99, Part 3,1952, pp. 37-44.
256. A. Wragg and D. C. Dowson, "Fitting continuous probability density functions over [0, ∞] using
information theory ideas." IEEE Trans. Infor. Theory, IT-16, 1970, pp. 226-230.
257. B. J. Wrona, J. T. A. Roberts, E. Johanson and W. D. Tuohig, "First report on apparatus to
simulate in-reactor transient heating conditions in oxide fuel columns." Nucl. Technol., 20, 1973,
pp. 114-123.
Ronald Christensen has headed Entropy Limited, conducting statistical modeling research in science,
engineering and medicine, since 1973. Previously, he held research positions at IBM, the RAND Corporation
and the Lawrence Berkeley Laboratory. He has taught and conducted research at Carnegie-Mellon
University, the University of Maine and the University of California, Berkeley, and is the author of General
Description of Entropy Minimax, 1981, Multivariate Statistical Modeling, 1983, Order and Time, 1984, Data
Distributions, 1984, and other books and papers in physics, statistics, and predictive modeling.
Dr. Christensen received a Ph.D. in Theoretical Physics from the University of California, Berkeley, a J.D.
from Harvard Law School, an M.S. in Mechanical Engineering from the California Institute of Technology,
and a B.S. in Electrical Engineering from Iowa State University. He is a member of the American
Mathematical Society, the American Statistical Association and the American Physical Society.