Cumulant Correlators From The APM
István Szapudi¹ and Alexander S. Szalay²
1 NASA/Fermilab
Abstract
This work presents a set of new statistics, the cumulant correlators (CCs), aimed at high-precision analysis of the galaxy distribution. They form a symmetric matrix, $Q_{NM}$, which is related to moment correlators in the same way as cumulants are related to the moments of the distribution. They encode more information than the usual cumulants, the $S_N$'s, and their extraction from data is similar to the calculation of the two-point correlation function. Perturbation theory (PT), its generalization, extended perturbation theory (EPT), and the hierarchical assumption (HA) make simple predictions for these statistics. As an example, the factorial moment correlators measured in the APM catalog by Szapudi, Dalton, Efstathiou & Szalay (1995, hereafter SDES) are reanalyzed using this technique. The previous analysis assumed hierarchical structure constants. This method can instead directly test the validity of the HA, as well as of PT and EPT. The results agree with previous findings and indicate that the APM data support the HA at the small scales used for this analysis. When all non-linear corrections are taken into account, the HA is a good approximation at the 20 percent level. PT, and a natural generalization of EPT to CCs, do not appear to fit the APM data as well at small scales. Once the validity of the HA is approximately established, CCs can separate the amplitudes of different tree types in the hierarchy up to fifth order. As an example, the weights of the fourth-order tree topologies are calculated, including all non-linear corrections.
Keywords: large-scale structure of the universe; galaxies: statistics; methods: data analysis; methods: statistical
1. Introduction
Direct determination of higher order correlation functions (Fry & Peebles 1978, Peebles 1980, and references therein) is burdened by a combinatorial explosion of terms, which severely complicates their measurement and interpretation. Thus in recent years indirect methods have become increasingly popular for high-precision measurements of higher order correlations. The simplest of these methods consists of calculating the (factorial) moments of the distribution of counts in cells and, from these, the cumulants, the $S_N$'s, of the underlying distribution (see e.g. Peebles 1980, Gaztañaga 1992, Bouchet et al. 1993, Gaztañaga 1994, Colombi et al. 1995, Szapudi, Meiksin, & Nichol 1996). For a point process, these quantities measure the amplitude of the $N$-point correlation function averaged over a particular window. The advantages of this technique lie in its simplicity and in its direct relation to the predictions of PT (Peebles 1980, Juszkiewicz, Bouchet, & Colombi 1993, Bernardeau 1992, Bernardeau 1994), EPT (Colombi et al. 1996), and the HA (Peebles 1980). Since the averaging causes a significant loss of information, alternative methods based on moment correlators use a pair of cells (Szapudi, Szalay & Boschán 1992, Meiksin, Szapudi, & Szalay 1992, SDES). In the past such methods were used mainly to estimate the average amplitudes of the different $N$-point correlation functions in the HA, the $Q_N$'s, motivated by the theory of the BBGKY equations in the strong clustering regime. This work presents an alternative analysis of the factorial moment correlators that is free of assumptions except for one: the widely accepted infinitesimal Poisson model, which relates the continuum-limit quantities to the measured discrete process. Instead of fitting for the $Q_N$'s, a matrix $Q_{NM}$ is defined: the CCs. Both the HA and PT make specific predictions for these possibly scale-dependent quantities. After elaborating these predictions, the method is illustrated by reanalyzing the factorial moment correlators obtained from the APM catalog by SDES. Once the HA is established, CCs contain enough information to separate the weights of different tree topologies up to fifth order. The next section outlines the basic theory, and section 3 presents the predictions of PT, EPT, and the HA. The measurements of the fourth-order coefficients of the hierarchy from the APM catalog are described in section 4.
2. Theory
Following SDES, we define the factorial moment correlators for a pair of cells separated by a distance $r_{12}$ as

$$w_{kl}(r_{12}) = \frac{\langle (N_1)_k (N_2)_l \rangle - \langle (N)_k \rangle \langle (N)_l \rangle}{\langle N \rangle^{k+l}}, \qquad k \neq 0,\ l \neq 0, \tag{1}$$

$$w_{k0} = \frac{\langle (N)_k \rangle}{\langle N \rangle^{k}}. \tag{2}$$

The notation $(N)_k = N(N-1)\cdots(N-k+1)$ is introduced for the factorial moments of the counts in cells, and $\langle\,\rangle$ denotes averaging over all cell positions in the survey. The connection with the fluctuations of the underlying field, $\delta$, can be obtained by formally substituting $(N)_k \rightarrow \langle N \rangle^k (1+\delta)^k$. The generating function for the factorial moments in terms of the cumulants $Q_N$ is

$$W(x) = \exp\left( \sum_{N=1}^{\infty} x^N \beta_N Q_N \right), \tag{3}$$

with

$$\beta_N = \frac{N^{N-2} s^{N-1}}{N!}, \tag{4}$$

where $s = \bar\xi$ is the variance in a cell. The generating function can be written in the above form for any distribution that has cumulants. In general the $Q_N$'s can depend on scale, while the HA predicts $Q_N = \mathrm{const}$. Note the exact connection with the popular alternative notation, $S_N = Q_N N^{N-2}$. Similarly, the generating function of the factorial moment correlators can be written as

$$W(x,y) = W(x)\,W(y)\left[\exp Q(x,y) - 1\right], \tag{5}$$

with

$$Q(x,y) = \xi_l \sum_{M=1,\,N=1}^{\infty} x^M y^N Q_{NM}\, \beta_M \beta_N\, N M. \tag{6}$$

This latter equation defines the CCs, $Q_{NM}$, with $\xi_l = w_{11}$, the two-point correlation function between the cells. Typically in the APM survey $\xi_l < s\ (= \bar\xi) < 1$. Note that although the linear dependence on $\xi_l$ is factored out, $Q_{NM}$ is not necessarily constant.
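Equations 1 and 2 translate directly into an estimator on a pixelized count map. The sketch below is a minimal illustration, not the SDES code. It assumes a hypothetical, fully sampled rectangular grid of counts with no survey mask, and cell pairs separated by an integer number of pixels along one axis. The function names are illustrative. With it, $\xi_l = w_{11}$ and $s = w_{20} - 1$ (since $w_{20} \to \langle (1+\delta)^2 \rangle$) can be read off directly.

```python
import numpy as np


def falling_factorial(n, k):
    """(N)_k = N (N-1) ... (N-k+1), evaluated elementwise on an array of counts."""
    n = np.asarray(n, dtype=float)
    out = np.ones_like(n)
    for j in range(k):
        out *= n - j
    return out


def factorial_moment_correlator(counts, k, l, shift=1):
    """Estimate w_kl (Eq. 1) or, when l == 0, w_k0 (Eq. 2) from a 2-D count map.

    counts : 2-D array of counts in cells (a hypothetical, unmasked map)
    shift  : separation of the two cells, in pixels along the first axis
    """
    counts = np.asarray(counts, dtype=float)
    nbar = counts.mean()
    if l == 0:
        return falling_factorial(counts, k).mean() / nbar**k
    if shift < 1:
        raise ValueError("shift must be at least one pixel")
    # <(N)_k> and <(N)_l>, averaged over all cell positions in the map
    mk = falling_factorial(counts, k).mean()
    ml = falling_factorial(counts, l).mean()
    # every pair of cells separated by `shift` pixels along axis 0
    n1, n2 = counts[:-shift], counts[shift:]
    cross = (falling_factorial(n1, k) * falling_factorial(n2, l)).mean()
    return (cross - mk * ml) / nbar ** (k + l)
```

For example, `factorial_moment_correlator(counts, 1, 1, shift)` gives $\xi_l$, and `factorial_moment_correlator(counts, 2, 0) - 1` gives $s$. A real survey would additionally need masks and edge corrections.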
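Because of the continuum properties of the factorial moments, cumulants and CCs are related to the continuum-limit connected moments:

$$\frac{\langle \delta_1^N \rangle_c}{N!} = Q_N \beta_N, \tag{7}$$

$$\frac{\langle \delta_1^N \delta_2^M \rangle_c}{N!\,M!} = Q_{NM}\, \beta_M \beta_N\, N M\, \xi_l. \tag{8}$$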
Although the above equations are formally identical to those of SDES, there are two subtle differences. First, there is no reference to the hierarchical assumption, so $Q_{NM}$ becomes a matrix. Second, equation 8 is understood as exact, i.e. the non-linearities are included. It is convenient to define CCs linear in $\xi_l$, denoted by $\hat Q_{NM}$. These are obtained from the generating function with the approximation $\exp Q(x,y) - 1 \approx Q(x,y)$, i.e. neglecting terms of $O(\xi_l^2)$. The $\hat Q_{NM}$'s coincide, up to normalization, with the $C_{NM}$'s calculated from PT by Bernardeau 1995 (see next section). In what follows, "linear" and "non-linear" always refer to powers of $\xi_l$.
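The CCs can be calculated for any well-behaved point process by expanding $W(x,y)/[W(x)W(y)]$ according to equation 5. For instance, the third- and fourth-order CCs satisfy

$$Q_{12}\,\beta_1 \beta_2\, 2\, \xi_l = \frac{w_{12}}{2} - \xi_l, \tag{9}$$

$$Q_{22}\,\beta_2 \beta_2\, 4\, \xi_l = \frac{w_{22}}{4} - w_{12} + \xi_l - \frac{\xi_l^2}{2}, \tag{10}$$

and it follows that

$$Q_{22} = \hat Q_{22} - \frac{\xi_l}{2 s^2}. \tag{11}$$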
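The expansion of $W(x,y)/[W(x)W(y)]$ can be automated to any order with truncated power-series arithmetic. The sketch below is minimal and assumes the normalization implied by equations 9 and 10. That is, $W(x)$ and $W(x,y)$ are taken to have coefficients $w_{k0}/k!$ and $w_{kl}/(k!\,l!)$, which is an assumption of this sketch. The procedure takes $\log[1 + W(x,y)/(W(x)W(y))]$ to obtain $Q(x,y)$. It then divides each coefficient by $\xi_l \beta_M \beta_N N M$ from equation 6. Using the ratio itself instead of its logarithm gives the linear CCs $\hat Q_{NM}$. All helper names are hypothetical.

```python
import math


def _zeros(K):
    return [[0.0] * (K + 1) for _ in range(K + 1)]


def _mul(a, b, K):
    """Product of bivariate power series, truncated at degree K in x and in y."""
    c = _zeros(K)
    for i in range(K + 1):
        for j in range(K + 1):
            if a[i][j] == 0.0:
                continue
            for p in range(K + 1 - i):
                for q in range(K + 1 - j):
                    c[i + p][j + q] += a[i][j] * b[p][q]
    return c


def _compose(r, K, coeff):
    """sum_n coeff(n) r^n for a series r without constant term.

    In the truncated ring r^n vanishes for n > 2K, so the finite sum is exact.
    """
    out, power = _zeros(K), _zeros(K)
    power[0][0] = 1.0
    for n in range(2 * K + 1):
        c = coeff(n)
        for i in range(K + 1):
            for j in range(K + 1):
                out[i][j] += c * power[i][j]
        power = _mul(power, r, K)
    return out


def log1p_series(r, K):
    return _compose(r, K, lambda n: 0.0 if n == 0 else (-1) ** (n + 1) / n)


def exp_series(r, K):
    return _compose(r, K, lambda n: 1.0 / math.factorial(n))


def beta(N, s):
    """beta_N = N^(N-2) s^(N-1) / N!  (Eq. 4)."""
    return N ** (N - 2) * s ** (N - 1) / math.factorial(N)


def cumulant_correlators(w, K):
    """Full and linear CCs from factorial moment correlators (K >= 2).

    w[k][l] = w_kl for k, l >= 1, and w[k][0] = w[0][k] = w_k0 (Eq. 2).
    Returns (Q, Q_lin); entry [n][m] multiplies x^n y^m (row/column 0 unused).
    """
    xi_l, s = w[1][1], w[2][0] - 1.0
    wx, wy, wxy = _zeros(K), _zeros(K), _zeros(K)
    for k in range(1, K + 1):
        wx[k][0] = w[k][0] / math.factorial(k)  # W(x) - 1
        wy[0][k] = w[0][k] / math.factorial(k)  # W(y) - 1
        for l in range(1, K + 1):
            wxy[k][l] = w[k][l] / (math.factorial(k) * math.factorial(l))
    log_wx, log_wy = log1p_series(wx, K), log1p_series(wy, K)
    neg = [[-(log_wx[i][j] + log_wy[i][j]) for j in range(K + 1)] for i in range(K + 1)]
    ratio = _mul(wxy, exp_series(neg, K), K)  # W(x,y)/[W(x)W(y)] = exp Q - 1
    q_series = log1p_series(ratio, K)         # Q(x, y) of Eq. 5
    Q, Q_lin = _zeros(K), _zeros(K)
    for n in range(1, K + 1):
        for m in range(1, K + 1):
            norm = xi_l * beta(n, s) * beta(m, s) * n * m  # Eq. 6
            Q[n][m] = q_series[n][m] / norm
            Q_lin[n][m] = ratio[n][m] / norm
    return Q, Q_lin
```

For a statistically homogeneous field the returned matrix is symmetric. At $K = 2$ the output reduces to equations 9 to 11.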
3. Predictions
In the highly non-linear regime, the HA (e.g., Peebles 1980; BS) states that the $N$-point correlation functions can be written as a sum of products of $N-1$ two-point correlation functions. Each product corresponds to a tree spanning the $N$ points, and the sum runs over all possible trees. The different tree topologies, labeled with $k$, are weighted with a constant $Q_{N,k}$. Our notation is described in detail in Boschán, Szapudi, & Szalay 1994, and Szapudi & Colombi 1996.
One of the goals of this paper is to validate the HA to unprecedented accuracy.
Comparing Equation 6 with SDES and Szapudi & Szalay 1993 yields a linear-order prediction for the HA:

$$Q_{N+M} \simeq \hat Q_{NM} \simeq \mathrm{const}. \tag{12}$$

For instance, the fourth-order cumulant $Q_4$ is approximately equal to the linear CCs $\hat Q_{13}$ and $\hat Q_{22}$, and all three are approximately constant; the same holds at higher orders. Boschán, Szapudi, & Szalay 1994 showed that form factors from the smoothing are negligible. However, different tree topologies and non-linear corrections are taken into account next for a more accurate prediction.
The only third-order CC is $Q_{12}$. Tree graphs spanning three points have only one possible topology (its weight is denoted by $Q_3$, with form factors neglected), giving altogether three possible graphs:

$$\langle \delta_1 \delta_2^2 \rangle_c = 2 Q_{12}\, s\, \xi_l = Q_3 \left( 2 \xi_l s + \xi_l^2 \right), \tag{13}$$

reproducing $\hat Q_{12} = Q_3$ at linear order.
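At fourth order there are two CCs, $Q_{13}$ and $Q_{22}$. The sixteen possible trees spanning four points come in two distinct topologies: twelve snake graphs and four star graphs. In the HA their respective amplitudes are denoted by $R_a$ and $R_b$. Summing all possible graphs with the appropriate statistical weights gives

$$\langle \delta_1 \delta_2^3 \rangle_c = 9 Q_{13}\, \xi_l s^2 = 6 \xi_l s^2 R_a + 3 \xi_l s^2 R_b + 6 \xi_l^2 s R_a + \xi_l^3 R_b, \tag{14}$$

and

$$\langle \delta_1^2 \delta_2^2 \rangle_c = 4 Q_{22}\, \xi_l s^2 = 4 \xi_l s^2 R_a + 4 \xi_l^2 s \left( R_a + R_b \right) + 4 \xi_l^3 R_a. \tag{15}$$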
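The graph counting behind equations 14 and 15 can be checked by brute force. The helper below is hypothetical and not taken from the paper. It enumerates all labeled trees on four points and classifies each one as a snake (path) or a star. It then counts the links that join the two cells. A tree with $c$ cross-cell links contributes $\xi_l^c s^{3-c}$ times its topology weight. For the configuration $\langle \delta_1 \delta_2^3 \rangle$, the tally reproduces the coefficients of equation 14. For $\langle \delta_1^2 \delta_2^2 \rangle$, it reproduces those of equation 15.

```python
from collections import Counter
from itertools import combinations


def is_spanning_tree(n, edges):
    """True if n-1 edges on vertices 0..n-1 contain no cycle (hence span all n)."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    return True


def tree_terms(cells):
    """Tally trees on four points by topology and number of cross-cell links.

    cells[i] labels the cell (1 or 2) holding point i.
    """
    n = len(cells)
    if n != 4:
        raise ValueError("only four points: there every tree is a snake or a star")
    tally = Counter()
    for tree in combinations(combinations(range(n), 2), n - 1):
        if not is_spanning_tree(n, tree):
            continue
        degree = Counter(v for edge in tree for v in edge)
        topology = "star" if max(degree.values()) == n - 1 else "snake"
        cross = sum(cells[u] != cells[v] for u, v in tree)
        tally[(topology, cross)] += 1
    return tally
```

Calling `tree_terms([1, 2, 2, 2])` yields 6 snakes with one cross link and 6 with two, plus 3 stars with one cross link and 1 with three. Together these give equation 14.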
These two equations are linear in $R_a$ and $R_b$, so they can be solved for $R_a$ and $R_b$ in terms of $Q_{13}$ and $Q_{22}$, with coefficients that are non-linear in $\xi_l$. The linear solution is $R_a = \hat Q_{22}$ and $R_b = 3 \hat Q_{13} - 2 \hat Q_{22}$.
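Because the hierarchical expansions of $\langle \delta_1 \delta_2^3 \rangle_c$ and $\langle \delta_1^2 \delta_2^2 \rangle_c$ are linear in $R_a$ and $R_b$, the fully non-linear solution is a 2×2 linear solve. The sketch below divides both fourth-order equations by $\xi_l$ and applies Cramer's rule. It also inverts equation 13 for $Q_3$. The function names are illustrative, and form factors are neglected, as in the text. Setting $\xi_l = 0$ recovers the linear solution quoted above.

```python
def third_order_amplitude(Q12, s, xi_l):
    """Invert Eq. 13, 2 Q_12 s xi_l = Q_3 (2 xi_l s + xi_l**2), for Q_3."""
    return 2.0 * Q12 * s / (2.0 * s + xi_l)


def fourth_order_amplitudes(Q13, Q22, s, xi_l):
    """Solve the fourth-order hierarchical equations for R_a (snake) and R_b (star).

    After dividing both equations by xi_l:
        9 Q13 s^2 = (6 s^2 + 6 xi_l s) R_a + (3 s^2 + xi_l^2) R_b
        4 Q22 s^2 = (4 s^2 + 4 xi_l s + 4 xi_l^2) R_a + 4 xi_l s R_b
    """
    a11, a12 = 6 * s**2 + 6 * xi_l * s, 3 * s**2 + xi_l**2
    a21, a22 = 4 * s**2 + 4 * xi_l * s + 4 * xi_l**2, 4 * xi_l * s
    b1, b2 = 9 * Q13 * s**2, 4 * Q22 * s**2
    det = a11 * a22 - a12 * a21  # equals -12 s^4 when xi_l = 0
    Ra = (b1 * a22 - a12 * b2) / det  # Cramer's rule
    Rb = (a11 * b2 - b1 * a21) / det
    return Ra, Rb
```

In practice the measured projected quantities ($q_{31}$, $q_{22}$, $s$, $\xi_l$ at each separation) would be passed in to obtain the scale-dependent $r_a$ and $r_b$.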
Direct comparison of Equation 6 with the coefficients $C_{NM}$ of Bernardeau 1995 reveals that they are identical to the linear-order CCs up to normalization:

$$C_{NM} = \hat Q_{NM}\, N^{N-1} M^{M-1} + O(\xi_l^2). \tag{16}$$

In leading-order PT these coefficients factorize, $C_{NM} = C_{N1} C_{M1}$, which is equivalent to

$$\hat Q_{NM} = \hat Q_{1N}\, \hat Q_{1M}, \tag{17}$$

and the series $C_{N1}$ was calculated up to the first non-trivial order. The interested reader is referred to Bernardeau 1995 for detailed predictions in the weakly non-linear regime; the present work needs only Equation 17.
Although biasing is not investigated in this paper, it is worth noting that it can significantly change the higher order correlations. In the weakly non-linear regime the results of Fry & Gaztañaga 1993 should be generalized to CCs. Such a calculation, which is left for subsequent research, would resolve the remaining ambiguities in the interpretation of CCs.
4.
For an initial assessment, the linear CCs were first calculated from the factorial moment correlators measured by SDES in the APM survey (Maddox et al. 1990a, 1990b, 1990c). In what follows, a density map with cell size 0.23° and magnitude range $b_J = 17$–$20$ was used (see SDES for the detailed properties of the density maps). The bottom panel of the Figure shows the measured $q_{NM}$'s (the linear projected CCs; lower-case symbols refer to projected quantities) up to fifth order. To interpret the figures, note that the CCs are characterized by two relevant scales: the angular separation and the smoothing scale, or cell size. The figures show only the separation, in degrees ($1^\circ \approx 7\,h^{-1}$ Mpc for this magnitude cut), while the smoothing length (always 0.23°) remains implicit. The degeneracy and the approximately parallel nature of the curves immediately suggest that the HA is a reasonable approximation. At larger scales the CCs appear to roll off while the prediction stays flat, and the degeneracy of the curves is slightly broken. This is mainly due to two facts: linear CCs were used, and cumulants are not exactly constant at all scales, as shown by Gaztañaga 1994 and Szapudi, Meiksin, & Nichol 1996 (i.e. the HA is slightly broken).
The middle panel of the Figure illustrates equation 17, the prediction of leading-order PT. The solid lines are the CCs $q_{NM}$, $N, M > 1$, while the dotted lines show the corresponding products $q_{1N} q_{1M}$. Only the fourth and fifth orders are shown. The degree of validity of PT can be judged from how well the dotted and solid lines match. The dotted lines lie consistently below the solid ones, so this model describes the data less accurately than the HA. Going beyond leading-order PT could possibly improve the representation of the data; this is left for future work.
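Judging the match between the solid and dotted lines amounts to comparing each $q_{NM}$ with the product $q_{1N} q_{1M}$. The sketch below reports the fractional deviation from equation 17. It assumes the measured projected CCs at one separation are stored in a symmetric nested list indexed from 1 (a hypothetical layout). A positive value means that the measured CC exceeds the leading-order PT prediction.

```python
def pt_residuals(q):
    """Fractional deviation of q_NM (N, M > 1) from the PT product q_1N q_1M (Eq. 17).

    q : nested list with q[N][M] for 1 <= N, M <= K (index 0 unused).
    """
    K = len(q) - 1
    return {
        (n, m): q[n][m] / (q[1][n] * q[1][m]) - 1.0
        for n in range(2, K + 1)
        for m in range(n, K + 1)
    }
```

Evaluating this at each separation gives a quantitative version of the visual comparison in the middle panel.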
It can be argued that PT for the CCs is valid when both relevant scales are in the weakly non-linear regime. PT matches the higher order correlations in the APM at larger scales (Gaztañaga & Frieman 1994). For the small cell size used in this work, however, non-linearities can be important (Baugh & Gaztañaga 1996). On the other hand, it was found in $N$-body simulations (Colombi et al. 1996) and in galaxy data (Szapudi, Meiksin, & Nichol 1996) that the higher order correlation amplitudes, $Q_N$, measured from counts in cells are similar to those prescribed by PT, but with a steeper power spectrum. This phenomenological extension of PT is the essence of EPT. Taken at face value, the previous exercise would suggest that EPT cannot be generalized to moment correlators. A rough estimate of the errors is based on Equation 12, with the variance scaled from Gaztañaga 1994 and Gaztañaga 1996 (private communication). It yields 5%, 7%, and 7% for the third, fourth, and fifth orders, respectively. These error bars, which are not necessarily conservative, could only marginally exclude the natural extension of EPT at small scales. Further measurements in $N$-body simulations and in high-quality data are needed to show whether the EPT paradigm can be applied to CCs.
The HA can be scrutinized further by relaxing the previous assumptions of linearity and uniform weighting of topologies. The form factors resulting from the pair of cells are expected to be smaller than the measurement errors and are still neglected. Counting the number of degrees of freedom reveals that the cumulants and CCs can separate the different tree topologies up to fifth order. A calculation for the third and fourth orders is presented here; the fifth-order calculation is analogous, although somewhat tedious. Beyond fifth order, additional information is needed to separate the different graph types.
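The degree-of-freedom count can be made explicit. This sketch rests on our own assumption, not spelled out in the text, that at order $n$ the available statistics are the cumulant $Q_n$ together with the CCs $Q_{NM}$ with $N + M = n$ and $N \le M$. That gives $1 + \lfloor n/2 \rfloor$ numbers. The sketch enumerates labeled trees through Prüfer sequences and reduces them to isomorphism classes via a rooted canonical string minimized over roots. The result is 1, 2, 3, and 6 topologies for $n = 3, \dots, 6$. The number of statistics therefore matches or exceeds the number of unknown weights only through fifth order. The helper names are hypothetical.

```python
from itertools import product


def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence into the edge list of a labeled tree on n vertices."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges


def canonical_form(n, edges):
    """Isomorphism-invariant string: minimum rooted encoding over all roots."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def encode(v, parent):
        return "(" + "".join(sorted(encode(c, v) for c in adj[v] if c != parent)) + ")"

    return min(encode(root, None) for root in range(n))


def count_topologies(n):
    """Number of distinct tree topologies spanning n points (n >= 2)."""
    return len({canonical_form(n, prufer_to_edges(seq, n))
                for seq in product(range(n), repeat=n - 2)})


def count_statistics(n):
    """Q_n plus the CCs Q_NM with N + M = n and 1 <= N <= M (assumed counting)."""
    return 1 + n // 2
```

At fifth order the system is exactly determined (three topologies, three statistics). At sixth order six topologies face only four statistics.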
The long-dashed line in the top panel of the Figure shows the non-linear measurement of $q_3$, as calculated from the APM $q_{21}$ according to Eq. 13. The dotted lines show the linear solutions $r_a$ and $r_b$, as computed from $q_{22}$ and $q_{31}$. The hierarchy predicts two horizontal lines, with the constraint $16 q_4 = 12 r_a + 4 r_b$. The linear approximations, on the other hand, show a strong scale dependence, increasing and even crossing over at the smallest scales: a possible sign of non-linear effects. The full non-linear equations (14, 15) yield the result plotted with solid lines. The non-linear corrections remove most of the scale dependence, as expected if the HA is satisfied. The residuals are probably due to the neglected form factors and to measurement errors. On the left side of the panel several amplitudes are plotted for comparison; for these points the angular scale is irrelevant. The three-sided symbols refer to third-order quantities, and the four-sided ones to fourth order. The filled triangle and square show the values $q_3 = 1.15$ and $q_4 = 2.2$. These were calculated from the averaged value $q_{21} = 1.15$, and from $r_a = 1.15$ and $r_b = 5.3$, respectively. The open symbols correspond to the values $q_3 = 1.7$ and $q_4 = 4.17$ measured from the factorial moments alone, $w_{k0}$, at the scale of the cells. For comparison, the two stars show the respective SDES measurements, $q_3 = 1.16$ and $q_4 = 1.96$. SDES measured a somewhat lower $q_4$ because they used only the linear approximations (dotted lines). At the same cell size, Gaztañaga 1994 measured $q_3 = 1.7$ in the APM, and Szapudi, Meiksin, & Nichol 1996 measured $q_3 = 1.6$. Both are in excellent agreement with the results from $w_{k0}$. The fourth-order values in the same sources, $q_4 = 3.7$ and $q_4 = 3.2$, are slightly lower than above, but the agreement is still within 20–30 percent.
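The constraint $16 q_4 = 12 r_a + 4 r_b$ simply averages the tree weights over the sixteen trees on four points: twelve snakes and four stars. The following is an arithmetic check of quoted numbers only, with no new data. The averaged $r_a = 1.15$ and $r_b = 5.3$ give $q_4 = 2.1875$, consistent with the quoted 2.2. Applying the same relation to the de-projected $R_a = 0.8$ and $R_b = 3.7$ gives 1.525, consistent with the quoted $Q_4 = 1.5$. The function name is illustrative.

```python
def hierarchical_q4(ra, rb):
    """Average fourth-order amplitude implied by the tree weights.

    The 16 trees on four points are 12 snakes (weight ra) and 4 stars (weight rb),
    so 16 q4 = 12 ra + 4 rb.
    """
    return (12.0 * ra + 4.0 * rb) / 16.0
```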
The above numbers suggest that different measurements using the same method are consistent with each other, even across catalogs. There is, however, some disagreement between the results based on moment correlators and those based on moments. The error distribution studied by Szapudi & Colombi 1996 provides useful clues for resolving this apparent discrepancy. The distribution of errors is positively skewed, and increasingly so for higher order moments, so an upward fluctuation is more likely than a downward one. In the method proposed in this work, $q_3$ is estimated from the value of $q_{21}$. Its errors behave like those of the product of a second- and a first-order quantity, so the variance is reduced. Note that this is possible only after the hierarchy is established, i.e. prior information is used to reduce the scatter from cosmic errors. An accurate error estimate in this case would involve a tedious calculation, a non-trivial generalization of Szapudi & Colombi 1996.
The de-projection using the coefficients in SDES yields $Q_3 = 1$, $R_a = 0.8$, and $R_b = 3.7$, giving $Q_4 = 1.5$. This is to be compared with Fry & Peebles 1978, where the direct determination of the four-point correlation function from the Lick catalog yielded $R_a = 2.5 \pm 0.6$ and $R_b = 4.3 \pm 1.2$. These results could give a clue for solving the BBGKY equations in the highly non-linear regime. The assumption of Hamilton 1988, that only the snake graphs contribute, appears to be close to our results. Although both graph types contribute, the average is closer to the snake coefficient. The ansatz of Bernardeau & Schaeffer 1992, $R_a \approx Q_3$, is not a particularly good approximation. In conclusion, the statistics of the CCs are in excellent agreement with the HA. In conjunction with future data and $N$-body simulations, the method outlined here will be able to pin down the amplitudes of the higher order correlations with unprecedented accuracy.
We would like to acknowledge discussions with F. Bernardeau, S. Colombi, and J. Frieman, and improvements suggested by the referee, E. Gaztañaga. The original measurement of the factorial moment correlators was carried out in collaboration with G. Dalton and G. Efstathiou.
I.S. was supported by DOE and NASA through grant NAG-5-2788 at Fermilab. A.S.S. was
supported by a NASA LTSA grant.
5. Figure Caption
Lower Panel. The linear CCs, $q_{nm}$, the main raw results of the paper, are displayed up to fifth order as a function of the angular separation of the cells in degrees. The parallel, degenerate lines suggest the HA.
Middle Panel. The linear CCs are shown on a linear scale (solid lines) together with the prediction from PT (dotted lines). The agreement improves toward larger scales.
Upper Panel. The hierarchical amplitudes as calculated from the fully non-linear CCs are displayed. The long-dashed line corresponds to the estimator of $q_3$. The solid lines correspond to the estimators of $r_a$ and $r_b$, the amplitudes of the fourth-order snake and star graphs, respectively. The dotted lines show the linear approximation, which breaks down at smaller scales at this level of precision. The filled symbols mark $q_3$ (triangle) and $q_4$ (square) as calculated from the moment correlators. The open symbols show the same quantities as measured from the moments of counts in cells only. Finally, the crosses show the SDES measurements of $q_3$ (triangle) and $q_4$ (square) for comparison.
REFERENCES
Baugh, C.M., & Gaztañaga, E. 1996, MNRAS, 280, L37
Bernardeau, F. 1992, ApJ, 292, 1
Bernardeau, F. 1994, ApJ, 433, 1
Bernardeau, F. 1995, A&A, 301, 309
Bernardeau, F., & Schaeffer, R. 1992, A&A, 255, 1
Boschán, P., Szapudi, I., & Szalay, A. 1994, ApJS, 93, 65
Bouchet, F.R., Strauss, M.A., Davis, M., Fisher, K.B., Yahil, A., & Huchra, J.P. 1993, ApJ, 417, 36
Colombi, S., Bouchet, F.R., & Hernquist, L. 1995, A&A, 281, 301
Colombi, S., Bernardeau, F., Bouchet, F.R., & Hernquist, L. 1996, astro-ph/9610253
Fry, J.N., & Gaztañaga, E. 1993, ApJ, 413, 447
Fry, J.N., & Peebles, P.J.E. 1978, ApJ, 221, 19
Gaztañaga, E. 1992, ApJ, 319, L17
Gaztañaga, E. 1994, MNRAS, 268, 913
Gaztañaga, E., & Frieman, J.A. 1994, ApJ, 437, L13
Hamilton, A.J.S. 1988, ApJ, 332, 67
Juszkiewicz, R., Bouchet, F.R., & Colombi, S. 1993, ApJ, 412, L9
Maddox, S.J., Efstathiou, G., Sutherland, W.J., & Loveday, J. 1990a, MNRAS, 242, 43P
Maddox, S.J., Sutherland, W.J., Efstathiou, G., & Loveday, J. 1990b, MNRAS, 243, 692
Maddox, S.J., Sutherland, W.J., Efstathiou, G., & Loveday, J. 1990c, MNRAS, 246, 433
Meiksin, A., Szapudi, I., & Szalay, A. 1992, ApJ, 394, 87
Peebles, P.J.E. 1980, The Large Scale Structure of the Universe (Princeton: Princeton University Press)
Szapudi, I., & Colombi, S. 1996, ApJ, 470, 131
Szapudi, I., Dalton, G., Efstathiou, G.P., & Szalay, A. 1995, ApJ, 444, 520
Szapudi, I., Meiksin, A., & Nichol, R.C. 1996, ApJ, 473, 15
Szapudi, I., & Szalay, A. 1993, ApJ, 408, 43
Szapudi, I., Szalay, A., & Boschán, P. 1992, ApJ, 390, 350
This preprint was prepared with the AAS LATEX macros v4.0.