Gretl Guide
$$\sqrt{\frac{\sum (x - \bar{x}_i)^2}{T_i - 1}}$$
where $T_i$ denotes the number of valid observations on x for the given unit, $\bar{x}_i$ denotes the
group mean, and the summation is across valid observations for the group. If $T_i < 2$, however, the
standard deviation is recorded as 0.
One particular use of psd() may be worth noting. If you want to form a sub-sample of a panel that
contains only those units for which the variable x is time-varying, you can either use
smpl (pmin(x) < pmax(x)) --restrict
or
smpl (psd(x) > 0) --restrict
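To make the rule concrete, here is a minimal Python sketch (not gretl code) of the per-unit standard
deviation described above. The function name psd, the use of None for a missing observation and the
toy panel are assumptions made for illustration only.

```python
import math

def psd(values):
    """Sample standard deviation of one unit's observations.

    None stands in for a missing value (an assumption of this sketch).
    Returns 0.0 when fewer than two valid observations are available.
    """
    valid = [v for v in values if v is not None]
    t_i = len(valid)
    if t_i < 2:
        return 0.0
    mean_i = sum(valid) / t_i
    return math.sqrt(sum((v - mean_i) ** 2 for v in valid) / (t_i - 1))

# hypothetical panel: one list of observations per unit
panel = {
    "unit1": [2.0, 4.0, 6.0],    # time-varying
    "unit2": [5.0, 5.0, 5.0],    # constant over time
    "unit3": [7.0, None, None],  # a single valid observation
}

# analogue of keeping only units with psd(x) > 0
time_varying = [unit for unit, xs in panel.items() if psd(xs) > 0]
print(time_varying)
```

Only the first unit survives the filter: the second has zero variation and the third has too few
valid observations, so its standard deviation is recorded as 0.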
4.6 Missing data values
Representation and handling
Missing values are represented internally as DBL_MAX, the largest floating-point number that can be
represented on the system (which is likely to be at least 10 to the power 300, and so should not
be confused with legitimate data values). In a native-format data file they should be represented
as NA. When importing CSV data gretl accepts several common representations of missing values
including -999, the string NA (in upper or lower case), a single dot, or simply a blank cell. Blank cells
should, of course, be properly delimited, e.g. 120.6,,5.38, in which the middle value is presumed
missing.
As for handling of missing values in the course of statistical analysis, gretl does the following:
- In calculating descriptive statistics (mean, standard deviation, etc.) under the summary
  command, missing values are simply skipped and the sample size adjusted appropriately.
Chapter 4. Data les 28
- In running regressions gretl first adjusts the beginning and end of the sample range,
  truncating the sample if need be. Missing values at the beginning of the sample are common in
  time series work due to the inclusion of lags, first differences and so on; missing values at the
  end of the range are not uncommon due to differential updating of series and possibly the
  inclusion of leads.
If gretl detects any missing values inside the (possibly truncated) sample range for a regression,
the result depends on the character of the dataset and the estimator chosen. In many cases, the
program will automatically skip the missing observations when calculating the regression results.
In this situation a message is printed stating how many observations were dropped. On the other
hand, the skipping of missing observations is not supported for all procedures: exceptions include
all autoregressive estimators, system estimators such as SUR, and nonlinear least squares. In the
case of panel data, the skipping of missing observations is supported only if their omission leaves
a balanced panel. If missing observations are found in cases where they are not supported, gretl
gives an error message and refuses to produce estimates.
Manipulating missing values
Some special functions are available for the handling of missing values. The boolean function
missing() takes the name of a variable as its single argument; it returns a series with value 1 for
each observation at which the given variable has a missing value, and value 0 otherwise (that is, if
the given variable has a valid value at that observation). The function ok() is complementary to
missing; it is just a shorthand for !missing (where ! is the boolean NOT operator). For example,
one can count the missing values for variable x using
scalar nmiss_x = sum(missing(x))
The function zeromiss(), which again takes a single series as its argument, returns a series where
all zero values are set to the missing code. This should be used with caution (one does not want
to confuse missing values and zeros) but it can be useful in some contexts. For example, one can
determine the first valid observation for a variable x using
genr time
scalar x0 = min(zeromiss(time * ok(x)))
The function misszero() does the opposite of zeromiss, that is, it converts all missing values to
zero.
It may be worth commenting on the propagation of missing values within genr formulae. The
general rule is that in arithmetical operations involving two variables, if either of the variables has
a missing value at observation t then the resulting series will also have a missing value at t. The
one exception to this rule is multiplication by zero: zero times a missing value produces zero (since
this is mathematically valid regardless of the unknown value).
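The behaviour of these functions, and the propagation rule just stated, can be mimicked in a few
lines of Python. This is only an illustrative sketch: the list-based "series", the helper times()
and the sample data are assumptions, and the filter inside min() stands in for gretl's skipping of
missing values.

```python
import sys

NA = sys.float_info.max  # stand-in for DBL_MAX, the missing-value code

def missing(x):
    return [1 if v == NA else 0 for v in x]

def ok(x):
    return [1 - m for m in missing(x)]

def zeromiss(x):
    return [NA if v == 0 else v for v in x]

def misszero(x):
    return [0.0 if v == NA else v for v in x]

def times(a, b):
    """Element-wise product following the propagation rule in the text."""
    out = []
    for u, v in zip(a, b):
        if u == 0 or v == 0:
            out.append(0.0)   # zero times anything, even missing, is zero
        elif u == NA or v == NA:
            out.append(NA)    # otherwise a missing operand propagates
        else:
            out.append(u * v)
    return out

x = [NA, NA, 3.5, 0.0, 4.1]          # hypothetical series, missing at t = 1, 2
time = [1.0, 2.0, 3.0, 4.0, 5.0]     # analogue of "genr time"

nmiss_x = sum(missing(x))
first_valid = min(v for v in zeromiss(times(time, ok(x))) if v != NA)
print(nmiss_x, first_valid)
```

Note that the legitimate zero at the fourth observation counts as valid: ok() flags it with 1, so it
does not interfere with locating the first valid observation.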
4.7 Maximum size of data sets
Basically, the size of data sets (both the number of variables and the number of observations per
variable) is limited only by the characteristics of your computer. Gretl allocates memory dynami-
cally, and will ask the operating system for as much memory as your data require. Obviously, then,
you are ultimately limited by the size of RAM.
Aside from the multiple-precision OLS option, gretl uses double-precision floating-point numbers
throughout. The size of such numbers in bytes depends on the computer platform, but is typically
eight. To give a rough notion of magnitudes, suppose we have a data set with 10,000 observations
on 500 variables. That's 5 million floating-point numbers or 40 million bytes. If we define the
megabyte (MB) as 1024 × 1024 bytes, as is standard in talking about RAM, it's slightly over 38 MB.
The program needs additional memory for workspace, but even so, handling a data set of this size
should be quite feasible on a current PC, which at the time of writing is likely to have at least 256
MB of RAM.
If RAM is not an issue, there is one further limitation on data size (though it's very unlikely to
be a binding constraint). That is, variables and observations are indexed by signed integers, and
on a typical PC these will be 32-bit values, capable of representing a maximum positive value of
$2^{31} - 1 = 2,147,483,647$.
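The arithmetic behind these figures is easy to reproduce. The following Python lines are just a
worked check of the numbers quoted above; the variable names are arbitrary.

```python
n_obs = 10_000
n_vars = 500
bytes_per_double = 8                      # typical size of a double

n_values = n_obs * n_vars                 # number of floating-point values
n_bytes = n_values * bytes_per_double     # raw storage for the data
megabytes = n_bytes / (1024 * 1024)       # MB defined as 1024 x 1024 bytes

max_index = 2**31 - 1                     # largest signed 32-bit integer

print(n_values, n_bytes, round(megabytes, 2), max_index)
```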
The limits mentioned above apply to gretl's native functionality. There are tighter limits with
regard to two third-party programs that are available as add-ons to gretl for certain sorts of
time-series analysis including seasonal adjustment, namely TRAMO/SEATS and X-12-ARIMA. These
programs employ a fixed-size memory allocation, and can't handle series of more than 600
observations.
4.8 Data file collections
If you're using gretl in a teaching context you may be interested in adding a collection of data files
and/or scripts that relate specifically to your course, in such a way that students can browse and
access them easily.
Such collections of files can be accessed as follows:
- For data files: select the menu item "File, Open data, Sample file", or click on the folder icon
  on the gretl toolbar.
- For script files: select the menu item "File, Script files, Practice file".
When a user selects one of the items:
- The data or script files included in the gretl distribution are automatically shown (this includes
  files relating to Ramanathan's Introductory Econometrics and Greene's Econometric Analysis).
- The program looks for certain known collections of data files available as optional extras,
  for instance the datafiles from various econometrics textbooks (Davidson and MacKinnon,
  Gujarati, Stock and Watson, Verbeek, Wooldridge) and the Penn World Table (PWT 5.6). (See
  the data page at the gretl website for information on these collections.) If the additional files
  are found, they are added to the selection windows.
- The program then searches for valid file collections (not necessarily known in advance) in
  these places: the "system" data directory, the system script directory, the user directory,
  and all first-level subdirectories of these. For reference, typical values for these directories
  are shown in Table 4.1. (Note that PERSONAL is a placeholder that is expanded by Windows,
  corresponding to "My Documents" on English-language systems.)
                    Linux                       MS Windows
system data dir     /usr/share/gretl/data       c:\Program Files\gretl\data
system script dir   /usr/share/gretl/scripts    c:\Program Files\gretl\scripts
user dir            $HOME/gretl                 PERSONAL\gretl

Table 4.1: Typical locations for file collections
Any valid collections will be added to the selection windows. So what constitutes a valid file
collection? This comprises either a set of data files in gretl XML format (with the .gdt suffix) or a set of
script files containing gretl commands (with .inp suffix), in each case accompanied by a "master
file" or catalog. The gretl distribution contains several example catalog files, for instance the file
descriptions in the misc sub-directory of the gretl data directory and ps_descriptions in the
misc sub-directory of the scripts directory.
If you are adding your own collection, data catalogs should be named descriptions and script
catalogs should be named ps_descriptions. In each case the catalog should be placed (along
with the associated data or script files) in its own specific sub-directory (e.g.
/usr/share/gretl/data/mydata or c:\userdata\gretl\data\mydata).
The syntax of the (plain text) description files is straightforward. Here, for example, are the first
few lines of gretl's misc data catalog:
# Gretl: various illustrative datafiles
"arma","artificial data for ARMA script example"
"ects_nls","Nonlinear least squares example"
"hamilton","Prices and exchange rate, U.S. and Italy"
The first line, which must start with a hash mark, contains a short name, here "Gretl", which
will appear as the label for this collection's tab in the data browser window, followed by a colon,
followed by an optional short description of the collection.
Subsequent lines contain two elements, separated by a comma and wrapped in double quotation
marks. The first is a datafile name (leave off the .gdt suffix here) and the second is a short
description of the content of that datafile. There should be one such line for each datafile in the
collection.
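As a way of checking a catalog before distributing it, the format can be read with a few lines of
Python. This is a hypothetical helper, not part of gretl: the function name parse_catalog is an
assumption, and the sample text simply repeats the catalog lines shown above. The same approach
carries over to catalogs with more fields per line.

```python
import csv
import io

def parse_catalog(text):
    """Split a catalog into (short name, description, list of entries)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    if not header.startswith("#"):
        raise ValueError("first line must start with a hash mark")
    # short name, then a colon, then the optional description
    name, _, description = header[1:].partition(":")
    # remaining lines: comma-separated fields wrapped in double quotes
    entries = list(csv.reader(io.StringIO("\n".join(lines[1:]))))
    return name.strip(), description.strip(), entries

sample = '''# Gretl: various illustrative datafiles
"arma","artificial data for ARMA script example"
"ects_nls","Nonlinear least squares example"
"hamilton","Prices and exchange rate, U.S. and Italy"
'''

name, description, entries = parse_catalog(sample)
print(name, "|", description)
for entry in entries:
    print(entry)
```

Using a CSV reader rather than splitting on commas matters here: the description of the hamilton
file itself contains a comma, which the quotation marks protect.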
A script catalog file looks very similar, except that there are three fields in the file lines: a filename
(without its .inp suffix), a brief description of the econometric point illustrated in the script, and
a brief indication of the nature of the data used. Again, here are the first few lines of the supplied
misc script catalog:
# Gretl: various sample scripts
"arma","ARMA modeling","artificial data"
"ects_nls","Nonlinear least squares (Davidson)","artificial data"
"leverage","Influential observations","artificial data"
"longley","Multicollinearity","US employment"
If you want to make your own data collection available to users, these are the steps:
1. Assemble the data, in whatever format is convenient.
2. Convert the data to gretl format and save as gdt files. It is probably easiest to convert the data
   by importing them into the program from plain text, CSV, or a spreadsheet format (MS Excel
   or Gnumeric) then saving them. You may wish to add descriptions of the individual variables
   (the "Variable, Edit attributes" menu item), and add information on the source of the data (the
   "Data, Edit info" menu item).
3. Write a descriptions file for the collection using a text editor.
4. Put the datafiles plus the descriptions file in a subdirectory of the gretl data directory (or user
   directory).
5. If the collection is to be distributed to other people, package the data files and catalog in some
   suitable manner, e.g. as a zipfile.
If you assemble such a collection, and the data are not proprietary, we would encourage you to
submit the collection for packaging as a gretl optional extra.
Chapter 5
Special functions in genr
5.1 Introduction
The genr command provides a flexible means of defining new variables. It is documented in the
Gretl Command Reference. This chapter offers a more expansive discussion of some of the special
functions available via genr and some of the finer points of the command.
5.2 Long-run variance
As is well known, the variance of the average of T random variables $x_1, x_2, \ldots, x_T$ with equal
variance $\sigma^2$ equals $\sigma^2/T$ if the data are uncorrelated. In this case, the sample variance of
$x_t$ over the sample size provides a consistent estimator.
If, however, there is serial correlation among the $x_t$s, the variance of
$\bar{X} = T^{-1} \sum_{t=1}^{T} x_t$ must be estimated differently. One of the most widely used
statistics for this purpose is a nonparametric kernel estimator with the Bartlett kernel defined as
$$\hat{\omega}^2(k) = T^{-1} \sum_{t=k}^{T-k} \left[ \sum_{i=-k}^{k} w_i (x_t - \bar{X})(x_{t-i} - \bar{X}) \right], \qquad (5.1)$$
where the integer k is known as the window size and the $w_i$ terms are the so-called Bartlett weights,
defined as $w_i = 1 - \frac{|i|}{k+1}$. It can be shown that, for k large enough, $\hat{\omega}^2(k)/T$
yields a consistent estimator of the variance of $\bar{X}$.
Gretl implements this estimator by means of the function lrvar(), which takes two arguments:
the series whose long-run variance must be estimated and the scalar k. If k is negative, the popular
choice $T^{1/3}$ is used.
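For readers who want to see the estimator spelled out, here is a rough Python sketch of equation
(5.1). It is not gretl's implementation of lrvar(): the function names are invented, and the range
of the outer sum is an assumption, chosen so that every index t - i stays inside the sample.

```python
def bartlett_weight(i, k):
    """Bartlett weight w_i = 1 - |i| / (k + 1)."""
    return 1.0 - abs(i) / (k + 1)

def bartlett_lrvar(x, k):
    """Kernel estimate of the long-run variance with window size k >= 0."""
    T = len(x)
    xbar = sum(x) / T
    d = [v - xbar for v in x]          # deviations from the sample mean
    total = 0.0
    # assumption: 0-based t runs over k .. T-k-1 so that t - i is always valid
    for t in range(k, T - k):
        for i in range(-k, k + 1):
            total += bartlett_weight(i, k) * d[t] * d[t - i]
    return total / T

x = [1.0, 2.0, 3.0, 4.0]               # toy data
print(bartlett_lrvar(x, 0))            # k = 0: no autocovariance terms enter
print(bartlett_lrvar(x, 1))
```

With k = 0 only the i = 0 term survives and the expression collapses to the ordinary variance
(with divisor T), which gives a simple sanity check. Dividing the result by T gives the
corresponding estimate of the variance of the sample mean.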
5.3 Cumulative densities and p-values
The two functions cdf and pvalue provide complementary means of examining values from several
probability distributions: the standard normal, Student's t, $\chi^2$, F, gamma, and binomial. The syntax
of these functions is set out in the Gretl Command Reference; here we expand on some subtleties.
The cumulative distribution function or CDF for a random variable is the integral of the variable's
density from its lower limit (typically either $-\infty$ or 0) to any specified value x. The p-value (at
least the one-tailed, right-hand p-value as returned by the pvalue function) is the complementary
probability, the integral from x to the upper limit of the distribution, typically $+\infty$.
In principle, therefore, there is no need for two distinct functions: given a CDF value $p_0$ you could
easily find the corresponding p-value as $1 - p_0$ (or vice versa). In practice, with finite-precision
computer arithmetic, the two functions are not redundant. This requires a little explanation. In
gretl, as in most statistical programs, floating point numbers are represented as "doubles":
double-precision values that typically have a storage size of eight bytes or 64 bits. Since there are
only so many bits available, only so many floating-point numbers can be represented: doubles do
not model the real line. Typically doubles can represent numbers over the range (roughly)
$\pm 1.7977 \times 10^{308}$, but only to about 15 digits of precision.
Suppose you're interested in the left tail of the $\chi^2$ distribution with 50 degrees of freedom: you'd
like to know the CDF value for x = 0.9. Take a look at the following interactive session:
? genr p1 = cdf(X, 50, 0.9)
Generated scalar p1 (ID 2) = 8.94977e-35
? genr p2 = pvalue(X, 50, 0.9)
Generated scalar p2 (ID 3) = 1
? genr test = 1 - p2
Generated scalar test (ID 4) = 0
The cdf function has produced an accurate value, but the pvalue function gives an answer of 1,
from which it is not possible to retrieve the answer to the CDF question. This may seem surprising
at first, but consider: if the value of p1 above is correct, then the correct value for p2 is
$1 - 8.94977 \times 10^{-35}$. But there's no way that value can be represented as a double: that would
require over 30 digits of precision.
Of course this is an extreme example. If the x in question is not too far off into one or other tail
of the distribution, the cdf and pvalue functions will in fact produce complementary answers, as
shown below:
? genr p1 = cdf(X, 50, 30)
Generated scalar p1 (ID 2) = 0.0111648
? genr p2 = pvalue(X, 50, 30)
Generated scalar p2 (ID 3) = 0.988835
? genr test = 1 - p2
Generated scalar test (ID 4) = 0.0111648
But the moral is that if you want to examine extreme values you should be careful in selecting the
function you need, in the knowledge that values very close to zero can be represented as doubles
while values very close to 1 cannot.
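The same rounding behaviour can be observed in any environment that uses doubles. The short Python
check below is not gretl code; it simply takes the CDF value reported in the first session above
and shows what happens to its complement.

```python
import sys

p1 = 8.94977e-35       # CDF value from the session above: tiny but representable
p2 = 1.0 - p1          # the complementary probability, computed in doubles

print(p2 == 1.0)       # the subtraction rounds to exactly 1
print(1.0 - p2)        # so the tail probability cannot be recovered from p2

# spacing between 1.0 and the next larger double, and the largest double
print(sys.float_info.epsilon)
print(sys.float_info.max)
```

The spacing of doubles near 1 is many orders of magnitude larger than p1, which is why the
information is lost as soon as the complement is formed.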
5.4 Retrieving internal variables
The genr command provides a means of retrieving various values calculated by the program in
the course of estimating models or testing hypotheses. The variables that can be retrieved in this
way are listed in the Gretl Command Reference; here we say a bit more about the special variables
$test and $pvalue.
These variables hold, respectively, the value of the last test statistic calculated using an explicit
testing command and the p-value for that test statistic. If no such test has been performed at the
time when these variables are referenced, they will produce the missing value code. The explicit
testing commands that work in this way are as follows: add (joint test for the significance of
variables added to a model); adf (Augmented Dickey-Fuller test, see below); arch (test for ARCH); chow
(Chow test for a structural break); coeffsum (test for the sum of specified coefficients); cusum (the
Harvey-Collier t-statistic); kpss (KPSS stationarity test, no p-value available); lmtest (see below);
meantest (test for difference of means); omit (joint test for the significance of variables omitted
from a model); reset (Ramsey's RESET); restrict (general linear restriction); runs (runs test for
randomness); testuhat (test for normality of residual); and vartest (test for difference of
variances). In most cases both a $test and a $pvalue are stored; the exception is the KPSS test, for
which a p-value is not currently available.
An important point to notice about this mechanism is that the internal variables $test and $pvalue
are over-written each time one of the tests listed above is performed. If you want to reference these
values, you must do so at the correct point in the sequence of gretl commands.
A related point is that some of the test commands generate, by default, more than one test statistic
and p-value; in these cases only the last values are stored. To get proper control over the retrieval
of values via $test and $pvalue you should formulate the test command in such a way that the
result is unambiguous. This comment applies in particular to the adf and lmtest commands.
- By default, the adf command generates three variants of the Dickey-Fuller test: one based
  on a regression including a constant, one using a constant and linear trend, and one using a
  constant and a quadratic trend. When you wish to reference $test or $pvalue in connection
  with this command, you can control the variant that is recorded by using one of the flags
  --nc, --c, --ct or --ctt with adf.
- By default, the lmtest command (which must follow an OLS regression) performs several
  diagnostic tests on the regression in question. To control what is recorded in $test and
  $pvalue you should limit the test using one of the flags --logs, --autocorr, --squares or
  --white.
As an aid in working with values retrieved using $test and $pvalue, the nature of the test to which
these values relate is written into the descriptive label for the generated variable. You can read the
label for the variable using the label command (with just one argument, the name of the variable),
to check that you have retrieved the right value. The following interactive session illustrates this
point.
? adf 4 x1 --c
Augmented Dickey-Fuller tests, order 4, for x1
sample size 59
unit-root null hypothesis: a = 1
test with constant
model: (1 - L)y = b0 + (a-1)*y(-1) + ... + e
estimated value of (a - 1): -0.216889
test statistic: t = -1.83491
asymptotic p-value 0.3638
P-values based on MacKinnon (JAE, 1996)
? genr pv = $pvalue
Generated scalar pv (ID 13) = 0.363844
? label pv
pv=Dickey-Fuller pvalue (scalar)
5.5 The discrete Fourier transform
The discrete Fourier transform can be best thought of as a linear, invertible transform of a complex
vector. Hence, if x is an n-dimensional vector whose k-th element is $x_k = a_k + i b_k$, then the output
of the discrete Fourier transform is a vector $f = \mathcal{F}(x)$ whose k-th element is
$$f_k = \sum_{j=0}^{n-1} e^{-i\omega(j,k)} x_j$$
where $\omega(j,k) = 2\pi \frac{jk}{n}$. Since the transformation is invertible, the vector x can be
recovered from f via the so-called inverse transform
$$x_k = \frac{1}{n} \sum_{j=0}^{n-1} e^{i\omega(j,k)} f_j .$$
The Fourier transform is used in many diverse situations on account of this key property: the
convolution of two vectors can be performed efficiently by multiplying the elements of their Fourier
transforms and inverting the result. If
$$z_k = \sum_{j=1}^{n} x_j y_{k-j},$$
then
$$\mathcal{F}(z) = \mathcal{F}(x) \odot \mathcal{F}(y).$$
That is, $\mathcal{F}(z)_k = \mathcal{F}(x)_k \, \mathcal{F}(y)_k$.
For computing the Fourier transform, gretl uses the external library fftw3: see Frigo and Johnson
(2005). This guarantees extreme speed and accuracy. In fact, the CPU time needed to perform
the transform is O(n log n) for any n. This is why the array of numerical techniques employed in
fftw3 is commonly known as the Fast Fourier Transform.
Gretl provides two matrix functions [1] for performing the Fourier transform and its inverse: fft and
ffti. In fact, gretl's implementation of the Fourier transform is somewhat more specialized: the
input to the fft function is understood to be real. Conversely, ffti takes a complex argument and
delivers a real result. For example:
x1 = { 1 ; 2 ; 3 }
# perform the transform
f = fft(x1)
# perform the inverse transform
x2 = ffti(f)
yields
$$x_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad
f = \begin{bmatrix} 6 & 0 \\ -1.5 & 0.866 \\ -1.5 & -0.866 \end{bmatrix} \qquad
x_2 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
where the first column of f holds the real part and the second holds the imaginary part. In general,
if the input to fft has n columns, the output has 2n columns, where the real parts are stored in
the odd columns and the imaginary parts in the even ones. Should it be necessary to compute the
Fourier transform on several vectors with the same number of elements, it is numerically more
efficient to group them into a matrix rather than invoking fft for each vector separately.
As an example, consider the multiplication of two polynomials:
$$a(x) = 1 + 0.5x$$
$$b(x) = 1 + 0.3x - 0.8x^2$$
$$c(x) = a(x) \cdot b(x) = 1 + 0.8x - 0.65x^2 - 0.4x^3$$
The coefficients of the polynomial c(x) are the convolution of the coefficients of a(x) and b(x);
the following gretl code fragment illustrates how to compute the coefficients of c(x):
# define the two polynomials as column vectors (note the transpose)
a = { 1, 0.5, 0, 0 }'
b = { 1, 0.3, -0.8, 0 }'
# perform the transforms
fa = fft(a)
fb = fft(b)
# complex-multiply the two transforms
fc = cmult(fa, fb)
# compute the coefficients of c via the inverse transform
c = ffti(fc)
Maximum efficiency would have been achieved by grouping a and b into a matrix. The computational
advantage is so small in this case that the exercise is a bit silly, but the following alternative
may be preferable for a large number of rows/columns:
# define the two polynomials
a = { 1 ; 0.5 ; 0 ; 0 }
b = { 1 ; 0.3 ; -0.8 ; 0 }
# perform the transforms jointly
f = fft(a ~ b)
# complex-multiply the two transforms
fc = cmult(f[,1:2], f[,3:4])
# compute the coefficients of c via the inverse transform
c = ffti(fc)
[1] See chapter 13.
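The polynomial example can be cross-checked outside gretl. The Python sketch below uses a naive
O(n²) discrete Fourier transform written directly from the definition, with a negative exponent in
the forward transform and the factor 1/n in the inverse; it is not the fast algorithm that fftw3
provides, and the function name dft is an assumption.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform of a sequence (forward or inverse)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n))
        out.append(s / n if inverse else s)
    return out

# coefficients of a(x) and b(x), padded with zeros to the length of c(x)
a = [1, 0.5, 0, 0]
b = [1, 0.3, -0.8, 0]

fa, fb = dft(a), dft(b)
fc = [u * v for u, v in zip(fa, fb)]     # element-wise complex product
c = [round(z.real, 10) for z in dft(fc, inverse=True)]
print(c)

# the three-element example from the text
f = dft([1, 2, 3])
print([(round(z.real, 3), round(z.imag, 3)) for z in f])
```

The zero padding matters: the product of transforms corresponds to a circular convolution, so the
vectors must be long enough to hold all four coefficients of c(x) without wrap-around.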
Traditionally, the Fourier transform in econometrics has been mostly used in time-series analysis,
the periodogram being the best known example. Example script 5.1 shows how to compute the
periodogram of a time series via the fft function.
Example 5.1: Periodogram via the Fourier transform
nulldata 50
# generate an AR(1) process
series e = normal()
series x = 0
x = 0.9*x(-1) + e
# compute the periodogram
scale = 2*pi*$nobs
X = { x }
F = fft(X)
S = sumr(F.^2)
S = S[2:($nobs/2)+1]/scale
omega = seq(1,($nobs/2)) .* (2*pi/$nobs)
omega = omega ~ S
# compare the built-in command
pergm x
print omega
Chapter 6
Sub-sampling a dataset
6.1 Introduction
Some subtle issues can arise here. This chapter attempts to explain the issues.
A sub-sample may be defined in relation to a full data set in two different ways: we will refer to these
as "setting" the sample and "restricting" the sample; these methods are discussed in sections 6.2
and 6.3 respectively. In addition section 6.4 discusses resampling with replacement, which is useful
in the context of bootstrapping test statistics.
The following discussion focuses on the command-line approach. But you can also invoke the
methods outlined here via the items under the Sample menu in the GUI program.
6.2 Setting the sample
By "setting the sample" we mean defining a sub-sample simply by means of adjusting the starting
and/or ending point of the current sample range. This is likely to be most relevant for time-series
data. For example, one has quarterly data from 1960:1 to 2003:4, and one wants to run a regression
using only data from the 1970s. A suitable command is then
smpl 1970:1 1979:4
Or one wishes to set aside a block of observations at the end of the data period for out-of-sample
forecasting. In that case one might do
smpl ; 2000:4
where the semicolon is shorthand for "leave the starting observation unchanged". (The semicolon
may also be used in place of the second parameter, to mean that the ending observation should be
unchanged.) By "unchanged" here, we mean unchanged relative to the last smpl setting, or relative
to the full dataset if no sub-sample has been defined up to this point. For example, after
smpl 1970:1 2003:4
smpl ; 2000:4
the sample range will be 1970:1 to 2000:4.
An incremental or relative form of setting the sample range is also supported. In this case a relative
offset should be given, in the form of a signed integer (or a semicolon to indicate no change), for
both the starting and ending point. For example
smpl +1 ;
will advance the starting observation by one while preserving the ending observation, and
smpl +2 -1
will both advance the starting observation by two and retard the ending observation by one.
An important feature of "setting the sample" as described above is that it necessarily results in
the selection of a subset of observations that are contiguous in the full dataset. The structure of
the dataset is therefore unaffected (for example, if it is a quarterly time series before setting the
sample, it remains a quarterly time series afterwards).
6.3 Restricting the sample
By "restricting the sample" we mean selecting observations on the basis of some Boolean (logical)
criterion, or by means of a random number generator. This is likely to be most relevant for
cross-sectional or panel data.
Suppose we have data on a cross-section of individuals, recording their gender, income and other
characteristics. We wish to select for analysis only the women. If we have a gender dummy variable
with value 1 for men and 0 for women we could do
smpl gender=0 --restrict
to this effect. Or suppose we want to restrict the sample to respondents with incomes over $50,000.
Then we could use
smpl income>50000 --restrict
A question arises here. If we issue the two commands above in sequence, what do we end up with
in our sub-sample: all cases with income over 50000, or just women with income over 50000? By
default, in a gretl script, the answer is the latter: women with income over 50000. The second
restriction augments the first, or in other words the final restriction is the logical product of the
new restriction and any restriction that is already in place. If you want a new restriction to replace
any existing restrictions you can first recreate the full dataset using
smpl --full
Alternatively, you can add the --replace option to the smpl command:
smpl income>50000 --restrict --replace
This option has the effect of automatically re-establishing the full dataset before applying the new
restriction.
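The difference between cumulative and replacing restrictions can be illustrated with a toy model
in Python. This is only a sketch of the logic described above, not of gretl's internals: the
function restrict, its replace argument and the four-row dataset are all invented for the purpose.

```python
# hypothetical cross-section: gender is 1 for men, 0 for women
full = [
    {"gender": 0, "income": 62000},
    {"gender": 1, "income": 75000},
    {"gender": 0, "income": 41000},
    {"gender": 1, "income": 38000},
]

def restrict(current, full, condition, replace=False):
    """Apply a Boolean restriction to the current sample.

    By default the new restriction is combined with whatever is already
    in place; with replace=True the full dataset is restored first.
    """
    base = full if replace else current
    return [row for row in base if condition(row)]

women = restrict(full, full, lambda r: r["gender"] == 0)
rich_women = restrict(women, full, lambda r: r["income"] > 50000)
rich_all = restrict(women, full, lambda r: r["income"] > 50000, replace=True)

print(len(women), len(rich_women), len(rich_all))
```

The second call keeps only women with income over 50000, the logical product of the two
conditions, while the third ignores the earlier gender restriction.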
Unlike a simple "setting" of the sample, "restricting" the sample may result in selection of
non-contiguous observations from the full data set. It may also change the structure of the data set.
This can be seen in the case of panel data. Say we have a panel of five firms (indexed by the variable
firm) observed in each of several years (identified by the variable year). Then the restriction
smpl year=1995 --restrict
produces a dataset that is not a panel, but a cross-section for the year 1995. Similarly
smpl firm=3 --restrict
produces a time-series dataset for firm number 3.
For these reasons (possible non-contiguity in the observations, possible change in the structure of
the data), gretl acts differently when you "restrict" the sample as opposed to simply "setting" it. In
the case of setting, the program merely records the starting and ending observations and uses these
as parameters to the various commands calling for the estimation of models, the computation of
statistics, and so on. In the case of restriction, the program makes a reduced copy of the dataset
and by default treats this reduced copy as a simple, undated cross-section. [1]
If you wish to re-impose a time-series or panel interpretation of the reduced dataset you can do so
using the setobs command, or the GUI menu item "Data, Dataset structure".
The fact that restricting the sample results in the creation of a reduced copy of the original
dataset may raise an issue when the dataset is very large (say, several thousands of observations).
With such a dataset in memory, the creation of a copy may lead to a situation where the computer
runs low on memory for calculating regression results. You can work around this as follows:
1. Open the full data set, and impose the sample restriction.
2. Save a copy of the reduced data set to disk.
3. Close the full dataset and open the reduced one.
4. Proceed with your analysis.
Random sub-sampling
Besides restricting the sample on some deterministic criterion, it may sometimes be useful (when
working with very large datasets, or perhaps to study the properties of an estimator) to draw a
random sub-sample from the full dataset. This can be done using, for example,
smpl 100 --random
to select 100 cases. If you want the sample to be reproducible, you should set the seed for the
random number generator first, using set. This sort of sampling falls under the "restriction"
category: a reduced copy of the dataset is made.
6.4 Resampling and bootstrapping
Given an original data series x, the command
series xr = resample(x)
creates a new series each of whose elements is drawn at random from the elements of x. If the
original series has 100 observations, each element of x is selected with probability 1/100 at each
drawing. Thus the effect is to "shuffle" the elements of x, with the twist that each element of x may
appear more than once, or not at all, in xr.
The primary use of this function is in the construction of bootstrap confidence intervals or p-values.
Here is a simple example. Suppose we estimate a simple regression of y on x via OLS and find that
the slope coefficient has a reported t-ratio of 2.5 with 40 degrees of freedom. The two-tailed
p-value for the null hypothesis that the slope parameter equals zero is then 0.0166, using the t(40)
distribution. Depending on the context, however, we may doubt whether the ratio of coefficient to
standard error truly follows the t(40) distribution. In that case we could derive a bootstrap p-value
as shown in Example 6.1.
Under the null hypothesis that the slope with respect to x is zero, y is simply equal to its mean plus
an error term. We simulate y by resampling the residuals from the initial OLS and re-estimate the
model. We repeat this procedure a large number of times, and record the number of cases where
the absolute value of the t-ratio is greater than 2.5: the proportion of such cases is our bootstrap
p-value. For a good discussion of simulation-based tests and bootstrapping, see Davidson and
MacKinnon (2004, chapter 4).
[1] With one exception: if you start with a balanced panel dataset and the restriction is such that it
preserves a balanced panel (for example, it results in the deletion of all the observations for one
cross-sectional unit) then the reduced dataset is still, by default, treated as a panel.
Example 6.1: Calculation of bootstrap p-value
ols y 0 x
# save the residuals
genr ui = $uhat
scalar ybar = mean(y)
# number of replications for bootstrap
scalar replics = 10000
scalar tcount = 0
series ysim = 0
loop replics --quiet
  # generate simulated y by resampling
  ysim = ybar + resample(ui)
  ols ysim 0 x
  scalar tsim = abs($coeff(x) / $stderr(x))
  tcount += (tsim > 2.5)
endloop
printf "proportion of cases with |t| > 2.5 = %g\n", tcount / replics
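For comparison, the same residual-resampling procedure can be sketched in plain Python. This is an
illustration under stated assumptions rather than a translation of gretl's internals: the data are
artificial (42 observations, so the regression has 40 degrees of freedom), the function names are
invented, and the simulated t-ratios are compared with the t-ratio from the original regression
instead of the fixed value 2.5.

```python
import math
import random

def slope_t_ratio(y, x):
    """Return (t-ratio of the slope, residuals) for OLS of y on a constant and x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)     # residual variance
    return slope / math.sqrt(s2 / sxx), resid

def bootstrap_pvalue(y, x, replics=2000, seed=12345):
    """Share of simulated |t| values exceeding the observed |t| under the null."""
    rng = random.Random(seed)                    # fixed seed for reproducibility
    t_obs, resid = slope_t_ratio(y, x)
    ybar = sum(y) / len(y)
    count = 0
    for _ in range(replics):
        # under the null, y is its mean plus a resampled residual
        ysim = [ybar + rng.choice(resid) for _ in range(len(y))]
        t_sim = slope_t_ratio(ysim, x)[0]
        if abs(t_sim) > abs(t_obs):
            count += 1
    return t_obs, count / replics

# artificial data, for illustration only
data_rng = random.Random(1)
x = [i / 10 for i in range(42)]
y = [1.0 + 0.3 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

t_obs, p_boot = bootstrap_pvalue(y, x)
print("t-ratio = %g, bootstrap p-value = %g" % (t_obs, p_boot))
```

Drawing each simulated residual independently with rng.choice is sampling with replacement, which
is what resample() does in the gretl script.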
Chapter 7
Graphs and plots
7.1 Gnuplot graphs
A separate program, gnuplot, is called to generate graphs. Gnuplot is a very full-featured graphing
program with myriad options. It is available from www.gnuplot.info (but note that a suitable copy
of gnuplot is bundled with the packaged versions of gretl for MS Windows and Mac OS X). gretl
gives you direct access, via a graphical interface, to a subset of gnuplot's options and it tries to
choose sensible values for you; it also allows you to take complete control over graph details if you
wish.
With a graph displayed, you can click on the graph window for a pop-up menu with the following
options.
Save as PNG: Save the graph in Portable Network Graphics format (the same format that you
see on screen).
Save as postscript: Save in encapsulated postscript (EPS) format.
Save as Windows metafile: Save in Enhanced Metafile (EMF) format.
Save to session as icon: The graph will appear in iconic form when you select Icon view from
the View menu.
Zoom: Lets you select an area within the graph for closer inspection (not available for all
graphs).
Print: (Current GTK or MS Windows only) lets you print the graph directly.
Copy to clipboard: MS Windows only, lets you paste the graph into Windows applications such
as MS Word.
Edit: Opens a controller for the plot which lets you adjust many aspects of its appearance.
Close: Closes the graph window.
Displaying data labels
For simple X-Y scatter plots, some further options are available if the dataset includes "case markers"
(that is, labels identifying each observation).[1] With a scatter plot displayed, when you move
the mouse pointer over a data point its label is shown on the graph. By default these labels are
transient: they do not appear in the printed or copied version of the graph. They can be removed by
selecting "Clear data labels" from the graph pop-up menu. If you want the labels to be affixed
permanently (so they will show up when the graph is printed or copied), select the option "Freeze data
labels" from the pop-up menu; "Clear data labels" cancels this operation. The other label-related
option, "All data labels", requests that case markers be shown for all observations. At present the
display of case markers is disabled for graphs containing more than 250 data points.
[1] For an example of such a dataset, see the Ramanathan file data4-10: this contains data on private school enrollment
for the 50 states of the USA plus Washington, DC; the case markers are the two-letter codes for the states.
GUI plot editor
Selecting the Edit option in the graph popup menu opens an editing dialog box, shown in Figure 7.1.
Notice that there are several tabs, allowing you to adjust many aspects of a graph's appearance:
font, title, axis scaling, line colors and types, and so on. You can also add lines or descriptive labels
to a graph (under the Lines and Labels tabs). The "Apply" button applies your changes without
closing the editor; "OK" applies the changes and closes the dialog.
Figure 7.1: gretl's gnuplot controller
Publication-quality graphics: advanced options
The GUI plot editor has two limitations. First, it cannot represent all the myriad options that
gnuplot offers. Users who are sufficiently familiar with gnuplot to know what they're missing in
the plot editor presumably don't need much help from gretl, so long as they can get hold of the
gnuplot command file that gretl has put together. Second, even if the plot editor meets your needs,
in terms of fine-tuning the graph you see on screen, a few details may need further work in order
to get optimal results for publication.
Either way, the first step in advanced tweaking of a graph is to get access to the graph command
file.
In the graph display window, right-click and choose "Save to session as icon".
If it's not already open, open the icon view window, either via the menu item View/Icon
view, or by clicking the "session icon view" button on the main-window toolbar.
Right-click on the icon representing the newly added graph and select "Edit plot commands"
from the pop-up menu.
You get a window displaying the plot file (Figure 7.2).
Here are the basic things you can do in this window. Obviously, you can edit the file you just
opened. You can also send it for processing by gnuplot, by clicking the "Execute" (cogwheel) icon
in the toolbar. Or you can use the "Save as" button to save a copy for editing and processing as you
wish.
Figure 7.2: Plot commands editor
Unless you're a gnuplot expert, most likely you'll only need to edit a couple of lines at the top of
the file, specifying a driver (plus options) and an output file. We offer here a brief summary of some
points that may be useful.
First, gnuplot's output mode is set via the command set term followed by the name of a supported
driver ("terminal" in gnuplot parlance) plus various possible options. (The top line in the plot
commands window shows the set term line that gretl used to make a PNG file, commented out.)
The graphic formats that are most suitable for publication are PDF and EPS. These are supported
by the gnuplot term types pdf, pdfcairo and postscript (with the eps option). The pdfcairo
driver has the virtue that it behaves in a very similar manner to the PNG one, the output of which
you see on screen. This is provided by the version of gnuplot that is included in the gretl packages
for MS Windows and Mac OS X; if you're on Linux it may or may not be supported. If pdfcairo is not
available, the pdf terminal may be available; the postscript terminal is almost certainly available.
Besides selecting a term type, if you want to get gnuplot to write the actual output file you need
to append a set output line giving a filename. Here are a few examples of the first two lines you
might type in the window editing your plot commands. We'll make these more realistic shortly.
set term pdfcairo
set output 'mygraph.pdf'

set term pdf
set output 'mygraph.pdf'

set term postscript eps
set output 'mygraph.eps'
There are a couple of things worth remarking here. First, you may want to adjust the size of the
graph, and second you may want to change the font. The default sizes produced by the above
drivers are 5 inches by 3 inches for pdfcairo and pdf, and 5 inches by 3.5 inches for postscript
eps. In each case you can change this by giving a size specification, which takes the form XX,YY
(examples below).
You may ask, why bother changing the size in the gnuplot command file? After all, PDF and EPS are
both vector formats, so the graphs can be scaled at will. True, but a uniform scaling will also affect
the font size, which may end up looking wrong. You can get optimal results by experimenting with
the font and size options to gnuplot's set term command. Here are some examples (comments
follow below).
# pdfcairo, regular size, slightly amended
set term pdfcairo font "Sans,6" size 5in,3.5in
# or small size
set term pdfcairo font "Sans,5" size 3in,2in
# pdf, regular size, slightly amended
set term pdf font "Helvetica,8" size 5in,3.5in
# or small
set term pdf font "Helvetica,6" size 3in,2in
# postscript, regular
set term post eps solid font "Helvetica,16"
# or small
set term post eps solid font "Helvetica,12" size 3in,2in
On the first line we set a sans serif font for pdfcairo at a suitable size for a 5 × 3.5 inch plot
(which you may find looks better than the rather "letterboxy" default of 5 × 3). And on the second
we illustrate what you might do to get a smaller 3 × 2 inch plot. You can specify the plot size in
centimeters if you prefer, as in
set term pdfcairo font "Sans,6" size 6cm,4cm
We then repeat the exercise for the pdf terminal. Notice that here we're specifying one of the 35
standard PostScript fonts, namely Helvetica. Unlike pdfcairo, the plain pdf driver is unlikely to
be able to find fonts other than these.
In the third pair of lines we illustrate options for the postscript driver (which, as you see, can
be abbreviated as post). Note that here we have added the option solid. Unlike most other
drivers, this one uses dashed lines unless you specify the solid option. Also note that we've
(apparently) specified a much larger font in this case. That's because the eps option in effect tells
the postscript driver to work at half-size (among other things), so we need to double the font
size.
Table 7.1 summarizes the basics for the three drivers we have mentioned.
Terminal    default size (inches)    suggested font
pdfcairo    5 × 3                    Sans,6
pdf         5 × 3                    Helvetica,8
post eps    5 × 3.5                  Helvetica,16
Table 7.1: Drivers for publication-quality graphics
To find out more about gnuplot visit www.gnuplot.info. This site has documentation for the current
version of the program in various formats.
Additional tips
To be written. Line widths, enhanced text. Show a before and after example.
7.2 Boxplots
These plots (after Tukey and Chambers) display the distribution of a variable. The central box
encloses the middle 50 percent of the data, i.e. it is bounded by the first and third quartiles. The
"whiskers" extend to the minimum and maximum values. A line is drawn across the box at the
median and a "+" sign identifies the mean (see Figure 7.3).
Figure 7.3: Sample boxplot (variable ENROLL; annotations mark Q1, the median, Q3 and the mean)
In the case of boxplots with confidence intervals, dotted lines show the limits of an approximate 90
percent confidence interval for the median. This is obtained by the bootstrap method, which can
take a while if the data series is very long.
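The quantities shown in a boxplot can be sketched in a few lines of Python. This is an illustration only: the quartile convention is that of Python's statistics.quantiles, and the percentile-style bootstrap interval is one common choice; neither is claimed to match what gretl computes internally. The function names are hypothetical.

```python
import random
import statistics


def boxplot_summary(data):
    """Values a boxplot displays: extremes, quartiles, median and mean."""
    q1, med, q3 = statistics.quantiles(data, n=4)
    return {"min": min(data), "q1": q1, "median": med,
            "q3": q3, "max": max(data), "mean": statistics.fmean(data)}


def bootstrap_median_ci(data, level=0.90, replics=2000, seed=1):
    """Percentile bootstrap interval for the median (assumed method)."""
    rng = random.Random(seed)
    n = len(data)
    # median of each resample drawn with replacement, sorted
    meds = sorted(statistics.median(rng.choices(data, k=n))
                  for _ in range(replics))
    lo = meds[int((1 - level) / 2 * replics)]
    hi = meds[int((1 + level) / 2 * replics) - 1]
    return lo, hi
```

The cost of the interval grows with both the series length and the number of resamples, which is why long series take a while.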
After each variable specified in the boxplot command, a parenthesized boolean expression may
be added, to limit the sample for the variable in question. A space must be inserted between the
variable name or number and the expression. Suppose you have salary figures for men and women,
and you have a dummy variable GENDER with value 1 for men and 0 for women. In that case you
could draw comparative boxplots with the following line in the boxplots dialog:
salary (GENDER=1) salary (GENDER=0)
Chapter 8
Discrete variables
When a variable can take only a finite, typically small, number of values, then the variable is said to
be "discrete". Some gretl commands act in a slightly different way when applied to discrete variables;
moreover, gretl provides a few commands that only apply to discrete variables. Specifically, the
dummify and xtab commands (see below) are available only for discrete variables, while the freq
(frequency distribution) command produces different output for discrete variables.
8.1 Declaring variables as discrete
Gretl uses a simple heuristic to judge whether a given variable should be treated as discrete, but
you also have the option of explicitly marking a variable as discrete, in which case the heuristic
check is bypassed.
The heuristic is as follows: First, are all the values of the variable "reasonably round", where this
is taken to mean that they are all integer multiples of 0.25? If this criterion is met, we then ask
whether the variable takes on a "fairly small" set of distinct values, where "fairly small" is defined
as less than or equal to 8. If both conditions are satisfied, the variable is automatically considered
discrete.
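The two-part check can be written out as a short Python sketch. It mirrors the rule as stated in the text (multiples of 0.25, at most 8 distinct values); the function name is hypothetical and the handling of missing values is not modelled.

```python
def looks_discrete(values, step=0.25, max_distinct=8):
    """Apply the two-part heuristic described in the text."""
    # criterion 1: every value is an integer multiple of `step`
    all_round = all(float(v / step).is_integer() for v in values)
    # criterion 2: at most `max_distinct` different values occur
    few_values = len(set(values)) <= max_distinct
    return all_round and few_values
```

For instance, a 0/1 dummy passes both criteria, while a series containing 0.1 fails the first and a series with nine distinct integers fails the second.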
To mark a variable as discrete you have two options.
1. From the graphical interface, select "Variable, Edit Attributes" from the menu. A dialog box
will appear and, if the variable seems suitable, you will see a tick box labeled "Treat this
variable as discrete". This dialog box can also be invoked via the context menu (right-click on
a variable) or by pressing the F2 key.
2. From the command-line interface, via the discrete command. The command takes one or
more arguments, which can be either variables or lists of variables. For example:
list xlist = x1 x2 x3
discrete z1 xlist z2
This syntax makes it possible to declare as discrete many variables at once, which cannot
presently be done via the graphical interface. The switch --reverse reverses the declaration
of a variable as discrete, or in other words marks it as continuous. For example:
discrete foo
# now foo is discrete
discrete foo --reverse
# now foo is continuous
The command-line variant is more powerful, in that you can mark a variable as discrete even if it
does not seem to be suitable for this treatment.
Note that marking a variable as discrete does not affect its content. It is the user's responsibility
to make sure that marking a variable as discrete is a sensible thing to do. Note that if you want
to recode a continuous variable into classes, you can use the genr command and its arithmetic
functions, as in the following example:
nulldata 100
# generate a variable with mean 2 and variance 1
genr x = normal() + 2
# split into 4 classes
genr z = (x>0) + (x>2) + (x>4)
# now declare z as discrete
discrete z
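The recoding trick in the script above relies on each comparison evaluating to 1 or 0, so that their sum counts how many cutoffs a value exceeds. A Python sketch of the same arithmetic (the function name is hypothetical):

```python
def recode(x, cutoffs=(0, 2, 4)):
    """Class index of x: the number of cutoffs that x exceeds."""
    return sum(x > c for c in cutoffs)


# values below 0 fall in class 0, values above 4 in class 3
classes = [recode(v) for v in (-1.0, 1.0, 3.0, 5.0)]
```

With three cutoffs the result takes the four values 0, 1, 2 and 3, which is why the script speaks of four classes.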
Once a variable is marked as discrete, this setting is remembered when you save the file.
8.2 Commands for discrete variables
The dummify command
The dummify command takes as argument a series x and creates dummy variables for each distinct
value present in x, which must have already been declared as discrete. Example:
open greene22_2
discrete Z5 # mark Z5 as discrete
dummify Z5
The effect of the above command is to generate 5 new dummy variables, labeled DZ5_1 through
DZ5_5, which correspond to the different values in Z5. Hence, the variable DZ5_4 is 1 if Z5 equals
4 and 0 otherwise. This functionality is also available through the graphical interface by selecting
the menu item "Add, Dummies for selected discrete variables".
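What the command produces can be mimicked with a small Python sketch. The naming pattern prefix_value follows the DZ5_1 ... DZ5_5 labels quoted above; the function itself is a hypothetical stand-in, not gretl code.

```python
def dummify(values, prefix="D"):
    """Return {name: 0/1 list}, one entry per distinct value."""
    dummies = {}
    for level in sorted(set(values)):
        name = f"{prefix}_{level}"
        # 1 where the observation equals this level, 0 elsewhere
        dummies[name] = [1 if v == level else 0 for v in values]
    return dummies


z5 = [1, 4, 2, 4, 5, 3]
d = dummify(z5, prefix="DZ5")
```

Each observation has exactly one dummy equal to 1, so including the full set together with a constant in a regression would be collinear.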
The dummify command can also be used with the following syntax:
list dlist = dummify(x)
This not only creates the dummy variables, but also a named list (see section 12.1) that can be used
afterwards. The following example computes summary statistics for the variable Y for each value
of Z5:
open greene22_2
discrete Z5 # mark Z5 as discrete
list foo = dummify(Z5)
loop foreach i foo
smpl $i --restrict --replace
summary Y
endloop
smpl --full
Since dummify generates a list, it can be used directly in commands that call for a list as input, such
as ols. For example:
open greene22_2
discrete Z5 # mark Z5 as discrete
ols Y 0 dummify(Z5)
The freq command
The freq command displays absolute and relative frequencies for a given variable. The way fre-
quencies are counted depends on whether the variable is continuous or discrete. This command is
also available via the graphical interface by selecting the Variable, Frequency distribution menu
entry.
For discrete variables, frequencies are counted for each distinct value that the variable takes. For
continuous variables, values are grouped into bins and then the frequencies are counted for each
bin. The number of bins, by default, is computed as a function of the number of valid observations
in the currently selected sample via the rule shown in Table 8.1. However, when the command is
invoked through the menu item Variable, Frequency Plot, this default can be overridden by the
user.
Observations        Bins
8 ≤ n < 16          5
16 ≤ n < 50         7
50 ≤ n ≤ 850        ⌈√n⌉
n > 850             29
Table 8.1: Number of bins for various sample sizes
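The rule in Table 8.1 can be expressed as a small Python function. This sketch assumes the third row is the square root of n rounded up to an integer; the function name is hypothetical, and sample sizes below 8 are rejected because the table does not cover them.

```python
import math


def default_bins(n):
    """Default number of bins for n valid observations (Table 8.1)."""
    if n < 8:
        raise ValueError("Table 8.1 does not cover n < 8")
    if n < 16:
        return 5
    if n < 50:
        return 7
    if n <= 850:
        return math.ceil(math.sqrt(n))  # assumed reading of the third row
    return 29
```

A sample of 32 observations falls in the second row and so gets 7 bins.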
For example, the following code
open greene19_1
freq TUCE
discrete TUCE # mark TUCE as discrete
freq TUCE
yields
Read datafile /usr/local/share/gretl/data/greene/greene19_1.gdt
periodicity: 1, maxobs: 32,
observations range: 1-32
Listing 5 variables:
0) const 1) GPA 2) TUCE 3) PSI 4) GRADE
? freq TUCE
Frequency distribution for TUCE, obs 1-32
number of bins = 7, mean = 21.9375, sd = 3.90151
interval midpt frequency rel. cum.
< 13.417 12.000 1 3.12% 3.12% *
13.417 - 16.250 14.833 1 3.12% 6.25% *
16.250 - 19.083 17.667 6 18.75% 25.00% ******
19.083 - 21.917 20.500 6 18.75% 43.75% ******
21.917 - 24.750 23.333 9 28.12% 71.88% **********
24.750 - 27.583 26.167 7 21.88% 93.75% *******
>= 27.583 29.000 2 6.25% 100.00% **
Test for null hypothesis of normal distribution:
Chi-square(2) = 1.872 with p-value 0.39211
? discrete TUCE # mark TUCE as discrete
? freq TUCE
Frequency distribution for TUCE, obs 1-32
frequency rel. cum.
12 1 3.12% 3.12% *
14 1 3.12% 6.25% *
17 3 9.38% 15.62% ***
19 3 9.38% 25.00% ***
20 2 6.25% 31.25% **
21 4 12.50% 43.75% ****
22 2 6.25% 50.00% **
23 4 12.50% 62.50% ****
24 3 9.38% 71.88% ***
25 4 12.50% 84.38% ****
26 2 6.25% 90.62% **
27 1 3.12% 93.75% *
28 1 3.12% 96.88% *
29 1 3.12% 100.00% *
Test for null hypothesis of normal distribution:
Chi-square(2) = 1.872 with p-value 0.39211
As can be seen from the sample output, a Doornik-Hansen test for normality is computed auto-
matically. This test is suppressed for discrete variables where the number of distinct values is less
than 10.
This command accepts two options: --quiet, to avoid generation of the histogram when invoked
from the command line and --gamma, for replacing the normality test with Locke's nonparametric
test, whose null hypothesis is that the data follow a Gamma distribution.
If the distinct values of a discrete variable need to be saved, the values() matrix construct can be
used (see chapter 13).
The xtab command
The xtab command can be invoked in either of the following ways. First,
xtab ylist ; xlist
where ylist and xlist are lists of discrete variables. This produces cross-tabulations (two-way
frequencies) of each of the variables in ylist (by row) against each of the variables in xlist (by
column). Or second,
xtab xlist
In the second case a full set of cross-tabulations is generated; that is, each variable in xlist is tabu-
lated against each other variable in the list. In the graphical interface, this command is represented
by the Cross Tabulation item under the View menu, which is active if at least two variables are
selected.
Here is an example of use:
open greene22_2
discrete Z* # mark Z1-Z8 as discrete
xtab Z1 Z4 ; Z5 Z6
which produces
Cross-tabulation of Z1 (rows) against Z5 (columns)
[ 1][ 2][ 3][ 4][ 5] TOT.
[ 0] 20 91 75 93 36 315
[ 1] 28 73 54 97 34 286
TOTAL 48 164 129 190 70 601
Pearson chi-square test = 5.48233 (4 df, p-value = 0.241287)
Cross-tabulation of Z1 (rows) against Z6 (columns)
[ 9][ 12][ 14][ 16][ 17][ 18][ 20] TOT.
[ 0] 4 36 106 70 52 45 2 315
[ 1] 3 8 48 45 37 67 78 286
TOTAL 7 44 154 115 89 112 80 601
Pearson chi-square test = 123.177 (6 df, p-value = 3.50375e-24)
Cross-tabulation of Z4 (rows) against Z5 (columns)
[ 1][ 2][ 3][ 4][ 5] TOT.
[ 0] 17 60 35 45 14 171
[ 1] 31 104 94 145 56 430
TOTAL 48 164 129 190 70 601
Pearson chi-square test = 11.1615 (4 df, p-value = 0.0248074)
Cross-tabulation of Z4 (rows) against Z6 (columns)
[ 9][ 12][ 14][ 16][ 17][ 18][ 20] TOT.
[ 0] 1 8 39 47 30 32 14 171
[ 1] 6 36 115 68 59 80 66 430
TOTAL 7 44 154 115 89 112 80 601
Pearson chi-square test = 18.3426 (6 df, p-value = 0.0054306)
Pearson's χ² test for independence is automatically displayed, provided that all cells have expected
frequencies under independence greater than 10^-7. However, a common rule of thumb states that
this statistic is valid only if the expected frequency is 5 or greater for at least 80 percent of the
cells. If this condition is not met a warning is printed.
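As a check on the arithmetic, the following Python sketch recomputes the Pearson statistic for the Z1 by Z5 table printed above, along with the share of cells whose expected frequency is at least 5. The function names are hypothetical; the p-value is not computed since that would need a chi-square distribution function.

```python
def pearson_chi2(table):
    """Return (statistic, df, expected counts) for a two-way table."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, expected


def share_expected_at_least(expected, threshold=5):
    """Fraction of cells whose expected count reaches the threshold."""
    cells = [e for row in expected for e in row]
    return sum(e >= threshold for e in cells) / len(cells)


z1_by_z5 = [[20, 91, 75, 93, 36], [28, 73, 54, 97, 34]]
stat, df, expected = pearson_chi2(z1_by_z5)
print(f"chi-square = {stat:.5f} with {df} df")
```

The result can be compared with the value 5.48233 (4 df) reported in the xtab output for this table.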
Additionally, the --row or --column options can be given: in this case, the output displays row or
column percentages, respectively.
If you want to cut and paste the output of xtab to some other program, e.g. a spreadsheet, you
may want to use the --zeros option; this option causes cells with zero frequency to display the
number 0 instead of being empty.
Chapter 9
Loop constructs
9.1 Introduction
The command loop opens a special mode in which gretl accepts a block of commands to be re-
peated zero or more times. This feature may be useful for, among other things, Monte Carlo
simulations, bootstrapping of test statistics and iterative estimation procedures. The general form
of a loop is:
loop control-expression [ --progressive | --verbose | --quiet ]
loop body
endloop
Five forms of control-expression are available, as explained in section 9.2.
Not all gretl commands are available within loops. The commands that are not presently accepted
in this context are shown in Table 9.1.
Table 9.1: Commands not usable in loops
corrgm cusum data eqnprint function hurst include leverage
nulldata open rmplot run scatters setmiss setobs tabprint
vif xcorrgm
By default, the genr command operates quietly in the context of a loop (without printing informa-
tion on the variable generated). To force the printing of feedback from genr you may specify the
--verbose option to loop. The --quiet option suppresses the usual printout of the number of
iterations performed, which may be desirable when loops are nested.
The --progressive option to loop modifies the behavior of the commands print and store,
and certain estimation commands, in a manner that may be useful with Monte Carlo analyses (see
Section 9.3).
The following sections explain the various forms of the loop control expression and provide some
examples of use of loops.
If you are carrying out a substantial Monte Carlo analysis with many thousands of repetitions, memory
capacity and processing time may be an issue. To minimize the use of computer resources, run your script
using the command-line program, gretlcli, with output redirected to a file.
9.2 Loop control variants
Count loop
The simplest form of loop control is a direct specification of the number of times the loop should
be repeated. We refer to this as a "count loop". The number of repetitions may be a numerical
constant, as in loop 1000, or may be read from a scalar variable, as in loop replics.
In the case where the loop count is given by a variable, say replics, in concept replics is an
integer; if the value is not integral, it is converted to an integer by truncation. Note that replics is
evaluated only once, when the loop is initially compiled.
While loop
A second sort of control expression takes the form of the keyword while followed by a boolean
expression. For example,
loop while essdiff > .00001
Execution of the commands within the loop will continue so long as (a) the specified condition
evaluates as true and (b) the number of iterations does not exceed the value of the internal
variable loop_maxiter. By default this equals 250, but you can specify a different value via the set
command (see the Gretl Command Reference).
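The two stopping rules can be pictured with a Python sketch. The helper loop_while is hypothetical and only models the counting logic described above; what gretl reports when the cap is reached is not modelled here.

```python
def loop_while(condition, body, state, max_iter=250):
    """Run body(state) while condition(state) holds, at most max_iter times."""
    iterations = 0
    while condition(state) and iterations < max_iter:
        body(state)
        iterations += 1
    return iterations


def halve(s):
    s["essdiff"] /= 2  # stand-in for one round of re-estimation


# a condition that eventually fails: stops by itself
n_converged = loop_while(lambda s: s["essdiff"] > 1e-5, halve, {"essdiff": 1.0})
# a condition that never fails: stopped by the iteration cap
n_capped = loop_while(lambda s: True, lambda s: None, {})
```

In the first call the condition does the stopping; in the second only the cap prevents an endless loop.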
Index loop
A third form of loop control uses an index variable, for example i.[1] In this case you specify starting
and ending values for the index, which is incremented by one each time round the loop. The syntax
looks like this: loop i=1..20.
The index variable may be a pre-existing scalar; if this is not the case, the variable is created
automatically and is destroyed on exit from the loop.
The index may be used within the loop body in either of two ways: you can access the integer value
of i (see Example 9.4) or you can use its string representation, $i (see Example 9.5).
The starting and ending values for the index can be given in numerical form, by reference to
predefined scalar variables, or as expressions that evaluate to scalars. In the latter two cases the
variables are evaluated once, at the start of the loop. In addition, with time series data you can give
the starting and ending values in the form of dates, as in loop i=1950:1..1999:4.
This form of loop control is intended to be quick and easy, and as such it is subject to certain
limitations. In particular, the index variable is always incremented by one at each iteration. If, for
example, you have
loop i=m..n
where m and n are scalar variables with values m > n at the time of execution, the index will not be
decremented; rather, the loop will simply be bypassed.
If you need more complex loop control, see the for form below.
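The behavior of the index loop, including the bypass when the start exceeds the end, resembles an inclusive integer range. A Python sketch under that analogy (the generator name is hypothetical); each step yields the integer value together with its string form, corresponding to i and $i:

```python
def index_loop(start, end):
    """Yield (i, str(i)) for i = start..end inclusive, in steps of one."""
    for i in range(int(start), int(end) + 1):
        yield i, str(i)


forward = list(index_loop(1, 3))
bypassed = list(index_loop(5, 2))  # start > end: the body never runs
```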
The index loop is particularly useful in conjunction with the values() matrix function when some
operation must be carried out for each value of some discrete variable (see chapter 8). Consider
the following example:
open greene22_2
discrete Z8
v8 = values(Z8)
loop i=1..rows(v8)
scalar xi = v8[i]
smpl (Z8=xi) --restrict --replace
printf "mean(Y | Z8 = %g) = %8.5f, sd(Y | Z8 = %g) = %g\n", \
xi, mean(Y), xi, sd(Y)
endloop
[1] It is common programming practice to use simple, one-character names for such variables. However, you may use any
name that is acceptable by gretl: up to 15 characters, starting with a letter, and containing nothing but letters, numerals
and the underscore character.
In this case, we evaluate the conditional mean and standard deviation of the variable Y for each
value of Z8.
Foreach loop
The fourth form of loop control also uses an index variable, in this case to index a specified list
of strings. The loop is executed once for each string in the list. This can be useful for performing
repetitive operations on a list of variables. Here is an example of the syntax:
loop foreach i peach pear plum
print "$i"
endloop
This loop will execute three times, printing out peach, pear and plum on the respective itera-
tions. The numerical value of the index starts at 1 and is incremented by 1 at each iteration.
If you wish to loop across a list of variables that are contiguous in the dataset, you can give the
names of the first and last variables in the list, separated by "..", rather than having to type all
the names. For example, say we have 50 variables AK, AL, . . . , WY, containing income levels for the
states of the US. To run a regression of income on time for each of the states we could do:
genr time
loop foreach i AK..WY
ols $i const time
endloop
This loop variant can also be used for looping across the elements in a named list (see chapter 12).
For example:
list ylist = y1 y2 y3
loop foreach i ylist
ols $i const x1 x2
endloop
Note that if you use this idiom inside a function (see chapter 10), looping across a list that has been
supplied to the function as an argument, it is necessary to use the syntax listname.$i to reference
the list-member variables. In the context of the example above, this would mean replacing the third
line with
ols ylist.$i const x1 x2
For loop
The final form of loop control emulates the for statement in the C programming language. The
syntax is loop for, followed by three component expressions, separated by semicolons and
surrounded by parentheses. The three components are as follows:
1. Initialization: This is evaluated only once, at the start of the loop. Common example: setting
a scalar control variable to some starting value.
2. Continuation condition: This is evaluated at the top of each iteration (including the first). If
the expression evaluates as true (non-zero), iteration continues, otherwise it stops. Common
example: an inequality expressing a bound on a control variable.
3. Modifier: An expression which modifies the value of some variable. This is evaluated prior
to checking the continuation condition, on each iteration after the first. Common example: a
control variable is incremented or decremented.
Here's a simple example:
loop for (r=0.01; r<.991; r+=.01)
In this example the variable r will take on the values 0.01, 0.02, . . . , 0.99 across the 99 iterations.
Note that due to the finite precision of floating point arithmetic on computers it may be necessary
to use a continuation condition such as the above, r<.991, rather than the more "natural" r<=.99.
(Using double-precision numbers on an x86 processor, at the point where you would expect r to
equal 0.99 it may in fact have value 0.990000000000001.)
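The effect of accumulated rounding error on the loop count can be explored with a Python sketch, which uses the same double-precision arithmetic. The helper name is hypothetical. The count under the "natural" condition is left uncommented on purpose, since it depends on how the rounding errors happen to accumulate.

```python
def count_iterations(keep_going, start=0.01, step=0.01):
    """Count how often the body would run for a given continuation test."""
    r = start
    n = 0
    while keep_going(r):
        n += 1
        r += step  # rounding error accumulates with each addition
    return n


n_safe = count_iterations(lambda r: r < 0.991)   # padded bound
n_naive = count_iterations(lambda r: r <= 0.99)  # "natural" bound
print(n_safe, n_naive)
```

The padded bound is insensitive to errors far smaller than the step, which is the point of writing r<.991.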
Any or all of the three expressions governing a for loop may be omitted; the minimal form is
(;;). If the continuation test is omitted it is implicitly true, so you have an infinite loop unless you
arrange for some other way out, such as a break statement.
If the initialization expression in a for loop takes the common form of setting a scalar variable to
a given value, the string representation of that scalar's value is made available within the loop via
the accessor $varname.
9.3 Progressive mode
If the --progressive option is given for a command loop, special behavior is invoked for certain
commands, namely, print, store and simple estimation commands. By "simple" here we mean
commands which (a) estimate a single equation (as opposed to a system of equations) and (b) do
so by means of a single command statement (as opposed to a block of statements, as with nls and
mle). The paradigm is ols; other possibilities include tsls, wls, logit and so on.
The special behavior is as follows.
Estimators: The results from each individual iteration of the estimator are not printed. Instead,
after the loop is completed you get a printout of (a) the mean value of each estimated coefficient
across all the repetitions, (b) the standard deviation of those coefficient estimates, (c) the mean
value of the estimated standard error for each coefficient, and (d) the standard deviation of the
estimated standard errors. This makes sense only if there is some random input at each step.
print: When this command is used to print the value of a variable, you do not get a print each time
round the loop. Instead, when the loop is terminated you get a printout of the mean and standard
deviation of the variable, across the repetitions of the loop. This mode is intended for use with
variables that have a scalar value at each iteration, for example the error sum of squares from a
regression. Data series cannot be printed in this way, and neither can matrices.
store: This command writes out the values of the specified scalars, from each time round the
loop, to a specified file. Thus it keeps a complete record of their values across the iterations. For
example, coefficient estimates could be saved in this way so as to permit subsequent examination
of their frequency distribution. Only one such store can be used in a given loop.
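The four summary quantities reported for an estimator in progressive mode can be illustrated with a Python sketch of a small Monte Carlo experiment. This is not gretl's code: the function names are hypothetical, only the slope is tracked, and the data-generating process (y = 10x + u with normal errors) is an assumption chosen for the illustration.

```python
import math
import random
import statistics


def ols(x, y):
    """OLS of y on a constant and x; return (intercept, slope, slope std. error)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    ess = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(ess / (n - 2) / sxx)


def progressive_summary(replics=100, n=50, seed=547):
    """Collect the slope and its std. error over the repetitions, then summarize."""
    rng = random.Random(seed)
    x = [100 * rng.random() for _ in range(n)]
    slopes, std_errs = [], []
    for _ in range(replics):
        y = [10 * xi + rng.gauss(0, 10) for xi in x]  # fresh random input
        _a, b, se = ols(x, y)
        slopes.append(b)
        std_errs.append(se)
    return {
        "mean of estimates": statistics.fmean(slopes),
        "sd of estimates": statistics.stdev(slopes),
        "mean of std. errors": statistics.fmean(std_errs),
        "sd of std. errors": statistics.stdev(std_errs),
    }
```

Keeping the full lists slopes and std_errs corresponds to what store does; reducing them to means and standard deviations corresponds to the progressive printout.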
9.4 Loop examples
Monte Carlo example
A simple example of a Monte Carlo loop in progressive mode is shown in Example 9.1.
This loop will print out summary statistics for the a and b estimates and R² across the 100
repetitions. After running the loop, coeffs.gdt, which contains the individual coefficient estimates
from all the runs, can be opened in gretl to examine the frequency distribution of the estimates in
detail.
The command nulldata is useful for Monte Carlo work. Instead of opening a real data set,
nulldata 50 (for instance) opens a dummy data set, containing just a constant and an index vari-
able, with a series length of 50. Constructed variables can then be added using the genr command.
See the set command for information on generating repeatable pseudo-random series.
Example 9.1: Simple Monte Carlo loop
nulldata 50
set seed 547
genr x = 100 * uniform()
# open a "progressive" loop, to be repeated 100 times
loop 100 --progressive
genr u = 10 * normal()
# construct the dependent variable
genr y = 10*x + u
# run OLS regression
ols y const x
# grab the coefficient estimates and R-squared
genr a = $coeff(const)
genr b = $coeff(x)
genr r2 = $rsq
# arrange for printing of stats on these
print a b r2
# and save the coefficients to file
store coeffs.gdt a b
endloop
Iterated least squares
Example 9.2 uses a while loop to replicate the estimation of a nonlinear consumption function of
the form
$$C = \alpha + \beta Y^{\gamma} + \epsilon$$
as presented in Greene (2000), Example 11.3. This script is included in the gretl distribution under
the name greene11_3.inp; you can find it in gretl under the menu item "File, Script files, Practice
file, Greene...".
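The regression that Example 9.2 repeats can be read as a first-order linearization of the consumption function around the current parameter values. The derivation below is a sketch reconstructed from the variable names used in the script (C0, x1, x2); the superscript (0), marking the current values of beta and gamma, is notation introduced here.

```latex
\begin{align*}
C &= \alpha + \beta Y^{\gamma} + \epsilon \\
\beta Y^{\gamma} &\approx \beta^{(0)} Y^{\gamma^{(0)}}
  + (\beta - \beta^{(0)})\, Y^{\gamma^{(0)}}
  + (\gamma - \gamma^{(0)})\, \beta^{(0)} Y^{\gamma^{(0)}} \log Y \\
\underbrace{C + \gamma^{(0)} \beta^{(0)} Y^{\gamma^{(0)}} \log Y}_{C_0}
  &\approx \alpha + \beta \underbrace{Y^{\gamma^{(0)}}}_{x_1}
  + \gamma \underbrace{\beta^{(0)} Y^{\gamma^{(0)}} \log Y}_{x_2} + \epsilon
\end{align*}
```

Regressing C0 on a constant, x1 and x2 therefore yields updated values of beta and gamma directly, and the loop stops once the error sum of squares changes by less than the chosen tolerance.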
The option --print-final for the ols command arranges matters so that the regression results
will not be printed each time round the loop, but the results from the regression on the last iteration
will be printed when the loop terminates.
Example 9.3 shows how a loop can be used to estimate an ARMA model, exploiting the outer
product of the gradient (OPG) regression discussed by Davidson and MacKinnon in their Estimation
and Inference in Econometrics.
Indexed loop examples
Example 9.4 shows an indexed loop in which the smpl is keyed to the index variable i. Suppose we
have a panel dataset with observations on a number of hospitals for the years 1991 to 2000 (where
the year of the observation is indicated by a variable named year). We restrict the sample to each
of these years in turn and print cross-sectional summary statistics for variables 1 through 4.
Example 9.5 illustrates string substitution in an indexed loop.
The first time round this loop the variable V will be set to equal COMP1987 and the dependent
variable for the ols will be PBT1987. The next time round V will be redefined as equal to COMP1988
and the dependent variable in the regression will be PBT1988. And so on.
Example 9.2: Nonlinear consumption function
open greene11_3.gdt
# run initial OLS
ols C 0 Y
genr essbak = $ess
genr essdiff = 1
genr beta = $coeff(Y)
genr gamma = 1
# iterate OLS till the error sum of squares converges
loop while essdiff > .00001
    # form the linearized variables
    genr C0 = C + gamma * beta * Y^gamma * log(Y)
    genr x1 = Y^gamma
    genr x2 = beta * Y^gamma * log(Y)
    # run OLS
    ols C0 0 x1 x2 --print-final --no-df-corr --vcv
    genr beta = $coeff(x1)
    genr gamma = $coeff(x2)
    genr ess = $ess
    genr essdiff = abs(ess - essbak)/essbak
    genr essbak = ess
endloop
# print parameter estimates using their "proper names"
set echo off
printf "alpha = %g\n", $coeff(0)
printf "beta = %g\n", beta
printf "gamma = %g\n", gamma
Example 9.3: ARMA 1, 1
open armaloop.gdt
genr c = 0
genr a = 0.1
genr m = 0.1
series e = 1.0
genr de_c = e
genr de_a = e
genr de_m = e
genr crit = 1
loop while crit > 1.0e-9
    # one-step forecast errors
    genr e = y - c - a*y(-1) - m*e(-1)
    # log-likelihood
    genr loglik = -0.5 * sum(e^2)
    print loglik
    # partials of forecast errors wrt c, a, and m
    genr de_c = -1 - m * de_c(-1)
    genr de_a = -y(-1) -m * de_a(-1)
    genr de_m = -e(-1) -m * de_m(-1)
    # partials of l wrt c, a and m
    genr sc_c = -de_c * e
    genr sc_a = -de_a * e
    genr sc_m = -de_m * e
    # OPG regression
    ols const sc_c sc_a sc_m --print-final --no-df-corr --vcv
    # Update the parameters
    genr dc = $coeff(sc_c)
    genr c = c + dc
    genr da = $coeff(sc_a)
    genr a = a + da
    genr dm = $coeff(sc_m)
    genr m = m + dm
    printf " constant = %.8g (gradient = %#.6g)\n", c, dc
    printf " ar1 coefficient = %.8g (gradient = %#.6g)\n", a, da
    printf " ma1 coefficient = %.8g (gradient = %#.6g)\n", m, dm
    genr crit = $T - $ess
    print crit
endloop
genr se_c = $stderr(sc_c)
genr se_a = $stderr(sc_a)
genr se_m = $stderr(sc_m)
set echo off
printf "\n"
printf "constant = %.8g (se = %#.6g, t = %.4f)\n", c, se_c, c/se_c
printf "ar1 term = %.8g (se = %#.6g, t = %.4f)\n", a, se_a, a/se_a
printf "ma1 term = %.8g (se = %#.6g, t = %.4f)\n", m, se_m, m/se_m
Example 9.4: Panel statistics
open hospitals.gdt
loop i=1991..2000
    smpl (year=i) --restrict --replace
    summary 1 2 3 4
endloop
Example 9.5: String substitution
open bea.dat
loop i=1987..2001
    genr V = COMP$i
    genr TC = GOC$i - PBT$i
    genr C = TC - V
    ols PBT$i const TC V
endloop
Chapter 10
User-defined functions
10.1 Defining a function
Gretl offers a mechanism for defining functions, which may be called via the command line, in
the context of a script, or (if packaged appropriately, see section 10.5) via the program's graphical
interface.
The syntax for defining a function looks like this:[1]
function return-type function-name (parameters)
function body
end function
The opening line of a function definition contains these elements, in strict order:
1. The keyword function.
2. return-type, which states the type of value returned by the function, if any. This must be one
of void (if the function does not return anything), scalar, series, matrix, list or string.
3. function-name, the unique identifier for the function. Names must start with a letter. They
have a maximum length of 31 characters; if you type a longer name it will be truncated.
Function names cannot contain spaces. You will get an error if you try to define a function
having the same name as an existing gretl command.
4. The function's parameters, in the form of a comma-separated list enclosed in parentheses.
This may be run into the function name, or separated by white space as shown.
Function parameters can be of any of the types shown below.[2]
Type      Description
bool      scalar variable acting as a Boolean switch
int       scalar variable acting as an integer
scalar    scalar variable
series    data series
list      named list of series
matrix    matrix or vector
string    string variable or string literal
bundle    all-purpose container (see section 11.7)
Each element in the listing of parameters must include two terms: a type specifier, and the name
by which the parameter shall be known within the function. An example follows:
function scalar myfunc (series y, list xvars, bool verbose)
[1] The syntax given here differs from the standard prior to gretl version 1.8.4. For reasons of backward compatibility
the old syntax is still supported; see section 10.6 for details.
[2] An additional parameter type is available for GUI use, namely obs; this is equivalent to int except for the way it is
represented in the graphical interface for calling a function.
Each of the type-specifiers, with the exception of list and string, may be modified by prepending
an asterisk to the associated parameter name, as in
function scalar myfunc (series *y, scalar *b)
The meaning of this modification is explained below (see section 10.4); it is related to the use of
pointer arguments in the C programming language.
Function parameters: optional refinements
Besides the required elements mentioned above, the specification of a function parameter may
include some additional fields, as follows:
• The const modifier.
• For scalar or int parameters: minimum, maximum and default values; or for bool parameters,
just a default value.
• For optional pointer and list arguments (see section 10.4), the special default value null.
• For all parameters, a descriptive string.
• For int parameters with minimum and maximum values specified, a set of strings to associate
with the allowed numerical values (value labels).
The first two of these options may be useful in many contexts; the last two may be helpful if a
function is to be packaged for use in the gretl GUI (but probably not otherwise). We now expand on
each of the options.
The const modifier: must be given as a prefix to the basic parameter specification, as in
const matrix M
This constitutes a promise that the corresponding argument will not be modified within the
function; gretl will flag an error if the function attempts to modify the argument.
Minimum, maximum and default values for scalar or int types: These values should directly
follow the name of the parameter, enclosed in square brackets and with the individual
elements separated by colons. For example, suppose we have an integer parameter order for
which we wish to specify a minimum of 1, a maximum of 12, and a default of 4. We can write
int order[1:12:4]
If you wish to omit any of the three specifiers, leave the corresponding field empty. For
example [1::4] would specify a minimum of 1 and a default of 4 while leaving the maximum
unlimited.
For a parameter of type bool (whose values are just zero or non-zero), you can specify a
default of 1 (true) or 0 (false), as in
bool verbose[0]
Descriptive string: This will show up as an aid to the user if the function is packaged (see
section 10.5 below) and called via gretl's graphical interface. The string should be enclosed
in double quotes and separated from the preceding elements of the parameter specification
with a space, as in
series y "dependent variable"
Value labels: These may be used only with int parameters for which minimum and maximum
values have been specified, so there is a fixed number of admissible values, and the number
of labels must match the number of values. They will show up in the graphical interface
in the form of a drop-down list, making the function writer's intent clearer when an integer
argument represents a categorical selection. A set of value labels must be enclosed in braces,
and the individual labels must be enclosed in double quotes and separated by commas or
spaces. For example:
int case[1:3:1] {"Fixed effects", "Between model", "Random effects"}
If two or more of the trailing optional fields are given in a parameter specification, they must be
given in the order shown above: min-max-default, description, value labels. Note that there is
no facility for "escaping" characters within descriptive strings or value labels; these may contain
spaces but they cannot contain the double-quote character.
Here is an example of a well-formed function specification using all the elements mentioned above:
function matrix myfunc (series y "dependent variable",
                        list X "regressors",
                        int p[0::1] "lag order",
                        int c[1:2:1] "criterion" {"AIC", "BIC"},
                        bool quiet[0])
One advantage of specifying default values for parameters, where applicable, is that in script or
command-line mode users may omit trailing arguments that have defaults. For example, myfunc
above could be invoked with just two arguments, corresponding to y and X; implicitly p = 1, c = 1
and quiet is false.
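As a minimal sketch of how trailing defaults behave in a script, consider the following. The function scaled_mean and its parameter names are hypothetical, invented here purely for illustration; only the bracketed min:max:default syntax is taken from the text above.

```gretl
# hypothetical function: k defaults to 1, verbose defaults to 0 (false)
function scalar scaled_mean (series y, scalar k[0:10:1], bool verbose[0])
    scalar m = k * mean(y)
    if verbose
        printf "scaled mean = %g\n", m
    endif
    return m
end function

nulldata 20
series x = normal()
# both trailing arguments omitted: k = 1, verbose = 0
scalar m1 = scaled_mean(x)
# only the last argument omitted: k = 2, verbose = 0
scalar m2 = scaled_mean(x, 2)
# all arguments given explicitly
scalar m3 = scaled_mean(x, 2, 1)
```

Only trailing arguments can be dropped in this way; to set verbose while leaving k at its default, the caller would still have to supply a value for k.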
Functions taking no parameters
You may define a function that has no parameters (these are called "routines" in some programming
languages). In this case, use the keyword void in place of the listing of parameters:
function matrix myfunc2 (void)
The function body
The function body is composed of gretl commands, or calls to user-defined functions (that is,
function calls may be nested). A function may call itself (that is, functions may be recursive). While
the function body may contain function calls, it may not contain function definitions. That is, you
cannot define a function inside another function. For further details, see section 10.4.
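To illustrate the remark that a function may call itself, here is a small sketch of a recursive function. The name myfact is hypothetical and the example is ours rather than part of the gretl distribution; it keeps a single return point.

```gretl
# hypothetical example: n! computed by recursion
function scalar myfact (int n)
    scalar ret = 1
    if n > 1
        # the function calls itself with a smaller argument
        ret = n * myfact(n - 1)
    endif
    return ret
end function

scalar f5 = myfact(5)
printf "5! = %g\n", f5
```

Each nested call works on its own local copy of n and ret, so the recursion terminates once n reaches 1.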
10.2 Calling a function
A user function is called by typing its name followed by zero or more arguments enclosed in
parentheses. If there are two or more arguments these should be separated by commas.
There are automatic checks in place to ensure that the number of arguments given in a function
call matches the number of parameters, and that the types of the given arguments match the types
specified in the definition of the function. An error is flagged if either of these conditions is violated.
One qualification: allowance is made for omitting arguments at the end of the list, provided that
default values are specified in the function definition. To be precise, the check is that the number
of arguments is at least equal to the number of required parameters, and is no greater than the
total number of parameters.
A scalar, series or matrix argument to a function may be given either as the name of a pre-existing
variable or as an expression which evaluates to a variable of the appropriate type. Scalar arguments
may also be given as numerical values. List arguments must be specified by name.
The following trivial example illustrates a function call that correctly matches the function
definition.
# function definition
function scalar ols_ess(series y, list xvars)
    ols y 0 xvars --quiet
    scalar myess = $ess
    printf "ESS = %g\n", myess
    return myess
end function
# main script
open data4-1
list xlist = 2 3 4
# function call (the return value is ignored here)
ols_ess(price, xlist)
The function call gives two arguments: the first is a data series specified by name and the second
is a named list of regressors. Note that while the function offers the variable myess as a return
value, it is ignored by the caller in this instance. (As a side note here, if you want a function to
calculate some value having to do with a regression, but are not interested in the full results of the
regression, you may wish to use the --quiet flag with the estimation command as shown above.)
A second example shows how to write a function call that assigns a return value to a variable in the
caller:
# function definition
function series get_uhat(series y, list xvars)
    ols y 0 xvars --quiet
    series uh = $uhat
    return uh
end function
# main script
open data4-1
list xlist = 2 3 4
# function call
series resid = get_uhat(price, xlist)
10.3 Deleting a function
If you have defined a function and subsequently wish to clear it out of memory, you can do so using
the keywords delete or clear, as in
function myfunc delete
function get_uhat clear
Note, however, that if myfunc is already a defined function, providing a new definition automatically
overwrites the previous one, so it should rarely be necessary to delete functions explicitly.
10.4 Function programming details
Variables versus pointers
Series, scalar, and matrix arguments to functions can be passed in two ways: as they are, or as
pointers. For example, consider the following:
function series triple1(series x)
    return 3*x
end function
function series triple2(series *x)
    return 3*x
end function
These two functions are nearly identical (and yield the same result); the only difference is that you
need to feed a series into triple1, as in triple1(myseries), while triple2 must be supplied a
pointer to a series, as in triple2(&myseries).
Why make the distinction? There are two main reasons for doing so: modularity and performance.
By modularity we mean the insulation of a function from the rest of the script which calls it. One of
the many benefits of this approach is that your functions are easily reusable in other contexts. To
achieve modularity, variables created within a function are local to that function, and are destroyed
when the function exits, unless they are made available as return values and these values are picked
up or assigned by the caller.
In addition, functions do not have access to variables in outer scope (that is, variables that exist
in the script from which the function is called) except insofar as these are explicitly passed to the
function as arguments.
By default, when a variable is passed to a function as an argument, what the function actually gets
is a copy of the outer variable, which means that the value of the outer variable is not modified by
anything that goes on inside the function. But the use of pointers allows a function and its caller
to cooperate such that an outer variable can be modified by the function. In effect, this allows a
function to "return" more than one value (although only one variable can be returned directly;
see below). The parameter in question is marked with a prefix of * in the function definition, and
the corresponding argument is marked with the complementary prefix & in the caller. For example,
function series get_uhat_and_ess(series y, list xvars, scalar *ess)
    ols y 0 xvars --quiet
    ess = $ess
    series uh = $uhat
    return uh
end function
# main script
open data4-1
list xlist = 2 3 4
# function call
scalar SSR
series resid = get_uhat_and_ess(price, xlist, &SSR)
In the above, we may say that the function is given the address of the scalar variable SSR, and it
assigns a value to that variable (under the local name ess). (For anyone used to programming in C:
note that it is not necessary, or even possible, to "dereference" the variable in question within the
function using the * operator. Unadorned use of the name of the variable is sufficient to access the
variable in outer scope.)
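The contrast between the two modes of passing an argument can be seen in a small sketch. The functions bump_copy and bump_ptr are hypothetical names used only for this illustration.

```gretl
# receives a copy: the change is local to the function
function void bump_copy (scalar x)
    x = x + 1
end function

# receives a pointer: the change reaches the caller's variable
function void bump_ptr (scalar *x)
    x = x + 1
end function

scalar a = 10
bump_copy(a)
# a should still be 10 here, since only the local copy was changed
printf "after bump_copy: a = %g\n", a
bump_ptr(&a)
# a should now be 11, since the function modified the outer variable
printf "after bump_ptr: a = %g\n", a
```

Note that the pointer version is called with &a, matching the * in its parameter list.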
An "address" parameter of this sort can be used as a means of offering optional information to the
caller. (That is, the corresponding argument is not strictly needed, but will be used if present.) In
that case the parameter should be given a default value of null and the function should test to
see if the caller supplied a corresponding argument or not, using the built-in function isnull().
For example, here is the simple function shown above, modified to make the filling out of the ess
value optional.
function series get_uhat_and_ess(series y, list xvars, scalar *ess[null])
    ols y 0 xvars --quiet
    if !isnull(ess)
        ess = $ess
    endif
    return $uhat
end function
If the caller does not care to get the ess value, it can use null in place of a real argument:
series resid = get_uhat_and_ess(price, xlist, null)
Alternatively, trailing function arguments that have default values may be omitted, so the following
would also be a valid call:
series resid = get_uhat_and_ess(price, xlist)
Pointer arguments may also be useful for optimizing performance: even if a variable is not modified
inside the function, it may be a good idea to pass it as a pointer if it occupies a lot of memory.
Otherwise, the time gretl spends transcribing the value of the variable to the local copy may be
non-negligible, compared to the time the function spends doing the job it was written for.
Example 10.1 takes this to the extreme. We define two functions which return the number of rows
of a matrix (a pretty fast operation). One function gets a matrix as argument, the other one a pointer
to a matrix. The two functions are evaluated on a matrix with 2000 rows and 2000 columns; on a
typical system, floating-point numbers take 8 bytes of memory, so the space occupied by the matrix
is roughly 32 megabytes.
Running the code in example 10.1 will produce output similar to the following (the actual numbers
depend on the machine you're running the example on):
Elapsed time:
without pointers (copy) = 3.66 seconds,
with pointers (no copy) = 0.01 seconds.
If a pointer argument is used for this sort of purpose, and the object to which the pointer points
is not modified by the function, it is a good idea to signal this to the user by adding the const
qualifier, as shown for function b in Example 10.1. When a pointer argument is qualified in this
way, any attempt to modify the object within the function will generate an error.
One limitation on the use of pointer-type arguments should be noted: you cannot supply a given
variable as a pointer argument more than once in any given function call. For example, suppose we
have a function that takes two matrix-pointer arguments,
function scalar pointfunc (matrix *a, matrix *b)
And suppose we have two matrices, x and y, at the caller level. The call
pointfunc(&x, &y)
is OK, but the call
pointfunc(&x, &x) # will not work
will generate an error.
List arguments
The use of a named list as an argument to a function gives a means of supplying a function with
a set of variables whose number is unknown when the function is written (for example, sets of
regressors or instruments). Within the function, the list can be passed on to commands such as
ols.
Example 10.1: Performance comparison: values versus pointer
function scalar a(matrix X)
    return rows(X)
end function
function scalar b(const matrix *X)
    return rows(X)
end function
nulldata 10
set echo off
set messages off
X = zeros(2000,2000)
r = 0
set stopwatch
loop 100
    r = a(X)
endloop
fa = $stopwatch
set stopwatch
loop 100
    r = b(&X)
endloop
fb = $stopwatch
printf "Elapsed time:\n\
\twithout pointers (copy) = %g seconds,\n\
\twith pointers (no copy) = %g seconds.\n", fa, fb
A list argument can also be unpacked using a foreach loop construct, but this requires some
care. For example, suppose you have a list X and want to calculate the standard deviation of each
variable in the list. You can do:
loop foreach i X
    scalar sd_$i = sd(X.$i)
endloop
Please note: a special piece of syntax is needed in this context. If we wanted to perform the above
task on a list in a regular script (not inside a function), we could do
loop foreach i X
    scalar sd_$i = sd($i)
endloop
where $i gets the name of the variable at position i in the list, and sd($i) gets its standard
deviation. But inside a function, working on a list supplied as an argument, if we want to reference
an individual variable in the list we must use the syntax listname.varname. Hence in the example
above we write sd(X.$i).
This is necessary to avoid possible collisions between the name-space of the function and the
name-space of the caller script. For example, suppose we have a function that takes a list argument, and
that defines a local variable called y. Now suppose that this function is passed a list containing
a variable named y. If the two name-spaces were not separated either we'd get an error, or the
external variable y would be silently over-written by the local one. It is important, therefore, that
list-argument variables should not be "visible" by name within functions. To "get hold of" such
variables you need to use the form of identification just mentioned: the name of the list, followed
by a dot, followed by the name of the variable.
Constancy of list arguments. When a named list of variables is passed to a function, the function
is actually provided with a copy of the list. The function may modify this copy (for instance, adding
or removing members), but the original list at the level of the caller is not modified.
Optional list arguments. If a list argument to a function is optional, this should be indicated by
appending a default value of null, as in
function scalar myfunc (scalar y, list X[null])
In that case, if the caller gives null as the list argument (or simply omits the last argument) the
named list X inside the function will be empty. This possibility can be detected using the nelem()
function, which returns 0 for an empty list.
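A minimal sketch of a function that checks for an omitted list argument in this way follows. The function name count_regressors is hypothetical; the dataset and variable numbering follow the earlier examples in this chapter.

```gretl
# hypothetical function: runs a regression only if regressors were supplied
function scalar count_regressors (series y, list X[null])
    scalar k = nelem(X)
    if k > 0
        ols y 0 X --quiet
    else
        printf "no regressors supplied\n"
    endif
    return k
end function

open data4-1
list xlist = 2 3 4
# list argument omitted: X is empty inside the function
scalar k0 = count_regressors(price)
# list argument supplied by name
scalar k1 = count_regressors(price, xlist)
```

The first call could equally be written count_regressors(price, null).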
String arguments
String arguments can be used, for example, to provide flexibility in the naming of variables created
within a function. In the following example the function mavg returns a list containing two moving
averages constructed from an input series, with the names of the newly created variables governed
by the string argument.
function list mavg (series y, string vname)
    series @vname_2 = (y+y(-1)) / 2
    series @vname_4 = (y+y(-1)+y(-2)+y(-3)) / 4
    list retlist = @vname_2 @vname_4
    return retlist
end function
open data9-9
list malist = mavg(nocars, "nocars")
print malist --byobs
The last line of the script will print two variables named nocars_2 and nocars_4. For details on
the handling of named strings, see chapter 12.
If a string argument is considered optional, it may be given a null default value, as in
function scalar foo (series y, string vname[null])
Retrieving the names of arguments
The variables given as arguments to a function are known inside the function by the names of the
corresponding parameters. For example, within the function whose signature is
function void somefun (series y)
we have the series known as y. It may be useful, however, to be able to determine the names of
the variables provided as arguments. This can be done using the function argname, which takes
the name of a function parameter as its single argument and returns a string. Here is a simple
illustration:
function void namefun (series y)
    printf "the series given as y was named %s\n", argname(y)
end function
open data9-7
namefun(QNC)
This produces the output
the series given as y was named QNC
Please note that this will not always work: the arguments given to functions may be anonymous
variables, created on the fly, as in somefun(log(QNC)) or somefun(CPI/100). In that case the
argname function fails to return a string. Function writers who wish to make use of this facility
should check the return from argname using the isstring() function, which returns 1 when given
the name of a string variable, 0 otherwise.
Return values
Functions can return nothing (just printing a result, perhaps), or they can return a single variable:
a scalar, series, list, matrix, string, or bundle (see section 11.7). The return value, if any, is
specified via a statement within the function body beginning with the keyword return, followed by
either the name of a variable (which must be of the type announced on the first line of the function
definition) or an expression which produces a value of the correct type.
Having a function return a list or bundle is a way of permitting the return of more than one
variable. For example, you can define several series inside a function and package them as a list;
in this case they are not destroyed when the function exits. Here is a simple example, which also
illustrates the possibility of setting the descriptive labels for variables generated in a function.
function list make_cubes (list xlist)
    list cubes = null
    loop foreach i xlist --quiet
        series $i3 = (xlist.$i)^3
        setinfo $i3 -d "cube of $i"
        list cubes += $i3
    endloop
    return cubes
end function
open data4-1
list xlist = price sqft
list cubelist = make_cubes(xlist)
print xlist cubelist --byobs
labels
A return statement causes the function to return (exit) at the point where it appears within the
body of the function. A function may also exit when (a) the end of the function code is reached (in
the case of a function with no return value), (b) a gretl error occurs, or (c) a funcerr statement is
reached.
The funcerr keyword, which may be followed by a string enclosed in double quotes, causes a
function to exit with an error flagged. If a string is provided, this is printed on exit, otherwise a
generic error message is printed. This mechanism enables the author of a function to pre-empt an
ordinary execution error and/or offer a more specific and helpful error message. For example,
if nelem(xlist) = 0
    funcerr "xlist must not be empty"
endif
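For context, a check of this kind might sit at the top of a complete function, as in the following sketch. The function name safe_ess is hypothetical, and the test is written as nelem(xlist) < 1 simply to sidestep the choice of equality operator; it is meant to be equivalent to the fragment above.

```gretl
# hypothetical function: refuses to run with an empty regressor list
function scalar safe_ess (series y, list xlist)
    if nelem(xlist) < 1
        funcerr "xlist must not be empty"
    endif
    ols y 0 xlist --quiet
    return $ess
end function

open data4-1
list xlist = 2 3 4
scalar e = safe_ess(price, xlist)
printf "ESS = %g\n", e
```

If the function were instead called with an empty list, execution should stop at the funcerr statement with the message given, before the ols command is reached.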
A function may contain more than one return statement, as in
function scalar multi (bool s)
    if s
        return 1000
    else
        return 10
    endif
end function
However, it is recommended programming practice to have a single return point from a function
unless this is very inconvenient. The simple example above would be better written as
function scalar multi (bool s)
    return s ? 1000 : 10
end function
Error checking
When gretl first reads and "compiles" a function definition there is minimal error-checking: the
only checks are that the function name is acceptable, and, so far as the body is concerned, that you
are not trying to define a function inside a function (see Section 10.1). Otherwise, if the function
body contains invalid commands this will become apparent only when the function is called and
its commands are executed.
Debugging
The usual mechanism whereby gretl echoes commands and reports on the creation of new variables
is by default suppressed when a function is being executed. If you want more verbose output from
a particular function you can use either or both of the following commands within the function:
set echo on
set messages on
Alternatively, you can achieve this effect for all functions via the command set debug 1. Usually
when you set the value of a state variable using the set command, the effect applies only to the
current level of function execution. For instance, if you do set messages on within function f1,
which in turn calls function f2, then messages will be printed for f1 but not f2. The debug variable,
however, acts globally; all functions become verbose regardless of their level.
Further, you can do set debug 2: in addition to command echo and the printing of messages, this
is equivalent to setting max_verbose (which produces verbose output from the BFGS maximizer) at
all levels of function execution.
10.5 Function packages
Since gretl 1.6.0 there has been a mechanism to package functions and make them available to
other users of gretl. Here is a walk-through of the process.
Load a function in memory
There are several ways to load a function:
• If you have a script file containing function definitions, open that file and run it.
• Create a script file from scratch. Include at least one function definition, and run the script.
• Open the GUI console and type a function definition interactively. This method is not
particularly recommended; you are probably better composing a function non-interactively.
For example, suppose you decide to package a function that returns the percentage change of a
time series. Open a script file and type
function series pc(series y "Series to process")
    return 100 * diff(y)/y(-1)
end function
In this case, we have appended a string to the function argument, as explained in section 10.1, so
as to make our interface more informative. This is not obligatory: if you omit the descriptive string,
gretl will supply a predefined one.
Now run your function. You may want to make sure it works properly by running a few tests. For
example, you may open the console and type
genr x = uniform()
genr dpcx = pc(x)
print x dpcx --byobs
You should see something similar to figure 10.1. The function seems to work OK. Once your
function is debugged, you may proceed to the next stage.
Create a package
We first present the mechanism for creating a function package via gretl's graphical interface. This
can also be done via the command line, which offers some additional functionality for package
authors; an explanation is given later in this section.
Figure 10.1: Output of function check
Start the GUI program and take a look at the "File, Function files" menu. This menu contains four
items: "On local machine", "On server", "Edit package", "New package".
Select "New package". (This will produce an error message unless at least one user-defined function
is currently loaded in memory; see the previous point.) In the first dialog you get to select:
• A public function to package.
• Zero or more "private" helper functions.
Public functions are directly available to users; private functions are part of the behind the scenes
mechanism in a function package.
On clicking OK a second dialog should appear (see Figure 10.2), where you get to enter the package
information (author, version, date, and a short description). You can also enter help text for the
public interface. You have a further chance to edit the code of the function(s) to be packaged, by
clicking on Edit function code. (If the package contains more than one function, a drop-down
selector will be shown.) And you get to add a sample script that exercises your package. This
will be helpful for potential users, and also for testing. A sample script is required if you want to
upload the package to the gretl server (for which a check-box is supplied).
You won't need it right now, but the button labeled "Save as script" allows you to "reverse engineer"
a function package, writing out a script that contains all the relevant function definitions.
Clicking "Save" in this dialog leads you to a File Save dialog. All being well, this should be pointing
towards a directory named functions, either under the gretl system directory (if you have write
permission on that) or the gretl user directory. This is the recommended place to save function
package files, since that is where the program will look in the special routine for opening such files
(see below).
Needless to say, the menu command "File, Function files, Edit package" allows you to make changes
to a local function package.
A word on the file you just saved. By default, it will have a .gfn extension. This is a "function
package" file: unlike an ordinary gretl script file, it is an XML file containing both the function code
and the extra information entered in the packager. Hackers might wish to write such a file from
scratch rather than using the GUI packager, but most people are likely to find it awkward. Note
that XML-special characters in the function code have to be escaped, e.g. & must be represented as
&amp;. Also, some elements of the function syntax differ from the standard script representation:
the parameters and return values (if any) are represented in XML. Basically, the function is
pre-parsed, and ready for fast loading using libxml.
Figure 10.2: The package editor window
Load a package
Why package functions in this way? To see what's on offer so far, try the next phase of the
walk-through.
Close gretl, then re-open it. Now go to "File, Function files, On local machine". If the previous stage
above has gone OK, you should see the file you packaged and saved, with its short description. If
you click on "Info" you get a window with all the information gretl has gleaned from the function
package. If you click on the "View code" icon in the toolbar of this new window, you get a script
view window showing the actual function code. Now, back to the "Function packages" window, if
you click on the package's name, the relevant functions are loaded into gretl's workspace, ready to
be called by clicking on the "Call" button.
After loading the function(s) from the package, open the GUI console. Try typing help foo, replac-
ing foo with the name of the public interface from the loaded function package: if any help text
was provided for the function, it should be presented.
In a similar way, you can browse and load the function packages available on the gretl server, by
selecting File, Function files, On server.
Once your package is installed on your local machine, you can use the function it contains via
the graphical interface as described above, or by using the CLI, namely in a script or through the
console. In the latter case, you load the function via the include command, specifying the package
file as the argument, complete with the .gfn extension.
To continue with our example, load the file np.gdt (supplied with gretl among the sample datasets).
Suppose you want to compute the rate of change for the variable iprod via your new function and
store the result in a series named foo.
Figure 10.3: Using your package
Go to File, Function files, On local machine. You will be shown a list of the installed packages,
including the one you have just created. If you select it and click on Execute (or double-click on
the name of the function package), a window similar to the one shown in figure 10.3 will appear.
Notice that the description string "Series to process", supplied with the function definition, appears
to the left of the top series chooser.
Click Ok and the series foo will be generated (see figure 10.4). You may have to go to Data,
Refresh data in order to have your new variable show up in the main window variable list (or just
press the r key).
Figure 10.4: Percent change in industrial production
Alternatively, the same could have been accomplished by the script
include pc.gfn
open np
foo = pc(iprod)
Creating a package via the command line
The mechanism described above, for creating function packages using the GUI, is likely to be convenient
for small to medium-sized packages but may be too cumbersome for ambitious packages
that include a large hierarchy of private functions. To facilitate the building of such packages gretl
offers the makepkg command.
To use makepkg you create three files: a driver script that loads all the functions you want to package
and invokes makepkg; a small, plain-text specification file that contains the required package
details (author, version, etc.); and (in the simplest case) a plain text help file. You run the driver
script and gretl writes the package (.gfn) file.
We first illustrate with a simple notional package. We have a gretl script file named foo.inp that
contains a function, foo, that we want to package. Our driver script would then look like this:
include foo.inp
makepkg foo.gfn
Note that the makepkg command takes one argument, the name of the package file to be created.
The package specification file should have the same basename but the extension .spec. In this case
gretl will therefore look for foo.spec. It should look something like this:
# foo.spec
author = A. U. Thor
version = 1.0
date = 2011-02-01
description = Does something with time series
public = foo
help = foohelp.txt
sample-script = example.inp
min-version = 1.9.3
data-requirement = needs-time-series-data
As you can see, the format of each line in this file is key = value, with two qualifications: blank
lines are permitted (and ignored), as are comment lines that start with #.
All the fields included in the above example are required, with the exception of data-requirement,
though the order in which they appear is immaterial. Here's a run-down of the basic fields:
author: the name(s) of the author(s). Accented or other non-ASCII characters should be given
as UTF-8.
version: the version number of the package, which should be limited to two integers sepa-
rated by a period.
date: the release date of the current version of the package, in ISO 8601 format: YYYY-MM-DD.
description: a brief description of the functionality offered by the package. This will be
displayed in the GUI function packages window so it should be just one short line.
public: the listing of public functions.
help: the name of a plain text (UTF-8) file containing help; all packages must provide help.
sample-script: the name of a sample script that illustrates use of the package; all packages
must supply a sample script.
min-version: the minimum version of gretl required for the package to work correctly. If
you're unsure about this, the conservative thing is to give the current gretl version.
The public field indicates which function or functions are to be made directly available to users (as
opposed to private helper functions). In the example above there is just one public function. Note
that any functions in memory when makepkg is invoked, other than those designated as public, are
assumed to be private functions that should also be included in the package. That is, the list of
private functions (if any) is implicit.
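As a minimal sketch of how the implicit private list works, suppose foo.inp contained the two functions below. The helper foo_scale and the body of foo are hypothetical, invented here for illustration. With public = foo in foo.spec, running the driver script shown earlier would package foo_scale as a private function, simply because it is in memory and is not listed as public.

```gretl
# hypothetical helper: in memory when makepkg runs, but not named
# in the public field, so it would be packaged as private
function series foo_scale (series x)
    return 100 * x
end function

# the one function named under "public" in foo.spec
function series foo (series x)
    series dl = log(x) - log(x(-1))
    return foo_scale(dl)
end function
```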
The data-requirement field should be specified if the package requires time-series or panel data,
or alternatively if no dataset is required. If the data-requirement field is omitted, the assumption
is that the package needs a dataset in place, but it doesn't matter what kind; if the packaged
functions do not use any series or lists this requirement can be explicitly relaxed. Valid values for
this field are:
needs-time-series-data (any time-series data OK)
needs-qm-data (must be quarterly or monthly)
needs-panel-data (must be a panel)
no-data-ok (no dataset is needed)
For a more complex example, let's look at the gig (GARCH-in-gretl) package. The driver script for
building gig looks something like this:
set echo off
set messages off
include gig_mle.inp
include gig_setup.inp
include gig_estimate.inp
include gig_printout.inp
include gig_plot.inp
makepkg gig.gfn
In this case the functions to be packaged (of which there are many) are distributed across several
script files, each of which is the target of an include command. The set commands at the top are
included to cut down on the verbosity of the output.
The content of gig.spec is as follows:
author = Riccardo "Jack" Lucchetti and Stefano Balietti
version = 2.0
date = 2010-12-21
description = An assortment of univariate GARCH models
public = GUI_gig \
gig_setup gig_set_dist gig_set_pq gig_set_vQR \
gig_print gig_estimate \
gig_plot gig_dplot \
gig_bundle_print GUI_gig_plot
gui-main = GUI_gig
bundle-print = gig_bundle_print
bundle-plot = GUI_gig_plot
help = gig.pdf
sample-script = examples/example1.inp
min-version = 1.9.3
data-requirement = needs-time-series-data
Note that backslash continuation can be used for the elements of the public function listing.
In addition to the fields shown in the simple example above, gig.spec includes three optional
fields: gui-main, bundle-print and bundle-plot. These keywords are used to designate certain
functions as playing a special role in the gretl graphical interface. A function picked out in this way
must be in the public list and must satisfy certain further requirements.
gui-main: this specifies a function as the one which will be presented automatically to GUI
users (instead of users being faced with a choice of interfaces). This makes sense only for
packages that have multiple public functions. In addition, the gui-main function must return
a bundle (see section 11.7).
bundle-print: this picks out a function that should be used to print the contents of a bundle
returned by the gui-main function. It must take a pointer-to-bundle as its first argument.
The second argument, if present, should be an int switch, with two or more valid values, that
controls the printing in some way. Any further arguments must have default values specified
so that they can be omitted.
bundle-plot: selects a function for the role of producing a plot or graph based on the contents
of a returned bundle. The requirements on this function are as for bundle-print.
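To make these requirements concrete, here is a minimal sketch of a pair of functions that would appear to satisfy them. The names my_main and my_print, and the bundle keys used, are hypothetical; in a real package they would be named under gui-main and bundle-print (and in the public list) in the spec file.

```gretl
# candidate for the gui-main role: it returns a bundle
function bundle my_main (series y)
    bundle b
    b["mean"] = mean(y)
    b["sd"] = sd(y)
    return b
end function

# candidate for the bundle-print role: pointer-to-bundle first,
# then an int switch with two valid values (0 and 1, default 0)
function void my_print (bundle *b, int verbose[0:1:0])
    scalar m = b["mean"]
    printf "mean = %g\n", m
    if verbose
        scalar s = b["sd"]
        printf "std. dev. = %g\n", s
    endif
end function
```

From a script the pair could be exercised with bundle b = my_main(y) followed by my_print(&b, 1), given some series y.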
The "GUI special" tags support a user-friendly mode of operation. On a successful call to gui-main,
gretl opens a window displaying the contents of the returned bundle (formatted via bundle-print).
Menus in this window give the user the option of saving the entire bundle (in which case it's represented
as an icon in the icon view window) or of extracting specific elements from the bundle
(series or matrices, for example).
If the package has a bundle-plot function, the bundle window also has a Graph menu. In gig, for
example, the bundle-plot function has this signature:
function void GUI_gig_plot(bundle *model, int ptype[0:1:0] \
"Plot type" {"Time series", "Density"})
The ptype switch is used to choose between a time-series plot of the residual and its conditional
variance, and a kernel density plot of the innovation against the theoretical distribution it is supposed
to follow. The use of the value-labels "Time series" and "Density" means that the Graph menu
will display these two choices.
One other feature of the gig spec file is noteworthy: the help field specifies gig.pdf, documentation
in PDF format. Unlike plain-text help, this cannot be rolled into the gfn (XML) file produced
by the makepkg command; rather, both gig.gfn and gig.pdf are packaged into a zip archive for
distribution. This represents a form of package which is new in gretl 1.9.4. More details will be
made available before long.
10.6 Memo: updating old-style functions
As mentioned at the start of this chapter, different rules were in force for defining functions prior
to gretl 1.8.4. While the old syntax is still supported to date, this may not always be the case. But
it is straightforward to convert a function to the new style. The only thing that must be changed
for compatibility with the new syntax is the declaration of the function's return type. Previously
this was placed inline in the return statement, whereas now it is placed right after the function
keyword. For example:
# old style
function triple (series x)
y = 3*x
return series y # note the "series" here
end function
# new style
function series triple (series x)
y = 3*x
return y
end function
Note also that the role of the return statement has changed (and its use has become more flexible):
The return statement now causes the function to return directly, and you can have more
than one such statement, wrapped in conditionals. Before there could only be one return
statement, and its role was just to specify the type available for assignment by the caller.
The final element in the return statement can now be an expression that evaluates to a value
of the advertised return type; before, it had to be the name of a pre-defined variable.
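Both points can be seen in a small sketch. The function safe_ratio is hypothetical, written only to illustrate the new-style return: it contains two return statements, one wrapped in a conditional, and the second returns an expression rather than a named variable.

```gretl
function scalar safe_ratio (scalar a, scalar b)
    if b == 0
        # return directly, without evaluating the rest of the function
        return NA
    endif
    # the returned element is an expression of the advertised type
    return a / b
end function

scalar r1 = safe_ratio(6, 3)   # 2
scalar r2 = safe_ratio(1, 0)   # NA
```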
Chapter 11
Gretl data types
11.1 Introduction
Gretl offers the following data types:
scalar holds a single numerical value
series holds n numerical values, where n is the number of observations in the current
dataset
matrix holds a rectangular array of numerical values, of any dimensions
list holds the ID numbers of a set of series
string holds an array of characters
bundle holds a variable number of objects of various types
The numerical values mentioned above are all double-precision floating point numbers.
In this chapter we give a run-down of the basic characteristics of each of these types and also
explain their life cycle (creation, modification and destruction). The list and matrix types, whose
uses are relatively complex, are discussed at greater length in the following two chapters.
11.2 Series
We begin with the series type, which is the oldest and in a sense the most basic type in gretl. When
you open a data file in the gretl GUI, what you see in the main window are the ID numbers, names
(and descriptions, if available) of the series read from the file. All the series existing at any point in
a gretl session are of the same length, although some may have missing values. The variables that
can be added via the items under the Add menu in the main window (logs, squares and so on) are
also series.
For a gretl session to contain any series, a common series length must be established. This is
usually achieved by opening a data le, or importing a series from a database, in which case the
length is set by the first import. But one can also use the nulldata command, which takes as its
single argument the desired length, a positive integer.
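A brief sketch of the nulldata route follows. The series name u and the length 50 are arbitrary choices for illustration.

```gretl
nulldata 50            # establish a common series length of 50
series u = normal()    # any series created now has 50 observations
scalar n = $nobs       # number of observations in the current sample
printf "series length = %d\n", n
```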
Each series has these basic attributes: an ID number, a name, and of course n numerical values. In
addition a series may have a description (which is shown in the main window and is also accessible
via the labels command), a display name for use in graphs, a record of the compaction method
used in reducing the variable's frequency (for time-series data only) and a flag marking the variable
as discrete. These attributes can be edited in the GUI by choosing Edit Attributes (either under the
Variable menu or via right-click), or by means of the setinfo command.
In the context of most commands you are able to reference series by name or by ID number as you
wish. The main exception is the definition or modification of variables via a formula; here you must
use names since ID numbers would get confused with numerical constants.
Note that series ID numbers are always consecutive, and the ID number for a given series will change
if you delete a lower-numbered series. In some contexts, where gretl is liable to get confused by
such changes, deletion of low-numbered series is disallowed.
11.3 Scalars
The scalar type is relatively simple: just a convenient named holder for a single numerical value.
Scalars have none of the additional attributes pertaining to series, do not have public ID numbers,
and must be referenced by name. A common use of scalar variables is to record information made
available by gretl commands for further processing, as in scalar s2 = $sigma^2 to record the
square of the standard error of the regression following an estimation command such as ols.
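The following sketch shows this pattern on artificial data. The series x and y are simulated purely for illustration; the scalar s2 holds the square of the value given by the $sigma accessor after estimation.

```gretl
nulldata 100
series x = normal()
series y = 1 + 2*x + normal()
ols y 0 x                # 0 stands for the constant
scalar s2 = $sigma^2     # square of the standard error of the regression
printf "s2 = %g\n", s2
```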
You can define and work with scalars in gretl without having any dataset in place.
In the gretl GUI, scalar variables can be inspected and their values edited via the Scalars item
under the View menu in the main window.
11.4 Matrices
Matrices in gretl work much as in other mathematical software (e.g. MATLAB, Octave). Like scalars
they have no public ID numbers and must be referenced by name, and they can be used without any
dataset in place. Matrix indexing is 1-based: the top-left element of matrix A is A[1,1]. Matrices
are discussed at length in chapter 13; advanced users of gretl will want to study this chapter in
detail.
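As a small illustration of 1-based indexing (the matrix A and its values are arbitrary), note that no dataset is needed:

```gretl
matrix A = {1, 2; 3, 4}    # rows are separated by semicolons
scalar top_left = A[1,1]   # 1
scalar bot_right = A[2,2]  # 4
```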
Matrices have one optional attribute beyond their numerical content: they may have column names
attached, which are displayed when the matrix is printed. See the colnames function for details.
In the gretl GUI, matrices can be inspected, analysed and edited via the Icon view item under the
View menu in the main window: each currently defined matrix is represented by an icon.
11.5 Lists
As with matrices, lists merit an explication of their own (see chapter 12). Briefly, named lists can
(and should!) be used to make command scripts less verbose and repetitious, and more easily
modifiable. Since lists are in fact lists of series ID numbers they can be used only when a dataset is
in place.
In the gretl GUI, named lists can be inspected and edited under the Data menu in the main window,
via the item Define or edit list.
11.6 Strings
String variables may be used for labeling, or for constructing commands. They are discussed in
chapter 12. They must be referenced by name; they can be defined in the absence of a dataset.
Such variables can be created and modified via the command-line in the gretl console or via script;
there is no means of editing them via the gretl GUI.
11.7 Bundles
A bundle is a container or wrapper for various sorts of objects: specifically, scalars, series,
matrices, strings and bundles. (Yes, a bundle can contain other bundles.) A bundle takes the form
of a hash table or associative array: each item placed in the bundle is associated with a key string
which can be used to retrieve it subsequently. We begin by explaining the mechanics of bundles then
offer some thoughts on what they are good for.
To use a bundle you must first declare it, as in
bundle foo
To add an object to a bundle you assign to a compound left-hand value: the name of the bundle
followed by the key string in square brackets. For example, the statement
foo["matrix1"] = m
adds an object called m (presumably a matrix) to bundle foo under the key matrix1. To get an item
out of a bundle, again use the name of the bundle followed by the bracketed key, as in
matrix bm = foo["matrix1"]
A bundle key may be given as a double-quoted string literal, as shown above, or as the name of
a pre-defined string variable. Key strings have a maximum length of 15 characters and cannot
contain spaces.
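The second form of key can be sketched as follows. The string variable key is a hypothetical name; it holds the same key text as the literal used in the examples above, so either form retrieves the same item.

```gretl
bundle foo
matrix m = I(3)
string key = "matrix1"       # a pre-defined string variable
foo[key] = m                 # key given by variable name
matrix bm = foo["matrix1"]   # key given as a string literal
```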
Note that the key identifying an object within a given bundle is necessarily unique. If you reuse an
existing key in a new assignment, the effect is to replace the object which was previously stored
under the given key. It is not required that the type of the replacement object is the same as that
of the original.
Also note that when you add an object to a bundle, what in fact happens is that the bundle acquires
a copy of the object. The external object retains its own identity and is unaffected if the bundled
object is replaced by another. Consider the following script fragment:
bundle foo
matrix m = I(3)
foo["mykey"] = m
scalar x = 20
foo["mykey"] = x
After the above commands are completed bundle foo does not contain a matrix under mykey, but
the original matrix m is still in good health.
To delete an object from a bundle use the delete command, as in
delete foo["mykey"]
This destroys the object associated with the key and removes the key from the hash table.
Besides adding, accessing, replacing and deleting individual items, the other operations that are
supported for bundles are union, printing and deletion. As regards union, if bundles b1 and b2 are
defined you can say
bundle b3 = b1 + b2
to create a new bundle that is the union of the two others. The algorithm is: create a new bundle
that is a copy of b1, then add any items from b2 whose keys are not already present in the new
bundle. (This means that bundle union is not commutative if the bundles have one or more key
strings in common.)
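The non-commutativity can be seen in a short sketch that follows the algorithm just described. The bundle names and contents are arbitrary; the key x is shared between b1 and b2, so its value in the union depends on which bundle comes first.

```gretl
bundle b1
b1["x"] = 1
b1["y"] = 2
bundle b2
b2["x"] = 10
b2["z"] = 30

bundle b3 = b1 + b2   # copy of b1, plus z from b2
bundle b4 = b2 + b1   # copy of b2, plus y from b1
scalar x3 = b3["x"]   # 1, taken from b1
scalar x4 = b4["x"]   # 10, taken from b2
```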
If b is a bundle and you say print b, you get a listing of the bundle's keys along with the types of
the corresponding objects, as in
? print b
bundle b:
x (scalar)
mat (matrix)
inside (bundle)
What are bundles good for?
Bundles are unlikely to be of interest in the context of standalone gretl scripts, but they can be
very useful in the context of complex function packages where a good deal of information has to
be passed around between the component functions. Instead of using a lengthy list of individual
arguments, function A can bundle up the required data and pass it to functions B and C, where
relevant information can be extracted via a mnemonic key.
In this context bundles should be passed in pointer form (see chapter 10) as illustrated in the following
trivial example, where a bundle is created at one level then filled out by a separate function.
# modification of bundle (pointer) by user function
function void fill_out_bundle (bundle *b)
b["mat"] = I(3)
b["str"] = "foo"
b["x"] = 32
end function
bundle my_bundle
fill_out_bundle(&my_bundle)
The bundle type can also be used to advantage as the return value from a packaged function, in
cases where a package writer wants to give the user the option of accessing various results. In the
gretl GUI, function packages that return a bundle are treated specially: the output window that
displays the printed results acquires a menu showing the bundled items (their names and types),
from which the user can save items of interest. For example, a function package that estimates a
model might return a bundle containing a vector of parameter estimates, a residual series and a
covariance matrix for the parameter estimates, among other possibilities.
As a refinement to support the use of bundles as a function return type, the setnote function can
be used to add a brief explanatory note to a bundled item; such notes will then be shown in the
GUI menu. This function takes three arguments: the name of a bundle, a key string, and the note.
For example
setnote(b, "vcv", "covariance matrix")
After this, the object under the key vcv in bundle b will be shown as "covariance matrix" in a GUI
menu.
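Putting these pieces together, a function that returns estimation results in a bundle might be sketched as follows. The function name ols_bundle and the keys coeff, vcv and uhat are hypothetical choices; the data are simulated only so that the example can be run as it stands.

```gretl
function bundle ols_bundle (series y, list X)
    bundle ret
    ols y X --quiet
    ret["coeff"] = $coeff    # vector of parameter estimates
    ret["vcv"] = $vcv        # covariance matrix of the estimates
    ret["uhat"] = $uhat      # residual series
    setnote(ret, "vcv", "covariance matrix")
    return ret
end function

nulldata 50
series x = normal()
series y = 1 + x + normal()
list X = const x
bundle b = ols_bundle(y, X)
print b
```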
11.8 The life cycle of gretl objects
Creation
The most basic way to create a new variable of any type is by declaration, where one states the type
followed by the name of the variable to create, as in
scalar x
series y
matrix A
and so forth. In that case the object in question is given a default initialization, as follows: a new
scalar has value NA (missing); a new series is filled with NAs; a new matrix is null (zero rows and
columns); a new string is empty; a new list has no members, and a new bundle is empty.
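The defaults for the types that need no dataset can be checked with a short sketch like the following (the variable names are arbitrary, and the comments restate the defaults given above):

```gretl
scalar x     # NA
matrix A     # null: zero rows and columns
string s     # empty

scalar x_is_na = missing(x)
scalar nrows = rows(A)
scalar slen = strlen(s)
printf "missing: %d, rows: %d, length: %d\n", x_is_na, nrows, slen
```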
Declaration can be supplemented by a definite initialization, as in
scalar x = pi
series y = log(x)
matrix A = zeros(10,4)
With the exception of bundles (as noted above), new variables in gretl do not have to be declared
by type. The traditional way of creating a new variable in gretl was via the genr command (which is
still supported), as in
genr x = y/100
Here the type of x is left implicit and will be determined automatically depending on the context: if
y is a scalar, a series or a matrix x will inherit y's type (otherwise an error will be generated, since
division is applicable to these types only). Moreover, the type of a new variable can be left implicit
without use of genr:
x = y/100
In modern gretl scripting we recommend that you state the type of a new variable explicitly.
This makes the intent clearer to a reader of the script and also guards against errors that might
otherwise be difficult to understand (i.e. a certain variable turns out to be of the wrong type for
some subsequent calculation, but you don't notice at first because you didn't say what type you
needed). An exception to this rule might reasonably be granted for clear and simple cases where
there's little possibility of confusion.
Modification
Typically, the values of variables of all types are modified by assignment, using the = operator with
the name of the variable on the left and a suitable value or formula on the right:
z = normal()
x = 100 * log(y) - log(y(-1))
M = qform(a, X)
By a "suitable" value we mean one that is conformable for the type in question. A gretl variable
acquires its type when it is first created and this cannot be changed via assignment; for example, if
you have a matrix A and later want a string A, you will have to delete the matrix first.
One point to watch out for in gretl scripting is type conflicts having to do with the names of series brought
in from a data file. For example, in setting up a command loop (see chapter 9) it is very common to call the
loop index i. Now a loop index is a scalar (typically incremented each time round the loop). If you open
a data file that happens to contain a series named i you will get a type error ("Types not conformable for
operation") when you try to use i as a loop index.
Although the type of an existing variable cannot be changed on the fly, gretl nonetheless tries to be
as understanding as possible. For example if x is a series and you say
x = 100
gretl will give the series a constant value of 100 rather than complaining that you are trying to
assign a scalar to a series. This issue is particularly relevant for the matrix type; see chapter 13
for details.
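A quick way to see this behaviour, sketched here on artificial data, is to check the minimum and maximum of the series after the assignment:

```gretl
nulldata 10
series x = normal()
x = 100               # x remains a series; every observation becomes 100
scalar lo = min(x)
scalar hi = max(x)
printf "min = %g, max = %g\n", lo, hi
```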
Besides using the regular assignment operator you also have the option of using an "inflected"
equals sign, as in the C programming language. This is shorthand for the case where the new value
of the variable is a function of the old value. For example,
x += 100 # in longhand: x = x + 100
x *= 100 # in longhand: x = x * 100
For scalar variables you can use a more condensed shorthand for simple increment or decrement
by 1, namely trailing ++ or -- respectively:
x = 100
x-- # x now equals 99
x++ # x now equals 100
In the case of objects holding more than one value (series, matrices and bundles) you can
modify particular values within the object using an expression within square brackets to identify
the elements to access. We have discussed this above for the bundle type and chapter 13 goes into
details for matrices. As for series, there are two ways to specify particular values for modication:
you can use a simple 1-based index, or if the dataset is a time series or panel (or if it has marker
strings that identify the observations) you can use an appropriate observation string. Such strings
are displayed by gretl when you print data with the --byobs flag. Examples:
x[13] = 100 # simple index: the 13th observation
x[1995:4] = 100 # date: quarterly time series
x[2003:08] = 100 # date: monthly time series
x["AZ"] = 100 # the observation with marker string "AZ"
x[3:15] = 100 # panel: the 15th observation for the 3rd unit
Note that with quarterly or monthly time series there is no ambiguity between a simple index
number and a date, since dates always contain a colon. With annual time-series data, however,
such ambiguity exists and it is resolved by the rule that a number in brackets is always read as a
simple index: x[1905] means the nineteen-hundred and fifth observation, not the observation for
the year 1905. You can specify a year by quotation, as in x["1905"].
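The rule for annual data can be sketched as follows, assuming an artificial annual dataset that starts in 1901 (so that observation 5 corresponds to the year 1905). The series name x is arbitrary.

```gretl
nulldata 50
setobs 1 1901 --time-series   # annual data, 1901 to 1950
series x = 0
x[5] = 1         # plain number: the 5th observation, i.e. 1905
x["1910"] = 2    # quoted: the observation for the year 1910
```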
Destruction
Objects of the types discussed above, with the important exception of named lists, are all destroyed
using the delete command: delete objectname.
Lists are an exception for this reason: in the context of gretl commands, a named list expands to
the ID numbers of the member series, so if you say
delete L
for L a list, the effect is to delete all the series in L; the list itself is not destroyed, but ends up
empty. To delete the list itself (without deleting the member series) you must invert the command
and use the list keyword:
list L delete
Chapter 12
Named lists and strings
12.1 Named lists
Many gretl commands take one or more lists of series as arguments. To make this easier to handle
in the context of command scripts, and in particular within user-defined functions, gretl offers the
possibility of named lists.
Creating and modifying named lists
A named list is created using the keyword list, followed by the name of the list, an equals sign,
and an expression that forms a list. The most basic sort of expression that works in this context is
a space-separated list of variables, given either by name or by ID number. For example,
list xlist = 1 2 3 4
list reglist = income price
Note that the variables in question must be of the series type: you cannot include scalars in a
named list.
Two abbreviations are available in defining lists:
You can use the wildcard character, *, to create a list of variables by name. For example,
dum* can be used to indicate all variables whose names begin with dum.
You can use two dots to indicate a range of variables. For example income..price indicates
the set of variables whose ID numbers are greater than or equal to that of income and less
than or equal to that of price.
In addition there are two special forms:
If you use the keyword null on the right-hand side, you get an empty list.
If you use the keyword dataset on the right, you get a list containing all the series in the
current dataset (except the pre-defined const).
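The abbreviations and special forms can be tried out on an artificial dataset, as in the sketch below. The series are created in the order income, dum1, dum2, price, so that their ID numbers are consecutive; all names other than the keywords null and dataset are arbitrary.

```gretl
nulldata 20
series income = normal()
series dum1 = uniform() > 0.5
series dum2 = uniform() > 0.5
series price = normal()

list dlist = dum*             # wildcard: dum1 and dum2
list rlist = income..price    # range: income, dum1, dum2, price
list nolist = null            # an empty list
list everything = dataset     # all series in the dataset except const
```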
The name of the list must start with a letter, and must be composed entirely of letters, numbers
or the underscore character. The maximum length of the name is 15 characters; list names cannot
contain spaces.
Once a named list has been created, it will be remembered for the duration of the gretl session
(unless you delete it), and can be used in the context of any gretl command where a list of variables
is expected. One simple example is the specification of a list of regressors:
list xlist = x1 x2 x3 x4
ols y 0 xlist
To get rid of a list, you use the following syntax:
list xlist delete
Be careful: delete xlist will delete the variables contained in the list, so it implies data loss
(which may not be what you want). On the other hand, list xlist delete will simply "undefine"
the xlist identifier and the variables themselves will not be affected.
Similarly, to print the names of the variables in a list you have to invert the usual print command,
as in
list xlist print
If you just say print xlist the list will be expanded and the values of all the member variables
will be printed.
Lists can be modified in various ways. To redefine an existing list altogether, use the same syntax
as for creating a list. For example
list xlist = 1 2 3
xlist = 4 5 6
After the second assignment, xlist contains just variables 4, 5 and 6.
To append or prepend variables to an existing list, we can make use of the fact that a named list
stands in for a longhand list. For example, we can do
list xlist = xlist 5 6 7
xlist = 9 10 xlist 11 12
Another option for appending a term (or a list) to an existing list is to use +=, as in
xlist += cpi
To drop a variable from a list, use -=:
xlist -= cpi
In most contexts where lists are used in gretl, it is expected that they do not contain any duplicated
elements. If you form a new list by simple concatenation, as in list L3 = L1 L2 (where L1 and
L2 are existing lists), it's possible that the result may contain duplicates. To guard against this you
can form a new list as the union of two existing ones:
list L3 = L1 || L2
The result is a list that contains all the members of L1, plus any members of L2 that are not already
in L1.
In the same vein, you can construct a new list as the intersection of two existing ones:
list L3 = L1 && L2
Here L3 contains all the elements that are present in both L1 and L2.
You can also subtract one list from another:
list L3 = L1 - L2
The result contains all the elements of L1 that are not present in L2.
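The three operations can be compared side by side in a small sketch on artificial data. The series and list names are arbitrary; L1 and L2 overlap in the single series x2.

```gretl
nulldata 20
series x1 = normal()
series x2 = normal()
series x3 = normal()
list L1 = x1 x2
list L2 = x2 x3

list LU = L1 || L2   # union: x1 x2 x3
list LI = L1 && L2   # intersection: x2
list LD = L1 - L2    # difference: x1
```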
Lists and matrices
Another way of forming a list is by assignment from a matrix. The matrix in question must be
interpretable as a vector containing ID numbers of (series) variables. It may be either a row or
a column vector, and each of its elements must have an integer part that is no greater than the
number of variables in the data set. For example:
matrix m = {1,2,3,4}
list L = m
The above is OK provided the data set contains at least 4 variables.
Querying a list
You can determine whether an unknown variable actually represents a list using the function
islist().
series xl1 = log(x1)
series xl2 = log(x2)
list xlogs = xl1 xl2
genr is1 = islist(xlogs)
genr is2 = islist(xl1)
The first genr command above will assign a value of 1 to is1 since xlogs is in fact a named list.
The second genr will assign 0 to is2 since xl1 is a data series, not a list.
You can also determine the number of variables or elements in a list using the function nelem().
list xlist = 1 2 3
nl = nelem(xlist)
The (scalar) variable nl will be assigned a value of 3 since xlist contains 3 members.
You can determine whether a given series is a member of a specified list using the function
inlist(), as in
scalar k = inlist(L, y)
where L is a list and y a series. The series may be specified by name or ID number. The return value
is the (1-based) position of the series in the list, or zero if the series is not present in the list.
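For instance, in the following sketch on artificial data (all names arbitrary), x2 is the second member of L while y is not a member:

```gretl
nulldata 20
series x1 = normal()
series x2 = normal()
series y = normal()
list L = x1 x2

scalar k = inlist(L, x2)   # 2: position of x2 in L
scalar z = inlist(L, y)    # 0: y is not in L
```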
Generating lists of transformed variables
Given a named list of variables, you are able to generate lists of transformations of these variables
using the functions log, lags, diff, ldiff, sdiff or dummify. For example
list xlist = x1 x2 x3
list lxlist = log(xlist)
list difflist = diff(xlist)
When generating a list of lags in this way, you specify the maximum lag order inside the parenthe-
ses, before the list name and separated by a comma. For example
list xlist = x1 x2 x3
list laglist = lags(2, xlist)
or
scalar order = 4
list laglist = lags(order, xlist)
These commands will populate laglist with the specified number of lags of the variables in xlist.
You can give the name of a single series in place of a list as the second argument to lags: this is
equivalent to giving a list with just one member.
        YpcFR   YpcGE   YpcIT         NFR         NGE         NIT
1997    114.9   124.6   119.3   59830.635   82034.771   56890.372
1998    115.3   122.7   120.0   60046.709   82047.195   56906.744
1999    115.0   122.4   117.8   60348.255   82100.243   56916.317
2000    115.6   118.8   117.2   60750.876   82211.508   56942.108
2001    116.0   116.9   118.1   61181.560   82349.925   56977.217
2002    116.3   115.5   112.2   61615.562   82488.495   57157.406
2003    112.1   116.9   111.0   62041.798   82534.176   57604.658
2004    110.3   116.6   106.9   62444.707   82516.260   58175.310
2005    112.4   115.1   105.1   62818.185   82469.422   58607.043
2006    111.9   114.2   103.3   63195.457   82376.451   58941.499
Table 12.1: GDP per capita and population in 3 European countries (Source: Eurostat)
The dummify function creates a set of dummy variables coding for all but one of the distinct values
taken on by the original variable, which should be discrete. (The smallest value is taken as the
omitted category.) Like lags, this function returns a list even if the input is a single series.
Generating series from lists
Once a list is defined, gretl offers several functions that apply to the list and return a series. In most
cases, these functions also apply to single series and behave as natural extensions when applied to
a list, but this is not always the case.
For recognizing and handling missing values, gretl offers several functions (see the Gretl Command
Reference for details). In this context, it is worth remarking that the ok() function can be used with
a list argument. For example,
list xlist = x1 x2 x3
series xok = ok(xlist)
After these commands, the series xok will have value 1 for observations where none of x1, x2, or
x3 has a missing value, and value 0 for any observations where this condition is not met.
The functions max, min, mean, sd, sum and var behave horizontally rather than vertically when their
argument is a list. For instance, the following commands
list Xlist = x1 x2 x3
series m = mean(Xlist)
produce a series m whose i-th element is the average of \(x_{1,i}\), \(x_{2,i}\) and \(x_{3,i}\); missing values, if any, are
implicitly discarded.
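As a quick check on the "horizontal" behavior, here is a small sketch with constant series, so that the expected results are easy to verify by hand. The dataset and the series values are assumptions chosen for illustration only.

```gretl
# three constant series, so the row-wise results are easy to verify
nulldata 3
series x1 = 1
series x2 = 2
series x3 = 6
list Xlist = x1 x2 x3
series m = mean(Xlist)   # (1 + 2 + 6)/3 = 3 at every observation
series s = sum(Xlist)    # 1 + 2 + 6 = 9 at every observation
print m s --byobs
```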
In addition, gretl provides three functions for weighted operations: wmean, wsd and wvar. Consider
as an illustration Table 12.1: the first three columns are GDP per capita for France, Germany and
Italy; columns 4 to 6 contain the population for each country. If we want to compute an aggregate
indicator of per capita GDP, all we have to do is
list Ypc = YpcFR YpcGE YpcIT
list N = NFR NGE NIT
y = wmean(Ypc, N)
so for example
\[
y_{1997} = \frac{114.9 \times 59830.635 + 124.6 \times 82034.771 + 119.3 \times 56890.372}{59830.635 + 82034.771 + 56890.372} = 120.163
\]
See the Gretl Command Reference for more details.
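To see that wmean really computes a population-weighted average, you can compare it with the same quantity written out by hand. The sketch below enters only the 1997 row of Table 12.1 as constant series (a simplification made here for illustration); ycheck is a hypothetical name.

```gretl
# 1997 values from Table 12.1, entered as constants for illustration
nulldata 2
series YpcFR = 114.9
series YpcGE = 124.6
series YpcIT = 119.3
series NFR = 59830.635
series NGE = 82034.771
series NIT = 56890.372

list Ypc = YpcFR YpcGE YpcIT
list N = NFR NGE NIT
series y = wmean(Ypc, N)
# the same weighted mean, spelled out term by term
series ycheck = (YpcFR*NFR + YpcGE*NGE + YpcIT*NIT) / (NFR + NGE + NIT)
print y ycheck --byobs
```

Both series should agree, at about 120.163.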
12.2 Named strings
For some purposes it may be useful to save a string (that is, a sequence of characters) as a named
variable that can be reused. Versions of gretl higher than 1.6.0 offer this facility, but some of the
refinements noted below are available only in gretl 1.7.2 and higher.
To define a string variable, you can use either of two commands, string or sprintf. The string
command is simpler: you can type, for example,
string s1 = "some stuff I want to save"
string s2 = getenv("HOME")
string s3 = s1 + 11
The first field after string is the name under which the string should be saved, then comes an
equals sign, then comes a specification of the string to be saved. This can be the keyword null, to
produce an empty string, or may take any of the following forms:
a string literal (enclosed in double quotes); or
the name of an existing string variable; or
a function that returns a string (see below); or
any of the above followed by + and an integer offset.
The role of the integer offset is to use a substring of the preceding element, starting at the given
character offset. An empty string is returned if the offset is greater than the length of the string in
question.
To add to the end of an existing string you can use the operator +=, as in
string s1 = "some stuff I want to "
string s1 += "save"
or you can use the ~ operator to join two or more strings, as in
string s1 = "sweet"
string s2 = "Home, " ~ s1 ~ " home."
Note that when you define a string variable using a string literal, no characters are treated as
"special" (other than the double quotes that delimit the string). Specifically, the backslash is not
used as an escape character. So, for example,
string s = "\"
is a valid assignment, producing a string that contains a single backslash character. If you wish to
use backslash-escapes to denote newlines, tabs, embedded double-quotes and so on, use sprintf
instead.
The sprintf command is more flexible. It works exactly as gretl's printf command except that
the format string must be preceded by the name of a string variable. For example,
scalar x = 8
sprintf foo "var%d", x
To use the value of a string variable in a command, give the name of the variable preceded by the
at sign, @. This notation is treated as a macro. That is, if a sequence of characters in a gretl
command following the symbol @ is recognized as the name of a string variable, the value of that
variable is substituted literally into the command line before the regular parsing of the command is
carried out. This is illustrated in the following interactive session:
? scalar x = 8
scalar x = 8
Generated scalar x (ID 2) = 8
? sprintf foo "var%d", x
Saved string as foo
? print "@foo"
var8
Note the effect of the quotation marks in the line print "@foo". The line
? print @foo
would not print a literal var8 as above. After pre-processing the line would read
print var8
It would therefore print the value(s) of the variable var8, if such a variable exists, or would generate
an error otherwise.
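Because the substitution happens before the command is parsed, a string variable can also supply the name of a series to be created. The following sketch uses the sprintf command form described in this section (other gretl versions may differ); the names k and vname and the artificial dataset are assumptions for illustration.

```gretl
nulldata 10
scalar k = 2
sprintf vname "x%d", k     # vname now holds the text x2
series @vname = normal()   # after substitution: series x2 = normal()
summary @vname             # after substitution: summary x2
print "@vname"             # quoted, so this prints the text x2 itself
```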
In some contexts, however, one wants to treat string variables as variables in their own right: to do
this, give the name of the variable without the leading @ symbol. This is the way to handle such
variables in the following contexts:
When they appear among the arguments to the commands printf and sprintf.
On the right-hand side of a string assignment.
When they appear as an argument to a function that takes a string argument.
Here is an illustration of the use of named string arguments with printf:
? string vstr = "variance"
Generated string vstr
? printf "vstr: %12s\n", vstr
vstr:     variance
Note that vstr should not be put in quotes in this context. Similarly with
? string vstr_copy = vstr
Built-in strings
Apart from any strings that the user may define, some string variables are defined by gretl itself.
These may be useful for people writing functions that include shell commands. The built-in strings
are as shown in Table 12.2.
gretldir    the gretl installation directory
workdir     user's current gretl working directory
dotdir      the directory gretl uses for temporary files
gnuplot     path to, or name of, the gnuplot executable
tramo       path to, or name of, the tramo executable
x12a        path to, or name of, the x-12-arima executable
tramodir    tramo data directory
x12adir     x-12-arima data directory
Table 12.2: Built-in string variables
Reading strings from the environment
In addition, it is possible to read into gretl's named strings values that are defined in the external
environment. To do this you use the function getenv, which takes the name of an environment
variable as its argument. For example:
? string user = getenv("USER")
Saved string as user
? string home = getenv("HOME")
Saved string as home
? print "@user's home directory is @home"
cottrell's home directory is /home/cottrell
To check whether you got a non-empty value from a given call to getenv, you can use the function
strlen, which retrieves the length of the string, as in
? string temp = getenv("TEMP")
Saved empty string as temp
? scalar x = strlen(temp)
Generated scalar x (ID 2) = 0
The function isstring returns 1 if its argument is the name of a string variable, 0 otherwise.
However, if the return is 1 the string may still be empty.
At present the getenv function can only be used on the right-hand side of a string assignment,
as in the above illustrations.
Capturing strings via the shell
If shell commands are enabled in gretl, you can capture the output from such commands using the
syntax
string stringname = $(shellcommand)
That is, you enclose a shell command in parentheses, preceded by a dollar sign.
Reading from a file into a string
You can read the content of a le into a string variable using the syntax
string stringname = readfile(filename)
The filename field may be given as a string variable. For example
? sprintf fname "%s/QNC.rts", x12adir
Generated string fname
? string foo = readfile(fname)
Generated string foo
The above could also be accomplished using the macro variant of a string variable, provided it is
placed in quotation marks:
string foo = readfile("@x12adir/QNC.rts")
The strstr function
Invocation of this function takes the form
string stringname = strstr(s1, s2)
The effect is to search s1 for the first occurrence of s2. If no such occurrence is found, an empty
string is returned; otherwise the portion of s1 starting with s2 is returned. For example:
? string hw = "hello world"
Saved string as hw
? string w = strstr(hw, "o")
Saved string as w
? print "@w"
o world
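The "no match" case can be detected with strlen, in the same way as for getenv above. A minimal sketch follows; the search strings and the variable names none and n are assumptions for illustration.

```gretl
string hw = "hello world"
string w = strstr(hw, "wor")      # match found: the portion starting at "wor"
string none = strstr(hw, "xyz")   # no match: an empty string is returned
scalar n = strlen(none)           # length of the empty string
printf "match: '%s', length of non-match: %d\n", w, n
```

Here w should hold world and n should be 0.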
Chapter 13
Matrix manipulation
Together with the other two basic types of data (series and scalars), gretl offers a quite comprehensive
array of matrix methods. This chapter illustrates the peculiarities of matrix syntax and
discusses briefly some of the more complex matrix functions. For a full listing of matrix functions
and a comprehensive account of their syntax, please refer to the Gretl Command Reference.
13.1 Creating matrices
Matrices can be created using any of these methods:
1. By direct specification of the scalar values that compose the matrix in numerical form, by
reference to pre-existing scalar variables, or using computed values.
2. By providing a list of data series.
3. By providing a named list of series.
4. Using a formula of the same general type that is used with the genr command, whereby a new
matrix is defined in terms of existing matrices and/or scalars, or via some special functions.
To specify a matrix directly in terms of scalars, the syntax is, for example:
matrix A = { 1, 2, 3 ; 4, 5, 6 }
The matrix is defined by rows; the elements on each row are separated by commas and the rows
are separated by semi-colons. The whole expression must be wrapped in braces. Spaces within the
braces are not significant. The above expression defines a 2 × 3 matrix. Each element should be a
numerical value, the name of a scalar variable, or an expression that evaluates to a scalar. Directly
after the closing brace you can append a single quote (') to obtain the transpose.
To specify a matrix in terms of data series the syntax is, for example,
matrix A = { x1, x2, x3 }
where the names of the variables are separated by commas. Besides names of existing variables,
you can use expressions that evaluate to a series. For example, given a series x you could do
matrix A = { x, x^2 }
Each variable occupies a column (and there can only be one variable per column). You cannot use
the semicolon as a row separator in this case: if you want the series arranged in rows, append the
transpose symbol. The range of data values included in the matrix depends on the current setting
of the sample range.
Instead of giving an explicit list of variables, you may instead provide the name of a saved list (see
Chapter 12), as in
list xlist = x1 x2 x3
matrix A = { xlist }
When you provide a named list, the data series are by default placed in columns, as is natural in an
econometric context: if you want them in rows, append the transpose symbol.
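The column and row arrangements can be compared by checking the dimensions of the resulting matrices. This is a small sketch on an artificial five-observation dataset (an assumption made for illustration); rows() and cols() are the functions listed later in this chapter.

```gretl
nulldata 5
series x = normal()
matrix A = { x, x^2 }      # one series per column: 5 x 2
matrix At = { x, x^2 }'    # transposed: one series per row, 2 x 5
printf "A is %d x %d, At is %d x %d\n", rows(A), cols(A), rows(At), cols(At)
```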
As a special case of constructing a matrix from a list of variables, you can say
matrix A = { dataset }
This builds a matrix using all the series in the current dataset, apart from the constant (variable 0).
When this dummy list is used, it must be the sole element in the matrix definition {...}. You can,
however, create a matrix that includes the constant along with all other variables using horizontal
concatenation (see below), as in
matrix A = {const}~{dataset}
By default, when you build a matrix from series that include missing values the data rows that
contain NAs are skipped. But you can modify this behavior via the command set skip_missing
off. In that case NAs are converted to NaN ("Not a Number"). In the IEEE floating-point standard,
arithmetic operations involving NaN always produce NaN. Alternatively, you can take greater
control over the observations (data rows) that are included in the matrix using the set variable
matrix_mask, as in
set matrix_mask msk
where msk is the name of a series. Subsequent commands that form matrices from series or lists will
include only observations for which msk has non-zero (and non-missing) values. You can remove
this mask via the command set matrix_mask null.
Names of matrices must satisfy the same requirements as names of gretl variables in general: the name
can be no longer than 15 characters, must start with a letter, and must be composed of nothing but letters,
numbers and the underscore character.
13.2 Empty matrices
The syntax
matrix A = {}
creates an empty matrix, that is, a matrix with zero rows and zero columns.
The main purpose of the concept of an empty matrix is to enable the user to define a starting point
for subsequent concatenation operations. For instance, if X is an already defined matrix of any size,
the commands
matrix A = {}
matrix B = A ~ X
result in a matrix B identical to X.
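A typical use of this starting point is to accumulate columns inside a loop. The following sketch assumes nothing beyond the concatenation operator and the ones function described in this chapter; the loop bounds and the matrix contents are arbitrary.

```gretl
matrix A = {}
loop i=1..3 --quiet
    # append a 2 x 1 column whose elements equal the loop index
    matrix A = A ~ ($i * ones(2,1))
endloop
print A    # a 2 x 3 matrix with columns of 1s, 2s and 3s
```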
From an algebraic point of view, one can make sense of the idea of an empty matrix in terms of
vector spaces: if a matrix is an ordered set of vectors, then A={} is the empty set. As a consequence,
operations involving addition and multiplication don't have any clear meaning (arguably, they have
none at all), but operations involving the cardinality of this set (that is, the dimension of the space
spanned by A) are meaningful.
Legal operations on empty matrices are listed in Table 13.1. (All other matrix operations generate
an error when an empty matrix is given as an argument.) In line with the above interpretation,
some matrix functions return an empty matrix under certain conditions: the functions diag,
vec, vech, unvech when the argument is an empty matrix; the functions I, ones, zeros,
mnormal, muniform when one or more of the arguments is 0; and the function nullspace when
its argument has full column rank.
Function         Return value
A', transp(A)    A
rows(A)          0
cols(A)          0
rank(A)          0
det(A)           NA
ldet(A)          NA
tr(A)            NA
onenorm(A)       NA
infnorm(A)       NA
rcond(A)         NA
Table 13.1: Valid functions on an empty matrix, A
13.3 Selecting sub-matrices
You can select sub-matrices of a given matrix using the syntax
A[rows,cols]
where rows can take any of these forms:
1. empty: selects all rows
2. a single integer: selects the single specified row
3. two integers separated by a colon: selects a range of rows
4. the name of a matrix: selects the specified rows
With regard to option 2, the integer value can be given numerically, as the name of an existing
scalar variable, or as an expression that evaluates to a scalar. With option 4, the index matrix given
in the rows field must be either p × 1 or 1 × p, and should contain integer values in the range 1 to
n, where n is the number of rows in the matrix from which the selection is to be made.
The cols specification works in the same way, mutatis mutandis. Here are some examples.
matrix B = A[1,]
matrix B = A[2:3,3:5]
matrix B = A[2,2]
matrix idx = { 1, 2, 6 }
matrix B = A[idx,]
The first example selects row 1 from matrix A; the second selects a 2 × 3 submatrix; the third selects
a scalar; and the fourth selects rows 1, 2, and 6 from matrix A.
If the matrix in question is n × 1 or 1 × m, it is OK to give just one index specifier and omit the
comma. For example, A[2] selects the second element of A if A is a vector. Otherwise the comma
is mandatory.
In addition there is a pre-defined index specification, diag, which selects the principal diagonal of
a square matrix, as in B[diag], where B is square.
You can use selections of this sort on either the right-hand side of a matrix-generating formula or
the left. Here is an example of use of a selection on the right, to extract a 2 × 2 submatrix B from a
3 × 3 matrix A:
matrix A = { 1, 2, 3; 4, 5, 6; 7, 8, 9 }
matrix B = A[1:2,2:3]
And here are examples of selection on the left. The second line below writes a 2 × 2 identity matrix
into the bottom right corner of the 3 × 3 matrix A. The fourth line replaces the diagonal of A with
1s.
matrix A = { 1, 2, 3; 4, 5, 6; 7, 8, 9 }
matrix A[2:3,2:3] = I(2)
matrix d = { 1, 1, 1 }
matrix A[diag] = d
13.4 Matrix operators
The following binary operators are available for matrices:
+     addition
-     subtraction
*     ordinary matrix multiplication
'     pre-multiplication by transpose
\     matrix left division (see below)
/     matrix right division (see below)
~     column-wise concatenation
|     row-wise concatenation
**    Kronecker product
=     test for equality
In addition, the following operators (dot operators) apply on an element-by-element basis:
.+ .- .* ./ .^ .= .> .<
Here are explanations of the less obvious cases.
For matrix addition and subtraction, in general the two matrices have to be of the same dimensions
but an exception to this rule is granted if one of the operands is a 1 × 1 matrix or scalar. The scalar
is implicitly promoted to the status of a matrix of the correct dimensions, all of whose elements
are equal to the given scalar value. For example, if A is an m × n matrix and k a scalar, then the
commands
matrix C = A + k
matrix D = A - k
both produce m × n matrices, with elements \(c_{ij} = a_{ij} + k\) and \(d_{ij} = a_{ij} - k\) respectively.
By pre-multiplication by transpose we mean, for example, that
matrix C = X'Y
produces the product of X-transpose and Y. In effect, the expression X'Y is shorthand for X'*Y
(which is also valid).
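A small numerical case makes the equivalence easy to confirm. The matrices below are arbitrary values chosen for illustration.

```gretl
matrix X = {1, 2; 3, 4}
matrix Y = {5; 6}
matrix C1 = X'Y     # shorthand form
matrix C2 = X'*Y    # explicit transpose followed by multiplication
print C1 C2         # both should equal the column vector (23, 34)
```

By hand: the first element is 1×5 + 3×6 = 23 and the second is 2×5 + 4×6 = 34.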
In matrix left division, the statement
matrix X = A \ B
is interpreted as a request to find the matrix X that solves AX = B. If A is a square matrix, this is
in principle equivalent to \(A^{-1}B\), which fails if A is singular; the numerical method employed here
is the LU decomposition. If A is a T × k matrix with T > k, then X is the least-squares solution,
\(X = (A'A)^{-1}A'B\), which fails if A
[...] \(a_{ij}\). All such functions require a single matrix as argument, or
an expression which evaluates to a single matrix. [1]
[1] Note that to find the "matrix square root" you need the cholesky function (see below); moreover, the exp function
computes the exponential element by element, and therefore does not return the matrix exponential unless the matrix is
diagonal; to get the matrix exponential, use mexp.
In this section, we review some aspects of genr functions that apply specifically to matrices. A full
account of each function is available in the Gretl Command Reference.
Creation and I/O: colnames, diag, diagcat, I, lower, mnormal, mread, muniform, mwrite, ones, rownames, seq, unvech, upper, vec, vech, zeros
Shape/size/arrangement: cols, dsort, mreverse, mshape, msortby, rows, selifc, selifr, sort, trimr
Matrix algebra: cdiv, cholesky, cmult, det, eigengen, eigensym, eigsolve, fft, ffti, ginv, hdprod, infnorm, inv, invpd, ldet, mexp, nullspace, onenorm, polroots, psdroot, qform, qrdecomp, rank, rcond, svd, toepsolv, tr, transp, varsimul
Statistics/transformations: cdemean, corr, corrgm, cov, cum, fcstats, imaxc, imaxr, iminc, iminr, irf, kdensity, maxc, maxr, mcorr, mcov, mcovg, meanc, meanr, minc, minr, mlag, mols, mpols, mrls, mxtab, pergm, princomp, quantile, ranking, resample, sdc, sumc, sumr, uniq, values
Data utilities: isconst, ok, pshrink, replace
Filters: filter, kfilter, ksimul, ksmooth, lrvar
Numerical methods: BFGSmax, fdjac, NRmax, simann
Strings: colname
Transformations: chowlin, lincomb
Table 13.3: Matrix functions by category
Matrix reshaping
In addition to the methods discussed in sections 13.1 and 13.3, a matrix can also be created by
re-arranging the elements of a pre-existing matrix. This is accomplished via the mshape function.
It takes three arguments: the input matrix, A, and the rows and columns of the target matrix, r
and c respectively. Elements are read from A and written to the target in column-major order. If A
contains fewer elements than n = r × c, they are repeated cyclically; if A has more elements, only
the first n are used.
For example:
matrix a = mnormal(2,3)
a
matrix b = mshape(a,3,1)
b
matrix b = mshape(a,5,2)
b
produces
? a
a
1.2323 0.99714 -0.39078
0.54363 0.43928 -0.48467
? matrix b = mshape(a,3,1)
Generated matrix b
? b
b
1.2323
0.54363
0.99714
? matrix b = mshape(a,5,2)
Replaced matrix b
? b
b
1.2323 -0.48467
0.54363 1.2323
0.99714 0.54363
0.43928 0.99714
-0.39078 0.43928
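Since mnormal gives different numbers on each run, the truncation and cyclic-repetition rules may be easier to follow with fixed values. The sketch below uses an arbitrary 2 × 3 input whose column-major ordering is simply 1, 2, ..., 6.

```gretl
matrix a = {1, 3, 5; 2, 4, 6}   # read column by column: 1, 2, 3, 4, 5, 6
matrix b = mshape(a, 3, 1)      # fewer target cells: only 1, 2, 3 are used
matrix c = mshape(a, 4, 2)      # more target cells: values repeat cyclically
print b c
```

Following the rules stated above, c should have first column (1, 2, 3, 4) and second column (5, 6, 1, 2).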
Complex multiplication and division
Gretl has no native provision for complex numbers. However, basic operations can be performed
on vectors of complex numbers by using the convention that a vector of n complex numbers is
represented as an n × 2 matrix, where the first column contains the real part and the second the
imaginary part.
Addition and subtraction are trivial; the functions cmult and cdiv compute the complex product
and division, respectively, of two input matrices, A and B, representing complex numbers. These
matrices must have the same number of rows, n, and either one or two columns. The first column
contains the real part and the second (if present) the imaginary part. The return value is an n × 2
matrix, or, if the result has no imaginary part, an n-vector.
For example, suppose you have \(z_1 = [1 + 2i, 3 + 4i]'\) and \(z_2 = [1, i]'\):
? z1 = {1,2;3,4}
z1 = {1,2;3,4}
Generated matrix z1
? z2 = I(2)
z2 = I(2)
Generated matrix z2
? conj_z1 = z1 .* {1,-1}
conj_z1 = z1 .* {1,-1}
Generated matrix conj_z1
? eval cmult(z1,z2)
eval cmult(z1,z2)
1 2
-4 3
? eval cmult(z1,conj_z1)
eval cmult(z1,conj_z1)
5
25
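The two results in the session above can be checked by hand using the usual rule for multiplying complex numbers. The following is just that arithmetic written out, row by row.

```latex
\[
(a + bi)(c + di) = (ac - bd) + (ad + bc)\,i
\]
\[
(1 + 2i)\cdot 1 = 1 + 2i, \qquad (3 + 4i)\cdot i = 3i + 4i^2 = -4 + 3i
\]
\[
(a + bi)(a - bi) = a^2 + b^2: \qquad (1 + 2i)(1 - 2i) = 5, \qquad (3 + 4i)(3 - 4i) = 25
\]
```

The second line matches the rows (1, 2) and (-4, 3) returned by cmult(z1,z2). The third shows why the product with the conjugate has no imaginary part and is therefore returned as a plain vector.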
Multiple returns and the null keyword
Some functions take one or more matrices as arguments and compute one or more matrices; these
are:
eigensym Eigen-analysis of symmetric matrix
eigengen Eigen-analysis of general matrix
mols Matrix OLS
qrdecomp QR decomposition
svd Singular value decomposition (SVD)
The general rule is: the main result of the function is always returned as the result proper.
Auxiliary returns, if needed, are retrieved using pre-existing matrices, which are passed to the
function as pointers (see 10.4). If such values are not needed, the pointer may be substituted with
the keyword null.
The syntax for qrdecomp, eigensym and eigengen is of the form
matrix B = func(A, &C)
The first argument, A, represents the input data, that is, the matrix whose decomposition or analysis
is required. The second argument must be either the name of an existing matrix preceded by & (to
indicate the "address" of the matrix in question), in which case an auxiliary result is written to that
matrix, or the keyword null, in which case the auxiliary result is not produced, or is discarded.
In case a non-null second argument is given, the specified matrix will be over-written with the
auxiliary result. (It is not required that the existing matrix be of the right dimensions to receive the
result.)
The function eigensym computes the eigenvalues, and optionally the right eigenvectors, of a symmetric
n × n matrix. The eigenvalues are returned directly in a column vector of length n; if the
eigenvectors are required, they are returned in an n × n matrix. For example:
matrix V
matrix E = eigensym(M, &V)
matrix E = eigensym(M, null)
In the first case E holds the eigenvalues of M and V holds the eigenvectors. In the second, E holds
the eigenvalues but the eigenvectors are not computed.
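One way to convince yourself that the two returns belong together is to verify the defining relation Mv = λv column by column. The sketch below uses an arbitrary 2 × 2 symmetric matrix whose eigenvalues are known to be 1 and 3; the names r1 and r2 are hypothetical.

```gretl
matrix M = {2, 1; 1, 2}    # symmetric; its eigenvalues are 1 and 3
matrix V
matrix E = eigensym(M, &V)
# residuals of M*v = lambda*v for each eigenpair
matrix r1 = M*V[,1] - E[1,1] * V[,1]
matrix r2 = M*V[,2] - E[2,1] * V[,2]
print E r1 r2              # r1 and r2 should be numerically zero
```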
The function eigengen computes the eigenvalues, and optionally the eigenvectors, of a general
n × n matrix. The eigenvalues are returned directly in an n × 2 matrix, the first column holding the
real components and the second column the imaginary components.
If the eigenvectors are required (that is, if the second argument to eigengen is not null), they
are returned in an n × n matrix. The column arrangement of this matrix is somewhat non-trivial:
the eigenvectors are stored in the same order as the eigenvalues, but the real eigenvectors occupy
one column, whereas complex eigenvectors take two (the real part comes first); the total number
of columns is still n, because the conjugate eigenvector is skipped. Example 13.1 provides a
(hopefully) clarifying example (see also subsection 13.6).
Example 13.1: Complex eigenvalues and eigenvectors
set seed 34756
matrix v
A = mnormal(3,3)
/* do the eigen-analysis */
l = eigengen(A,&v)
/* eigenvalue 1 is real, 2 and 3 are complex conjugates */
print l
print v
/*
column 1 contains the first eigenvector (real)
*/
B = A*v[,1]
c = l[1,1] * v[,1]
/* B should equal c */
print B
print c
/*
columns 2:3 contain the real and imaginary parts
of eigenvector 2
*/
B = A*v[,2:3]
c = cmult(ones(3,1)*(l[2,]),v[,2:3])
/* B should equal c */
print B
print c
The qrdecomp function computes the QR decomposition of an m × n matrix A: A = QR, where Q
is an m × n orthogonal matrix and R is an n × n upper triangular matrix. The matrix Q is returned
directly, while R can be retrieved via the second argument. Here are two examples:
matrix R
matrix Q = qrdecomp(M, &R)
matrix Q = qrdecomp(M, null)
In the first example, the triangular R is saved as R; in the second, R is discarded. The first line
above shows an example of a "simple declaration" of a matrix: R is declared to be a matrix variable
but is not given any explicit value. In this case the variable is initialized as a 1 × 1 matrix whose
single element equals zero.
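The properties stated for Q and R can be checked numerically. This is a minimal sketch with an arbitrary 3 × 2 input matrix; D and QtQ are hypothetical names.

```gretl
matrix M = {1, 2; 3, 4; 5, 6}
matrix R
matrix Q = qrdecomp(M, &R)
matrix D = M - Q*R     # should be numerically zero, since M = QR
matrix QtQ = Q'Q       # should be close to the 2 x 2 identity matrix
print R D QtQ
```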
The syntax for svd is
matrix B = func(A, &C, &D)
The function svd computes all or part of the singular value decomposition of the real m × n matrix
A. Let k = min(m, n). The decomposition is
\[ A = U \Sigma V' \]
[...] \((X'X)^{-1}\) (where X is the common matrix of regressors)
is available as $xtxinv.
If the accessors are given without any prefix, they retrieve results from the last model estimated, if
any. Alternatively, they may be prefixed with the name of a saved model plus a period (.), in which
case they retrieve results from the specified model. Here are some examples:
matrix u = $uhat
matrix b = m1.$coeff
matrix v2 = m1.$vcv[1:2,1:2]
The first command grabs the residuals from the last model; the second grabs the coefficient vector
from model m1; and the third (which uses the mechanism of sub-matrix selection described above)
grabs a portion of the covariance matrix from model m1.
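Accessors combine naturally with the matrix tools described earlier in this chapter. As a sketch, the script below recomputes the coefficient standard errors from the diagonal of $vcv and compares them with the $stderr accessor. The simulated data, the seed, and the names se and check are assumptions for illustration.

```gretl
nulldata 50
set seed 1234
series x = normal()
series y = 1 + 2*x + normal()
ols y const x --quiet

matrix b = $coeff             # coefficient vector from the last model
matrix V = $vcv               # its covariance matrix
matrix se = sqrt(V[diag])     # element-wise square root of the diagonal
matrix check = se - $stderr   # should be numerically zero
print b se check
```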
If the model in question is a VAR or VECM (only), $compan and $vma return the companion matrix and
the VMA matrices in stacked form, respectively (see section 24.2 for details). After a vector error
correction model is estimated via Johansen's procedure, the matrices $jalpha and $jbeta are also
available. These have a number of columns equal to the chosen cointegration rank; therefore, the
product
matrix Pi = $jalpha * $jbeta'
returns the reduced-rank estimate of A(1). Since β is automatically identified via the Phillips
normalization (see section 25.5), its unrestricted elements do have a proper covariance matrix, which
can be retrieved through the $jvbeta accessor.
13.8 Namespace issues
Matrices share a common namespace with data series and scalar variables. In other words, no two
objects of any of these types can have the same name. It is an error to attempt to change the type
of an existing variable, for example:
scalar x = 3
matrix x = ones(2,2) # wrong!
It is possible, however, to delete or rename an existing variable then reuse the name for a variable
of a different type:
scalar x = 3
delete x
matrix x = ones(2,2) # OK
13.9 Creating a data series from a matrix
Section 13.1 above describes how to create a matrix from a data series or set of series. You may
sometimes wish to go in the opposite direction, that is, to copy values from a matrix into a regular
data series. The syntax for this operation is
series sname = mspec
where sname is the name of the series to create and mspec is the name of the matrix to copy from,
possibly followed by a matrix selection expression. Here are two examples.
series s = x
series u1 = U[,1]
It is assumed that x and U are pre-existing matrices. In the second example the series u1 is formed
from the first column of the matrix U.
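Here is a self-contained sketch of the same operation, in which the matrix has exactly as many rows as the dataset has observations. The four-observation dataset and the values in U are assumptions for illustration.

```gretl
nulldata 4
matrix U = {1, 10; 2, 20; 3, 30; 4, 40}
series u1 = U[,1]    # first column of U: 1, 2, 3, 4
series u2 = U[,2]    # second column of U: 10, 20, 30, 40
print u1 u2 --byobs
```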
For this operation to work, the matrix (or matrix selection) must be a vector with length equal to
either the full length of the current dataset, \(n\), or the length of the current sample range, \(n'\). If
[...] \(n'\) elements are drawn from the matrix; if the matrix or selection comprises \(n\)
elements, the \(n'\)
[...] \(\beta_1 + z_i \beta_2 + d_i \beta_3 + (d_i \cdot z_i) \beta_4 + \varepsilon_t\), where \(d_i\) is a
dummy variable while \(x_i\) and \(z_i\) are vectors of explanatory variables.
Solution:
list X = x1 x2 x3
list Z = z1 z2
list dZ = null
loop foreach i Z
series d$i = d * $i
list dZ = dZ d$i
endloop
ols y X Z d dZ
Comment: It's amazing what string substitution can do for you, isn't it?
Chapter 14. Cheat sheet 111
Realized volatility
Problem: Given data by the minute, you want to compute the realized volatility for the hour as
\(RV_t = \frac{1}{60} \sum_{\tau=1}^{60} y_{t:\tau}^2\). Imagine your sample starts at time 1:1.
Solution:
smpl --full
genr time
genr minute = int(time/60) + 1
genr second = time % 60
setobs minute second --panel
genr rv = psd(y)^2
setobs 1 1
smpl second=1 --restrict
store foo rv
Comment: Here we trick gretl into thinking that our dataset is a panel dataset, where the minutes
are the units and the seconds are the time; this way, we can take advantage of the special
function psd(), panel standard deviation. Then we simply drop all observations but one per minute
and save the resulting data (store foo rv translates as "store in the gretl datafile foo.gdt the
series rv").
Looping over two paired lists
Problem: Suppose you have two lists with the same number of elements, and you want to apply
some command to corresponding elements over a loop.
Solution:
list L1 = a b c
list L2 = x y z
k1 = 1
loop foreach i L1 --quiet
k2 = 1
loop foreach j L2 --quiet
if k1=k2
ols $i 0 $j
endif
k2++
endloop
k1++
endloop
Comment: The simplest way to achieve the result is to loop over all possible combinations and
filter out the unneeded ones via an if condition, as above. That said, in some cases variable names
can help. For example, if
list Lx = x1 x2 x3
list Ly = y1 y2 y3
looping over the integers is quite intuitive and certainly more elegant:
loop i=1..3
ols y$i const x$i
endloop
Part II
Econometric methods
Chapter 15
Robust covariance matrix estimation
15.1 Introduction
Consider (once again) the linear regression model
\[ y = X\beta + u \qquad (15.1) \]
where y and u are T-vectors, X is a T × k matrix of regressors, and β is a k-vector of parameters.
As is well known, the estimator of β given by Ordinary Least Squares (OLS) is
\[ \hat{\beta} = (X'X)^{-1} X'y \qquad (15.2) \]
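If the condition \(E(u|X) = 0\) is satisfied, this is an unbiased estimator; under somewhat weaker
conditions the estimator is biased but consistent. It is straightforward to show that when the OLS
estimator is unbiased (that is, when \(E(\hat{\beta} - \beta) = 0\)), its variance is
\[ \mathrm{Var}(\hat{\beta}) = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)' \right] = (X'X)^{-1} X' \Omega X (X'X)^{-1} \qquad (15.3) \]
where \(\Omega = E(uu')\) is the covariance matrix of the error terms. Under the assumption that the
errors are independently and identically distributed (iid), \(\Omega = \sigma^2 I\) and (15.3)
simplifies to the classical formula
\[ \mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1} \qquad (15.4) \]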
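For readers who want the intermediate steps, the variance formulas can be derived in a few lines. This is a sketch that treats X as fixed (or, equivalently, conditions on X) and uses only the symbols defined above.

```latex
\[
\hat{\beta} = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u
\]
\[
\mathrm{Var}(\hat{\beta})
  = E\left[ (X'X)^{-1}X'\,uu'\,X(X'X)^{-1} \right]
  = (X'X)^{-1}X'\,\Omega\,X(X'X)^{-1}, \qquad \Omega = E(uu')
\]
\[
\Omega = \sigma^2 I \;\Longrightarrow\;
\mathrm{Var}(\hat{\beta}) = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\]
```

The first line substitutes (15.1) into (15.2). The second gives the general "sandwich" form in (15.3), and the third shows how it collapses to (15.4) when the errors are iid.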
If the iid assumption is not satisfied, two things follow. First, it is possible in principle to construct
a more efficient estimator than OLS, for instance some sort of Feasible Generalized Least Squares
(FGLS). Second, the simple classical formula for the variance of the least squares estimator is no
longer correct, and hence the conventional OLS standard errors (which are just the square roots
of the diagonal elements of the matrix defined by (15.4)) do not provide valid means of statistical
inference.
In the recent history of econometrics there are broadly two approaches to the problem of non-iid
errors. The traditional approach is to use an FGLS estimator. For example, if the departure
from the iid condition takes the form of time-series dependence, and if one believes that this
could be modeled as a case of first-order autocorrelation, one might employ an AR(1) estimation
method such as Cochrane-Orcutt, Hildreth-Lu, or Prais-Winsten. If the problem is that the error
variance is non-constant across observations, one might estimate the variance as a function of the
independent variables and then perform weighted least squares, using as weights the reciprocals
of the estimated variances.
While these methods are still in use, an alternative approach has found increasing favor: that
is, use OLS but compute standard errors (or more generally, covariance matrices) that are robust
with respect to deviations from the iid assumption. This is typically combined with an emphasis on
using large datasets, large enough that the researcher can place some reliance on the (asymptotic)
consistency property of OLS. This approach has been enabled by the availability of cheap computing
power. The computation of robust standard errors and the handling of very large datasets were
daunting tasks at one time, but now they are unproblematic. The other point favoring the newer
methodology is that while FGLS offers an efficiency advantage in principle, it often involves making
additional statistical assumptions which may or may not be justified, which may not be easy to test
rigorously, and which may threaten the consistency of the estimator (for example, the "common
factor restriction" that is implied by traditional FGLS corrections for autocorrelated errors).
James Stock and Mark Watson's Introduction to Econometrics illustrates this approach at the level of
undergraduate instruction: many of the datasets they use comprise thousands or tens of thousands
of observations; FGLS is downplayed; and robust standard errors are reported as a matter of course.
In fact, the discussion of the classical standard errors (labeled "homoskedasticity-only") is
confined to an Appendix.
Against this background it may be useful to set out and discuss all the various options offered
by gretl in respect of robust covariance matrix estimation. The first point to notice is that gretl
produces classical standard errors by default (in all cases apart from GMM estimation). In script
mode you can get robust standard errors by appending the --robust flag to estimation commands.
In the GUI program the model specification dialog usually contains a "Robust standard errors"
check box, along with a "configure" button that is activated when the box is checked. The configure
button takes you to a configuration dialog (which can also be reached from the main menu bar:
Tools → Preferences → General → HCCME). There you can select from a set of possible robust
estimation variants, and can also choose to make robust estimation the default.
The specifics of the available options depend on the nature of the data under consideration
(cross-sectional, time series or panel) and also to some extent the choice of estimator. (Although
we introduced robust standard errors in the context of OLS above, they may be used in conjunction
with other estimators too.) The following three sections of this chapter deal with matters that are
specific to the three sorts of data just mentioned. Note that additional details regarding covariance
matrix estimation in the context of GMM are given in chapter 20.
We close this introduction with a brief statement of what robust standard errors can and cannot
achieve. They can provide for asymptotically valid statistical inference in models that are basically
correctly specified, but in which the errors are not iid. The "asymptotic" part means that they
may be of little use in small samples. The "correct specification" part means that they are not a
magic bullet: if the error term is correlated with the regressors, so that the parameter estimates
themselves are biased and inconsistent, robust standard errors will not save the day.
15.2 Cross-sectional data and the HCCME
With cross-sectional data, the most likely departure from iid errors is heteroskedasticity
(non-constant variance). In some cases one may be able to arrive at a judgment regarding the likely
form of the heteroskedasticity, and hence to apply a specific correction. The more common case,
however, is where the heteroskedasticity is of unknown form. We seek an estimator of the covariance
matrix of the parameter estimates that retains its validity, at least asymptotically, in face of
unspecified heteroskedasticity. It is not obvious, a priori, that this should be possible, but White
(1980) showed that
$$\widehat{\mathrm{Var}}_h(\hat\beta) = (X'X)^{-1}X'\hat\Omega X(X'X)^{-1} \tag{15.5}$$
does the trick. (As usual in statistics, we need to say "under certain conditions", but the
conditions are not very restrictive.) $\hat\Omega$ is in this context a diagonal matrix, whose
non-zero elements may be estimated using squared OLS residuals. White referred to (15.5) as a
heteroskedasticity-consistent covariance matrix estimator (HCCME).
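Davidson and MacKinnon (2004, chapter 5) offer a useful discussion of several variants on White's
HCCME theme. They refer to the original variant of (15.5), in which the diagonal elements of
$\hat\Omega$ are estimated directly by the squared OLS residuals, $\hat u_t^2$, as HC$_0$. The
refinements of this variant start from the observation that the squared OLS residuals are likely
to be too small on average: the OLS estimates, $\hat\beta$, are chosen so that the sum of squared
residuals, $\sum \hat u_t^2 = \sum (y_t - X_t\hat\beta)^2$, is minimized for given $X$ and $y$.
Suppose that $\hat\beta \neq \beta$. This is almost certain to be the case: even if OLS is not
biased, it would be a miracle if the $\hat\beta$ calculated from any finite sample were exactly
equal to $\beta$. But in that case the sum of squares of the true, unobserved errors,
$\sum u_t^2 = \sum (y_t - X_t\beta)^2$, is bound to be greater than $\sum \hat u_t^2$. The
elaborated variants on HC$_0$ take this point on board as follows: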
HC$_1$: Applies a degrees-of-freedom correction, multiplying the HC$_0$ matrix by $T/(T-k)$.

HC$_2$: Instead of using $\hat u_t^2$ for the diagonal elements of $\hat\Omega$, uses
$\hat u_t^2/(1-h_t)$, where $h_t = X_t(X'X)^{-1}X_t'$, the $t$-th diagonal element of the
projection matrix, $P$, which has the property that $Py = \hat y$. The relevance of $h_t$ is that
if the variance of all the $u_t$ is $\sigma^2$, the expectation of $\hat u_t^2$ is
$\sigma^2(1-h_t)$, or in other words, the ratio $\hat u_t^2/(1-h_t)$ has expectation $\sigma^2$.
As Davidson and MacKinnon show, $0 \le h_t < 1$ for all $t$, so this adjustment cannot reduce the
diagonal elements of $\hat\Omega$ and in general revises them upward.

HC$_3$: Uses $\hat u_t^2/(1-h_t)^2$. The additional factor of $(1-h_t)$ in the denominator,
relative to HC$_2$, may be justified on the grounds that observations with large variances tend to
exert a lot of influence on the OLS estimates, so that the corresponding residuals tend to be
underestimated. See Davidson and MacKinnon for a fuller explanation.
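The relationship between these variants can be checked numerically. The following is a minimal Python sketch, not gretl's own code, for the special case of a regression of y on a constant and a single regressor x. The function name ols_with_hc and the dictionary keys are illustrative choices. It uses the fact that, in this case, the slope element of the sandwich reduces to a weighted sum over the squared deviations of x.

```python
def ols_with_hc(x, y):
    """OLS slope for y = b0 + b1*x + u, with classical and HC0..HC3 variances."""
    T, k = len(x), 2
    xbar = sum(x) / T
    ybar = sum(y) / T
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Leverage h_t: t-th diagonal element of X(X'X)^{-1}X' for X = [1, x]
    h = [1.0 / T + (xi - xbar) ** 2 / sxx for xi in x]

    def slope_var(omega_diag):
        # Slope element of (X'X)^{-1} X' Omega X (X'X)^{-1}, Omega diagonal
        return sum((xi - xbar) ** 2 * w for xi, w in zip(x, omega_diag)) / sxx ** 2

    s2 = sum(u * u for u in resid) / (T - k)
    hc0 = slope_var([u * u for u in resid])
    return {
        "b1": b1,
        "classical": s2 / sxx,
        "HC0": hc0,
        "HC1": hc0 * T / (T - k),
        "HC2": slope_var([u * u / (1 - ht) for u, ht in zip(resid, h)]),
        "HC3": slope_var([u * u / (1 - ht) ** 2 for u, ht in zip(resid, h)]),
    }


result = ols_with_hc([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.3])
for name in ("classical", "HC0", "HC1", "HC2", "HC3"):
    print(name, result[name] ** 0.5)  # standard errors of the slope
```

Because $0 \le h_t < 1$, the HC$_2$ and HC$_3$ variances computed this way can never fall below the HC$_0$ value.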
The relative merits of these variants have been explored by means of both simulations and
theoretical analysis. Unfortunately there is not a clear consensus on which is "best". Davidson and
MacKinnon argue that the original HC$_0$ is likely to perform worse than the others; nonetheless,
"White's standard errors" are reported more often than the more sophisticated variants and
therefore, for reasons of comparability, HC$_0$ is the default HCCME in gretl.
If you wish to use HC$_1$, HC$_2$ or HC$_3$ you can arrange for this in either of two ways. In
script mode, you can do, for example,
set hc_version 2
In the GUI program you can go to the HCCME configuration dialog, as noted above, and choose any
of these variants to be the default.
15.3 Time series data and HAC covariance matrices
Heteroskedasticity may be an issue with time series data too, but it is unlikely to be the only, or
even the primary, concern.
One form of heteroskedasticity is common in macroeconomic time series, but is fairly easily dealt
with. That is, in the case of strongly trending series such as Gross Domestic Product, aggregate
consumption, aggregate investment, and so on, higher levels of the variable in question are likely
to be associated with higher variability in absolute terms. The obvious "fix", employed in many
macroeconometric studies, is to use the logs of such series rather than the raw levels. Provided the
proportional variability of such series remains roughly constant over time, the log transformation
is effective.
Other forms of heteroskedasticity may resist the log transformation, but may demand a special
treatment distinct from the calculation of robust standard errors. We have in mind here
autoregressive conditional heteroskedasticity, for example in the behavior of asset prices, where
large disturbances to the market may usher in periods of increased volatility. Such phenomena call
for specific estimation strategies, such as GARCH (see chapter 23).
Despite the points made above, some residual degree of heteroskedasticity may be present in time
series data: the key point is that in most cases it is likely to be combined with serial correlation
(autocorrelation), hence demanding a special treatment. In White's approach, $\hat\Omega$, the
estimated covariance matrix of the $u_t$, remains conveniently diagonal: the variances,
$E(u_t^2)$, may differ by $t$ but the covariances, $E(u_t u_s)$, are all zero. Autocorrelation in
time series data means that at least some of the off-diagonal elements of $\hat\Omega$ should be
non-zero. This introduces a substantial complication and requires another piece of terminology;
estimates of the covariance matrix that are asymptotically valid in face of both heteroskedasticity
and autocorrelation of the error process are termed HAC (heteroskedasticity and autocorrelation
consistent).
The issue of HAC estimation is treated in more technical terms in chapter 20. Here we try to
convey some of the intuition at a more basic level. We begin with a general comment: residual
autocorrelation is not so much a property of the data, as a symptom of an inadequate model. Data
may be persistent through time, and if we fit a model that does not take this aspect into account
properly, we end up with a model with autocorrelated disturbances. Conversely, it is often possible
to mitigate or even eliminate the problem of autocorrelation by including relevant lagged variables
in a time series model, or in other words, by specifying the dynamics of the model more fully. HAC
estimation should not be seen as the first resort in dealing with an autocorrelated error process.
That said, the "obvious" extension of White's HCCME to the case of autocorrelated errors would
seem to be this: estimate the off-diagonal elements of $\hat\Omega$ (that is, the autocovariances,
$E(u_t u_s)$) using, once again, the appropriate OLS residuals:
$\hat\omega_{ts} = \hat u_t \hat u_s$. This is basically right, but demands an important amendment.
We seek a consistent estimator, one that converges towards the true $\Omega$ as the sample size
tends towards infinity. This can't work if we allow unbounded serial dependence. Bigger samples
will enable us to estimate more of the true $\omega_{ts}$ elements (that is, for $t$ and $s$ more
widely separated in time) but will not contribute ever-increasing information regarding the
maximally separated $\omega_{ts}$ pairs, since the maximal separation itself grows with the sample
size. To ensure consistency, we have to confine our attention to processes exhibiting temporally
limited dependence, or in other words cut off the computation of the $\hat\omega_{ts}$ values at
some maximum value of $p = t - s$ (where $p$ is treated as an increasing function of the sample
size, $T$, although it cannot increase in proportion to $T$).
The simplest variant of this idea is to truncate the computation at some finite lag order $p$,
where $p$ grows as, say, $T^{1/4}$. The trouble with this is that the resulting $\hat\Omega$ may
not be a positive definite matrix. In practical terms, we may end up with negative estimated
variances. One solution to this problem is offered by the Newey-West estimator (Newey and West,
1987), which assigns declining weights to the sample autocovariances as the temporal separation
increases.
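To understand this point it is helpful to look more closely at the covariance matrix given in (15.5),
namely,
$$(X'X)^{-1}(X'\hat\Omega X)(X'X)^{-1}$$
This is known as a "sandwich" estimator. The bread, which appears on both sides, is $(X'X)^{-1}$.
This is a $k \times k$ matrix, and is also the key ingredient in the computation of the classical
covariance matrix. The filling in the sandwich is
$$\underset{(k \times k)}{\hat\Sigma} = \underset{(k \times T)}{X'}\;\underset{(T \times T)}{\hat\Omega}\;\underset{(T \times k)}{X}$$
Since $\Omega = E(uu')$, the matrix being estimated here can also be written as
$$\Sigma = E(X'uu'X)$$
which expresses $\Sigma$ as the long-run covariance of the random $k$-vector $X'u$.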
From a computational point of view, it is not necessary or desirable to store the (potentially very
large) $T \times T$ matrix $\hat\Omega$ as such. Rather, one computes the sandwich filling by
summation as
$$\hat\Sigma = \hat\Gamma(0) + \sum_{j=1}^{p} w_j\left(\hat\Gamma(j) + \hat\Gamma'(j)\right)$$
where the $k \times k$ sample autocovariance matrix $\hat\Gamma(j)$, for $j \ge 0$, is given by
$$\hat\Gamma(j) = \frac{1}{T}\sum_{t=j+1}^{T}\hat u_t\,\hat u_{t-j}\,X_t'X_{t-j}$$
and $w_j$ is the weight given to the autocovariance at lag $j > 0$.
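The summation above can be sketched in a few lines of Python. This is an illustration of the formula, not gretl's implementation: the function name hac_filling is hypothetical, X is assumed to be a list of T rows (each a list of k regressor values), u the list of T residuals, and w the list of weights $w_1, \dots, w_p$ supplied by the caller.

```python
def hac_filling(X, u, w):
    """Sandwich filling: Gamma(0) + sum_j w_j * (Gamma(j) + Gamma(j)')."""
    T, k = len(X), len(X[0])
    # One k-vector X_t' u_t per observation
    s = [[X[t][a] * u[t] for a in range(k)] for t in range(T)]

    def gamma(j):
        # k x k sample autocovariance matrix at lag j (divisor T)
        return [[sum(s[t][a] * s[t - j][b] for t in range(j, T)) / T
                 for b in range(k)] for a in range(k)]

    sigma = gamma(0)
    for j, wj in enumerate(w, start=1):
        g = gamma(j)
        for a in range(k):
            for b in range(k):
                sigma[a][b] += wj * (g[a][b] + g[b][a])
    return sigma


X = [[1, 1], [1, 2], [1, 3], [1, 4]]
u = [1, -1, 1, -1]
print(hac_filling(X, u, []))     # no lags: the heteroskedasticity-only filling
print(hac_filling(X, u, [0.5]))  # one lag with weight 0.5
```

Only $k \times k$ matrices are ever formed, so the $T \times T$ matrix $\hat\Omega$ is never stored.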
This leaves two questions. How exactly do we determine the maximum lag length or "bandwidth",
$p$, of the HAC estimator? And how exactly are the weights $w_j$ to be determined? We will return
to the (difficult) question of the bandwidth shortly. As regards the weights, gretl offers three
variants. The default is the Bartlett kernel, as used by Newey and West. This sets
$$w_j = \begin{cases} 1 - \dfrac{j}{p+1} & j \le p \\ 0 & j > p \end{cases}$$
so the weights decline linearly as $j$ increases. The other two options are the Parzen kernel and
the Quadratic Spectral (QS) kernel. For the Parzen kernel,
$$w_j = \begin{cases} 1 - 6a_j^2 + 6a_j^3 & 0 \le a_j \le 0.5 \\ 2(1 - a_j)^3 & 0.5 < a_j \le 1 \\ 0 & a_j > 1 \end{cases}$$
where $a_j = j/(p+1)$, and for the QS kernel,
$$w_j = \frac{25}{12\pi^2 d_j^2}\left(\frac{\sin m_j}{m_j} - \cos m_j\right)$$
where $d_j = j/p$ and $m_j = 6\pi d_j/5$.
Figure 15.1 shows the weights generated by these kernels, for $p = 4$ and $j$ = 1 to 9.
[Figure 15.1: Three HAC kernels (panels: Bartlett, Parzen, QS)]
In gretl you select the kernel using the set command with the hac_kernel parameter:
set hac_kernel parzen
set hac_kernel qs
set hac_kernel bartlett
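To get a feel for the three weighting schemes, the kernel formulas given above can be evaluated directly. The following Python sketch transcribes those formulas; the function names are illustrative and this is not gretl's code. It prints the weights for $p = 4$ and $j$ = 1 to 9, the range used in Figure 15.1.

```python
import math


def bartlett(j, p):
    # Linearly declining weights, zero beyond lag p
    return 1 - j / (p + 1) if j <= p else 0.0


def parzen(j, p):
    a = j / (p + 1)
    if a <= 0.5:
        return 1 - 6 * a ** 2 + 6 * a ** 3
    if a <= 1:
        return 2 * (1 - a) ** 3
    return 0.0


def qs(j, p):
    # Quadratic Spectral kernel; defined for j > 0, never exactly truncated
    d = j / p
    m = 6 * math.pi * d / 5
    return 25 / (12 * math.pi ** 2 * d ** 2) * (math.sin(m) / m - math.cos(m))


p = 4
for j in range(1, 10):
    print(j, round(bartlett(j, p), 4), round(parzen(j, p), 4), round(qs(j, p), 4))
```

Note that the Bartlett and Parzen weights are exactly zero beyond the bandwidth, while the QS weights oscillate around zero at long lags.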
Selecting the HAC bandwidth
The asymptotic theory developed by Newey, West and others tells us in general terms how the
HAC bandwidth, $p$, should grow with the sample size, $T$: that is, $p$ should grow in proportion
to some fractional power of $T$. Unfortunately this is of little help to the applied econometrician,
working with a given dataset of fixed size. Various rules of thumb have been suggested, and gretl
implements two such. The default is $p = 0.75\,T^{1/3}$, as recommended by Stock and Watson (2003).
An alternative is $p = 4(T/100)^{2/9}$, as in Wooldridge (2002b). In each case one takes the integer
part of the result. These variants are labeled nw1 and nw2 respectively, in the context of the set
command with the hac_lag parameter. That is, you can switch to the version given by Wooldridge
with
set hac_lag nw2
As shown in Table 15.1 the choice between nw1 and nw2 does not make a great deal of difference.

T      p (nw1)   p (nw2)
50     2         3
100    3         4
150    3         4
200    4         4
300    5         5
400    5         5

Table 15.1: HAC bandwidth: two rules of thumb
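The entries in Table 15.1 follow directly from the two formulas. A short Python sketch (the function names are illustrative, chosen to echo the nw1 and nw2 labels) reproduces them:

```python
def hac_lag_nw1(T):
    # Integer part of 0.75 * T^(1/3)
    return int(0.75 * T ** (1 / 3))


def hac_lag_nw2(T):
    # Integer part of 4 * (T/100)^(2/9)
    return int(4 * (T / 100) ** (2 / 9))


for T in (50, 100, 150, 200, 300, 400):
    print(T, hac_lag_nw1(T), hac_lag_nw2(T))
```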
You also have the option of specifying a fixed numerical value for $p$, as in
set hac_lag 6
In addition you can set a distinct bandwidth for use with the Quadratic Spectral kernel (since this
need not be an integer). For example,
set qs_bandwidth 3.5
Prewhitening and data-based bandwidth selection
An alternative approach is to deal with residual autocorrelation by attacking the problem from two
sides. The intuition behind the technique known as VAR prewhitening (Andrews and Monahan,
1992) can be illustrated by a simple example. Let $x_t$ be a sequence of first-order autocorrelated
random variables
$$x_t = \rho x_{t-1} + u_t$$
The long-run variance of $x_t$ can be shown to be
$$V_{LR}(x_t) = \frac{V_{LR}(u_t)}{(1 - \rho)^2}$$
In most cases, $u_t$ is likely to be less autocorrelated than $x_t$, so a smaller bandwidth should
suffice. Estimation of $V_{LR}(x_t)$ can therefore proceed in three steps: (1) estimate $\rho$;
(2) obtain a HAC estimate of $\hat u_t = x_t - \hat\rho x_{t-1}$; and (3) divide the result by
$(1 - \hat\rho)^2$.
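The three steps can be sketched in Python for the scalar case. This is an illustration under stated assumptions rather than gretl's procedure: $\rho$ is estimated by regressing $x_t$ on $x_{t-1}$ without a constant, the HAC step uses Bartlett weights with a caller-supplied bandwidth p, and both function names are hypothetical.

```python
def bartlett_lrv(z, p):
    """Bartlett-weighted long-run variance of the series z, bandwidth p."""
    T = len(z)
    zbar = sum(z) / T
    d = [zi - zbar for zi in z]

    def gamma(j):
        return sum(d[t] * d[t - j] for t in range(j, T)) / T

    return gamma(0) + 2 * sum((1 - j / (p + 1)) * gamma(j) for j in range(1, p + 1))


def prewhitened_lrv(x, p):
    # Step 1: estimate rho by OLS of x_t on x_{t-1}
    rho = sum(a * b for a, b in zip(x[1:], x[:-1])) / sum(b * b for b in x[:-1])
    # Step 2: HAC estimate for the whitened series u_t = x_t - rho * x_{t-1}
    u = [a - rho * b for a, b in zip(x[1:], x[:-1])]
    # Step 3: "recolor" by dividing by (1 - rho)^2
    return rho, bartlett_lrv(u, p) / (1 - rho) ** 2


x = [0.3, 1.1, 0.8, 1.5, 0.9, 0.2, -0.4, 0.1, 0.7, 1.2]
rho_hat, v_lr = prewhitened_lrv(x, p=2)
print(rho_hat, v_lr)
```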
The application of the above concept to our problem implies estimating a finite-order Vector
Autoregression (VAR) on the vector variables $\xi_t = X_t\hat u_t$. In general, the VAR can be of
any order, but in most cases 1 is sufficient; the aim is not to build a watertight model for
$\xi_t$, but just to "mop up" a substantial part of the autocorrelation. Hence, the following VAR
is estimated
$$\xi_t = A\xi_{t-1} + \varepsilon_t$$
Then an estimate of the matrix $X'\Omega X$ can be recovered via
$$(I - \hat A)^{-1}\,\hat\Sigma_\varepsilon\,(I - \hat A')^{-1}$$
where $\hat\Sigma_\varepsilon$ is a HAC estimate computed from the VAR residuals.
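In the panel data case, the robust estimator that gretl uses by default is that of Arellano:
$$\hat\Sigma_A = (X'X)^{-1}\left(\sum_{i=1}^{n} X_i'\hat u_i\hat u_i'X_i\right)(X'X)^{-1}$$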
where $X$ is the matrix of regressors (with the group means subtracted, in the case of fixed
effects), $\hat u_i$ denotes the vector of residuals for unit $i$, and $n$ is the number of
cross-sectional units. Cameron and Trivedi (2005) make a strong case for using this estimator; they
note that the ordinary White HCCME can produce misleadingly small standard errors in the panel
context because it fails to take autocorrelation into account.
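The difference between the two estimators is easiest to see with a single regressor ($k = 1$), where every matrix in the sandwich is a scalar. The Python sketch below makes that simplifying assumption; the function names are hypothetical, and the inputs are lists of per-unit regressor values and residuals.

```python
def arellano_variance(x_by_unit, u_by_unit):
    """Variance of a single slope with residuals clustered by unit (k = 1)."""
    xx = sum(x * x for xs in x_by_unit for x in xs)  # X'X
    # sum over units of (X_i' u_i)^2: cross-products within a unit are kept
    meat = sum(sum(x * u for x, u in zip(xs, us)) ** 2
               for xs, us in zip(x_by_unit, u_by_unit))
    return meat / xx ** 2


def white_variance(x_by_unit, u_by_unit):
    """Ordinary White (HC0) variance, ignoring within-unit correlation."""
    xx = sum(x * x for xs in x_by_unit for x in xs)
    meat = sum((x * u) ** 2
               for xs, us in zip(x_by_unit, u_by_unit)
               for x, u in zip(xs, us))
    return meat / xx ** 2


x = [[1, 2], [3, 4]]
u = [[1, -1], [0.5, 0.5]]
print(arellano_variance(x, u), white_variance(x, u))
```

In this small example the second unit's residuals have the same sign in both periods, and the clustered variance exceeds the White variance.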
In cases where autocorrelation is not an issue, however, the estimator proposed by Beck and Katz
(1995) and discussed by Greene (2003, chapter 13) may be appropriate. This estimator, which takes
into account contemporaneous correlation across the units and heteroskedasticity by unit, is
$$\hat\Sigma_{BK} = (X'X)^{-1}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\hat\sigma_{ij}X_i'X_j\right)(X'X)^{-1}$$
The covariances $\hat\sigma_{ij}$ are estimated via
$$\hat\sigma_{ij} = \frac{\hat u_i'\hat u_j}{T}$$
where $T$ is the length of the time series for each unit. Beck and Katz call the associated standard
errors "Panel-Corrected Standard Errors" (PCSE). This estimator can be invoked in gretl via the
command
set pcse on
The Arellano default can be re-established via
set pcse off
(Note that regardless of the pcse setting, the robust estimator is not used unless the --robust flag
is given, or the "Robust" box is checked in the GUI program.)
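For comparison, the Beck-Katz formula can be sketched in Python under the same simplifying assumption of a single regressor ($k = 1$) and a balanced panel. The function name pcse_variance is hypothetical and the code is an illustration of the formula, not gretl's implementation.

```python
def pcse_variance(x_by_unit, u_by_unit):
    """Beck-Katz variance of a single slope (k = 1), balanced panel."""
    n, T = len(x_by_unit), len(x_by_unit[0])

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    xx = sum(dot(xs, xs) for xs in x_by_unit)  # X'X
    # sigma_ij = u_i'u_j / T, weighting the cross-product X_i'X_j
    meat = sum(dot(u_by_unit[i], u_by_unit[j]) / T * dot(x_by_unit[i], x_by_unit[j])
               for i in range(n) for j in range(n))
    return meat / xx ** 2


x = [[1, 2], [3, 4]]
u = [[1, -1], [0.5, 0.5]]
print(pcse_variance(x, u))
```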
Chapter 16
Panel data
16.1 Estimation of panel models
Pooled Ordinary Least Squares
The simplest estimator for panel data is pooled OLS. In most cases this is unlikely to be adequate,
but it provides a baseline for comparison with more complex estimators.
If you estimate a model on panel data using OLS an additional test item becomes available. In the
GUI model window this is the item "panel diagnostics" under the Tests menu; the script counterpart
is the hausman command.
To take advantage of this test, you should specify a model without any dummy variables
representing cross-sectional units. The test compares pooled OLS against the principal alternatives,
the fixed effects and random effects models. These alternatives are explained in the following
section.
The fixed and random effects models
In gretl version 1.6.0 and higher, the fixed and random effects models for panel data can be
estimated in their own right. In the graphical interface these options are found under the menu
item "Model/Panel/Fixed and random effects". In the command-line interface one uses the panel
command, with or without the --random-effects option.
This section explains the nature of these models and comments on their estimation via gretl.
The pooled OLS specification may be written as
$$y_{it} = X_{it}\beta + u_{it} \tag{16.1}$$
where $y_{it}$ is the observation on the dependent variable for cross-sectional unit $i$ in period
$t$, $X_{it}$ is a $1 \times k$ vector of independent variables observed for unit $i$ in period
$t$, $\beta$ is a $k \times 1$ vector of parameters, and $u_{it}$ is an error or disturbance term
specific to unit $i$ in period $t$.
The fixed and random effects models have in common that they decompose the unitary pooled
error term, $u_{it}$. For the fixed effects model we write $u_{it} = \alpha_i + \varepsilon_{it}$,
yielding
$$y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \tag{16.2}$$
That is, we decompose $u_{it}$ into a unit-specific and time-invariant component, $\alpha_i$, and
an observation-specific error, $\varepsilon_{it}$. (It is possible to break a third component out
of $u_{it}$, namely $w_t$, a shock that is time-specific but common to all the units in a given
period. In the interest of simplicity we do not pursue that option here.) The $\alpha_i$s are then
treated as fixed parameters (in effect, unit-specific y-intercepts), which are to be estimated.
This can be done by including a dummy variable for each cross-sectional unit (and suppressing the
global constant). This is sometimes called the Least Squares Dummy Variables (LSDV) method.
Alternatively, one can subtract the group mean from each of the variables and estimate a model
without a constant. In the latter case the dependent variable may be written as
$$\tilde y_{it} = y_{it} - \bar y_i$$
The "group mean", $\bar y_i$, is defined as
$$\bar y_i = \frac{1}{T_i}\sum_{t=1}^{T_i} y_{it}$$
where $T_i$ is the number of observations for unit $i$. An exactly analogous formulation applies to
the independent variables. Given parameter estimates, $\hat\beta$, obtained using such de-meaned
data we can recover estimates of the $\alpha_i$s using
$$\hat\alpha_i = \frac{1}{T_i}\sum_{t=1}^{T_i}\left(y_{it} - X_{it}\hat\beta\right)$$
These two methods (LSDV, and using de-meaned data) are numerically equivalent. Gretl takes the
approach of de-meaning the data. If you have a small number of cross-sectional units, a large
number of time-series observations per unit, and a large number of regressors, it is more
economical in terms of computer memory to use LSDV. If need be you can easily implement this
manually. For example,
genr unitdum
ols y x du_*
(See Chapter 5 for details on unitdum).
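The de-meaning approach described above, including the recovery of the per-unit intercepts, can be sketched in Python for the case of a single regressor. The function name within_estimator is hypothetical and the code is an illustration only; the example panel is unbalanced, so the $T_i$ differ across units.

```python
def within_estimator(y_by_unit, x_by_unit):
    """Fixed-effects slope and per-unit intercepts, single regressor."""
    def mean(v):
        return sum(v) / len(v)

    ybar = [mean(y) for y in y_by_unit]  # group means of y
    xbar = [mean(x) for x in x_by_unit]  # group means of x
    # OLS on de-meaned data, without a constant
    num = sum((x - xb) * (y - yb)
              for xs, ys, xb, yb in zip(x_by_unit, y_by_unit, xbar, ybar)
              for x, y in zip(xs, ys))
    den = sum((x - xb) ** 2 for xs, xb in zip(x_by_unit, xbar) for x in xs)
    beta = num / den
    # alpha_i: mean over t of (y_it - x_it * beta)
    alpha = [yb - beta * xb for yb, xb in zip(ybar, xbar)]
    return beta, alpha


# Two units with intercepts 1 and -2 and a common slope of 2
y = [[3, 5, 7], [-2, 6]]
x = [[1, 2, 3], [0, 4]]
print(within_estimator(y, x))
```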
The $\hat\alpha_i$ estimates are not printed as part of the standard model output in gretl (there
may be a large number of these, and typically they are not of much inherent interest). However you
can retrieve them after estimation of the fixed effects model if you wish. In the graphical
interface, go to the "Save" menu in the model window and select "per-unit constants". In
command-line mode, you can do genr newname = $ahat, where newname is the name you want to give the
series.
For the random effects model we write $u_{it} = v_i + \varepsilon_{it}$, so the model becomes
$$y_{it} = X_{it}\beta + v_i + \varepsilon_{it} \tag{16.3}$$
In contrast to the fixed effects model, the $v_i$s are not treated as fixed parameters, but as
random drawings from a given probability distribution.
The celebrated Gauss-Markov theorem, according to which OLS is the best linear unbiased estimator
(BLUE), depends on the assumption that the error term is independently and identically
distributed (IID). In the panel context, the IID assumption means that $E(u_{it}^2)$, in relation
to equation 16.1, equals a constant, $\sigma_u^2$, for all $i$ and $t$, while the covariance
$E(u_{is}u_{it})$ equals zero for all $s \neq t$ and the covariance $E(u_{jt}u_{it})$ equals zero
for all $j \neq i$.
If these assumptions are not met (and they are unlikely to be met in the context of panel data)
OLS is not the most efficient estimator. Greater efficiency may be gained using generalized least
squares (GLS), taking into account the covariance structure of the error term.
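Consider observations on a given unit $i$ at two different times $s$ and $t$. From the hypotheses
above it can be worked out that $\mathrm{Var}(u_{is}) = \mathrm{Var}(u_{it}) = \sigma_v^2 +
\sigma_\varepsilon^2$, while the covariance between $u_{is}$ and $u_{it}$ is
$E(u_{is}u_{it}) = \sigma_v^2$.
In matrix notation, we may group all the $T_i$ observations for unit $i$ into the vector $y_i$ and
write
$$y_i = X_i\beta + u_i \tag{16.4}$$
The vector $u_i$, which includes all the disturbances for unit $i$, has a variance-covariance
matrix given by
$$\mathrm{Var}(u_i) = \Sigma_i = \sigma_\varepsilon^2 I + \sigma_v^2 J \tag{16.5}$$
where $J$ is a square matrix with all elements equal to 1. It can be shown that the matrix
$$K_i = I - \frac{\theta}{T_i}J,$$
where $\theta = 1 - \sqrt{\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T_i\sigma_v^2)}$, has the
property
$$K_i\Sigma_i K_i' = \sigma_\varepsilon^2 I$$
It follows that the transformed system
$$K_i y_i = K_i X_i\beta + K_i u_i \tag{16.6}$$
satisfies the Gauss-Markov conditions, and OLS estimation of (16.6) provides efficient inference.
But since
$$K_i y_i = y_i - \theta\bar y_i$$
GLS estimation is equivalent to OLS using "quasi-demeaned" variables; that is, variables from which
we subtract a fraction $\theta$ of their average. Notice that for $\sigma_\varepsilon^2 \to 0$,
$\theta \to 1$, while for $\sigma_v^2 \to 0$, $\theta \to 0$.
This means that if all the variance is attributable to the individual effects, then the fixed
effects estimator is optimal; if, on the other hand, individual effects are negligible, then pooled
OLS turns out, unsurprisingly, to be the optimal estimator.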
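The two limiting cases can be verified with a few lines of Python. This is a sketch of the formulas above with hypothetical function names; the variances are simply supplied as numbers rather than estimated.

```python
import math


def theta(sigma2_eps, sigma2_v, T_i):
    # theta = 1 - sqrt(sigma_eps^2 / (sigma_eps^2 + T_i * sigma_v^2))
    return 1 - math.sqrt(sigma2_eps / (sigma2_eps + T_i * sigma2_v))


def quasi_demean(y, th):
    # K_i y_i = y_i - theta * (group mean of y_i)
    ybar = sum(y) / len(y)
    return [yt - th * ybar for yt in y]


print(theta(1.0, 0.0, 5))   # no individual effects: theta = 0, pooled OLS
print(theta(0.0, 1.0, 5))   # all variance in the individual effects: theta = 1
print(quasi_demean([1.0, 2.0, 3.0], theta(1.0, 1.0, 3)))
```

With $\theta = 1$ the transformation coincides with the full de-meaning used for fixed effects, and with $\theta = 0$ it leaves the data unchanged.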
To implement the GLS approach we need to calculate $\theta$, which in turn requires estimates of
the variances $\sigma_\varepsilon^2$ and $\sigma_v^2$. (These are often referred to as the "within"
and "between" variances respectively, since the former refers to variation within each
cross-sectional unit and the latter to variation between the units.) Several means of estimating
these magnitudes have been suggested in the literature (see Baltagi, 1995); gretl uses the method
of Swamy and Arora (1972): $\sigma_\varepsilon^2$ is estimated by the residual variance from the
fixed effects model, and the sum $\sigma_\varepsilon^2 + T_i\sigma_v^2$ is estimated as $T_i$ times
the residual variance from the "between" estimator,
$$\bar y_i = \bar X_i\beta + e_i$$
The latter regression is implemented by constructing a data set consisting of the group means of
all the relevant variables.
Choice of estimator
Which panel method should one use, fixed effects or random effects?
One way of answering this question is in relation to the nature of the data set. If the panel comprises
observations on a fixed and relatively small set of units of interest (say, the member states of the
European Union), there is a presumption in favor of fixed effects. If it comprises observations on a
large number of randomly selected individuals (as in many epidemiological and other longitudinal
studies), there is a presumption in favor of random effects.
Besides this general heuristic, however, various statistical issues must be taken into account.
1. Some panel data sets contain variables whose values are specific to the cross-sectional unit
but which do not vary over time. If you want to include such variables in the model, the fixed
effects option is simply not available. When the fixed effects approach is implemented using
dummy variables, the problem is that the time-invariant variables are perfectly collinear with
the per-unit dummies. When using the approach of subtracting the group means, the issue is
that after de-meaning these variables are nothing but zeros.
2. A somewhat analogous prohibition applies to the random effects estimator. This estimator is
in effect a matrix-weighted average of pooled OLS and the "between" estimator. Suppose we
have observations on $n$ units or individuals and there are $k$ independent variables of interest.
If $k > n$, the "between" estimator is undefined (since we have only $n$ effective observations)
and hence so is the random effects estimator.
If one does not fall foul of one or other of the prohibitions mentioned above, the choice between
fixed effects and random effects may be expressed in terms of the two econometric desiderata,
efficiency and consistency.
From a purely statistical viewpoint, we could say that there is a tradeoff between robustness and
efficiency. In the fixed effects approach, we do not make any hypotheses on the "group effects"
(that is, the time-invariant differences in mean between the groups) beyond the fact that they
exist (and that can be tested; see below). As a consequence, once these effects are swept out by
taking deviations from the group means, the remaining parameters can be estimated.
On the other hand, the random effects approach attempts to model the group effects as drawings
from a probability distribution instead of removing them. This requires that individual effects are
representable as a legitimate part of the disturbance term, that is, zero-mean random variables,
uncorrelated with the regressors.
As a consequence, the fixed-effects estimator "always works", but at the cost of not being able to
estimate the effect of time-invariant regressors. The richer hypothesis set of the random-effects
estimator ensures that parameters for time-invariant regressors can be estimated, and that
estimation of the parameters for time-varying regressors is carried out more efficiently. These
advantages, though, are tied to the validity of the additional hypotheses. If, for example, there
is reason to think that individual effects may be correlated with some of the explanatory
variables, then the random-effects estimator would be inconsistent, while fixed-effects estimates
would still be valid. It is precisely on this principle that the Hausman test is built (see below):
if the fixed- and random-effects estimates agree, to within the usual statistical margin of error,
there is no reason to think the additional hypotheses invalid, and as a consequence, no reason not
to use the more efficient RE estimator.
Testing panel models
If you estimate a fixed effects or random effects model in the graphical interface, you may notice
that the number of items available under the "Tests" menu in the model window is relatively limited.
Panel models carry certain complications that make it difficult to implement all of the tests one
expects to see for models estimated on straight time-series or cross-sectional data.
Nonetheless, various panel-specific tests are printed along with the parameter estimates as a matter
of course, as follows.
When you estimate a model using fixed effects, you automatically get an F-test for the null
hypothesis that the cross-sectional units all have a common intercept. That is to say that all the
$\alpha_i$s are equal, in which case the pooled model (16.1), with a column of 1s included in the
$X$ matrix, is adequate.
When you estimate using random effects, the Breusch-Pagan and Hausman tests are presented
automatically.
The Breusch-Pagan test is the counterpart to the F-test mentioned above. The null hypothesis is
that the variance of $v_i$ in equation (16.3) equals zero; if this hypothesis is not rejected, then
again we conclude that the simple pooled model is adequate.
The Hausman test probes the consistency of the GLS estimates. The null hypothesis is that these
estimates are consistent, that is, that the requirement of orthogonality of the $v_i$ and the
$X_i$ is satisfied. The test is based on a measure, $H$, of the "distance" between the
fixed-effects and random-effects estimates, constructed such that under the null it follows the
$\chi^2$ distribution with degrees of freedom equal to the number of time-varying regressors in the
matrix $X$. If the value of $H$ is "large" this suggests that the random effects estimator is not
consistent and the fixed-effects model is preferable.
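There are two ways of calculating $H$, the matrix-difference method and the regression method. The
procedure for the matrix-difference method is this:

Collect the fixed-effects estimates in a vector $\tilde\beta$ and the corresponding random-effects
estimates in $\hat\beta$, then form the difference vector $(\tilde\beta - \hat\beta)$.

Form the covariance matrix of the difference vector as $\mathrm{Var}(\tilde\beta - \hat\beta) =
\mathrm{Var}(\tilde\beta) - \mathrm{Var}(\hat\beta) = \Psi$, where $\mathrm{Var}(\tilde\beta)$ and
$\mathrm{Var}(\hat\beta)$ are the estimated covariance matrices from the fixed- and random-effects
models respectively.

Compute $H = (\tilde\beta - \hat\beta)'\,\Psi^{-1}(\tilde\beta - \hat\beta)$.

Given the relative efficiencies of $\tilde\beta$ and $\hat\beta$, the matrix $\Psi$ "should be"
positive definite, in which case $H$ is positive, but in finite samples this is not guaranteed and
of course a negative $\chi^2$ value is not admissible. The regression method avoids this potential
problem. The procedure is: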
Treat the random-effects model as the restricted model, and record its sum of squared residuals
as SSR$_r$.

Estimate via OLS an unrestricted model in which the dependent variable is quasi-demeaned $y$
and the regressors include both quasi-demeaned $X$ (as in the RE model) and the de-meaned
variants of all the time-varying variables (i.e. the fixed-effects regressors); record the sum of
squared residuals from this model as SSR$_u$.

Compute $H = n(\mathrm{SSR}_r - \mathrm{SSR}_u)/\mathrm{SSR}_u$, where $n$ is the total number of
observations used. On this variant $H$ cannot be negative, since adding additional regressors to
the RE model cannot raise the SSR.
By default gretl computes the Hausman test via the regression method, but it uses the
matrix-difference method if you pass the option --matrix-diff to the panel command.
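The arithmetic of the two methods is simple once the ingredients are in hand. The Python sketch below assumes the sums of squared residuals, coefficient estimates and variances have already been obtained elsewhere; the function names are hypothetical, and the matrix-difference version is restricted to a single time-varying regressor so that $\Psi$ is a scalar.

```python
def hausman_regression(ssr_r, ssr_u, n):
    """H = n * (SSR_r - SSR_u) / SSR_u; non-negative whenever SSR_u <= SSR_r."""
    return n * (ssr_r - ssr_u) / ssr_u


def hausman_matrix_diff_scalar(b_fe, b_re, var_fe, var_re):
    """Matrix-difference H for one regressor: (b_fe - b_re)^2 / Psi."""
    psi = var_fe - var_re
    if psi <= 0:
        # The finite-sample failure discussed above
        raise ValueError("Psi is not positive: H is not admissible")
    return (b_fe - b_re) ** 2 / psi


print(hausman_regression(110.0, 100.0, 50))
print(hausman_matrix_diff_scalar(2.0, 1.5, 0.10, 0.05))
```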
Robust standard errors
For most estimators, gretl offers the option of computing an estimate of the covariance matrix that
is robust with respect to heteroskedasticity and/or autocorrelation (and hence also robust standard
errors). In the case of panel data, robust covariance matrix estimators are available for the pooled
and fixed effects model but not currently for random effects. Please see section 15.4 for details.
16.2 Autoregressive panel models
Special problems arise when a lag of the dependent variable is included among the regressors in a
panel model. Consider a dynamic variant of the pooled model (16.1):
$$y_{it} = X_{it}\beta + \rho y_{i,t-1} + u_{it} \tag{16.7}$$
First, if the error $u_{it}$ includes a group effect, $v_i$, then $y_{i,t-1}$ is bound to be
correlated with the error, since the value of $v_i$ affects $y_i$ at all $t$. That means that OLS
applied to (16.7) will be inconsistent as well as inefficient. The fixed-effects model sweeps out
the group effects and so overcomes this particular problem, but a subtler issue remains, which
applies to both fixed and random effects estimation. Consider the de-meaned representation of fixed
effects, as applied to the dynamic model,
$$\tilde y_{it} = \tilde X_{it}\beta + \rho\tilde y_{i,t-1} + \varepsilon_{it}$$
where $\tilde y_{it} = y_{it} - \bar y_i$ and $\varepsilon_{it} = u_{it} - \bar u_i$ (or
$u_{it} - \alpha_i$, using the notation of equation 16.2). The trouble is that $\tilde y_{i,t-1}$
will be correlated with $\varepsilon_{it}$ via the group mean, $\bar y_i$. The disturbance
$\varepsilon_{it}$ influences $y_{it}$ directly, which influences $\bar y_i$, which, by
construction, affects the value of $\tilde y_{it}$ for all $t$. The same issue arises in relation
to the quasi-demeaning used for random effects. Estimators which ignore this correlation will be
consistent only as $T \to \infty$ (in which case the marginal effect of $\varepsilon_{it}$ on the
group mean of $y$ tends to vanish).
One strategy for handling this problem, and producing consistent estimates of $\beta$ and $\rho$, was
proposed by Anderson and Hsiao (1981). Instead of de-meaning the data, they suggest taking the first
difference of (16.7), an alternative tactic for sweeping out the group effects:
\[ \Delta y_{it} = \Delta X_{it}\beta + \rho \Delta y_{i,t-1} + \eta_{it} \tag{16.8} \]
where $\eta_{it} = \Delta u_{it} = \Delta(v_i + \varepsilon_{it}) = \varepsilon_{it} - \varepsilon_{i,t-1}$. We're not in the
clear yet, given the structure of the error $\eta_{it}$: the disturbance $\varepsilon_{i,t-1}$ is an influence on
both $\eta_{it}$ and $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$. The next step is then to find an instrument for the
"contaminated" $\Delta y_{i,t-1}$. Anderson and Hsiao suggest using either $y_{i,t-2}$ or $\Delta y_{i,t-2}$, both
of which will be uncorrelated with $\eta_{it}$ provided that the underlying errors, $\varepsilon_{it}$, are not
themselves serially correlated.
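The mechanics of this instrumental-variables step can be sketched on a small artificial panel. The example is in Python rather than gretl script, all names are hypothetical, and the data are noise-free by assumption, so it illustrates only the algebra (differencing removes the group effect, and the level lagged twice serves as instrument), not the statistical properties of the estimator.

```python
def simulate_unit(rho, v, y0, T):
    """Noise-free AR(1) with a group effect v: y_t = rho * y_{t-1} + v."""
    y = [y0]
    for _ in range(T - 1):
        y.append(rho * y[-1] + v)
    return y


def anderson_hsiao(panel):
    """Just-identified IV estimate of rho from the differenced equation,
    using the level y_{t-2} as instrument for (y_{t-1} - y_{t-2})."""
    num = 0.0
    den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            dy = y[t] - y[t - 1]          # Delta y_t (group effect cancels)
            dy_lag = y[t - 1] - y[t - 2]  # Delta y_{t-1}
            z = y[t - 2]                  # instrument: level lagged twice
            num += z * dy
            den += z * dy_lag
    return num / den


# Three hypothetical units with different group effects and starting values
panel = [simulate_unit(0.5, v, y0, 6)
         for v, y0 in [(1.0, 0.0), (3.0, 1.0), (-2.0, -1.0)]]
rho_hat = anderson_hsiao(panel)
print(rho_hat)
```

With no idiosyncratic noise the differenced equation holds exactly, so the estimate coincides with the value of rho used to generate the data, whatever the group effects are.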
The Anderson–Hsiao estimator is not provided as a built-in function in gretl, since gretl's sensible
handling of lags and differences for panel data makes it a simple application of regression with
instrumental variables; see Example 16.1, which is based on a study of country growth rates by
Nerlove (1999).[3]
Example 16.1: The Anderson–Hsiao estimator for a dynamic panel model
# Penn World Table data as used by Nerlove
open penngrow.gdt
# Fixed effects (for comparison)
panel Y 0 Y(-1) X
# Random effects (for comparison)
panel Y 0 Y(-1) X --random-effects
# take differences of all variables
diff Y X
# Anderson-Hsiao, using Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X Y(-2)
# Anderson-Hsiao, using d_Y(-2) as instrument
tsls d_Y d_Y(-1) d_X ; 0 d_X d_Y(-2)
Although the Anderson–Hsiao estimator is consistent, it is not most efficient: it does not make the
fullest use of the available instruments for $\Delta y_{i,t-1}$, nor does it take into account the differenced
structure of the error $\eta_{it}$. It is improved upon by the methods of Arellano and Bond (1991) and
Blundell and Bond (1998). These methods are taken up in the next chapter.
[3] Also see Clint Cummins' benchmarks page, http://www.stanford.edu/~clint/bench/.
Chapter 17
Dynamic panel models
As of gretl version 1.9.2, the primary command for estimating dynamic panel models is dpanel.
The closely related arbond command has been available for some time, and is still present, but
whereas arbond only supports the so-called "difference" estimator (Arellano and Bond, 1991),
dpanel in addition offers the "system" estimator (Blundell and Bond, 1998), which has become
the method of choice in the applied literature.
17.1 Introduction
Notation
A dynamic linear panel data model can be represented as follows (in notation based on Arellano
(2003)):
\[ y_{it} = \alpha y_{i,t-1} + \beta' x_{it} + \eta_i + v_{it} \tag{17.1} \]
The main idea on which the difference estimator is based is to get rid of the individual effect via
differencing:[1] first-differencing eq. (17.1) yields
\[ \Delta y_{it} = \alpha \Delta y_{i,t-1} + \beta' \Delta x_{it} + \Delta v_{it} = \gamma' W_{it} + \Delta v_{it}, \tag{17.2} \]
in obvious notation. The error term of (17.2) is, by construction, autocorrelated and also correlated
with the lagged dependent variable, so an estimator that takes both issues into account is needed.
The endogeneity issue is solved by noting that all values of $y_{i,t-k}$, with $k > 1$, can be used as
instruments for $\Delta y_{i,t-1}$: unobserved values of $y_{i,t-k}$ (because they could be missing, or pre-sample)
can safely be substituted with 0. In the language of GMM, this amounts to using the relation
\[ E(\Delta v_{it} \cdot y_{i,t-k}) = 0, \quad k > 1 \tag{17.3} \]
as an orthogonality condition.
Autocorrelation is dealt with by noting that, if $v_{it}$ is a white noise, then the covariance matrix of the
vector whose typical element is $\Delta v_{it}$ is proportional to a matrix $H$ that has 2 on the main diagonal,
$-1$ on the first subdiagonals and 0 elsewhere. In practice, one-step GMM estimation of equation
(17.2) amounts to computing
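\[
\hat{\gamma} = \left[ \left( \sum_{i=1}^N W_i' Z_i \right) \left( \sum_{i=1}^N Z_i' H Z_i \right)^{-1} \left( \sum_{i=1}^N Z_i' W_i \right) \right]^{-1}
\left( \sum_{i=1}^N W_i' Z_i \right) \left( \sum_{i=1}^N Z_i' H Z_i \right)^{-1} \left( \sum_{i=1}^N Z_i' \Delta y_i \right) \tag{17.4}
\]
where
\[ \Delta y_i = \begin{bmatrix} \Delta y_{i,3} & \cdots & \Delta y_{i,T} \end{bmatrix}' \]
\[ W_i = \begin{bmatrix} \Delta y_{i,2} & \cdots & \Delta y_{i,T-1} \\ \Delta x_{i,3} & \cdots & \Delta x_{i,T} \end{bmatrix}' \]
\[ Z_i = \begin{bmatrix}
y_{i1} & 0 & 0 & \cdots & 0 & \Delta x_{i3} \\
0 & y_{i1} & y_{i2} & \cdots & 0 & \Delta x_{i4} \\
 & & \vdots & & & \\
0 & 0 & 0 & \cdots & y_{i,T-2} & \Delta x_{iT}
\end{bmatrix} \]
[1] An alternative is "orthogonal deviations": this is implemented in arbond, but not in dpanel, since it was a lot of
work and OD is very rarely seen in the wild.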
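The form of the matrix H used in one-step estimation can be checked with a few lines of code. The sketch below is in Python rather than gretl script and uses hypothetical helper names: it builds the first-difference operator D for a short series and confirms that D D' has 2 on the main diagonal and -1 on the adjacent diagonals. If the levels errors are white noise with variance sigma squared, the covariance matrix of their differences is sigma squared times D D', hence proportional to H.

```python
def difference_matrix(n_levels):
    """(n_levels - 1) x n_levels matrix D such that D v holds the first differences of v."""
    return [[-1 if j == i else 1 if j == i + 1 else 0 for j in range(n_levels)]
            for i in range(n_levels - 1)]


def h_matrix(m):
    """m x m matrix with 2 on the diagonal, -1 next to it and 0 elsewhere."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0 for j in range(m)]
            for i in range(m)]


def times_own_transpose(a):
    """Return a a' for a matrix stored as a list of rows."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]


D = difference_matrix(5)   # five levels give four differences
H = h_matrix(4)
DDt = times_own_transpose(D)
for row in DDt:
    print(row)
```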
Once the 1-step estimator is computed, the sample covariance matrix of the estimated residuals
can be used instead of $H$ to obtain 2-step estimates, which are not only consistent but
asymptotically efficient.[2] Standard GMM theory applies, except for one thing: Windmeijer (2005) has
computed finite-sample corrections to the asymptotic covariance matrix of the parameters, which
are nowadays almost universally used.
The difference estimator is consistent, but has been shown to have poor properties in finite samples
when $\alpha$ is near one. People these days prefer the so-called "system" estimator, which complements
the differenced data (with lagged levels used as instruments) with data in levels (using lagged
differences as instruments). The system estimator relies on an extra orthogonality condition which
has to do with the earliest value of the dependent variable $y_{i,1}$. The interested reader is referred
to Blundell and Bond (1998, pp. 124–125) for details, but here it suffices to say that this condition
is satisfied in mean-stationary models and brings about efficiency that may be substantial in many
cases.
The set of orthogonality conditions exploited in the system approach is not very much larger than
with the difference estimator, the reason being that most of the possible orthogonality conditions
associated with the equations in levels are redundant, given those already used for the equations
in differences.
The key equations of the system estimator can be written as
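\[
\tilde{\gamma} = \left[ \left( \sum_{i=1}^N \tilde{W}_i' \tilde{Z}_i \right) \left( \sum_{i=1}^N \tilde{Z}_i' H^* \tilde{Z}_i \right)^{-1} \left( \sum_{i=1}^N \tilde{Z}_i' \tilde{W}_i \right) \right]^{-1}
\left( \sum_{i=1}^N \tilde{W}_i' \tilde{Z}_i \right) \left( \sum_{i=1}^N \tilde{Z}_i' H^* \tilde{Z}_i \right)^{-1} \left( \sum_{i=1}^N \tilde{Z}_i' \Delta\tilde{y}_i \right) \tag{17.5}
\]
where
\[ \Delta\tilde{y}_i = \begin{bmatrix} \Delta y_{i3} & \cdots & \Delta y_{iT} & y_{i3} & \cdots & y_{iT} \end{bmatrix}' \]
\[ \tilde{W}_i = \begin{bmatrix}
\Delta y_{i2} & \cdots & \Delta y_{i,T-1} & y_{i2} & \cdots & y_{i,T-1} \\
\Delta x_{i3} & \cdots & \Delta x_{iT} & x_{i3} & \cdots & x_{iT}
\end{bmatrix}' \]
\[ \tilde{Z}_i = \begin{bmatrix}
y_{i1} & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i,3} \\
0 & y_{i1} & y_{i2} & \cdots & 0 & 0 & \cdots & 0 & \Delta x_{i,4} \\
 & & & & \vdots & & & & \\
0 & 0 & 0 & \cdots & y_{i,T-2} & 0 & \cdots & 0 & \Delta x_{iT} \\
 & & & & \vdots & & & & \\
0 & 0 & 0 & \cdots & 0 & \Delta y_{i2} & \cdots & 0 & x_{i3} \\
 & & & & \vdots & & & & \\
0 & 0 & 0 & \cdots & 0 & 0 & \cdots & \Delta y_{i,T-1} & x_{iT}
\end{bmatrix} \]
[2] In theory, the process may be iterated, but nobody seems to be interested.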
[...]; but since $\sigma^2_\eta$ is unknown and any positive definite matrix renders the estimator
consistent, people just use $I$. The off-diagonal blocks should, in principle, contain the covariances
between $\Delta v_{is}$ and $v_{it}$, which would be an identity matrix if $v_{it}$ is white noise. However, since
the south-east block is typically given a conventional value anyway, the benefit in making this choice
is not obvious. Some packages use $I$; others use a zero matrix. Asymptotically, it should not matter,
but on real datasets the difference between the resulting estimates can be noticeable.
Rank deficiency
Both the difference estimator (17.4) and the system estimator (17.5) depend, for their existence, on
the invertibility of $A = \sum_{i=1}^N \tilde{Z}_i' H^* \tilde{Z}_i$ [...]
\[ \Delta_k y_t = \alpha \Delta_k y_{t-1} + \Delta_k \varepsilon_t \]
where $\Delta_k = 1 - L^k$ and past levels of $y_t$ are perfectly valid instruments. In this example, we can
choose $k = 3$ and use $y_1$ as an instrument, so this unit is in fact perfectly usable.
Not all software packages seem to be aware of this possibility, so replicating published results may
prove tricky if your dataset contains individuals with gaps between valid observations.
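The k-difference idea can be made concrete with a short search over usable differences. The sketch below is in Python rather than gretl script; the function name and the data are hypothetical, and the unit is assumed to be observed in periods 1, 2, 4 and 5 with period 3 missing. It is not a description of how gretl itself handles gaps.

```python
def usable_k_differences(y):
    """List the (t, k) pairs, with 1-based t, for which both
    y_t - y_{t-k} and y_{t-1} - y_{t-1-k} can be computed.
    Missing observations are represented by None."""
    T = len(y)
    pairs = []
    for t in range(3, T + 1):
        for k in range(1, t - 1):          # requires t - 1 - k >= 1
            needed = (t, t - k, t - 1, t - 1 - k)
            if all(y[p - 1] is not None for p in needed):
                pairs.append((t, k))
    return pairs


# Hypothetical unit: periods 1..5 with the third observation missing
y_gap = [1.0, 1.2, None, 1.5, 1.6]
print(usable_k_differences(y_gap))

# The same unit with no gap, for comparison
y_full = [1.0, 1.2, 1.3, 1.5, 1.6]
print(len(usable_k_differences(y_full)))
```

In the gapped case the only usable combination is t = 5 with k = 3, for which the lagged difference is y_4 - y_1, so that the level y_1 is the earliest observation involved.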
17.2 Usage
One of the concepts underlying the syntax of dpanel is that you get default values for several
choices you may want to make, so that in a standard situation the command itself is very short
to write (and read). The simplest case of the model (17.1) is a plain AR(1) process:
\[ y_{i,t} = \alpha y_{i,t-1} + \eta_i + v_{it}. \tag{17.6} \]
If you give the command
dpanel 1 ; y
gretl assumes that you want to estimate (17.6) via the difference estimator (17.4), using as many
orthogonality conditions as possible. The scalar 1 between dpanel and the semicolon indicates that
only one lag of y is included as an explanatory variable; using 2 would give an AR(2) model. The
syntax that gretl uses for the non-seasonal AR and MA lags in an ARMA model is also supported in
this context.[3] For example, if you want the first and third lags of y (but not the second) included as
explanatory variables you can say
dpanel {1 3} ; y
or you can use a pre-defined matrix for this purpose:
matrix ylags = {1, 3}
dpanel ylags ; y
To use a single lag of y other than the first you need to employ this mechanism:
dpanel {3} ; y # only lag 3 is included
dpanel 3 ; y # compare: lags 1, 2 and 3 are used
[3] This represents an enhancement over the arbond command.
To use the system estimator instead, you add the --system option, as in
dpanel 1 ; y --system
The level orthogonality conditions and the corresponding instrument are appended automatically
(see eq. 17.5).
Regressors
If we want to introduce additional regressors, we list them after the dependent variable in the same
way as other gretl commands, such as ols.
For the difference orthogonality relations, dpanel takes care of transforming the regressors in
parallel with the dependent variable. Note that this differs from gretl's arbond command, where only
the dependent variable is differenced automatically; it brings us more in line with other software.
One case of potential ambiguity is when an intercept is specified but the difference-only estimator
is selected, as in
dpanel 1 ; y const
In this case the default dpanel behavior, which agrees with Stata's xtabond2, is to drop the
constant (since differencing reduces it to nothing but zeros). However, for compatibility with the
DPD package for Ox, you can give the option --dpdstyle, in which case the constant is retained
(equivalent to including a linear trend in equation 17.1). A similar point applies to the
period-specific dummy variables which can be added in dpanel via the --time-dummies option: in the
differences-only case these dummies are entered in differenced form by default, but when the
--dpdstyle switch is applied they are entered in levels.
The standard gretl syntax applies if you want to use lagged explanatory variables, so for example
the command
dpanel 1 ; y const x(0 to -1) --system
would result in estimation of the model
\[ y_{it} = \alpha y_{i,t-1} + \beta_0 + \beta_1 x_{it} + \beta_2 x_{i,t-1} + \eta_i + v_{it}. \]
Instruments
The default rules for instruments are:
• lags of the dependent variable are instrumented using all available orthogonality conditions;
and
• additional regressors are considered exogenous, so they are used as their own instruments.
If a different policy is wanted, the instruments should be specified in an additional list, separated
from the regressors list by a semicolon. The syntax closely mirrors that for the tsls command,
but in this context it is necessary to distinguish between "regular" instruments and what are often
called "GMM-style" instruments (that is, instruments that are handled in the same block-diagonal
manner as lags of the dependent variable, as described above).
Regular instruments are transformed in the same way as regressors, and the contemporaneous
value of the transformed variable is used to form an orthogonality condition. Since regressors are
treated as exogenous by default, it follows that these two commands estimate the same model:
dpanel 1 ; y z
dpanel 1 ; y z ; z
The instrument specification in the second case simply confirms what is implicit in the first: that
z is exogenous. Note, though, that if you have some additional variable z2 which you want to add
as a regular instrument, it then becomes necessary to include z in the instrument list if it is to be
treated as exogenous:
dpanel 1 ; y z ; z2 # z is now implicitly endogenous
dpanel 1 ; y z ; z z2 # z is treated as exogenous
The specification of "GMM-style" instruments is handled by the special constructs GMM() and
GMMlevel(). The first of these relates to instruments for the equations in differences, and the
second to the equations in levels. The syntax for GMM() is
GMM(varname, minlag, maxlag)
where varname is replaced by the name of a series, and minlag and maxlag are replaced by the
minimum and maximum lags to be used as instruments. The same goes for GMMlevel().
One common use of GMM() is to limit the number of lagged levels of the dependent variable used
as instruments for the equations in differences. It's well known that although exploiting all
possible orthogonality conditions yields maximal asymptotic efficiency, in finite samples it may be
preferable to use a smaller subset (but see also Okui (2009)). For example, the specification
dpanel 1 ; y ; GMM(y, 2, 4)
ensures that no lags of $y_t$ earlier than $t - 4$ will be used as instruments.
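The effect of the minlag and maxlag arguments can be tabulated with a small helper. The sketch is in Python rather than gretl script; the function name is hypothetical, and a complete panel with T = 7 periods and a single lag of y is assumed, so that the equations in differences are dated t = 3 to 7. It follows the description in the text and makes no claim about dpanel's internal bookkeeping.

```python
def gmm_instrument_periods(t, minlag, maxlag, first_period=1):
    """Periods s whose level y_s is used as an instrument for the
    differenced equation dated t, given GMM(y, minlag, maxlag)."""
    return [t - lag for lag in range(minlag, maxlag + 1)
            if t - lag >= first_period]


T = 7  # hypothetical panel length
limited = {t: gmm_instrument_periods(t, 2, 4) for t in range(3, T + 1)}
full = {t: gmm_instrument_periods(t, 2, 99) for t in range(3, T + 1)}

for t in range(3, T + 1):
    print(t, limited[t], full[t])

n_limited = sum(len(v) for v in limited.values())
n_full = sum(len(v) for v in full.values())
print(n_limited, n_full)
```

Under these assumptions the two specifications differ only from t = 6 onwards, which is where lags longer than 4 first become available.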
A second use of GMM() is to exploit more fully the potential block-diagonal orthogonality conditions
offered by an exogenous regressor, or a related variable that does not appear as a regressor. For
example, in
dpanel 1 ; y x ; GMM(z, 2, 6)
the variable x is considered an endogenous regressor, and up to 5 lags of z are used as instruments.
Note that in the following script fragment
dz = diff(z)
dpanel 1 ; y dz
dpanel 1 ; y dz ; GMM(z,0,0)
the two estimation commands should not be expected to give the same result, as the sets of
orthogonality relationships are subtly different. In the latter case, you have $T - 2$ separate
orthogonality relationships pertaining to $z_{it}$, none of which has any implication for the other ones;
in the former case, you only have one. In terms of the $Z_i$ matrix, the first form adds a single row to
the bottom of the instruments matrix, while the second form adds a diagonal block with $T - 2$
columns, that is
\[ \begin{bmatrix} z_{i3} & z_{i4} & \cdots & z_{it} \end{bmatrix} \]
versus
\[ \begin{bmatrix}
z_{i3} & 0 & \cdots & 0 \\
0 & z_{i4} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & z_{it}
\end{bmatrix} \]
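The difference between the two layouts can be seen by counting the orthogonality conditions each one generates. The sketch is in Python rather than gretl script, with hypothetical names and made-up numbers: each row of the instrument block is multiplied into a vector of residuals, one row per condition.

```python
def single_row(z):
    """Regular instrument: all periods share one orthogonality condition."""
    return [list(z)]


def diagonal_block(z):
    """GMM-style instrument: one separate condition per period."""
    n = len(z)
    return [[z[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def moment_contributions(instr_rows, resid):
    """One value per row: the sum over periods of instrument times residual."""
    return [sum(a * b for a, b in zip(row, resid)) for row in instr_rows]


z = [0.5, 1.0, 2.0]        # hypothetical instrument values for three periods
resid = [1.0, -1.0, 0.5]   # hypothetical residuals for the same periods

one_condition = moment_contributions(single_row(z), resid)
per_period = moment_contributions(diagonal_block(z), resid)
print(one_condition)
print(per_period)
```

The single-row form only requires the sum of the per-period products to be zero on average, whereas the block-diagonal form imposes that requirement period by period.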
17.3 Replication of DPD results
In this section we show how to replicate the results of some of the pioneering work with dynamic
panel-data estimators by Arellano, Bond and Blundell. As the DPD manual (Doornik, Arellano and
Bond, 2006) explains, it is difficult to replicate the original published results exactly, for two main
reasons: not all of the data used in those studies are publicly available; and some of the choices
made in the original software implementation of the estimators have been superseded. Here,
therefore, our focus is on replicating the results obtained using the current DPD package and reported
in the DPD manual.
The examples are based on the program files abest1.ox, abest3.ox and bbest1.ox. These
are included in the DPD package, along with the Arellano–Bond database files abdata.bn7 and
abdata.in7.[4] The Arellano–Bond data are also provided with gretl, in the file abdata.gdt. In the
following we do not show the output from DPD or gretl; it is somewhat voluminous, and is easily
generated by the user. As of this writing the results from Ox/DPD and gretl are identical in all
relevant respects for all of the examples shown.[5]
A complete Ox/DPD program to generate the results of interest takes this general form:
#include <oxstd.h>
#import <packages/dpd/dpd>
main()
{
    decl dpd = new DPD();
    dpd.Load("abdata.in7");
    dpd.SetYear("YEAR");
    // model-specific code here
    delete dpd;
}
In the examples below we take this template for granted and show just the model-specific code.
Example 1
The following Ox/DPD code, drawn from abest1.ox, replicates column (b) of Table 4 in Arellano
and Bond (1991), an instance of the differences-only or GMM-DIF estimator. The dependent variable
is the log of employment, n; the regressors include two lags of the dependent variable, current and
lagged values of the log real-product wage, w, the current value of the log of gross capital, k, and
current and lagged values of the log of industry output, ys. In addition the specification includes
a constant and five year dummies; unlike the stochastic regressors, these deterministic terms are
not differenced. In this specification the regressors w, k and ys are treated as exogenous and serve
as their own instruments. In DPD syntax this requires entering these variables twice, on the X_VAR
and I_VAR lines. The GMM-type (block-diagonal) instruments in this example are the second and
subsequent lags of the level of n. Both 1-step and 2-step estimates are computed.
dpd.SetOptions(FALSE); // don't use robust standard errors
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Gmm("n", 2, 99);
dpd.SetDummies(D_CONSTANT + D_TIME);
print("\n\n***** Arellano & Bond (1991), Table 4 (b)");
dpd.SetMethod(M_1STEP);
dpd.Estimate();
dpd.SetMethod(M_2STEP);
dpd.Estimate();
[4] See http://www.doornik.com/download.html.
[5] To be specific, this is using Ox Console version 5.10, version 1.24 of the DPD package, and gretl built from CVS as of
2010-10-23, all on Linux.
Here is gretl code to do the same job:
open abdata.gdt
list X = w w(-1) k ys ys(-1)
dpanel 2 ; n X const --time-dummies --asy --dpdstyle
dpanel 2 ; n X const --time-dummies --asy --two-step --dpdstyle
Note that in gretl the switch to suppress robust standard errors is --asymptotic, here abbreviated
to --asy.[6] The --dpdstyle flag specifies that the constant and dummies should not be differenced,
in the context of a GMM-DIF model. With gretl's dpanel command it is not necessary to specify the
exogenous regressors as their own instruments since this is the default; similarly, the use of the
second and all longer lags of the dependent variable as GMM-type instruments is the default and
need not be stated explicitly.
Example 2
The DPD file abest3.ox contains a variant of the above that differs with regard to the choice of
instruments: the variables w and k are now treated as predetermined, and are instrumented
GMM-style using the second and third lags of their levels. This approximates column (c) of Table 4 in
Arellano and Bond (1991). We have modified the code in abest3.ox slightly to allow the use of
robust (Windmeijer-corrected) standard errors, which are the default in both DPD and gretl with
2-step estimation:
dpd.Select(Y_VAR, {"n", 0, 2});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 0, "ys", 0, 1});
dpd.Select(I_VAR, {"ys", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 3);
dpd.Gmm("k", 2, 3);
print("\n***** Arellano & Bond (1991), Table 4 (c)\n");
print(" (but using different instruments!!)\n");
dpd.SetMethod(M_2STEP);
dpd.Estimate();
The gretl code is as follows:
open abdata.gdt
list X = w w(-1) k ys ys(-1)
list Ivars = ys ys(-1)
dpanel 2 ; n X const ; GMM(w,2,3) GMM(k,2,3) Ivars --time --two-step --dpd
Note that since we are now calling for an instrument set other than the default (following the second
semicolon), it is necessary to include the Ivars specification for the variable ys. However, it is
not necessary to specify GMM(n,2,99) since this remains the default treatment of the dependent
variable.
[6] Option flags in gretl can always be truncated, down to the minimal unique abbreviation.
Example 3
Our third example replicates the DPD output from bbest1.ox: this uses the same dataset as the
previous examples but the model specifications are based on Blundell and Bond (1998), and involve
comparison of the GMM-DIF and GMM-SYS ("system") estimators. The basic specification is slightly
simplified in that the variable ys is not used and only one lag of the dependent variable appears as
a regressor. The Ox/DPD code is:
dpd.Select(Y_VAR, {"n", 0, 1});
dpd.Select(X_VAR, {"w", 0, 1, "k", 0, 1});
dpd.SetDummies(D_CONSTANT + D_TIME);
print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF");
dpd.Gmm("n", 2, 99);
dpd.Gmm("w", 2, 99);
dpd.Gmm("k", 2, 99);
dpd.SetMethod(M_2STEP);
dpd.Estimate();
print("\n\n***** Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS");
dpd.GmmLevel("n", 1, 1);
dpd.GmmLevel("w", 1, 1);
dpd.GmmLevel("k", 1, 1);
dpd.SetMethod(M_2STEP);
dpd.Estimate();
Here is the corresponding gretl code:
open abdata.gdt
list X = w w(-1) k k(-1)
# Blundell & Bond (1998), Table 4: 1976-86 GMM-DIF
dpanel 1 ; n X const ; GMM(w,2,99) GMM(k,2,99) --time --two-step --dpd
# Blundell & Bond (1998), Table 4: 1976-86 GMM-SYS
dpanel 1 ; n X const ; GMM(w,2,99) GMM(k,2,99) \
GMMlevel(w,1,1) GMMlevel(k,1,1) --time --two-step --dpd --system
Note the use of the --system option flag to specify GMM-SYS, including the default treatment of
the dependent variable, which corresponds to GMMlevel(n,1,1). In this case we also want to
use lagged differences of the regressors w and k as instruments for the levels equations so we
need explicit GMMlevel entries for those variables. If you want something other than the default
treatment for the dependent variable as an instrument for the levels equations, you should give an
explicit GMMlevel specification for that variable, and in that case the --system flag is redundant
(but harmless).
For the sake of completeness, note that if you specify at least one GMMlevel term, dpanel will then
include equations in levels, but it will not automatically add a default GMMlevel specification for
the dependent variable unless the --system option is given.
17.4 Cross-country growth example
The previous examples all used the Arellano–Bond dataset; for this example we use the dataset
CEL.gdt, which is also included in the gretl distribution. As with the Arellano–Bond data, there
are numerous missing values. Details of the provenance of the data can be found by opening the
dataset information window in the gretl GUI (Data menu, Dataset info item). This is a subset of the
Barro–Lee 138-country panel dataset, an approximation to which is used in Caselli, Esquivel and
Lefort (1996) and Bond, Hoeffler and Temple (2001).[7] Both of these papers explore the dynamic
panel-data approach in relation to the issues of growth and convergence of per capita income across
countries.
The dependent variable is growth in real GDP per capita over successive five-year periods; the
regressors are the log of the initial (five years prior) value of GDP per capita, the log-ratio of
investment to GDP, $s$, in the prior five years, and the log of annual average population growth, $n$,
over the prior five years plus 0.05 as stand-in for the rate of technical progress, $g$, plus the rate of
depreciation, $\delta$ (with the last two terms assumed to be constant across both countries and periods).
The original model is
\[ \Delta_5 y_{it} = \beta y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} + g + \delta) + \nu_t + \eta_i + \epsilon_{it} \tag{17.7} \]
which allows for a time-specific disturbance $\nu_t$. The Solow model with Cobb–Douglas production
function implies that $\gamma = -\alpha$, but this assumption is not imposed in estimation. The time-specific
disturbance is eliminated by subtracting the period mean from each of the series.
Equation (17.7) can be transformed to an AR(1) dynamic panel-data model by adding $y_{i,t-5}$ to both
sides, which gives
\[ y_{it} = (1 + \beta) y_{i,t-5} + \alpha s_{it} + \gamma (n_{it} + g + \delta) + \eta_i + \epsilon_{it} \tag{17.8} \]
where all variables are now assumed to be time-demeaned.
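The time-demeaning step, and the reason it removes a disturbance common to all units in a period, can be sketched as follows. The example is in Python rather than gretl script, with a hypothetical function name and made-up data for two units over three periods, one observation being missing.

```python
def demean_by_period(values, periods):
    """Subtract from each observation the mean of its period,
    skipping missing values (None)."""
    sums, counts = {}, {}
    for v, p in zip(values, periods):
        if v is not None:
            sums[p] = sums.get(p, 0.0) + v
            counts[p] = counts.get(p, 0) + 1
    return [None if v is None else v - sums[p] / counts[p]
            for v, p in zip(values, periods)]


# Stacked data: unit 1 (periods 1-3), then unit 2 (periods 1-3)
periods = [1, 2, 3, 1, 2, 3]
values = [1.0, 2.0, 3.0, 3.0, None, 7.0]
demeaned = demean_by_period(values, periods)
print(demeaned)

# Add a common shock of 10 to every unit in period 3: the result is unchanged
shocked = [v + 10.0 if (v is not None and p == 3) else v
           for v, p in zip(values, periods)]
print(demean_by_period(shocked, periods) == demeaned)
```

Any term that is the same for all units within a period shifts the period mean by the same amount and therefore drops out of the demeaned series.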
In (rough) replication of Bond et al. (2001) we now proceed to estimate the following two models:
(a) equation (17.8) via GMM-DIF, using as instruments the second and all longer lags of $y_{it}$, $s_{it}$
and $n_{it} + g + \delta$; and (b) equation (17.8) via GMM-SYS, using $\Delta y_{i,t-1}$, $\Delta s_{i,t-1}$ and
$\Delta(n_{i,t-1} + g + \delta)$ as additional instruments in the levels equations. We report robust standard
errors throughout. (As a purely notational matter, we now use "$t - 1$" to refer to values five years
prior to $t$, as in Bond et al. (2001).)
The gretl script to do this job is shown below. Note that the final transformed versions of the
variables (logs, with time-means subtracted) are named ly ($y_{it}$), linv ($s_{it}$) and lngd
($n_{it} + g + \delta$).
open CEL.gdt
ngd = n + 0.05
ly = log(y)
linv = log(s)
lngd = log(ngd)
# take out time means
loop i=1..8 --quiet
    smpl (time == i) --restrict --replace
    ly -= mean(ly)
    linv -= mean(linv)
    lngd -= mean(lngd)
endloop
smpl --full
list X = linv lngd
# 1-step GMM-DIF
dpanel 1 ; ly X ; GMM(linv,2,99) GMM(lngd,2,99)
# 2-step GMM-DIF
dpanel 1 ; ly X ; GMM(linv,2,99) GMM(lngd,2,99) --two-step
# GMM-SYS
dpanel 1 ; ly X ; GMM(linv,2,99) GMM(lngd,2,99) \
GMMlevel(linv,1,1) GMMlevel(lngd,1,1) --two-step --sys
[7] We say an "approximation" because we have not been able to replicate exactly the OLS results reported in the papers
cited, though it seems from the description of the data in Caselli et al. (1996) that we ought to be able to do so. We note
that Bond et al. (2001) used data provided by Professor Caselli yet did not manage to reproduce the latter's results.
For comparison we estimated the same two models using Ox/DPD and the Stata command xtabond2.
(In each case we constructed a comma-separated values dataset containing the data as transformed
in the gretl script shown above, using a missing-value code appropriate to the target program.) For
reference, the commands used with Stata are reproduced below:
insheet using CEL.csv
tsset unit time
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nolev twostep
xtabond2 ly L.ly linv lngd, gmm(L.ly, lag(1 99)) gmm(linv, lag(2 99))
gmm(lngd, lag(2 99)) rob nocons twostep
For the GMM-DIF model all three programs find 382 usable observations and 30 instruments, and
yield identical parameter estimates and robust standard errors (up to the number of digits printed,
or more); see Table 17.1.[8]
          1-step                    2-step
          coeff       std. error    coeff       std. error
ly(-1)    0.577564    0.1292        0.610056    0.1562
linv      0.0565469   0.07082       0.100952    0.07772
lngd      0.143950    0.2753        0.310041    0.2980
Table 17.1: GMM-DIF: Barro–Lee data
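Since the coefficient on ly(-1) estimates $1 + \beta$ in equation (17.8), the value of $\beta$ in the original growth equation (17.7) follows by subtracting 1. The short calculation below, in Python rather than gretl script and with hypothetical variable names, applies this to the two ly(-1) coefficients reported in Table 17.1.

```python
# Coefficients on ly(-1) as reported in Table 17.1
reported = {"1-step": 0.577564, "2-step": 0.610056}

# In eq. (17.8) the coefficient on the lagged level is (1 + beta),
# so beta in eq. (17.7) is the reported value minus 1.
implied_beta = {name: coeff - 1.0 for name, coeff in reported.items()}

for name, beta in implied_beta.items():
    print(name, round(beta, 6))
```

Both implied values are negative, that is, growth over a five-year period is lower the higher the initial level of income, other things equal.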
Results for GMM-SYS estimation are shown in Table 17.2. In this case we show two sets of gretl
results: those labeled "gretl(1)" were obtained using gretl's --dpdstyle option, while those labeled
"gretl(2)" did not use that option, the intent being to reproduce the H matrices used by Ox/DPD
and xtabond2 respectively.
          gretl(1)          Ox/DPD            gretl(2)          xtabond2
ly(-1)    0.9237 (0.0385)   0.9167 (0.0373)   0.9073 (0.0370)   0.9073 (0.0370)
linv      0.1592 (0.0449)   0.1636 (0.0441)   0.1856 (0.0411)   0.1856 (0.0411)
lngd      0.2370 (0.1485)   0.2178 (0.1433)   0.2355 (0.1501)   0.2355 (0.1501)
Table 17.2: 2-step GMM-SYS: Barro–Lee data (standard errors in parentheses)
In this case all three programs use 479 observations; gretl and xtabond2 use 41 instruments and
produce the same estimates (when using the same H matrix) while Ox/DPD nominally uses 66.[9]
It is noteworthy that with GMM-SYS plus "messy" missing observations, the results depend on the
precise array of instruments used, which in turn depends on the details of the implementation of
the estimator.
[8] The coefficient shown for ly(-1) in the Tables is that reported directly by the software; for comparability with the
original model (eq. 17.7) it is necessary to subtract 1, which produces the expected negative value indicating conditional
convergence in per capita income.
[9] This is a case of the issue described in section 17.1: the full A matrix turns out to be singular and special measures
must be taken to produce estimates.
Auxiliary test statistics
We have concentrated above on the parameter estimates and standard errors. It may be worth
adding a few words on the additional test statistics that typically accompany both GMM-DIF and
GMM-SYS estimation. These include the Sargan test for overidentification, one or more Wald tests
for the joint significance of the regressors, and time dummies if applicable, and tests for first- and
second-order autocorrelation of the residuals from the equations in differences.
In general we see a good level of agreement between gretl, DPD and xtabond2 with regard to these
statistics, with a few relatively minor exceptions. Specifically, xtabond2 computes both a "Sargan
test" and a "Hansen test" for overidentification, but what it calls the Hansen test is what DPD and
gretl call the Sargan test. (We have had difficulty determining from the xtabond2 documentation
(Roodman, 2006) exactly how its Sargan test is computed.) In addition there are cases where the
degrees of freedom for the Sargan test differ between DPD and gretl; this occurs when the A matrix
is singular (section 17.1). In concept the df equals the number of instruments minus the number
of parameters estimated; for the first of these terms gretl uses the rank of A, while DPD appears to
use the full dimension of A.
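The consequence of the two conventions for the degrees of freedom can be illustrated with the instrument counts quoted above for the GMM-SYS model. The sketch is in Python rather than gretl script; the function name is hypothetical, and it is assumed that the three regressors are the only parameters estimated. It shows the arithmetic only and is not a statement of what either program prints.

```python
def sargan_df(n_instruments: int, n_params: int) -> int:
    """Degrees of freedom: number of instruments minus number of parameters."""
    if n_instruments < n_params:
        raise ValueError("model is under-identified")
    return n_instruments - n_params


n_params = 3  # assumed: coefficients on ly(-1), linv and lngd

df_rank = sargan_df(41, n_params)     # instrument count based on the rank of A
df_nominal = sargan_df(66, n_params)  # instrument count based on the full dimension of A
print(df_rank, df_nominal)
```

The same test statistic would therefore be referred to chi-square distributions with quite different degrees of freedom, which is one way the reported p-values can diverge.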
17.5 Memo: dpanel options
flag              effect
--asymptotic      Suppresses the use of robust standard errors
--two-step        Calls for 2-step estimation (the default being 1-step)
--system          Calls for GMM-SYS, with default treatment of the dependent variable,
                  as in GMMlevel(y,1,1)
--time-dummies    Includes period-specific dummy variables
--dpdstyle        Computes the H matrix as in DPD; also suppresses differencing of
                  automatic time dummies and omission of intercept in the GMM-DIF
                  case
--verbose         When --two-step is selected, prints the 1-step estimates first
--vcv             Calls for printing of the covariance matrix
--quiet           Suppresses the printing of results
Chapter 18
Nonlinear least squares
18.1 Introduction and examples
Gretl supports nonlinear least squares (NLS) using a variant of the Levenberg–Marquardt algorithm.
The user must supply a specification of the regression function; prior to giving this specification the
parameters to be estimated must be declared and given initial values. Optionally, the user may
supply analytical derivatives of the regression function with respect to each of the parameters.
If derivatives are not given, the user must instead give a list of the parameters to be estimated
(separated by spaces or commas), preceded by the keyword params. The tolerance (criterion for
terminating the iterative estimation procedure) can be adjusted using the set command.
The syntax for specifying the function to be estimated is the same as for the genr command. Here
are two examples, with accompanying derivatives.
# Consumption function from Greene
nls C = alpha + beta * Y^gamma
deriv alpha = 1
deriv beta = Y^gamma
deriv gamma = beta * Y^gamma * log(Y)
end nls
# Nonlinear function from Russell Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
deriv alpha = 1
deriv beta = x1 - x2/(beta*beta)
end nls --vcv
Note the command words nls (which introduces the regression function), deriv (which introduces
the specification of a derivative), and end nls, which terminates the specification and calls for
estimation. If the --vcv flag is appended to the last line the covariance matrix of the parameter
estimates is printed.
18.2 Initializing the parameters
The parameters of the regression function must be given initial values prior to the nls command.
This can be done using the genr command (or, in the GUI program, via the menu item "Variable,
Define new variable").
In some cases, where the nonlinear function is a generalization of (or a restricted form of) a linear
model, it may be convenient to run an ols and initialize the parameters from the OLS coefficient
estimates. In relation to the first example above, one might do:
ols C 0 Y
genr alpha = $coeff(0)
genr beta = $coeff(Y)
genr gamma = 1
And in relation to the second example one might do:
ols y 0 x1 x2
genr alpha = $coeff(0)
genr beta = $coeff(x1)
18.3 NLS dialog window
It is probably most convenient to compose the commands for NLS estimation in the form of a
gretl script but you can also do so interactively, by selecting the item "Nonlinear Least Squares"
under the "Model, Nonlinear models" menu. This opens a dialog box where you can type the
function specification (possibly prefaced by genr lines to set the initial parameter values) and the
derivatives, if available. An example of this is shown in Figure 18.1. Note that in this context you
do not have to supply the nls and end nls tags.
Figure 18.1: NLS dialog box
18.4 Analytical and numerical derivatives
If you are able to work out the derivatives of the regression function with respect to the
parameters, it is advisable to supply those derivatives as shown in the examples above. If that is not
possible, gretl will compute approximate numerical derivatives. However, the properties of the NLS
algorithm may not be so good in this case (see section 18.7).
Numerical derivatives are requested by using the params statement, which should be followed by a
list of identifiers naming the parameters to be estimated. In this case, the examples above would
read as follows:
# Greene
nls C = alpha + beta * Y^gamma
params alpha beta gamma
end nls
# Davidson
nls y = alpha + beta * x1 + (1/beta) * x2
params alpha beta
end nls
If analytical derivatives are supplied, they are checked for consistency with the given nonlinear
function. If the derivatives are clearly incorrect, estimation is aborted with an error message. If the
derivatives are "suspicious" a warning message is issued but estimation proceeds. This warning
may sometimes be triggered by incorrect derivatives, but it may also be triggered by a high degree
of collinearity among the derivatives.
Note that you cannot mix analytical and numerical derivatives: you should supply expressions for
all of the derivatives or none.
18.5 Controlling termination
The NLS estimation procedure is an iterative process. Iteration is terminated when the criterion for
convergence is met or when the maximum number of iterations is reached, whichever comes first.
Let $k$ denote the number of parameters being estimated. The maximum number of iterations is
$100 \times (k + 1)$ when analytical derivatives are given, and $200 \times (k + 1)$ when numerical
derivatives are used.
Let $\epsilon$ denote a small number. The iteration is deemed to have converged if at least one of the
following conditions is satisfied:
• Both the actual and predicted relative reductions in the error sum of squares are at most $\epsilon$.
• The relative error between two consecutive iterates is at most $\epsilon$.
The default value of $\epsilon$ is the machine precision to the power 3/4,¹ but it can be adjusted using
the set command with the parameter nls_toler. For example
set nls_toler .0001
will relax the value of $\epsilon$ to 0.0001.
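The arithmetic behind the default tolerance can be checked directly. The sketch below assumes IEEE double precision, so that the machine precision is $2^{-52}$; the variable names are hypothetical. It then tightens the tolerance to the value used in the accuracy tests of section 18.7.

```gretl
# Machine precision for IEEE double-precision arithmetic (assumption)
scalar macheps = 2^(-52)

# Default tolerance: machine precision raised to the power 3/4
scalar eps_default = macheps^(3/4)
printf "default tolerance is roughly %g\n", eps_default

# Tighten the tolerance for any nls blocks that follow
set nls_toler 1.0e-14
```

A tighter tolerance usually means more iterations, so it may be worth changing only when the default does not deliver the accuracy required.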
18.6 Details on the code
The underlying engine for NLS estimation is based on the minpack suite of functions, available
from netlib.org. Specifically, the following minpack functions are called:
lmder    Levenberg-Marquardt algorithm with analytical derivatives
chkder   Check the supplied analytical derivatives
lmdif    Levenberg-Marquardt algorithm with numerical derivatives
fdjac2   Compute final approximate Jacobian when using numerical derivatives
dpmpar   Determine the machine precision
On successful completion of the Levenberg-Marquardt iteration, a Gauss-Newton regression is used
to calculate the covariance matrix for the parameter estimates. If the --robust flag is given a
robust variant is computed. The documentation for the set command explains the specific options
available in this regard.
Since NLS results are asymptotic, there is room for debate over whether or not a correction for
degrees of freedom should be applied when calculating the standard error of the regression (and
the standard errors of the parameter estimates). For comparability with OLS, and in light of the
reasoning given in Davidson and MacKinnon (1993), the estimates shown in gretl do use a degrees
of freedom correction.
¹ On a 32-bit Intel Pentium machine a likely value for this parameter is $1.82 \times 10^{-12}$.
18.7 Numerical accuracy
Table 18.1 shows the results of running the gretl NLS procedure on the 27 Statistical Reference
Datasets made available by the U.S. National Institute of Standards and Technology (NIST) for
testing nonlinear regression software.² For each dataset, two sets of starting values for the parameters
are given in the test files, so the full test comprises 54 runs. Two full tests were performed, one
using all analytical derivatives and one using all numerical approximations. In each case the default
tolerance was used.³
Out of the 54 runs, gretl failed to produce a solution in 4 cases when using analytical derivatives,
and in 5 cases when using numeric approximation. Of the four failures in analytical derivatives
mode, two were due to non-convergence of the Levenberg-Marquardt algorithm after the maximum
number of iterations (on MGH09 and Bennett5, both described by NIST as of "Higher difficulty") and
two were due to generation of range errors (out-of-bounds floating point values) when computing
the Jacobian (on BoxBOD and MGH17, described as of "Higher difficulty" and "Average difficulty"
respectively). The additional failure in numerical approximation mode was on MGH10 ("Higher
difficulty", maximum number of iterations reached).
The table gives information on several aspects of the tests: the number of outright failures, the
average number of iterations taken to produce a solution and two sorts of measure of the accuracy
of the estimates for both the parameters and the standard errors of the parameters.
For each of the 54 runs in each mode, if the run produced a solution the parameter estimates
obtained by gretl were compared with the NIST certified values. We define the "minimum correct
figures" for a given run as the number of significant figures to which the least accurate gretl
estimate agreed with the certified value, for that run. The table shows both the average and the worst
case value of this variable across all the runs that produced a solution. The same information is
shown for the estimated standard errors.⁴
The second measure of accuracy shown is the percentage of cases, taking into account all
parameters from all successful runs, in which the gretl estimate agreed with the certified value to at least
the 6 significant figures which are printed by default in the gretl regression output.
Using analytical derivatives, the worst case values for both parameters and standard errors were
improved to 6 correct figures on the test machine when the tolerance was tightened to 1.0e-14.
Using numerical derivatives, the same tightening of the tolerance raised the worst values to 5
correct figures for the parameters and 3 figures for standard errors, at a cost of one additional
failure of convergence.
Note the overall superiority of analytical derivatives: on average solutions to the test problems
were obtained with substantially fewer iterations and the results were more accurate (most notably
for the estimated standard errors). Note also that the six-digit results printed by gretl are not 100
percent reliable for difficult nonlinear problems (in particular when using numerical derivatives).
Having registered this caveat, the percentage of cases where the results were good to six digits or
better seems high enough to justify their printing in this form.
² For a discussion of gretl's accuracy in the estimation of linear models, see Appendix D.
³ The data shown in the table were gathered from a pre-release build of gretl version 1.0.9, compiled with gcc 3.3,
linked against glibc 2.3.2, and run under Linux on an i686 PC (IBM ThinkPad A21m).
⁴ For the standard errors, I excluded one outlier from the statistics shown in the table, namely Lanczos1. This is an odd
case, using generated data with an almost-exact fit: the standard errors are 9 or 10 orders of magnitude smaller than the
coefficients. In this instance gretl could reproduce the certified standard errors to only 3 figures (analytical derivatives)
and 2 figures (numerical derivatives).
Table 18.1: Nonlinear regression: the NIST tests
                                                         Analytical     Numerical
                                                         derivatives    derivatives
Failures in 54 tests                                          4              5
Average iterations                                           32            127
Mean of min. correct figures, parameters                  8.120          6.980
Worst of min. correct figures, parameters                     4              3
Mean of min. correct figures, standard errors             8.000          5.673
Worst of min. correct figures, standard errors                5              2
Percent correct to at least 6 figures, parameters          96.5           91.9
Percent correct to at least 6 figures, standard errors     97.7           77.3
Chapter 19
Maximum likelihood estimation
19.1 Generic ML estimation with gretl
Maximum likelihood estimation is a cornerstone of modern inferential procedures. Gretl provides
a way to implement this method for a wide range of estimation problems, by use of the mle
command. We give here a few examples.
To give a foundation for the examples that follow, we start from a brief reminder on the basics
of ML estimation. Given a sample of size $T$, it is possible to define the density function¹ for the
whole sample, namely the joint distribution of all the observations $f(Y; \theta)$, where
$Y = \{y_1, \ldots, y_T\}$. Its shape is determined by a $k$-vector of unknown parameters $\theta$, which
we assume is contained in a set $\Theta$, and which can be used to evaluate the probability of
observing a sample with any given characteristics.
After observing the data, the values $Y$ are given, and this function can be evaluated for any
legitimate value of $\theta$. In this case, we prefer to call it the likelihood function; the need for another name
stems from the fact that this function works as a density when we use the $y_t$s as arguments and
$\theta$ as parameters, whereas in this context $\theta$ is taken as the function's argument, and the data
$Y$ only have the role of determining its shape.
In standard cases, this function has a unique maximum. The location of the maximum is unaffected
if we consider the logarithm of the likelihood (or log-likelihood for short): this function will be
denoted as
$$\ell(\theta) = \log f(Y; \theta)$$
The log-likelihood functions that gretl can handle are those where $\ell(\theta)$ can be written as
$$\ell(\theta) = \sum_{t=1}^{T} \ell_t(\theta)$$
which is true in most cases of interest. The functions $\ell_t(\theta)$ are called the log-likelihood
contributions.
Moreover, the location of the maximum is obviously determined by the data $Y$. This means that the
value
$$\hat{\theta}(Y) = \mathop{\mathrm{Argmax}}_{\theta \in \Theta} \ell(\theta) \tag{19.1}$$
is some function of the observed data (a statistic), which has the property, under mild conditions,
of being a consistent, asymptotically normal and asymptotically efficient estimator of $\theta$.
Sometimes it is possible to write down explicitly the function $\hat{\theta}(Y)$; in general, it need not
be so. In these circumstances, the maximum can be found by means of numerical techniques. These often
rely on the fact that the log-likelihood is a smooth function of $\theta$, and therefore at the maximum
its partial derivatives should all be 0. The gradient vector, or score vector, is a function that enjoys
many interesting statistical properties in its own right; it will be denoted here as $g(\theta)$. It is a
$k$-vector with typical element
$$g_i(\theta) = \frac{\partial \ell(\theta)}{\partial \theta_i} = \sum_{t=1}^{T} \frac{\partial \ell_t(\theta)}{\partial \theta_i}$$
¹ We are supposing here that our data are a realization of continuous random variables. For discrete random variables,
everything continues to apply by referring to the probability function instead of the density. In both cases, the distribution
may be conditional on some exogenous variables.
Gradient-based methods can be briefly illustrated as follows:
1. pick a point $\theta_0 \in \Theta$;
2. evaluate $g(\theta_0)$;
3. if $g(\theta_0)$ is "small", stop. Otherwise, compute a direction vector $d(g(\theta_0))$;
4. evaluate $\theta_1 = \theta_0 + d(g(\theta_0))$;
5. substitute $\theta_0$ with $\theta_1$;
6. restart from 2.
Many algorithms of this kind exist; they basically differ from one another in the way they compute
the direction vector $d(g(\theta_0))$, to ensure that $\ell(\theta_1) > \ell(\theta_0)$ (so that we eventually
end up on the maximum).
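The six steps above can be sketched as a short loop. This is a toy illustration only, not the BFGS method that gretl uses: the objective $\ell(\theta) = -(\theta - 3)^2$, the fixed step length and all variable names are assumptions made for the example, and the direction vector is simply the gradient scaled by the step length.

```gretl
# Toy objective (assumption): l(theta) = -(theta - 3)^2, so g(theta) = -2*(theta - 3)
scalar theta = 0       # step 1: pick a starting point
scalar step = 0.25     # fixed step length; real algorithms choose this adaptively
loop i=1..100
    scalar g = -2*(theta - 3)     # step 2: evaluate the gradient
    if abs(g) < 1.0e-8            # step 3: stop if the gradient is "small"
        break
    endif
    theta = theta + step*g        # steps 4 and 5: move and replace theta
endloop                           # step 6: restart from 2
printf "theta = %g\n", theta
```

With this step length the distance from the maximizer halves at each pass, so the loop stops well before the cap of 100 iterations.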
The method gretl uses to maximize the log-likelihood is a gradient-based algorithm known as the
BFGS (Broyden, Fletcher, Goldfarb and Shanno) method. This technique is used in most econometric
and statistical packages, as it is well-established and remarkably powerful. Clearly, in order to make
this technique operational, it must be possible to compute the vector $g(\theta)$ for any value of
$\theta$. In some cases this vector can be written explicitly as a function of $Y$. If this is not possible
or too difficult the gradient may be evaluated numerically.
The choice of the starting value, $\theta_0$, is crucial in some contexts and inconsequential in others. In
general, however, it is advisable to start the algorithm from "sensible" values whenever possible. If
a consistent estimator is available, this is usually a safe and efficient choice: this ensures that in
large samples the starting point will likely be close to $\hat{\theta}$ and convergence can be achieved in few
iterations.
The maximum number of iterations allowed for the BFGS procedure, and the relative tolerance
for assessing convergence, can be adjusted using the set command: the relevant variables are
bfgs_maxiter (default value 500) and bfgs_toler (default value, the machine precision to the
power 3/4).
Covariance matrix and standard errors
By default the covariance matrix of the parameter estimates is based on the Outer Product of the
Gradient. That is,
$$\widehat{\mathrm{Var}}_{\mathrm{OPG}}(\hat{\theta}) = \left( G'(\hat{\theta}) G(\hat{\theta}) \right)^{-1}$$
where $G(\hat{\theta})$ is the $T \times k$ matrix of contributions to the gradient. Two other options are
available. If the --hessian flag is given, the covariance matrix is computed from a numerical
approximation to the Hessian at convergence. If the --robust option is selected, the quasi-ML
"sandwich" estimator is used:
$$\widehat{\mathrm{Var}}_{\mathrm{QML}}(\hat{\theta}) = H(\hat{\theta})^{-1} G'(\hat{\theta}) G(\hat{\theta}) H(\hat{\theta})^{-1}$$
where $H$ denotes the numerical approximation to the Hessian.
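As a rough illustration of how these options are selected, the sketch below fits the mean and standard deviation of a normal sample twice, first with the default OPG covariance matrix and then with the Hessian-based one. The simulated data and the names y, mu and s are hypothetical, and the additive constant of the normal log-density is omitted since it does not affect the maximizer.

```gretl
# Hypothetical simulated sample: mean 1, standard deviation 2
nulldata 500
set seed 101
series y = 1 + 2*normal()
scalar mu = mean(y)
scalar s = sd(y)

# Default: standard errors from the Outer Product of the Gradient
mle ll = -ln(s) - 0.5*((y - mu)/s)^2
    params mu s
end mle

# Same model, standard errors from the numerical Hessian
mle ll = -ln(s) - 0.5*((y - mu)/s)^2
    params mu s
end mle --hessian
```

Replacing --hessian with --robust on the closing line would request the quasi-ML sandwich estimator instead; the point estimates are the same in all three cases, only the reported standard errors change.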
19.2 Gamma estimation
Suppose we have a sample of $T$ independent and identically distributed observations from a Gamma
distribution. The density function for each observation $x_t$ is
$$f(x_t) = \frac{\alpha^p}{\Gamma(p)} x_t^{p-1} \exp(-\alpha x_t) \tag{19.2}$$
The log-likelihood for the entire sample can be written as the logarithm of the joint density of all
the observations. Since these are independent and identical, the joint density is the product of the
individual densities, and hence its log is
$$\ell(\alpha, p) = \sum_{t=1}^{T} \log\left[ \frac{\alpha^p}{\Gamma(p)} x_t^{p-1} \exp(-\alpha x_t) \right] = \sum_{t=1}^{T} \ell_t \tag{19.3}$$
where
$$\ell_t = p \cdot \log(\alpha x_t) - \gamma(p) - \log x_t - \alpha x_t$$
and $\gamma(\cdot)$ is the log of the gamma function. In order to estimate the parameters $\alpha$ and $p$
via ML, we need to maximize (19.3) with respect to them. The corresponding gretl code snippet is
scalar alpha = 1
scalar p = 1
mle logl = p*ln(alpha * x) - lngamma(p) - ln(x) - alpha * x
params alpha p
end mle
The first two statements
scalar alpha = 1
scalar p = 1
are necessary to ensure that the variables alpha and p exist before the computation of logl is
attempted. Inside the mle block, the params keyword identifies these variables (which could be
either scalars, vectors or a combination of the two; see below for an example) as the parameters
that should be adjusted to maximize the likelihood. Their values will be changed by the
execution of the mle command; upon successful completion, they will be replaced by the ML
estimates. The starting value is 1 for both; this is arbitrary and does not matter much in this example
(more on this later).
The above code can be made more readable, and marginally more efficient, by defining a variable
to hold $\alpha \cdot x_t$. This command can be embedded in the mle block as follows:
mle logl = p*ln(ax) - lngamma(p) - ln(x) - ax
series ax = alpha*x
params alpha p
end mle
The variable ax is not added to the params list, of course, since it is just an auxiliary variable to
facilitate the calculations. You can insert as many such auxiliary lines as you require before the
params line, with the restriction that they must contain either (a) commands to generate series,
scalars or matrices or (b) print commands (which may be used to aid in debugging).
In a simple example like this, the choice of the starting values is almost inconsequential; the
algorithm is likely to converge no matter what the starting values are. However, consistent
method-of-moments estimators of $p$ and $\alpha$ can be simply recovered from the sample mean $m$ and
variance $V$: since it can be shown that
$$E(x_t) = p/\alpha \qquad V(x_t) = p/\alpha^2$$
dividing the mean by the variance gives $\alpha$, and it follows that the estimators
$$\bar{\alpha} = m/V \qquad \bar{p} = m \cdot \bar{\alpha}$$
are consistent, and therefore suitable to be used as starting point for the algorithm. The gretl script
code then becomes
scalar m = mean(x)
scalar alpha = m/var(x)
scalar p = m*alpha
mle logl = p*ln(ax) - lngamma(p) - ln(x) - ax
series ax = alpha*x
params alpha p
end mle
Another thing to note is that sometimes parameters are constrained within certain boundaries: in
this case, for example, both $\alpha$ and $p$ must be positive numbers. Gretl does not check for this: it
is the user's responsibility to ensure that the function is always evaluated at an admissible point
in the parameter space during the iterative search for the maximum. An effective technique is to
define a variable for checking that the parameters are admissible and setting the log-likelihood as
undefined if the check fails. An example, which uses the conditional assignment operator, follows:
scalar m = mean(x)
scalar alpha = m/var(x)
scalar p = m*alpha
mle logl = check ? p*ln(ax) - lngamma(p) - ln(x) - ax : NA
series ax = alpha*x
scalar check = (alpha>0) && (p>0)
params alpha p
end mle
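To try the script above without a data file, one can generate an artificial Gamma sample first. The following sketch rests on some assumptions: the sample size, the seed and the "true" values (shape 2, rate 2) are arbitrary choices, and the data are drawn with randgen using the shape and scale parametrization, so the scale argument is $1/\alpha$.

```gretl
nulldata 1000
set seed 20070301
# randgen(G, shape, scale): shape p = 2 and scale 1/alpha = 0.5, hence alpha = 2
series x = randgen(G, 2, 0.5)

# method-of-moments starting values
scalar m = mean(x)
scalar alpha = m/var(x)
scalar p = m*alpha

mle logl = check ? p*ln(ax) - lngamma(p) - ln(x) - ax : NA
    series ax = alpha*x
    scalar check = (alpha>0) && (p>0)
    params alpha p
end mle --hessian

printf "alpha = %g, p = %g\n", alpha, p
```

Since the starting values are already consistent estimates, few iterations should be needed; the estimates can then be compared with the values used to generate the data.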
19.3 Stochastic frontier cost function
When modeling a cost function, it is sometimes worthwhile to incorporate explicitly into the
statistical model the notion that firms may be inefficient, so that the observed cost deviates from the
theoretical figure not only because of unobserved heterogeneity between firms, but also because
two firms could be operating at a different efficiency level, despite being identical under all other
respects. In this case we may write
$$C_i = C_i^* + u_i + v_i$$
where $C_i$ is some variable cost indicator, $C_i^*$ is its "theoretical" value, $u_i$ is a zero-mean
disturbance term and $v_i$ is the inefficiency term, which is supposed to be nonnegative by its very nature.
A linear specification for $C_i^*$ is often chosen. For example, the Cobb-Douglas cost function arises
when $C_i^*$ is a linear function of the logarithms of the input prices and the output quantities.
The stochastic frontier model is a linear model of the form $y_i = x_i \beta + \epsilon_i$ in which the error
term $\epsilon_i$ is the sum of $u_i$ and $v_i$. A common postulate is that $u_i \sim N(0, \sigma_u^2)$ and
$v_i \sim |N(0, \sigma_v^2)|$. If independence between $u_i$ and $v_i$ is also assumed, then it is possible
to show that the density function of $\epsilon_i$ has the form:
$$f(\epsilon_i) = 2\, \Phi\left( \frac{\lambda \epsilon_i}{\sigma} \right) \left[ \frac{1}{\sigma} \phi\left( \frac{\epsilon_i}{\sigma} \right) \right] \tag{19.4}$$
where $\Phi(\cdot)$ and $\phi(\cdot)$ are, respectively, the distribution and density function of the standard
normal, $\sigma = \sqrt{\sigma_u^2 + \sigma_v^2}$ and $\lambda = \sigma_u / \sigma_v$.
As a consequence, the log-likelihood for one observation takes the form (apart from an irrelevant
constant)
$$\ell_i = \log \Phi\left( \frac{\lambda \epsilon_i}{\sigma} \right) - \left[ \log(\sigma) + \frac{\epsilon_i^2}{2 \sigma^2} \right]$$
Therefore, a Cobb-Douglas cost function with stochastic frontier is the model described by the
following equations:
$$\log C_i = \log C_i^* + \epsilon_i$$
$$\log C_i^* = c + \sum_{j=1}^{m} \beta_j \log y_{ij} + \sum_{j=1}^{n} \alpha_j \log p_{ij}$$
$$\epsilon_i = u_i + v_i$$
$$u_i \sim N(0, \sigma_u^2)$$
$$v_i \sim |N(0, \sigma_v^2)|$$
In most cases, one wants to ensure that the homogeneity of the cost function with respect to
the prices holds by construction. Since this requirement is equivalent to $\sum_{j=1}^{n} \alpha_j = 1$, the
above equation for $C_i$ can be rewritten as
$$\log C_i - \log p_{in} = c + \sum_{j=1}^{m} \beta_j \log y_{ij} + \sum_{j=1}^{n-1} \alpha_j (\log p_{ij} - \log p_{in}) + \epsilon_i \tag{19.5}$$
The above equation could be estimated by OLS, but it would suffer from two drawbacks: first,
the OLS estimator for the intercept $c$ is inconsistent because the disturbance term has a non-zero
expected value; second, the OLS estimators for the other parameters are consistent, but inefficient
in view of the non-normality of $\epsilon_i$. Both issues can be addressed by estimating (19.5) by maximum
likelihood. Nevertheless, OLS estimation is a quick and convenient way to provide starting values
for the MLE algorithm.
Example 19.1 shows how to implement the model described so far. The banks91 file contains part
of the data used in Lucchetti, Papi and Zazzaro (2001).
The script in example 19.1 is relatively easy to modify to show how one can use vectors (that is,
1-dimensional matrices) for storing the parameters to optimize: example 19.2 holds essentially the
same script in which the parameters of the cost function are stored together in a vector. Of course,
this also makes it possible to use variable lists and other refinements which make the code more
compact and readable.
19.4 GARCH models
GARCH models are handled by gretl via a native function. However, it is instructive to see how they
can be estimated through the mle command.
The following equations provide the simplest example of a GARCH(1,1) model:
$$y_t = \mu + \epsilon_t$$
$$\epsilon_t = u_t \cdot \sqrt{h_t}$$
$$u_t \sim N(0, 1)$$
$$h_t = \omega + \alpha \epsilon_{t-1}^2 + \beta h_{t-1}.$$
Since the variance of $y_t$ depends on past values, writing down the log-likelihood function is not
simply a matter of summing the log densities for individual observations. As is common in time
series models, $y_t$ cannot be considered independent of the other observations in our sample, and
consequently the density function for the whole sample (the joint density for all observations) is
not just the product of the marginal densities.
Example 19.1: Estimation of stochastic frontier cost function (with scalar parameters)
open banks91
# Cobb-Douglas cost function
ols cost const y p1 p2 p3
# Cobb-Douglas cost function with homogeneity restrictions
genr rcost = cost - p3
genr rp1 = p1 - p3
genr rp2 = p2 - p3
ols rcost const y rp1 rp2
# Cobb-Douglas cost function with homogeneity restrictions
# and inefficiency
scalar b0 = $coeff(const)
scalar b1 = $coeff(y)
scalar b2 = $coeff(rp1)
scalar b3 = $coeff(rp2)
scalar su = 0.1
scalar sv = 0.1
mle logl = ln(cnorm(e*lambda/ss)) - (ln(ss) + 0.5*(e/ss)^2)
scalar ss = sqrt(su^2 + sv^2)
scalar lambda = su/sv
series e = rcost - b0*const - b1*y - b2*rp1 - b3*rp2
params b0 b1 b2 b3 su sv
end mle
Example 19.2: Estimation of stochastic frontier cost function (with matrix parameters)
open banks91
# Cobb-Douglas cost function
ols cost const y p1 p2 p3
# Cobb-Douglas cost function with homogeneity restrictions
genr rcost = cost - p3
genr rp1 = p1 - p3
genr rp2 = p2 - p3
list X = const y rp1 rp2
ols rcost X
# Cobb-Douglas cost function with homogeneity restrictions
# and inefficiency
matrix b = $coeff
scalar su = 0.1
scalar sv = 0.1
mle logl = ln(cnorm(e*lambda/ss)) - (ln(ss) + 0.5*(e/ss)^2)
scalar ss = sqrt(su^2 + sv^2)
scalar lambda = su/sv
series e = rcost - lincomb(X, b)
params b su sv
end mle
Maximum likelihood estimation, in these cases, is achieved by considering conditional densities, so
what we maximize is a conditional likelihood function. If we define the information set at time $t$ as
$$F_t = \{ y_t, y_{t-1}, \ldots \},$$
then the density of $y_t$ conditional on $F_{t-1}$ is normal:
$$y_t | F_{t-1} \sim N[\mu, h_t].$$
By means of the properties of conditional distributions, the joint density can be factorized as
follows
$$f(y_t, y_{t-1}, \ldots) = \left[ \prod_{t=1}^{T} f(y_t | F_{t-1}) \right] \cdot f(y_0)$$
If we treat $y_0$ as fixed, then the term $f(y_0)$ does not depend on the unknown parameters, and
therefore the conditional log-likelihood can be written as the sum of the individual contributions
as
$$\ell(\mu, \omega, \alpha, \beta) = \sum_{t=1}^{T} \ell_t \tag{19.6}$$
where, apart from an additive constant,
$$\ell_t = \log\left[ \frac{1}{\sqrt{h_t}} \phi\left( \frac{y_t - \mu}{\sqrt{h_t}} \right) \right] = -\frac{1}{2} \left[ \log(h_t) + \frac{(y_t - \mu)^2}{h_t} \right]$$
The following script shows a simple application of this technique, which uses the data file djclose;
it is one of the example datasets supplied with gretl and contains daily data from the Dow Jones
stock index.
open djclose
series y = 100*ldiff(djclose)
scalar mu = 0.0
scalar omega = 1
scalar alpha = 0.4
scalar beta = 0.0
mle ll = -0.5*(log(h) + (e^2)/h)
series e = y - mu
series h = var(y)
series h = omega + alpha*(e(-1))^2 + beta*h(-1)
params mu omega alpha beta
end mle
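As a cross-check on a hand-written likelihood of this kind, the same specification can be passed to gretl's native GARCH estimator. This is a minimal sketch: it assumes the djclose data file is available, and the two sets of estimates need not coincide exactly, since the native routine may initialize the variance recursion differently and includes the constant term of the log-likelihood.

```gretl
open djclose
series y = 100*ldiff(djclose)

# native GARCH(1,1) with a constant in the mean equation
garch 1 1 ; y const
```

If the coefficients from the mle block differ substantially from these, the starting values or the initialization of h in the script are the first things to examine.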
19.5 Analytical derivatives
Computation of the score vector is essential for the working of the BFGS method. In all the previous
examples, no explicit formula for the computation of the score was given, so the algorithm was fed
numerically evaluated gradients. Numerical computation of the score for the $i$-th parameter is
performed via a finite approximation of the derivative, namely
$$\frac{\partial \ell(\theta_1, \ldots, \theta_n)}{\partial \theta_i} \approx \frac{\ell(\theta_1, \ldots, \theta_i + h, \ldots, \theta_n) - \ell(\theta_1, \ldots, \theta_i - h, \ldots, \theta_n)}{2h}$$
where $h$ is a small number.
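The finite-difference formula is easy to reproduce by hand. The sketch below applies it to a made-up one-parameter function and compares the result with the exact derivative; the function toy_loglik, the evaluation point and the step size h are all assumptions chosen for illustration, not what gretl uses internally.

```gretl
# Hypothetical one-parameter log-likelihood, used only for illustration
function scalar toy_loglik (scalar theta)
    return -(theta - 1)^2
end function

scalar theta = 0.3
scalar h = 1.0e-6

# central difference, as in the formula above
scalar g_num = (toy_loglik(theta + h) - toy_loglik(theta - h)) / (2*h)

# exact derivative for comparison
scalar g_exact = -2*(theta - 1)
printf "numerical: %.8f, exact: %.8f\n", g_num, g_exact
```

For a quadratic function the central difference is exact apart from rounding error; for less regular functions the choice of h trades truncation error against rounding error.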
In many situations, this is rather efficient and accurate. However, one might want to avoid the
approximation and specify an exact function for the derivatives. As an example, consider the
following script:
nulldata 1000
genr x1 = normal()
genr x2 = normal()
genr x3 = normal()
genr ystar = x1 + x2 + x3 + normal()
genr y = (ystar > 0)
scalar b0 = 0
scalar b1 = 0
scalar b2 = 0
scalar b3 = 0
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = b0 + b1*x1 + b2*x2 + b3*x3
series P = cnorm(ndx)
params b0 b1 b2 b3
end mle --verbose
Here, 1000 data points are artificially generated for an ordinary probit model:² $y_t$ is a binary
variable, which takes the value 1 if $y_t^* = \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \epsilon_t > 0$ and 0
otherwise. Therefore, $y_t = 1$ with probability $\Phi(\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}) = \Phi_t$.
The probability function for one observation can be written as
$$P(y_t) = \Phi_t^{y_t} (1 - \Phi_t)^{1 - y_t}$$
Since the observations are independent and identically distributed, the log-likelihood is simply the
sum of the individual contributions. Hence
$$\ell = \sum_{t=1}^{T} y_t \log(\Phi_t) + (1 - y_t) \log(1 - \Phi_t)$$
The --verbose switch at the end of the end mle statement produces a detailed account of the
iterations done by the BFGS algorithm.
In this case, numerical differentiation works rather well; nevertheless, computation of the analytical
score is straightforward, since for each observation the derivative $\partial \ell_t / \partial \beta_i$ can be written as
$$\frac{\partial \ell_t}{\partial \beta_i} = \frac{\partial \ell_t}{\partial \Phi_t} \cdot \frac{\partial \Phi_t}{\partial \beta_i}$$
via the chain rule, and it is easy to see that
$$\frac{\partial \ell_t}{\partial \Phi_t} = \frac{y_t}{\Phi_t} - \frac{1 - y_t}{1 - \Phi_t}$$
$$\frac{\partial \Phi_t}{\partial \beta_i} = \phi(\beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t}) \cdot x_{it}$$
The mle block in the above script can therefore be modified as follows:
mle logl = y*ln(P) + (1-y)*ln(1-P)
series ndx = b0 + b1*x1 + b2*x2 + b3*x3
series P = cnorm(ndx)
series tmp = dnorm(ndx)*(y/P - (1-y)/(1-P))
deriv b0 = tmp
deriv b1 = tmp*x1
deriv b2 = tmp*x2
deriv b3 = tmp*x3
end mle --verbose
Note that the params statement has been replaced by a series of deriv statements; these have the
double function of identifying the parameters over which to optimize and providing an analytical
expression for their respective score elements.
² Again, gretl does provide a native probit command (see section 29.1), but a probit model makes for a nice example
here.
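One way to gain confidence in hand-coded derivatives is to compare the mle output with that of the native estimator on the same data. The sketch below regenerates data along the lines of the script above and runs the built-in probit command; the seed is an arbitrary assumption, added only so that the run is reproducible.

```gretl
nulldata 1000
set seed 4321
series x1 = normal()
series x2 = normal()
series x3 = normal()
series ystar = x1 + x2 + x3 + normal()
series y = (ystar > 0)

# native probit estimator, for comparison with the mle block
probit y const x1 x2 x3
```

Running the mle block with deriv statements on these same series should give coefficients that agree with the probit output to several digits; a marked discrepancy would point to an error in the derivatives.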
19.6 Debugging ML scripts
We have discussed above the main sorts of statements that are permitted within an mle block,
namely
• auxiliary commands to generate helper variables;
• deriv statements to specify the gradient with respect to each of the parameters; and
• a params statement to identify the parameters in case analytical derivatives are not given.
For the purpose of debugging ML estimators one additional sort of statement is allowed: you can
print the value of a relevant variable at each step of the iteration. This facility is more restricted
than the regular print command. The command word print should be followed by the name of
just one variable (a scalar, series or matrix).
In the last example above a key variable named tmp was generated, forming the basis for the
analytical derivatives. To track the progress of this variable one could add a print statement within
the ML block, as in
series tmp = dnorm(ndx)*(y/P - (1-y)/(1-P))
print tmp
19.7 Using functions
The mle command allows you to estimate models that gretl does not provide natively: in some
cases, it may be a good idea to wrap up the mle block in a user-defined function (see Chapter 10),
so as to extend gretl's capabilities in a modular and flexible way.
As an example, we will take a simple case of a model that gretl does not yet provide natively:
the zero-inflated Poisson model, or ZIP for short.³ In this model, we assume that we observe a
mixed population: for some individuals, the variable $y_t$ is (conditionally on a vector of exogenous
covariates $x_t$) distributed as a Poisson random variate; for some others, $y_t$ is identically 0. The
trouble is, we don't know which category a given individual belongs to.
For instance, suppose we have a sample of women, and the variable $y_t$ represents the number of
children that woman $t$ has. There may be a certain proportion, $\alpha$, of women for whom $y_t = 0$ with
certainty (maybe out of a personal choice, or due to physical impossibility). But there may be other
women for whom $y_t = 0$ just as a matter of chance: they haven't happened to have any children
at the time of observation.
In formulae:
$$P(y_t = k | x_t) = \alpha d_t + (1 - \alpha) \left[ e^{-\mu_t} \frac{\mu_t^{y_t}}{y_t!} \right]$$
$$\mu_t = \exp(x_t \beta)$$
$$d_t = \begin{cases} 1 & \text{for } y_t = 0 \\ 0 & \text{for } y_t > 0 \end{cases}$$
³ The actual ZIP model is in fact a bit more general than the one presented here. The specialized version discussed in
this section was chosen for the sake of simplicity. For further details, see Greene (2003).
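A small numerical example may help to see how the two components of the ZIP probability combine. The values of alpha and mu below are illustrative assumptions, not estimates: a zero can come from either group, while a positive count can only come from the Poisson component.

```gretl
# Illustrative values (assumptions, not estimates)
scalar alpha = 0.2
scalar mu = exp(0.2)

# P(y = 0): "certain zeros" plus zeros arising by chance from the Poisson part
scalar p0 = alpha + (1 - alpha) * exp(-mu)

# P(y = 2): Poisson component only, since d = 0 when y > 0
scalar p2 = (1 - alpha) * exp(-mu) * mu^2 / gamma(3)

printf "P(y=0) = %g, P(y=2) = %g\n", p0, p2
```

Note that the probability of a zero always exceeds alpha, which is why the share of zeros in the sample is only an upper bound on the proportion of "certain zeros".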
Writing an mle block for this model is not difficult:
mle ll = logprob
series xb = exp(b0 + b1 * x)
series d = (y=0)
series poiprob = exp(-xb) * xb^y / gamma(y+1)
series logprob = (alpha>0) && (alpha<1) ? \
log(alpha*d + (1-alpha)*poiprob) : NA
params alpha b0 b1
end mle -v
However, the code above has to be modified each time we change our specification by, say, adding
an explanatory variable. Using functions, we can simplify this task considerably and eventually be
able to write something easy like
list X = const x
zip(y, X)
Example 19.3: Zero-inflated Poisson Model (user-level function)
/*
user-level function: estimate the model and print out
the results
*/
function void zip(series y, list X)
matrix coef_stde = zip_estimate(y, X)
printf "\nZero-inflated Poisson model:\n"
string parnames = "alpha,"
string parnames += varname(X)
modprint coef_stde parnames
end function
Let's see how this can be done. First we need to define a function called zip() that will take two
arguments: a dependent variable y and a list of explanatory variables X. An example of such a function
can be seen in script 19.3. By inspecting the function code, you can see that the actual estimation
does not happen here: rather, the zip() function merely uses the built-in modprint command to
print out the results coming from another user-written function, namely zip_estimate().
The function zip_estimate() is not meant to be executed directly; it just contains the
number-crunching part of the job, whose results are then picked up by the user-level function zip(). In turn,
zip_estimate() calls other user-written functions to perform other tasks. The whole set of
"internal" functions is shown in the panel 19.4.
All the functions shown in 19.3 and 19.4 can be stored in a separate inp le and executed once, at
the beginning of our job, by means of the include command. Assuming the name of this script le
is zip_est.inp, the following is an example script which (a) includes the script le, (b) generates a
simulated dataset, and (c) performs the estimation of a ZIP model on the articial data.
set echo off
set messages off
# include the user-written functions
include zip_est.inp
Chapter 19. Maximum likelihood estimation 155
Example 19.4: Zero-inated Poisson Model internal functions
/* compute log probabilities for the plain Poisson model */
function series ln_poi_prob(series y, list X, matrix beta)
series xb = lincomb(X, beta)
return -exp(xb) + y*xb - lngamma(y+1)
end function
/* compute log probabilities for the zero-inflated Poisson model */
function series ln_zip_prob(series y, list X, matrix beta, scalar p0)
# check if the probability is in [0,1]; otherwise, return NA
if (p0>1) || (p0<0)
series ret = NA
else
series ret = ln_poi_prob(y, X, beta) + ln(1-p0)
series ret = (y=0) ? ln(p0 + exp(ret)) : ret
endif
return ret
end function
/* do the actual estimation (silently) */
function matrix zip_estimate(series y, list X)
# initialize alpha to a "sensible" value: half the frequency
# of zeros in the sample
scalar alpha = mean(y=0)/2
# initialize the coeffs (we assume the first explanatory
# variable is the constant here)
matrix coef = zeros(nelem(X), 1)
coef[1] = mean(y) / (1-alpha)
# do the actual ML estimation
mle ll = ln_zip_prob(y, X, coef, alpha)
params alpha coef
end mle --hessian --quiet
return $coeff ~ $stderr
end function
# generate the artificial data
nulldata 1000
set seed 732237
scalar truep = 0.2
scalar b0 = 0.2
scalar b1 = 0.5
series x = normal()
series y = (uniform()<truep) ? 0 : genpois(exp(b0 + b1*x))
list X = const x
# estimate the zero-inflated Poisson model
zip(y, X)
The results are as follows:
Zero-inflated Poisson model:

             coefficient   std. error   z-stat    p-value
  --------------------------------------------------------
  alpha       0.203069     0.0238035     8.531    1.45e-17  ***
  const       0.257014     0.0417129     6.161    7.21e-10  ***
  x           0.466657     0.0321235    14.53     8.17e-48  ***
A further step may then be creating a function package for accessing your new zip() function via gretl's graphical interface. For details on how to do this, see section 10.5.
Chapter 20
GMM estimation
20.1 Introduction and terminology
The Generalized Method of Moments (GMM) is a very powerful and general estimation method,
which encompasses practically all the parametric estimation techniques used in econometrics. It
was introduced in Hansen (1982) and Hansen and Singleton (1982); an excellent and thorough
treatment is given in chapter 17 of Davidson and MacKinnon (1993).
The basic principle on which GMM is built is rather straightforward. Suppose we wish to estimate a scalar parameter $\theta$ based on a sample $x_1, x_2, \ldots, x_T$. Let $\theta_0$ indicate the "true" value of $\theta$. Theoretical considerations (either of statistical or economic nature) may suggest that a relationship like the following holds:
$$E\left[x_t - g(\theta)\right] = 0 \iff \theta = \theta_0, \tag{20.1}$$
with $g(\cdot)$ a continuous and invertible function. That is to say, there exists a function of the data and the parameter, with the property that it has expectation zero if and only if it is evaluated at the true parameter value. For example, economic models with rational expectations lead to expressions like (20.1) quite naturally.
If the sampling model for the $x_t$s is such that some version of the Law of Large Numbers holds, then
$$\bar{X} = \frac{1}{T}\sum_{t=1}^{T} x_t \xrightarrow{\;p\;} g(\theta_0);$$
hence, since $g(\cdot)$ is invertible, the statistic
$$\hat\theta = g^{-1}(\bar{X}) \xrightarrow{\;p\;} \theta_0,$$
so $\hat\theta$ is a consistent estimator of $\theta$. A different way to obtain the same outcome is to choose, as an estimator of $\theta$, the value that minimizes the objective function
$$F(\theta) = \left[\frac{1}{T}\sum_{t=1}^{T}\left(x_t - g(\theta)\right)\right]^2 = \left[\bar{X} - g(\theta)\right]^2; \tag{20.2}$$
the minimum is trivially reached at $\hat\theta = g^{-1}(\bar{X})$
f, (20.5)
where
$$\hat\theta = \underset{\theta}{\mathrm{Argmin}}\; F(\theta, W) \tag{20.6}$$
is a consistent estimator of $\theta$ whatever the choice of $W$. However, to achieve maximum asymptotic efficiency $W$ must be proportional to the inverse of the long-run covariance matrix of the orthogonality conditions; if $W$ is not known, a consistent estimator will suffice.
These considerations lead to the following empirical strategy:
1. Choose a positive definite $W$ and compute the one-step GMM estimator $\hat\theta_1$. Customary choices for $W$ are $I_{m \cdot p}$ or $I_m \otimes (Z'Z)^{-1}$.
2. Use $\hat\theta_1$ to estimate $V(f_{i,j,t}(\theta))$ and use its inverse as the weights matrix. The resulting estimator $\hat\theta_2$ is called the two-step estimator.
3. Re-estimate $V(f_{i,j,t}(\theta))$ by means of $\hat\theta_2$ and obtain $\hat\theta_3$; iterate until convergence. Asymptotically, these extra steps are unnecessary, since the two-step estimator is consistent and efficient; however, the iterated estimator often has better small-sample properties and should be independent of the choice of $W$ made at step 1.
In the special case when the number of parameters $n$ is equal to the total number of orthogonality conditions $m \cdot p$, the GMM estimator $\hat\theta$ is the same for any choice of the weights matrix $W$, so the first step is sufficient; in this case, the objective function is 0 at the minimum.
If, on the contrary, $n < m \cdot p$, the second step (or successive iterations) is needed to achieve efficiency, and the estimator so obtained can be very different, in finite samples, from the one-step estimator. Moreover, the value of the objective function at the minimum, suitably scaled by the number of observations, yields Hansen's $J$ statistic; this statistic can be interpreted as a test statistic that has a $\chi^2$ distribution with $m \cdot p - n$ degrees of freedom under the null hypothesis of correct specification. See Davidson and MacKinnon (1993, section 17.6) for details.
In the following sections we will show how these ideas are implemented in gretl through some
examples.
20.2 OLS as GMM
It is instructive to start with a somewhat contrived example: consider the linear model $y_t = x_t\beta + u_t$. Although most of us are used to reading it as the sum of a hazily defined "systematic part" plus an equally hazy "disturbance", a more rigorous interpretation of this familiar expression comes from the hypothesis that the conditional mean $E(y_t \mid x_t)$ is linear and the definition of $u_t$ as $y_t - E(y_t \mid x_t)$.
From the definition of $u_t$, it follows that $E(u_t \mid x_t) = 0$. The following orthogonality condition is therefore available:
$$E\left[f(\beta)\right] = 0, \tag{20.7}$$
where $f(\beta) = (y_t - x_t\beta)x_t$. The definitions given in the previous section therefore specialize here to:
• $\theta$ is $\beta$;
• the instrument is $x_t$;
• $f_{i,j,t}(\theta)$ is $(y_t - x_t\beta)x_t = u_t x_t$; the orthogonality condition is interpretable as the requirement that the regressors should be uncorrelated with the disturbances;
• $W$ can be any symmetric positive definite matrix, since the number of parameters equals the number of orthogonality conditions. Let's say we choose $I$.
• The function $F(\theta, W)$ is in this case
$$F(\theta, W) = \left[\frac{1}{T}\sum_{t=1}^{T}\left(\hat{u}_t x_t\right)\right]^2$$
and it is easy to see why OLS and GMM coincide here: the GMM objective function has the same minimizer as the objective function of OLS, the residual sum of squares. Note, however, that the two functions are not equal to one another: at the minimum, $F(\theta, W) = 0$ while the minimized sum of squared residuals is zero only in the special case of a perfect linear fit.
The code snippet contained in Example 20.1 uses gretl's gmm command to make the above operational.
Example 20.1: OLS via GMM
/* initialize stuff */
series e = 0
scalar beta = 0
matrix V = I(1)
/* proceed with estimation */
gmm
series e = y - x*beta
orthog e ; x
weights V
params beta
end gmm
We feed gretl the necessary ingredients for GMM estimation in a command block, starting with gmm and ending with end gmm. After the end gmm statement two mutually exclusive options can be specified: --two-step or --iterate, which request the two-step and the iterated estimator respectively; with neither option the one-step estimator is computed.
Three elements are compulsory within a gmm block:
1. one or more orthog statements
2. one weights statement
3. one params statement
The three elements should be given in the stated order.
The orthog statements are used to specify the orthogonality conditions. They must follow the
syntax
orthog x ; Z
where x may be a series, matrix or list of series and Z may also be a series, matrix or list. In
example 20.1, the series e holds the residuals and the series x holds the regressor. If x had been
a list (a matrix), the orthog statement would have generated one orthogonality condition for each
element (column) of x. Note the structure of the orthogonality condition: it is assumed that the
term to the left of the semicolon represents a quantity that depends on the estimated parameters
(and so must be updated in the process of iterative estimation), while the term on the right is a
constant function of the data.
The weights statement is used to specify the initial weighting matrix and its syntax is straightforward. Note, however, that when more than one step is required that matrix will contain the final weight matrix, which most likely will be different from its initial value.
The params statement specifies the parameters with respect to which the GMM criterion should be minimized; it follows the same logic and rules as in the mle and nls commands.
The minimum is found through numerical minimization via BFGS (see chapters 28 and 19). The progress of the optimization procedure can be observed by appending the --verbose switch to the end gmm line. (In this example GMM estimation is clearly a rather silly thing to do, since a closed form solution is easily given by OLS.)
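As a self-contained check of the claim that OLS and GMM coincide here, the following sketch simulates a small dataset and runs both estimators. The sample size, the seed and the true coefficient (1.5) are arbitrary choices made for illustration; they are not part of Example 20.1.

```hansl
# simulate y_t = x_t*beta + u_t with beta = 1.5 (illustrative values)
nulldata 200
set seed 20200
series x = normal()
series y = 1.5*x + normal()

# OLS without a constant, matching the model above
ols y x --quiet
scalar b_ols = $coeff[1]

# the same estimate via GMM, with the identity as weights matrix
series e = 0
scalar beta = 0
matrix V = I(1)
gmm e = y - x*beta
    orthog e ; x
    weights V
    params beta
end gmm
scalar b_gmm = beta

printf "OLS: %.6f   GMM: %.6f\n", b_ols, b_gmm
```

The two printed values should agree up to the tolerance of the numerical optimizer, since the model is exactly identified.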
20.3 TSLS as GMM
Moving closer to the proper domain of GMM, we now consider two-stage least squares (TSLS) as a case of GMM.
TSLS is employed in the case where one wishes to estimate a linear model of the form $y_t = X_t\beta + u_t$, but where one or more of the variables in the matrix $X$ are potentially endogenous, that is, correlated with the error term, $u$. We proceed by identifying a set of instruments, $Z_t$, which are explanatory for the endogenous variables in $X$ but which are plausibly uncorrelated with $u$. The classic two-stage procedure is (1) regress the endogenous elements of $X$ on $Z$; then (2) estimate the equation of interest, with the endogenous elements of $X$ replaced by their fitted values from (1).
An alternative perspective is given by GMM. We define the residual $\hat{u}_t$ as $y_t - X_t\hat\beta$, as usual. But instead of relying on $E(u \mid X) = 0$ as in OLS, we base estimation on the condition $E(u \mid Z) = 0$. In this case it is natural to base the initial weighting matrix on the covariance matrix of the instruments. Example 20.2 presents a model from Stock and Watson's Introduction to Econometrics. The demand for cigarettes is modeled as a linear function of the logs of price and income; income is treated as exogenous while price is taken to be endogenous and two measures of tax are used as instruments. Since we have two instruments and one endogenous variable the model is over-identified and therefore the weights matrix will influence the solution. Partial output from this script is shown in 20.3. The estimated standard errors from GMM are robust by default; if we supply the --robust option to the tsls command we get identical results. [1]
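For readers without the Stock and Watson data at hand, the same setup can be sketched on simulated data. Everything in the data-generating process below (variable names, coefficients, sample size, seed) is an assumption made for illustration; only the structure of the tsls call and of the gmm block follows the text. The --two-step option is used because the model is over-identified.

```hansl
# artificial data: x is endogenous because it shares the shock v with u
nulldata 500
set seed 1234
series z1 = normal()
series z2 = normal()
series v = normal()
series u = 0.5*v + normal()
series x = z1 + 0.5*z2 + v
series y = 1 + 2*x + u

# TSLS with two instruments for one endogenous regressor
list zlist = const z1 z2
tsls y const x ; zlist --robust

# the same moment conditions handled by gmm
matrix Z = { zlist }
matrix W = inv(Z'Z)
series e = 0
scalar b0 = 0
scalar b1 = 0
gmm e = y - b0 - b1*x
    orthog e ; Z
    weights W
    params b0 b1
end gmm --two-step
```

With three orthogonality conditions and two parameters, the one-step and two-step estimates will generally differ slightly; both should be close to the values used in the simulation.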
20.4 Covariance matrix options
The covariance matrix of the estimated parameters depends on the choice of $W$ through
$$\hat\Sigma = (J'WJ)^{-1} J'W\Omega WJ (J'WJ)^{-1} \tag{20.8}$$
where $J$ is a Jacobian term
$$J_{ij} = \frac{\partial \bar{f}_i}{\partial \theta_j}$$
and $\Omega$ is the long-run covariance matrix of the orthogonality conditions.
Gretl computes $J$ by numeric differentiation (there is no provision for specifying a user-supplied analytical expression for $J$ at the moment). As for $\Omega$, a consistent estimate is needed. The simplest choice is the sample covariance matrix of the $f_t$s:
$$\hat\Omega_0(\theta) = \frac{1}{T}\sum_{t=1}^{T} f_t(\theta) f_t(\theta)' \tag{20.9}$$
This estimator is robust with respect to heteroskedasticity, but not with respect to autocorrelation. A heteroskedasticity- and autocorrelation-consistent (HAC) variant can be obtained using the
[1] The data file used in this example is available in the Stock and Watson package for gretl. See http://gretl.sourceforge.net/gretl_data.html.
Example 20.2: TSLS via GMM
open cig_ch10.gdt
# real avg price including sales tax
genr ravgprs = avgprs / cpi
# real avg cig-specific tax
genr rtax = tax / cpi
# real average total tax
genr rtaxs = taxs / cpi
# real average sales tax
genr rtaxso = rtaxs - rtax
# logs of consumption, price, income
genr lpackpc = log(packpc)
genr lravgprs = log(ravgprs)
genr perinc = income / (pop*cpi)
genr lperinc = log(perinc)
# restrict sample to 1995 observations
smpl --restrict year=1995
# Equation (10.16) by tsls
list xlist = const lravgprs lperinc
list zlist = const rtaxso rtax lperinc
tsls lpackpc xlist ; zlist --robust
# setup for gmm
matrix Z = { zlist }
matrix W = inv(Z'Z)
series e = 0
scalar b0 = 1
scalar b1 = 1
scalar b2 = 1
gmm e = lpackpc - b0 - b1*lravgprs - b2*lperinc
orthog e ; Z
weights W
params b0 b1 b2
end gmm
Example 20.3: TSLS via GMM: partial output

Model 1: TSLS estimates using the 48 observations 1-48
Dependent variable: lpackpc
Instruments: rtaxso rtax
Heteroskedasticity-robust standard errors, variant HC0

  VARIABLE    COEFFICIENT    STDERROR    T STAT    P-VALUE
  const         9.89496      0.928758    10.654   <0.00001 ***
  lravgprs     -1.27742      0.241684    -5.286   <0.00001 ***
  lperinc       0.280405     0.245828     1.141    0.25401

Model 2: 1-step GMM estimates using the 48 observations 1-48
e = lpackpc - b0 - b1*lravgprs - b2*lperinc

  PARAMETER    ESTIMATE      STDERROR    T STAT    P-VALUE
  b0            9.89496      0.928758    10.654   <0.00001 ***
  b1           -1.27742      0.241684    -5.286   <0.00001 ***
  b2            0.280405     0.245828     1.141    0.25401

GMM criterion = 0.0110046
Bartlett kernel or similar. A univariate version of this is used in the context of the lrvar() function; see equation (5.1). The multivariate version is set out in equation (20.10).
$$\hat\Omega_k(\theta) = \frac{1}{T}\sum_{t=k}^{T-k}\left[\sum_{i=-k}^{k} w_i\, f_t(\theta) f_{t-i}(\theta)'\right], \tag{20.10}$$
Gretl computes the HAC covariance matrix by default when a GMM model is estimated on time
series data. You can control the kernel and the bandwidth (that is, the value of k in 20.10) using
the set command. See chapter 15 for further discussion of HAC estimation. You can also ask gretl
not to use the HAC version by saying
set force_hc on
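The kernel and bandwidth settings mentioned above might be arranged as in the following fragment, placed before the gmm block. This is a sketch of typical settings rather than a recommendation: the particular kernel and lag choices shown are arbitrary, and the full list of accepted values should be checked against the documentation of the set command for the gretl version in use.

```hansl
# choose the HAC kernel: bartlett, parzen or qs
set hac_kernel bartlett
# choose the bandwidth k in (20.10): a rule such as nw1 or nw2,
# or an explicit integer
set hac_lag nw1
# make sure the HAC estimator is not overridden
set force_hc off
```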
20.5 A real example: the Consumption Based Asset Pricing Model
To illustrate gretl's implementation of GMM, we will replicate the example given in chapter 3 of Hall (2005). The model to estimate is a classic application of GMM, and provides an example of a case when orthogonality conditions do not stem from statistical considerations, but rather from economic theory.
A rational individual who must allocate his income between consumption and investment in a financial asset must in fact choose the consumption path of his whole lifetime, since investment translates into future consumption. It can be shown that an optimal consumption path should satisfy the following condition:
$$p\,U'(c_t) = \delta^k E\left[r_{t+k}\, U'(c_{t+k}) \,\Big|\, \mathcal{F}_t\right], \tag{20.11}$$
where $p$ is the asset price, $U(\cdot)$ is the individual's utility function, $\delta$ is the individual's subjective discount rate and $r_{t+k}$ is the asset's rate of return between time $t$ and time $t+k$. $\mathcal{F}_t$ is the information set at time $t$; equation (20.11) says that the utility "lost" at time $t$ by purchasing the asset instead of consumption goods must be matched by a corresponding increase in the (discounted) future utility of the consumption financed by the asset's return. Since the future is uncertain, the individual considers his expectation, conditional on what is known at the time when the choice is made.
We have said nothing about the nature of the asset, so equation (20.11) should hold whatever asset we consider; hence, it is possible to build a system of equations like (20.11) for each asset whose price we observe.
If we are willing to believe that
• the economy as a whole can be represented as a single gigantic and immortal representative individual, and
• the function U(x) = x [...]
$$E\left[\delta\,\frac{r_{j,t+1}}{p_{j,t}}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha-1} \,\Big|\, \mathcal{F}_t\right] = 1, \tag{20.12}$$
where $C_t$ is aggregate consumption and $\alpha$ and $\delta$ are the risk aversion and discount rate of the representative individual. In this case, it is easy to see that the "deep" parameters $\alpha$ and $\delta$ can be estimated via GMM by using
$$e_t = \delta\,\frac{r_{j,t+1}}{p_{j,t}}\left(\frac{C_{t+1}}{C_t}\right)^{\alpha-1} - 1$$
as the moment condition, while any variable known at time $t$ may serve as an instrument.
In the example code given in 20.4, we replicate selected portions of table 3.7 in Hall (2005). The variable consrat is defined as the ratio of consecutive monthly real per capita consumption (services and nondurables) for the US, and ewr is the return-price ratio of a fictitious asset constructed by averaging all the stocks in the NYSE. The instrument set contains the constant and two lags of each variable.
The command set force_hc on on the second line of the script has the sole purpose of replicating the given example: as mentioned above, it forces gretl to compute the long-run variance of the orthogonality conditions according to equation (20.9) rather than (20.10).
We run gmm four times: one-step estimation for each of two initial weights matrices, then iterative estimation starting from each set of initial weights. Since the number of orthogonality conditions (5) is greater than the number of estimated parameters (2), the choice of initial weights should make a difference, and indeed we see fairly substantial differences between the one-step estimates (Models 1 and 2). On the other hand, iteration reduces these differences almost to the vanishing point (Models 3 and 4).
Part of the output is given in 20.5. It should be noted that the J test leads to a rejection of the hypothesis of correct specification. This is perhaps not surprising given the heroic assumptions required to move from the microeconomic principle in equation (20.11) to the aggregate system that is actually estimated.
20.6 Caveats
A few words of warning are in order: despite its ingenuity, GMM is possibly the most fragile estimation method in econometrics. The number of non-obvious choices one has to make when using GMM is high, and in finite samples each of these can have dramatic consequences on the eventual output. Some of the factors that may affect the results are:
1. Orthogonality conditions can be written in more than one way: for example, if $E(x_t - \mu) = 0$, then $E(x_t/\mu - 1) = 0$ holds too. It is possible that a different specification of the moment conditions leads to different results.
Example 20.4: Estimation of the Consumption Based Asset Pricing Model
open hall.gdt
set force_hc on
scalar alpha = 0.5
scalar delta = 0.5
series e = 0
list inst = const consrat(-1) consrat(-2) ewr(-1) ewr(-2)
matrix V0 = 100000*I(nelem(inst))
matrix Z = { inst }
matrix V1 = $nobs*inv(Z'Z)
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V0
params alpha delta
end gmm
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V1
params alpha delta
end gmm
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V0
params alpha delta
end gmm --iterate
gmm e = delta*ewr*consrat^(alpha-1) - 1
orthog e ; inst
weights V1
params alpha delta
end gmm --iterate
Example 20.5: Estimation of the Consumption Based Asset Pricing Model: output

Model 1: 1-step GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1

  PARAMETER    ESTIMATE      STDERROR      T STAT    P-VALUE
  alpha        -3.14475      6.84439       -0.459    0.64590
  d             0.999215     0.0121044     82.549   <0.00001 ***

GMM criterion = 2778.08

Model 2: 1-step GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1

  PARAMETER    ESTIMATE      STDERROR      T STAT    P-VALUE
  alpha         0.398194     2.26359        0.176    0.86036
  d             0.993180     0.00439367   226.048   <0.00001 ***

GMM criterion = 14.247

Model 3: Iterated GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1

  PARAMETER    ESTIMATE      STDERROR      T STAT    P-VALUE
  alpha        -0.344325     2.21458       -0.155    0.87644
  d             0.991566     0.00423620   234.070   <0.00001 ***

GMM criterion = 5491.78
J test: Chi-square(3) = 11.8103 (p-value 0.0081)

Model 4: Iterated GMM estimates using the 465 observations 1959:04-1997:12
e = d*ewr*consrat^(alpha-1) - 1

  PARAMETER    ESTIMATE      STDERROR      T STAT    P-VALUE
  alpha        -0.344315     2.21359       -0.156    0.87639
  d             0.991566     0.00423469   234.153   <0.00001 ***

GMM criterion = 5491.78
J test: Chi-square(3) = 11.8103 (p-value 0.0081)
2. As with all other numerical optimization algorithms, weird things may happen when the objective function is nearly flat in some directions or has multiple minima. BFGS is usually quite good, but there is no guarantee that it always delivers a sensible solution, if one at all.
3. The 1-step and, to a lesser extent, the 2-step estimators may be sensitive to apparently trivial details, like the re-scaling of the instruments. Different choices for the initial weights matrix can also have noticeable consequences.
4. With time-series data, there is no hard rule on the appropriate number of lags to use when computing the long-run covariance matrix (see section 20.4). Our advice is to go by trial and error, since results may be greatly influenced by a poor choice. Future versions of gretl will include more options on covariance matrix estimation.
One of the consequences of this state of things is that replicating various well-known published studies may be extremely difficult. Any non-trivial result is virtually impossible to reproduce unless all details of the estimation procedure are carefully recorded.
Chapter 21
Model selection criteria
21.1 Introduction
In some contexts the econometrician chooses between alternative models based on a formal hypothesis test. For example, one might choose a more general model over a more restricted one if the restriction in question can be formulated as a testable null hypothesis, and the null is rejected on an appropriate test.
In other contexts one sometimes seeks a criterion for model selection that somehow measures the balance between goodness of fit or likelihood, on the one hand, and parsimony on the other. The balancing is necessary because the addition of extra variables to a model cannot reduce the degree of fit or likelihood, and is very likely to increase it somewhat even if the additional variables are not truly relevant to the data-generating process.
The best known such criterion, for linear models estimated via least squares, is the adjusted $R^2$,
$$\bar{R}^2 = 1 - \frac{\mathrm{SSR}/(n-k)}{\mathrm{TSS}/(n-1)}$$
where $n$ is the number of observations in the sample, $k$ denotes the number of parameters estimated, and SSR and TSS denote the sum of squared residuals and the total sum of squares for the dependent variable, respectively. Compared to the ordinary coefficient of determination or unadjusted $R^2$,
$$R^2 = 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}}$$
the "adjusted" calculation penalizes the inclusion of additional parameters, other things equal.
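The two formulas can be evaluated by hand after any least-squares regression. The sketch below uses simulated data (the variables, coefficients and seed are illustrative assumptions) and includes an irrelevant regressor, x2, so that the penalty in the adjusted measure has something to act on.

```hansl
nulldata 100
set seed 777
series x1 = normal()
series x2 = normal()
# x2 plays no role in the data-generating process
series y = 1 + 0.5*x1 + normal()

ols y const x1 x2 --quiet
scalar n = $T
scalar k = $ncoeff
scalar SSR = $ess
scalar TSS = sum((y - mean(y))^2)

scalar R2 = 1 - SSR/TSS
scalar R2adj = 1 - (SSR/(n-k)) / (TSS/(n-1))
printf "R-squared = %.6f (gretl reports %.6f)\n", R2, $rsq
printf "adjusted R-squared = %.6f\n", R2adj
```

Since $k > 1$ here, the adjusted figure is necessarily smaller than the unadjusted one.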
21.2 Information criteria
A more general criterion in a similar spirit is Akaike's (1974) "Information Criterion" (AIC). The original formulation of this measure is
$$\mathrm{AIC} = -2\ell(\hat\theta) + 2k \tag{21.1}$$
where $\ell(\hat\theta)$ [...] $\ell(\hat\theta) - k$, which is just $-1/2$ times the original: in this case, obviously, one wants to maximize AIC.
In the case of models estimated by least squares, the loglikelihood can be written as
$$\ell(\hat\theta) = -\frac{n}{2}\left(1 + \log 2\pi - \log n\right) - \frac{n}{2}\log \mathrm{SSR} \tag{21.2}$$
Substituting (21.2) into (21.1) we get
$$\mathrm{AIC} = n\left(1 + \log 2\pi - \log n\right) + n\log \mathrm{SSR} + 2k$$
which can also be written as
$$\mathrm{AIC} = n\log\left(\frac{\mathrm{SSR}}{n}\right) + 2k + n\left(1 + \log 2\pi\right) \tag{21.3}$$
Some authors simplify the formula for the case of models estimated via least squares. For instance, William Greene writes
$$\mathrm{AIC} = \log\left(\frac{\mathrm{SSR}}{n}\right) + \frac{2k}{n} \tag{21.4}$$
This variant can be derived from (21.3) by dividing through by $n$ and subtracting the constant $1 + \log 2\pi$. That is, writing $\mathrm{AIC}_G$ for the version given by Greene, we have
$$\mathrm{AIC}_G = \frac{1}{n}\mathrm{AIC} - \left(1 + \log 2\pi\right)$$
Finally, Ramanathan gives a further variant:
$$\mathrm{AIC}_R = \left(\frac{\mathrm{SSR}}{n}\right) e^{2k/n}$$
which is the exponential of the one given by Greene.
Gretl began by using the Ramanathan variant, but since version 1.3.1 the program has used the original Akaike formula (21.1), and more specifically (21.3) for models estimated via least squares.
Although the Akaike criterion is designed to favor parsimony, arguably it does not go far enough in that direction. For instance, if we have two nested models with $k-1$ and $k$ parameters respectively, and if the null hypothesis that parameter $k$ equals 0 is true, in large samples the AIC will nonetheless tend to select the less parsimonious model about 16 percent of the time (see Davidson and MacKinnon, 2004, chapter 15).
An alternative to the AIC which avoids this problem is the Schwarz (1978) "Bayesian information criterion" (BIC). The BIC can be written (in line with Akaike's formulation of the AIC) as
$$\mathrm{BIC} = -2\ell(\hat\theta) + k\log n$$
The multiplication of $k$ by $\log n$ in the BIC means that the penalty for adding extra parameters grows with the sample size. This ensures that, asymptotically, one will not select a larger model over a correctly specified parsimonious model.
A further alternative to AIC, which again tends to select more parsimonious models than AIC, is the Hannan-Quinn criterion or HQC (Hannan and Quinn, 1979). Written consistently with the formulations above, this is
$$\mathrm{HQC} = -2\ell(\hat\theta) + 2k\log\log n$$
The Hannan-Quinn calculation is based on the law of the iterated logarithm (note that the last term is the log of the log of the sample size). The authors argue that their procedure provides a "strongly consistent estimation procedure for the order of an autoregression", and that "compared to other strongly consistent procedures this procedure will underestimate the order to a lesser degree."
Gretl reports the AIC, BIC and HQC (calculated as explained above) for most sorts of models. The key point in interpreting these values is to know whether they are calculated such that smaller values are better, or such that larger values are better. In gretl, smaller values are better: one wants to minimize the chosen criterion.
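The relations among these variants can be verified numerically after an OLS regression. The sketch below uses simulated data (an assumption for illustration) and compares hand-computed values with the accessors $aic, $bic and $hqc; the names aic_manual, aic_g and so on are just local variables.

```hansl
nulldata 100
set seed 777
series x = normal()
series y = 1 + 0.5*x + normal()
ols y const x --quiet

scalar n = $T
scalar k = $ncoeff
scalar SSR = $ess

# equation (21.3), then BIC and HQC from the loglikelihood
scalar aic_manual = n*log(SSR/n) + 2*k + n*(1 + log(2*$pi))
scalar bic_manual = -2*$lnl + k*log(n)
scalar hqc_manual = -2*$lnl + 2*k*log(log(n))

# the Greene and Ramanathan variants
scalar aic_g = aic_manual/n - (1 + log(2*$pi))
scalar aic_r = (SSR/n)*exp(2*k/n)

printf "AIC: %.6f (gretl: %.6f)\n", aic_manual, $aic
printf "BIC: %.6f (gretl: %.6f)\n", bic_manual, $bic
printf "HQC: %.6f (gretl: %.6f)\n", hqc_manual, $hqc
printf "Greene: %.6f, log of Ramanathan: %.6f\n", aic_g, log(aic_r)
```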
Chapter 22
Time series filters
In addition to the usual application of lags and differences, gretl provides fractional differencing and various filters commonly used in macroeconomics for trend-cycle decomposition: notably the Hodrick-Prescott filter (Hodrick and Prescott, 1997), the Baxter-King bandpass filter (Baxter and King, 1999) and the Butterworth filter (Butterworth, 1930).
22.1 Fractional differencing
The concept of differencing a time series $d$ times is pretty obvious when $d$ is an integer; it may seem odd when $d$ is fractional. However, this idea has a well-defined mathematical content: consider the function
$$f(z) = (1 - z)^{-d},$$
where $z$ and $d$ are real numbers. By taking a Taylor series expansion around $z = 0$, we see that
$$f(z) = 1 + dz + \frac{d(d+1)}{2}z^2 + \cdots$$
or, more compactly,
$$f(z) = 1 + \sum_{i=1}^{\infty} \psi_i z^i$$
with
$$\psi_k = \frac{\prod_{i=1}^{k}(d + i - 1)}{k!} = \psi_{k-1}\,\frac{d + k - 1}{k}$$
The same expansion can be used with the lag operator, so that if we defined
$$Y_t = (1 - L)^{0.5} X_t$$
this could be considered shorthand for
$$Y_t = X_t - 0.5X_{t-1} - 0.125X_{t-2} - 0.0625X_{t-3} - \cdots$$
In gretl this transformation can be accomplished by the syntax
genr Y = fracdiff(X,0.5)
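The coefficients in the expansion can be generated with the recursion for $\psi_k$ given above. The sketch below does this for $(1 - L)^{0.5}$, which corresponds to $d = -0.5$ in the notation $f(z) = (1 - z)^{-d}$; the number of terms (5) and the variable names are arbitrary.

```hansl
# (1 - L)^0.5 corresponds to d = -0.5 in f(z) = (1 - z)^(-d)
scalar d = -0.5
matrix psi = zeros(5, 1)
scalar prev = 1
loop k=1..5
    # psi_k = psi_{k-1} * (d + k - 1) / k, starting from psi_0 = 1
    prev = prev * (d + k - 1) / k
    psi[k] = prev
endloop
print psi
```

The first three entries are -0.5, -0.125 and -0.0625, matching the weights on $X_{t-1}$, $X_{t-2}$ and $X_{t-3}$ shown above.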
22.2 The Hodrick-Prescott filter
This filter is accessed using the hpfilt() function, which takes as its first argument the name of the variable to be processed. (A further optional argument is explained below.)
A time series $y_t$ may be decomposed into a trend or growth component $g_t$ and a cyclical component $c_t$.
$$y_t = g_t + c_t, \quad t = 1, 2, \ldots, T$$
The Hodrick-Prescott filter effects such a decomposition by minimizing the following:
$$\sum_{t=1}^{T}(y_t - g_t)^2 + \lambda\sum_{t=2}^{T-1}\left((g_{t+1} - g_t) - (g_t - g_{t-1})\right)^2.$$
The first term above is the sum of squared cyclical components $c_t = y_t - g_t$. The second term is a multiple $\lambda$ of the sum of squares of the trend component's second differences. This second term penalizes variations in the growth rate of the trend component: the larger the value of $\lambda$, the higher is the penalty and hence the smoother the trend series.
Note that the hpfilt function in gretl produces the cyclical component, $c_t$, of the original series. If you want the smoothed trend you can subtract the cycle from the original:
genr ct = hpfilt(yt)
genr gt = yt - ct
Hodrick and Prescott (1997) suggest that a value of $\lambda = 1600$ is reasonable for quarterly data. The default value in gretl is 100 times the square of the data frequency (which, of course, yields 1600 for quarterly data). The value can be adjusted using an optional second argument to hpfilt(), as in
genr ct = hpfilt(yt, 1300)
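The effect of the smoothing parameter can be seen by filtering the same series with two values of $\lambda$. The following sketch builds an artificial quarterly series (trend plus a sine wave plus noise; all of this is an illustrative assumption) and compares the variability of the resulting cycles; the larger $\lambda$ forces a smoother trend and so leaves more movement in the cycle.

```hansl
nulldata 120
setobs 4 1990:1 --time-series
set seed 99
genr time
series yt = 100 + 0.5*time + 3*sin(time/4) + normal()

# default lambda (1600 for quarterly data)
series ct = hpfilt(yt)
series gt = yt - ct

# a much larger lambda gives a smoother trend
series ct2 = hpfilt(yt, 6400)

printf "s.d. of cycle, default lambda: %g\n", sd(ct)
printf "s.d. of cycle, lambda = 6400:  %g\n", sd(ct2)
```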
22.3 The Baxter and King filter
This filter is accessed using the bkfilt() function, which again takes the name of the variable to be processed as its first argument. The operation of the filter can be controlled via three further optional arguments.
Consider the spectral representation of a time series $y_t$:
$$y_t = \int_{-\pi}^{\pi} e^{i\omega t}\, \mathrm{d}Z(\omega)$$
To extract the component of $y_t$ that lies between the frequencies $\underline{\omega}$ and $\overline{\omega}$ one could apply a bandpass filter:
$$c^{*}_t = \int_{-\pi}^{\pi} F^{*}(\omega)\, e^{i\omega t}\, \mathrm{d}Z(\omega)$$
where $F^{*}(\omega) = 1$ for $\underline{\omega} < |\omega| < \overline{\omega}$ and 0 elsewhere. This would imply, in the time domain, applying to the series a filter with an infinite number of coefficients, which is undesirable. The Baxter and King bandpass filter applies to $y_t$ a finite polynomial in the lag operator $A(L)$:
$$c_t = A(L)\,y_t$$
where $A(L)$ is defined as
$$A(L) = \sum_{i=-k}^{k} a_i L^i$$
The coefficients $a_i$ are chosen such that $F(\omega) = A(e^{i\omega})A(e^{-i\omega})$ is the best approximation to $F^{*}(\omega)$ for a given $k$. Clearly, the higher $k$ the better the approximation is, but since $2k$ observations have to be discarded, a compromise is usually sought. Moreover, the filter also has other appealing theoretical properties, among which the property that $A(1) = 0$, so a series with a single unit root is made stationary by application of the filter.
In practice, the filter is normally used with monthly or quarterly data to extract the "business cycle" component, namely the component between 6 and 36 quarters. Usual choices for $k$ are 8 or 12 (maybe higher for monthly series). The default values for the frequency bounds are 8 and 32, and the default value for the approximation order, $k$, is 8. You can adjust these values using the full form of bkfilt(), which is
bkfilt(seriesname, f1, f2, k)
where f1 and f2 represent the lower and upper frequency bounds respectively.
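A minimal sketch of the full form, applied to an artificial quarterly random walk with drift (the data, the bounds 6 and 32, and the order 12 are illustrative assumptions). Counting the valid observations in the result shows the cost of a higher approximation order.

```hansl
nulldata 120
setobs 4 1990:1 --time-series
set seed 2468
# a random walk with drift as a stand-in for a trending macro series
series y = cum(0.2 + normal())

# bounds of 6 and 32 quarters, approximation order k = 12
series bk = bkfilt(y, 6, 32, 12)

# 2k observations are lost, k at each end of the sample
scalar nvalid = sum(ok(bk))
printf "sample size: %d, valid filtered values: %d\n", $nobs, nvalid
```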
22.4 The Butterworth filter
The Butterworth filter (Butterworth, 1930) is an approximation to an "ideal" square-wave filter. The ideal filter divides the spectrum of a time series into a pass-band (frequencies less than some chosen [...]
$$\ldots\, Q)^{-1} Q' y \tag{22.1}$$
where
$$\Sigma = \{2I_T - (L_T + L_T^{-1})\}^{T-2} \quad\text{and}\quad M = \{2I_T + (L_T + L_T^{-1})\}^{T}$$
$I_T$ denotes the identity matrix of order $T$; $L_T = [e_1, e_2, \ldots, e_{T-1}, 0]$ is the finite-sample matrix version of the lag operator; and $Q$ is defined such that pre-multiplication of a $T$-vector of data by $Q'$ of order $(T-2) \times T$ produces the second differences of the data. The matrix product
$$Q'Q = \{2I_T - (L_T + L_T^{-1})\}^{T}$$
is a Toeplitz matrix.
The behavior of the Butterworth filter is governed by two parameters: the frequency cutoff $\omega^{\star}$ and an integer order, $n$, which determines the number of coefficients used. The $\lambda$ that appears in (22.1) is $\tan(\omega^{\star}/2)^{-2n}$. Higher values of $n$ produce a better approximation to the ideal filter in principle (i.e. a sharper cut between the pass-band and the stop-band) but there is a downside: with a greater number of coefficients numerical instability may be an issue, and the influence of the initial values in the sample may be exaggerated.
In gretl the Butterworth filter is implemented by the bwfilt() function [1], which takes three arguments: the series to filter, the order $n$ and the frequency cutoff, [...] or 4 periods.
If we set $\omega^{\star} = 68^{\circ}$ (or thereabouts) we should be able to excise the seasonality quite cleanly using
[1] The code for this filter is based on D. S. G. Pollock's programs IDEOLOG and DETREND. The Pascal source code for the former is available from http://www.le.ac.uk/users/dsgp1 and the C sources for the latter were kindly made available to us by the author.
[2] This is the variable QNC from the Ramanathan data file data9-7.
[Figure 22.1: The Butterworth filter applied. Plot residue: a frequency axis labeled in degrees and in periods; the series QNC (original data) and QNC (smoothed), 1976 to 1990; a gain axis from 0 to 1 over frequencies 0 to π.]
$n = 8$. The result is shown in the lower panel of the Figure, along with the frequency response or gain plot for the chosen filter. Note the smooth and reasonably steep drop-off in gain centered on the nominal cutoff of $68^{\circ} \approx 3\pi/8$.
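In script form, the smoothing described here might look as follows. This is a sketch assuming that the Ramanathan data file data9-7 (which contains QNC, as noted in the footnote) is installed, and that the cutoff argument of bwfilt() is given in degrees; the names QNC_s and QNC_c are chosen here for illustration.

```hansl
open data9-7

# low-pass Butterworth filter: order n = 8, cutoff 68 degrees
series QNC_s = bwfilt(QNC, 8, 68)

# what the filter removes (seasonal and other high-frequency movement)
series QNC_c = QNC - QNC_s

gnuplot QNC QNC_s --time-series --with-lines --output=display
```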
The apparatus that supports this sort of analysis in the gretl GUI can be found under the Variable menu in the main window: the items Periodogram and Filter. In the periodogram dialog box you have the option of expressing the frequency axis in degrees, which is helpful when selecting a Butterworth filter; and in the Butterworth filter dialog you have the option of plotting the frequency response as well as the smoothed series and/or the residual or cycle.
Chapter 23
Univariate time series models
23.1 Introduction
Time series models are discussed in this chapter and the next two. Here we concentrate on ARIMA
models, unit root tests, and GARCH. The following chapter deals with VARs, and chapter 25 with
cointegration and error correction.
23.2 ARIMA models
Representation and syntax
The arma command performs estimation of AutoRegressive, Integrated, Moving Average (ARIMA)
models. These are models that can be written in the form
\[ \phi(L)\, y_t = \theta(L)\, \epsilon_t \tag{23.1} \]
where $\phi(L)$ and $\theta(L)$ are polynomials in the lag operator, $L$, defined such that
$L^n x_t = x_{t-n}$, and $\epsilon_t$ is a white noise process. The exact content of $y_t$, of the AR
polynomial $\phi()$, and of the MA polynomial $\theta()$, will be explained in the following.
Mean terms
The process $y_t$ as written in equation (23.1) has, without further qualifications, mean zero. If the
model is to be applied to real data, it is necessary to include some term to handle the possibility
that $y_t$ has non-zero mean. There are two possible ways to represent processes with nonzero
mean: one is to define $\mu_t$ as the unconditional mean of $y_t$, namely the central value of its
marginal distribution. Therefore, the series $\tilde{y}_t = y_t - \mu_t$ has mean 0, and the model
(23.1) applies to $\tilde{y}_t$. In practice, assuming that $\mu_t$ is a linear function of some
observable variables $x_t$, the model becomes
\[ \phi(L)(y_t - x_t\beta) = \theta(L)\epsilon_t \tag{23.2} \]
This is sometimes known as a "regression model with ARMA errors"; its structure may be more
apparent if we represent it using two equations:
\[ y_t = x_t\beta + u_t \]
\[ \phi(L)\, u_t = \theta(L)\, \epsilon_t \]
The model just presented is also sometimes known as "ARMAX" (ARMA + eXogenous variables). It
seems to us, however, that this label is more appropriately applied to a different model: another
way to include a mean term in (23.1) is to base the representation on the conditional mean of $y_t$,
that is the central value of the distribution of $y_t$ given its own past. Assuming, again, that this can
be represented as a linear combination of some observable variables $z_t$, the model would expand
to
\[ \phi(L)\, y_t = z_t\gamma + \theta(L)\, \epsilon_t \tag{23.3} \]
The formulation (23.3) has the advantage that $\gamma$ can be immediately interpreted as the vector of
marginal effects of the $z_t$ variables on the conditional mean of $y_t$. And by adding lags of $z_t$ to
this specification one can estimate Transfer Function models (which generalize ARMA by adding
the effects of exogenous variables distributed across time).
Gretl provides a way to estimate both forms. Models written as in (23.2) are estimated by maximum
likelihood; models written as in (23.3) are estimated by conditional maximum likelihood. (For more
on these options see the section on Estimation below.)
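In the special case when $x_t = z_t = 1$ (that is, the models include a constant but no exogenous
variables) the two specifications discussed above reduce to
\[ \phi(L)(y_t - \mu) = \theta(L)\epsilon_t \tag{23.4} \]
and
\[ \phi(L)\, y_t = \alpha + \theta(L)\epsilon_t \tag{23.5} \]
respectively. These formulations are essentially equivalent, but if they represent one and the same
process $\mu$ and $\alpha$ are, fairly obviously, not numerically identical; rather
\[ \alpha = (1 - \phi_1 - \dots - \phi_p)\, \mu \]
Generalizing to exogenous regressors, the command arma p q ; y const x1 x2, estimated by the
default method, corresponds to model (23.2) written out in full:
\[ y_t - x_t\beta = \phi_1 (y_{t-1} - x_{t-1}\beta) + \dots + \phi_p (y_{t-p} - x_{t-p}\beta) + \epsilon_t + \theta_1\epsilon_{t-1} + \dots + \theta_q\epsilon_{t-q} \]
where in this instance $x_t\beta = \beta_0 + x_{t,1}\beta_1 + x_{t,2}\beta_2$. Appending the --conditional switch, as in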
arma p q ; y const x1 x2 --conditional
would estimate the following model:
\[ y_t = x_t\gamma + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \epsilon_t + \theta_1\epsilon_{t-1} + \dots + \theta_q\epsilon_{t-q} \]
Ideally, the issue broached above could be made moot by writing a more general specification that
nests the alternatives; that is
\[ \phi(L)(y_t - x_t\beta) = z_t\gamma + \theta(L)\epsilon_t; \tag{23.6} \]
we would like to generalize the arma command so that the user could specify, for any estimation
method, whether certain exogenous variables should be treated as $x_t$s or $z_t$s, but we're not yet at
that point (and neither are most other software packages).
Seasonal models
A more flexible lag structure is desirable when analyzing time series that display strong seasonal
patterns. Model (23.1) can be expanded to
\[ \phi(L)\Phi(L^s)\, y_t = \theta(L)\Theta(L^s)\, \epsilon_t. \tag{23.7} \]
For such cases, a fuller form of the syntax is available, namely,
arma p q ; P Q ; y
where p and q represent the non-seasonal AR and MA orders, and P and Q the seasonal orders. For
example,
arma 1 1 ; 1 1 ; y
would be used to estimate the following model:
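\[ (1 - \phi L)(1 - \Phi L^s)(y_t - \mu) = (1 + \theta L)(1 + \Theta L^s)\epsilon_t \]
If $y_t$ is a quarterly series (and therefore $s = 4$), the above equation can be written more explicitly as
\[ y_t - \mu = \phi(y_{t-1} - \mu) + \Phi(y_{t-4} - \mu) - (\phi \cdot \Phi)(y_{t-5} - \mu) + \epsilon_t + \theta\epsilon_{t-1} + \Theta\epsilon_{t-4} + (\theta \cdot \Theta)\epsilon_{t-5} \]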
Such a model is known as a multiplicative seasonal ARMA model.
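The cross term at lag s + 1 comes from multiplying the non-seasonal and seasonal lag polynomials. The following Python sketch (not gretl code; the function names and parameter values are illustrative) expands the AR side for quarterly data.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists (index = lag)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def seasonal_ar_poly(phi, Phi, s):
    """Coefficients of (1 - phi L)(1 - Phi L^s), indexed by lag."""
    nonseasonal = [1.0, -phi]
    seasonal = [1.0] + [0.0] * (s - 1) + [-Phi]
    return poly_mul(nonseasonal, seasonal)

# Illustrative values, quarterly data (s = 4)
coeffs = seasonal_ar_poly(phi=0.5, Phi=0.8, s=4)
for lag, c in enumerate(coeffs):
    if c != 0.0:
        print(f"lag {lag}: {c:+.2f}")
```

The polynomial coefficients carry the opposite sign to the coefficients in the explicit equation for the series, since the lagged terms are moved to the right-hand side. The lag-5 polynomial coefficient is the product of the two AR parameters.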
Gaps in the lag structure
The standard way to specify an ARMA model in gretl is via the AR and MA orders, p and q
respectively. In this case all lags from 1 to the given order are included. In some cases one may
wish to include only certain specific AR and/or MA lags. This can be done in either of two ways.
• One can construct a matrix containing the desired lags (positive integer values) and supply
  the name of this matrix in place of p or q.
• One can give a space-separated list of lags, enclosed in braces, in place of p or q.
The following code illustrates these options:
matrix pvec = {1, 4}
arma pvec 1 ; y
arma {1 4} 1 ; y
Both forms above specify an ARMA model in which AR lags 1 and 4 are used (but not 2 and 3).
This facility is available only for the non-seasonal component of the ARMA specification.
Differencing and ARIMA
The above discussion presupposes that the time series $y_t$ has already been subjected to all the
transformations deemed necessary for ensuring stationarity (see also section 23.3). Differencing is
the most common of these transformations, and gretl provides a mechanism to include this step
into the arma command: the syntax
arma p d q ; y
would estimate an ARMA(p, q) model on $\Delta^d y_t$. It is functionally equivalent to
series tmp = y
loop i=1..d
tmp = diff(tmp)
endloop
arma p q ; tmp
except with regard to forecasting after estimation (see below).
When the series $y_t$ is differenced before performing the analysis the model is known as ARIMA ("I"
for Integrated); for this reason, gretl provides the arima command as an alias for arma.
Seasonal differencing is handled similarly, with the syntax
arma p d q ; P D Q ; y
where D is the order for seasonal differencing. Thus, the command
arma 1 0 0 ; 1 1 1 ; y
would produce the same parameter estimates as
genr dsy = sdiff(y)
arma 1 0 ; 1 1 ; dsy
where we use the sdiff function to create a seasonal difference (e.g. for quarterly data, $y_t - y_{t-4}$).
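To make the two kinds of differencing concrete, here is a small Python sketch (not gretl code). The helper names difference and arima_transform are hypothetical. Unlike gretl's diff and sdiff, which keep the series length and mark the lost observations as missing, this sketch simply returns shorter lists.

```python
def difference(x, lag=1):
    """Return x[t] - x[t-lag] for t = lag, ..., len(x)-1 (a shorter list)."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def arima_transform(y, d=0, D=0, s=4):
    """Apply d ordinary differences, then D seasonal differences of span s."""
    out = list(y)
    for _ in range(d):
        out = difference(out, 1)
    for _ in range(D):
        out = difference(out, s)
    return out

y = [t * t for t in range(12)]          # illustrative series: 0, 1, 4, 9, ...
print(arima_transform(y, d=1))          # first difference
print(arima_transform(y, d=2))          # second difference
print(arima_transform(y, D=1, s=4))     # seasonal difference y[t] - y[t-4]
```

Because differencing operators commute, the order in which the ordinary and seasonal differences are applied does not affect the result.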
In specifying an ARIMA model with exogenous regressors we face a choice which relates back to the
discussion of the variant models (23.2) and (23.3) above. If we choose model (23.2), the "regression
model with ARMA errors", how should this be extended to the case of ARIMA? The issue is whether
or not the differencing that is applied to the dependent variable should also be applied to the
regressors. Consider the simplest case, ARIMA with non-seasonal differencing of order 1. We may
estimate either
\[ \phi(L)(1 - L)(y_t - X_t\beta) = \theta(L)\epsilon_t \tag{23.8} \]
or
\[ \phi(L)\left[(1 - L)y_t - X_t\beta\right] = \theta(L)\epsilon_t \tag{23.9} \]
The first of these formulations can be described as a regression model with ARIMA errors, while the
second preserves the levels of the X variables. As of gretl version 1.8.6, the default model is (23.8),
in which differencing is applied to both $y_t$ and $X_t$. However, when using the default estimation
method (native exact ML, see below), the option --y-diff-only may be given, in which case gretl
estimates (23.9).[1]
Estimation
The default estimation method for ARMA models is exact maximum likelihood estimation (under
the assumption that the error term is normally distributed), using the Kalman filter in conjunction
with the BFGS maximization algorithm. The gradient of the log-likelihood with respect to the
parameter estimates is approximated numerically. This method produces results that are directly
comparable with many other software packages. The constant, and any exogenous variables, are
treated as in equation (23.2). The covariance matrix for the parameters is computed using a
numerical approximation to the Hessian at convergence.
The alternative method, invoked with the --conditional switch, is conditional maximum likelihood
(CML), also known as "conditional sum of squares" (see Hamilton, 1994, p. 132). This method
was exemplified in the script 9.3, and only a brief description will be given here. Given a sample of
size T, the CML method minimizes the sum of squared one-step-ahead prediction errors generated
by the model for the observations $t_0, \dots, T$. The starting point $t_0$ depends on the orders of the AR
polynomials in the model. The numerical maximization method used is BHHH, and the covariance
matrix is computed using a Gauss–Newton regression.

[1] Prior to gretl 1.8.6, the default model was (23.9). We changed this for the sake of consistency
with other software.

The CML method is nearly equivalent to maximum likelihood under the hypothesis of normality;
the difference is that the first $(t_0 - 1)$ observations are considered fixed and only enter the
likelihood function as conditioning variables. As a consequence, the two methods are asymptotically
equivalent under standard conditions, except for the fact, discussed above, that our CML
implementation treats the constant and exogenous variables as per equation (23.3).
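The objective that CML minimizes can be illustrated for an ARMA(1,1) with a constant, written as in equation (23.5). The Python sketch below is not gretl's implementation: the function name css_arma11 is hypothetical, and setting the pre-sample innovation to zero is an assumption of the sketch.

```python
def css_arma11(y, alpha, phi, theta):
    """Conditional sum of squares for y_t = alpha + phi*y_{t-1} + e_t + theta*e_{t-1}.

    The first observation is treated as fixed (a conditioning value) and the
    pre-sample innovation is set to zero; both are assumptions of this sketch.
    Returns (sum of squared one-step-ahead errors, list of errors).
    """
    errors = []
    e_prev = 0.0
    for t in range(1, len(y)):
        pred = alpha + phi * y[t - 1] + theta * e_prev   # one-step-ahead prediction
        e_t = y[t] - pred
        errors.append(e_t)
        e_prev = e_t
    return sum(e * e for e in errors), errors

# Build a short series from known innovations, then recover them
alpha, phi, theta = 1.0, 0.5, 0.3
eps = [0.0, 0.4, -0.2, 0.1, 0.3]    # eps[0] = 0 matches the zero start-up assumption
y = [2.0]
for t in range(1, len(eps)):
    y.append(alpha + phi * y[t - 1] + eps[t] + theta * eps[t - 1])

ssr, errors = css_arma11(y, alpha, phi, theta)
print(errors)
print(ssr)
```

Evaluated at the parameter values used to build the series, the recursion returns the original innovations. An optimizer would search over (alpha, phi, theta) for the values that minimize the returned sum of squares.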
The two methods can be compared as in the following example
open data10-1
arma 1 1 ; r
arma 1 1 ; r --conditional
which produces the estimates shown in Table 23.1. As you can see, the estimates of $\phi$ and $\theta$ are
quite similar. The reported constants differ widely, as expected; see the discussion following
equations (23.4) and (23.5). However, dividing the CML constant by $1 - \phi$ we get about 7.29, which
is not far from the ML estimate of 6.93.
Table 23.1: ML and CML estimates
Parameter    ML                       CML
$\mu$        6.93042  (0.923882)      1.07322  (0.488661)
$\phi$       0.855360 (0.0511842)     0.852772 (0.0450252)
$\theta$     0.588056 (0.0986096)     0.591838 (0.0456662)
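The relation between the two reported constants can be checked with a few lines of arithmetic. This Python sketch uses only the figures in Table 23.1; the variable names are illustrative.

```python
# Estimates copied from Table 23.1
mu_ml = 6.93042        # constant reported by exact ML (mu in equation 23.4)
alpha_cml = 1.07322    # constant reported by conditional ML (alpha in equation 23.5)
phi_cml = 0.852772     # AR(1) coefficient from conditional ML

# With a single AR term, alpha = (1 - phi) * mu, so mu = alpha / (1 - phi)
mu_implied = alpha_cml / (1.0 - phi_cml)
print(f"mean implied by the CML estimates: {mu_implied:.2f}")
print(f"ML estimate of the mean:           {mu_ml:.2f}")
```

The two figures are of the same order, which is what one would expect if both methods are describing the same process.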
Convergence and initialization
The numerical methods used to maximize the likelihood for ARMA models are not guaranteed
to converge. Whether or not convergence is achieved, and whether or not the true maximum of
the likelihood function is attained, may depend on the starting values for the parameters. Gretl
employs one of the following two initialization mechanisms, depending on the specification of the
model and the estimation method chosen.
1. Estimate a pure AR model by Least Squares (nonlinear least squares if the model requires
it, otherwise OLS). Set the AR parameter values based on this regression and set the MA
parameters to a small positive value (0.0001).
2. The Hannan–Rissanen method: First estimate an autoregressive model by OLS and save the
residuals. Then in a second OLS pass add appropriate lags of the first-round residuals to the
model, to obtain estimates of the MA parameters.
To see the details of the ARMA estimation procedure, add the --verbose option to the command.
This prints a notice of the initialization method used, as well as the parameter values and log-
likelihood at each iteration.
Besides the built-in initialization mechanisms, the user has the option of specifying a set of starting
values manually. This is done via the set command: the first argument should be the keyword
initvals and the second should be the name of a pre-specified matrix containing starting values.
For example
matrix start = { 0, 0.85, 0.34 }
set initvals start
arma 1 1 ; y
The specified matrix should have just as many parameters as the model: in the example above
there are three parameters, since the model implicitly includes a constant. The constant, if present,
is always given first; otherwise the order in which the parameters are expected is the same as the
order of specification in the arma or arima command. In the example the constant is set to zero,
$\phi_1$ to 0.85, and $\theta_1$ to 0.34.
You can get gretl to revert to automatic initialization via the command set initvals auto.
Two variants of the BFGS algorithm are available in gretl. In general we recommend the default
variant, which is based on an implementation by Nash (1990), but for some problems the alternative,
limited-memory version (L-BFGS-B, see Byrd et al., 1995) may increase the chances of convergence
on the ML solution. This can be selected via the --lbfgs option to the arma command.
Estimation via X-12-ARIMA
As an alternative to estimating ARMA models using native code, gretl offers the option of using
the external program X-12-ARIMA. This is the seasonal adjustment software produced and maintained
by the U.S. Census Bureau; it is used for all official seasonal adjustments at the Bureau.
Gretl includes a module which interfaces with X-12-ARIMA: it translates arma commands using the
syntax outlined above into a form recognized by X-12-ARIMA, executes the program, and retrieves
the results for viewing and further analysis within gretl. To use this facility you have to install
X-12-ARIMA separately. Packages for both MS Windows and GNU/Linux are available from the gretl
website, http://gretl.sourceforge.net/.
To invoke X-12-ARIMA as the estimation engine, append the flag --x-12-arima, as in
arma p q ; y --x-12-arima
As with native estimation, the default is to use exact ML but there is the option of using conditional
ML with the --conditional flag. However, please note that when X-12-ARIMA is used in conditional
ML mode, the comments above regarding the variant treatments of the mean of the process $y_t$ do
not apply. That is, when you use X-12-ARIMA the model that is estimated is (23.2), regardless
of whether estimation is by exact ML or conditional ML. In addition, the treatment of exogenous
regressors in the context of ARIMA differencing is always that shown in equation (23.8).
Forecasting
ARMA models are often used for forecasting purposes. The autoregressive component, in particular,
offers the possibility of forecasting a process "out of sample" over a substantial time horizon.
Gretl supports forecasting on the basis of ARMA models using the method set out by Box and
Jenkins (1976).[2] The Box and Jenkins algorithm produces a set of integrated AR coefficients which
take into account any differencing of the dependent variable (seasonal and/or non-seasonal) in the
ARIMA context, thus making it possible to generate a forecast for the level of the original variable.
By contrast, if you first difference a series manually and then apply ARMA to the differenced series,
forecasts will be for the differenced series, not the level. This point is illustrated in Example 23.1.
The parameter estimates are identical for the two models. The forecasts differ but are mutually
consistent: the variable fcdiff emulates the ARMA forecast (static, one step ahead within the
sample range, and dynamic out of sample).
[2] See in particular their "Program 4" on p. 505.
Example 23.1: ARIMA forecasting
open greene18_2.gdt
# log of quarterly U.S. nominal GNP, 1950:1 to 1983:4
genr y = log(Y)
# and its first difference
genr dy = diff(y)
# reserve 2 years for out-of-sample forecast
smpl ; 1981:4
# Estimate using ARIMA
arima 1 1 1 ; y
# forecast over full period
smpl --full
fcast fc1
# Return to sub-sample and run ARMA on the first difference of y
smpl ; 1981:4
arma 1 1 ; dy
smpl --full
fcast fc2
genr fcdiff = (t<=1982:1)? (fc1 - y(-1)) : (fc1 - fc1(-1))
# compare the forecasts over the later period
smpl 1981:1 1983:4
print y fc1 fc2 fcdiff --byobs
The output from the last command is:
y fc1 fc2 fcdiff
1981:1 7.964086 7.940930 0.02668 0.02668
1981:2 7.978654 7.997576 0.03349 0.03349
1981:3 8.009463 7.997503 0.01885 0.01885
1981:4 8.015625 8.033695 0.02423 0.02423
1982:1 8.014997 8.029698 0.01407 0.01407
1982:2 8.026562 8.046037 0.01634 0.01634
1982:3 8.032717 8.063636 0.01760 0.01760
1982:4 8.042249 8.081935 0.01830 0.01830
1983:1 8.062685 8.100623 0.01869 0.01869
1983:2 8.091627 8.119528 0.01891 0.01891
1983:3 8.115700 8.138554 0.01903 0.01903
1983:4 8.140811 8.157646 0.01909 0.01909
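The consistency between the two sets of forecasts can be verified from the printed figures alone. The following Python sketch (not part of the gretl script) rebuilds fcdiff from y and fc1 using the same static/dynamic rule as the genr line in the example, and compares it with fc2. The list names and the index last_static are illustrative.

```python
# y and fc1 for 1981:3 through 1983:4, copied from the output above
y = [8.009463, 8.015625, 8.014997, 8.026562, 8.032717,
     8.042249, 8.062685, 8.091627, 8.115700, 8.140811]
fc1 = [7.997503, 8.033695, 8.029698, 8.046037, 8.063636,
       8.081935, 8.100623, 8.119528, 8.138554, 8.157646]
# fc2 for 1981:4 through 1983:4
fc2 = [0.02423, 0.01407, 0.01634, 0.01760, 0.01830,
       0.01869, 0.01891, 0.01903, 0.01909]

last_static = 2   # list index of 1982:1, the last period that uses the actual y(-1)

fcdiff = []
for t in range(1, len(y)):
    if t <= last_static:
        fcdiff.append(fc1[t] - y[t - 1])      # static: lagged actual level
    else:
        fcdiff.append(fc1[t] - fc1[t - 1])    # dynamic: lagged forecast level

for a, b in zip(fcdiff, fc2):
    print(f"{a:.5f}  {b:.5f}")
```

Up to the rounding of the printed values, the differenced level forecasts match the forecasts from the model estimated on the first difference.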
23.3 Unit root tests
The ADF test
The Augmented Dickey–Fuller (ADF) test is, as implemented in gretl, the t-statistic on $\phi$ in the
following regression:
\[ \Delta y_t = \mu_t + \phi y_{t-1} + \sum_{i=1}^{p} \gamma_i \Delta y_{t-i} + \epsilon_t. \tag{23.10} \]
This test statistic is probably the best-known and most widely used unit root test. It is a one-sided
test whose null hypothesis is $\phi = 0$ versus the alternative $\phi < 0$ (and hence large negative values
of the test statistic lead to the rejection of the null). Under the null, $y_t$ must be differenced at least
once to achieve stationarity; under the alternative, $y_t$ is already stationary and no differencing is
required.
One peculiar aspect of this test is that its limit distribution is non-standard under the null
hypothesis: moreover, the shape of the distribution, and consequently the critical values for the test,
depends on the form of the $\mu_t$ term. A full analysis of the various cases is inappropriate here:
Hamilton (1994) contains an excellent discussion, but any recent time series textbook covers this
topic. Suffice it to say that gretl allows the user to choose the specification for $\mu_t$ among four
different alternatives:

$\mu_t$                              command option
$0$                                  --nc
$\mu_0$                              --c
$\mu_0 + \mu_1 t$                    --ct
$\mu_0 + \mu_1 t + \mu_2 t^2$        --ctt

These option flags are not mutually exclusive; when they are used together the statistic will be
reported separately for each selected case. By default, gretl uses the combination --c --ct. For
each case, approximate p-values are calculated by means of the algorithm developed in MacKinnon
(1996).
The gretl command used to perform the test is adf; for example
adf 4 x1
would compute the test statistic as the t-statistic for $\phi$ in equation 23.10 with p = 4 in the two
cases $\mu_t = \mu_0$ and $\mu_t = \mu_0 + \mu_1 t$.
The number of lags (p in equation 23.10) should be chosen so as to ensure that (23.10) is a
parametrization flexible enough to represent adequately the short-run persistence of $\Delta y_t$. Setting p
too low results in size distortions in the test, whereas setting p too high leads to low power. As
a convenience to the user, the parameter p can be automatically determined. Setting p to a
negative number triggers a sequential procedure that starts with $|p|$ lags and decrements the lag
order until the t-statistic for the parameter $\gamma_p$ exceeds 1.645 in absolute value.
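For readers who want to see the regression behind the statistic, here is a self-contained Python sketch of equation 23.10 with a constant only. It is not gretl's code: ols and adf_tstat are hypothetical helper names, the data are simulated, and no p-value is computed, since that requires the non-standard distribution discussed above.

```python
import math
import random

def ols(X, y):
    """OLS via the normal equations; returns (coefficients, standard errors)."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y | I], reduced by Gauss-Jordan elimination
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][k] for i in range(k)]
    inv_diag = [A[i][k + 1 + i] for i in range(k)]          # diagonal of (X'X)^-1
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(k)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(s2 * d) for d in inv_diag]

def adf_tstat(y, p=1):
    """t-statistic on phi in dy_t = mu0 + phi*y_{t-1} + sum_i gamma_i*dy_{t-i} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # dy[t-1] holds y[t] - y[t-1]
    X, z = [], []
    for t in range(p + 1, len(y)):
        X.append([1.0, y[t - 1]] + [dy[t - 1 - i] for i in range(1, p + 1)])
        z.append(dy[t - 1])
    beta, se = ols(X, z)
    return beta[1] / se[1]

# Simulated stationary AR(1) series (illustrative data only)
random.seed(12345)
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + random.gauss(0.0, 1.0))
print(f"ADF t-statistic (constant, p = 1): {adf_tstat(stationary, p=1):.3f}")
```

For a clearly stationary series such as this one the statistic should be strongly negative. The sequential lag-selection rule could be layered on top by re-running adf_tstat with successively smaller p and inspecting the t-ratio of the last lag.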
The ADF-GLS test
Elliott, Rothenberg and Stock (1996) proposed a variant of the ADF test which involves an
alternative method of handling the parameters pertaining to the deterministic term $\mu_t$: these are
estimated first via Generalized Least Squares, and in a second stage an ADF regression is performed
using the GLS residuals. This variant offers greater power than the regular ADF test for the cases
$\mu_t = \mu_0$ and $\mu_t = \mu_0 + \mu_1 t$.
The ADF-GLS test is available in gretl via the --gls option to the adf command. When this option
is selected the --nc and --ctt options become unavailable, and only one case can be selected at
a time; by default the constant-only model is used but a trend can be added using the --ct flag.
When a trend is present in this test MacKinnon-type p-values are not available; instead we show
critical values from Table 1 in Elliott et al. (1996).
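The first (GLS) stage can be sketched for the constant-only case as follows. This is a Python illustration rather than gretl's code: the function name gls_demean is hypothetical, and the quasi-differencing constant of -7 is the value usually attributed to Elliott, Rothenberg and Stock for this case, treated here as an assumption.

```python
def gls_demean(y, c_bar=-7.0):
    """GLS-demean y for the constant-only ADF-GLS case (sketch).

    c_bar = -7 is assumed here as the quasi-differencing constant for a
    constant-only deterministic term.
    Returns (demeaned series, estimated constant).
    """
    T = len(y)
    a = 1.0 + c_bar / T
    # Quasi-differences of the series and of the constant regressor
    y_star = [y[0]] + [y[t] - a * y[t - 1] for t in range(1, T)]
    z_star = [1.0] + [1.0 - a] * (T - 1)
    # One-regressor least squares of y_star on z_star
    mu0 = sum(z * v for z, v in zip(z_star, y_star)) / sum(z * z for z in z_star)
    return [v - mu0 for v in y], mu0

y = [10.2, 10.5, 10.1, 10.8, 11.0, 10.7, 11.3, 11.1]   # illustrative data
y_d, mu0 = gls_demean(y)
print(f"estimated constant: {mu0:.4f}")
print([round(v, 4) for v in y_d])
```

The second stage would then run an ADF regression without deterministic terms on the demeaned series.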
The KPSS test
The KPSS test (Kwiatkowski, Phillips, Schmidt and Shin, 1992) is a unit root test in which the null
hypothesis is opposite to that in the ADF test: under the null, the series in question is stationary;
the alternative is that the series is I(1).
The basic intuition behind this test statistic is very simple: if $y_t$ can be written as $y_t = \mu + u_t$,
where $u_t$ is some zero-mean stationary process, then not only does the sample average of the $y_t$s
provide a consistent estimator of $\mu$, but the long-run variance of $u_t$ is a well-defined, finite number.
Neither of these properties holds under the alternative.
The test itself is based on the following statistic:
\[ \eta = \frac{\sum_{t=1}^{T} S_t^2}{T^2 \bar{\sigma}^2} \tag{23.11} \]
where $S_t = \sum_{s=1}^{t} e_s$ and $\bar{\sigma}^2$ is an estimate of the long-run variance of
$e_t = (y_t - \bar{y})$. Under the null, this statistic has a well-defined (nonstandard) asymptotic
distribution, which is free of nuisance parameters and has been tabulated by simulation. Under the
alternative, the statistic diverges.
As a consequence, it is possible to construct a one-sided test based on $\eta$, where $H_0$ is rejected if
$\eta$ is bigger than the appropriate critical value; gretl provides the 90, 95 and 99 percent quantiles.
The critical values are computed via the method presented by Sephton (1995), which offers greater
accuracy than the values tabulated in Kwiatkowski et al. (1992).
Usage example:
kpss m y
where m is an integer representing the bandwidth or window size used in the formula for estimating
the long run variance:
\[ \bar{\sigma}^2 = \sum_{i=-m}^{m} \left(1 - \frac{|i|}{m+1}\right) \hat{\gamma}_i \]
The $\hat{\gamma}_i$ terms denote the empirical autocovariances of $e_t$ from order $-m$ through $m$. For this
estimator to be consistent, m must be large enough to accommodate the short-run persistence of
$e_t$, but not too large compared to the sample size T. If the supplied m is non-positive a default value
is computed, namely the integer part of $4\left(\frac{T}{100}\right)^{1/4}$.
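The statistic and the default bandwidth are simple enough to compute by hand. The Python sketch below follows the two formulas above for the level-stationary case. The function names are hypothetical, and dividing each autocovariance by T is an assumption of the sketch rather than something stated in the text.

```python
def kpss_default_bandwidth(T):
    """Integer part of 4 * (T / 100) ** (1/4)."""
    return int(4.0 * (T / 100.0) ** 0.25)

def kpss_stat(y, m):
    """KPSS statistic for the level-stationary case with bandwidth m >= 0."""
    T = len(y)
    ybar = sum(y) / T
    e = [v - ybar for v in y]
    # Partial sums S_t of the demeaned series
    S, running = [], 0.0
    for v in e:
        running += v
        S.append(running)

    def gamma(i):
        # Empirical autocovariance of order i (assumed normalization: divide by T)
        return sum(e[t] * e[t - i] for t in range(i, T)) / T

    # Long-run variance with weights 1 - |i|/(m+1); gamma_i = gamma_{-i}
    lrv = gamma(0) + 2.0 * sum((1.0 - i / (m + 1.0)) * gamma(i) for i in range(1, m + 1))
    return sum(s * s for s in S) / (T * T * lrv)

y = [1.0, -1.0, 1.0, -1.0]     # tiny illustrative series with mean zero
print(kpss_stat(y, 0))
print(kpss_default_bandwidth(100))
```

With m = 0 the long-run variance reduces to the ordinary variance of the demeaned series, which makes the toy example easy to check by hand.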
The above concept can be generalized to the case where $y_t$ is thought to be stationary around a
deterministic trend. In this case, formula (23.11) remains unchanged, but the series $e_t$ is defined as
the residuals from an OLS regression of $y_t$ on a constant and a linear trend. This second form of
the test is obtained by appending the --trend option to the kpss command:
kpss m y --trend
Note that in this case the asymptotic distribution of the test is different and the critical values
reported by gretl differ accordingly.
Panel unit root tests
The most commonly used unit root tests for panel data involve a generalization of the ADF
procedure, in which the joint null hypothesis is that a given time series is non-stationary for all
individuals in the panel.
In this context the ADF regression (23.10) can be rewritten as
\[ \Delta y_{it} = \mu_{it} + \phi_i y_{i,t-1} + \sum_{j=1}^{p_i} \gamma_{ij} \Delta y_{i,t-j} + \epsilon_{it} \tag{23.12} \]
The model (23.12) allows for maximal heterogeneity across the individuals in the panel: the
parameters of the deterministic term, the autoregressive coefficient $\phi$, and the lag order p are all
specific to the individual, indexed by i.
One possible modification of this model is to impose the assumption that $\phi_i = \phi$ for all i; that is,
the individual time series share a common autoregressive root (although they may differ in respect
of other statistical properties). The choice of whether or not to impose this assumption has an
important bearing on the hypotheses under test. Under model (23.12) the joint null is $\phi_i = 0$ for
all i, meaning that all the individual time series are non-stationary, and the alternative (simply the
negation of the null) is that at least one individual time series is stationary. When a common $\phi$ is
assumed, the null is that $\phi = 0$ and the alternative is that $\phi < 0$. The null still says that all the
individual series are non-stationary, but the alternative now says that they are all stationary. The
choice of model should take this point into account, as well as the gain in power from forming a
pooled estimate of $\phi$ and, of course, the plausibility of assuming a common AR(1) coefficient.[3]
In gretl, the formulation (23.12) is used automatically when the adf command is used on panel
data. The joint test statistic is formed using the method of Im, Pesaran and Shin (2003). In this
context the behavior of adf differs from regular time-series data: only one case of the deterministic
term is handled per invocation of the command; the default is that $\mu_{it}$ includes just a constant but
the --nc and --ct flags can be used to suppress the constant or to include a trend, respectively;
and the quadratic trend option --ctt is not available.
The alternative that imposes a common value of $\phi$ is implemented via the levinlin command.
The test statistic is computed as per Levin, Lin and Chu (2002). As with the adf command, the first
argument is the lag order and the second is the name of the series to test; and the default case for
the deterministic component is a constant only. The options --nc and --ct have the same effect
as with adf. One refinement is that the lag order may be given in either of two forms: if a scalar
is given, this is taken to represent a common value of p for all individuals, but you may instead
provide a vector holding a set of $p_i$ values, hence allowing the order of autocorrelation of the series
to differ by individual. So, for example, given
levinlin 2 y
levinlin {2,2,3,3,4,4} y
the first command runs a joint ADF test with a common lag order of 2, while the second (which
assumes a panel with six individuals) allows for differing short-run dynamics. The first argument
to levinlin can be given as a set of comma-separated integers enclosed in braces, as shown above,
or as the name of an appropriately dimensioned pre-defined matrix (see chapter 13).
Besides variants of the ADF test, the KPSS test also can be used with panel data via the kpss
command. In this case the test (of the null hypothesis that the given time series is stationary for
all individuals) is implemented using the method of Choi (2001). This is an application of
meta-analysis, the statistical technique whereby an overall or composite p-value for the test of a given
null hypothesis can be computed from the p-values of a set of separate tests. Unfortunately, in
the case of the KPSS test we are limited by the unavailability of precise p-values, although if an
individual test statistic falls between the 10 percent and 1 percent critical values we are able to
interpolate with a fair degree of confidence. This gives rise to four cases.
1. All the individual KPSS test statistics fall between the 10 percent and 1 percent critical values:
the Choi method gives us a plausible composite p-value.
2. Some of the KPSS test statistics exceed the 1 percent value and none fall short of the 10
percent value: we can give an upper bound for the composite p-value by setting the unknown
p-values to 0.01.
3. Some of the KPSS test statistics fall short of the 10 percent critical value but none exceed the
1 percent value: we can give a lower bound to the composite p-value by setting the unknown
p-values to 0.10.
4. None of the above conditions are satisfied: the Choi method fails to produce any result for
the composite KPSS test.

[3] If the assumption of a common $\phi$ seems excessively restrictive, bear in mind that we routinely
assume common slope coefficients when estimating panel models, even if this is unlikely to be
literally true.
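The four cases above amount to a simple classification of the individual statistics. The Python sketch below encodes only that logic; it does not compute the composite p-value itself. The function name choi_case is hypothetical, and the critical values in the example are illustrative inputs, not the values gretl would report.

```python
def choi_case(stats, crit10, crit01):
    """Classify individual KPSS statistics into the four cases listed above.

    crit10 and crit01 are the 10 percent and 1 percent critical values
    (crit10 < crit01). Returns "p-value", "upper bound", "lower bound"
    or "no result".
    """
    above = any(s > crit01 for s in stats)   # p-value below 0.01, not known exactly
    below = any(s < crit10 for s in stats)   # p-value above 0.10, not known exactly
    if not above and not below:
        return "p-value"        # case 1: every p-value can be interpolated
    if above and not below:
        return "upper bound"    # case 2: unknown p-values set to 0.01
    if below and not above:
        return "lower bound"    # case 3: unknown p-values set to 0.10
    return "no result"          # case 4: no composite result

# Illustrative critical values and test statistics
print(choi_case([0.40, 0.55, 0.62], crit10=0.347, crit01=0.739))
print(choi_case([0.40, 0.95], crit10=0.347, crit01=0.739))
```

In case 2 the substituted value 0.01 overstates the true individual p-values, which is why the composite figure is an upper bound; the argument for case 3 is symmetric.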
23.4 Cointegration tests
The generally recommended test for cointegration is the Johansen test, which is discussed in detail
in chapter 25. In this context we offer a few remarks on the cointegration test of Engle and Granger
(1987), which builds on the ADF test discussed above (section 23.3).
For the Engle–Granger test, the procedure is:
1. Test each series for a unit root using an ADF test.
2. Run a cointegrating regression via OLS. For this we select one of the potentially cointegrated
variables as dependent, and include the other potentially cointegrated variables as regressors.
3. Perform an ADF test on the residuals from the cointegrating regression.
The idea is that cointegration is supported if (a) the null of non-stationarity is not rejected for each
of the series individually, in step 1, while (b) the null is rejected for the residuals at step 3. That is,
each of the individual series is I(1) but some linear combination of the series is I(0).
This test is implemented in gretl by the coint command, which requires an integer lag order
(for the ADF tests) followed by a list of variables to be tested, the first of which will be taken
as dependent in the cointegrating regression. Please see the online help for coint, or the Gretl
Command Reference, for further details.
23.5 ARCH and GARCH
Heteroskedasticity means a non-constant variance of the error term in a regression model.
Autoregressive Conditional Heteroskedasticity (ARCH) is a phenomenon specific to time series models,
whereby the variance of the error displays autoregressive behavior; for instance, the time series
exhibits successive periods where the error variance is relatively large, and successive periods where
it is relatively small. This sort of behavior is reckoned to be quite common in asset markets: an
unsettling piece of news can lead to a period of increased volatility in the market.
An ARCH error process of order q can be represented as
\[ u_t = \sigma_t \epsilon_t; \qquad \sigma_t^2 \equiv E(u_t^2 \mid \Omega_{t-1}) = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 \]
where the $\epsilon_t$s are independently and identically distributed (iid) with mean zero and variance 1,
and where $\sigma_t$ is taken to be the positive square root of $\sigma_t^2$. $\Omega_{t-1}$ denotes the information set as of
time $t - 1$ and $\sigma_t^2$ is the conditional variance: that is, the variance conditional on information dated
$t - 1$ and earlier.
It is important to notice the difference between ARCH and an ordinary autoregressive error process.
The simplest (first-order) case of the latter can be written as
\[ u_t = \rho u_{t-1} + \epsilon_t; \qquad -1 < \rho < 1 \]
where the $\epsilon_t$s are independently and identically distributed with mean zero and variance $\sigma^2$. With
an AR(1) error, if $\rho$ is positive then a positive value of $u_t$ will tend to be followed, with probability
greater than 0.5, by a positive $u_{t+1}$. With an ARCH error process, a disturbance $u_t$ of large absolute
value will tend to be followed by further large absolute values, but with no presumption that the
successive values will be of the same sign. ARCH in asset prices is a "stylized fact" and is consistent
with market efficiency; on the other hand autoregressive behavior of asset prices would violate
market efficiency.
One can test for ARCH of order q in the following way:
1. Estimate the model of interest via OLS and save the squared residuals, $\hat{u}_t^2$.
2. Perform an auxiliary regression in which the current squared residual is regressed on a
constant and q lags of itself.
3. Find the $TR^2$ value (sample size times unadjusted $R^2$) for the auxiliary regression.
4. Refer the $TR^2$ value to the $\chi^2$ distribution with q degrees of freedom, and if the p-value is
"small enough" reject the null hypothesis of homoskedasticity in favor of the alternative of
ARCH(q).
This test is implemented in gretl via the modtest command with the --arch option, which must
follow estimation of a time-series model by OLS (either a single-equation model or a VAR). For
example,
ols y 0 x
modtest 4 --arch
This example specifies an ARCH order of q = 4; if the order argument is omitted, q is set equal to
the periodicity of the data. In the graphical interface, the ARCH test is accessible from the "Tests"
menu in the model window (again, for single-equation OLS or VARs).
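The four-step procedure listed earlier in this section can be sketched in Python for the simplest case, q = 1, where the auxiliary regression has a single regressor and its R-squared is just a squared correlation. This is an illustration, not gretl's modtest code: the function name is hypothetical, the residuals are made up, and T is taken to be the number of observations used in the auxiliary regression.

```python
def arch_lm_order1(resid):
    """TR^2 statistic for ARCH(1): regress u_t^2 on a constant and u_{t-1}^2."""
    u2 = [u * u for u in resid]
    x, y = u2[:-1], u2[1:]          # lagged and current squared residuals
    T = len(y)                      # observations used in the auxiliary regression
    mx, my = sum(x) / T, sum(y) / T
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r2 = (sxy * sxy) / (sxx * syy)  # unadjusted R^2 of a one-regressor OLS
    return T * r2

CHI2_1_CRIT_5PCT = 3.841            # 5 percent critical value of chi-square(1)

# Illustrative residuals: a calm spell followed by a volatile one
resid = [1.0, -1.0] * 10 + [5.0, -5.0] * 10
stat = arch_lm_order1(resid)
print(f"TR^2 = {stat:.2f}; reject homoskedasticity at 5%: {stat > CHI2_1_CRIT_5PCT}")
```

The residuals alternate in sign throughout, so there is no first-order autocorrelation in levels to speak of, yet the squared residuals are strongly persistent. That is exactly the pattern the test is designed to pick up.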
GARCH
The simple ARCH(q) process is useful for introducing the general concept of conditional
heteroskedasticity in time series, but it has been found to be insufficient in empirical work. The
dynamics of the error variance permitted by ARCH(q) are not rich enough to represent the patterns
found in financial data. The generalized ARCH or GARCH model is now more widely used.
The representation of the variance of a process in the GARCH model is somewhat (but not exactly)
analogous to the ARMA representation of the level of a time series. The variance at time t is allowed
to depend on both past values of the variance and past values of the realized squared disturbance,
as shown in the following system of equations:
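\[ y_t = X_t\beta + u_t \tag{23.13} \]
\[ u_t = \sigma_t \epsilon_t \tag{23.14} \]
\[ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i u_{t-i}^2 + \sum_{j=1}^{p} \delta_j \sigma_{t-j}^2 \tag{23.15} \]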
As above, $\epsilon_t$ is an iid sequence with unit variance. $X_t$ is a matrix of regressors (or in the simplest
case, just a vector of 1s allowing for a non-zero mean of $y_t$). Note that if p = 0, GARCH collapses
to ARCH(q): the generalization is embodied in the $\delta_j$ terms that multiply previous values of the
error variance.
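Equation (23.15) is a recursion, and it can help to see it unrolled. The Python sketch below computes the conditional variances for given disturbances and parameters. It is an illustration only: the function name is hypothetical, and the use of the unconditional variance for pre-sample terms is an assumption of the sketch, not a description of what gretl does.

```python
def garch_variance(u, alpha0, alpha, delta, init=None):
    """Conditional variances from equation (23.15).

    u      : disturbances u_1, ..., u_T
    alpha  : [alpha_1, ..., alpha_q], coefficients on lagged squared disturbances
    delta  : [delta_1, ..., delta_p], coefficients on lagged variances
    init   : value used for pre-sample u^2 and sigma^2; defaults to the
             unconditional variance alpha0 / (1 - sum(alpha) - sum(delta)),
             which is an assumption of this sketch.
    """
    if init is None:
        init = alpha0 / (1.0 - sum(alpha) - sum(delta))
    sigma2 = []
    for t in range(len(u)):
        s = alpha0
        for i, a in enumerate(alpha, start=1):
            s += a * (u[t - i] ** 2 if t - i >= 0 else init)
        for j, d in enumerate(delta, start=1):
            s += d * (sigma2[t - j] if t - j >= 0 else init)
        sigma2.append(s)
    return sigma2

u = [1.0, -2.0, 0.5, 0.1]                      # illustrative disturbances
print(garch_variance(u, 0.1, [0.2], [0.7]))    # one alpha term, one delta term
print(garch_variance(u, 0.1, [0.2], []))       # no delta terms: plain ARCH(1)
```

Passing an empty list for delta reproduces the ARCH case, in line with the remark that GARCH collapses to ARCH(q) when p = 0.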
In principle the underlying innovation, $\epsilon_t$, could follow any suitable probability distribution, and
besides the obvious candidate of the normal or Gaussian distribution the Student's t distribution
has been used in this context. Currently gretl only handles the case where $\epsilon_t$ is assumed to be
Gaussian. However, when the --robust option to the garch command is given, the estimator gretl
uses for the covariance matrix can be considered Quasi-Maximum Likelihood even with non-normal
disturbances. See below for more on the options regarding the GARCH covariance matrix.
Example:
garch p q ; y const x
where \(p \ge 0\) and \(q > 0\) denote the respective lag orders as shown in equation (23.15). These
values can be supplied in numerical form or as the names of pre-defined scalar variables.
GARCH estimation
Estimation of the parameters of a GARCH model is by no means a straightforward task. (Consider
equation 23.15: the conditional variance at any point in time, \(\sigma_t^2\), depends on the conditional
variance in earlier periods, but \(\sigma_t^2\) is not observed, and must be inferred by some sort of
Maximum Likelihood procedure.) By default gretl uses native code that employs the BFGS maximizer;
you also have the option (activated by the --fcp command-line switch) of using the method proposed
by Fiorentini et al. (1996),[4] which was adopted as a benchmark in the study of GARCH results
by McCullough and Renfro (1998). It employs analytical first and second derivatives of the
log-likelihood, and uses a mixed-gradient algorithm, exploiting the information matrix in the early
iterations and then switching to the Hessian in the neighborhood of the maximum likelihood. (This
progress can be observed if you append the --verbose option to gretl's garch command.)
Several options are available for computing the covariance matrix of the parameter estimates in
connection with the garch command. At a first level, one can choose between a "standard" and a
"robust" estimator. By default, the Hessian is used unless the --robust option is given, in which
case the QML estimator is used. A finer choice is available via the set command, as shown in
Table 23.2.
Table 23.2: Options for the GARCH covariance matrix

command                  effect
set garch_vcv hessian    Use the Hessian
set garch_vcv im         Use the Information Matrix
set garch_vcv op         Use the Outer Product of the Gradient
set garch_vcv qml        QML estimator
set garch_vcv bw         Bollerslev–Wooldridge "sandwich" estimator
It is not uncommon, when one estimates a GARCH model for an arbitrary time series, to find that
the iterative calculation of the estimates fails to converge. For the GARCH model to make sense,
there are strong restrictions on the admissible parameter values, and it is not always the case
that there exists a set of values inside the admissible parameter space for which the likelihood is
maximized.
The restrictions in question can be explained by reference to the simplest (and much the most
common) instance of the GARCH model, where p = q = 1. In the GARCH(1, 1) model the conditional
variance is
\[ \sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \delta_1 \sigma_{t-1}^2 \tag{23.16} \]
Taking the unconditional expectation of (23.16) we get
\[ \sigma^2 = \alpha_0 + \alpha_1 \sigma^2 + \delta_1 \sigma^2 \]
so that
\[ \sigma^2 = \frac{\alpha_0}{1 - \alpha_1 - \delta_1} \]
For this unconditional variance to exist, we require that \(\alpha_1 + \delta_1 < 1\), and for it to be
positive we require that \(\alpha_0 > 0\).

[4] The algorithm is based on Fortran code deposited in the archive of the Journal of Applied
Econometrics by the authors, and is used by kind permission of Professor Fiorentini.
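The arithmetic behind these restrictions can be checked outside gretl. The following is a minimal sketch in plain Python (not gretl script); the function names and the parameter values are made up for illustration. It computes the unconditional variance when the two restrictions stated above hold, and confirms that iterating the expectation of (23.16) converges to the same value.

```python
def garch11_unconditional_variance(alpha0, alpha1, delta1):
    """Return alpha0 / (1 - alpha1 - delta1), or None when the
    restrictions alpha0 > 0 and alpha1 + delta1 < 1 do not hold."""
    if alpha0 <= 0.0 or alpha1 + delta1 >= 1.0:
        return None
    return alpha0 / (1.0 - alpha1 - delta1)


def iterate_expected_variance(alpha0, alpha1, delta1, start, steps):
    """Iterate s_t = alpha0 + (alpha1 + delta1) * s_{t-1}, the recursion
    obtained by taking expectations of (23.16)."""
    s = start
    for _ in range(steps):
        s = alpha0 + (alpha1 + delta1) * s
    return s


# Made-up parameter values, chosen only for illustration.
admissible = garch11_unconditional_variance(0.1, 0.15, 0.8)
limit = iterate_expected_variance(0.1, 0.15, 0.8, start=1.0, steps=500)
inadmissible = garch11_unconditional_variance(0.1, 0.3, 0.8)

print(admissible, limit, inadmissible)
```

With \(\alpha_1 + \delta_1 = 0.95\) the recursion settles near \(\alpha_0 / 0.05\); with \(\alpha_1 + \delta_1 = 1.1\) no finite unconditional variance exists and the sketch returns None.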
A common reason for non-convergence of GARCH estimates (that is, a common reason for the
non-existence of \(\alpha_i\) and \(\delta_i\) values that satisfy the above requirements and at the same time
maximize the likelihood of the data) is misspecification of the model. It is important to realize that
GARCH, in itself, allows only for time-varying volatility in the data. If the mean of the series in
question is not constant, or if the error process is not only heteroskedastic but also autoregressive,
it is necessary to take this into account when formulating an appropriate model. For example, it may
be necessary to take the first difference of the variable in question and/or to add suitable
regressors, \(X_t\), as in (23.13).
Chapter 24
Multivariate time series models
Gretl provides a standard set of procedures for dealing with the multivariate time-series models
known as VARs (Vector AutoRegression). More general models, such as VARMAs, nonlinear models
or multivariate GARCH models, are not provided as of now, although it is entirely possible
to estimate them by writing custom procedures in the gretl scripting language. In this chapter, we
will briefly review gretl's VAR toolbox.
24.1 Notation
A VAR is a structure whose aim is to model the time persistence of a vector of n time series, \(y_t\),
via a multivariate autoregression, as in
\[ y_t = A_1 y_{t-1} + A_2 y_{t-2} + \cdots + A_p y_{t-p} + B x_t + \epsilon_t \tag{24.1} \]
The number of lags p is called the order of the VAR. The vector \(x_t\), if present, contains a set of
exogenous variables, often including a constant, possibly with a time trend and seasonal dummies.
The vector \(\epsilon_t\) is typically assumed to be a vector white noise, with covariance matrix \(\Sigma\).
Equation (24.1) can be written more compactly as
\[ A(L) y_t = B x_t + \epsilon_t \tag{24.2} \]
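where A(L) is a matrix polynomial in the lag operator, or as
\[
\begin{bmatrix} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{bmatrix}
= A \begin{bmatrix} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix}
+ \begin{bmatrix} B \\ 0 \\ \vdots \\ 0 \end{bmatrix} x_t
+ \begin{bmatrix} \epsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\tag{24.3}
\]
The matrix A is known as the "companion matrix" and equals
\[
A = \begin{bmatrix}
A_1 & A_2 & \cdots & A_p \\
I & 0 & \cdots & 0 \\
0 & I & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots
\end{bmatrix}
\]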
Equation (24.3) is known as the companion form of the VAR.
Another representation of interest is the so-called "VMA representation", which is written in terms
of an infinite series of matrices \(\Theta_i\) defined as
\[ \Theta_i = \frac{\partial y_t}{\partial \epsilon_{t-i}} \tag{24.4} \]
The \(\Theta_i\) matrices may be derived by recursive substitution in equation (24.1): for example,
assuming for simplicity that B = 0 and p = 1, equation (24.1) would become
\[ y_t = A y_{t-1} + \epsilon_t \]
which could be rewritten as
\[ y_t = A^{n+1} y_{t-n-1} + \epsilon_t + A \epsilon_{t-1} + A^2 \epsilon_{t-2} + \cdots + A^n \epsilon_{t-n} \]
In this case \(\Theta_i = A^i\). In general, it is possible to compute \(\Theta_i\) as the \(n \times n\) north-west
block of the i-th power of the companion matrix A (so \(\Theta_0\) is always an identity matrix).
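To make the "north-west block" rule concrete, here is a small sketch in plain Python rather than gretl script. The helper names (matmul, companion, vma_matrices) and the VAR(2) coefficients are hypothetical; in gretl itself the $compan and $vma accessors described later provide these objects directly.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def companion(A_list):
    """Stack [A_1 ... A_p] over shifted identity blocks."""
    n, p = len(A_list[0]), len(A_list)
    rows = [[a for A in A_list for a in A[i]] for i in range(n)]
    for r in range(n, n * p):
        rows.append([1.0 if c == r - n else 0.0 for c in range(n * p)])
    return rows


def vma_matrices(A_list, horizon):
    """Return [Theta_0, ..., Theta_horizon] as north-west blocks of A^i."""
    n = len(A_list[0])
    comp = companion(A_list)
    size = len(comp)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    thetas = []
    for _ in range(horizon + 1):
        thetas.append([row[:n] for row in power[:n]])
        power = matmul(power, comp)
    return thetas


# Hypothetical bivariate VAR(2) coefficients, for illustration only.
A1 = [[0.5, 0.1], [0.0, 0.3]]
A2 = [[0.2, 0.0], [0.1, 0.1]]
thetas = vma_matrices([A1, A2], horizon=3)
for i, theta in enumerate(thetas):
    print(i, theta)
```

For a VAR(2) the blocks obey the recursion \(\Theta_i = A_1 \Theta_{i-1} + A_2 \Theta_{i-2}\), which gives an independent check on the companion-matrix route.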
The VAR is said to be stable if all the eigenvalues of the companion matrix A are smaller than 1
in absolute value, or equivalently, if the matrix polynomial A(L) in equation (24.2) is such that
\(|A(z)| = 0\) implies \(|z| > 1\). If this is the case, \(\lim_{n \to \infty} \Theta_n = 0\) and the vector \(y_t\) is
stationary; as a consequence, the equation
\[ y_t - E(y_t) = \sum_{i=0}^{\infty} \Theta_i \epsilon_{t-i} \tag{24.5} \]
is a legitimate Wold representation.
If the VAR is not stable, then the inferential procedures that are called for become somewhat more
specialized, except for some simple cases. In particular, if the number of eigenvalues of A with
modulus 1 is between 1 and \(n - 1\), the canonical tool to deal with these models is the cointegrated
VAR model, discussed in chapter 25.
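As a rough illustration of the stability condition, the sketch below (plain Python, not gretl script) checks the eigenvalue moduli for the special case of a bivariate VAR(1), where the companion matrix is just the 2 x 2 matrix \(A_1\) and the eigenvalues follow from the characteristic polynomial. The function names and example matrices are assumptions made for the example; a general VAR needs a proper eigenvalue routine.

```python
import cmath


def eigenvalues_2x2(M):
    """Roots of lambda^2 - trace * lambda + det = 0 for a 2 x 2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + disc) / 2.0, (tr - disc) / 2.0]


def is_stable(M):
    """True if every eigenvalue is smaller than 1 in absolute value."""
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(M))


def unit_modulus_count(M, tol=1e-12):
    """Number of eigenvalues whose modulus equals 1 (up to tol)."""
    return sum(1 for lam in eigenvalues_2x2(M) if abs(abs(lam) - 1.0) < tol)


stable = [[0.5, 0.1], [0.0, 0.3]]        # eigenvalues 0.5 and 0.3
complex_roots = [[0.5, -0.5], [0.5, 0.5]]  # eigenvalues 0.5 +/- 0.5i
one_unit_root = [[1.0, 0.0], [0.0, 0.5]]   # eigenvalues 1.0 and 0.5

print(is_stable(stable), is_stable(complex_roots), is_stable(one_unit_root))
print(unit_modulus_count(one_unit_root))
```

The third matrix has exactly one eigenvalue of modulus 1 with n = 2, which is the situation the text associates with the cointegrated VAR.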
24.2 Estimation
The gretl command for estimating a VAR is var which, in the command line interface, is invoked
in the following manner:
[ modelname <- ] var p Ylist [; Xlist]
where p is a scalar (the VAR order) and Ylist is a list of variables describing the content of \(y_t\).
If the list Xlist is absent, the vector \(x_t\) is understood to contain a constant only; if present, it
must be separated from Ylist by a semicolon and contains the other exogenous variables. Note,
however, that a few common choices can be obtained in a simpler way via options: the options gretl
provides are --trend, --seasonals and --nc (no constant). Both Ylist and Xlist may be named lists
(see section 12.1). The "<-" construct can be used to store the model under a name (see section
3.2), if so desired. To estimate a VAR using the graphical interface, choose "Time Series, Vector
Autoregression", under the Model menu.
The parameters in eq. (24.1) are typically free from restrictions, which implies that multivariate
OLS provides a consistent and asymptotically efficient estimator of all the parameters.[1] Given
the simplicity of OLS, this is what every software package, including gretl, uses: example script
24.1 exemplifies the fact that the var command gives you exactly the output you would have
from a battery of OLS regressions. The advantage of using the dedicated command is that, after
estimation is done, it makes it much easier to access certain quantities and manage certain tasks.
For example, the $coeff accessor returns the estimated coefficients as a matrix with n columns
and $sigma returns an estimate of the matrix \(\Sigma\), the covariance matrix of \(\epsilon_t\).
Moreover, for each variable in the system an F test is automatically performed, in which the null
hypothesis is that no lags of variable j are significant in the equation for variable i. This is
commonly known as a Granger causality test.
In addition, two accessors become available for the companion matrix ($compan) and the VMA
representation ($vma). The latter deserves a detailed description: since the VMA representation (24.5)
is of infinite order, gretl defines a horizon up to which the \(\Theta_i\) matrices are computed
automatically. By default, this is a function of the periodicity of the data (see table 24.1), but it
can be set by the user to any desired value via the set command with the horizon parameter, as in
[1] In fact, under normality of \(\epsilon_t\) OLS is indeed the conditional ML estimator. You may want to
use other methods if you need to estimate a VAR in which some parameters are constrained.
Example 24.1: Estimation of a VAR via OLS
Input:
open sw_ch14.gdt
genr infl = 400*sdiff(log(PUNEW))
scalar p = 2
list X = LHUR infl
list Xlag = lags(p,X)
loop foreach i X
ols $i const Xlag
end loop
var p X
Output (selected portions):
Model 1: OLS, using observations 1960:3-1999:4 (T = 158)
Dependent variable: LHUR
coefficient std. error t-ratio p-value
--------------------------------------------------------
const 0.113673 0.0875210 1.299 0.1960
LHUR_1 1.54297 0.0680518 22.67 8.78e-51 ***
LHUR_2 -0.583104 0.0645879 -9.028 7.00e-16 ***
infl_1 0.0219040 0.00874581 2.505 0.0133 **
infl_2 -0.0148408 0.00920536 -1.612 0.1090
Mean dependent var 6.019198 S.D. dependent var 1.502549
Sum squared resid 8.654176 S.E. of regression 0.237830
...
VAR system, lag order 2
OLS estimates, observations 1960:3-1999:4 (T = 158)
Log-likelihood = -322.73663
Determinant of covariance matrix = 0.20382769
AIC = 4.2119
BIC = 4.4057
HQC = 4.2906
Portmanteau test: LB(39) = 226.984, df = 148 [0.0000]
Equation 1: LHUR
coefficient std. error t-ratio p-value
--------------------------------------------------------
const 0.113673 0.0875210 1.299 0.1960
LHUR_1 1.54297 0.0680518 22.67 8.78e-51 ***
LHUR_2 -0.583104 0.0645879 -9.028 7.00e-16 ***
infl_1 0.0219040 0.00874581 2.505 0.0133 **
infl_2 -0.0148408 0.00920536 -1.612 0.1090
Mean dependent var 6.019198 S.D. dependent var 1.502549
Sum squared resid 8.654176 S.E. of regression 0.237830
Periodicity        horizon
Quarterly          20 (5 years)
Monthly            24 (2 years)
Daily              3 weeks
All other cases    10

Table 24.1: VMA horizon as a function of the dataset periodicity
set horizon 30
Calling the horizon h, the $vma accessor returns an \((h+1) \times n^2\) matrix, in which the (i+1)-th
row is the vectorized form of \(\Theta_i\).
VAR order selection
In order to help the user choose the most appropriate VAR order, gretl provides a special syntax
construct to the var command:
var p Ylist [; Xlist] --lagselect
When the command is invoked with the --lagselect option, estimation is performed for all lags
up to p and a table is printed: it displays, for each order, an LR test for the order p versus p − 1,
plus an array of information criteria (see chapter 21). For each information criterion in the table, a
star indicates what appears to be the "best" choice. The same output can be obtained through the
graphical interface via the "Time Series, VAR lag selection" entry under the Model menu.
Warning: in finite samples the choice of p may affect the outcome of the procedure. This is not a
bug, but rather a nasty but unavoidable side effect of the way these comparisons should be made:
if your sample contains T observations, the lag selection procedure, if invoked with parameter p,
examines all VARs of order ranging from 1 to p, estimated on a sample of T − p observations. In
other words, the comparison procedure does not use all the data available when estimating VARs
of order less than p, to make sure that all the models compared are estimated on the same data
range. Under these circumstances, choosing a different value of p may alter the results, although
this is unlikely to happen if your sample size is reasonably large.
An example of this unpleasant phenomenon is given in example script 24.2. As can be seen,
according to the Hannan-Quinn criterion, order 2 seems preferable to order 1 if the maximum tested
order is 4, but the situation is reversed if the maximum tested order is 6.
24.3 Structural VARs
As of today, gretl does not provide a native implementation for the class of models known as
"Structural VARs"; however, it provides an implementation of the Cholesky decomposition-based
approach, which is the most classic, and certainly the most popular, SVAR version.
IRF and FEVD
Assume that the disturbance in equation (24.1) can be thought of as a linear function of a vector
of structural shocks \(u_t\), which are assumed to have unit variance and to be uncorrelated with one
another, so \(V(u_t) = I\). If \(\epsilon_t = K u_t\), it follows that \(\Sigma = V(\epsilon_t) = K K'\).
The main object of interest in this setting is the sequence of matrices
\[ C_k = \frac{\partial y_t}{\partial u_{t-k}} = \Theta_k K, \tag{24.6} \]
Example 24.2: VAR lag selection via Information Criteria
Input:
open denmark
list Y = 1 2 3 4
var 4 Y --lagselect
var 6 Y --lagselect
Output (selected portions):
VAR system, maximum lag order 4
The asterisks below indicate the best (that is, minimized) values
of the respective information criteria, AIC = Akaike criterion,
BIC = Schwarz Bayesian criterion and HQC = Hannan-Quinn criterion.
lags loglik p(LR) AIC BIC HQC
1 609.15315 -23.104045 -22.346466* -22.814552
2 631.70153 0.00013 -23.360844* -21.997203 -22.839757*
3 642.38574 0.16478 -23.152382 -21.182677 -22.399699
4 653.22564 0.15383 -22.950025 -20.374257 -21.965748
VAR system, maximum lag order 6
The asterisks below indicate the best (that is, minimized) values
of the respective information criteria, AIC = Akaike criterion,
BIC = Schwarz Bayesian criterion and HQC = Hannan-Quinn criterion.
lags loglik p(LR) AIC BIC HQC
1 594.38410 -23.444249 -22.672078* -23.151288*
2 615.43480 0.00038 -23.650400* -22.260491 -23.123070
3 624.97613 0.26440 -23.386781 -21.379135 -22.625083
4 636.03766 0.13926 -23.185210 -20.559827 -22.189144
5 658.36014 0.00016 -23.443271 -20.200150 -22.212836
6 669.88472 0.11243 -23.260601 -19.399743 -21.795797
known as the structural VMA representation. From the \(C_k\) matrices defined in equation (24.6) two
quantities of interest may be derived: the Impulse Response Function (IRF) and the Forecast Error
Variance Decomposition (FEVD).
The IRF of variable i to shock j is simply the sequence of the elements in row i and column j of
the \(C_k\) matrices. In formulae:
\[ \mathcal{I}_{i,j,k} = \frac{\partial y_{i,t}}{\partial u_{j,t-k}} \]
As a rule, Impulse Response Functions are plotted as a function of k, and are interpreted as the
effect that a shock has on an observable variable through time. Of course, what we observe are the
estimated IRFs, so it is natural to endow them with confidence intervals: following common practice
among econometric software, gretl computes the confidence intervals by using the bootstrap;[2]
details are given later in this section.
Another quantity of interest that may be computed from the structural VMA representation is the
Forecast Error Variance Decomposition (FEVD). The forecast error variance after h steps is given by
\[ \Omega_h = \sum_{k=0}^{h} C_k C_k' \]
hence the variance for variable i is
\[ \omega_i^2 = [\Omega_h]_{i,i} = \sum_{k=0}^{h} \mathrm{diag}(C_k C_k')_i = \sum_{k=0}^{h} \sum_{l=1}^{n} ({}_k c_{i.l})^2 \]
where \({}_k c_{i.l}\) is, trivially, the i, l element of \(C_k\). As a consequence, the share of uncertainty
on variable i that can be attributed to the j-th shock after h periods equals
\[ \mathcal{VD}_{i,j,h} = \frac{\sum_{k=0}^{h} ({}_k c_{i.j})^2}{\sum_{k=0}^{h} \sum_{l=1}^{n} ({}_k c_{i.l})^2} . \]
This makes it possible to quantify which shocks are most important in determining a certain variable
in the short and/or in the long run.
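The share formula can be traced through by hand on a tiny example. The following plain-Python sketch (not gretl script) takes a list of structural VMA matrices \(C_0, \ldots, C_h\) and returns the table of shares; the function name fevd_shares and the two matrices are invented for the example and do not come from any dataset.

```python
def fevd_shares(C, h):
    """Return an n x n table: entry [i][j] is the share of the h-step
    forecast error variance of variable i attributed to shock j."""
    n = len(C[0])
    shares = []
    for i in range(n):
        # Sum of squared responses of variable i to shock j, k = 0..h.
        contrib = [sum(C[k][i][j] ** 2 for k in range(h + 1)) for j in range(n)]
        total = sum(contrib)
        shares.append([c / total for c in contrib])
    return shares


# Made-up structural VMA matrices C_0 and C_1 (C_0 is lower triangular).
C = [
    [[1.0, 0.0], [0.5, 2.0]],
    [[0.5, 0.1], [0.2, 0.4]],
]
print(fevd_shares(C, 0))
print(fevd_shares(C, 1))
```

Each row of the result sums to one by construction. With a lower-triangular \(C_0\), all of the impact-period variance of the first variable is attributed to the first shock.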
Triangularization
Formula (24.6) takes K as known, while of course it has to be estimated. The estimation problem
has been the subject of an enormous body of literature that we will not even attempt to summarize
here: see for example Lütkepohl (2005, chapter 9).
Suffice it to say that the most popular choice dates back to Sims (1980), and consists in assuming
that K is lower triangular, so its estimate is simply the Cholesky decomposition of the estimate of
\(\Sigma\). The main consequence of this choice is that the ordering of variables within the vector \(y_t\)
becomes meaningful: since K is also the matrix of Impulse Response Functions at lag 0, the
triangularity assumption means that the first variable in the ordering responds instantaneously only
to shock number 1, the second one only to shocks 1 and 2, and so forth. For this reason, each
variable is thought to "own" one shock: variable 1 "owns" shock number 1, etcetera.
This is the reason why in this sort of exercise the ordering of the variables is important and
the applied literature has developed the "most exogenous first" mantra, where, in this setting,
"exogenous" really means "instantaneously insensitive to structural shocks".[3] To put it differently,
[2] It is possible, in principle, to compute analytical confidence intervals via an asymptotic
approximation, but this is not a very popular choice: asymptotic formulae are known to often give a
very poor approximation of the finite-sample properties.
[3] The word "exogenous" has caught on in this context, but it's a rather unfortunate choice: for a
start, each shock impacts on every variable after one lag, so nothing is really exogenous here. A
much better choice of words would probably have been something like "sturdy", but it's too late now.
if variable foo comes before variable bar in the Y list, it follows that the shock owned by foo
affects bar instantaneously, but the reverse does not happen.
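The role of the ordering can be seen on a two-variable covariance matrix. The sketch below is plain Python (not gretl script, where the cholesky() function does this job); the helper name cholesky_lower and the covariance values are made up. It factors the same covariances under both orderings of foo and bar.

```python
import math


def cholesky_lower(S):
    """Lower-triangular K with K K' = S, for symmetric positive definite S."""
    n = len(S)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(K[i][k] * K[j][k] for k in range(j))
            K[i][j] = math.sqrt(s) if i == j else s / K[j][j]
    return K


# Made-up covariance matrix for (foo, bar), in that order.
sigma_foo_first = [[4.0, 2.0], [2.0, 5.0]]
# Same covariances with the ordering reversed to (bar, foo).
sigma_bar_first = [[5.0, 2.0], [2.0, 4.0]]

K1 = cholesky_lower(sigma_foo_first)
K2 = cholesky_lower(sigma_bar_first)
print(K1)   # impact responses with foo ordered first
print(K2)   # impact responses with bar ordered first
```

In both factors the top-right element is zero, so whichever variable comes first does not react on impact to the other variable's shock. The two orderings therefore imply different impact responses even though the covariance matrix is the same.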
Impulse Response Functions and the FEVD can be printed out via the command line interface by
using the --impulse-responses and --variance-decomp options, respectively. If you need to store
them into matrices, you can compute the structural VMA and proceed from there. For example, the
following code snippet shows you how to compute a matrix containing the IRFs:
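open denmark
list Y = 1 2 3 4
scalar n = nelem(Y)
# estimate the VAR
var 2 Y --quiet --impulse
# K: lower-triangular Cholesky factor of the estimated covariance matrix
matrix K = cholesky($sigma)
# V: each row holds the vectorized VMA matrix for one horizon
matrix V = $vma
# ** is the Kronecker product; each row of IRF is the vectorized C_k
matrix IRF = V * (K ** I(n))
print IRF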
in which the equality
\[ \mathrm{vec}(C_k) = \mathrm{vec}(\Theta_k K) = (K' \otimes I)\,\mathrm{vec}(\Theta_k) \]
was used.
FIXME: show all the nice stuff we have under the GUI.
IRF bootstrap
FIXME: todo
Chapter 25
Cointegration and Vector Error Correction Models
25.1 Introduction
The twin concepts of cointegration and error correction have drawn a good deal of attention in
macroeconometrics over recent years. The attraction of the Vector Error Correction Model (VECM)
is that it allows the researcher to embed a representation of economic equilibrium relationships
within a relatively rich time-series specification. This approach overcomes the old dichotomy
between (a) structural models that faithfully represented macroeconomic theory but failed to fit the
data, and (b) time-series models that were accurately tailored to the data but difficult if not
impossible to interpret in economic terms.
The basic idea of cointegration relates closely to the concept of unit roots (see section 23.3).
Suppose we have a set of macroeconomic variables of interest, and we find we cannot reject the
hypothesis that some of these variables, considered individually, are non-stationary. Specifically,
suppose we judge that a subset of the variables are individually integrated of order 1, or I(1). That
is, while they are non-stationary in their levels, their first differences are stationary. Given the
statistical problems associated with the analysis of non-stationary data (for example, the threat of
spurious regression), the traditional approach in this case was to take first differences of all the
variables before proceeding with the analysis.
But this can result in the loss of important information. It may be that while the variables in
question are I(1) when taken individually, there exists a linear combination of the variables that
is stationary without differencing, or I(0). (There could be more than one such linear
combination.) That is, while the ensemble of variables may be "free to wander" over time, nonetheless
the variables are "tied together" in certain ways. And it may be possible to interpret these ties, or
cointegrating vectors, as representing equilibrium conditions.
For example, suppose we find some or all of the following variables are I(1): money stock, M, the
price level, P, the nominal interest rate, R, and output, Y. According to standard theories of the
demand for money, we would nonetheless expect there to be an equilibrium relationship between
real balances, interest rate and output; for example
\[ m - p = \gamma_0 + \gamma_1 y + \gamma_2 r, \qquad \gamma_1 > 0, \; \gamma_2 < 0 \]
where lower-case variable names denote logs. In equilibrium, then,
\[ m - p - \gamma_1 y - \gamma_2 r = \gamma_0 \]
Realistically, we should not expect this condition to be satisfied each period. We need to allow for
the possibility of short-run disequilibrium. But if the system moves back towards equilibrium
following a disturbance, it follows that the vector \(x = (m, p, y, r)'\) [...]
\(\beta' = (\beta_1, \beta_2, \beta_3, \beta_4)\), such that
[...]
\[ \Delta y_t = \mu_t + \Pi y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t , \tag{25.2} \]
where \(\Pi = \sum_{i=1}^{p} A_i - I\) and \(\Gamma_i = -\sum_{j=i+1}^{p} A_j\). This is the VECM representation of (25.1).
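The mapping from VAR coefficients to \(\Pi\) and \(\Gamma_i\) is pure algebra and can be verified numerically. The sketch below is plain Python, not gretl script; the names vecm_from_var and matvec, the VAR(2) coefficients and the two past observations are all hypothetical, and the deterministic term and the disturbance are set to zero. It checks that the levels form and the error-correction form give the same one-step value.

```python
def vecm_from_var(A_list):
    """Return (Pi, [Gamma_1, ..., Gamma_{p-1}]) for VAR matrices A_1..A_p."""
    n, p = len(A_list[0]), len(A_list)
    # Pi = A_1 + ... + A_p - I
    Pi = [[sum(A[i][j] for A in A_list) - (1.0 if i == j else 0.0)
           for j in range(n)] for i in range(n)]
    # Gamma_m = -(A_{m+1} + ... + A_p); A_list[m:] starts at A_{m+1}
    Gammas = [[[-sum(A[i][j] for A in A_list[m:]) for j in range(n)]
               for i in range(n)] for m in range(1, p)]
    return Pi, Gammas


def matvec(M, v):
    """Matrix-vector product for lists of rows."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


# Hypothetical VAR(2) coefficients and two past observations.
A1 = [[0.5, 0.1], [0.0, 0.3]]
A2 = [[0.2, 0.0], [0.1, 0.1]]
y_lag1, y_lag2 = [1.0, 2.0], [0.5, 1.5]

Pi, Gammas = vecm_from_var([A1, A2])

# One-step value from the VAR in levels.
levels = [a + b for a, b in zip(matvec(A1, y_lag1), matvec(A2, y_lag2))]

# Same value from the VECM form: y_{t-1} + Pi y_{t-1} + Gamma_1 dy_{t-1}.
dy_lag1 = [a - b for a, b in zip(y_lag1, y_lag2)]
dy = [a + b for a, b in zip(matvec(Pi, y_lag1), matvec(Gammas[0], dy_lag1))]
vecm = [a + b for a, b in zip(y_lag1, dy)]

print(levels, vecm)
```

The two vectors agree up to floating-point rounding, which is all the "VECM representation" claims: it is a reparametrization of the same VAR, not a different model.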
The interpretation of (25.2) depends crucially on r, the rank of the matrix \(\Pi\).
• If r = 0, the processes are all I(1) and not cointegrated.
• If r = n, then \(\Pi\) is invertible and the processes are all I(0).
• Cointegration occurs in between, when 0 < r < n and \(\Pi\) can be written as \(\alpha\beta'\). In this
case, \(y_t\) is I(1), but the combination \(z_t = \beta' y_t\) is I(0). If, for example, r = 1 and the first
element of \(\beta\) was −1, then one could write \(z_t = -y_{1,t} + \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t}\), which
is equivalent to saying that
\[ y_{1,t} = \beta_2 y_{2,t} + \cdots + \beta_n y_{n,t} - z_t \]
is a long-run equilibrium relationship: the deviations \(z_t\) may not be 0 but they are stationary.
In this case, (25.2) can be written as
\[ \Delta y_t = \mu_t + \alpha \beta' y_{t-1} + \sum_{i=1}^{p-1} \Gamma_i \Delta y_{t-i} + \epsilon_t . \tag{25.3} \]
If \(\beta\) were known, then \(z_t\) would be observable and all the remaining parameters could be
estimated via OLS. In practice, the procedure estimates \(\beta\) first and then the rest.
The rank of \(\Pi\) is investigated by computing the eigenvalues of a closely related matrix whose rank
is the same as that of \(\Pi\): however, this matrix is by construction symmetric and positive
semidefinite. As a consequence, all its eigenvalues are real and non-negative, and tests on the rank
of \(\Pi\) can therefore be carried out by testing how many eigenvalues are 0.
If all the eigenvalues are significantly different from 0, then all the processes are stationary. If,
on the contrary, there is at least one zero eigenvalue, then the \(y_t\) process is integrated, although
some linear combination \(\beta' y_t\) might be stationary. At the other extreme, if no eigenvalues are
significantly different from 0, then not only is the process \(y_t\) non-stationary, but the same holds
for any linear combination \(\beta' y_t\); in other words, no cointegration occurs.
Estimation typically proceeds in two stages: first, a sequence of tests is run to determine r, the
cointegration rank. Then, for a given rank the parameters in equation (25.3) are estimated. The two
commands that gretl offers for estimating these systems are coint2 and vecm, respectively.
The syntax for coint2 is
coint2 p ylist [ ; xlist [ ; zlist ] ]
where p is the number of lags in (25.1); ylist is a list containing the \(y_t\) variables; xlist is an
optional list of exogenous variables; and zlist is another optional list of exogenous variables
whose effects are assumed to be confined to the cointegrating relationships.
The syntax for vecm is
vecm p r ylist [ ; xlist [ ; zlist ] ]
where p is the number of lags in (25.1); r is the cointegration rank; and the lists ylist, xlist and
zlist have the same interpretation as in coint2.
Both commands can be given specific options to handle the treatment of the deterministic
component \(\mu_t\). These are discussed in the following section.
25.3 Interpretation of the deterministic components
Statistical inference in the context of a cointegrated system depends on the hypotheses one is
willing to make on the deterministic terms, which leads to the famous "five cases."
In equation (25.2), the term \(\mu_t\) is usually understood to take the form
\[ \mu_t = \mu_0 + \mu_1 t . \]
In order to have the model mimic as closely as possible the features of the observed data, there is a
preliminary question to settle. Do the data appear to follow a deterministic trend? If so, is it linear
or quadratic?
Once this is established, one should impose restrictions on \(\mu_0\) and \(\mu_1\) that are consistent with
this judgement. For example, suppose that the data do not exhibit a discernible trend. This means
that \(\Delta y_t\) is on average zero, so it is reasonable to assume that its expected value is also zero.
Write equation (25.2) as
\[ \Gamma(L) \Delta y_t = \mu_0 + \mu_1 t + \alpha z_{t-1} + \epsilon_t , \tag{25.4} \]
where \(z_t = \beta' y_t\) is assumed to be stationary and therefore to possess finite moments. Taking
unconditional expectations, we get
\[ 0 = \mu_0 + \mu_1 t + \alpha m_z . \]
Since the left-hand side does not depend on t, the restriction \(\mu_1 = 0\) is a safe bet. As for \(\mu_0\),
there are just two ways to make the above expression true: either \(\mu_0 = 0\) with \(m_z = 0\), or \(\mu_0\)
equals \(-\alpha m_z\). The latter possibility is less restrictive in that the vector \(\mu_0\) may be non-zero,
but is constrained to be a linear combination of the columns of \(\alpha\). In that case, \(\mu_0\) can be
written as \(\alpha \cdot c\), and one may write (25.4) as
\[ \Gamma(L) \Delta y_t = \alpha \begin{bmatrix} \beta' & c \end{bmatrix} \begin{bmatrix} y_{t-1} \\ 1 \end{bmatrix} + \epsilon_t . \]
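The long-run relationship therefore contains an intercept. This type of restriction is usually written
\[ \alpha_\perp' \mu_0 = 0 , \]
where [...]
or in VECM form
\[
\begin{bmatrix} \Delta y_t \\ \Delta x_t \end{bmatrix}
= \begin{bmatrix} k + m \\ m \end{bmatrix}
+ \begin{bmatrix} -1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}
+ \begin{bmatrix} u_t + \epsilon_t \\ \epsilon_t \end{bmatrix}
\]
\[
= \begin{bmatrix} k + m \\ m \end{bmatrix}
+ \begin{bmatrix} -1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix}
+ \begin{bmatrix} u_t + \epsilon_t \\ \epsilon_t \end{bmatrix}
\]
\[
= \mu_0 + \alpha \beta' \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \eta_t
= \mu_0 + \alpha z_{t-1} + \eta_t ,
\]
where \(\beta\) is the cointegration vector and \(\alpha\) is the "loadings" or "adjustments" vector.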
We are now ready to consider three possible cases:
1. \(m \ne 0\): In this case \(x_t\) is trended, as we just saw; it follows that \(y_t\) also follows a linear
trend because on average it keeps at a fixed distance k from \(x_t\). The vector \(\mu_0\) is unrestricted.
2. \(m = 0\) and \(k \ne 0\): In this case, \(x_t\) is not trended and as a consequence neither is \(y_t\).
However, the mean distance between \(y_t\) and \(x_t\) is non-zero. The vector \(\mu_0\) is given by
\[ \mu_0 = \begin{bmatrix} k \\ 0 \end{bmatrix} \]
which is not null and therefore the VECM shown above does have a constant term. The
constant, however, is subject to the restriction that its second element must be 0. More
generally, \(\mu_0\) is a multiple of the vector \(\alpha\). Note that the VECM could also be written as
\[
\begin{bmatrix} \Delta y_t \\ \Delta x_t \end{bmatrix}
= \begin{bmatrix} -1 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & -1 & -k \end{bmatrix}
\begin{bmatrix} y_{t-1} \\ x_{t-1} \\ 1 \end{bmatrix}
+ \begin{bmatrix} u_t + \epsilon_t \\ \epsilon_t \end{bmatrix}
\]
which incorporates the intercept into the cointegration vector. This is known as the "restricted
constant" case.
3. \(m = 0\) and \(k = 0\): This case is the most restrictive: clearly, neither \(x_t\) nor \(y_t\) is trended,
and the mean distance between them is zero. The vector \(\mu_0\) is also 0, which explains why this case
is referred to as "no constant."
In most cases, the choice between these three possibilities is based on a mix of empirical
observation and economic reasoning. If the variables under consideration seem to follow a linear trend
then we should not place any restriction on the intercept. Otherwise, the question arises of whether
it makes sense to specify a cointegration relationship which includes a non-zero intercept. One
example where this is appropriate is the relationship between two interest rates: generally these are
not trended, but the VAR might still have an intercept because the difference between the two (the
"interest rate spread") might be stationary around a non-zero mean (for example, because of a risk
or liquidity premium).
The previous example can be generalized in three directions:
1. If a VAR of order greater than 1 is considered, the algebra gets more convoluted but the
conclusions are identical.
2. If the VAR includes more than two endogenous variables the cointegration rank r can be
greater than 1. In this case, \(\alpha\) is a matrix with r columns, and the case with restricted constant
entails the restriction that \(\mu_0\) should be some linear combination of the columns of \(\alpha\).
3. If a linear trend is included in the model, the deterministic part of the VAR becomes
\(\mu_0 + \mu_1 t\). The reasoning is practically the same as above except that the focus now centers on
\(\mu_1\) rather than \(\mu_0\). The counterpart to the "restricted constant" case discussed above is a
"restricted trend" case, such that the cointegration relationships include a trend but the first
differences of the variables in question do not. In the case of an unrestricted trend, the trend
appears in both the cointegration relationships and the first differences, which corresponds to the
presence of a quadratic trend in the variables themselves (in levels).
In order to accommodate the five cases, gretl provides the following options to the coint2 and
vecm commands:

\mu_t                                         option flag   description
0                                             --nc          no constant
\mu_0, \alpha_\perp' \mu_0 = 0                --rc          restricted constant
\mu_0                                         --uc          unrestricted constant
\mu_0 + \mu_1 t, \alpha_\perp' \mu_1 = 0      --crt         constant + restricted trend
\mu_0 + \mu_1 t                               --ct          constant + unrestricted trend

Note that for these commands the above options are mutually exclusive. In addition, you have the
option of using the --seasonals option, for augmenting \(\mu_t\) with centered seasonal dummies. In
each case, p-values are computed via the approximations devised by Doornik (1998).
25.4 The Johansen cointegration tests
The two Johansen tests for cointegration are used to establish the rank of \(\Pi\); in other words, how
many cointegration vectors the system has. These are the "λ-max" test, for hypotheses on
individual eigenvalues, and the "trace" test, for joint hypotheses. Suppose that the eigenvalues
\(\lambda_i\) are sorted from largest to smallest. The null hypothesis for the "λ-max" test on the i-th
eigenvalue is that \(\lambda_i = 0\). The corresponding trace test, instead, considers the hypothesis
\(\lambda_j = 0\) for all \(j \ge i\).
The gretl command coint2 performs these two tests. The corresponding menu entry in the GUI is
"Model, Time Series, Cointegration Test, Johansen".
As in the ADF test, the asymptotic distribution of the tests varies with the deterministic component
\(\mu_t\) one includes in the VAR (see section 25.3 above). The following code uses the denmark data
file, supplied with gretl, to replicate Johansen's example found in his 1995 book.
open denmark
coint2 2 LRM LRY IBO IDE --rc --seasonals
In this case, the vector \(y_t\) in equation (25.2) comprises the four variables LRM, LRY, IBO, IDE. The
number of lags equals p in (25.2) (that is, the number of lags of the model written in VAR form).
Part of the output is reported below:
Johansen test:
Number of equations = 4
Lag order = 2
Estimation period: 1974:3 - 1987:3 (T = 53)
Case 2: Restricted constant
Rank Eigenvalue Trace test p-value Lmax test p-value
0 0.43317 49.144 [0.1284] 30.087 [0.0286]
1 0.17758 19.057 [0.7833] 10.362 [0.8017]
2 0.11279 8.6950 [0.7645] 6.3427 [0.7483]
3 0.043411 2.3522 [0.7088] 2.3522 [0.7076]
Both the trace and λ-max tests fail to reject the null hypothesis that the smallest eigenvalue is 0
(see the last row of the table), so we may conclude that the series are in fact non-stationary.
However, some linear combination may be I(0), since the λ-max test rejects the hypothesis that the
rank of \(\Pi\) is 0 (though the trace test gives less clear-cut evidence for this, with a p-value of 0.1284).
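The two columns of test statistics are tied to the eigenvalues by the standard Johansen formulas: the λ-max statistic is \(-T \ln(1 - \lambda_i)\) and the trace statistic is the sum of these terms from the i-th eigenvalue onward. The plain-Python sketch below (not gretl script) recomputes them from the eigenvalues and T printed above, and should reproduce the table up to the rounding of the printed eigenvalues. The helper first_rank_not_rejected and the 5 percent level encode a common sequential reading of the table; that convention is an assumption of the sketch, not something the text prescribes.

```python
import math

T = 53
eigenvalues = [0.43317, 0.17758, 0.11279, 0.043411]

# lambda-max statistic for each row: -T * ln(1 - lambda)
lmax = [-T * math.log(1.0 - lam) for lam in eigenvalues]
# trace statistic for each row: sum of the lambda-max statistics from that row on
trace = [sum(lmax[i:]) for i in range(len(lmax))]


def first_rank_not_rejected(pvalues, level=0.05):
    """Smallest rank whose null hypothesis is not rejected at `level`."""
    for rank, p in enumerate(pvalues):
        if p >= level:
            return rank
    return len(pvalues)


# p-values copied from the table above.
trace_p = [0.1284, 0.7833, 0.7645, 0.7088]
lmax_p = [0.0286, 0.8017, 0.7483, 0.7076]

for rank in range(4):
    print(rank, round(trace[rank], 3), round(lmax[rank], 3))
print(first_rank_not_rejected(trace_p), first_rank_not_rejected(lmax_p))
```

Read this way, the trace test stops at rank 0 and the λ-max test at rank 1, which is the disagreement the paragraph above describes.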
25.5 Identification of the cointegration vectors
The core problem in the estimation of equation (25.2) is to find an estimate of \(\Pi\) that has by
construction rank r, so it can be written as \(\Pi = \alpha\beta'\) [...] if \(\Pi = \alpha_0 \beta_0'\) for specific
matrices \(\alpha_0\) and \(\beta_0\), then \(\Pi\) also equals \((\alpha_0 Q)(Q^{-1} \beta_0')\) for any conformable
non-singular matrix Q. In order to find a unique solution, it is therefore necessary to impose
some restrictions on \(\alpha\) and/or \(\beta\). It can be shown that the minimum number of restrictions that
is necessary to guarantee identification is \(r^2\). Normalizing one coefficient per column to 1 (or −1,
according to taste) is a trivial first step, which also helps in that the remaining coefficients can be
interpreted as the parameters in the equilibrium relations, but this only suffices when r = 1.
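The method that gretl uses by default is known as the "Phillips normalization", or "triangular
representation".[1] The starting point is writing $\beta$ in partitioned form as in

$$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix},$$

where $\beta_1$ is an $r \times r$ matrix and $\beta_2$ is $(n - r) \times r$. Assuming that $\beta_1$ has
full rank, $\beta$ can be post-multiplied by $\beta_1^{-1}$, giving

$$\hat\beta = \begin{bmatrix} I \\ \beta_2\beta_1^{-1} \end{bmatrix} = \begin{bmatrix} I \\ -B \end{bmatrix}.$$

The coefficients that gretl produces are $\hat\beta$, with $B$ known as the matrix of unrestricted
coefficients.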
In terms of the underlying equilibrium relationship, the Phillips normalization expresses the system
of $r$ equilibrium relations as

$$
\begin{aligned}
y_{1,t} &= b_{1,r+1}\, y_{r+1,t} + \ldots + b_{1,n}\, y_{n,t} \\
y_{2,t} &= b_{2,r+1}\, y_{r+1,t} + \ldots + b_{2,n}\, y_{n,t} \\
        &\;\;\vdots \\
y_{r,t} &= b_{r,r+1}\, y_{r+1,t} + \ldots + b_{r,n}\, y_{n,t}
\end{aligned}
$$

where the first $r$ variables are expressed as functions of the remaining $n - r$.

[1] For comparison with other studies, you may wish to normalize $\beta$ differently. Using the set
command you can do set vecm_norm diag to select a normalization that simply scales the columns of the
original $\beta$ such that $\beta_{ij} = 1$ for $i = j$ and $i \leq r$, as used in the empirical section of
Boswijk and Doornik (2004). Another alternative is set vecm_norm first, which scales $\beta$ such that
the elements on the first row equal 1. To suppress normalization altogether, use set vecm_norm none.
(To return to the default: set vecm_norm phillips.)
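The algebra of the Phillips normalization can be tried out on a small matrix. The sketch below is
only an illustration: the unnormalized beta is a made-up 4 x 2 matrix (n = 4, r = 2), and the names
beta1, bhat and B are ours, not gretl accessors.

```hansl
# hypothetical unnormalized beta: n = 4 variables, cointegration rank r = 2
matrix beta = {2, 1; 1, 3; 4, -2; -6, 5}
scalar n = rows(beta)
scalar r = cols(beta)
scalar k = r + 1
matrix beta1 = beta[1:r,]          # top r x r block, assumed non-singular
matrix bhat = beta * inv(beta1)    # Phillips-normalized beta
matrix B = -bhat[k:n,]             # matrix of unrestricted coefficients
print bhat B
```

The first r rows of bhat form an identity matrix, so each cointegrating vector is normalized on a
different variable among the first r.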
Although the triangular representation ensures that the statistical problem of estimating $\beta$ is
solved, the resulting equilibrium relationships may be difficult to interpret. In this case, the user
may want to achieve identification by specifying manually the system of $r^2$ constraints that gretl
will use to produce an estimate of $\beta$.
As an example, consider the money demand system presented in section 9.6 of Verbeek (2004). The
variables used are m (the log of real money stock M1), infl (inflation), cpr (the commercial paper
rate), y (log of real GDP) and tbr (the Treasury bill rate).[2] Estimation of $\beta$ can be performed
via the commands
open money.gdt
smpl 1954:1 1994:4
vecm 6 2 m infl cpr y tbr --rc
and the relevant portion of the output reads
Maximum likelihood estimates, observations 1954:1-1994:4 (T = 164)
Cointegration rank = 2
Case 2: Restricted constant
beta (cointegrating vectors, standard errors in parentheses)

m           1.0000       0.0000
          (0.0000)     (0.0000)
infl        0.0000       1.0000
          (0.0000)     (0.0000)
cpr        0.56108      -24.367
         (0.10638)     (4.2113)
y         -0.40446     -0.91166
         (0.10277)     (4.0683)
tbr       -0.54293       24.786
         (0.10962)     (4.3394)
const      -3.7483       16.751
         (0.78082)     (30.909)
Interpretation of the coefficients of the cointegration matrix $\beta$ would be easier if a meaning could
be attached to each of its columns. This is possible by hypothesizing the existence of two long-run
relationships: a money demand equation

$$\mathtt{m} = c_1 + \beta_1\,\mathtt{infl} + \beta_2\,\mathtt{y} + \beta_3\,\mathtt{tbr}$$

and a risk premium equation

$$\mathtt{cpr} = c_2 + \beta_4\,\mathtt{infl} + \beta_5\,\mathtt{y} + \beta_6\,\mathtt{tbr}$$

which imply that the cointegration matrix can be normalized as

$$\beta = \begin{bmatrix} -1 & 0 \\ \beta_1 & \beta_4 \\ 0 & -1 \\ \beta_2 & \beta_5 \\ \beta_3 & \beta_6 \\ c_1 & c_2 \end{bmatrix}$$

[2] This data set is available in the verbeek data package; see http://gretl.sourceforge.net/gretl_data.html.
This renormalization can be accomplished by means of the restrict command, to be given after
the vecm command or, in the graphical interface, by selecting the "Test, Linear Restrictions" menu
entry. The syntax for entering the restrictions should be fairly obvious:[3]
restrict
b[1,1] = -1
b[1,3] = 0
b[2,1] = 0
b[2,3] = -1
end restrict
which produces
Cointegrating vectors (standard errors in parentheses)

m           -1.0000        0.0000
           (0.0000)      (0.0000)
infl      -0.023026      0.041039
        (0.0054666)    (0.027790)
cpr          0.0000       -1.0000
           (0.0000)      (0.0000)
y           0.42545     -0.037414
         (0.033718)     (0.17140)
tbr       -0.027790        1.0172
        (0.0045445)    (0.023102)
const        3.3625       0.68744
          (0.25318)      (1.2870)
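To see what the renormalization buys us, each column can be read as an equilibrium relation by
setting the corresponding linear combination to zero and solving for the variable whose coefficient
is $-1$. This reading is our own worked illustration, using the point estimates printed above:

```latex
\begin{aligned}
\mathtt{m}   &= 3.3625 - 0.023026\,\mathtt{infl} + 0.42545\,\mathtt{y} - 0.027790\,\mathtt{tbr} \\
\mathtt{cpr} &= 0.68744 + 0.041039\,\mathtt{infl} - 0.037414\,\mathtt{y} + 1.0172\,\mathtt{tbr}
\end{aligned}
```

The first line has the form of the money demand equation and the second that of the risk premium
equation, with the commercial paper rate moving almost one-for-one with the Treasury bill rate.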
25.6 Over-identifying restrictions
One purpose of imposing restrictions on a VECM system is simply to achieve identification. If these
restrictions are simply normalizations, they are not testable and should have no effect on the
maximized likelihood. In addition, however, one may wish to formulate constraints on $\beta$ and/or
$\alpha$ that derive from the economic theory underlying the equilibrium relationships; substantive
restrictions of this sort are then testable via a likelihood-ratio statistic.
Gretl is capable of testing general linear restrictions of the form

$$R_b\,\mathrm{vec}(\beta) = q \qquad (25.5)$$

and/or

$$R_a\,\mathrm{vec}(\alpha) = 0 \qquad (25.6)$$

Note that the $\beta$ restriction may be non-homogeneous ($q \neq 0$) but the $\alpha$ restriction must be
homogeneous. Nonlinear restrictions are not supported, and neither are restrictions that cross between
$\beta$ and $\alpha$. In the case where $r > 1$ such restrictions may be in common across all the columns
of $\beta$ (or $\alpha$) or may be specific to certain columns of these matrices. This is the case
discussed in Boswijk (1995) and Boswijk and Doornik (2004), section 4.4.

[3] Note that in this context we are bending the usual matrix indexation convention, using the leading
index to refer to the column of $\beta$ (the particular cointegrating vector). This is standard practice
in the literature, and defensible insofar as it is the columns of $\beta$ (the cointegrating relations or
equilibrium errors) that are of primary interest.
The restrictions (25.5) and (25.6) may be written in explicit form as

$$\mathrm{vec}(\beta) = H\phi + h_0 \qquad (25.7)$$

and

$$\mathrm{vec}(\alpha') = G\psi \qquad (25.8)$$

respectively, where $\phi$ and $\psi$ are the free parameter vectors associated with $\beta$ and $\alpha$
respectively. We may refer to the free parameters collectively as $\theta$ (the column vector formed by
concatenating $\phi$ and $\psi$). Gretl uses this representation internally when testing the restrictions.
If the list of restrictions that is passed to the restrict command contains more constraints than
necessary to achieve identification, then an LR test is performed; moreover, the restrict command
can be given the --full switch, in which case full estimates for the restricted system are
printed (including the $\Gamma_i$ terms), and the system thus restricted becomes the "current model" for
the purposes of further tests. Thus you are able to carry out cumulative tests, as in Chapter 7 of
Johansen (1995).
Syntax
The full syntax for specifying the restriction is an extension of that exemplified in the previous
section. Inside a restrict ... end restrict block, valid statements are of the form
parameter linear combination = scalar
where a parameter linear combination involves a weighted sum of individual elements of $\beta$ or
$\alpha$ (but not both in the same combination); the scalar on the right-hand side must be 0 for
combinations involving $\alpha$, but can be any real number for combinations involving $\beta$. Below, we
give a few examples of valid restrictions:
b[1,1] = 1.618
b[1,4] + 2*b[2,5] = 0
a[1,3] = 0
a[1,1] - a[1,2] = 0
Special syntax is used when a certain constraint should be applied to all columns of $\beta$: in this
case, one index is given for each b term, and the square brackets are dropped. Hence, the following
syntax
restrict
b1 + b2 = 0
end restrict
corresponds to

$$\beta = \begin{bmatrix} \beta_{11} & \beta_{21} \\ -\beta_{11} & -\beta_{21} \\ \beta_{13} & \beta_{23} \\ \beta_{14} & \beta_{24} \end{bmatrix}$$

The same convention is used for $\alpha$: when only one index is given for an a term the restriction is
presumed to apply to all $r$ columns of $\alpha$, or in other words the variable associated with the given
row of $\alpha$ is weakly exogenous. For instance, the formulation
restrict
a3 = 0
a4 = 0
end restrict
specifies that variables 3 and 4 do not respond to the deviation from equilibrium in the previous
period. Note that when two indices are given in a restriction on $\alpha$ the indexation is consistent
with that for $\beta$ restrictions: the leading index denotes the cointegrating vector and the trailing
index the equation number.
Finally, a short-cut is available for setting up complex restrictions (but currently only in relation
to $\beta$): you can specify $R_b$ and $q$, as in $R_b\,\mathrm{vec}(\beta) = q$, by giving the names of
previously defined matrices. For example,
matrix I4 = I(4)
matrix vR = I4**(I4~zeros(4,1))
matrix vq = mshape(I4,16,1)
restrict
R = vR
q = vq
end restrict
which manually imposes Phillips normalization on the estimates for a system with cointegrating
rank 4.
An example
Brand and Cassola (2004) propose a money demand system for the Euro area, in which they postulate
three long-run equilibrium relationships:

money demand                          $m = \beta_l\, l + \beta_y\, y$
Fisher equation                       $\pi = \phi\, l$
Expectation theory of interest rates  $l = s$

where $m$ is real money demand, $l$ and $s$ are long- and short-term interest rates, $y$ is output and
$\pi$ is inflation.[4] (The names for these variables in the gretl data file are m_p, rl, rs, y and infl,
respectively.)
The cointegration rank assumed by the authors is 3 and there are 5 variables, giving 15 elements
in the $\beta$ matrix. $3 \times 3 = 9$ restrictions are required for identification, and a just-identified
system would have $15 - 9 = 6$ free parameters. However, the postulated long-run relationships feature
only three free parameters, so the over-identification rank is 3.
Example 25.1 replicates Table 4 on page 824 of the Brand and Cassola article.[5] Note that we use
the $lnl accessor after the vecm command to store the unrestricted log-likelihood and the $rlnl
accessor after restrict for its restricted counterpart.
The example continues in script 25.2, where we perform further testing to check whether (a) the
income elasticity in the money demand equation is 1 ($\beta_y = 1$) and (b) the Fisher relation is
homogeneous ($\phi = 1$). Since the --full switch was given to the initial restrict command, additional
restrictions can be applied without having to repeat the previous ones. (The second script contains
a few printf commands, which are not strictly necessary, to format the output nicely.) It turns out
that both of the additional hypotheses are rejected by the data, with p-values of 0.002 and 0.004.

[4] A traditional formulation of the Fisher equation would reverse the roles of the variables in the
second equation, but this detail is immaterial in the present context; moreover, the expectation
theory of interest rates implies that the third equilibrium relationship should include a constant for
the liquidity premium. However, since in this example the system is estimated with the constant term
unrestricted, the liquidity premium gets merged in the system intercept and disappears from $z_t$.
[5] Modulo what appear to be a few typos in the article.

Example 25.1: Estimation of a money demand system with constraints on $\beta$
Input:
open brand_cassola.gdt
# perform a few transformations
m_p = m_p*100
y = y*100
infl = infl/4
rs = rs/4
rl = rl/4
# replicate table 4, page 824
vecm 2 3 m_p infl rl rs y -q
genr ll0 = $lnl
restrict --full
b[1,1] = 1
b[1,2] = 0
b[1,4] = 0
b[2,1] = 0
b[2,2] = 1
b[2,4] = 0
b[2,5] = 0
b[3,1] = 0
b[3,2] = 0
b[3,3] = 1
b[3,4] = -1
b[3,5] = 0
end restrict
genr ll1 = $rlnl
Partial output:
Unrestricted loglikelihood (lu) = 116.60268
Restricted loglikelihood (lr) = 115.86451
2 * (lu - lr) = 1.47635
P(Chi-Square(3) > 1.47635) = 0.68774

beta (cointegrating vectors, standard errors in parentheses)

m_p        1.0000       0.0000       0.0000
         (0.0000)     (0.0000)     (0.0000)
infl       0.0000       1.0000       0.0000
         (0.0000)     (0.0000)     (0.0000)
rl         1.6108     -0.67100       1.0000
        (0.62752)   (0.049482)     (0.0000)
rs         0.0000       0.0000      -1.0000
         (0.0000)     (0.0000)     (0.0000)
y         -1.3304       0.0000       0.0000
       (0.030533)     (0.0000)     (0.0000)
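The likelihood-ratio arithmetic behind Example 25.1 can be checked by hand. The following sketch
simply re-uses the two log-likelihoods printed in the partial output; the variable names are ours,
and the degrees of freedom are obtained by the counting rule used in the text (number of
restrictions on $\beta$ minus the $r^2$ needed for identification).

```hansl
# log-likelihoods copied from the output of Example 25.1
scalar lu = 116.60268
scalar lr = 115.86451
# 12 restrictions on beta, of which r^2 = 9 are needed for identification
scalar r = 3
scalar df = 12 - r^2
scalar LR = 2 * (lu - lr)
printf "LR = %g, df = %g, p-value = %g\n", LR, df, pvalue(X, df, LR)
```

The result should match the test statistic and p-value reported by restrict, up to the rounding of
the printed log-likelihoods.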
Example 25.2: Further testing of money demand system
Input:
restrict
b[1,5] = -1
end restrict
genr ll_uie = $rlnl
restrict
b[2,3] = -1
end restrict
genr ll_hfh = $rlnl
# replicate table 5, page 824
printf "Testing zero restrictions in cointegration space:\n"
printf " LR-test, rank = 3: chi^2(3) = %6.4f [%6.4f]\n", 2*(ll0-ll1), \
pvalue(X, 3, 2*(ll0-ll1))
printf "Unit income elasticity: LR-test, rank = 3:\n"
printf " chi^2(4) = %g [%6.4f]\n", 2*(ll0-ll_uie), \
pvalue(X, 4, 2*(ll0-ll_uie))
printf "Homogeneity in the Fisher hypothesis:\n"
printf " LR-test, rank = 3: chi^2(4) = %6.3f [%6.4f]\n", 2*(ll0-ll_hfh), \
pvalue(X, 4, 2*(ll0-ll_hfh))
Output:
Testing zero restrictions in cointegration space:
LR-test, rank = 3: chi^2(3) = 1.4763 [0.6877]
Unit income elasticity: LR-test, rank = 3:
chi^2(4) = 17.2071 [0.0018]
Homogeneity in the Fisher hypothesis:
LR-test, rank = 3: chi^2(4) = 15.547 [0.0037]
Another type of test that is commonly performed is the "weak exogeneity" test. In this context, a
variable is said to be weakly exogenous if all coefficients on the corresponding row in the $\alpha$
matrix are zero. If this is the case, that variable does not adjust to deviations from any of the
long-run equilibria and can be considered an autonomous driving force of the whole system.
The code in Example 25.3 performs this test for each variable in turn, thus replicating the first
column of Table 6 on page 825 of Brand and Cassola (2004). The results show that weak exogeneity
might perhaps be accepted for the long-term interest rate and real GDP (p-values 0.07 and 0.08
respectively).
Identification and testability
One point regarding VECM restrictions that can be confusing at first is that identification (does
the restriction identify the system?) and testability (is the restriction testable?) are quite separate
matters. Restrictions can be identifying but not testable; less obviously, they can be testable but
not identifying.
This can be seen quite easily in relation to a rank-1 system. The restriction $\beta_1 = 1$ is
identifying (it pins down the scale of $\beta$) but, being a pure scaling, it is not testable. On the
other hand, the restriction $\beta_1 + \beta_2 = 0$ is testable (the system with this requirement imposed
will almost certainly have a lower maximized likelihood) but it is not identifying; it still leaves
open the scale of $\beta$.
Example 25.3: Testing for weak exogeneity
Input:
restrict
a1 = 0
end restrict
ts_m = 2*(ll0 - $rlnl)
restrict
a2 = 0
end restrict
ts_p = 2*(ll0 - $rlnl)
restrict
a3 = 0
end restrict
ts_l = 2*(ll0 - $rlnl)
restrict
a4 = 0
end restrict
ts_s = 2*(ll0 - $rlnl)
restrict
a5 = 0
end restrict
ts_y = 2*(ll0 - $rlnl)
loop foreach i m p l s y --quiet
printf "\Delta $i\t%6.3f [%6.4f]\n", ts_$i, pvalue(X, 6, ts_$i)
endloop
Output (variable, LR test, p-value):
\Delta m 18.111 [0.0060]
\Delta p 21.067 [0.0018]
\Delta l 11.819 [0.0661]
\Delta s 16.000 [0.0138]
\Delta y 11.335 [0.0786]
We said above that the number of restrictions must equal at least $r^2$, where $r$ is the cointegrating
rank, for identification. This is a necessary and not a sufficient condition. In fact, when $r > 1$ it
can be quite tricky to assess whether a given set of restrictions is identifying. Gretl uses the method
suggested by Doornik (1995), where identification is assessed via the rank of the information matrix.
It can be shown that for restrictions of the sort (25.7) and (25.8) the information matrix has the
same rank as the Jacobian matrix

$$\mathcal{J}(\theta) = \left[\, (I_p \otimes \beta)\,G \; : \; (\alpha \otimes I_{p_1})\,H \,\right]$$

A sufficient condition for identification is that the rank of $\mathcal{J}(\theta)$ equals the number of
free parameters. The rank of this matrix is evaluated by examination of its singular values at a
randomly selected point in the parameter space. For practical purposes we treat this condition as if it
were both necessary and sufficient; that is, we disregard the special cases where identification could
be achieved without this condition being met.[6]
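The rank condition can be illustrated on the rank-1 case discussed above. The sketch below is not
gretl's internal code: it builds the Jacobian for a toy system with p = 2 equations and p1 = 2 rows
in beta, at a random point, first with beta unrestricted and then with the normalization
beta[1] = 1 imposed (so that H selects the single free element). All names are illustrative.

```hansl
set seed 20100
scalar p = 2      # number of equations
scalar p1 = 2     # rows of beta
# random evaluation point for a rank-1 system
matrix alpha = mnormal(p, 1)
matrix beta = mnormal(p1, 1)
matrix G = I(p)   # alpha unrestricted

# case 1: beta unrestricted, so H is the identity
matrix H = I(p1)
matrix J = ((I(p) ** beta) * G) ~ ((alpha ** I(p1)) * H)
printf "unrestricted: rank = %d, free parameters = %d\n", rank(J), cols(J)

# case 2: beta[1] = 1, only beta[2] is free
beta[1,1] = 1
H = {0; 1}
J = ((I(p) ** beta) * G) ~ ((alpha ** I(p1)) * H)
printf "normalized:   rank = %d, free parameters = %d\n", rank(J), cols(J)
```

In the first case the rank falls short of the number of free parameters by one (the scale of beta
is not pinned down); in the second the two should coincide.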
25.7 Numerical solution methods
In general, the ML estimator for the restricted VECM problem has no closed form solution, hence
the maximum must be found via numerical methods.[7] In some cases convergence may be difficult,
and gretl provides several choices to solve the problem.
Switching and LBFGS
Two maximization methods are available in gretl. The default is the switching algorithm set out
in Boswijk and Doornik (2004). The alternative is a limited-memory variant of the BFGS algorithm
(LBFGS), using analytical derivatives. This is invoked using the --lbfgs flag with the restrict
command.
The switching algorithm works by explicitly maximizing the likelihood at each iteration, with
respect to $\phi$, $\psi$ and $\Omega$ (the covariance matrix of the residuals) in turn. This method shares
a feature with the basic Johansen eigenvalues procedure, namely, it can handle a set of restrictions
that does not fully identify the parameters.
LBFGS, on the other hand, requires that the model be fully identified. When using LBFGS, therefore,
you may have to supplement the restrictions of interest with normalizations that serve to identify
the parameters. For example, one might use all or part of the Phillips normalization (see section
25.5).
Neither the switching algorithm nor LBFGS is guaranteed to find the global ML solution.[8] The
optimizer may end up at a local maximum (or, in the case of the switching algorithm, at a saddle
point).
The solution (or lack thereof) may be sensitive to the initial value selected for $\theta$. By default,
gretl selects a starting point using a deterministic method based on Boswijk (1995), but two further
options are available: the initialization may be adjusted using simulated annealing, or the user may
supply an explicit initial value for $\theta$.
The default initialization method is:
1. Calculate the unrestricted ML $\hat\beta$ using the Johansen procedure.
[6] See Boswijk and Doornik (2004), pp. 447-8 for discussion of this point.
[7] The exception is restrictions that are homogeneous, common to all $\beta$ or all $\alpha$ (in case
$r > 1$), and involve either $\beta$ only or $\alpha$ only. Such restrictions are handled via the modified
eigenvalues method set out by Johansen (1995). We solve directly for the ML estimator, without any need
for iterative methods.
[8] In developing gretl's VECM-testing facilities we have considered a fair number of "tricky cases"
from various sources. We'd like to thank Luca Fanelli of the University of Bologna and Sven Schreiber
of Goethe University Frankfurt for their help in devising torture-tests for gretl's VECM code.
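2. If the restriction on $\beta$ is non-homogeneous, use the method proposed by Boswijk:

   $$\phi_0 = -\left[(I_r \otimes \hat\beta_\perp)' H\right]^{+} (I_r \otimes \hat\beta_\perp)' h_0 \qquad (25.9)$$

   where $\hat\beta_\perp' \hat\beta = 0$ and $A^{+}$ denotes the Moore-Penrose inverse of $A$. Otherwise

   $$\phi_0 = (H'H)^{-1} H' \,\mathrm{vec}(\hat\beta) \qquad (25.10)$$

3. $\mathrm{vec}(\beta_0) = H\phi_0 + h_0$.
4. Calculate the unrestricted ML $\hat\alpha$ conditional on $\beta_0$, as per Johansen:

   $$\hat\alpha = S_{01}\beta_0 (\beta_0' S_{11} \beta_0)^{-1} \qquad (25.11)$$

5. If $\alpha$ is restricted by $\mathrm{vec}(\alpha') = G\psi$, then
   $\psi_0 = (G'G)^{-1} G' \,\mathrm{vec}(\hat\alpha')$ and $\mathrm{vec}(\alpha_0') = G\psi_0$.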
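Steps 2 and 3 are easy to mimic for a homogeneous restriction. The sketch below uses a made-up
unrestricted estimate of a 5-element $\beta$ and the two restrictions b[1] + b[2] = 0 and
b[3] + b[4] = 0; H is taken as the right nullspace of the restriction matrix, and equation (25.10)
then amounts to a least-squares projection. It illustrates the algebra only, not gretl's internals.

```hansl
# hypothetical unrestricted estimate of beta (rank 1, 5 rows)
matrix bhat = {1; -0.9; 5.8; -6.1; -6}
# homogeneous restrictions b[1] + b[2] = 0 and b[3] + b[4] = 0
matrix R = {1, 1, 0, 0, 0; 0, 0, 1, 1, 0}
matrix H = nullspace(R)
# equation (25.10), then step 3 with h0 = 0
matrix phi0 = inv(H'H) * H' * bhat
matrix beta0 = H * phi0
print beta0
matrix check = R * beta0     # zero up to rounding
print check
```

Whatever basis nullspace() returns, beta0 is the point satisfying the restrictions that lies closest
to bhat, here (0.95, -0.95, 5.95, -5.95, -6).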
Alternative initialization methods
As mentioned above, gretl offers the option of adjusting the initialization using simulated
annealing. This is invoked by adding the --jitter option to the restrict command.
The basic idea is this: we start at a certain point in the parameter space, and for each of $n$
iterations (currently $n = 4096$) we randomly select a new point within a certain radius of the previous
one, and determine the likelihood at the new point. If the likelihood is higher, we jump to the new
point; otherwise, we jump with probability $P$ (and remain at the previous point with probability
$1 - P$). As the iterations proceed, the system gradually "cools", that is, the radius of the random
perturbation is reduced, as is the probability of making a jump when the likelihood fails to increase.
In the course of this procedure many points in the parameter space are evaluated, starting with the
point arrived at by the deterministic method, which we'll call $\theta_0$. One of these points will be
"best" in the sense of yielding the highest likelihood: call it $\theta^*$. The procedure also has an end
point, $\theta_n$. Gretl uses $\theta^*$ as the initial value if its likelihood exceeds that of
$\theta_0$; otherwise it uses $\theta_n$. That is, if we get an improvement in the likelihood via
annealing, we make full use of this; on the other hand, if we fail to get an improvement we nonetheless
allow the annealing to randomize the starting point. Experiments indicated that the latter effect can be
helpful.
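The logic of the annealing step can be sketched on a one-parameter toy problem. Everything specific
here is an assumption made for illustration: the objective toy_ll stands in for the log-likelihood,
and the initial radius, the initial jump probability and the cooling factor of 0.999 are arbitrary
choices, not the values used inside gretl.

```hansl
set seed 4096

# toy objective with a single maximum at theta = 2 (stand-in for the log-likelihood)
function scalar toy_ll (scalar theta)
    return -(theta - 2)^2
end function

scalar n = 4096
scalar theta0 = 0          # point delivered by the deterministic method
scalar theta = theta0
scalar ll = toy_ll(theta)
scalar best = theta0
scalar llbest = ll
scalar radius = 1          # assumed initial radius
scalar P = 0.5             # assumed initial jump probability

loop i=1..n --quiet
    scalar u1 = muniform(1, 1)
    scalar u2 = muniform(1, 1)
    scalar cand = theta + radius * (2*u1 - 1)
    scalar llc = toy_ll(cand)
    # jump if the likelihood improves, otherwise jump with probability P
    if llc > ll || u2 < P
        theta = cand
        ll = llc
    endif
    if ll > llbest
        best = theta
        llbest = ll
    endif
    # cooling: shrink the radius and the jump probability
    radius *= 0.999
    P *= 0.999
endloop

# use the best point if it improves on theta0, otherwise the end point
scalar init = (llbest > toy_ll(theta0)) ? best : theta
printf "best = %g, end point = %g, chosen start = %g\n", best, theta, init
```

The last assignment mirrors the selection rule described above: the best point visited is used when
it beats the deterministic starting point, and the end point of the random walk is used otherwise.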
Besides annealing, a further alternative is manual initialization. This is done by passing a
predefined vector to the set command with parameter initvals, as in
set initvals myvec
The details depend on whether the switching algorithm or LBFGS is used. For the switching
algorithm, there are two options for specifying the initial values. The more user-friendly one (for
most people, we suppose) is to specify a matrix that contains $\mathrm{vec}(\beta)$ followed by
$\mathrm{vec}(\alpha)$. For example:
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals
matrix BA = {1, -1, 6, -6, -6, -0.2, 0.1, 0.02, 0.03}
set initvals BA
restrict
b[1] = 1
b[1] + b[2] = 0
b[3] + b[4] = 0
end restrict
In this example, from Johansen (1995), the cointegration rank is 1 and there are 4 variables.
However, the model includes a restricted constant (the --rc flag) so that $\beta$ has 5 elements. The
$\alpha$ matrix has 4 elements, one per equation. So the matrix BA may be read as

$$(\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \alpha_1, \alpha_2, \alpha_3, \alpha_4)$$

The other option, which is compulsory when using LBFGS, is to specify the initial values in terms
of the free parameters, $\phi$ and $\psi$. Getting this right is somewhat less obvious. As mentioned
above, the implicit-form restriction $R\,\mathrm{vec}(\beta) = q$ has explicit form
$\mathrm{vec}(\beta) = H\phi + h_0$, where $H = R_\perp$, the right nullspace of $R$. The vector $\phi$ is
shorter, by the number of restrictions, than $\mathrm{vec}(\beta)$. The savvy user will then see what
needs to be done. The other point to take into account is that if $\alpha$ is unrestricted, the
effective length of $\psi$ is 0, since it is then optimal to compute $\alpha$ using Johansen's formula,
conditional on $\beta$ (equation 25.11 above). The example above could be rewritten as:
open denmark.gdt
vecm 2 1 LRM LRY IBO IDE --rc --seasonals
matrix phi = {-8, -6}
set initvals phi
restrict --lbfgs
b[1] = 1
b[1] + b[2] = 0
b[3] + b[4] = 0
end restrict
In this more economical formulation the initializer specifies only the two free parameters in $\phi$ (5
elements in $\beta$ minus 3 restrictions). There is no call to give values for $\psi$ since $\alpha$ is
unrestricted.
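The step from the implicit to the explicit form can be checked numerically. In the sketch below R
and q encode the three restrictions of the example, H is obtained with nullspace() and h0 is one
particular solution of R h0 = q. Note that the basis returned by nullspace() need not coincide with
the one gretl uses internally, so a given vector phi will not in general map to the same $\beta$ in both
cases; the point is only that H*phi + h0 satisfies the restrictions for any phi.

```hansl
# implicit form R vec(beta) = q for the three restrictions of the example
matrix R = {1, 0, 0, 0, 0; 1, 1, 0, 0, 0; 0, 0, 1, 1, 0}
matrix q = {1; 0; 0}
matrix H = nullspace(R)           # 5 x 2: two free parameters
matrix h0 = R' * inv(R*R') * q    # one particular solution of R*h0 = q
matrix phi = {-8; -6}             # any 2-vector will do for this check
matrix b = H*phi + h0
matrix check = R*b - q            # zero up to rounding
print H h0 check
```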
Scale removal
Consider a simpler version of the restriction discussed in the previous section, namely,
restrict
b[1] = 1
b[1] + b[2] = 0
end restrict
This restriction comprises a substantive, testable requirement (that $\beta_1$ and $\beta_2$ sum to zero)
and a normalization or scaling, $\beta_1 = 1$. The question arises, might it be easier and more reliable
to maximize the likelihood without imposing $\beta_1 = 1$?[9] If so, we could record this normalization,
remove it for the purpose of maximizing the likelihood, then reimpose it by scaling the result.
Unfortunately it is not possible to say in advance whether "scale removal" of this sort will give
better results, for any particular estimation problem. However, this does seem to be the case more
often than not. Gretl therefore performs scale removal where feasible, unless you
- explicitly forbid this, by giving the --no-scaling option flag to the restrict command; or
- provide a specific vector of initial values; or
- select the LBFGS algorithm for maximization.
Scale removal is deemed infeasible if there are any cross-column restrictions on $\beta$, or any
non-homogeneous restrictions involving more than one element of $\beta$.
In addition, experimentation has suggested to us that scale removal is inadvisable if the system is
just identified with the normalization(s) included, so we do not do it in that case. By "just
identified" we mean that the system would not be identified if any of the restrictions were removed.
On that criterion the above example is not just identified, since the removal of the second restriction
would not affect identification; and gretl would in fact perform scale removal in this case unless the
user specified otherwise.

[9] As a numerical matter, that is. In principle this should make no difference.
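Reimposing a normalization "by scaling the result" is a one-line operation. The sketch below uses
made-up estimates for a rank-1 system obtained without the scaling b[1] = 1, and shows that dividing
$\beta$ by its first element, with the compensating multiplication of $\alpha$, leaves the long-run matrix
$\alpha\beta'$ unchanged. The numbers and names are illustrative only.

```hansl
# hypothetical unnormalized estimates for a rank-1 system
matrix b = {-2.5; 2.5; 14; -15.2; -15}    # satisfies b[1] + b[2] = 0
matrix a = {0.08; -0.04; -0.008; -0.012}
scalar s = b[1,1]
matrix bn = b / s        # rescaled so that bn[1] = 1
matrix an = a * s        # compensating rescaling of alpha
# the long-run matrix alpha*beta' is unchanged by the rescaling
matrix diff = a*b' - an*bn'
print bn diff
```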
Chapter 26
Forecasting
26.1 Introduction
In some econometric contexts forecasting is the prime objective: one wants estimates of the future
values of certain variables to reduce the uncertainty attaching to current decision making. In other
contexts where real-time forecasting is not the focus prediction may nonetheless be an important
moment in the analysis. For example, out-of-sample prediction can provide a useful check on
the validity of an econometric model. In other cases we are interested in questions of "what if":
for example, how might macroeconomic outcomes have differed over a certain period if a different
policy had been pursued? In the latter cases "prediction" need not be a matter of actually projecting
into the future but in any case it involves generating fitted values from a given model. The term
"postdiction" might be more accurate but it is not commonly used; we tend to talk of prediction
even when there is no true forecast in view.
This chapter offers an overview of the methods available within gretl for forecasting or prediction
(whether forward in time or not) and explicates some of the finer points of the relevant commands.
26.2 Saving and inspecting fitted values
In the simplest case, the predictions of interest are just the (within sample) fitted values from an
econometric model. For the single-equation linear model, $y_t = X_t\beta + u_t$, these are
$\hat{y}_t = X_t\hat\beta$.
In command-line mode, the $\hat{y}$ series can be retrieved, after estimating a model, using the accessor
$yhat, as in
series yh = $yhat
If the model in question takes the form of a system of equations, $yhat returns a matrix, each
column of which contains the fitted values for a particular dependent variable. To extract the fitted
series for, e.g., the dependent variable in the second equation, do
matrix Yh = $yhat
series yh2 = Yh[,2]
Having obtained a series of fitted values, you can use the fcstats function to produce a vector of
statistics that characterize the accuracy of the predictions (see section 26.4 below).
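A minimal sketch of this workflow follows. It assumes that the sample file data4-1 (house prices,
with series price and sqft) is installed with gretl; any dataset and regression would serve equally
well, and the names yh and stats are ours.

```hansl
open data4-1
ols price const sqft
# within-sample fitted values
series yh = $yhat
# accuracy statistics for the fitted values, as a column vector
matrix stats = fcstats(price, yh)
print stats
```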
The gretl GUI offers several ways of accessing and examining within-sample predictions. In the
model display window the Save menu contains an item for saving fitted values, the Graphs menu
allows plotting of fitted versus actual values, and the Analysis menu offers a display of actual, fitted
and residual values.
26.3 The fcast command
The fcast command generates predictions based on the last estimated model. Several questions
arise here: How to control the range over which predictions are generated? How to control the
forecasting method (where a choice is available)? How to control the printing and/or saving of the
results? Basic answers can be found in the Gretl Command Reference; we add some more details
here.
The forecast range
The range defaults to the currently defined sample range. If this remains unchanged following
estimation of the model in question, the forecast will be "within sample" and (with some qualifications
noted below) it will essentially duplicate the information available via the retrieval of fitted values
(see section 26.2 above).
A common situation is that a model is estimated over a given sample and then forecasts are
wanted for a subsequent out-of-sample range. The simplest way to accomplish this is via the
--out-of-sample option to fcast. For example, assuming we have a quarterly time-series dataset
containing observations from 1980:1 to 2008:4, four of which are to be reserved for forecasting:
# reserve the last 4 observations
smpl 1980:1 2007:4
ols y 0 xlist
fcast --out-of-sample
This will generate a forecast from 2008:1 to 2008:4.
There are two other ways of adjusting the forecast range, offering finer control:
- Use the smpl command to adjust the sample range prior to invoking fcast.
- Use the optional startobs and endobs arguments to fcast (which should come right after the
  command word). These values set the forecast range independently of the sample range.
What if one wants to generate a true forecast that goes beyond the available data? In that case
one can use the dataset command with the addobs parameter to add extra observations before
forecasting. For example:
# use the entire dataset, which ends in 2008:4
ols y 0 xlist
dataset addobs 4
fcast 2009:1 2009:4
But this will work as stated only if the set of regressors in xlist does not contain any stochastic
regressors other than lags of y. The dataset addobs command attempts to detect and extrapolate
certain common deterministic variables (e.g., time trend, periodic dummy variables). In addition,
lagged values of the dependent variable can be supported via a dynamic forecast (see below for
discussion of the static/dynamic distinction). But future values of any other included regressors
must be supplied before such a forecast is possible. Note that specific values in a series can be
set directly by date, for example: x1[2009:1] = 120.5. Or, if the assumption of no change in the
regressors is warranted, one can do something like this:
loop t=2009:1..2009:4
    loop foreach i xlist
        $i[t] = $i[2008:4]
    endloop
endloop
Static, dynamic and rolling forecasts
The distinction between static and dynamic forecasts applies only to dynamic models, i.e., those
that feature one or more lags of the dependent variable. The simplest case is the AR(1) model,

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \epsilon_t \qquad (26.1)$$

In some cases the presence of a lagged dependent variable is implicit in the dynamics of the error
term, for example

$$y_t = \beta + u_t$$
$$u_t = \rho u_{t-1} + \epsilon_t$$

which implies that

$$y_t = (1 - \rho)\beta + \rho y_{t-1} + \epsilon_t$$

Suppose we want to forecast $y$ for period $s$ using a dynamic model, say (26.1) for example. If
we have data on $y$ available for period $s - 1$ we could form a fitted value in the usual way:
$\hat{y}_s = \hat\alpha_0 + \hat\alpha_1 y_{s-1}$. But suppose that data are available only up to
$s - 2$. In that case we can apply the chain rule of forecasting:

$$\hat{y}_{s-1} = \hat\alpha_0 + \hat\alpha_1 y_{s-2}$$
$$\hat{y}_s = \hat\alpha_0 + \hat\alpha_1 \hat{y}_{s-1}$$

This is what is called a dynamic forecast. A static forecast, on the other hand, is simply a fitted
value (even if it happens to be computed out-of-sample).
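The chain rule is easily traced with numbers. In the following sketch the AR(1) coefficients and
the last observed value are made up for the purpose of illustration.

```hansl
# hypothetical AR(1) estimates and last observed value
scalar a0 = 1.5
scalar a1 = 0.8
scalar y_sm2 = 10                     # y observed at s-2
scalar yhat_sm1 = a0 + a1 * y_sm2     # one step ahead
scalar yhat_s = a0 + a1 * yhat_sm1    # two steps ahead, feeding in the forecast
printf "forecast for s-1: %g, forecast for s: %g\n", yhat_sm1, yhat_s
```

With these values the forecasts are 9.5 and 9.1: the second step uses the forecast for $s - 1$ in
place of the unavailable observation.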
Printing and saving forecasts
To be written.
26.4 Univariate forecast evaluation statistics
Let $y_t$ be the value of a variable of interest at time $t$ and let $f_t$ be a forecast of $y_t$. We
define the forecast error as $e_t = y_t - f_t$. Given a series of $T$ observations and associated
forecasts we can construct several measures of the overall accuracy of the forecasts. Some commonly used
measures are the Mean Error (ME), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute
Error (MAE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE). These are
defined as follows.

$$\mathrm{ME} = \frac{1}{T}\sum_{t=1}^{T} e_t \qquad
\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T} e_t^2 \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T} e_t^2}$$

$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T} |e_t| \qquad
\mathrm{MPE} = \frac{1}{T}\sum_{t=1}^{T} 100\,\frac{e_t}{y_t} \qquad
\mathrm{MAPE} = \frac{1}{T}\sum_{t=1}^{T} 100\,\frac{|e_t|}{y_t}$$

A further relevant statistic is Theil's U (Theil, 1966), defined as the positive square root of

$$U^2 = \frac{1}{T}\sum_{t=1}^{T-1}\left(\frac{f_{t+1} - y_{t+1}}{y_t}\right)^2 \cdot
\left[\frac{1}{T}\sum_{t=1}^{T-1}\left(\frac{y_{t+1} - y_t}{y_t}\right)^2\right]^{-1}$$
The more accurate the forecasts, the lower the value of Theil's U, which has a minimum of 0.[1]
This measure can be interpreted as the ratio of the RMSE of the proposed forecasting model to the
RMSE of a naive model which simply predicts $y_{t+1} = y_t$ for all $t$. The naive model yields $U = 1$;
values less than 1 indicate an improvement relative to this benchmark and values greater than 1 a
deterioration.

[1] This statistic is sometimes called $U_2$, to distinguish it from a related but different U defined
in an earlier work by Theil (1961). It seems to be generally accepted that the later version of Theil's
U is a superior statistic, so we ignore the earlier version here.
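The definitions above translate directly into matrix operations. The sketch below computes each
measure by hand for five made-up observations and forecasts; it is meant as a reading aid for the
formulas rather than a substitute for gretl's built-in calculations, and all names are illustrative.

```hansl
# hypothetical realizations and forecasts
matrix y = {10; 12; 11; 13; 15}
matrix f = {9; 12.5; 11.5; 12; 14}
scalar T = rows(y)
scalar Tm1 = T - 1
matrix e = y - f                  # forecast errors
scalar ME = meanc(e)
scalar MSE = meanc(e.^2)
scalar RMSE = sqrt(MSE)
scalar MAE = meanc(abs(e))
scalar MPE = meanc(100 * e ./ y)
scalar MAPE = meanc(100 * abs(e) ./ y)
# Theil's U: forecast errors relative to those of the no-change forecast
matrix ylag = y[1:Tm1,]
matrix ylead = y[2:T,]
matrix flead = f[2:T,]
scalar num = sumc(((flead - ylead) ./ ylag).^2)
scalar den = sumc(((ylead - ylag) ./ ylag).^2)
scalar U = sqrt(num / den)
printf "ME = %g, MSE = %g, RMSE = %g, MAE = %g\n", ME, MSE, RMSE, MAE
printf "MPE = %g, MAPE = %g, U = %g\n", MPE, MAPE, U
```

The common factor 1/T cancels in the ratio that defines $U^2$, which is why plain sums are used for
the numerator and denominator.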
In addition, Theil (1966, pp. 33–36) proposed a decomposition of the MSE which can be useful in
evaluating a set of forecasts. He showed that the MSE could be broken down into three non-negative
components as follows
\[
\mathrm{MSE} = \left(\bar{f} - \bar{y}\right)^2 + \left(s_f - r s_y\right)^2 + \left(1 - r^2\right) s_y^2
\]
where \(\bar{f}\) and \(\bar{y}\) are the sample means of the forecasts and the observations, \(s_f\) and \(s_y\) are the
respective standard deviations (using \(T\) in the denominator), and \(r\) is the sample correlation between
y and f. Dividing through by MSE we get
\[
\frac{\left(\bar{f} - \bar{y}\right)^2}{\mathrm{MSE}} +
\frac{\left(s_f - r s_y\right)^2}{\mathrm{MSE}} +
\frac{\left(1 - r^2\right) s_y^2}{\mathrm{MSE}} = 1 \qquad (26.2)
\]
Theil labeled the three terms on the left-hand side of (26.2) the bias proportion (\(U^M\)), regression
proportion (\(U^R\)) and disturbance proportion (\(U^D\)), respectively. If y and f represent the in-sample
observations of the dependent variable and the fitted values from a linear regression then the first
two components, \(U^M\) and \(U^R\), will be zero (apart from rounding error), and the entire MSE will be
accounted for by the unsystematic part, \(U^D\). In the case of out-of-sample prediction, however (or
prediction over a sub-sample of the data used in the regression), \(U^M\) and \(U^R\) are not necessarily
close to zero, although this is a desirable property for a forecast to have. \(U^M\) differs from zero if
and only if the mean of the forecasts differs from the mean of the realizations, and \(U^R\) is non-zero
if and only if the slope of a simple regression of the realizations on the forecasts differs from 1.
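The decomposition can be checked numerically. The following sketch uses artificial data and made-up variable names; the only point is that the three proportions, computed from their definitions, add up to 1.

```hansl
nulldata 40
set seed 777
# hypothetical data: a biased and damped forecast of y
series y = 10 + normal()
series f = 9.5 + 0.8 * (y - 10) + 0.3 * normal()

scalar T = $nobs
scalar MSE = mean((y - f)^2)
# standard deviations with T (not T-1) in the denominator
scalar sy = sd(y) * sqrt((T - 1) / T)
scalar sf = sd(f) * sqrt((T - 1) / T)
scalar r = corr(y, f)

scalar UM = (mean(f) - mean(y))^2 / MSE
scalar UR = (sf - r * sy)^2 / MSE
scalar UD = (1 - r^2) * sy^2 / MSE
printf "UM = %g, UR = %g, UD = %g, sum = %g\n", UM, UR, UD, UM + UR + UD
```

The rescaling of sd() is needed because that function divides by T - 1, whereas the decomposition requires T in the denominator.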
The above-mentioned statistics are printed as part of the output of the fcast command. They can
also be retrieved in the form of a column vector using the function fcstats, which takes two series
arguments corresponding to y and f. The vector returned is
\[
\left( \mathrm{ME} \;\; \mathrm{MSE} \;\; \mathrm{MAE} \;\; \mathrm{MPE} \;\; \mathrm{MAPE} \;\; U \;\; U^M \;\; U^R \;\; U^D \right)'
\]
(Note that the RMSE is not included since it can easily be obtained given the MSE.) The series given
as arguments to fcstats must not contain any missing values in the currently defined sample
range; use the smpl command to adjust the range if needed.
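A short usage sketch follows, assuming artificial data and assuming the element order stated above (other gretl versions may order or define the elements differently, so it is worth printing the whole vector before relying on positions).

```hansl
nulldata 40
set seed 20240
# hypothetical data, with no missing values in the sample range
series y = 10 + normal()
series f = y + 0.5 * normal()

matrix s = fcstats(y, f)
print s

# element positions assumed from the listing in the text
printf "ME = %g, MSE = %g, Theil's U = %g\n", s[1], s[2], s[6]
scalar RMSE = sqrt(s[2])
printf "RMSE = %g\n", RMSE
```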
26.5 Forecasts based on VAR models
To be written.
26.6 Forecasting from simultaneous systems
To be written.
Chapter 27
The Kalman Filter
27.1 Preamble
The Kalman filter has been used behind the scenes in gretl for quite some time, in computing
ARMA estimates. But user access to the Kalman filter is new and it has not yet been tested to any
great extent. We have run some tests of relatively simple cases against the benchmark of SsfPack
Basic. This is state-space software written by Koopman, Shephard and Doornik and documented in
Koopman, Shephard and Doornik (1999). It requires Doornik's ox program. Both ox and SsfPack
are available as free downloads for academic use but neither is open-source; see
http://www.ssfpack.com. Since Koopman is one of the leading researchers in this area, presumably the results
from SsfPack are generally reliable. To date we have been able to replicate the SsfPack results in
gretl with a high degree of precision.
We welcome both success reports and bug reports.
27.2 Notation
It seems that in econometrics everyone is happy with \(y = X\beta + u\), but we can't, as a community,
make up our minds on a standard notation for state-space models. Harvey (1989), Hamilton (1994),
Harvey and Proietti (2005) and Pollock (1999) all use different conventions. The notation used here
is based on James Hamilton's, with slight variations.
A state-space model can be written as
\[
\xi_{t+1} = F_t \xi_t + v_t \qquad (27.1)
\]
\[
y_t = A_t' x_t + H_t' \xi_t + w_t \qquad (27.2)
\]
where (27.1) is the state transition equation and (27.2) is the observation or measurement equation.
The state vector, \(\xi_t\), is (\(r \times 1\)) and the vector of observables, \(y_t\), is (\(n \times 1\)); \(x_t\) is a (\(k \times 1\)) vector of
exogenous variables. The (\(r \times 1\)) vector \(v_t\) and the (\(n \times 1\)) vector \(w_t\) are assumed to be vector white
noise:
\[
E(v_t v_s') = Q_t \text{ for } t = s, \text{ otherwise } 0
\]
\[
E(w_t w_s') = R_t \text{ for } t = s, \text{ otherwise } 0
\]
The number of time-series observations will be denoted by \(T\). In the special case when \(F_t = F\),
\(H_t = H\), \(A_t = A\), \(Q_t = Q\) and \(R_t = R\), the model is said to be time-invariant.
The Kalman recursions
Using this notation, and assuming for the moment that \(v_t\) and \(w_t\) are mutually independent, the
Kalman recursions can be written as follows.
Initialization is via the unconditional mean and variance of \(\xi_1\):
\[
\hat{\xi}_{1|0} = E(\xi_1)
\]
\[
P_{1|0} = E\left\{ \left[\xi_1 - E(\xi_1)\right] \left[\xi_1 - E(\xi_1)\right]' \right\}
\]
Usually these are given by \(\hat{\xi}_{1|0} = 0\) and
\[
\mathrm{vec}(P_{1|0}) = \left[ I_{r^2} - F \otimes F \right]^{-1} \mathrm{vec}(Q) \qquad (27.3)
\]
but see below for further discussion of the initial variance.
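To see what (27.3) amounts to, it may help to write out the scalar case. The symbols \(\phi\) and \(\sigma^2\) below are illustrative and are not part of the notation used elsewhere in this chapter.

```latex
% Scalar case of (27.3): r = 1, F = \phi, Q = \sigma^2 (illustrative symbols).
% The Kronecker product F \otimes F reduces to \phi^2.
\[
P_{1|0} = \left(1 - \phi^2\right)^{-1} \sigma^2 = \frac{\sigma^2}{1 - \phi^2},
\qquad |\phi| < 1 .
\]
% For instance, \phi = 0.8 and \sigma^2 = 1 give P_{1|0} = 1/0.36 \approx 2.78.
```

This is the familiar unconditional variance of a stationary AR(1) process, which is why the formula requires the eigenvalues of F to lie inside the unit circle.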
Iteration then proceeds in two steps.[1]
First we update the estimate of the state
\[
\hat{\xi}_{t+1|t} = F_t \hat{\xi}_{t|t-1} + K_t e_t \qquad (27.4)
\]
where \(e_t\) is the prediction error for the observable:
\[
e_t = y_t - A_t' x_t - H_t' \hat{\xi}_{t|t-1}
\]
and \(K_t\) is the gain matrix, given by
\[
K_t = F_t P_{t|t-1} H_t \Sigma_t^{-1} \qquad (27.5)
\]
with
\[
\Sigma_t = H_t' P_{t|t-1} H_t + R_t
\]
The second step then updates the estimate of the variance of the state using
\[
P_{t+1|t} = F_t P_{t|t-1} F_t' - K_t \Sigma_t K_t' + Q_t \qquad (27.6)
\]
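The recursions can be traced with ordinary matrix operations. The following is a minimal sketch for a time-invariant system with r = 1, n = 1 and no exogenous variables; all numerical values are made up, and the loop is meant only to mirror equations (27.3) to (27.6), not to describe gretl's internal implementation.

```hansl
# hypothetical time-invariant system
matrix F = {0.8}
matrix H = {1}
matrix Q = {0.5}
matrix R = {1}
matrix y = {0.3; -0.1; 0.7; 1.2; 0.4}   # made-up observations
scalar T = rows(y)
scalar r = rows(F)

# initialization: zero state and variance as in (27.3)
matrix xi = zeros(r, 1)
matrix P = mshape(inv(I(r^2) - F ** F) * vec(Q), r, r)

loop t=1..T
    matrix e = y[t,]' - H' * xi            # prediction error
    matrix Sigma = H' * P * H + R          # variance of the prediction error
    matrix K = F * P * H * inv(Sigma)      # gain, equation (27.5)
    xi = F * xi + K * e                    # state update, equation (27.4)
    P = F * P * F' - K * Sigma * K' + Q    # variance update, equation (27.6)
    printf "t = %d: e = %g, xi = %g, P = %g\n", t, e[1], xi[1], P[1,1]
endloop
```

Because the system is time-invariant, P converges towards a fixed point after a few steps, which is easy to observe in the printed output.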
Cross-correlated disturbances
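The formulation given above assumes mutual independence of the disturbances in the state and
observation equations, \(v_t\) and \(w_t\). This assumption holds good in many practical applications, but
a more general formulation allows for cross-correlation. In place of (27.1)–(27.2) we may write
\[
\xi_{t+1} = F_t \xi_t + B_t \varepsilon_t
\]
\[
y_t = A_t' x_t + H_t' \xi_t + C_t \varepsilon_t
\]
where \(\varepsilon_t\) is a (\(p \times 1\)) disturbance vector, all the elements of which have unit variance, \(B_t\) is (\(r \times p\))
and \(C_t\) is (\(n \times p\)).
The no-correlation case is nested thus: define \(v_t^*\) and \(w_t^*\) as modified versions of \(v_t\) and \(w_t\), scaled
such that each element has unit variance, and let
\[
\varepsilon_t = \begin{bmatrix} v_t^* \\ w_t^* \end{bmatrix}
\]
so that \(p = r + n\). Then (suppressing time subscripts for simplicity) let
\[
B = \begin{bmatrix} \Gamma_{r \times r} & 0_{r \times n} \end{bmatrix}
\qquad
C = \begin{bmatrix} 0_{n \times r} & \Lambda_{n \times n} \end{bmatrix}
\]
where \(\Gamma\) and \(\Lambda\) are lower triangular matrices satisfying \(Q = \Gamma\Gamma'\) and \(R = \Lambda\Lambda'\) respectively.
The zero sub-matrices in B and C produce the case of mutual independence, which corresponds to
the condition \(BC' = 0\).
In the general case p is not necessarily equal to \(r + n\), and \(BC'\) may be non-zero. The gain
equation (27.5) must then be modified as
\[
K_t = \left( F_t P_{t|t-1} H_t + B_t C_t' \right) \Sigma_t^{-1} \qquad (27.7)
\]
Otherwise, the equations given earlier hold good, if we write \(BB'\) in place of Q and \(CC'\) in place of
R.
In the account of gretl's Kalman facility below we take the uncorrelated case as the baseline, but
add remarks on how to handle the correlated case where applicable.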
[1] For a justification of the following formulae see the classic book by Anderson and Moore (1979) or, for a more modern
treatment, Pollock (1999) or Hamilton (1994). A transcription of R. E. Kalman's original paper (Kalman, 1960) is available
at http://www.cs.unc.edu/~welch/kalman/kalmanPaper.html.
27.3 Intended usage
The Kalman filter can be used in three ways: two of these are the classic forward and backward
passes, or filtering and smoothing respectively; the third use is simulation. In the filtering/smoothing
case you have the data \(y_t\) and you want to reconstruct the states \(\xi_t\) (and the forecast errors as a
by-product), but you may also have a computational apparatus that does the reverse: given artificially
generated series \(w_t\) and \(v_t\), generate the states \(\xi_t\) (and the observables \(y_t\) as a by-product).
The usefulness of the classical filter is well known; the usefulness of the Kalman filter as a
simulation tool may be huge too. Think for instance of Monte Carlo experiments, simulation-based
inference (see Gourieroux and Monfort (1996)) or Bayesian methods, especially in the context of
the estimation of DSGE models.
27.4 Overview of syntax
Using the Kalman filter in gretl is a two-step process. First you set up your filter, using a block
of commands starting with kalman and ending with end kalman, much like the gmm command.
Then you invoke the functions kfilter, ksmooth or ksimul to do the actual work. The next two
sections expand on these points.
27.5 Defining the filter
Each line within the kalman ... end kalman block takes the form
keyword value
where keyword represents a matrix, as shown below. (An additional matrix which may be useful in
some cases is introduced later under the heading "Constant term in the state transition".)

Keyword     Symbol                   Dimensions
obsy        y                        T × n
obsymat     H                        r × n
obsx        x                        T × k
obsxmat     A                        k × n
obsvar      R                        n × n
statemat    F                        r × r
statevar    Q                        r × r
inistate    \(\hat{\xi}_{1|0}\)      r × 1
inivar      \(P_{1|0}\)              r × r

For the data matrices y and x the corresponding value may be the name of a predefined matrix, the
name of a data series, or the name of a list of series.[2]
For the other inputs, value may be the name of a predefined matrix or, if the input in question
happens to be (1 × 1), the name of a scalar variable or a numerical constant. If the value of a
coefficient matrix is given as the name of a matrix or scalar variable, the input is not hard-wired
into the Kalman structure, rather a record is made of the name of the variable and on each run
of a Kalman function (as described below) its value is re-read. It is therefore possible to write one
kalman block and then do several filtering or smoothing passes using different sets of coefficients.[3]

[2] Note that the data matrices obsy and obsx have T rows. That is, the column vectors \(y_t\) and \(x_t\) in (27.1) and (27.2) are
in fact the transposes of the t-dated rows of the full matrices.
[3] Note, however, that the dimensions of the various input matrices are defined via the initial kalman set-up and it is an
error if any of the matrices are changed in size.
An example of this technique is provided later, in the example scripts 27.1 and 27.2. This facility
to alter the values of the coefficients between runs of the filter is to be distinguished from the case
of time-varying matrices, which is discussed below.
Not all of the above-mentioned inputs need be specified in every case; some are optional. (In
addition, you can specify the matrices in any order.) The mandatory elements are y, H, F and Q, so
the minimal kalman block looks like this:
kalman
obsy y
obsymat H
statemat F
statevar Q
end kalman
The optional matrices are listed below, along with the implication of omitting the given matrix.

Keyword     If omitted...
obsx        no exogenous variables in observation equation
obsxmat     no exogenous variables in observation equation
obsvar      no disturbance term in observation equation
inistate    \(\hat{\xi}_{1|0}\) is set to a zero vector
inivar      \(P_{1|0}\) is set automatically

It might appear that the obsx (x) and obsxmat (A) matrices must go together: either both are
given or neither is given. But an exception is granted for convenience. If the observation equation
includes a constant but no additional exogenous variables, you can give a (1 × n) value for A without
having to specify obsx. More generally, if the row dimension of A is 1 greater than the column
dimension of x, it is assumed that the first element of A is associated with an implicit column of
1s.
Regarding the automatic initialization of \(P_{1|0}\) (in case no inivar input is given): by default this
is done as in equation (27.3). However, this method is applicable only if all the eigenvalues of F
lie inside the unit circle. If this condition is not satisfied we instead apply a diffuse prior, setting
\(P_{1|0} = \kappa I_r\) with \(\kappa = 10^7\). If you wish to impose this diffuse prior from the outset, append the option
flag --diffuse to the end kalman statement.[4]
Time-varying matrices
Any or all of the matrices obsymat, obsxmat, obsvar, statemat and statevar may be time-varying.
In that case the value corresponding to the matrix keyword should be given in a special
form: the name of an existing matrix plus a function call which modifies that matrix, separated by
a semicolon. Note that in this case you must use a matrix variable, even if the matrix in question
happens to be 1 × 1.
For example, suppose the matrix H is time-varying. Then we might write
obsymat H ; modify_H(&H, theta)
where modify_H is a user-defined function which modifies matrix H (and theta is a suitable
additional argument to that function, if required).

[4] Initialization of the Kalman filter outside of the case where equation (27.3) applies has been the subject of much
discussion in the literature; see for example de Jong (1991), Koopman (1997). At present gretl does not implement any
of the more elaborate proposals that have been made.
The above is just an illustration: the matrix argument does not have to come first, and the function
can have as many arguments as you like. The essential point is that the function must modify the
specified matrix, which requires that it be given as an argument in pointer form (preceded by &).
The function need not return any value directly; if it does, that value is ignored.
Such matrix-modifying functions will be called at each time-step of the filter operation, prior to
performing any calculations. They have access to the current time-step of the Kalman filter via the
internal variable $kalman_t, which has value 1 on the first step, 2 on the second, and so on, up
to step T. They also have access to the previous n-vector of forecast errors, \(e_{t-1}\), under the name
$kalman_uhat. When t = 1 this will be a zero vector.
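To make the mechanism concrete, here is a hedged sketch of a matrix-modifying function together with a filter that uses it. The switching rule, the data and all numerical values are invented for illustration; only the kalman block syntax and the $kalman_t variable are taken from the description above.

```hansl
# hypothetical rule: H alternates between 1 and theta across time steps
function void modify_H (matrix *H, scalar theta)
    if $kalman_t % 2 == 0
        H[1,1] = theta
    else
        H[1,1] = 1
    endif
end function

nulldata 100
set seed 101
series y = normal()       # placeholder data

matrix H = {1}            # must be a matrix, even though it is 1 x 1
matrix F = {0.9}
matrix Q = {1}
scalar theta = 0.5

kalman
    obsy y
    obsymat H ; modify_H(&H, theta)
    statemat F
    statevar Q
end kalman

scalar err = kfilter()    # 0 on successful completion
```

Since the function is called before the calculations at each step, the value of H used at step t is the one written for that same value of $kalman_t.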
Correlated disturbances
Defining a filter in which the disturbances \(v_t\) and \(w_t\) are correlated involves one modification to the
account given above. If you append the --cross option flag to the end kalman statement, then
the matrices corresponding to the keywords statevar and obsvar are interpreted not as Q and R
but rather as B and C as discussed in section 27.2. Gretl then computes \(Q = BB'\) and \(R = CC'\) as
well as the cross-product \(BC'\) and utilizes the modified expression for the gain as given in equation
(27.7). As mentioned above, B should be (\(r \times p\)) and C should be (\(n \times p\)), where p is the number of
elements in the combined disturbance vector \(\varepsilon_t\).
Constant term in the state transition
In some applications it is useful to be able to represent a constant term in the state transition
equation explicitly; that is, equation (27.1) becomes
\[
\xi_{t+1} = \mu + F_t \xi_t + v_t \qquad (27.8)
\]
This is never strictly necessary; the system (27.1) and (27.2) is general enough to accommodate
such a term, by absorbing it as an extra (unvarying) element in the state vector. But this comes
at the cost of expanding all the matrices that touch the state (\(\xi\), F, v, Q, H), making the model
relatively awkward to formulate and forecasts relatively expensive to compute.
As a simple illustration, consider a univariate model in which the state, \(s_t\), is just a random walk
with drift and the observed variable, \(y_t\), is the state plus white noise:
\[
s_{t+1} = \mu + s_t + v_t \qquad (27.9)
\]
\[
y_t = s_t + w_t \qquad (27.10)
\]
Putting this into the standard form of (27.1) and (27.2) we get:
\[
\begin{bmatrix} s_{t+1} \\ \mu \end{bmatrix} =
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} s_t \\ \mu \end{bmatrix} +
\begin{bmatrix} v_t \\ 0 \end{bmatrix},
\qquad
Q = \begin{bmatrix} \sigma_v^2 & 0 \\ 0 & 0 \end{bmatrix}
\]
\[
y_t = \begin{bmatrix} 1 & 0 \end{bmatrix}
\begin{bmatrix} s_t \\ \mu \end{bmatrix} + w_t
\]
In such a simple case the notational and computational burden is not very great; nonetheless it is
clearly more natural to express this system in the form of (27.9) and (27.10) and in a multivariate
model the gain in parsimony could be substantial.
For this reason we support the use of an additional named matrix in the kalman setup, namely
stconst. This corresponds to \(\mu\) in equation (27.8); it should be an \(r \times 1\) vector (or if r = 1 may be
given as the name of a scalar variable). The use of stconst in setting up a filter corresponding to
(27.9) and (27.10) is shown below.
matrix H = {1}
matrix R = {1}
matrix F = {1}
matrix Q = {1}
matrix mu = {0.05}
kalman
obsy y
obsymat H
obsvar R
statemat F
statevar Q
stconst mu
end kalman
Handling of missing values
It is acceptable for the data matrices, obsy and obsx, to contain missing values. In this case the
filtering operation will work around the missing values, and the ksmooth function can be used to
obtain estimates of these values. However, there are two points to note.
First, gretl's default behavior is to skip missing observations when constructing matrices from data
series. To change this, use the set command thus:
set skip_missing off
Second, the handling of missing values is not yet quite right for the case where the observable
vector \(y_t\) contains more than one element. At present, if any of the elements of \(y_t\) are missing
the entire observation is ignored. Clearly it should be possible to make use of any non-missing
elements, and this is not very difficult in principle; it's just awkward and is not implemented yet.
Persistence and identity of the filter
At present there is no facility to create a "named filter". Only one filter can exist at any point
in time, namely the one created by the last kalman block.[5] If a filter is already defined, and you
give a new kalman block, the old filter is over-written. Otherwise the existing filter persists (and
remains available for the kfilter, ksmooth and ksimul functions) until either (a) the gretl session
is terminated or (b) the command delete kalman is given.

[5] This is not quite true: more precisely, there can be no more than one Kalman filter at each level of function execution.
That is, if a gretl script creates a Kalman filter, a user-defined function called from that script may also create a filter,
without interfering with the original one.

27.6 The kfilter function
Once a filter is established, as discussed in the previous section, kfilter can be used to run a
forward, forecasting pass. This function returns a scalar code: 0 for successful completion, or 1
if numerical problems were encountered. On successful completion, two scalar accessor variables
become available: $kalman_lnl, which gives the overall log-likelihood under the joint normality
assumption,
\[
\ell = -\frac{1}{2} \left[ nT \log(2\pi) + \sum_{t=1}^{T} \log|\Sigma_t| + \sum_{t=1}^{T} e_t' \Sigma_t^{-1} e_t \right]
\]
and $kalman_s2, which gives the estimated variance,
\[
\hat{\sigma}^2 = \frac{1}{nT} \sum_{t=1}^{T} e_t' \Sigma_t^{-1} e_t
\]
(but see below for modifications to these formulae for the case of a diffuse prior). In addition the
accessor $kalman_llt gives a (\(T \times 1\)) vector, element t of which is
\[
\ell_t = -\frac{1}{2} \left[ n \log(2\pi) + \log|\Sigma_t| + e_t' \Sigma_t^{-1} e_t \right]
\]
The kfilter function does not require any arguments, but up to five matrix quantities may be
retrieved via optional pointer arguments. Each of these matrices has T rows, one for each time-step;
the contents of the rows are shown in the following listing.
1. Forecast errors for the observable variables: \(e_t'\), n columns.
2. Variance matrix for the forecast errors: \(\mathrm{vech}(\Sigma_t)'\), n(n+1)/2 columns.
3. Estimate of the state vector: \(\hat{\xi}_{t|t-1}'\), r columns.
4. MSE of estimate of the state vector: \(\mathrm{vech}(P_{t|t-1})'\), r(r+1)/2 columns.
5. Kalman gain: \(\mathrm{vec}(K_t)'\), rn columns.
Unwanted trailing arguments can be omitted, otherwise unwanted arguments can be skipped by
using the keyword null. For example, the following call retrieves the forecast errors in the matrix
E and the estimate of the state vector in S:
matrix E S
kfilter(&E, null, &S)
Matrices given as pointer arguments do not have to be correctly dimensioned in advance; they will
be resized to receive the specified content.
Further note: in general, the arguments to kfilter should all be matrix-pointers, but under two
conditions you can give a pointer to a series variable instead. The conditions are: (i) the matrix
in question has just one column in context (for example, the first two matrices will have a single
column if the length of the observables vector, n, equals 1) and (ii) the time-series length of the
filter is equal to the current gretl sample size.
Likelihood under the diffuse prior
There seems to be general agreement in the literature that the log-likelihood calculation should
be modified in the case of a diffuse prior for \(P_{1|0}\). However, it is not clear to us that there is a
well-defined "correct" method for this. At present we emulate SsfPack (see Koopman et al. (1999)
and section 27.1). In case \(P_{1|0} = \kappa I_r\), we set d = r and calculate
\[
\ell = -\frac{1}{2} \left[ (nT - d) \log(2\pi) + \sum_{t=1}^{T} \log|\Sigma_t| + \sum_{t=1}^{T} e_t' \Sigma_t^{-1} e_t - d \log(\kappa) \right]
\]
and
\[
\hat{\sigma}^2 = \frac{1}{nT - d} \sum_{t=1}^{T} e_t' \Sigma_t^{-1} e_t
\]
27.7 The ksmooth function
This function returns the (\(T \times r\)) matrix of smoothed estimates of the state vector, that is,
estimates based on all T observations: row t of this matrix holds \(\hat{\xi}_{t|T}'\). This function has no required
arguments but it offers one optional matrix-pointer argument, which retrieves the variance of the
smoothed state estimate, \(P_{t|T}\). The latter matrix is (\(T \times r(r+1)/2\)); each row is in transposed vech
form. Examples:
matrix S = ksmooth() # smoothed state only
matrix P
S = ksmooth(&P) # the variance is wanted
These values are computed via a backward pass of the filter, from t = T to t = 1, as follows:
\[
L_t = F_t - K_t H_t'
\]
\[
u_{t-1} = H_t \Sigma_t^{-1} e_t + L_t' u_t
\]
\[
U_{t-1} = H_t \Sigma_t^{-1} H_t' + L_t' U_t L_t
\]
\[
\hat{\xi}_{t|T} = \hat{\xi}_{t|t-1} + P_{t|t-1} u_{t-1}
\]
\[
P_{t|T} = P_{t|t-1} - P_{t|t-1} U_{t-1} P_{t|t-1}
\]
with initial values \(u_T = 0\) and \(U_T = 0\).[6]
This iteration is preceded by a special forward pass in which the matrices \(K_t\), \(\Sigma_t^{-1}\), \(\hat{\xi}_{t|t-1}\) and \(P_{t|t-1}\)
are stored for all t. If F is time-varying, its values for all t are stored on the forward pass, and
similarly for H.
27.8 The ksimul function
This simulation function takes up to three arguments. The first, mandatory, argument is a (\(T \times r\))
matrix containing artificial disturbances for the state transition equation: row t of this matrix
represents \(v_t'\). If the current filter has a non-null R (obsvar) matrix, then the second argument
should be a (\(T \times n\)) matrix containing artificial disturbances for the observation equation, on the
same pattern. Otherwise the second argument should be given as null. If r = 1 you may give a
series for the first argument, and if n = 1 a series is acceptable for the second argument.
Provided that the current filter does not include exogenous variables in the observation equation
(obsx), the T for simulation need not equal that defined by the original obsy data matrix: in effect T
is temporarily redefined by the row dimension of the first argument to ksimul. Once the simulation
is completed, the T value associated with the original data is restored.
The value returned by ksimul is a (\(T \times n\)) matrix holding simulated values for the observables at
each time step. A third optional matrix-pointer argument allows you to retrieve a (\(T \times r\)) matrix
holding the simulated state vector. Examples:
matrix Y = ksimul(V) # obsvar is null
Y = ksimul(V, W) # obsvar is non-null
matrix S
Y = ksimul(V, null, &S) # the simulated state is wanted
The initial value \(\xi_1\) is calculated thus: we find the matrix T such that \(TT' = P_{1|0}\) (as given by the
inivar element in the kalman block), multiply it into \(v_1\), and add the result to \(\xi_{1|0}\) (as given by
inistate).
If the disturbances are correlated across the two equations the arguments to ksimul must be
revised: the first argument should be a (\(T \times p\)) matrix, each row of which represents \(\varepsilon_t'\) (see
section 27.2), and the second argument should be given as null.
27.9 Example 1: ARMA estimation
As is well known, the Kalman filter provides a very efficient way to compute the likelihood of ARMA
models; as an example, take an ARMA(1,1) model
\[
y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}
\]
[6] See I. Karibzhanov's exposition at http://www.econ.umn.edu/~karib003/help/kalcvs.htm.
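One of the ways the above equation can be cast in state-space form is by defining a latent process
\(\xi_t = (1 - \phi L)^{-1} \varepsilon_t\). The observation equation corresponding to (27.2) is then
\[
y_t = \xi_t + \theta \xi_{t-1} \qquad (27.11)
\]
and the state transition equation corresponding to (27.1) is
\[
\begin{bmatrix} \xi_t \\ \xi_{t-1} \end{bmatrix} =
\begin{bmatrix} \phi & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \xi_{t-1} \\ \xi_{t-2} \end{bmatrix} +
\begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix}
\]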
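As a check that this representation reproduces the ARMA(1,1) model, one can apply the lag polynomial \((1 - \phi L)\) to the observation equation. The short derivation below uses only the definitions already given.

```latex
% Lag polynomials commute, and (1 - \phi L)\xi_t = \varepsilon_t by definition.
\[
(1 - \phi L)\, y_t = (1 - \phi L)(1 + \theta L)\, \xi_t
                   = (1 + \theta L)(1 - \phi L)\, \xi_t
                   = (1 + \theta L)\, \varepsilon_t
\]
% Written out: y_t - \phi y_{t-1} = \varepsilon_t + \theta \varepsilon_{t-1}.
```

The first row of the transition equation is just the definition \(\xi_t = \phi \xi_{t-1} + \varepsilon_t\), while the second row is an identity that carries \(\xi_{t-1}\) forward.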
The gretl syntax for a corresponding kalman block would be
matrix H = {1; theta}
matrix F = {phi, 0; 1, 0}
matrix Q = {s^2, 0; 0, 0}
kalman
obsy y
obsymat H
statemat F
statevar Q
end kalman
Note that the observation equation (27.11) does not include an error term; this is equivalent
to saying that \(V(w_t) = 0\) and, as a consequence, the kalman block does not include an obsvar
keyword.
Once the filter is set up, all it takes to compute the log-likelihood for given values of \(\phi\), \(\theta\) and
\(\sigma^2_\varepsilon\) is to execute the kfilter() function and use the $kalman_lnl accessor (which returns the
total log-likelihood) or, more appropriately if the likelihood has to be maximized through mle, the
$kalman_llt accessor, which returns the series of individual contributions to the log-likelihood for
each observation. An example is shown in script 27.1.
27.10 Example 2: local level model
Suppose we have a series \(y_t = \mu_t + \varepsilon_t\), where \(\mu_t\) is a random walk with normal increments of
variance \(\sigma^2_2\) and \(\varepsilon_t\) is a normal white noise with variance \(\sigma^2_1\), independent of \(\mu_t\). This is known as
the local level model in Harvey's (1989) terminology, and it can be cast in state-space form as
equations (27.1)–(27.2) with F = 1, \(v_t \sim N(0, \sigma^2_2)\), H = 1 and \(w_t \sim N(0, \sigma^2_1)\). With the scalars s1 and
s2 standing for \(\sigma^2_1\) and \(\sigma^2_2\) respectively, the translation to a kalman block is
kalman
obsy y
obsymat 1
statemat 1
statevar s2
obsvar s1
end kalman --diffuse
The two unknown parameters \(\sigma^2_1\) and \(\sigma^2_2\) can be estimated via maximum likelihood. Script 27.2
provides an example of simulation and estimation of such a model. For the sake of brevity,
simulation is carried out via ordinary gretl commands, rather than the state-space apparatus described
above.
The example contains two functions: the first one carries out the estimation of the unknown
parameters \(\sigma^2_1\) and \(\sigma^2_2\) via maximum likelihood; the second one uses these estimates to compute a
smoothed estimate of the unobservable series \(\mu_t\) called muhat. A plot of \(\mu_t\) and its estimate is
presented in Figure 27.1.
Example 27.1: ARMA estimation
function void arma11_via_kalman(series y)
    /* parameter initialization */
    phi = 0
    theta = 0
    sigma = 1
    /* Kalman filter setup */
    matrix H = {1; theta}
    matrix F = {phi, 0; 1, 0}
    matrix Q = {sigma^2, 0; 0, 0}
    kalman
        obsy y
        obsymat H
        statemat F
        statevar Q
    end kalman
    /* maximum likelihood estimation */
    mle logl = ERR ? NA : $kalman_llt
        H[2] = theta
        F[1,1] = phi
        Q[1,1] = sigma^2
        ERR = kfilter()
        params phi theta sigma
    end mle -h
end function

# ------------------------ main ---------------------------
open arma.gdt          # open the "arma" example dataset
arma11_via_kalman(y)   # estimate an arma(1,1) model
arma 1 1 ; y --nc      # check via native command
Example 27.2: Local level model
function matrix local_level (series y)
    /* starting values */
    scalar s1 = 1
    scalar s2 = 1
    /* Kalman filter set-up */
    kalman
        obsy y
        obsymat 1
        statemat 1
        statevar s2
        obsvar s1
    end kalman --diffuse
    /* ML estimation */
    mle ll = ERR ? NA : $kalman_llt
        ERR = kfilter()
        params s1 s2
    end mle
    return s1 ~ s2
end function

function series loclev_sm (series y, scalar s1, scalar s2)
    /* return the smoothed estimate of \mu_t */
    kalman
        obsy y
        obsymat 1
        statemat 1
        statevar s2
        obsvar s1
    end kalman --diffuse
    series ret = ksmooth()
    return ret
end function

/* -------------------- main script -------------------- */
nulldata 200
set seed 202020
setobs 1 1 --special
true_s1 = 0.25
true_s2 = 0.5
v = normal() * sqrt(true_s1)
w = normal() * sqrt(true_s2)
mu = 2 + cum(w)
y = mu + v
matrix Vars = local_level(y)             # estimate the variances
muhat = loclev_sm(y, Vars[1], Vars[2])   # compute the smoothed state
[Figure 27.1: Local level model: \(\mu_t\) and its smoothed estimate. Time-series plot of mu and muhat over observations 0 to 200.]
By appending the following code snippet to the script in Example 27.2, one may check the results
against the R command StructTS.
foreign language=R --send-data
    y <- gretldata[,"y"]
    a <- StructTS(y, type="level")
    a
    StateFromR <- as.ts(tsSmooth(a))
    gretl.export(StateFromR)
end foreign
append @dotdir/StateFromR.csv
ols muhat 0 StateFromR --simple
Chapter 28
Numerical methods
Several functions are available to aid in the construction of special-purpose estimators. One group
of functions is used to maximize user-supplied functions by numerical methods: BFGS,
Newton–Raphson and Simulated Annealing. Another function is fdjac, which produces a
forward-difference approximation to the Jacobian.
28.1 The maximizer functions
The BFGSmax function has two required arguments: a vector holding the initial values of a set of
parameters, and a call to a function that calculates the (scalar) criterion to be maximized, given
the current parameter values and any other relevant data. If the object is in fact minimization, this
function should return the negative of the criterion. On successful completion, BFGSmax returns the
maximized value of the criterion and the matrix given via the first argument holds the parameter
values which produce the maximum. Here is an example:
matrix X = { dataset }
matrix theta = { 1, 100 }
scalar J = BFGSmax(theta, ObjFunc(&theta, &X))
It is assumed here that ObjFunc is a user-defined function (see Chapter 10) with the following
general set-up:
function scalar ObjFunc (matrix *theta, matrix *X)
    scalar val = ... # do some computation
    return val
end function
Example 28.1: Finding the minimum of the Rosenbrock function
function scalar Rosenbrock(matrix *param)
    scalar x = param[1]
    scalar y = param[2]
    return -(1-x)^2 - 100 * (y - x^2)^2
end function

matrix theta = { 0 , 0 }
set max_verbose 1
M = BFGSmax(theta, Rosenbrock(&theta))
print theta
The operation of the BFGS maximizer can be adjusted using the set variables bfgs_maxiter and
bfgs_toler (see Chapter 19). In addition you can provoke verbose output from the maximizer by
assigning a positive value to max_verbose, again via the set command.
The Rosenbrock function is often used as a test problem for optimization algorithms. It is also
known as "Rosenbrock's Valley" or "Rosenbrock's Banana Function", on account of the fact that its
contour lines are banana-shaped. It is defined by:
\[
f(x, y) = (1 - x)^2 + 100(y - x^2)^2
\]
The function has a global minimum at (x, y) = (1, 1) where f(x, y) = 0. Example 28.1 shows a
gretl script that discovers the minimum using BFGSmax (giving a verbose account of progress).
28.2 Supplying analytical derivatives for BFGS
An optional third argument to the BFGSmax function enables the user to supply analytical
derivatives of the criterion function with respect to the parameters (without which a numerical
approximation to the gradient is computed). This argument is similar to the second one in that it specifies
a function call. In this case the function that is called must have the following signature.
Its first argument should be a pre-defined matrix correctly dimensioned to hold the gradient; that
is, if the parameter vector contains k elements, the gradient matrix must also be a k-vector. This
matrix argument must be given in pointer form so that its content can be modified by the function.
(Note that unlike the parameter vector, where the choice of initial values can be important,
the initial values given to the gradient are immaterial and do not affect the results.)
In addition the gradient function must have as one of its arguments the parameter vector. This may
be given in pointer form (which enhances efficiency) but that is not required. Additional arguments
may be specified if necessary.
Given the current parameter values, the function call must fill out the gradient vector appropriately.
It is not required that the gradient function returns any value directly; if it does, that value is
ignored.
Example 28.2 illustrates, showing how the Rosenbrock script can be modified to use analytical
derivatives. (Note that since this is a minimization problem the values written into g[1] and g[2]
in the function Rosen_grad are in fact the derivatives of the negative of the Rosenbrock function.)
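For reference, the derivatives in question can be worked out by hand from the definition of the Rosenbrock function; the following derivation is a sketch of that calculation for the negated function that BFGSmax maximizes.

```latex
% Criterion actually maximized: -f(x,y) = -(1-x)^2 - 100 (y - x^2)^2
\[
\frac{\partial (-f)}{\partial x} = 2(1 - x) + 400\,x\,(y - x^2),
\qquad
\frac{\partial (-f)}{\partial y} = -200\,(y - x^2)
\]
% Both derivatives vanish at (x, y) = (1, 1), the location of the optimum.
```

The first expression is the same as 2(1-x) + 2x(200(y-x^2)), which is the form in which it is coded in Rosen_grad.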
28.3 Computing a Jacobian
Gretl offers the possibility of differentiating numerically a user-defined function via the fdjac
function.
This function again takes two arguments: an \(n \times 1\) matrix holding initial parameter values and a
function call that calculates and returns an \(m \times 1\) matrix, given the current parameter values and
any other relevant data. On successful completion it returns an \(m \times n\) matrix holding the Jacobian.
For example,
matrix Jac = fdjac(theta, SumOC(&theta, &X))
where we assume that SumOC is a user-defined function with the following structure:
function matrix SumOC (matrix *theta, matrix *X)
    matrix V = ... # do some computation
    return V
end function
This may come in handy in several cases: for example, if you use BFGSmax to estimate a model, you
may wish to calculate a numerical approximation to the relevant Jacobian to construct a covariance
matrix for your estimates.
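A small self-contained check of fdjac against a Jacobian worked out by hand may be useful. The function G and the evaluation point below are invented for the purpose, and the call follows the form shown in this section (other gretl versions may expect the function call to be passed differently).

```hansl
# hypothetical vector-valued function of two parameters
function matrix G (matrix *theta)
    matrix ret = zeros(2, 1)
    ret[1] = theta[1]^2 * theta[2]
    ret[2] = exp(theta[1]) + theta[2]
    return ret
end function

matrix theta = {1; 2}
matrix Jnum = fdjac(theta, G(&theta))

# analytical Jacobian at the same point, for comparison
scalar j11 = 2 * theta[1] * theta[2]
scalar j12 = theta[1]^2
scalar j21 = exp(theta[1])
matrix Jan = {j11, j12; j21, 1}
print Jnum Jan
```

The two matrices should agree up to the error of the forward-difference approximation.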
Example 28.2: Rosenbrock function with analytical gradient
function scalar Rosenbrock (matrix *param)
    scalar x = param[1]
    scalar y = param[2]
    # negative of the Rosenbrock function, since BFGSmax maximizes
    return -(1-x)^2 - 100 * (y - x^2)^2
end function
function void Rosen_grad (matrix *g, matrix *param)
    scalar x = param[1]
    scalar y = param[2]
    # derivatives with respect to x and y
    g[1] = 2*(1-x) + 2*x*(200*(y-x^2))
    g[2] = -200*(y - x^2)
end function
matrix theta = { 0, 0 }
matrix grad = { 0, 0 }
set max_verbose 1
M = BFGSmax(theta, Rosenbrock(&theta), Rosen_grad(&grad, &theta))
print theta
print grad

Another example is the delta method: if you have a consistent estimator of a vector of parameters
\(\hat\theta\), and a consistent estimate of its covariance matrix \(\Sigma\), you may need to compute
estimates for a nonlinear continuous transformation \(\psi = g(\theta)\). In this case, a standard result
in asymptotic theory is that
\[
\left\{
\begin{array}{c}
\hat\theta \stackrel{p}{\longrightarrow} \theta \\
\sqrt{T}\,(\hat\theta - \theta) \stackrel{d}{\longrightarrow} N(0, \Sigma)
\end{array}
\right\}
\Longrightarrow
\left\{
\begin{array}{c}
\hat\psi = g(\hat\theta) \stackrel{p}{\longrightarrow} \psi = g(\theta) \\
\sqrt{T}\,(\hat\psi - \psi) \stackrel{d}{\longrightarrow} N(0, J \Sigma J')
\end{array}
\right\}
\]
where \(T\) is the sample size and \(J\) is the Jacobian
\(\left. \frac{\partial g(x)}{\partial x} \right|_{x=\theta}\).
Script 28.3 exemplifies such a case: the example is taken from Greene (2003), section 9.3.1. The
slight differences between the results reported in the original source and what gretl returns are due
to the fact that the Jacobian is computed numerically, rather than analytically as in the book.
Example 28.3: Delta Method
function matrix MPC(matrix *param, matrix *Y)
beta = param[2]
gamma = param[3]
y = Y[1]
return beta*gamma*y^(gamma-1)
end function
# William Greene, Econometric Analysis, 5e, Chapter 9
set echo off
set messages off
open greene5_1.gdt
# Use OLS to initialize the parameters
ols realcons 0 realdpi --quiet
genr a = $coeff(0)
genr b = $coeff(realdpi)
genr g = 1.0
# Run NLS with analytical derivatives
nls realcons = a + b * (realdpi^g)
deriv a = 1
deriv b = realdpi^g
deriv g = b * realdpi^g * log(realdpi)
end nls
matrix Y = realdpi[2000:4]
matrix theta = $coeff
matrix V = $vcv
mpc = MPC(&theta, &Y)
matrix Jac = fdjac(theta, MPC(&theta, &Y))
Sigma = qform(Jac, V)
printf "\nmpc = %g, std.err = %g\n", mpc, sqrt(Sigma)
scalar teststat = (mpc-1)/sqrt(Sigma)
printf "\nTest for MPC = 1: %g (p-value = %g)\n", \
teststat, pvalue(n,abs(teststat))
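For comparison with the numerical result, the analytical Jacobian of the MPC function in Example 28.3
can be written down directly. Writing the parameter vector as \((\alpha, \beta, \gamma)\) (names assumed
here to match a, b and g in the script) and the function as \(\beta \gamma y^{\gamma - 1}\), we have

```latex
\[
J =
\left[
\frac{\partial\, \mathrm{MPC}}{\partial \alpha},\;
\frac{\partial\, \mathrm{MPC}}{\partial \beta},\;
\frac{\partial\, \mathrm{MPC}}{\partial \gamma}
\right]
=
\left[
0,\;
\gamma\, y^{\gamma-1},\;
\beta\, y^{\gamma-1} \left( 1 + \gamma \log y \right)
\right]
\]
```

The estimated variance of the MPC is then \(J V J'\), which is what qform(Jac, V) computes in the script.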
Chapter 29
Discrete and censored dependent variables
This chapter deals with models for dependent variables that are discrete or censored or otherwise
limited (as in event counts or durations, which must be positive) and that therefore call for estima-
tion methods other than the classical linear model. We discuss several estimators (mostly based on
the Maximum Likelihood principle), adding some details and examples to complement the material
on these methods in the Gretl Command Reference.
29.1 Logit and probit models
It often happens that one wants to specify and estimate a model in which the dependent variable
is not continuous, but discrete. A typical example is a model in which the dependent variable is
the occupational status of an individual (1 = employed, 0 = unemployed). A convenient way of
formalizing this situation is to consider the variable \(y_i\) as a Bernoulli random variable and analyze
its distribution conditional on the explanatory variables \(x_i\). That is,
\[
y_i = \begin{cases}
1 & \text{with probability } P_i \\
0 & \text{with probability } 1 - P_i
\end{cases}
\tag{29.1}
\]
where \(P_i = P(y_i = 1 \mid x_i)\) is a given function of the explanatory variables \(x_i\).
In most cases, the function \(P_i\) is a cumulative distribution function \(F\), applied to a linear
combination of the \(x_i\)s. In the probit model, the normal cdf is used, while the logit model employs the
logistic function \(\Lambda()\). Therefore, we have
\[
\text{probit} \qquad P_i = F(z_i) = \Phi(z_i) \tag{29.2}
\]
\[
\text{logit} \qquad P_i = F(z_i) = \Lambda(z_i) = \frac{1}{1 + e^{-z_i}} \tag{29.3}
\]
\[
z_i = \sum_{j=1}^{k} x_{ij} \beta_j \tag{29.4}
\]
where \(z_i\) is commonly known as the index function. Note that in this case the coefficients \(\beta_j\)
cannot be interpreted as the partial derivatives of \(E(y_i \mid x_i)\) with respect to \(x_{ij}\). However,
for a given value of \(x_i\) it is possible to compute the vector of "slopes", that is
\[
\mathrm{slope}_j(\bar{x}) = \left. \frac{\partial F(z)}{\partial x_j} \right|_{z = \bar{z}}
\]
Gretl automatically computes the slopes, setting each explanatory variable at its sample mean.
Another, equivalent way of thinking about this model is in terms of an unobserved variable \(y^*_i\)
which can be described thus:
\[
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i = z_i + \varepsilon_i \tag{29.5}
\]
We observe \(y_i = 1\) whenever \(y^*_i > 0\) and \(y_i = 0\) otherwise. If \(\varepsilon_i\) is assumed to be
normal, then we have the probit model. The logit model arises if we assume that the density function of
\(\varepsilon_i\) is
\[
\lambda(\varepsilon_i) = \frac{\partial \Lambda(\varepsilon_i)}{\partial \varepsilon_i}
= \frac{e^{-\varepsilon_i}}{(1 + e^{-\varepsilon_i})^2}
\]
Both the probit and logit model are estimated in gretl via maximum likelihood, where the
log-likelihood can be written as
\[
L(\beta) = \sum_{y_i = 0} \ln[1 - F(z_i)] + \sum_{y_i = 1} \ln F(z_i), \tag{29.6}
\]
which is always negative, since \(0 < F(\cdot) < 1\). Since the score equations do not have a closed form
solution, numerical optimization is used. However, in most cases this is totally transparent to the
user, since usually only a few iterations are needed to ensure convergence. The --verbose switch
can be used to track the maximization algorithm.
Example 29.1: Estimation of simple logit and probit models
open greene19_1
logit GRADE const GPA TUCE PSI
probit GRADE const GPA TUCE PSI
As an example, we reproduce the results given in chapter 21 of Greene (2000), where the effectiveness
of a program for teaching economics is evaluated by the improvements of students' grades.
Running the code in example 29.1 gives the output reported in Table 29.1; note that, for the probit
model, a conditional moment test on skewness and kurtosis is printed out automatically as a test
for normality.
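The slopes and the log-likelihood reported for the logit model can also be computed by hand from the
estimated coefficients. The following is a sketch along those lines, assuming the same data file as in
example 29.1; the names xbar, zbar, fz, slopes, z, P and ll are introduced here for illustration only.

```hansl
open greene19_1
list X = const GPA TUCE PSI
logit GRADE X --quiet
matrix b = $coeff

# index function evaluated at the sample means of the regressors
matrix xbar = meanc({X})
scalar zbar = xbar * b
# logistic density evaluated at zbar
scalar fz = exp(-zbar) / (1 + exp(-zbar))^2
# slope_j = f(zbar) * beta_j (the first entry refers to the constant)
matrix slopes = fz * b
print slopes

# log-likelihood computed from the fitted index
series z = lincomb(X, b)
series P = 1 / (1 + exp(-z))
scalar ll = sum(GRADE * log(P) + (1 - GRADE) * log(1 - P))
printf "log-likelihood by hand = %g, reported by gretl = %g\n", ll, $lnl
```

The entries of slopes after the first should be comparable with the SLOPE column printed by gretl, and
the two log-likelihood values should coincide up to rounding.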
In this context, the $uhat accessor function takes a special meaning: it returns generalized
residuals as defined in Gourieroux, Monfort, Renault and Trognon (1987), which can be interpreted as
unbiased estimators of the latent disturbances \(\varepsilon_i\). These are defined as
\[
u_i = \begin{cases}
y_i - \hat{P}_i & \text{for the logit model} \\[4pt]
y_i \cdot \dfrac{\phi(\hat{z}_i)}{\Phi(\hat{z}_i)}
- (1 - y_i) \cdot \dfrac{\phi(\hat{z}_i)}{1 - \Phi(\hat{z}_i)} & \text{for the probit model}
\end{cases}
\tag{29.7}
\]
Among other uses, generalized residuals are often used for diagnostic purposes. For example, it is
very easy to set up an omitted variables test equivalent to the familiar LM test in the context of a
linear regression; example 29.2 shows how to perform a variable addition test.
Example 29.2: Variable addition test in a probit model
open greene19_1
probit GRADE const GPA PSI
series u = $uhat
ols u const GPA PSI TUCE -q
printf "Variable addition test for TUCE:\n"
printf "Rsq * T = %g (p. val. = %g)\n", $trsq, pvalue(X,1,$trsq)
Table 29.1: Example logit and probit

Model 1: Logit estimates using the 32 observations 1-32
Dependent variable: GRADE

  VARIABLE     COEFFICIENT     STDERROR     T STAT      SLOPE
                                                      (at mean)
  const        -13.0213        4.93132      -2.641
  GPA            2.82611       1.26294       2.238    0.533859
  TUCE           0.0951577     0.141554      0.672    0.0179755
  PSI            2.37869       1.06456       2.234    0.449339

  Mean of GRADE = 0.344
  Number of cases correctly predicted = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.189
  McFadden's pseudo-R-squared = 0.374038
  Log-likelihood = -12.8896
  Likelihood ratio test: Chi-square(3) = 15.4042 (p-value 0.001502)
  Akaike information criterion (AIC) = 33.7793
  Schwarz Bayesian criterion (BIC) = 39.6422
  Hannan-Quinn criterion (HQC) = 35.7227

             Predicted
               0    1
  Actual 0    18    3
         1     3    8

Model 2: Probit estimates using the 32 observations 1-32
Dependent variable: GRADE

  VARIABLE     COEFFICIENT     STDERROR     T STAT      SLOPE
                                                      (at mean)
  const         -7.45232       2.54247      -2.931
  GPA            1.62581       0.693883      2.343    0.533347
  TUCE           0.0517288     0.0838903     0.617    0.0169697
  PSI            1.42633       0.595038      2.397    0.467908

  Mean of GRADE = 0.344
  Number of cases correctly predicted = 26 (81.2%)
  f(beta'x) at mean of independent vars = 0.328
  McFadden's pseudo-R-squared = 0.377478
  Log-likelihood = -12.8188
  Likelihood ratio test: Chi-square(3) = 15.5459 (p-value 0.001405)
  Akaike information criterion (AIC) = 33.6376
  Schwarz Bayesian criterion (BIC) = 39.5006
  Hannan-Quinn criterion (HQC) = 35.581

             Predicted
               0    1
  Actual 0    18    3
         1     3    8

  Test for normality of residual -
  Null hypothesis: error is normally distributed
  Test statistic: Chi-square(2) = 3.61059
  with p-value = 0.164426

The perfect prediction problem
One curious characteristic of logit and probit models is that (quite paradoxically) estimation is not
feasible if a model fits the data perfectly; this is called the perfect prediction problem. The reason
why this problem arises is easy to see by considering equation (29.6): if for some vector \(\beta\) and
scalar \(k\) it's the case that \(z_i < k\) whenever \(y_i = 0\) and \(z_i > k\) whenever \(y_i = 1\), the same
thing is true for any multiple of \(\beta\). Hence, \(L(\beta)\) can be made arbitrarily close to 0 simply by
choosing enormous values for \(\beta\). As a consequence, the log-likelihood has no maximum, despite being
bounded.
Gretl has a mechanism for preventing the algorithm from iterating endlessly in search of a
non-existent maximum. One sub-case of interest is when the perfect prediction problem arises because
of a single binary explanatory variable. In this case, the offending variable is dropped from the
model and estimation proceeds with the reduced specification. Nevertheless, it may happen that
no single "perfect classifier" exists among the regressors, in which case estimation is simply
impossible and the algorithm stops with an error. This behavior is triggered during the iteration
process if
\[
\max_{i:\, y_i = 0} z_i < \min_{i:\, y_i = 1} z_i
\]
If this happens, unless your model is trivially mis-specified (like predicting if a country is an oil
exporter on the basis of oil revenues), it is normally a small-sample problem: you probably just
don't have enough data to estimate your model. You may want to drop some of your explanatory
variables.
This problem is well analyzed in Stokes (2004); the results therein are replicated in the example
script murder_rates.inp.
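The separation condition above can be inspected on any estimated index. The following sketch does so
for the logit model of example 29.1, where estimation succeeds and the two groups therefore overlap;
it only illustrates the inequality and is not a description of gretl's internal check. The names z,
zmax0 and zmin1 are hypothetical.

```hansl
open greene19_1
list X = const GPA TUCE PSI
logit GRADE X --quiet
matrix b = $coeff
series z = lincomb(X, b)

# largest index value among the observations with y = 0
smpl GRADE == 0 --restrict
scalar zmax0 = max(z)
smpl --full

# smallest index value among the observations with y = 1
smpl GRADE == 1 --restrict
scalar zmin1 = min(z)
smpl --full

printf "max z (y=0) = %g, min z (y=1) = %g\n", zmax0, zmin1
if zmax0 < zmin1
    printf "the index separates the two groups perfectly\n"
else
    printf "the two groups overlap: no perfect prediction\n"
endif
```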
29.2 Ordered response models
These models constitute a simple variation on ordinary logit/probit models, and are usually applied
when the dependent variable is a discrete and ordered measurement: not simply binary, but on
an ordinal rather than an interval scale. For example, this sort of model may be applied when the
dependent variable is a qualitative assessment such as "Good", "Average" and "Bad".
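In the general case, consider an ordered response variable, \(y\), that can take on any of the \(J+1\) values
\(0, 1, 2, \ldots, J\). We suppose, as before, that underlying the observed response is a latent variable,
\[
y^* = X\beta + \varepsilon = z + \varepsilon
\]
Now define "cut points", \(\alpha_1 < \alpha_2 < \cdots < \alpha_J\), such that
\[
\begin{array}{ll}
y = 0 & \text{if } y^* \le \alpha_1 \\
y = 1 & \text{if } \alpha_1 < y^* \le \alpha_2 \\
\vdots & \\
y = J & \text{if } y^* > \alpha_J
\end{array}
\]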
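For example, if the response takes on three values there will be two such cut points, \(\alpha_1\) and
\(\alpha_2\).
The probability that individual \(i\) exhibits response \(j\), conditional on the characteristics \(x_i\), is
then given by
\[
P(y_i = j \mid x_i) = \begin{cases}
P(y^* \le \alpha_1 \mid x_i) = F(\alpha_1 - z_i) & \text{for } j = 0 \\
P(\alpha_j < y^* \le \alpha_{j+1} \mid x_i) = F(\alpha_{j+1} - z_i) - F(\alpha_j - z_i) & \text{for } 0 < j < J \\
P(y^* > \alpha_J \mid x_i) = 1 - F(\alpha_J - z_i) & \text{for } j = J
\end{cases}
\tag{29.8}
\]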
The unknown parameters \(\alpha_j\) are estimated jointly with the \(\beta\)s via maximum likelihood. The
\(\hat\alpha_j\) estimates are reported by gretl as cut1, cut2 and so on. For the probit variant, a
conditional moment test for normality constructed in the spirit of Chesher and Irish (1987) is also
included.
Note that the \(\alpha_j\) parameters can be shifted arbitrarily by adding a constant to \(z_i\), so the
model is under-identified if there is some linear combination of the explanatory variables which is
constant. The most obvious case in which this occurs is when the model contains a constant term; for
this reason, gretl drops automatically the intercept if present. However, it may happen that the user
inadvertently specifies a list of regressors that may be combined in such a way to produce a
constant (for example, by using a full set of dummy variables for a discrete factor). If this happens,
gretl will also drop any offending regressors.
In order to apply these models in gretl, the dependent variable must either take on only non-
negative integer values, or be explicitly marked as discrete. (In case the variable has non-integer
values, it will be recoded internally.) Note that gretl does not provide a separate command for
ordered models: the logit and probit commands automatically estimate the ordered version if
the dependent variable is acceptable, but not binary.
Example 29.3 reproduces the results presented in section 15.10 of Wooldridge (2002a). The
question of interest in this analysis is what difference it makes, to the allocation of assets in pension
funds, whether individual plan participants have a choice in the matter. The response variable is
an ordinal measure of the weight of stocks in the pension portfolio. Having reported the results
of estimation of the ordered model, Wooldridge illustrates the effect of the choice variable by
reference to an "average" participant. The example script shows how one can compute this effect in
gretl.
After estimating ordered models, the $uhat accessor yields generalized residuals as in binary
models; additionally, the $yhat accessor function returns \(\hat{z}_i\), so it is possible to compute an
unbiased estimator of the latent variable \(y^*_i\) simply by adding the two together.
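In script form this amounts to a single line after estimation. The sketch below assumes the pension
data and the specification of Example 29.3; the series names zhat and ystar_hat are hypothetical.

```hansl
open pension.gdt
list DEMOG = age educ female black married
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101
probit pctstck choice DEMOG INCOME wealth89 prftshr --quiet

# estimated index function
series zhat = $yhat
# index plus generalized residual: estimate of the latent variable
series ystar_hat = $yhat + $uhat
summary zhat ystar_hat
```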
29.3 Multinomial logit
When the dependent variable is not binary and does not have a natural ordering, multinomial
models are used. Multinomial logit is supported in gretl via the --multinomial option to the
logit command. Simple models can also be handled via the mle command (see chapter 19). We
give here an example of such a model. Let the dependent variable, \(y_i\), take on integer values
\(0, 1, \ldots, p\). The probability that \(y_i = k\) is given by
\[
P(y_i = k \mid x_i) = \frac{\exp(x_i \beta_k)}{\sum_{j=0}^{p} \exp(x_i \beta_j)}
\]
For the purpose of identification one of the outcomes must be taken as the "baseline"; it is usually
assumed that \(\beta_0 = 0\), in which case
\[
P(y_i = k \mid x_i) = \frac{\exp(x_i \beta_k)}{1 + \sum_{j=1}^{p} \exp(x_i \beta_j)}
\]
and
\[
P(y_i = 0 \mid x_i) = \frac{1}{1 + \sum_{j=1}^{p} \exp(x_i \beta_j)} .
\]
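To see how the normalization works numerically, the probabilities can be computed with a few matrix
operations. The sketch below uses made-up numbers: the regressor vector x and the coefficient matrix B
are hypothetical and do not come from any estimated model.

```hansl
# hypothetical regressors: a constant and one explanatory variable
matrix x = {1, 12}
# hypothetical coefficients, one column per outcome 0, 1, 2;
# the first column is the baseline, with beta_0 = 0
matrix B = {0, 0.5, -0.2; 0, -0.03, 0.04}

# exp(x * beta_k) for each outcome (exp works element by element)
matrix e = exp(x * B)
# divide by the sum across outcomes to get probabilities
matrix P = e ./ sumr(e)
print P
printf "sum of probabilities = %g\n", sumr(P)
```

Since the baseline column of B is zero, the first element of e equals 1, which is where the 1 in the
denominator of the formulas above comes from.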
Example 29.4 reproduces Table 15.2 in Wooldridge (2002a), based on data on career choice from
Keane and Wolpin (1997). The dependent variable is the occupational status of an individual (1 = in
school; 2 = not in school and not working; 3 = working, consistent with the labels "status = 2" and
"status = 3" in the output), and the explanatory variables are education and work experience (linear
and square) plus a "black" binary variable. The full data set is a panel; here the analysis is
confined to a cross-section for 1987.
Example 29.3: Ordered probit model
/*
Replicate the results in Wooldridge, Econometric Analysis of Cross
Section and Panel Data, section 15.10, using pension-plan data from
Papke (AER, 1998).
The dependent variable, pctstck (percent stocks), codes the asset
allocation responses of "mostly bonds", "mixed" and "mostly stocks"
as {0, 50, 100}.
The independent variable of interest is "choice", a dummy indicating
whether individuals are able to choose their own asset allocations.
*/
open pension.gdt
# demographic characteristics of participant
list DEMOG = age educ female black married
# dummies coding for income level
list INCOME = finc25 finc35 finc50 finc75 finc100 finc101
# Papke's OLS approach
ols pctstck const choice DEMOG INCOME wealth89 prftshr
# save the OLS choice coefficient
choice_ols = $coeff(choice)
# estimate ordered probit
probit pctstck choice DEMOG INCOME wealth89 prftshr
k = $ncoeff
matrix b = $coeff[1:k-2]
a1 = $coeff[k-1]
a2 = $coeff[k]
/*
Wooldridge illustrates the choice effect in the ordered probit
by reference to a single, non-black male aged 60, with 13.5 years
of education, income in the range $50K - $75K and wealth of $200K,
participating in a plan with profit sharing.
*/
matrix X = {60, 13.5, 0, 0, 0, 0, 0, 0, 1, 0, 0, 200, 1}
# with choice = 0
scalar Xb = (0 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E0 = 50 * P50 + 100 * P100
# with choice = 1
Xb = (1 ~ X) * b
P0 = cdf(N, a1 - Xb)
P50 = cdf(N, a2 - Xb) - P0
P100 = 1 - cdf(N, a2 - Xb)
E1 = 50 * P50 + 100 * P100
printf "\nWith choice, E(y) = %.2f, without E(y) = %.2f\n", E1, E0
printf "Estimated choice effect via ML = %.2f (OLS = %.2f)\n", E1 - E0, \
  choice_ols
Example 29.4: Multinomial logit
Input:
open keane.gdt
smpl (year==87) --restrict
logit status 0 educ exper expersq black --multinomial
Output (selected portions):
Model 1: Multinomial Logit, using observations 1-1738 (n = 1717)
Missing or incomplete observations dropped: 21
Dependent variable: status
Standard errors based on Hessian

             coefficient    std. error      z        p-value
  ----------------------------------------------------------
  status = 2
  const       10.2779       1.13334        9.069    1.20e-19  ***
  educ        -0.673631     0.0698999     -9.637    5.57e-22  ***
  exper       -0.106215     0.173282      -0.6130   0.5399
  expersq     -0.0125152    0.0252291     -0.4961   0.6199
  black        0.813017     0.302723       2.686    0.0072    ***
  status = 3
  const        5.54380      1.08641        5.103    3.35e-07  ***
  educ        -0.314657     0.0651096     -4.833    1.35e-06  ***
  exper        0.848737     0.156986       5.406    6.43e-08  ***
  expersq     -0.0773003    0.0229217     -3.372    0.0007    ***
  black        0.311361     0.281534       1.106    0.2687

Mean dependent var    2.691322   S.D. dependent var    0.573502
Log-likelihood       -907.8572   Akaike criterion      1835.714
Schwarz criterion     1890.198   Hannan-Quinn          1855.874

Number of cases 'correctly predicted' = 1366 (79.6%)
Likelihood ratio test: Chi-square(8) = 583.722 [0.0000]
29.4 Bivariate probit
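The bivariate probit model is simply a two-equation system in which each equation is a probit
model, but the two disturbance terms may not be independent. In formulae,
\[
y^*_{1,i} = \sum_{j=1}^{k_1} x_{ij} \beta_j + \varepsilon_{1,i}
\qquad y_{1,i} = 1 \iff y^*_{1,i} > 0 \tag{29.9}
\]
\[
y^*_{2,i} = \sum_{j=1}^{k_2} z_{ij} \gamma_j + \varepsilon_{2,i}
\qquad y_{2,i} = 1 \iff y^*_{2,i} > 0 \tag{29.10}
\]
\[
\begin{bmatrix} \varepsilon_{1,i} \\ \varepsilon_{2,i} \end{bmatrix}
\sim N \left[ 0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right] \tag{29.11}
\]
The explanatory variables for the first equation, \(x\), and for the second equation, \(z\), may overlap.
An example is contained in the sample script biprobit.inp.
After estimation, $uhat and $yhat are matrices.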
FIXME: expand.
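As a minimal illustration, the following sketch generates artificial data from the system
(29.9)-(29.11) and estimates it with the biprobit command, in which the two dependent variables come
first and the regressor lists for the two equations are separated by a semicolon. All the numerical
values, the seed and the series names are arbitrary assumptions.

```hansl
nulldata 500
set seed 12345
series x = normal()
series z = normal()

# correlated standard normal disturbances with correlation rho
scalar rho = 0.5
series e1 = normal()
series e2 = rho * e1 + sqrt(1 - rho^2) * normal()

# observed binary outcomes
series y1 = (0.5 + x + e1) > 0
series y2 = (-0.3 + x - z + e2) > 0

# first equation: const x; second equation: const x z
biprobit y1 y2 const x ; const x z
```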
29.5 The Tobit model
The Tobit model is used when the dependent variable of a model is censored. Assume a latent
variable \(y^*_i\) can be described as
\[
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i ,
\]
where \(\varepsilon_i \sim N(0, \sigma^2)\). If \(y^*_i\) were observable, the model's parameters could be
estimated via ordinary least squares. On the contrary, suppose that we observe \(y_i\), defined as
\[
y_i = \begin{cases}
a & \text{for } y^*_i \le a \\
y^*_i & \text{for } a < y^*_i < b \\
b & \text{for } y^*_i \ge b
\end{cases}
\tag{29.12}
\]
In most cases found in the applied literature, \(a = 0\) and \(b = \infty\), so in practice negative values
of \(y^*_i\) are not observed and are replaced by zeros.
In this case, regressing \(y_i\) on the \(x_i\)s does not yield consistent estimates of the parameters
\(\beta\), because the conditional mean \(E(y_i \mid x_i)\) is not equal to \(\sum_{j=1}^{k} x_{ij} \beta_j\). It
can be shown that restricting the sample to non-zero observations would not yield consistent estimates
either. The solution is to estimate the parameters via maximum likelihood. The syntax is simply
tobit depvar indvars
As usual, progress of the maximization algorithm can be tracked via the --verbose switch, while
$uhat returns the generalized residuals. Note that in this case the generalized residual is defined
as \(u_i = E(\varepsilon_i \mid y_i = 0)\) for censored observations, so the familiar equality
\(u_i = y_i - \hat{y}_i\) only holds for uncensored observations, that is, when \(y_i > 0\).
An important difference between the Tobit estimator and OLS is that the consequences of
non-normality of the disturbance term are much more severe: non-normality implies inconsistency for
the Tobit estimator. For this reason, the output for the Tobit model includes the Chesher and Irish
(1987) normality test by default.
The general case in which \(a\) is nonzero and/or \(b\) is finite can be handled by using the options
--llimit and --rlimit. So, for example,
tobit depvar indvars --llimit=10
would tell gretl that the left bound \(a\) is set to 10.
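The inconsistency of OLS on censored data is easy to see on artificial data. The sketch below assumes
a latent equation with intercept 1 and slope 2 (arbitrary values, as are the seed and the sample size)
and censors the dependent variable from below at zero.

```hansl
nulldata 200
set seed 2718
series x = normal()
# latent variable
series ystar = 1 + 2*x + normal()
# observed variable, censored from below at a = 0
series y = (ystar > 0) ? ystar : 0

# OLS on the censored data, for comparison
ols y const x --quiet
printf "OLS slope on censored data: %g\n", $coeff(x)

# Tobit estimates
tobit y const x
```

One would expect the OLS slope to be attenuated relative to the value used to generate the data, and
the Tobit estimate to lie closer to it.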
29.6 Interval regression
The interval regression model arises when the dependent variable is unobserved for some (possibly
all) observations; what we observe instead is an interval in which the dependent variable lies. In
other words, the data generating process is assumed to be
\[
y^*_i = x_i \beta + \varepsilon_i
\]
but we only know that \(m_i \le y^*_i \le M_i\), where the interval may be left- or right-unbounded (but
not both). If \(m_i = M_i\), we effectively observe \(y^*_i\) and no information loss occurs. In practice,
each observation belongs to one of four categories:
1. left-unbounded, when \(m_i = -\infty\),
2. right-unbounded, when \(M_i = \infty\),
3. bounded, when \(-\infty < m_i < M_i < \infty\) and
4. point observations when \(m_i = M_i\).
It is interesting to note that this model bears similarities to other models in several special cases:
• When all observations are point observations the model trivially reduces to the ordinary linear
  regression model.
• The interval model could be thought of as an ordered probit model (see 29.2) in which the cut
  points (the \(\alpha_j\) coefficients in eq. 29.8) are observed and don't need to be estimated.
• The Tobit model (see 29.5) is a special case of the interval model in which \(m_i\) and \(M_i\) do not
  depend on \(i\), that is, the censoring limits are the same for all observations. As a matter of
  fact, gretl's tobit command is handled internally as a special case of the interval model.
The gretl command intreg estimates interval models by maximum likelihood, assuming normality
of the disturbance term \(\varepsilon_i\). Its syntax is
intreg minvar maxvar X
where minvar contains the \(m_i\) series, with NAs for left-unbounded observations, and maxvar
contains \(M_i\), with NAs for right-unbounded observations. By default, standard errors are computed
using the negative inverse of the Hessian. If the --robust flag is given, then QML or Huber-White
standard errors are calculated instead. In this case the estimated covariance matrix is a "sandwich"
of the inverse of the estimated Hessian and the outer product of the gradient.
If the model specification contains regressors other than just a constant, the output includes a
chi-square statistic for testing the joint null hypothesis that none of these regressors has any
effect on the outcome. This is a Wald statistic based on the estimated covariance matrix. If you
wish to construct a likelihood ratio test, this is easily done by estimating both the full model
and the null model (containing only the constant), saving the log-likelihood in both cases via the
$lnl accessor, and then referring twice the difference between the two log-likelihoods to the
chi-square distribution with k degrees of freedom, where k is the number of additional regressors (see
the pvalue command in the Gretl Command Reference). Also included is a conditional moment
normality test, similar to those provided for the probit, ordered probit and Tobit models (see
above). An example is contained in the sample script wtp.inp, provided with the gretl distribution.
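The likelihood ratio test just described can be sketched as follows, on artificial data generated in
the same way as in Example 29.5. The scalar names ll1, ll0 and LR are hypothetical; with a single
additional regressor, k = 1.

```hansl
nulldata 100
set seed 201449
series x = normal()
series ystar = 1 + x + 0.2*normal()
series lo_bound = floor(ystar)
series hi_bound = ceil(ystar)

# full model
intreg lo_bound hi_bound const x --quiet
scalar ll1 = $lnl

# null model, containing only the constant
intreg lo_bound hi_bound const --quiet
scalar ll0 = $lnl

# twice the difference in log-likelihoods, chi-square with 1 df
scalar LR = 2 * (ll1 - ll0)
printf "LR = %g, p-value = %g\n", LR, pvalue(X, 1, LR)
```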
Example 29.5: Interval model on artificial data
Input:
nulldata 100
# generate artificial data
set seed 201449
x = normal()
epsilon = 0.2*normal()
ystar = 1 + x + epsilon
lo_bound = floor(ystar)
hi_bound = ceil(ystar)
# run the interval model
intreg lo_bound hi_bound const x
# estimate ystar
gen_resid = $uhat
yhat = $yhat + gen_resid
corr ystar yhat
Output (selected portions):
Model 1: Interval estimates using the 100 observations 1-100
Lower limit: lo_bound, Upper limit: hi_bound
             coefficient   std. error   t-ratio    p-value
  ---------------------------------------------------------
  const       0.993762     0.0338325     29.37    1.22e-189 ***
  x           0.986662     0.0319959     30.84    8.34e-209 ***

Chi-square(1)        950.9270   p-value              8.3e-209
Log-likelihood      -44.21258   Akaike criterion     94.42517
Schwarz criterion    102.2407   Hannan-Quinn         97.58824
sigma = 0.223273
Left-unbounded observations: 0
Right-unbounded observations: 0
Bounded observations: 100
Point observations: 0
...
corr(ystar, yhat) = 0.98960092
Under the null hypothesis of no correlation:
t(98) = 68.1071, with two-tailed p-value 0.0000
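As with the probit and Tobit models, after a model has been estimated the $uhat accessor returns
the generalized residual, which is an estimate of \(\varepsilon_i\): more precisely, it equals
\(y_i - x_i \hat\beta\) for point observations and \(E(\varepsilon_i \mid m_i, M_i, x_i)\) otherwise. Note
that it is possible to compute an unbiased predictor of \(y^*_i\) by summing this estimate to
\(x_i \hat\beta\).

29.7 Sample selection model
In the sample selection model there are two latent variables:
\[
y^*_i = \sum_{j=1}^{k} x_{ij} \beta_j + \varepsilon_i \tag{29.13}
\]
\[
s^*_i = \sum_{j=1}^{p} z_{ij} \gamma_j + \eta_i \tag{29.14}
\]
and the observation rule is given by
\[
y_i = \begin{cases}
y^*_i & \text{for } s^*_i > 0 \\
\diamondsuit & \text{for } s^*_i \le 0
\end{cases}
\tag{29.15}
\]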
In this context, the \(\diamondsuit\) symbol indicates that for some observations we simply do not have
data on \(y\): \(y_i\) may be 0, or missing, or anything else. A dummy variable \(d_i\) is normally used to
set censored observations apart.
One of the most popular applications of this model in econometrics is a wage equation coupled
with a labor force participation equation: we only observe the wage for the employed. If \(y^*_i\) and
\(s^*_i\) were (conditionally) independent, there would be no reason not to use OLS for estimating
equation (29.13); otherwise, OLS does not yield consistent estimates of the parameters \(\beta_j\).
Since conditional independence between \(y^*_i\) and \(s^*_i\) is equivalent to conditional independence
between \(\varepsilon_i\) and \(\eta_i\), one may model the co-dependence between \(\varepsilon_i\) and
\(\eta_i\) as
\[
\varepsilon_i = \lambda \eta_i + v_i ;
\]
substituting the above expression in (29.13), you obtain the model that is actually estimated:
\[
y_i = \sum_{j=1}^{k} x_{ij} \beta_j + \lambda \hat\eta_i + v_i ,
\]
so the hypothesis that censoring does not matter is equivalent to the hypothesis
\(H_0: \lambda = 0\), which can be easily tested.
The parameters can be estimated via maximum likelihood under the assumption of joint normality
of \(\varepsilon_i\) and \(\eta_i\); however, a widely used alternative method yields the so-called Heckit
estimator, named after Heckman (1979). The procedure can be briefly outlined as follows: first, a
probit model is fit on equation (29.14); next, the generalized residuals are inserted in equation
(29.13) to correct for the effect of sample selection.
Gretl provides the heckit command to carry out estimation; its syntax is
heckit y X ; d Z
where y is the dependent variable, X is a list of regressors, d is a dummy variable holding 1 for
uncensored observations and Z is a list of explanatory variables for the censoring equation.
Since in most cases maximum likelihood is the method of choice, by default gretl computes ML
estimates. The 2-step Heckit estimates can be obtained by using the --two-step option. After
estimation, the $uhat accessor contains the generalized residuals. As in the ordinary Tobit model,
the residuals equal the difference between actual and fitted \(y_i\) only for uncensored observations
(those for which \(d_i = 1\)).
Example 29.6 shows two estimates from the dataset used in Mroz (1987): the first one replicates
Table 22.7 in Greene (2003),[1] while the second one replicates table 17.1 in Wooldridge (2002a).

[1] Note that the estimates given by gretl do not coincide with those found in the printed volume.
They do, however, match those found on the errata web page for Greene's book:
http://pages.stern.nyu.edu/~wgreene/Text/Errata/ERRATA5.htm.
Example 29.6: Heckit model
open mroz87.gdt
genr EXP2 = AX^2
genr WA2 = WA^2
genr KIDS = (KL6+K618)>0
# Greene's specification
list X = const AX EXP2 WE CIT
list Z = const WA WA2 FAMINC KIDS WE
heckit WW X ; LFP Z --two-step
heckit WW X ; LFP Z
# Wooldridge's specification
series NWINC = FAMINC - WW*WHRS
series lww = log(WW)
list X = const WE AX EXP2
list Z = X NWINC WA KL6 K618
heckit lww X ; LFP Z --two-step
29.8 Count data
Here the dependent variable is assumed to be a non-negative integer, so a probabilistic description
of \(y_i \mid x_i\) must hinge on some discrete distribution. The most common model is the Poisson model,
in which
\[
P(y_i = Y \mid x_i) = \frac{e^{-\mu_i} \mu_i^{Y}}{Y!}
\qquad
\mu_i = \exp\left( \sum_j x_{ij} \beta_j \right)
\]
In some cases, an "offset" variable is needed. The number of occurrences of \(y_i\) in a given time is
assumed to be strictly proportional to the offset variable \(n_i\). In the epidemiology literature, the
offset is known as "population at risk". In this case, the model becomes
\[
\mu_i = n_i \exp\left( \sum_j x_{ij} \beta_j \right)
\]
Another way to look at the offset variable is to consider its natural log as just another explanatory
variable whose coefficient is constrained to be one.
Estimation is carried out by maximum likelihood and follows the syntax
poisson depvar indep
If an offset variable is needed, it has to be specified at the end of the command, separated from the
list of explanatory variables by a semicolon, as in
poisson depvar indep ; offset
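As an illustration of the offset syntax, the following sketch generates artificial count data whose
mean is proportional to an exposure variable. The series names, the parameter values and the seed are
arbitrary assumptions, and the data are drawn with randgen using a series-valued Poisson mean.

```hansl
nulldata 300
set seed 101
series x = normal()
# hypothetical "population at risk", taking integer values from 1 to 10
series n = 1 + floor(10 * uniform())
# conditional mean, proportional to the offset
series mu = n * exp(0.2 + 0.5*x)
series y = randgen(P, mu)

# the offset follows the regressors, after a semicolon
poisson y const x ; n
```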
It should be noted that the poisson command does not use, internally, the same optimization
engines as most other gretl commands, such as arma or tobit. As a consequence, some details may
differ: the --verbose option will yield different output and settings such as bfgs_toler will not
work.
Overdispersion
In the Poisson model, \(E(y_i \mid x_i) = V(y_i \mid x_i) = \mu_i\), that is, the conditional mean equals
the conditional variance by construction. In many cases, this feature is at odds with the data, as the
conditional variance is often larger than the mean; this phenomenon is called "overdispersion". The
output from the poisson command includes a conditional moment test for overdispersion (as per
Davidson and MacKinnon (2004), section 11.5), which is printed automatically after estimation.
Overdispersion can be attributed to unmodeled heterogeneity between individuals. Two data points
with the same observable characteristics \(x_i = x_j\) may differ because of some unobserved scale
factor \(s_i \neq s_j\) so that
\[
E(y_i \mid x_i, s_i) = \mu_i s_i \neq \mu_j s_j = E(y_j \mid x_j, s_j)
\]
even though \(\mu_i = \mu_j\). In other words, \(y_i\) is a Poisson random variable conditional on both
\(x_i\) and \(s_i\), but since \(s_i\) is unobservable, the only thing we can use, \(P(y_i \mid x_i)\), will not
follow the Poisson distribution.
It is often assumed that \(s_i\) can be represented as a gamma random variable with mean 1 and
variance \(\alpha\): the parameter \(\alpha\) is estimated together with the vector \(\beta\), and measures
the degree of heterogeneity between individuals.
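In this case, the conditional probability for \(y_i\) given \(x_i\) can be shown to be
\[
P(y_i \mid x_i) = \frac{\Gamma(y_i + \alpha^{-1})}{\Gamma(\alpha^{-1})\, \Gamma(y_i + 1)}
\left[ \frac{\mu_i}{\mu_i + \alpha^{-1}} \right]^{y_i}
\left[ \frac{\alpha^{-1}}{\mu_i + \alpha^{-1}} \right]^{\alpha^{-1}}
\tag{29.16}
\]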
which is known as the "Negative Binomial Model". The conditional mean is still
\(E(y_i \mid x_i) = \mu_i\), but the variance equals \(V(y_i \mid x_i) = \mu_i (1 + \mu_i \alpha)\). The gretl
command for this model is negbin depvar indep.
There is also a less used variant of the negative binomial model, in which the conditional
variance is a scalar multiple of the conditional mean, that is \(V(y_i \mid x_i) = \mu_i (1 + \gamma)\). To
distinguish between the two, the model (29.16) is termed "Type 2". Gretl implements model 1 via the
option --model1.
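The following sketch generates overdispersed counts by multiplying the Poisson mean by a gamma
heterogeneity term with mean 1 and variance alpha (shape 1/alpha, scale alpha), and then fits the
Poisson model and both negative binomial variants. The parameter values, the seed and the series names
are arbitrary assumptions.

```hansl
nulldata 500
set seed 202
series x = normal()

# unobserved heterogeneity: gamma with mean 1 and variance alpha
scalar alpha = 0.5
series s = randgen(G, 1/alpha, alpha)
# Poisson counts conditional on both x and s
series mu = s * exp(0.2 + 0.5*x)
series y = randgen(P, mu)

# Poisson, with its overdispersion test
poisson y const x
# negative binomial, Type 2 (the default)
negbin y const x
# negative binomial, Type 1
negbin y const x --model1
```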
A script which exemplifies the above models is included among gretl's sample scripts, under
the name camtriv.inp.
FIXME: expand.
29.9 Duration models
In some contexts we wish to apply econometric methods to measurements of the duration of certain
states. Classic examples include the following:
• From engineering, the "time to failure" of electronic or mechanical components: how long do,
  say, computer hard drives last until they malfunction?
• From the medical realm: how does a new treatment affect the time from diagnosis of a certain
  condition to exit from that condition (where "exit" might mean death or full recovery)?
• From economics: the duration of strikes, or of spells of unemployment.
In each case we may be interested in how the durations are distributed, and how they are affected
by relevant covariates. There are several approaches to this problem; the one we discuss here
(which is currently the only one supported by gretl) is estimation of a parametric model by means
of Maximum Likelihood. In this approach we hypothesize that the durations follow some definite
probability law and we seek to estimate the parameters of that law, factoring in the influence of
covariates.
We may express the density (PDF) of the durations as \(f(t, X, \theta)\), where \(t\) is the length of time
in the state in question, \(X\) is a matrix of covariates, and \(\theta\) is a vector of parameters. The
likelihood for a sample of \(n\) observations indexed by \(i\) is then
\[
L = \prod_{i=1}^{n} f(t_i, x_i, \theta)
\]
Rather than working with the density directly, however, it is standard practice to factor \(f(\cdot)\) into
two components, namely a hazard function, \(\lambda\), and a survivor function, \(S\). The survivor function
gives the probability that a state lasts at least as long as \(t\); it is therefore \(1 - F(t, X, \theta)\)
where \(F\) is the CDF corresponding to the density \(f(\cdot)\). The hazard function addresses this question:
given that a state has persisted as long as \(t\), what is the likelihood that it ends within a short
increment of time beyond \(t\), that is, it ends between \(t\) and \(t + \Delta\)? Taking the limit as
\(\Delta\) goes to zero, we end up with the ratio of the density to the survivor function:[2]
\[
\lambda(t, X, \theta) = \frac{f(t, X, \theta)}{S(t, X, \theta)} \tag{29.17}
\]
so the log-likelihood can be written as
\[
\ell = \sum_{i=1}^{n} \log f(t_i, x_i, \theta)
= \sum_{i=1}^{n} \left[ \log \lambda(t_i, x_i, \theta) + \log S(t_i, x_i, \theta) \right] \tag{29.18}
\]
One point of interest is the shape of the hazard function, in particular its dependence (or not) on
time since the state began. If \(\lambda\) does not depend on \(t\) we say the process in question exhibits
duration independence: the probability of exiting the state at any given moment neither increases nor
decreases based simply on how long the state has persisted to date. The alternatives are positive
duration dependence (the likelihood of exiting the state rises, the longer the state has persisted)
or negative duration dependence (exit becomes less likely, the longer it has persisted). Finally, the
behavior of the hazard with respect to time need not be monotonic; some parameterizations allow
for this possibility and some do not.
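Since durations are inherently positive the probability distribution used in modeling must respect
this requirement, giving a density of zero for \(t \le 0\). Four common candidates are the exponential,
Weibull, log-logistic and log-normal, the Weibull being the most common choice. The table below
displays the density and the hazard function for each of these distributions as they are commonly
parameterized, written as functions of \(t\) alone. (\(\phi\) and \(\Phi\) denote, respectively, the
Gaussian PDF and CDF.)

              density, \(f(t)\)                                                      hazard, \(\lambda(t)\)
Exponential   \(\gamma \exp(-\gamma t)\)                                             \(\gamma\)
Weibull       \(\alpha \gamma^{\alpha} t^{\alpha-1} \exp[-(\gamma t)^{\alpha}]\)     \(\alpha \gamma^{\alpha} t^{\alpha-1}\)
Log-logistic  \(\gamma\alpha (\gamma t)^{\alpha-1} / [1 + (\gamma t)^{\alpha}]^{2}\) \(\gamma\alpha (\gamma t)^{\alpha-1} / [1 + (\gamma t)^{\alpha}]\)
Log-normal    \(\frac{1}{\sigma t}\, \phi[(\log t - \mu)/\sigma]\)                   \(\frac{1}{\sigma t}\, \phi[(\log t - \mu)/\sigma] \,/\, \Phi[-(\log t - \mu)/\sigma]\)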
[2] For a fuller discussion see, for example, Davidson and MacKinnon (2004).
The hazard is constant for the exponential distribution. For the Weibull, it is monotone increasing
in \(t\) if \(\alpha > 1\), or monotone decreasing for \(\alpha < 1\). (If \(\alpha = 1\) the Weibull
collapses to the exponential.) The log-logistic and log-normal distributions allow the hazard to vary
with \(t\) in a non-monotonic fashion.
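The shape claims above can be checked numerically. The following Python sketch (an illustration outside gretl, not gretl's own code) evaluates the Weibull hazard as density over survivor, using the standard parameterization with shape alpha and rate gamma, and the closed-form log-logistic hazard. The parameter values are arbitrary choices for illustration only.

```python
import math


def weibull_density(t, alpha, gamma):
    """Weibull density: alpha * gamma^alpha * t^(alpha-1) * exp(-(gamma*t)^alpha)."""
    return alpha * gamma**alpha * t ** (alpha - 1) * math.exp(-((gamma * t) ** alpha))


def weibull_survivor(t, alpha, gamma):
    """Probability that a spell lasts at least as long as t."""
    return math.exp(-((gamma * t) ** alpha))


def weibull_hazard(t, alpha, gamma):
    """Hazard computed as the ratio of density to survivor function."""
    return weibull_density(t, alpha, gamma) / weibull_survivor(t, alpha, gamma)


def loglogistic_hazard(t, alpha, gamma):
    """Closed-form log-logistic hazard."""
    return gamma * alpha * (gamma * t) ** (alpha - 1) / (1 + (gamma * t) ** alpha)


if __name__ == "__main__":
    # alpha < 1: falling hazard; alpha = 1: constant; alpha > 1: rising
    for alpha in (0.5, 1.0, 1.5):
        print(alpha, [round(weibull_hazard(t, alpha, 0.1), 4) for t in (1, 10, 50)])
    # with alpha = 2 the log-logistic hazard first rises, then falls
    print([round(loglogistic_hazard(t, 2.0, 1.0), 4) for t in (0.5, 1.0, 2.0)])
```

Computing the Weibull hazard as a ratio, rather than from its closed form, makes the sketch double as a check that the density and survivor expressions are mutually consistent.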
Covariates are brought into the picture by allowing them to govern one of the parameters of the
density, so that the durations are not identically distributed across cases. For example, when using
the log-normal distribution it is natural to make \(\mu\), the expected value of \(\log t\), depend on
the covariates, \(X\). This is typically done via a linear index function: \(\mu = X\beta\).
Note that the expressions for the log-normal density and hazard contain the term
\((\log t - \mu)/\sigma\). Replacing \(\mu\) with \(X\beta\) this becomes \((\log t - X\beta)/\sigma\). It turns
out that this constitutes a useful simplifying change of variables for all of the distributions
discussed here. As in Kalbfleisch and Prentice (2002), we define

\[ w_i \equiv (\log t_i - x_i\beta)/\sigma \]
The interpretation of the scale factor, \(\sigma\), in this expression depends on the distribution. For
the log-normal, \(\sigma\) represents the standard deviation of \(\log t\); for the Weibull and the
log-logistic it corresponds to \(1/\alpha\); and for the exponential it is fixed at unity. For
distributions other than the log-normal, \(X\beta\) corresponds to \(-\log\gamma\), or in other words
\(\gamma = \exp(-X\beta)\).
With this change of variables, the density and survivor functions may be written compactly as
follows (the exponential is the same as the Weibull).

              density, \(f(w_i)\)               survivor, \(S(w_i)\)
Weibull       \(\exp(w_i - e^{w_i})\)           \(\exp(-e^{w_i})\)
Log-logistic  \(e^{w_i}(1 + e^{w_i})^{-2}\)     \((1 + e^{w_i})^{-1}\)
Log-normal    \(\phi(w_i)\)                     \(\Phi(-w_i)\)
In light of the above we may think of the generic parameter vector \(\theta\), as in \(f(t, X, \theta)\),
as composed of the coefficients on the covariates, \(\beta\), plus (in all cases apart from the
exponential) the additional parameter \(\sigma\).
A complication in estimation of \(\theta\) is posed by "incomplete spells". That is, in some cases the
state in question may not have ended at the time the observation is made (e.g. some workers remain
unemployed, some components have not yet failed). If we use \(t_i\) to denote the time from entering
the state to either (a) exiting the state or (b) the observation window closing, whichever comes first,
then all we know of the "right-censored" cases (b) is that the duration was at least as long as \(t_i\).
This can be handled by rewriting the log-likelihood (compare 29.18) as

\[ \ell = \sum_{i=1}^{n} \left\{ \delta_i \log S(w_i)
        + (1 - \delta_i) \left[ -\log\sigma + \log f(w_i) \right] \right\} \qquad (29.19) \]

where \(\delta_i\) equals 1 for censored cases (incomplete spells), and 0 for complete observations. The
rationale for this is that the log-density equals the sum of the log hazard and the log survivor
function, but for the incomplete spells only the survivor function contributes to the likelihood. So
in (29.19) we are adding up the log survivor function alone for the incomplete cases, plus the full
log density for the completed cases.
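To make the censored log-likelihood (29.19) concrete, here is a minimal Python sketch that evaluates it for given values of the index, the scale factor and the censoring dummy. It is an illustration under stated assumptions rather than gretl's implementation: the function names are hypothetical, the index values are supplied directly instead of being built from covariates, and no maximization is attempted.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, for the log-normal case


def log_f_and_log_S(w, dist):
    """Return (log f(w), log S(w)) for the distributions in the w-table above."""
    if dist == "weibull":
        return w - math.exp(w), -math.exp(w)
    if dist == "loglogistic":
        log1p_ew = math.log1p(math.exp(w))
        return w - 2.0 * log1p_ew, -log1p_ew
    if dist == "lognormal":
        return math.log(_STD_NORMAL.pdf(w)), math.log(_STD_NORMAL.cdf(-w))
    raise ValueError("unknown distribution: " + dist)


def censored_loglik(durations, xb, censored, sigma, dist="weibull"):
    """Sum of log S(w_i) for censored spells and -log(sigma) + log f(w_i) otherwise."""
    total = 0.0
    for t, index, delta in zip(durations, xb, censored):
        w = (math.log(t) - index) / sigma
        log_f, log_S = log_f_and_log_S(w, dist)
        if delta:  # incomplete spell: only the survivor function contributes
            total += log_S
        else:      # completed spell: full log density (up to a term in log t)
            total += -math.log(sigma) + log_f
    return total


if __name__ == "__main__":
    # three made-up spells, the last one right-censored
    print(censored_loglik([2.0, 5.0, 8.0], [1.0, 1.0, 1.0], [0, 0, 1], 1.2))
```

As in (29.19), the term in log t that arises from the change of variables is omitted, since it does not involve the parameters.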
Implementation in gretl and illustration
The duration command accepts a list of series on the usual pattern: dependent variable followed
by covariates. If right-censoring is present in the data this should be represented by a dummy
variable corresponding to \(\delta_i\) above, separated from the covariates by a semicolon. For example,
duration durat 0 X ; cens
where durat measures durations, 0 represents the constant (which is required for such models), X
is a named list of regressors, and cens is the censoring dummy.
By default the Weibull distribution is used; you can substitute any of the other three distributions
discussed here by appending one of the option flags --exponential, --loglogistic or --lognormal.
Interpreting the coefficients in a duration model requires some care, and we will work through an
illustrative case. The example comes from section 20.3 of Wooldridge (2002a), and it concerns
criminal recidivism.[3] The data (filename recid.gdt) pertain to a sample of 1,445 convicts released
from prison between July 1, 1977 and June 30, 1978. The dependent variable is the time in months
until they are again arrested. The information was gathered retrospectively by examining records
in April 1984; the maximum possible length of observation is 81 months. Right-censoring is important:
when the data were compiled about 62 percent had not been arrested. The dataset contains
several covariates, which are described in the data file; we will focus below on interpretation of the
married variable, a dummy which equals 1 if the respondent was married when imprisoned.
Example 29.7 shows the gretl commands for a Weibull model along with most of the output. Consider
first the scale factor, \(\sigma\). The estimate is 1.241 with a standard error of 0.048. (We don't print
a z score and p-value for this term since \(H_0: \sigma = 0\) is not of interest.) Recall that \(\sigma\)
corresponds to \(1/\alpha\); we can be confident that \(\alpha\) is less than 1, so recidivism displays
negative duration dependence. This makes sense: it is plausible that if a past offender manages to stay
out of trouble for an extended period his risk of engaging in crime again diminishes. (The exponential
model would therefore not be appropriate in this case.)
On a priori grounds, however, we may doubt the monotonic decline in hazard that is implied by
the Weibull specification. Even if a person is liable to return to crime, it seems relatively unlikely
that he would do so straight out of prison. In the data, we find that only 2.6 percent of those
followed were rearrested within 3 months. The log-normal specification, which allows the hazard
to rise and then fall, may be more appropriate. Using the duration command again with the same
covariates but the --lognormal flag, we get a log-likelihood of -1597 as against -1633 for the
Weibull, confirming that the log-normal gives a better fit.
Let us now focus on the married coefficient, which is positive in both specifications but larger and
more sharply estimated in the log-normal variant. The first thing is to get the interpretation of the
sign right. Recall that \(X\beta\) enters negatively into the intermediate variable \(w\). The Weibull
hazard is \(\lambda(w_i) = e^{w_i}\), so being married reduces the hazard of re-offending, or in other
words lengthens the expected duration out of prison. The same qualitative interpretation applies for
the log-normal.
To get a better sense of the married effect, it is useful to show its impact on the hazard across time.
We can do this by plotting the hazard for two values of the index function \(X\beta\): in each case the
values of all the covariates other than married are set to their means (or some chosen values) while
married is set first to 0 then to 1. Example 29.8 provides a script that does this, and the resulting
plots are shown in Figure 29.1. Note that when computing the hazards we need to multiply by the
Jacobian of the transformation from \(t_i\) to \(w_i = (\log t_i - x_i\beta)/\sigma\), namely \(1/t\)
(strictly the derivative is \(1/(\sigma t)\); the constant factor \(1/\sigma\) rescales each curve but
leaves the hazard ratios discussed below unchanged). Note also that the estimate of \(\sigma\) is
available via the accessor $sigma, but it is also present as the last element in the coefficient
vector obtained via $coeff.
A further difference between the Weibull and log-normal specifications is illustrated in the plots.
The Weibull is an instance of a proportional hazard model. This means that for any sets of values of
the covariates, \(x_i\) and \(x_j\), the ratio of the associated hazards is invariant with respect to
duration. In this example the Weibull hazard for unmarried individuals is always 1.1637 times that
for married. In the log-normal variant, on the other hand, this ratio gradually declines from 1.6703
at one month to 1.1766 at 100 months.
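The proportional hazard property of the Weibull can be verified directly from the estimates reported in Example 29.7. The Python sketch below (an illustration outside gretl) computes the ratio of the unmarried to the married hazard at several durations. The married coefficient and sigma are taken from the Weibull output; the baseline index value of 4.0 is a hypothetical placeholder, chosen arbitrarily because the ratio does not depend on it.

```python
import math

MARRIED_COEF = 0.188104  # Weibull estimate from Example 29.7
SIGMA = 1.24090          # Weibull scale factor from Example 29.7


def weibull_hazard(t, xb, sigma):
    """Weibull hazard at duration t: exp(w) / (sigma * t), w = (log t - xb) / sigma."""
    w = (math.log(t) - xb) / sigma
    return math.exp(w) / (sigma * t)


def hazard_ratio(t, xb_unmarried, married_coef=MARRIED_COEF, sigma=SIGMA):
    """Unmarried hazard divided by married hazard at duration t."""
    unmarried = weibull_hazard(t, xb_unmarried, sigma)
    married = weibull_hazard(t, xb_unmarried + married_coef, sigma)
    return unmarried / married


if __name__ == "__main__":
    for months in (1, 25, 50, 100):
        # the ratio equals exp(MARRIED_COEF / SIGMA) at every duration
        print(months, round(hazard_ratio(months, 4.0), 4))
```

Algebraically the ratio reduces to exp(0.188104/1.24090), which is why neither the duration nor the baseline index matters.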
[3] Germán Rodríguez of Princeton University has a page discussing this example and displaying
estimates from Stata at http://data.princeton.edu/pop509a/recid1.html.
Example 29.7: Weibull model for recidivism data
Input:
open recid.gdt
list X = workprg priors tserved felon alcohol drugs \
black married educ age
duration durat 0 X ; cens
duration durat 0 X ; cens --lognormal
Partial output:
Model 1: Duration (Weibull), using observations 1-1445
Dependent variable: durat
coefficient std. error z p-value
--------------------------------------------------------
const 4.22167 0.341311 12.37 3.85e-35 ***
workprg -0.112785 0.112535 -1.002 0.3162
priors -0.110176 0.0170675 -6.455 1.08e-10 ***
tserved -0.0168297 0.00213029 -7.900 2.78e-15 ***
felon 0.371623 0.131995 2.815 0.0049 ***
alcohol -0.555132 0.132243 -4.198 2.69e-05 ***
drugs -0.349265 0.121880 -2.866 0.0042 ***
black -0.563016 0.110817 -5.081 3.76e-07 ***
married 0.188104 0.135752 1.386 0.1659
educ 0.0289111 0.0241153 1.199 0.2306
age 0.00462188 0.000664820 6.952 3.60e-12 ***
sigma 1.24090 0.0482896
Chi-square(10) 165.4772 p-value 2.39e-30
Log-likelihood -1633.032 Akaike criterion 3290.065
Model 2: Duration (log-normal), using observations 1-1445
Dependent variable: durat
coefficient std. error z p-value
---------------------------------------------------------
const 4.09939 0.347535 11.80 4.11e-32 ***
workprg -0.0625693 0.120037 -0.5213 0.6022
priors -0.137253 0.0214587 -6.396 1.59e-10 ***
tserved -0.0193306 0.00297792 -6.491 8.51e-11 ***
felon 0.443995 0.145087 3.060 0.0022 ***
alcohol -0.634909 0.144217 -4.402 1.07e-05 ***
drugs -0.298159 0.132736 -2.246 0.0247 **
black -0.542719 0.117443 -4.621 3.82e-06 ***
married 0.340682 0.139843 2.436 0.0148 **
educ 0.0229194 0.0253974 0.9024 0.3668
age 0.00391028 0.000606205 6.450 1.12e-10 ***
sigma 1.81047 0.0623022
Chi-square(10) 166.7361 p-value 1.31e-30
Log-likelihood -1597.059 Akaike criterion 3218.118
Example 29.8: Create plots showing conditional hazards
open recid.gdt -q
# leave married separate for analysis
list X = workprg priors tserved felon alcohol drugs \
black educ age
# Weibull variant
duration durat 0 X married ; cens
# coefficients on all Xs apart from married
matrix beta_w = $coeff[1:$ncoeff-2]
# married coefficient
scalar mc_w = $coeff[$ncoeff-1]
scalar s_w = $sigma
# Log-normal variant
duration durat 0 X married ; cens --lognormal
matrix beta_n = $coeff[1:$ncoeff-2]
scalar mc_n = $coeff[$ncoeff-1]
scalar s_n = $sigma
list allX = 0 X
# evaluate X\beta at means of all variables except marriage
scalar Xb_w = meanc({allX}) * beta_w
scalar Xb_n = meanc({allX}) * beta_n
# construct two plot matrices
matrix mat_w = zeros(100, 3)
matrix mat_n = zeros(100, 3)
loop t=1..100 -q
# first column, duration
mat_w[t, 1] = t
mat_n[t, 1] = t
wi_w = (log(t) - Xb_w)/s_w
wi_n = (log(t) - Xb_n)/s_n
# second col: hazard with married = 0
mat_w[t, 2] = (1/t) * exp(wi_w)
mat_n[t, 2] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
wi_w = (log(t) - (Xb_w + mc_w))/s_w
wi_n = (log(t) - (Xb_n + mc_n))/s_n
# third col: hazard with married = 1
mat_w[t, 3] = (1/t) * exp(wi_w)
mat_n[t, 3] = (1/t) * pdf(z, wi_n) / cdf(z, -wi_n)
endloop
colnames(mat_w, "months unmarried married")
colnames(mat_n, "months unmarried married")
gnuplot 2 3 1 --with-lines --supp --matrix=mat_w --output=weibull.plt
gnuplot 2 3 1 --with-lines --supp --matrix=mat_n --output=lognorm.plt
[Figure: two panels, titled Weibull and Log-normal, each plotting the estimated hazard against
months (0 to 100) for unmarried and married individuals]
Figure 29.1: Recidivism hazard estimates for married and unmarried ex-convicts
Alternative representations of the Weibull model
One point to watch out for with the Weibull duration model is that the estimates may be represented
in different ways. The representation given by gretl is sometimes called the accelerated failure-time
(AFT) metric. An alternative that one sometimes sees is the log relative-hazard metric; in fact this is
the metric used in Wooldridge's presentation of the recidivism example. To get from AFT estimates
to log relative-hazard form it is necessary to multiply the coefficients by \(-\sigma^{-1}\). For example,
the married coefficient in the Weibull specification as shown here is 0.188104 and \(\sigma\) is 1.24090,
so the alternative value is -0.152, which is what Wooldridge shows (2002a, Table 20.1).
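The conversion is simple enough to check by hand, but a short Python sketch (an illustration outside gretl, with a hypothetical function name) makes the arithmetic explicit, using the Weibull estimates printed in Example 29.7.

```python
import math


def aft_to_log_relative_hazard(coef, sigma):
    """Convert an AFT-metric coefficient to log relative-hazard form: -coef / sigma."""
    return -coef / sigma


if __name__ == "__main__":
    married = aft_to_log_relative_hazard(0.188104, 1.24090)
    print(round(married, 3))
    # exponentiating gives the married-to-unmarried hazard ratio
    print(math.exp(married))
```

Exponentiating the converted coefficient gives the hazard for married relative to unmarried individuals, the reciprocal of the 1.1637 ratio quoted earlier.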
Fitted values and residuals
By default, gretl computes fitted values (accessible via $yhat) as the conditional mean of duration.
The formulae are shown below (where \(\Gamma\) denotes the gamma function, and the exponential variant
is just Weibull with \(\sigma = 1\)).

Weibull                               Log-logistic                                      Log-normal
\(\exp(X\beta)\,\Gamma(1 + \sigma)\)  \(\exp(X\beta)\,\pi\sigma / \sin(\pi\sigma)\)     \(\exp(X\beta + \sigma^2/2)\)

The expression given for the log-logistic mean, however, is valid only for \(\sigma < 1\); otherwise the
expectation is undefined, a point that is not noted in all software.[4]
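The conditional-mean formulae translate directly into code. The Python sketch below (an illustration outside gretl, with hypothetical function names) evaluates each of them for a given index value and scale factor, and refuses to return a log-logistic mean when sigma is not below 1.

```python
import math


def weibull_mean(xb, sigma):
    """Conditional mean for the Weibull: exp(xb) * Gamma(1 + sigma)."""
    return math.exp(xb) * math.gamma(1.0 + sigma)


def loglogistic_mean(xb, sigma):
    """Conditional mean for the log-logistic, defined only for 0 < sigma < 1."""
    if not 0.0 < sigma < 1.0:
        raise ValueError("log-logistic mean is undefined unless 0 < sigma < 1")
    return math.exp(xb) * math.pi * sigma / math.sin(math.pi * sigma)


def lognormal_mean(xb, sigma):
    """Conditional mean for the log-normal: exp(xb + sigma^2 / 2)."""
    return math.exp(xb + sigma**2 / 2.0)


if __name__ == "__main__":
    # arbitrary index value and scale factor, for illustration only
    xb, sigma = 3.0, 0.8
    print(weibull_mean(xb, sigma), loglogistic_mean(xb, sigma), lognormal_mean(xb, sigma))
```

With sigma equal to 1 the Weibull case reduces to exp(xb), the mean of the exponential distribution.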
Alternatively, if the --medians option is given, gretl's duration command will produce conditional
medians as the content of $yhat. For the Weibull the median is \(\exp(X\beta)(\log 2)^{\sigma}\)
so as to minimize the sum of absolute residuals. Hence the method is known as Least Absolute
Deviations or LAD. While the OLS problem has a straightforward analytical solution, LAD is a linear
programming problem.
Quantile regression is a generalization of median regression: the regression function predicts the
conditional \(\tau\)-quantile of the dependent variable, for example the first quartile (\(\tau = .25\))
or the ninth decile (\(\tau = .90\)).
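The link between medians and general quantiles can be illustrated with the standard asymmetric "check" loss of Koenker and Bassett, which weights positive residuals by tau and negative ones by 1 - tau. The Python sketch below covers only the intercept-only case and searches over the observed values; it is not the algorithm gretl uses, and the data are made up for illustration.

```python
def check_loss(u, tau):
    """Asymmetric absolute loss: tau * u for u >= 0, (tau - 1) * u for u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y, tau):
    """Observed value c that minimizes the summed check loss of y - c."""
    return min(sorted(y), key=lambda c: sum(check_loss(v - c, tau) for v in y))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 100]
    for tau in (0.25, 0.5, 0.9):
        print(tau, best_constant(data, tau))
```

For tau = 0.5 the loss is proportional to the sum of absolute residuals, so the minimizer is the sample median; other values of tau pick out other sample quantiles.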
If the classical conditions for the validity of OLS are satisfied (that is, if the error term is
independently and identically distributed, conditional on X) then quantile regression is redundant:
all the conditional quantiles of the dependent variable will march in lockstep with the conditional
mean. Conversely, if quantile regression reveals that the conditional quantiles behave in a manner
quite distinct from the conditional mean, this suggests that OLS estimation is problematic.
As of version 1.7.5, gretl offers quantile regression functionality (in addition to basic LAD
regression, which has been available since early in gretl's history via the lad command).[1]
30.2 Basic syntax
The basic invocation of quantile regression is
quantreg tau reglist
where
• reglist is a standard gretl regression list (dependent variable followed by regressors, including
  the constant if an intercept is wanted); and
• tau is the desired conditional quantile, in the range 0.01 to 0.99, given either as a numerical
  value or the name of a pre-defined scalar variable (but see below for a further option).
Estimation is via the Frisch–Newton interior point solver (Portnoy and Koenker, 1997), which is
substantially faster than the traditional Barrodale–Roberts (1974) simplex approach for large
problems.
[1] We gratefully acknowledge our borrowing from the quantreg package for GNU R (version 4.17). The
core of the quantreg package is composed of Fortran code written by Roger Koenker; this is accompanied
by various driver and auxiliary functions written in the R language by Koenker and Martin Mächler. The
latter functions have been re-worked in C for gretl. We have added some guards against potential
numerical problems in small samples.
Chapter 30. Quantile regression 252
By default, standard errors are computed according to the asymptotic formula given by Koenker
and Bassett (1978). Alternatively, if the --robust option is given, we use the sandwich estimator
developed in Koenker and Zhao (1994).[2]
30.3 Confidence intervals
An option --intervals is available. When this is given we print confidence intervals for the
parameter estimates instead of standard errors. These intervals are computed using the rank inversion
method and in general they are asymmetrical about the point estimates; that is, they are not
simply "plus or minus so many standard errors". The specifics of the calculation are inflected by
the --robust option: without this, the intervals are computed on the assumption of IID errors
(Koenker, 1994); with it, they use the heteroskedasticity-robust estimator developed by Koenker
and Machado (1999).
By default, 90 percent intervals are produced. You can change this by appending a confidence value
(expressed as a decimal fraction) to the intervals option, as in
quantreg tau reglist --intervals=.95
When the confidence intervals option is selected, the parameter estimates are calculated using
the Barrodale–Roberts method. This is simply because the Frisch–Newton code does not currently
support the calculation of confidence intervals.
Two further details. First, the mechanisms for generating confidence intervals for quantile
estimates require that the model has at least two regressors (including the constant). If the
--intervals option is given for a model containing only one regressor, an error is flagged. Second,
when a model is estimated in this mode, you can retrieve the confidence intervals using the accessor
$coeff_ci. This produces a k × 2 matrix, where k is the number of regressors. The lower bounds are in
the first column, the upper bounds in the second. See also section 30.5 below.
30.4 Multiple quantiles
As a further option, you can give tau as a matrix, either the name of a predefined matrix or in
numerical form, as in {.05, .25, .5, .75, .95}. The given model is estimated for all the tau
values and the results are printed in a special form, as shown below (in this case the --intervals
option was also given).
Model 1: Quantile estimates using the 235 observations 1-235
Dependent variable: foodexp
With 90 percent confidence intervals
VARIABLE TAU COEFFICIENT LOWER UPPER
const 0.05 124.880 98.3021 130.517
0.25 95.4835 73.7861 120.098
0.50 81.4822 53.2592 114.012
0.75 62.3966 32.7449 107.314
0.95 64.1040 46.2649 83.5790
income 0.05 0.343361 0.343327 0.389750
0.25 0.474103 0.420330 0.494329
0.50 0.560181 0.487022 0.601989
0.75 0.644014 0.580155 0.690413
0.95 0.709069 0.673900 0.734441
[2] These correspond to the iid and nid options in R's quantreg package, respectively.
[Figure: coefficient on income plotted against tau, showing the quantile estimates with 90% band
and the OLS estimate with 90% band]
Figure 30.1: Regression of food expenditure on income; Engel's data
The gretl GUI has an entry for Quantile Regression (under /Model/Robust estimation), and you can
select multiple quantiles there too. In that context, just give space-separated numerical values (as
per the predefined options, shown in a drop-down list).
When you estimate a model in this way most of the standard menu items in the model window
are disabled, but one extra item is available: graphs showing the \(\tau\) sequence for a given
coefficient in comparison with the OLS coefficient. An example is shown in Figure 30.1. This sort of
graph provides a simple means of judging whether quantile regression is redundant (OLS is fine) or
informative.
In the example shown (based on data on household income and food expenditure gathered by
Ernst Engel, 1821–1896) it seems clear that simple OLS regression is potentially misleading. The
"crossing" of the OLS estimate by the quantile estimates is very marked.
However, it is not always clear what implications should be drawn from this sort of conflict. With
the Engel data there are two issues to consider. First, Engel's famous "law" claims an
income-elasticity of food consumption that is less than one, and talk of elasticities suggests a
logarithmic formulation of the model. Second, there are two apparently anomalous observations in the
data set: household 105 has the third-highest income but unexpectedly low expenditure on food (as
judged from a simple scatter plot), while household 138 (which also has unexpectedly low food
consumption) has much the highest income, almost twice that of the next highest.
With n = 235 it seems reasonable to consider dropping these observations. If we do so, and adopt
a log–log formulation, we get the plot shown in Figure 30.2. The quantile estimates still cross the
OLS estimate, but the "evidence against OLS" is much less compelling: the 90 percent confidence
bands of the respective estimates overlap at all the quantiles considered.
30.5 Large datasets
As noted above, when you give the --intervals option with the quantreg command, which calls
for estimation of confidence intervals via rank inversion, gretl switches from the default
Frisch–Newton algorithm to the Barrodale–Roberts simplex method.
[Figure: coefficient on log(income) plotted against tau, showing the quantile estimates with 90%
band and the OLS estimate with 90% band]
Figure 30.2: Log–log regression; 2 observations dropped from full Engel data set.
This is OK for moderately large datasets (up to, say, a few thousand observations) but on very large
problems the simplex algorithm may become seriously bogged down. For example, Koenker and
Hallock (2001) present an analysis of the determinants of birth weights, using 198,377 observations
and with 15 regressors. Generating confidence intervals via Barrodale–Roberts for a single value of
\(\tau\) took about half an hour on a Lenovo Thinkpad T60p with 1.83GHz Intel Core 2 processor.
If you want confidence intervals in such cases, you are advised not to use the --intervals option,
but to compute them using the method of "plus or minus so many standard errors". (One
Frisch–Newton run took about 8 seconds on the same machine, showing the superiority of the interior
point method.) The script below illustrates:
quantreg .10 y 0 xlist
scalar crit = qnorm(.95)
matrix ci = $coeff - crit * $stderr
ci = ci~($coeff + crit * $stderr)
print ci
The matrix ci will contain the lower and upper bounds of the (symmetrical) 90 percent confidence
intervals.
To avoid a situation where gretl becomes unresponsive for a very long time we have set the maximum
number of iterations for the Barrodale–Roberts algorithm to the (somewhat arbitrary) value
of 1000. We will experiment further with this, but for the meantime if you really want to use this
method on a large dataset, and don't mind waiting for the results, you can increase the limit using
the set command with parameter rq_maxiter, as in
Chapter 31
Nonparametric methods
The main focus of gretl is on parametric estimation, but we offer a selection of nonparametric
methods. The most basic of these are:
• various tests for difference in distribution (Sign test, Wilcoxon rank-sum test, Wilcoxon
  signed-rank test);
• the Runs test for randomness; and
• nonparametric measures of association: Spearman's rho and Kendall's tau.
Details on the above can be found by consulting the help for the commands difftest, runs, corr
and spearman. In the GUI program these items are found under the Tools menu and the Robust
estimation item under the Model menu.
In this chapter we concentrate on two relatively complex methods for nonparametric curve-fitting
and prediction, namely William Cleveland's "loess" (also known as "lowess") and the
Nadaraya–Watson estimator.
31.1 Locally weighted regression (loess)
Loess (Cleveland, 1979) is a nonparametric smoother employing locally weighted polynomial
regression. It is intended to yield an approximation to \(g(\cdot)\) when the dependent variable, \(y\),
can be expressed as

\[ y_i = g(x_i) + \epsilon_i \]

for some smooth function \(g(\cdot)\).
Given a sample of n observations on the variables y and x, the procedure is to run a weighted least
squares regression (a polynomial of order d = 0, 1 or 2 in x) localized to each data point, i. In each
such regression the sample consists of the r nearest neighbors (in the x dimension) to the point i,
with weights that are inversely related to the distance \(|x_i - x_k|\), \(k = 1, \dots, r\). The predicted
value \(\hat{y}_i\) is then obtained by evaluating the estimated polynomial at \(x_i\). The most commonly
used order is d = 1.
A bandwidth parameter \(0 < q \le 1\) controls the proportion of the total number of data points used
in each regression; thus \(r = qn\) (rounded up to an integer). Larger values of q lead to a smoother
fitted series, smaller values to a series that tracks the actual data more closely;
\(0.25 \le q \le 0.5\) is often a suitable range.
In gretl's implementation of loess the weighting scheme is that given by Cleveland, namely,

\[ w_k(x_i) = W\left(h_i^{-1}(x_k - x_i)\right) \]

where \(h_i\) is the distance between \(x_i\) and its \(r\)-th nearest neighbor, and \(W(\cdot)\) is the
tricube function,

\[ W(x) = \begin{cases} (1 - |x|^3)^3 & \text{for } |x| < 1 \\ 0 & \text{for } |x| \ge 1 \end{cases} \]
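As a rough illustration of the weighting scheme, the Python sketch below computes the tricube weights attached to each observation when the local regression is centred on one data point. It is not gretl's code: the function names are hypothetical, and the sketch assumes that the point itself counts as its own first nearest neighbor (at distance zero) when locating the r-th neighbor.

```python
import math


def tricube(x):
    """Tricube function: (1 - |x|^3)^3 for |x| < 1, zero otherwise."""
    ax = abs(x)
    return (1.0 - ax**3) ** 3 if ax < 1.0 else 0.0


def loess_weights(x, i, q):
    """Weights w_k(x_i) for all k, with bandwidth q (0 < q <= 1)."""
    n = len(x)
    r = min(n, math.ceil(q * n))  # number of neighbors, rounded up
    distances = sorted(abs(xk - x[i]) for xk in x)
    h = distances[r - 1]          # distance to the r-th nearest neighbor (assumption: self counts)
    if h == 0.0:
        raise ValueError("zero local bandwidth: too many tied x values")
    return [tricube((xk - x[i]) / h) for xk in x]


if __name__ == "__main__":
    x = list(range(10))
    print([round(w, 4) for w in loess_weights(x, 0, 0.5)])
```

Observations at or beyond distance h from the centre receive zero weight, so they drop out of the local regression.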
Chapter 31. Nonparametric methods 256
The local regression can be made robust via an adjustment based on the residuals,
\(e_i = y_i - \hat{y}_i\). Robustness weights, \(\delta_k\), are defined by

\[ \delta_k = B(e_k / 6s) \]

where \(s\) is the median of the \(|e_i|\) and \(B(\cdot)\) is the bisquare function,

\[ B(x) = \begin{cases} (1 - x^2)^2 & \text{for } |x| < 1 \\ 0 & \text{for } |x| \ge 1 \end{cases} \]

The polynomial regression is then re-run using weight \(\delta_k w_k(x_i)\) at \((x_k, y_k)\).
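The robustness adjustment can be sketched in the same spirit. The Python fragment below (hypothetical function names, not gretl's implementation) turns a set of first-pass residuals into bisquare robustness weights, using the median absolute residual as the scale s.

```python
import statistics


def bisquare(x):
    """Bisquare function: (1 - x^2)^2 for |x| < 1, zero otherwise."""
    return (1.0 - x * x) ** 2 if abs(x) < 1.0 else 0.0


def robustness_weights(residuals):
    """Weights B(e_k / 6s), where s is the median of the absolute residuals."""
    s = statistics.median([abs(e) for e in residuals])
    if s == 0.0:
        raise ValueError("median absolute residual is zero")
    return [bisquare(e / (6.0 * s)) for e in residuals]


if __name__ == "__main__":
    # made-up residuals; the last one is an outlier and gets zero weight
    print([round(w, 4) for w in robustness_weights([1.0, -1.0, 2.0, -2.0, 12.0])])
```

Multiplying these weights into the tricube weights downweights observations with large residuals in the second-pass regression.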
The loess() function in gretl takes up to five arguments as follows: the y series, the x series, the
order d, the bandwidth q, and a boolean switch to turn on the robust adjustment. The last three
arguments are optional: if they are omitted the default values are d = 1, q = 0.5 and no robust
adjustment. An example of a full call to loess() is shown below; in this case a quadratic in x is
specified, three quarters of the data points will be used in each local regression, and robustness is
turned on:
series yh = loess(y, x, 2, 0.75, 1)
An illustration of loess is provided in Example 31.1: we generate a series that has a deterministic
sine wave component overlaid with noise uniformly distributed on (-1, 1). Loess is then used to
retrieve a good approximation to the sine function. The resulting graph is shown in Figure 31.1.
Example 31.1: Loess script
nulldata 120
series x = index
scalar n = $nobs
series y = sin(2*pi*x/n) + uniform(-1, 1)
series yh = loess(y, x, 2, 0.75, 0)
gnuplot y yh x --output=display --with-lines=yh
[Figure: the series y and the loess fit plotted against x]
Figure 31.1: Loess: retrieving a sine wave
31.2 The Nadaraya–Watson estimator
The Nadaraya–Watson nonparametric estimator (Nadaraya, 1964; Watson, 1964) is an estimator
for the conditional mean of a variable Y, available in a sample of size n, for a given value of a
conditioning variable X, and is defined as

\[ m(X) = \frac{\sum_{j=1}^{n} y_j K_h(X - x_j)}{\sum_{j=1}^{n} K_h(X - x_j)} \]

where \(K_h(\cdot)\) is the so-called kernel function, which is usually some simple transform of a
density function that depends on a scalar called the bandwidth. The one gretl uses is given by

\[ K_h(x) = \exp\left(-\frac{x^2}{2h}\right) \]

for \(|x| < \tau\) and zero otherwise. The scalar \(\tau\) is used to prevent numerical problems when the
kernel function is evaluated too far away from zero and is called the trim parameter.
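A direct transcription of the estimator may help fix ideas. The Python sketch below is an illustration only, not gretl's nadarwat() code: it uses the kernel formula as printed above, requires a positive bandwidth h, and takes the trim parameter to be a multiple of h (the factor 4 follows the default mentioned later in this section; the function and argument names are hypothetical).

```python
import math


def kernel(x, h, trim=4.0):
    """exp(-x^2 / (2h)) for |x| < trim * h, zero otherwise."""
    return math.exp(-x * x / (2.0 * h)) if abs(x) < trim * h else 0.0


def nadaraya_watson(X, x, y, h, trim=4.0):
    """Kernel-weighted average of y at the point X; NaN if no x_j is within the trim."""
    num = den = 0.0
    for xj, yj in zip(x, y):
        k = kernel(X - xj, h, trim)
        num += yj * k
        den += k
    return num / den if den > 0.0 else float("nan")


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [0.1, 0.4, 1.1, 2.0, 4.2]
    print(nadaraya_watson(1.0, xs, ys, 0.4))
```

Returning NaN when every kernel weight is zero is a design choice of this sketch; it makes the sparse-support problem visible instead of dividing by zero.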
Example 31.2: Nadaraya–Watson example
# Nonparametric regression example: husband's age on wife's age
open mroz87.gdt
# initial value for the bandwidth
scalar h = $nobs^(-0.2)
# three increasingly smoother estimates
series m0 = nadarwat(HA, WA, h)
series m1 = nadarwat(HA, WA, h * 5)
series m2 = nadarwat(HA, WA, h * 10)
# produce the graph
dataset sortby WA
gnuplot m0 m1 m2 HA WA --output=display --with-lines
Example 31.2 produces the graph shown in Figure 31.2 (after some slight editing).
The choice of the bandwidth is up to the user: larger values of h lead to a smoother \(m(\cdot)\)
function; smaller values make the \(m(\cdot)\) function follow the \(y_i\) values more closely, so that the
function appears more "jagged". In fact, as \(h \to \infty\), \(m(x_i) \to \bar{Y}\); on the contrary, if
\(h \to 0\), observations for which \(x_i \ne X\) are not taken into account at all when computing \(m(X)\).
Also, the statistical properties of \(m(\cdot)\) vary with h: its variance can be shown to be decreasing
in h, while its squared bias is increasing in h. It can be shown that choosing \(h \sim n^{-1/5}\)
minimizes the RMSE, so that value is customarily taken as a reference point.
Note that the kernel function has its tails "trimmed". The scalar \(\tau\), which controls the level at
which trimming occurs, is set by default at \(4 \cdot h\); this setting, however, may be changed via the
set command. For example,
set nadarwat_trim 10
sets \(\tau = 10 \cdot h\). This may at times produce more sensible results in regions of X with sparse
support; however, you should be aware that in those same cases machine precision (division by
numerical zero) may render your results spurious. The default is relatively safe, but experimenting
with larger values may be a sensible strategy in some cases.
[Figure: HA plotted against WA, with the estimates m0, m1 and m2]
Figure 31.2: Nadaraya–Watson example for several choices of the bandwidth parameter
A common variant of the Nadaraya–Watson estimator is the so-called "leave-one-out" estimator:
this is a variant of the estimator that does not use the i-th observation for evaluating \(m(x_i)\).
This makes the estimator more robust numerically and its usage is often advised for inference purposes.
In formulae, the leave-one-out estimator is

\[ m(x_i) = \frac{\sum_{j \ne i} y_j K_h(x_i - x_j)}{\sum_{j \ne i} K_h(x_i - x_j)} \]
In order to have gretl compute the leave-one-out estimator, just reverse the sign of h: if we changed
example 31.2 by substituting
scalar h = $nobs^(-0.2)
with
scalar h = -($nobs^(-0.2))
the rest of the example would have stayed unchanged, the only difference being the usage of the
leave-one-out estimator.
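For comparison, the leave-one-out variant can be sketched in a few lines of Python. As with any illustration outside gretl, the names are hypothetical and the kernel and trimming follow the formulas given in the text; here the variant is a separate function, whereas gretl selects it through the sign of h.

```python
import math


def kernel(x, h, trim=4.0):
    """exp(-x^2 / (2h)) for |x| < trim * h, zero otherwise (h must be positive)."""
    return math.exp(-x * x / (2.0 * h)) if abs(x) < trim * h else 0.0


def nw_leave_one_out(x, y, h, trim=4.0):
    """Leave-one-out estimates m(x_i): observation i is skipped when fitting at x_i."""
    fitted = []
    for i, xi in enumerate(x):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(x, y)):
            if j == i:
                continue  # omit the i-th observation
            k = kernel(xi - xj, h, trim)
            num += yj * k
            den += k
        fitted.append(num / den if den > 0.0 else float("nan"))
    return fitted


if __name__ == "__main__":
    # the middle y value is an outlier; it does not influence its own fit
    print(nw_leave_one_out([0.0, 1.0, 2.0], [0.0, 100.0, 10.0], 1.0))
```

Because each observation is excluded from its own fit, an isolated outlier in y cannot pull the estimate at its own x value toward itself.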
Although X could be, in principle, any value, in the typical usage of this estimator you want to
compute m(X) for X equal to one or more values actually observed in your sample, that is \(m(x_i)\).
If you need a point estimate of m(X) for some value of X which is not present among the valid
observations of your dependent variable, you may want to add some fake observations to your
dataset in which y is missing and x contains the values you want m(x) evaluated at. For example,
the following script evaluates m(x) at regular intervals between -2.0 and 2.0:
nulldata 120
set seed 120496
# first part of the sample: actual data
smpl 1 100
x = normal()
y = x^2 + sin(x) + normal()
# second part of the sample: fake x data
smpl 101 120
x = (obs-110) / 5
# compute the Nadaraya-Watson estimate
# with bandwidth equal to 0.4 (note that
# 100^(-0.2) = 0.398)
smpl full
m = nadarwat(y, x, 0.4)
# show m(x) for the fake x values only
smpl 101 120
print x m -o
and running it produces
x m
101 -1.8 1.165934
102 -1.6 0.730221
103 -1.4 0.314705
104 -1.2 0.026057
105 -1.0 -0.131999
106 -0.8 -0.215445
107 -0.6 -0.269257
108 -0.4 -0.304451
109 -0.2 -0.306448
110 0.0 -0.238766
111 0.2 -0.038837
112 0.4 0.354660
113 0.6 0.908178
114 0.8 1.485178
115 1.0 2.000003
116 1.2 2.460100
117 1.4 2.905176
118 1.6 3.380874
119 1.8 3.927682
120 2.0 4.538364
Part III
Technical details
Chapter 32
Gretl and T
E
X
32.1 Introduction
TeX, initially developed by Donald Knuth of Stanford University and since enhanced by hundreds
of contributors around the world, is the gold standard of scientific typesetting. Gretl provides
various hooks that enable you to preview and print econometric results using the TeX engine, and
to save output in a form suitable for further processing with TeX.
This chapter explains the finer points of gretl's TeX-related functionality. The next section describes
the relevant menu items; section 32.3 discusses ways of fine-tuning TeX output; section 32.4
explains how to handle the encoding of characters not found in English; and section 32.5 gives some
pointers on installing (and learning) TeX if you do not already have it on your computer. (Just to
be clear: TeX is not included with the gretl distribution; it is a separate package, including several
programs and a large number of supporting files.)
Before proceeding, however, it may be useful to set out briefly the stages of production of a final
document using TeX. For the most part you don't have to worry about these details, since, in regard
to previewing at any rate, gretl handles them for you. But having some grasp of what is going on
behind the scenes will enable you to understand your options better.
The first step is the creation of a plain text "source" file, containing the text or mathematics to be
typeset, interspersed with mark-up that defines how it should be formatted. The second step is to
run the source through a processing engine that does the actual formatting. Typically this is either:
• a program called latex that generates so-called DVI (device-independent) output, or
• a program called pdflatex that generates PDF output.[1]
For previewing, one uses either a DVI viewer (typically xdvi on GNU/Linux systems) or a PDF viewer
(for example, Adobe's Acrobat Reader or xpdf), depending on how the source was processed. If
the DVI route is taken, there's then a third step to produce printable output, typically using the
program dvips to generate a PostScript file. If the PDF route is taken, the output is ready for
printing without any further processing.
On the MS Windows and Mac OS X platforms, gretl calls pdflatex to process the source file, and
expects the operating system to be able to find the default viewer for PDF output; DVI is not
supported. On GNU/Linux the default is to take the DVI route, but if you prefer to use PDF you
can do the following: select the menu item "Tools, Preferences, General" then the "Programs" tab.
Find the item titled "Command to compile TeX files", and set this to pdflatex. Make sure the
"Command to view PDF files" is set to something appropriate.
32.2 TeX-related menu items
The model window
The fullest TeX support in gretl is found in the GUI model window. This has a menu item titled
"LaTeX" with sub-items "View", "Copy", "Save" and "Equation options" (see Figure 32.1).
[1] Experts will be aware of something called "plain TeX", which is processed using the program tex.
The great majority of TeX users, however, use the LaTeX macros, initially developed by Leslie
Lamport. Gretl does not support plain TeX.
Figure 32.1: LaTeX menu in model window
The first three sub-items have branches titled "Tabular" and "Equation". By "Tabular" we mean that
the model is represented in the form of a table; this is the fullest and most explicit presentation of
the results. See Table 32.1 for an example; this was pasted into the manual after using the "Copy,
Tabular" item in gretl (a few lines were edited out for brevity).
Table 32.1: Example of LaTeX tabular output
Model 1: OLS estimates using the 51 observations 1–51
Dependent variable: ENROLL
Variable   Coefficient    Std. Error    t-statistic   p-value
const       0.241105      0.0660225       3.6519      0.0007
CATHOL      0.223530      0.0459701       4.8625      0.0000
PUPIL      -0.00338200    0.00271962     -1.2436      0.2198
WHITE      -0.152643      0.0407064      -3.7499      0.0005
Mean of dependent variable                   0.0955686
S.D. of dependent variable                   0.0522150
Sum of squared residuals                     0.0709594
Standard error of residuals (\hat{\sigma})   0.0388558
Unadjusted R^2                               0.479466
Adjusted \bar{R}^2                           0.446241
F(3, 47)                                     14.4306
The "Equation" option is fairly self-explanatory: the results are written across the page in
equation format, as below:
\[
\widehat{\text{ENROLL}} = \underset{(0.066022)}{0.241105} + \underset{(0.04597)}{0.223530}\,\text{CATHOL}
  - \underset{(0.0027196)}{0.00338200}\,\text{PUPIL} - \underset{(0.040706)}{0.152643}\,\text{WHITE}
\]
\[
T = 51 \quad \bar{R}^2 = 0.4462 \quad F(3,47) = 14.431 \quad \hat{\sigma} = 0.038856
\]
(standard errors in parentheses)
The distinction between the "Copy" and "Save" options (for both tabular and equation) is twofold.
First, "Copy" puts the TeX source on the clipboard while with "Save" you are prompted for the name
of a file into which the source should be saved. Second, with "Copy" the material is copied as a
"fragment" while with "Save" it is written as a complete file. The point is that a well-formed TeX
source file must have a header that defines the documentclass (article, report, book or whatever)
and tags that say \begin{document} and \end{document}. This material is included when you do
"Save" but not when you do "Copy", since in the latter case the expectation is that you will paste
the data into an existing TeX source file that already has the relevant apparatus in place.
The items under "Equation options" should be self-explanatory: when printing the model in
equation form, do you want standard errors or t-ratios displayed in parentheses under the parameter
estimates? The default is to show standard errors; if you want t-ratios, select that item.
Other windows
Several other sorts of output windows also have TeX preview, copy and save enabled. In the case of
windows having a graphical toolbar, look for the TeX button. Figure 32.2 shows this icon (second
from the right on the toolbar) along with the dialog that appears when you press the button.
Figure 32.2: TeX icon and dialog
One aspect of gretl's TeX support that is likely to be particularly useful for publication purposes is
the ability to produce a typeset version of the "model table" (see section 3.4). An example of this is
shown in Table 32.2.
32.3 Fine-tuning typeset output
There are three aspects to this: adjusting the appearance of the output produced by gretl in
LaTeX preview mode; adjusting the formatting of gretl's tabular output for models when using the
tabprint command; and incorporating gretl's output into your own TeX files.
Previewing in the GUI
As regards preview mode, you can control the appearance of gretl's output using a file named
gretlpre.tex, which should be placed in your gretl user directory (see the Gretl Command
Reference). If such a file is found, its contents will be used as the "preamble" to the TeX source. The
default value of the preamble is as follows:
\documentclass[11pt]{article}
\usepackage[latin1]{inputenc} %% but see below
\usepackage{amsmath}
\usepackage{dcolumn,longtable}
\begin{document}
\thispagestyle{empty}
Table 32.2: Example of model table output
OLS estimates
Dependent variable: ENROLL
Model 1 Model 2 Model 3
const 0.2907 0.2411 0.08557
(0.07853) (0.06602) (0.05794)
CATHOL 0.2216 0.2235 0.2065
0.1526
(0.04074) (0.04071)
ADMEXP 0.1551
(0.1342)
n 51 51 51
\bar{R}^2 0.4502 0.4462 0.2956
96.09 95.36 88.69
Standard errors in parentheses
* indicates significance at the 10 percent level
** indicates significance at the 5 percent level
Note that the amsmath and dcolumn packages are required. (For some sorts of output the longtable
package is also needed.) Beyond that you can, for instance, change the type size or the font by
altering the documentclass declaration or including an alternative font package.
The line \usepackage[latin1]{inputenc} is automatically changed if gretl finds itself running
on a system where UTF-8 is the default character encoding; see section 32.4 below.
In addition, if you should wish to typeset gretl output in more than one language, you can set
up per-language preamble files. A "localized" preamble file is identified by a name of the form
gretlpre_xx.tex, where xx is replaced by the first two letters of the current setting of the LANG
environment variable. For example, if you are running the program in Polish, using LANG=pl_PL,
then gretl will do the following when writing the preamble for a TeX source file.
1. Look for a file named gretlpre_pl.tex in the gretl user directory. If this is not found, then
2. look for a file named gretlpre.tex in the gretl user directory. If this is not found, then
3. use the default preamble.
Conversely, suppose you usually run gretl in a language other than English, and have a suitable
gretlpre.tex file in place for your native language. If on some occasions you want to produce TeX
output in English, then you could create an additional file gretlpre_en.tex: this file will be used
for the preamble when gretl is run with a language setting of, say, en_US.
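The lookup order described in this subsection (localized file, then generic file, then built-in default) can be sketched in a few lines of Python. This is only an illustration of the fallback logic; the function name choose_preamble and the use of a temporary directory as the "user directory" are assumptions, and gretl's own code may differ in detail.

```python
import os
import tempfile

def choose_preamble(user_dir, lang):
    """Return the preamble file to use, or None to fall back on the default.

    lang is the value of the LANG environment variable, e.g. "pl_PL".
    """
    code = lang[:2]  # first two letters of LANG
    for name in ("gretlpre_%s.tex" % code, "gretlpre.tex"):
        path = os.path.join(user_dir, name)
        if os.path.isfile(path):
            return path
    return None

with tempfile.TemporaryDirectory() as user_dir:
    # no preamble files at all: the built-in default applies
    none_found = choose_preamble(user_dir, "pl_PL")
    # only the generic file exists
    open(os.path.join(user_dir, "gretlpre.tex"), "w").close()
    generic = os.path.basename(choose_preamble(user_dir, "pl_PL"))
    # a localized file takes precedence for Polish, but not for English
    open(os.path.join(user_dir, "gretlpre_pl.tex"), "w").close()
    localized = os.path.basename(choose_preamble(user_dir, "pl_PL"))
    english = os.path.basename(choose_preamble(user_dir, "en_US"))

print(none_found, generic, localized, english)
```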
Command-line options
After estimating a model via a script, or interactively via the gretl console or using the
command-line program gretlcli, you can use the commands tabprint or eqnprint to print the model to
file in tabular format or equation format respectively. These options are explained in the Gretl
Command Reference.
If you wish to alter the appearance of gretl's tabular output for models in the context of the tabprint
command, you can specify a custom row format using the --format flag. The format string must
be enclosed in double quotes and must be tied to the flag with an equals sign. The pattern for the
format string is as follows. There are four fields, representing the coefficient, standard error,
t-ratio and p-value respectively. These fields should be separated by vertical bars; they may contain
a printf-type specification for the formatting of the numeric value in question, or may be left
blank to suppress the printing of that column (subject to the constraint that you can't leave all the
columns blank). Here are a few examples:
--format="%.4f|%.4f|%.4f|%.4f"
--format="%.4f|%.4f|%.3f|"
--format="%.5f|%.4f||%.4f"
--format="%.8g|%.8g||%.4f"
The first of these specifications prints the values in all columns using 4 decimal places. The second
suppresses the p-value and prints the t-ratio to 3 places. The third omits the t-ratio. The last one
again omits the t-ratio, and prints both coefficient and standard error to 8 significant figures.
Once you set a custom format in this way, it is remembered and used for the duration of the gretl
session. To revert to the default formatting you can use the special variant --format=default.
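To see what the four-field pattern does to a row of numbers, here is a small Python sketch. It relies on the fact that Python's % operator follows the C printf conventions for %f and %g; the helper name render_row is hypothetical, and the way gretl itself parses the --format string may differ in detail.

```python
def render_row(fmt, coeff, stderr, tratio, pvalue):
    """Apply a four-field 'a|b|c|d' format string to one row of results.

    An empty field suppresses the corresponding column.
    """
    fields = fmt.split("|")
    if len(fields) != 4:
        raise ValueError("expected four fields separated by vertical bars")
    if not any(fields):
        raise ValueError("at least one column must be printed")
    values = (coeff, stderr, tratio, pvalue)
    return [spec % value for spec, value in zip(fields, values) if spec]

# the const row of Table 32.1
row = (0.241105, 0.0660225, 3.6519, 0.0007)
all_columns = render_row("%.4f|%.4f|%.4f|%.4f", *row)
no_pvalue = render_row("%.4f|%.4f|%.3f|", *row)
no_tratio = render_row("%.8g|%.8g||%.4f", *row)
print(all_columns)
print(no_pvalue)
print(no_tratio)
```

The second call drops the last column and rounds the t-ratio to three places; the third drops the t-ratio and prints the first two columns to at most 8 significant figures.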
Further editing
Once you have pasted gretl's TeX output into your own document, or saved it to file and opened it
in an editor, you can of course modify the material in any way you wish. In some cases,
machine-generated TeX is hard to understand, but gretl's output is intended to be human-readable and
-editable. In addition, it does not use any non-standard style packages. Besides the standard LaTeX
document classes, the only files needed are, as noted above, the amsmath, dcolumn and longtable
packages. These should be included in any reasonably full TeX implementation.
32.4 Character encodings
People using gretl in English-speaking locales are unlikely to have a problem with this, but if you're
generating TeX output in a locale where accented characters (not in the ASCII character set) are
employed, you may want to pay attention here.
Gretl generates TeX output using whatever character encoding is standard on the local system. If
the system encoding is in the ISO-8859 family, this will probably be OK without any special effort on
the part of the user. Newer GNU/Linux systems, however, typically use Unicode (UTF-8). This is also
OK so long as your TeX system can handle UTF-8 input, which requires use of the latex-ucs package.
So: if you are using gretl to generate TeX in a non-English locale, where the system encoding is
UTF-8, you will need to ensure that the latex-ucs package is installed. This package may or may not be
installed by default when you install TeX.
For reference, if gretl detects a UTF-8 environment, the following lines are used in the TeX preamble:
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
32.5 Installing and learning TeX
This is not the place for a detailed exposition of these matters, but here are a few pointers.
So far as we know, every GNU/Linux distribution has a package or set of packages for TeX, and in
fact these are likely to be installed by default. Check the documentation for your distribution. For
MS Windows, several packaged versions of TeX are available: one of the most popular is MiKTeX at
http://www.miktex.org/. For Mac OS X a nice implementation is iTeXMac, at
http://itexmac.sourceforge.net/. An essential starting point for online TeX resources is the
Comprehensive TeX Archive Network (CTAN) at http://www.ctan.org/.
As for learning TeX, many useful resources are available both online and in print. Among online
guides, Tony Roberts' "LaTeX: from quick and dirty to style and finesse" is very helpful, at
http://www.sci.usq.edu.au/staff/robertsa/LaTeX/latexintro.html
An excellent source for advanced material is The LaTeX Companion (Goossens et al., 2004).
Chapter 33
Gretl and R
33.1 Introduction
R is, by far, the largest free statistical project.[1] Like gretl, it is a GNU project and the two have a
lot in common; however, gretl's approach focuses on ease of use much more than R, which instead
aims to encompass the widest possible range of statistical procedures.
As is natural in the free software ecosystem, we don't view ourselves as competitors to R,[2] but
rather as projects sharing a common goal who should support each other whenever possible. For
this reason, gretl provides a way to interact with R and thus enable users to pool the capabilities of
the two packages.
In this chapter, we will explain how to exploit R's power from within gretl. We assume that the
reader has a working installation of R available and a basic grasp of R's syntax.[3]
Despite several valiant attempts, no graphical shell has gained wide acceptance in the R community:
by and large, the standard method of working with R is by writing scripts, or by typing commands
at the R prompt, much in the same way as one would write gretl scripts or work with the gretl
console. In this chapter, the focus will be on the methods available to execute R commands without
leaving gretl.
33.2 Starting an interactive R session
The easiest way to use R from gretl is in interactive mode. Once you have your data loaded in gretl,
you can select the menu item "Tools, Start GNU R" and an interactive R session will be started, with
your dataset automatically pre-loaded.
A simple example: OLS on cross-section data
For this example we use Ramanathan's dataset data4-1, one of the sample files supplied with gretl.
We first run, in gretl, an OLS regression of price on sqft, bedrms and baths. The basic results are
shown in Table 33.1.
Table 33.1: OLS house price regression via gretl
Variable   Coefficient   Std. Error   t-statistic   p-value
const      129.062       88.3033        1.4616      0.1746
sqft         0.154800     0.0319404     4.8465      0.0007
bedrms     -21.587       27.0293       -0.7987      0.4430
baths      -12.192       43.2500       -0.2819      0.7838
[1] R's homepage is at http://www.r-project.org/.
[2] OK, who are we kidding? But it's friendly competition!
[3] The main reference for R documentation is http://cran.r-project.org/manuals.html. In addition, R tutorials
abound on the Net; as always, Google is your friend.
We will now replicate the above results using R. Select the menu item "Tools, Start GNU R". A
window similar to the one shown in Figure 33.1 should appear.
Figure 33.1: R window
The actual look of the R window may be somewhat different from what you see in Figure 33.1
(especially for Windows users), but this is immaterial. The important point is that you have a
window where you can type commands to R. If the above procedure doesn't work and no R window
opens, it means that gretl was unable to launch R. You should ensure that R is installed and working
on your system and that gretl knows where it is. The relevant settings can be found by selecting
the "Tools, Preferences, General" menu entry, under the "Programs" tab.
Assuming R was launched successfully, you will see notification that the data from gretl are
available. In the background, gretl has arranged for two R commands to be executed, one to load the
gretl dataset in the form of a data frame (one of several forms in which R can store data) and one
to attach the data so that the variable names defined in the gretl workspace are available as valid
identifiers within R.
In order to replicate gretl's OLS estimation, go into the R window and type at the prompt
model <- lm(price ~ sqft + bedrms + baths)
summary(model)
You should see something similar to Figure 33.2. Surprise: the estimates coincide! To get out,
just close the R window or type q() at the R prompt.
Time series data
We now turn to an example which uses time series data: we will compare gretl's and R's estimates
of Box and Jenkins' immortal "airline" model. The data are contained in the bjg sample dataset.
The following gretl code
open bjg
arima 0 1 1 ; 0 1 1 ; lg --nc
produces the estimates shown in Table 33.2.
Figure 33.2: OLS regression on house prices via R
Table 33.2: Airline model from Box and Jenkins (1976): selected portion of gretl's estimates
Variable   Coefficient   Std. Error   t-statistic   p-value
\theta_1   -0.401824     0.0896421    -4.4825       0.0000
\Theta_1   -0.556936     0.0731044    -7.6184       0.0000
Variance of innovations        0.00134810
Log-likelihood                 244.696
Akaike information criterion   -483.39
If we now open an R session as described in the previous subsection, the data-passing mechanism
is slightly different. Since our data were defined in gretl as time series, we use an R time-series
object (ts for short) for the transfer. In this way we can retain in R useful information such as the
periodicity of the data and the sample limits. The downside is that the names of individual series,
as defined in gretl, are not valid identifiers. In order to extract the variable lg, one needs to use the
syntax lg <- gretldata[, "lg"].
ARIMA estimation can be carried out by issuing the following two R commands:
lg <- gretldata[, "lg"]
arima(lg, c(0,1,1), seasonal=c(0,1,1))
which yield
Coefficients:
          ma1     sma1
      -0.4018  -0.5569
s.e.   0.0896   0.0731
sigma^2 estimated as 0.001348: log likelihood = 244.7, aic = -483.4
Happily, the estimates again coincide.
33.3 Running an R script
Opening an R window and keying in commands is a convenient method when the job is small. In
some cases, however, it would be preferable to have R execute a script prepared in advance. One
way to do this is via the source() command in R. Alternatively, gretl offers the facility to edit an R
script and run it, having the current dataset pre-loaded automatically. This feature can be accessed
via the "File, Script Files" menu entry. By selecting "User file", one can load a pre-existing R script;
if you want to create a new script instead, select the "New script, R script" menu entry.
Figure 33.3: Editing window for R scripts
In either case, you are presented with a window very similar to the editor window used for ordinary
gretl scripts, as in Figure 33.3.
There are two main differences. First, you get syntax highlighting for R's syntax instead of gretl's.
Second, clicking on the Execute button (the gears icon) launches an instance of R in which your
commands are executed. Before R is actually run, you are asked if you want to run R interactively
or not (see Figure 33.4).
An interactive run opens an R instance similar to the one seen in the previous section: your data
will be pre-loaded (if the "pre-load data" box is checked) and your commands will be executed.
Once this is done, you will find yourself at the R prompt, where you can enter more commands.
Figure 33.4: Editing window for R scripts
A non-interactive run, on the other hand, will execute your script, collect the output from R and
present it to you in an output window; R will be run in the background. If, for example, the script
in Figure 33.3 is run non-interactively, a window similar to Figure 33.5 will appear.
Figure 33.5: Output from a non-interactive R run
33.4 Taking stuff back and forth
As regards the passing of data between the two programs, so far we have only considered passing
series from gretl to R. In order to achieve a satisfactory degree of interoperability, more is needed.
In the following sub-sections we see how matrices can be exchanged, and how data can be passed
from R back to gretl.
Passing matrices from gretl to R
For passing matrices from gretl to R, you can use the mwrite matrix function described in section
13.6. For example, the following gretl code fragment generates the matrix
\[
A = \begin{bmatrix} 3 & 7 & 11 \\ 4 & 8 & 12 \\ 5 & 9 & 13 \\ 6 & 10 & 14 \end{bmatrix}
\]
and stores it into the file mymatfile.mat.
matrix A = mshape(seq(3,14),4,3)
err = mwrite(A, "mymatfile.mat")
In order to retrieve this matrix from R, all you have to do is
A <- as.matrix(read.table("mymatfile.mat", skip=1))
Although in principle you can give your matrix file any valid filename, a couple of conventions may
prove useful. First, you may want to use an informative file suffix such as ".mat", but this is a
matter of taste. More importantly, the exact location of the file created by mwrite could be an
issue. By default, if no path is specified in the file name, gretl stores matrix files in the current
work directory. However, it may be wise for the purpose at hand to use the directory in which gretl
stores all its temporary files, whose name is stored in the built-in string dotdir (see section 12.2).
The value of this string is automatically passed to R as the string variable gretl.dotdir, so the
above example may be rewritten more cleanly as
Gretl side:
matrix A = mshape(seq(3,14),4,3)
err = mwrite(A, "@dotdir/mymatfile.mat")
R side:
fname <- paste(gretl.dotdir, "mymatfile.mat", sep="")
A <- as.matrix(read.table(fname, skip=1))
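The R call above skips the first line of the matrix file, which suggests a one-line header ahead of the data. The Python sketch below round-trips the matrix A through a text file of that general shape. It is a hedged illustration only: the "rows cols" header layout is an assumption made for the example (section 13.6 has the actual format), the names write_matrix and read_matrix are hypothetical, and a temporary directory stands in for gretl's dotdir.

```python
import os
import tempfile

def write_matrix(path, rows):
    """Write a matrix as text: a 'rows cols' header line, then one line per row."""
    with open(path, "w") as f:
        f.write("%d %d\n" % (len(rows), len(rows[0])))
        for row in rows:
            f.write(" ".join(repr(float(v)) for v in row) + "\n")

def read_matrix(path):
    """Read the file back, using the header line to check the shape."""
    with open(path) as f:
        nrows, ncols = (int(tok) for tok in f.readline().split())
        data = [[float(tok) for tok in line.split()] for line in f if line.strip()]
    if len(data) != nrows or any(len(r) != ncols for r in data):
        raise ValueError("matrix body does not match its header")
    return data

A = [[3, 7, 11], [4, 8, 12], [5, 9, 13], [6, 10, 14]]
with tempfile.TemporaryDirectory() as dotdir:  # stand-in for gretl's dotdir
    fname = os.path.join(dotdir, "mymatfile.mat")
    write_matrix(fname, A)
    B = read_matrix(fname)
print(B)
```

Building the path with os.path.join avoids having to know whether the directory string ends with a path separator.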
Passing data from R to gretl
For passing data in the opposite direction, gretl defines a special function that can be used in the R
environment. An R object will be written as a temporary file in gretl's dotdir directory, from where
it can be easily retrieved from within gretl.
The name of this function is gretl.export(), and it accepts one argument, the object to be
exported. At present, the objects that can be exported with this method are matrices, data frames
and time-series objects. The function creates a text file, with the same name as the exported object,
in gretl's temporary directory. Data frames and time-series objects are stored as CSV files, and can
be retrieved by using gretl's append command. Matrices are stored in a special text format that is
understood by gretl (see section 13.6); the file suffix is in this case .mat, and to read the matrix in
gretl you must use the mread() function.
As an example, we take the airline data and use them to estimate a structural time series model
à la Harvey (1989). The model we will use is the Basic Structural Model (BSM), in which a time series
is decomposed into three terms:
\[
y_t = \mu_t + \gamma_t + \varepsilon_t
\]
where \mu_t is a trend component, \gamma_t is a seasonal component and \varepsilon_t is a noise term. In turn, the
following is assumed to hold:
\[
\Delta \mu_t = \beta_{t-1} + \eta_t
\]
\[
\Delta \beta_t = \zeta_t
\]
\[
\Delta_s \gamma_t = \Delta \omega_t
\]
where \Delta_s is the seasonal differencing operator, (1 - L^s), and \eta_t, \zeta_t and \omega_t are mutually
uncorrelated white noise processes. The object of the analysis is to estimate the variances of the noise
components (which may be zero) and to recover estimates of the latent processes \mu_t (the "level"),
\beta_t (the "slope") and \gamma_t.
Gretl does not provide (yet) a command for estimating this class of models, so we will use R's
StructTS command and import the results back into gretl. Once the bjg dataset is loaded in gretl,
we pass the data to R and execute the following script:
# extract the log series
y <- gretldata[, "lg"]
# estimate the model
strmod <- StructTS(y)
# save the fitted components (smoothed)
compon <- as.ts(tsSmooth(strmod))
# save the estimated variances
vars <- as.matrix(strmod$coef)
# export into gretl's temp dir
gretl.export(compon)
gretl.export(vars)
Running this script via gretl produces minimal output:
current data loaded as ts object "gretldata"
wrote /home/cottrell/.gretl/compon.csv
wrote /home/cottrell/.gretl/vars.mat
However, we are now able to pull the results back into gretl by executing the following commands,
either from the console or by creating a small script:[4]
append @dotdir/compon.csv
vars = mread("@dotdir/vars.mat")
The first command reads the estimated time-series components from a CSV file, which is the format
that the passing mechanism employs for series. The matrix vars is read from the file vars.mat.
After the above commands have been executed, three new series will have appeared in the gretl
workspace, namely the estimates of the three components; by plotting them together with the
original data, you should get a graph similar to Figure 33.6. The estimates of the variances can be
seen by printing the vars matrix, as in
? print vars
vars (4 x 1)
0.00077185
0.0000
0.0013969
0.0000
[4] This example will work on Linux and presumably on OSX without modifications. On the Windows platform, you may
have to substitute the "/" character with "\".
Figure 33.6: Estimated components from BSM (four panels: lg, level, slope, sea; horizontal axis 1949 to 1961)
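That is,
\[
\hat{\sigma}^2_\eta = 0.00077185, \quad \hat{\sigma}^2_\zeta = 0, \quad \hat{\sigma}^2_\omega = 0.0013969, \quad \hat{\sigma}^2_\varepsilon = 0
\]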
Notice that, since
2
l
norm, Communications of the ACM 17: 319–320.
Baxter, M. and R. G. King (1999) "Measuring business cycles: Approximate band-pass filters for
economic time series", The Review of Economics and Statistics 81(4): 575–593.
Beck, N. and J. N. Katz (1995) "What to do (and not to do) with time-series cross-section data", The
American Political Science Review 89: 634–647.
Blundell, R. and S. Bond (1998) "Initial conditions and moment restrictions in dynamic panel data
models", Journal of Econometrics 87: 115–143.
Bond, S., A. Hoeffler and J. Temple (2001) "GMM estimation of empirical growth models". Economics
Papers from Economics Group, Nuffield College, University of Oxford, No 2001-W21.
Boswijk, H. P. (1995) "Identifiability of cointegrated systems". Tinbergen Institute Discussion Paper
95-78. http://www.ase.uva.nl/pp/bin/258fulltext.pdf.
Boswijk, H. P. and J. A. Doornik (2004) "Identifying, estimating and testing restricted cointegrated
systems: An overview", Statistica Neerlandica 58(4): 440–465.
Box, G. E. P. and G. Jenkins (1976) Time Series Analysis: Forecasting and Control, San Francisco:
Holden-Day.
Brand, C. and N. Cassola (2004) "A money demand system for euro area M3", Applied Economics
36(8): 817–838.
Butterworth, S. (1930) "On the theory of filter amplifiers", Experimental Wireless & The Wireless
Engineer 7: 536–541.
Byrd, R. H., P. Lu, J. Nocedal and C. Zhu (1995) "A limited memory algorithm for bound constrained
optimization", SIAM Journal on Scientific Computing 16: 1190–1208.
Bibliography 306
Cameron, A. C. and P. K. Trivedi (2005) Microeconometrics, Methods and Applications, Cambridge:
Cambridge University Press.
Caselli, F., G. Esquivel and F. Lefort (1996) "Reopening the convergence debate: A new look at
cross-country growth empirics", Journal of Economic Growth 1(3): 363–389.
Chesher, A. and M. Irish (1987) "Residual analysis in the grouped and censored normal linear
model", Journal of Econometrics 34: 33–61.
Choi, I. (2001) "Unit root tests for panel data", Journal of International Money and Finance 20(2):
249–272.
Cleveland, W. S. (1979) "Robust locally weighted regression and smoothing scatterplots", Journal
of the American Statistical Association 74(368): 829–836.
Cribari-Neto, F. and S. G. Zarkos (2003) "Econometric and statistical computing using Ox",
Computational Economics 21: 277–295.
Davidson, R. and J. G. MacKinnon (1993) Estimation and Inference in Econometrics, New York:
Oxford University Press.
Davidson, R. and J. G. MacKinnon (2004) Econometric Theory and Methods, New York: Oxford
University Press.
Doornik, J. A. (1995) "Testing general restrictions on the cointegrating space". Discussion Paper,
Nuffield College. http://www.doornik.com/research/coigen.pdf.
Doornik, J. A. (1998) "Approximations to the asymptotic distribution of cointegration tests", Journal of
Economic Surveys 12: 573–593. Reprinted with corrections in McAleer and Oxley (1999).
Doornik, J. A. (2007) Object-Oriented Matrix Programming Using Ox, London: Timberlake Consultants
Press, third edn. www.doornik.com.
Doornik, J. A., M. Arellano and S. Bond (2006) Panel Data estimation using DPD for Ox.
Elliott, G., T. J. Rothenberg and J. H. Stock (1996) "Efficient tests for an autoregressive unit root",
Econometrica 64: 813–836.
Engle, R. F. and C. W. J. Granger (1987) "Co-integration and error correction: Representation,
estimation, and testing", Econometrica 55: 251–276.
Fiorentini, G., G. Calzolari and L. Panattoni (1996) "Analytic derivatives and the computation of
GARCH estimates", Journal of Applied Econometrics 11: 399–417.
Frigo, M. and S. G. Johnson (2005) "The design and implementation of FFTW3", Proceedings of the
IEEE 93(2): 216–231.
Goossens, M., F. Mittelbach and A. Samarin (2004) The LaTeX Companion, Boston: Addison-Wesley,
second edn.
Gourieroux, C. and A. Monfort (1996) Simulation-Based Econometric Methods, Oxford: Oxford
University Press.
Gourieroux, C., A. Monfort, E. Renault and A. Trognon (1987) "Generalized residuals", Journal of
Econometrics 34: 5–32.
Greene, W. H. (2000) Econometric Analysis, Upper Saddle River, NJ: Prentice-Hall, fourth edn.
Greene, W. H. (2003) Econometric Analysis, Upper Saddle River, NJ: Prentice-Hall, fifth edn.
Hall, A. D. (2005) Generalized Method of Moments, Oxford: Oxford University Press.
Hamilton, J. D. (1994) Time Series Analysis, Princeton, NJ: Princeton University Press.
Hannan, E. J. and B. G. Quinn (1979) The determination of the order of an autoregression, Journal
of the Royal Statistical Society, B 41: 190195.
Hansen, L. P. (1982) Large sample properties of generalized method of moments estimation,
Econometrica 50: 10291054.
Hansen, L. P. and K. J. Singleton (1982) Generalized instrumental variables estimation of nonlinear
rational expectations models, Econometrica 50: 12691286.
Harvey, A. C. (1989) Forecasting, structural time series models and the Kalman lter, Cambridge:
Cambridge University Press.
Harvey, A. C. and T. Proietti (2005) Readings in Unobserved Component Models, Oxford: Oxford
University Press.
Hausman, J. A. (1978) Specication tests in econometrics, Econometrica 46: 12511271.
Heckman, J. (1979) Sample selection bias as a specication error, Econometrica 47: 153161.
Hodrick, R. and E. C. Prescott (1997) Postwar U.S. business cycles: An empirical investigation,
Journal of Money, Credit and Banking 29: 116.
Im, K. S., M. H. Pesaran and Y. Shin (2003) Testing for unit roots in heterogeneous panels, Journal
of Econometrics 115: 5374.
Johansen, S. (1995) Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Ox-
ford: Oxford University Press.
de Jong, P. (1991) The diuse Kalman lter, The Annals of Statistics 19: 10731083.
Kalbeisch, J. D. and R. L. Prentice (2002) The Statistical Analysis of Failure Time Data, New York:
Wiley, second edn.
Kalman, R. E. (1960) A new approach to linear ltering and prediction problems, Transactions of
the ASMEJournal of Basic Engineering 82(Series D): 3545.
Keane, M. P. and K. I. Wolpin (1997) The career decisions of young men, Journal of Political
Economy 105: 473522.
Koenker, R. (1994) Condence intervals for regression quantiles. In P. Mandl and M. Huskova
(eds.), Asymptotic Statistics, pp. 349359. New York: Springer-Verlag.
Koenker, R. and G. Bassett (1978) Regression quantiles, Econometrica 46: 33–50.
Koenker, R. and K. Hallock (2001) Quantile regression, Journal of Economic Perspectives 15(4):
143–156.
Koenker, R. and J. Machado (1999) Goodness of fit and related inference processes for quantile
regression, Journal of the American Statistical Association 94: 1296–1310.
Koenker, R. and Q. Zhao (1994) L-estimation for linear heteroscedastic models, Journal of
Nonparametric Statistics 3: 223–235.
Koopman, S. J. (1997) Exact initial Kalman filtering and smoothing for nonstationary time series
models, Journal of the American Statistical Association 92: 1630–1638.
Koopman, S. J., N. Shephard and J. A. Doornik (1999) Statistical algorithms for models in state
space using SsfPack 2.2, Econometrics Journal 2: 113–166.
Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin (1992) Testing the null of stationarity
against the alternative of a unit root: How sure are we that economic time series have a unit
root?, Journal of Econometrics 54: 159–178.
Levin, A., C.-F. Lin and J. Chu (2002) Unit root tests in panel data: asymptotic and finite-sample
properties, Journal of Econometrics 108: 1–24.
Lucchetti, R., L. Papi and A. Zazzaro (2001) Banks' inefficiency and economic growth: A
micro-macro approach, Scottish Journal of Political Economy 48: 400–424.
Lütkepohl, H. (2005) Applied Time Series Econometrics, Springer.
MacKinnon, J. G. (1996) Numerical distribution functions for unit root and cointegration tests,
Journal of Applied Econometrics 11: 601–618.
McAleer, M. and L. Oxley (1999) Practical Issues in Cointegration Analysis, Oxford: Blackwell.
McCullough, B. D. and C. G. Renfro (1998) Benchmarks and software standards: A case study of
GARCH procedures, Journal of Economic and Social Measurement 25: 59–71.
Mroz, T. (1987) The sensitivity of an empirical model of married women's hours of work to
economic and statistical assumptions, Econometrica 55: 765–799.
Nadaraya, E. A. (1964) On estimating regression, Theory of Probability and its Applications 9:
141–142.
Nash, J. C. (1990) Compact Numerical Methods for Computers: Linear Algebra and Function
Minimisation, Bristol: Adam Hilger, second edn.
Nerlove, M. (1999) Properties of alternative estimators of dynamic panel models: An empirical
analysis of cross-country data for the study of economic growth. In C. Hsiao, K. Lahiri, L.-F. Lee
and M. H. Pesaran (eds.), Analysis of Panels and Limited Dependent Variable Models. Cambridge:
Cambridge University Press.
Newey, W. K. and K. D. West (1987) A simple, positive semi-definite, heteroskedasticity and
autocorrelation consistent covariance matrix, Econometrica 55: 703–708.
Newey, W. K. and K. D. West (1994) Automatic lag selection in covariance matrix estimation,
Review of Economic Studies 61: 631–653.
Okui, R. (2009) The optimal choice of moments in dynamic panel data models, Journal of
Econometrics 151(1): 1–16.
Pollock, D. S. G. (1999) A Handbook of Time-Series Analysis, Signal Processing and Dynamics, New
York: Academic Press.
Pollock, D. S. G. (2000) Trend estimation and de-trending via rational square-wave filters, Journal
of Econometrics 99(2): 317–334.
Portnoy, S. and R. Koenker (1997) The Gaussian hare and the Laplacian tortoise: computability of
squared-error versus absolute-error estimators, Statistical Science 12(4): 279–300.
Ramanathan, R. (2002) Introductory Econometrics with Applications, Fort Worth: Harcourt, fifth
edn.
Roodman, D. (2006) How to do xtabond2: An introduction to difference and system GMM in
Stata. Center for Global Development, Working Paper Number 103.
Schwarz, G. (1978) Estimating the dimension of a model, Annals of Statistics 6: 461–464.
Sephton, P. S. (1995) Response surface estimates of the KPSS stationarity test, Economics Letters
47: 255–261.
Sims, C. A. (1980) Macroeconomics and reality, Econometrica 48: 1–48.
Steinhaus, S. (1999) Comparison of mathematical programs for data analysis (edition 3).
University of Frankfurt. http://www.informatik.uni-frankfurt.de/~stst/ncrunch/.
Stock, J. H. and M. W. Watson (2003) Introduction to Econometrics, Boston: Addison-Wesley.
Stokes, H. H. (2004) On the advantage of using two or more econometric software systems to
solve the same problem, Journal of Economic and Social Measurement 29: 307–320.
Swamy, P. A. V. B. and S. S. Arora (1972) The exact finite sample properties of the estimators of
coefficients in the error components regression models, Econometrica 40: 261–275.
Theil, H. (1961) Economic Forecasting and Policy, Amsterdam: North-Holland.
Theil, H. (1966) Applied Economic Forecasting, Amsterdam: North-Holland.
Verbeek, M. (2004) A Guide to Modern Econometrics, New York: Wiley, second edn.
Watson, G. S. (1964) Smooth regression analysis, Sankhyā, Series A 26: 359–372.
White, H. (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for
heteroskedasticity, Econometrica 48: 817–838.
Windmeijer, F. (2005) A finite sample correction for the variance of linear efficient two-step GMM
estimators, Journal of Econometrics 126: 25–51.
Wooldridge, J. M. (2002a) Econometric Analysis of Cross Section and Panel Data, Cambridge, MA:
MIT Press.
Wooldridge, J. M. (2002b) Introductory Econometrics, A Modern Approach, Mason, OH:
South-Western, second edn.
Yalta, A. T. and A. Y. Yalta (2007) GRETL 1.6.0 and its numerical accuracy, Journal of Applied
Econometrics 22: 849–854.