Neural Generalised AutoRegressive Conditional Heteroskedasticity

Zexuan Yin∗ † and Paolo Barucca†‡


arXiv:2202.11285v1 [cs.LG] 23 Feb 2022

†Department of Computer Science, University College London, WC1E 7JE, United Kingdom
[email protected]

(v1.1 released November 2021)

We propose Neural GARCH, a class of methods to model conditional heteroskedasticity in financial time
series. Neural GARCH is a neural network adaptation of the GARCH(1,1) model in the univariate case,
and the diagonal BEKK(1,1) model in the multivariate case. We allow the coefficients of a GARCH
model to be time-varying in order to reflect the constantly changing dynamics of financial markets. The
time-varying coefficients are parameterised by a recurrent neural network that is trained with stochastic
gradient variational Bayes. We propose two variants of our model, one with normal innovations and the
other with Student’s t innovations. We test our models on a wide range of univariate and multivariate
financial time series, and we find that the Neural Student’s t model consistently outperforms the others.

Keywords: Heteroskedasticity; Recurrent neural networks; Variational inference; Volatility Forecasting


JEL Classification: C32, C45, C53

1. Introduction

Modelling conditional heteroskedasticity (time-varying volatility) in financial time series such as energy prices (Chan and Grant 2016), cryptocurrencies (Chu et al. 2017), and foreign currency exchange rates (Malik 2005) is of great importance to financial practitioners as it allows better
decision making with regards to portfolio selection, asset pricing and risk management. In the
univariate setting, popular methods include Autoregressive Conditional Heteroskedastic models
(ARCH) (Engle 1982) and Generalised ARCH (GARCH) models (Bollerslev 1986). ARCH and
GARCH models are regression-based models estimated using maximum likelihood, and are capable
of capturing stylised facts about financial time series such as volatility clustering (Bauwens et al.
2006). The ARCH(p) model describes the conditional volatility as a function of p lagged squared
residuals, and similarly the GARCH(p,q) model includes contributions due to the last q conditional
variances. Many variants of the GARCH model have been proposed to better capture properties of
financial time series, for example the EGARCH (Nelson 1991) and GJR-GARCH (Glosten et al.
1993) models were designed to capture the so-called leverage effect, which describes the negative
relationship between asset price and volatility.
In a multivariate setting, instead of modelling only time-varying conditional variances, for an
n-dimensional system, we estimate the n × n time-varying variance-covariance matrix. This allows
us to investigate interactions between the volatility of different time series and whether there is
a transmission of volatility (spillover effect) between markets (Bauwens et al. 2006, Erten et al.
2012). Popular multivariate GARCH models include the VEC model (Bollerslev et al. 1988), the

∗ Corresponding author. Email: [email protected]



BEKK model (Engle and Kroner 1995), the GO-GARCH model (Van Der Weide 2002) and DCC
model (Christodoulakis and Satchell 2002, Tse and Tsui 2002, Engle 2002).
In this paper we focus specifically on GARCH(1,1) models in the univariate case and the diagonal
BEKK(1,1) model in the multivariate case to model daily financial asset returns. We consider
several asset classes such as foreign exchange rates, commodities and stock indices. GARCH(1,1)
models work well in general practical settings due to their simplicity and robustness to overfitting
(Wu et al. 2013).
In traditional GARCH models, the estimated coefficients are constant, which implies a stationary
returns process with a constant unconditional mean and variance (Bollerslev 1986). However, there
is evidence in existing literature that relaxing the stationary constraint on the returns time series
can often lead to a better performance as it allows the model to better capture time-varying
market conditions. In Stǎricǎ and Granger (2005) the authors modelled daily S&P 500 returns
with locally stationary models and found that most of the dynamics were concentrated in shifts of
the unconditional variance, and forecasts based on non-stationary unconditional modelling yielded
a better performance than a stationary GARCH(1,1) model. Similarly, the authors in Wu et al.
(2013) designed a GARCH(1,1) model with time-varying coefficients that followed a random walk
process, and they reported better forecasting performances in the test dataset relative to the
GARCH(1,1) model.
To this end, we propose univariate and multivariate GARCH models with time-varying coefficients that are parameterised by a recurrent neural network. Our method allows the simplicity and
interpretability of GARCH models to be combined with the expressive power of neural networks,
and this approach follows a trend in the literature that combines classical time series models with
deep learning. In Rangapuram et al. (2018) for example, the authors proposed to parameterise
the coefficients of a linear Gaussian state space model with a recurrent neural network, and the
latent states were then inferred using a Kalman filter. This approach is advantageous as the neural network allows modelling of more complex relationships between time steps whilst preserving
the structural form of the state space model. Similarly, by preserving the structural form of the
BEKK model, we can obtain covariance matrices that are symmetric and positive definite (Engle
and Kroner 1995) without the need of implementing further constraints. We treat the time-varying
GARCH coefficients as latent variables to be inferred, and to achieve this we leverage recent advances in amortised variational inference in the form of a variational autoencoder (VAE) (Kingma and Welling 2014), and subsequent combinations of a VAE with a recurrent neural network (so-called Variational RNN, or VRNN) (Chung et al. 2015, Bayer and Osendorfer 2014, Krishnan et al. 2017, Fabius and van Amersfoort 2015, Fraccaro et al. 2016, Karl et al. 2017) to allow efficient
structured inference over a sequence of latent random variables.
The rest of the paper is organised as follows: in Section 2 we outline the preliminary mathematical
concepts of GARCH modelling and amortised variational inference, in Section 3 we introduce the
generative and inference model components of Neural GARCH, and in Section 4 we present the
performance of Neural GARCH on univariate and multivariate daily returns time series covering
foreign exchange rates, commodity prices, and stock indices.

2. Preliminaries

2.1. Univariate GARCH Model


The GARCH(p,q) model (Bollerslev 1986) for a returns process rt is specified in terms of the
conditional mean equation:

r_t \sim \mathcal{N}(0, \sigma_t^2), \qquad (1)



and the conditional variance equation:


\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i r_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2. \qquad (2)

Under the GARCH(1,1) model, the returns process r_t is covariance stationary with a constant unconditional mean and variance given by E[r_t] = 0 and E[r_t^2] = \omega / (1 - \alpha - \beta), where \omega > 0, \alpha \geq 0 and \beta \geq 0 to ensure that \sigma_t^2 > 0, and \alpha + \beta < 1 to ensure a finite unconditional variance. For parameter
estimation assuming normal innovations, the following log-likelihood function is maximised:

\mathcal{L} = -\sum_{t=1}^{T} \left( \frac{1}{2}\log(\sigma_t^2) + \frac{r_t^2}{2\sigma_t^2} \right) \qquad (3)
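To make (2)-(3) concrete, the following is a minimal NumPy sketch of the GARCH(1,1) recursion and the Gaussian log-likelihood; initialising σ²₀ at the sample variance is our assumption, not something prescribed by the text.

```python
import numpy as np

def garch11_filter(r, omega, alpha, beta):
    """GARCH(1,1) recursion: sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # assumed initialisation: sample variance
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def gaussian_loglik(r, sigma2):
    """Log-likelihood of (3) (constant terms omitted)."""
    return -np.sum(0.5 * np.log(sigma2) + r ** 2 / (2.0 * sigma2))
```

In classical estimation one would maximise this quantity over (ω, α, β) with a constrained optimiser, subject to ω > 0, α, β ≥ 0 and α + β < 1.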

To model the leptokurtic (fat-tailed) behaviour of financial returns, the authors in Bollerslev
(1987) considered GARCH models with Student’s t innovations with the following log-likelihood
function to be maximised:
\mathcal{L} = -\sum_{t=1}^{T} \left( -\log\Gamma\left(\frac{\nu+1}{2}\right) + \log\Gamma\left(\frac{\nu}{2}\right) + \frac{1}{2}\log(\nu-2) + \frac{1}{2}\log(\sigma_t^2) + \frac{\nu+1}{2}\log\left(1 + \frac{r_t^2}{(\nu-2)\sigma_t^2}\right) \right), \qquad (4)

where ν > 2 is the degrees of freedom and Γ is the gamma function.
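A companion sketch for the Student's t log-likelihood (4); as in (4), the constant −½ log π per observation is omitted.

```python
import numpy as np
from scipy.special import gammaln

def student_t_loglik(r, sigma2, nu):
    """Log-likelihood of (4) for Student's t innovations with nu > 2."""
    per_obs = (gammaln((nu + 1) / 2) - gammaln(nu / 2)
               - 0.5 * np.log(nu - 2) - 0.5 * np.log(sigma2)
               - (nu + 1) / 2 * np.log(1 + r ** 2 / ((nu - 2) * sigma2)))
    return np.sum(per_obs)
```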

2.2. BEKK Model


The BEKK multivariate GARCH model (Engle and Kroner 1995) parameterises an n-dimensional
multivariate returns process r t ∈ Rn×T :

r_t \sim \mathcal{N}(0, \Sigma_t), \qquad (5)

\Sigma_t = \Omega^T \Omega + \sum_{i=1}^{p} A_i^T r_{t-i} r_{t-i}^T A_i + \sum_{j=1}^{q} B_j^T \Sigma_{t-j} B_j, \qquad (6)

where Σ_t is the n × n symmetric and positive-definite covariance matrix, Ω is an upper triangular matrix with n(n+1)/2 non-zero entries, and A and B are n × n coefficient matrices. In our paper we consider the diagonal-BEKK model, where A and B are diagonal matrices.
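A minimal sketch of one step of the diagonal BEKK(1,1) recursion (6) with p = q = 1; the numerical values of Ω, A, B and the initial Σ below are purely illustrative assumptions.

```python
import numpy as np

def dbekk11_step(Sigma_prev, r_prev, Omega, A, B):
    """One step of (6): Sigma_t = Omega^T Omega + A^T r r^T A + B^T Sigma_{t-1} B.
    Omega is upper triangular; A and B are diagonal in the diagonal-BEKK model."""
    outer = np.outer(r_prev, r_prev)
    return Omega.T @ Omega + A.T @ outer @ A + B.T @ Sigma_prev @ B

# illustrative 2-asset example (values assumed, not estimated)
Omega = np.array([[0.01, 0.002], [0.0, 0.01]])   # upper triangular
A = np.diag([0.3, 0.25])
B = np.diag([0.9, 0.92])
Sigma = np.eye(2) * 1e-4                          # assumed Sigma_0
r = np.array([0.004, -0.007])
Sigma = dbekk11_step(Sigma, r, Omega, A, B)       # symmetric positive definite by construction
```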

2.3. Neural Network Variational Inference


For a latent variable model with parameters θ, target variable y and latent variable z, we wish to
maximise the marginal likelihood with the latent variable integrated out, which often involves an
intractable integral:
\log P_\theta(y) = \log \int P_\theta(y|z) P_\theta(z)\, dz, \qquad (7)

Instead, we perform variational inference by approximating the actual posterior distribution P_θ(z|y) with a variational approximation q_φ(z|y) and maximising the evidence lower bound (ELBO), where log P_θ(y) ≥ ELBO. This is equivalent to minimising the Kullback-Leibler (KL) divergence between the variational posterior q_φ(z|y) and the actual posterior P_θ(z|y) (Kingma and Welling 2014):

\log P_\theta(y) = \mathrm{ELBO} + \mathrm{KL}(q_\phi(z|y)\,||\,P_\theta(z|y)), \qquad (8)

where the ELBO is given by:

\mathrm{ELBO} = \mathbb{E}_{z \sim q_\phi(z|y)}[\log P_\theta(y|z)] - \mathrm{KL}(q_\phi(z|y)\,||\,P_\theta(z)), \qquad (9)

where P_θ(z) is a prior distribution for z. In a variational autoencoder (VAE), the generative and inference distributions P_θ(y|z) and q_φ(z|y) are parameterised by neural networks. An uninformative prior such as N(0, 1) is often used for the prior P_θ(z); however, in our model we adopt a learned prior distribution P_θ(z|I_{t−1}), where I_{t−1} is the information set up to time step t − 1. This learned prior approach has achieved great success in sequential generation tasks such as video prediction (Franceschi et al. 2020, Denton and Fergus 2018).
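Since all latent distributions in this paper are diagonal Gaussians, the KL term in (9) is available in closed form; a short sketch (the function name is ours):

```python
import torch

def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ), summed over dimensions."""
    return 0.5 * torch.sum(torch.log(var_p / var_q)
                           + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
```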

3. Materials and Methods

3.1. Neural GARCH Models


In this section we introduce the intuition and various components of Neural GARCH models. We
shall focus specifically on univariate and multivariate GARCH(1,1) models as we would like to
keep the GARCH model structure as simple as possible and delegate the modelling of complex
relationships between time steps to the underlying neural network which outputs the coefficients
of the GARCH models. For the rest of this paper, we use the terms multivariate GARCH(1,1) and BEKK(1,1) interchangeably when referring to multivariate systems.
In neural GARCH, the coefficients {ω, α, β} in the univariate case and {Ω, A, B} in the
multivariate case are allowed to vary freely with time. This approach allows the model to capture
the time-varying nature of market dynamics (Wu et al. 2013). The GARCH(1,1) and BEKK(1,1)
models thus become:

\sigma_t^2 = \omega_t + \alpha_t r_{t-1}^2 + \beta_t \sigma_{t-1}^2, \qquad (10)

\Sigma_t = \Omega_t^T \Omega_t + A_t^T r_{t-1} r_{t-1}^T A_t + B_t^T \Sigma_{t-1} B_t. \qquad (11)

For notation purposes we define the parameter set γ t = [ωt , αt , βt ]T for GARCH(1,1) and γ t =
[Ωt , At , B t ]T for BEKK(1,1).
In our proposed framework, γ t is a multivariate normal latent random variable with a diagonal
covariance matrix to be estimated at every time step. For GARCH(1,1) this involves an estimation
of a vector of size 3 for a model with normal innovations:
 
\gamma_t = \begin{pmatrix} \omega_t \\ \alpha_t \\ \beta_t \end{pmatrix} \sim \mathcal{N}(\mu_t, \Sigma_{\gamma,t}), \qquad (12)

and the vector [\sigma^2_{\omega_t}, \sigma^2_{\alpha_t}, \sigma^2_{\beta_t}]^T represents the diagonal elements of the covariance matrix Σ_{γ,t}. Here we have written the covariance matrix of the parameter set γ_t as Σ_{γ,t} in order to distinguish it from the covariance matrix of the asset returns Σ_t. For neural GARCH(1,1) with Student's t innovations, γ_t is augmented with the degrees of freedom parameter ν_t such that γ_t = [ω_t, α_t, β_t, ν_t]^T.

For the multivariate diagonal BEKK(1,1), we adopt a similar methodology. For a system of n assets, γ_t of a model with normal innovations is a vector of size 2n + n(n+1)/2 (Engle and Kroner 1995), and with Student's t innovations γ_t is of size 2n + n(n+1)/2 + 1. As an example, for a system of 2 assets (n = 2), the BEKK model is given by:

\Sigma_t = \begin{pmatrix} c_{11,t} & 0 \\ c_{12,t} & c_{22,t} \end{pmatrix} \begin{pmatrix} c_{11,t} & c_{12,t} \\ 0 & c_{22,t} \end{pmatrix} + \begin{pmatrix} a_{11,t} & 0 \\ 0 & a_{22,t} \end{pmatrix} \begin{pmatrix} r_{1,t-1} \\ r_{2,t-1} \end{pmatrix} \begin{pmatrix} r_{1,t-1} \\ r_{2,t-1} \end{pmatrix}^T \begin{pmatrix} a_{11,t} & 0 \\ 0 & a_{22,t} \end{pmatrix} + \begin{pmatrix} b_{11,t} & 0 \\ 0 & b_{22,t} \end{pmatrix} \begin{pmatrix} \sigma^2_{11,t-1} & \sigma^2_{12,t-1} \\ \sigma^2_{21,t-1} & \sigma^2_{22,t-1} \end{pmatrix} \begin{pmatrix} b_{11,t} & 0 \\ 0 & b_{22,t} \end{pmatrix}, \qquad (13)

where a_{ij,t} is the (i,j)th element of the matrix A_t. The parameter set γ_t, which also has a multivariate normal distribution, is given by:

\gamma_t = [a_{11,t}, a_{22,t}, b_{11,t}, b_{22,t}, c_{11,t}, c_{12,t}, c_{22,t}]^T. \qquad (14)

The main contribution of our paper is the estimation of γ t with a recurrent neural network
(RNN) and a multilayer perceptron (MLP). We provide the exact estimation schemes in Sections
3.2 and 3.3. Since we assume a multivariate normal distribution with a diagonal covariance matrix
for γ t , we need to estimate the means and variances of the elements in γ t with our neural network.
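As an illustration of the bookkeeping, a sketch that unpacks a flat γ_t vector of size 2n + n(n+1)/2 into A_t, B_t and the upper triangular Ω_t; the ordering of entries follows (14) for n = 2, and its extension to general n is our assumption.

```python
import numpy as np

def unpack_gamma(gamma, n):
    """Split a flat parameter vector into (A, B, Omega) for diagonal BEKK(1,1)."""
    A = np.diag(gamma[:n])                 # a_{11,t}, ..., a_{nn,t}
    B = np.diag(gamma[n:2 * n])            # b_{11,t}, ..., b_{nn,t}
    Omega = np.zeros((n, n))
    iu = np.triu_indices(n)                # upper-triangular positions, row-major
    Omega[iu] = gamma[2 * n:]              # the n(n+1)/2 entries c_{ij,t}
    return A, B, Omega

# for n = 2 this consumes a vector of size 2*2 + 3 = 7, matching (14)
```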

3.2. Generative Model


The generative model distribution Pθ (r 1:T , Σ1:T , γ 1:T ) of a general multivariate neural GARCH is
presented in Figure 1 and given by (15). For the univariate case, one simply replaces Σt in (15)
with σt2 .

P_\theta(r_{1:T}, \Sigma_{1:T}, \gamma_{1:T}) = P(\gamma_0) P(\Sigma_0) \prod_{t=1}^{T} P_\theta(r_t|\Sigma_t)\, P_\theta(\Sigma_t|\gamma_t, r_{t-1}, \Sigma_{t-1})\, P_\theta(\gamma_t|\gamma_{t-1}, r_{1:t-1}). \qquad (15)

The initial priors were set to delta distributions: P(Σ_0) was centered on the covariance matrix
estimated using the training dataset, and P (γ 0 ) was centered on a vector of 1s. The predictive
distribution Pθ (γ t |γ t−1 , r 1:t−1 ) takes as input the information set It−1 = {γ t−1 , r 1:t−1 } and predicts
the 1-step-ahead value γ t . For this parameterisation, we leverage a recurrent neural network to carry
r 1:t−1 such that:

Pθ (γ t |γ t−1 , r 1:t−1 ) = Pθ (γ t |γ t−1 , ht−1 ), (16)

where h_t is the hidden state of the underlying RNN; in our model we use a gated recurrent unit (GRU) (Cho et al. 2014). We then use an MLP which takes I_{t−1} as input and maps it to the means
and variances of the elements in γ t . In the 2-dimensional example given in (14), the estimation is
done using:

[\mu_{a_{11,t}}, \ldots, \mu_{c_{22,t}}, \sigma^2_{a_{11,t}}, \ldots, \sigma^2_{c_{22,t}}]^T = \mathrm{MLP}_{\mathrm{pred}}(\gamma_{t-1}, h_{t-1}), \qquad (17)

and we apply a sigmoid function on the neural network output to ensure that the estimated
variances of the elements in γ t and the GARCH coefficients are non-negative. We have also tested
other ways to ensure non-negativity, such as a softplus function; however, we found that applying a sigmoid function gave the best performance. For neural GARCH with Student's t innovations, we require that ν > 2 in order to have a well-defined covariance. Since applying the sigmoid function ensures that our estimated coefficients are non-negative, we estimate ν′ = ν − 2 (instead of ν directly), so that ν = ν′ + 2 > 2.
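A sketch of what MLP_pred could look like in PyTorch, consistent with the description above (sigmoid-transformed outputs split into means and variances); the module name is ours and the layer sizes are assumptions matching the architecture reported in Section 3.6.

```python
import torch
import torch.nn as nn

class GenerativeStep(nn.Module):
    """Sketch of MLP_pred: maps (gamma_{t-1}, h_{t-1}) to the means and variances
    of the elements of gamma_t, with a sigmoid enforcing non-negativity."""
    def __init__(self, gamma_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(gamma_dim + hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2 * gamma_dim))

    def forward(self, gamma_prev, h_prev):
        out = torch.sigmoid(self.mlp(torch.cat([gamma_prev, h_prev], dim=-1)))
        mu, var = out.chunk(2, dim=-1)   # both non-negative by construction
        return mu, var
```

For the Student's t variant, the last component of the mean vector would correspond to ν′, with ν recovered as ν′ + 2.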
The conditional distribution Pθ (Σt |γ t , r t−1 , Σt−1 ) is a delta distribution centered on (10) in the
univariate case and (11) in the multivariate case as we can calculate the covariance matrix Σt
deterministically given {γ_t, r_{t−1}, Σ_{t−1}}. The distribution P_θ(r_t|Σ_t) is the likelihood function, whose logarithm (in the univariate case) is given in (3) for normal innovations and in (4) for Student's t innovations.

Figure 1. Generative model of neural GARCH. The generative MLP takes as input {γ t−1 , ht−1 } and outputs the
estimated means and variances of the elements in γ t .

3.3. Inference Model


The inference model distribution qφ (Σ1:T , γ 1:T |r 1:T ) is presented in Figure 2 and can be factorised
as:
q_\phi(\Sigma_{1:T}, \gamma_{1:T}|r_{1:T}) = P(\gamma_0) P(\Sigma_0) \prod_{t=1}^{T} q_\phi(\Sigma_t|\gamma_t, r_{t-1}, \Sigma_{t-1})\, q_\phi(\gamma_t|\gamma_{t-1}, r_{1:t}), \qquad (18)

where P(γ_0) and P(Σ_0) are the same as in the generative model, and q_φ(Σ_t|γ_t, r_{t−1}, Σ_{t−1}) has the same functional form (a delta distribution) as P_θ(Σ_t|γ_t, r_{t−1}, Σ_{t−1}); however, γ_t is now drawn from the posterior distribution q_φ(γ_t|γ_{t−1}, r_{1:t}), where:

qφ (γ t |γ t−1 , r 1:t ) = qφ (γ t |γ t−1 , ht ). (19)

We note that the generative and inference networks share the same underlying recurrent neural network but use information at different time steps: the generative model predicts γ_t using the information set I_{t−1}, whereas the inference model infers γ_t using I_t. The inference MLP (MLP_inf), however, is different to that of the generative model (MLP_pred), and it outputs the posterior estimates of the elements of γ_t:

[\mu_{a_{11,t}}, \ldots, \mu_{c_{22,t}}, \sigma^2_{a_{11,t}}, \ldots, \sigma^2_{c_{22,t}}]^T_{\mathrm{post}} = \mathrm{MLP}_{\mathrm{inf}}(\gamma_{t-1}, h_t). \qquad (20)
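The posterior step mirrors the generative step but conditions on h_t; a sketch under the same assumptions as the generative module above.

```python
import torch
import torch.nn as nn

class InferenceStep(nn.Module):
    """Sketch of MLP_inf: same shape as MLP_pred, but conditioned on h_t,
    whose GRU has already consumed r_t; it shares no weights with MLP_pred."""
    def __init__(self, gamma_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(gamma_dim + hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2 * gamma_dim))

    def forward(self, gamma_prev, h_t):
        out = torch.sigmoid(self.mlp(torch.cat([gamma_prev, h_t], dim=-1)))
        return out.chunk(2, dim=-1)      # posterior means and variances
```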

Figure 2. Inference model of neural GARCH. The inference MLP outputs the posterior estimate of γ_t conditioned on available information up to time t.

3.4. Model Training

For neural network training we optimise the generative and inference model parameters (θ and φ) jointly using stochastic gradient variational Bayes (Kingma and Welling 2014). Our objective function is the ELBO, defined as:


\mathrm{ELBO}(\theta, \phi) = \sum_{t=1}^{T} \mathbb{E}_{\gamma_t \sim q_\phi}[\log P_\theta(r_t|\gamma_t)] - \mathrm{KL}(q_\phi(\gamma_t|\gamma_{t-1}, r_{1:t})\,||\,P_\theta(\gamma_t|\gamma_{t-1}, r_{1:t-1})), \qquad (21)

and we seek:

\{\theta^*, \phi^*\} = \operatorname*{argmax}_{\theta, \phi} \mathrm{ELBO}(\theta, \phi). \qquad (22)
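Putting the pieces together, a schematic (not the authors' exact implementation) of one ELBO evaluation for the univariate normal model, using a reparameterised sample of γ_t and the closed-form Gaussian KL sketched in Section 2.3:

```python
import torch

def elbo(returns, gru, gen_step, inf_step, kl_fn, gamma0, sigma2_0):
    """Schematic ELBO of (21) for the univariate model with normal innovations.
    gru: torch.nn.GRU(1, 64); gen_step / inf_step: modules as sketched above;
    kl_fn: closed-form KL between diagonal Gaussians (Section 2.3)."""
    elbo_val, gamma, sigma2 = 0.0, gamma0, sigma2_0
    h_prev = torch.zeros(1, 1, 64)                     # GRU state after r_{1:t-1}
    for t in range(1, len(returns)):
        _, h = gru(returns[t].view(1, 1, 1), h_prev)   # h_t has now seen r_t
        mu_p, var_p = gen_step(gamma, h_prev[0, 0])    # prior  P(gamma_t | gamma_{t-1}, h_{t-1})
        mu_q, var_q = inf_step(gamma, h[0, 0])         # posterior q(gamma_t | gamma_{t-1}, h_t)
        gamma = mu_q + var_q.sqrt() * torch.randn_like(mu_q)  # reparameterisation trick
        omega, alpha, beta = gamma.unbind(-1)
        sigma2 = omega + alpha * returns[t - 1] ** 2 + beta * sigma2        # eq (10)
        elbo_val = elbo_val - 0.5 * (torch.log(sigma2) + returns[t] ** 2 / sigma2)  # eq (3)
        elbo_val = elbo_val - kl_fn(mu_q, var_q, mu_p, var_p)
        h_prev = h
    return elbo_val  # maximise with, e.g., torch.optim.Adam on -elbo_val
```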

3.5. Model Prediction


Neural GARCH produces 1-step-ahead conditional volatility predictions. Given I_t = {γ_t, Σ_t, r_{1:t}}, we use (17) to obtain our prediction of γ_{t+1} by drawing from the multivariate normal distribution whose parameters are given by MLP_pred. We then obtain our estimate of Σ_{t+1} deterministically using (11). To estimate Σ_{t+2}, we now have access to r_{t+1}; we therefore obtain the posterior estimate of γ_{t+1} using (20), recompute the posterior estimate of Σ_{t+1}, and predict Σ_{t+2} from it. This posterior update is crucial as it ensures that we use all available and up-to-date information to predict the next covariance matrix.
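The predict-then-update cycle can be summarised as follows; a schematic sketch reusing the components above (function name is ours).

```python
import torch

def predict_next_variance(gamma_t, sigma2_t, r_t, h_t, gen_step):
    """Draw gamma_{t+1} from the learned prior (17) and form the 1-step-ahead
    volatility forecast via (10); once r_{t+1} is observed, the posterior step (20)
    replaces this draw before forecasting sigma2_{t+2}."""
    mu, var = gen_step(gamma_t, h_t)
    gamma_next = mu + var.sqrt() * torch.randn_like(mu)   # sample from the learned prior
    omega, alpha, beta = gamma_next.unbind(-1)
    return gamma_next, omega + alpha * r_t ** 2 + beta * sigma2_t
```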

3.6. Experiments
We test neural GARCH on a range of daily asset log returns time series covering univariate and multivariate foreign exchange rates (20 pairs), commodity prices (Brent crude, silver and gold) and stock indices (DAX, S&P 500, NASDAQ, FTSE 100, Dow Jones). We provide a brief data description in Table 1.
Table 1. Description of asset log returns time series analysed in our experiments.

Dataset | N Time Series | Frequency | Observations | Date Range
Foreign exchange | 20 | daily | 3128 | 05/08/2011 - 05/08/2021
Brent crude | 1 | daily | 2065 | 05/08/2013 - 05/08/2021
Silver & gold | 2 | daily | 3109 | 05/08/2011 - 05/08/2021
Stock indices | 5 | daily | 2054 | 05/08/2013 - 05/08/2021

For model training, we split each time series such that 80% was used for training, 10% for validation and 10% for testing. The underlying recurrent neural network (GRU) has a hidden state of size 64; the generative and inference MLPs (MLP_pred and MLP_inf) are both 3-layer MLPs with 64 hidden nodes and ReLU activation functions.
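For concreteness, the log-returns construction and the chronological 80/10/10 split can be written as follows; a small sketch (helper names are ours).

```python
import numpy as np

def log_returns(prices):
    """Daily log returns r_t = log(p_t / p_{t-1})."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def split_80_10_10(x):
    """Chronological 80/10/10 train/validation/test split."""
    n = len(x)
    return x[: int(0.8 * n)], x[int(0.8 * n): int(0.9 * n)], x[int(0.9 * n):]
```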

For univariate time series, we compare the performance of six models: GARCH(1,1)-Normal, GARCH(1,1)-Student's t, Neural-GARCH(1,1), Neural-GARCH(1,1)-Student's t, EGARCH(1,1,1)-Normal and EGARCH(1,1,1)-Student's t. Although neural GARCH is an adaptation of the GARCH(1,1) model, we include the EGARCH(1,1,1) model (where the middle index denotes the order of the asymmetric term) as a benchmark, as it is capable of accounting for the asymmetric leverage effect: negative shocks lead to larger volatilities than positive shocks. We would like to investigate whether the data-driven approach of neural GARCH allows it to model the leverage effect without the explicit dependence on an asymmetric term as in an EGARCH model. For multivariate time series, we compare the performance of multivariate GARCH(1,1) (BEKK(1,1)) with normal and Student's t innovations against their neural network adaptations. We evaluate model performance using the log-likelihood of the test dataset.

4. Results & Discussion

In Tables 2, 3, 4 and 5 we provide the log-likelihoods evaluated on the test dataset for commodity prices, stock indices, and univariate and multivariate foreign exchange time series. We have marked the best model for each time series with an asterisk. For commodity prices, we observe that EGARCH(1,1,1)-Student's t is the best performer on Brent crude, whilst Neural-GARCH(1,1)-Student's t performs best on silver and gold price returns.
For stock indices we observe that Neural-GARCH(1,1)-Student's t performs best on the DAX and Dow Jones indices, whilst EGARCH(1,1,1)-Student's t performs best on the S&P 500, NASDAQ and FTSE 100. The fact that the neural GARCH models perform better than EGARCH on some datasets shows that our data-driven approach can learn to accommodate many, but not all, scenarios of the leverage effect; in cases where EGARCH outperforms, there are therefore benefits associated with the direct modelling of the asymmetric effect. For univariate foreign exchange time series, we observe that the neural GARCH variants outperform traditional GARCH models on 16 out of 20 time series; where neural GARCH outperforms, Neural-GARCH(1,1) with normal innovations performs better on 5/16 time series and Neural-GARCH(1,1)-Student's t performs better on 11/16 time series.
Table 2. Test log-likelihoods for commodity price time series. Best result for each series is marked with an asterisk; a higher log-likelihood is better.

Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
BRENT | -298.738 | -298.689 | -307.921 | -295.895 | -299.966 | -292.798*
SILVER | -554.595 | -551.936 | -541.713 | -514.476* | -572.780 | -581.834
GOLD | -462.28 | -450.752 | -473.074 | -421.566* | -462.857 | -468.509

Table 3. Test log-likelihoods for stock index time series. Best result for each series is marked with an asterisk.

Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
DAX | -261.275 | -268.944 | -259.321 | -244.190* | -257.767 | -266.163
SNP | -300.849 | -298.614 | -308.559 | -295.934 | -300.577 | -284.841*
NASDAQ | -327.547 | -326.401 | -331.539 | -320.387 | -334.237 | -312.366*
FTSE | -324.437 | -314.480 | -326.572 | -315.606 | -322.425 | -311.135*
DOW | -298.406 | -302.196 | -315.164 | -284.247* | -292.974 | -293.486

For multivariate foreign exchange time series, we observe that Neural-BEKK(1,1)-Student's t is the best performer on 8/9 of the time series considered. Across different assets we see that the Student's t version of neural GARCH consistently performs better than the traditional GARCH models as well as neural GARCH with normal innovations. This suggests that a model with Student's t innovations does indeed model the leptokurtic behaviour of financial returns better than a model with normal innovations. This finding is in line with our expectations after surveying the literature (for example Bollerslev (1987) and Heracleous (2007)).
In order to evaluate whether the models' performances across different time series are statistically significant, we plotted a critical difference (cd) diagram following the approach of Ismail Fawaz et al. (2019): a Friedman test at α = 0.05 (Friedman 1940) was first used to reject the null hypothesis that the four models are equivalent and have equal rankings, and a post-hoc analysis was then performed using a Wilcoxon signed-rank test (Wilcoxon 1945) at the 95% confidence level.

Table 4. Test log-likelihoods for univariate foreign exchange time series. Best result for each series is marked with an asterisk.

Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t | EGARCH(1,1,1)-Normal | EGARCH(1,1,1)-Student's t
AUDCAD | -397.251* | -402.582 | -409.553 | -398.645 | -397.776 | -473.302
AUDCHF | -311.566 | -308.029 | -293.853* | -294.010 | -309.295 | -312.965
AUDJPY | -346.024 | -350.401 | -353.213 | -335.945* | -346.478 | -354.095
AUDNZD | -303.986 | -318.345 | -307.44 | -301.514* | -303.627 | -322.777
AUDUSD | -423.602 | -424.594 | -432.518 | -422.753* | -424.498 | -425.807
CADJPY | -351.749 | -359.545 | -349.209* | -349.842 | -350.460 | -362.875
CHFJPY | -238.566 | -241.360 | -215.536 | -208.710* | -230.120 | -253.050
EURAUD | -338.378 | -344.922 | -347.995 | -336.604* | -337.481 | -347.259
EURCAD | -347.177 | -359.499 | -345.989* | -347.730 | -346.547 | -366.701
EURCHF | -277.643 | -153.502 | -156.567 | -142.963* | -275.073 | -321.051
EURGBP | -366.187 | -378.950 | -373.515 | -364.619* | -364.727 | -389.416
EURJPY | -266.674 | -278.327 | -267.374 | -256.341* | -262.667 | -290.897
EURUSD | -332.917 | -347.818 | -330.471* | -334.488 | -334.178 | -361.348
GBPAUD | -335.530* | -346.944 | -353.800 | -344.842 | -335.812 | -353.034
GBPJPY | -330.030 | -348.729 | -337.981 | -324.559* | -329.013 | -359.506
GBPUSD | -418.593* | -431.554 | -423.460 | -419.658 | -420.534 | -441.162
NZDUSD | -415.648* | -416.944 | -425.841 | -417.380 | -416.094 | -417.153
USDCAD | -408.008 | -416.483 | -404.614* | -413.507 | -406.735 | -419.863
USDCHF | -315.963 | -303.351 | -276.461 | -260.177* | -282.682 | -308.410
USDJPY | -295.295 | -304.539 | -291.419 | -277.477* | -294.519 | -318.100

Table 5. Test log-likelihoods for multivariate foreign exchange time series. Best result for each series is marked with an asterisk.

Time series | GARCH(1,1)-Normal | GARCH(1,1)-Student's t | Neural-GARCH(1,1) | Neural-GARCH(1,1)-Student's t
EURGBP,EURCHF | -643.521 | -558.275 | -523.725 | -513.214*
GBPJPY,GBPUSD | -629.950 | -656.198 | -649.221 | -605.305*
AUDCHF,AUDJPY | -534.49 | -522.934 | -497.726 | -477.992*
EURGBP,EURUSD,EURJPY | -920.085 | -959.420 | -985.156 | -917.907*
USDCAD,USDCHF,USDJPY | -1008.821 | -998.041 | -990.601 | -957.912*
EURGBP,GBPJPY,USDJPY | -916.957* | -943.66 | -1011.435 | -966.806
GBPAUD,GBPJPY,GBPUSD | -971.522 | -991.8238 | -1037.296 | -967.500*
EURCHF,EURGBP,EURJPY,EURUSD | -1196.477 | -1127.192 | -1105.298 | -1078.165*
AUDJPY,AUDCHF,EURCHF,GBPJPY | -1505.540 | -862.995 | -865.471 | -783.955*

The critical difference diagram shows the average rankings of the models across the different datasets.
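The testing procedure can be reproduced with standard SciPy routines; a sketch on a hypothetical score matrix (the multiple-comparison correction used in full cd-diagram pipelines is omitted here).

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# scores[i, j]: test log-likelihood of model j on dataset i (hypothetical values)
rng = np.random.default_rng(0)
scores = rng.normal(size=(20, 4))

stat, p = friedmanchisquare(*scores.T)   # Friedman test of equal average rankings
if p < 0.05:                             # reject equivalence at alpha = 0.05
    _, p_pair = wilcoxon(scores[:, 0], scores[:, 1])   # post-hoc pairwise signed-rank test
```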
In Figure 3 we show the cd plot for the univariate time series. A bold horizontal line indicates no significant difference amongst the group of models on the line. In the univariate experiments we observe no significant difference amongst the group EGARCH(1,1,1)-Student's t, GARCH(1,1)-Student's t and Neural-GARCH(1,1); likewise, there is no significant difference amongst the group GARCH(1,1)-Student's t, Neural-GARCH(1,1), GARCH(1,1)-Normal and EGARCH(1,1,1)-Normal. We also observe that on average, GARCH(1,1)-Normal and EGARCH(1,1,1)-Normal perform significantly better than EGARCH(1,1,1)-Student's t. We establish that Neural-GARCH(1,1)-Student's t is the best performer overall on the univariate datasets, significantly outperforming the other models with an average rank of 1.8929.

Figure 3. Critical difference diagram of the univariate experiments. A horizontal bold line indicates no significant
difference amongst the group of models. We establish that Neural-GARCH(1,1)-Student’s t is the best performer in
the univariate experiments.

Figure 4. Critical difference diagram showing the average rankings of GARCH(1,1) and Neural-GARCH(1,1) with
normal and Student’s t innovations on all time series experiments. We find that Neural-GARCH(1,1)-Student’s t is
the best-performing model with an average rank of 1.4324.

In Figure 4 we show the cd plot constructed using all the time series experiments (univariate
and multivariate). Our aim is to compare the class of traditional GARCH(1,1) models against
their neural network adaptations. We observe that there is no significant difference between
GARCH(1,1)-Student’s t, Neural-GARCH(1,1) and GARCH(1,1)-Normal, and we establish that
Neural-GARCH(1,1)-Student’s t is the best performer overall with an average ranking of 1.4324.
For a GARCH(1,1) model, the returns process is often assumed to be stationary with a constant unconditional mean and variance. Neural-GARCH(1,1) relaxes this stationarity assumption. The unconditional variance of Neural-GARCH(1,1) in the univariate case,

\sigma_t^2 = \omega_t + \alpha_t r_{t-1}^2 + \beta_t \sigma_{t-1}^2, \qquad (23)

is obtained by taking the expectation of (23):

E[r_t^2] = E[\omega_t + \alpha_t r_{t-1}^2 + \beta_t \sigma_{t-1}^2] = \omega_t + \alpha_t E[r_{t-1}^2] + \beta_t E[\sigma_{t-1}^2] = \omega_t + (\alpha_t + \beta_t) E[r_{t-1}^2]. \qquad (24)

For a GARCH(1,1) model with constant coefficients {ω, α, β}, we have E[r_t^2] = E[r_{t−1}^2] (constant unconditional variance) and therefore E[r_t^2] = ω / (1 − α − β). With Neural-GARCH(1,1), E[r_t^2] ≠ E[r_{t−1}^2]; however, we can assume that the parameters {ω_t, α_t, β_t} change gradually with no sudden jumps, and therefore E[r_t^2] ≈ E[r_{t−1}^2] (Bringmann et al. 2017). We can then approximate the time-varying unconditional variance of Neural-GARCH(1,1) by E[r_t^2] ≈ ω_t / (1 − α_t − β_t), provided α_t + β_t < 1.
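The resulting approximation is straightforward to evaluate along the estimated coefficient paths; a small sketch (function name is ours).

```python
import numpy as np

def approx_unconditional_variance(omega_t, alpha_t, beta_t):
    """E[r_t^2] ~= omega_t / (1 - alpha_t - beta_t), valid where alpha_t + beta_t < 1."""
    persistence = alpha_t + beta_t
    assert np.all(persistence < 1.0), "approximation requires alpha_t + beta_t < 1"
    return omega_t / (1.0 - persistence)
```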


Results from our analysis of the Neural-GARCH(1,1) coefficients show a consistent pattern when
compared to GARCH(1,1) models. We provide an example for the currency pair USDCHF in
Figure 5, which shows the time-varying parameter set {ωt , αt , βt } of Neural-GARCH(1,1) against
the constant set {ω, α, β} of GARCH(1,1). We observe across different time series that Neural-
GARCH(1,1) consistently estimates a higher value for ω and α, and a lower value for β. In Figure
6 we show the zoomed-in images of the Neural-GARCH(1,1) coefficients shown in Figure 5 for
the currency pair USDCHF. We observe that the coefficients follow well-behaved time-varying trajectories, and similar dynamics are observed across all three parameters. This shows the effectiveness
of our learned prior neural network (M LPpred ) which models the distribution Pθ (γ t |γ t−1 , r 1:t−1 ).

Figure 5. Plots of Neural-GARCH(1,1) coefficients against GARCH(1,1) coefficients. The blue line represents the
Neural-GARCH(1,1) αt (left), βt (middle) and ωt (right), and the orange line shows the GARCH(1,1) coefficients.

Having time-varying coefficients allows us to model the financial returns time series as a non-
stationary process with a zero unconditional mean but a time-varying unconditional variance. Similarly,
the authors in Stǎricǎ and Granger (2005) reported that by relaxing the stationarity assumption on
daily S&P 500 returns and using locally stationary linear models, a better forecasting performance
was achieved, and in their analysis they showed most of the dynamics of the returns time series to be
concentrated in shifts of the unconditional variance.

Figure 6. Zoomed-in plots of the Neural-GARCH(1,1) coefficients shown in Figure 5 for USDCHF.

Our model provides a data-driven approach to modelling the returns process. During model training we optimise over the neural network parameters without imposing any external constraints; however, we observe in Figure 6 that
the model nonetheless outputs time-varying coefficients that satisfy the condition αt + βt < 1,
which is required for the model to have a well-defined unconditional variance.

5. Conclusions

In this paper we propose neural GARCH: a neural network adaptation of the univariate
GARCH(1,1) and multivariate diagonal BEKK(1,1) models to model conditional heteroskedasticity
in financial time series. Our model consists of a recurrent neural network that captures the tem-
poral dynamics of the returns process and a multilayer perceptron to predict the next-step-ahead
GARCH coefficients, which are then used to determine the conditional volatilities. The generative
model of neural GARCH makes predictions based on all available information, and the inference
model makes updated posterior estimates of the GARCH coefficients when new information be-
comes available. We tested two versions of neural GARCH on univariate and multivariate financial
returns time series: one with normal innovations and the other with Student's t innovations. When compared against their GARCH counterparts, we observe that neural GARCH with Student's t innovations is the best performer. From our analysis, we hypothesise that this is due to the neural network's ability to capture complex temporal dynamics present in the time series, and to the relaxation of the stationarity assumption that is fundamental to traditional GARCH models.

Acknowledgement

The authors would like to thank Fabio Caccioli, Department of Computer Science, University
College London, for proofreading the manuscript and providing feedback.

References

Bringmann, L.F., Hamaker, E.L., Vigo, D.E., Aubert, A., Borsboom, D. and Tuerlinckx, F., Changing dynamics: Time-varying autoregressive models using generalized additive modeling. Psychological Methods, 2017, 22, 409–425.
Bauwens, L., Laurent, S. and Rombouts, J.V., Multivariate GARCH models: A survey. Journal of Applied
Econometrics, 2006, 21, 79–109.
Bayer, J. and Osendorfer, C., Learning Stochastic Recurrent Networks. arXiv preprint, 2014, pp. 1–9.
Bollerslev, T., Generalised Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 1986,
31, 307–327.
Bollerslev, T., A Conditionally Heteroskedastic Time Series Model for Speculative Prices and Rates of
Return. The Review of Economics and Statistics, 1987, 69, 542–547.

Bollerslev, T., Engle, R.F. and Wooldridge, J.M., A Capital Asset Pricing Model with Time-Varying Co-
variances. Journal of Political Economy, 1988, 96, 116–131.
Chan, J.C. and Grant, A.L., Modeling energy price dynamics: GARCH versus stochastic volatility. Energy
Economics, 2016, 54, 182–189.
Cho, K., van Merrienboer, B., Bahdanau, D. and Bengio, Y., On the Properties of Neural Machine Trans-
lation: Encoder–Decoder Approaches. In Proceedings of the Eighth Workshop on Syntax, Semantics
and Structure in Statistical Translation (SSST-8), 2014.
Christodoulakis, G.A. and Satchell, S.E., Correlated ARCH (CorrARCH): Modelling the time-varying con-
ditional correlation between financial asset returns. European Journal of Operational Research, 2002,
139, 351–370.
Chu, J., Chan, S., Nadarajah, S. and Osterrieder, J., GARCH Modelling of Cryptocurrencies. Journal of
Risk and Financial Management, 2017, 10, 17.
Chung, J., Kastner, K., Dinh, L., Goel, K., Courville, A. and Bengio, Y., A recurrent latent variable model
for sequential data. In Proceedings of the Advances in Neural Information Processing Systems, Vol.
2015-January, pp. 2980–2988, 2015.
Denton, E. and Fergus, R., Stochastic Video Generation with a Learned Prior. In Proceedings of the 35th
International Conference on Machine Learning, ICML 2018, Vol. 3, pp. 1906–1919, 2018.
Engle, R., Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom
Inflation. Econometrica, 1982, 50, 987–1007.
Engle, R., Dynamic Conditional Correlation. Journal of Business and Economic Statistics, 2002, 20, 339–
350.
Engle, R. and Kroner, K., Multivariate Simultaneous Generalized ARCH. Econometric Theory, 1995, 11,
122–150.
Erten, I., Murat, M. and Okay, N., Volatility Spillovers in Emerging Markets During the Global Financial
Crisis : Diagonal BEKK Approach. Munich Personal RePEc Archive, 2012, pp. 1–18.
Fabius, O. and van Amersfoort, J.R., Variational recurrent auto-encoders. 3rd International Conference on
Learning Representations, ICLR 2015 - Workshop Track Proceedings, 2015, pp. 1–5.
Fraccaro, M., Sønderby, S.K., Paquet, U. and Winther, O., Sequential neural models with stochastic layers.
In Proceedings of the Advances in Neural Information Processing Systems, pp. 2207–2215, 2016.
Franceschi, J.Y., Delasalles, E., Chen, M., Lamprier, S. and Gallinari, P., Stochastic latent residual video
prediction. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, Vol.
PartF16814, pp. 3191–3204, 2020.
Friedman, M., A comparison of alternative tests of significance for the problem of m rankings. The Annals
of Mathematical Statistics, 1940, 11, 86–92.
Glosten, L.R., Jagannathan, R. and Runkle, D.E., On the Relation between the Expected Value and the
Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 1993, 48, 1779–1801.
Heracleous, M., Sample Kurtosis, GARCH-t and the Degrees of Freedom Issue. EUR Working Papers, 2007.
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L. and Muller, P.A., Deep learning for time series
classification: a review. Data Mining and Knowledge Discovery, 2019, 33, 917–963.
Karl, M., Soelch, M., Bayer, J. and Van Der Smagt, P., Deep variational Bayes filters: Unsupervised learning
of state space models from raw data. In Proceedings of the 5th International Conference on Learning
Representations, ICLR 2017 - Conference Track Proceedings, ii, pp. 1–13, 2017.
Kingma, D.P. and Welling, M., Auto-encoding variational bayes. In Proceedings of the 2nd International
Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, pp. 1–14, 2014.
Krishnan, R.G., Shalit, U. and Sontag, D., Structured inference networks for nonlinear state space models. In
Proceedings of the 31st AAAI Conference on Artificial Intelligence, AAAI 2017, pp. 2101–2109, 2017.
Malik, A.K., European exchange rate volatility dynamics: An empirical investigation. Journal of Empirical
Finance, 2005, 12, 187–215.
Nelson, D., Conditional Heteroskedasticity in Asset Returns : A New Approach. Econometrica, 1991, 59,
347–370.
Rangapuram, S.S., Seeger, M., Gasthaus, J., Stella, L., Wang, Y. and Januschowski, T., Deep state space
models for time series forecasting. In Proceedings of the Advances in Neural Information Processing
Systems, pp. 7785–7794, 2018.
Stǎricǎ, C. and Granger, C., Nonstationarities in stock returns. Review of Economics and Statistics, 2005,
87, 503–522.
Tse, Y.K. and Tsui, A.K., A multivariate generalized autoregressive conditional heteroscedasticity model

with time-varying correlations. Journal of Business and Economic Statistics, 2002, 20, 351–362.
Van Der Weide, R., GO-GARCH: A multivariate generalized orthogonal GARCH model. Journal of Applied
Econometrics, 2002, 17, 549–564.
Wilcoxon, F., Individual comparisons by ranking methods. Biometrics Bulletin, 1945, 1, 80–83.
Wu, Y., Lobato, J.M.H. and Ghahramani, Z., Dynamic covariance models for multivariate financial time
series. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Vol. 28,
pp. 1595–1603, 2013.
